html - Java Http Request that only returns certain elements I want -
Is there a method in Java to make an HTTP request to a webpage so that the response contains only the specific elements I want, instead of the whole document?
For example, if I request a <div> called "example", the response would be only that element, and not the rest of the fluff that exists on the page, which I do not need.
Most of the methods I have looked at involve getting the entire HTML page and then parsing it. I want to look at the page, pluck out the div I want, and have that as the response. The pages I am dealing with contain a lot of advert content that I want to ignore.
HTTP has nothing to do with the content of a page; it is the protocol that governs server requests and responses.
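I understand what you want to do, but you've asked the wrong question. Don't worry about HTTP; it is just the protocol that governs server requests and responses (GET, PUT, POST, HEAD, OPTIONS). A server returns whatever resource the URL identifies, so unless the site offers a separate URL for that fragment, you will receive the whole page.
The problem you are describing can be handled after retrieval of the content is completed. You need to work with the Document Object Model (DOM), the tree representation used for XML and XHTML documents. This means you need to familiarize yourself with the DOM, and maybe XPath and XSL as well.
The functionality you are asking for can be implemented in many ways, but it boils down to a sequence of non-trivial operations: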
- Retrieve the page content from the URL (including negotiating encodings, HTTP redirects, and protocol changes).
- Clean up non-well-formed content (e.g., unclosed or improperly nested tags), for instance with JTidy.
- Parse the page content into a DOM.
- Traverse the DOM to find the nodes you are interested in (e.g., via the DOM API or XPath).
- Build the output DOM (e.g., via the org.w3c.dom classes).
- Write the output DOM to a file (a combination of java.io and org.w3c.dom).
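The parse, traverse, build, and write steps above can be sketched with nothing but the JDK. This is a minimal illustration rather than a complete solution: it assumes the page has already been retrieved and cleaned into well-formed XHTML with no DOCTYPE and no XML namespace declaration, and it assumes that a div "called example" means a div whose id attribute is "example". The class name `DivExtractor`, the method `extractDiv`, and the sample page are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DivExtractor {

    /**
     * Returns the serialized div with the given id, or null if no such div exists.
     * Assumes the input is already well-formed XHTML (hypothetical helper).
     */
    public static String extractDiv(String xhtml, String id) throws Exception {
        // Parse the page content into a DOM.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document page = builder.parse(new InputSource(new StringReader(xhtml)));

        // Traverse the DOM with XPath; the id is passed as a variable
        // so it never has to be spliced into the expression string.
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(name -> "id".equals(name.getLocalPart()) ? id : null);
        Node div = (Node) xpath.evaluate("//div[@id=$id]", page, XPathConstants.NODE);
        if (div == null) {
            return null;
        }

        // Build the output DOM containing only a deep copy of the wanted node.
        Document output = builder.newDocument();
        output.appendChild(output.importNode(div, true));

        // Write the output DOM; a StringWriter stands in for a file here.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(output), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample page: one advert div and the div we want.
        String page = "<html><body><div id=\"ad\">Buy now</div>"
                + "<div id=\"example\"><p>Wanted content</p></div></body></html>";
        System.out.println(extractDiv(page, "example"));
    }
}
```

Running `main` prints only the wanted div; the advert div is left out. Real pages are rarely well-formed, which is why the cleanup step comes first: the JDK parser throws an exception on unclosed or badly nested tags.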
While it is possible to implement this from scratch, there are a few open source projects that already have this functionality. Try jsoup: Java HTML Parser.