html - Java HTTP request that returns only the elements I want


Is there a method in Java to make an HTTP request to a webpage where the response is just the specific elements I want instead of the whole document?

For example, if I request a <div> called "example", the response would be just that element, and not the rest of the fluff that exists on the page, which I do not need.

Most methods I have looked at involve getting the entire HTML page and then parsing it. I want to look at the page, pluck out the div I want, and have only that as the response. The pages I am dealing with contain a lot of advert content that I want to ignore.

HTTP has nothing to do with the content of a page; it is just the protocol that governs server requests and responses.

I understand what you want to do, but you've asked the wrong question. Don't worry about HTTP; it is just the protocol that governs server requests and responses (GET, PUT, POST, HEAD, OPTIONS).

The problem you are describing can be handled after the retrieval of the content is completed. You need to be working with the Document Object Model (DOM), which is the foundation of XML and XHTML. That means you need to familiarize yourself with the DOM, and maybe XPath and XSL as well.

The functionality you are asking for can be implemented in many ways, but it boils down to a sequence of non-trivial operations:

  1. Retrieve the page content from a URL (including negotiating encodings, HTTP redirects and protocol changes).
  2. Clean up non-well-formed content (i.e., unclosed or improperly nested tags, e.g., using JTidy).
  3. Parse the page content into a DOM.
  4. Traverse the DOM to find the nodes you are interested in (e.g., via DOM or XPath).
  5. Build an output DOM (e.g., via the org.w3c.dom classes).
  6. Write the output DOM to a file (a combination of java.io and org.w3c.dom).
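
As a rough illustration of steps 3 to 6, here is a minimal sketch that uses only classes shipped with the JDK. It assumes the page has already been downloaded and cleaned into well-formed XHTML (steps 1 and 2 are left out), and that the div "called" example is identified by its id attribute. The class name DivExtractor and the method extractDivById are made up for this example, and the result is returned as a string rather than written to a file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
public class DivExtractor {

    /**
     * Returns the markup of the first div whose id attribute equals the given
     * id, or null if there is none. Assumption: the input is well-formed
     * XHTML without a DOCTYPE or namespace declaration (real pages usually
     * need a cleaning step first).
     */
    public static String extractDivById(String xhtml, String id) {
        try {
            // Step 3: parse the page content into a DOM.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document page = builder.parse(new InputSource(new StringReader(xhtml)));

            // Step 4: find the node of interest with XPath. The id is passed
            // as an XPath variable instead of being concatenated into the
            // expression.
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setXPathVariableResolver(
                    name -> "id".equals(name.getLocalPart()) ? id : null);
            Node match = (Node) xpath.evaluate(
                    "//div[@id=$id]", page, XPathConstants.NODE);
            if (match == null) {
                return null;
            }

            // Step 5: build an output DOM that contains only the matched div.
            Document output = builder.newDocument();
            output.appendChild(output.importNode(match, true));

            // Step 6: serialize the output DOM (to a string here, a file
            // would work the same way with a different StreamResult).
            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(output), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process the page", e);
        }
    }
}
```

Calling DivExtractor.extractDivById(pageSource, "example") would then give back only that div and its children, leaving the advert markup behind. Typical real-world HTML will not get through the XML parser as-is, which is why the cleaning step in the list matters.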

While it is possible to implement this from scratch, there are a few open source projects that already have this functionality. Try jsoup: Java HTML Parser.

