java - Is there a way to manipulate partial HTML pages using JSoup?


I am developing a utility in which I have to traverse a set of HTML files and manipulate them.

Jsoup does a wonderful job of parsing and manipulating files that are complete (i.e. that have <html> ... </html> tags).

However, some of the files are partial pages, i.e. a page would contain only markup like:

<div id="leftnav">...</div> 

Jsoup parses such a page correctly, but when doc.toString() or doc.outerHtml() is called, it returns full HTML (it wraps the partial HTML content in <html> <body> ... </body> </html> tags).

This is a problem for me. Can someone please let me know if there is an API in Jsoup that does not sanitize / clean the HTML content in this manner?

Thanks.

You can use the XML parser:

Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat it as HTML, rather creates a simple tree directly from the input.

In other words: it doesn't create the typical HTML structure (html, body, head etc.) and takes the input as it is.

Here's how to use it:

// using connect()
Document doc = Jsoup.connect("<url>").parser(Parser.xmlParser()).get();

// using parse()
Document doc = Jsoup.parse("<html>", "<base url>", Parser.xmlParser());
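
For a fuller picture, here is a minimal sketch of how this might look for the leftnav fragment from the question. The class name PartialHtmlExample, the helper methods, the sample link and the "collapsed" CSS class are all made up for illustration. It assumes jsoup is on the classpath. The exact whitespace of the output can differ between jsoup versions, so no expected output is shown.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

// Hypothetical example class; not part of jsoup.
public class PartialHtmlExample {

    // Parses the fragment with the XML parser, so no <html>/<head>/<body>
    // wrapper is added, edits one element, and returns the serialized markup.
    static String editFragment(String fragment) {
        Document doc = Jsoup.parse(fragment, "", Parser.xmlParser());
        Element nav = doc.getElementById("leftnav");
        if (nav != null) {
            nav.addClass("collapsed"); // illustrative manipulation only
        }
        return doc.outerHtml();
    }

    // For comparison: the default HTML parser normalizes the fragment
    // into a full document.
    static String withDefaultParser(String fragment) {
        return Jsoup.parse(fragment).outerHtml();
    }

    public static void main(String[] args) {
        String fragment = "<div id=\"leftnav\"><a href=\"/home\">Home</a></div>";
        System.out.println(editFragment(fragment));
        System.out.println(withDefaultParser(fragment));
    }
}
```

The first call should print only the div (with the added class), while the second prints the same markup wrapped in the html, head and body elements that the question wants to avoid.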
