java - Is there a way to manipulate partial HTML pages using JSoup
I am developing a utility where I have to traverse a set of HTML files and manipulate them.
Jsoup does a wonderful job of parsing and manipulating files that are complete (i.e. have <html> ... </html> tags).
However, for partial pages, i.e. pages that would only contain markup like,
<div id="leftnav">...</div>
it parses correctly, but when doc.toString() or doc.outerHtml() is called, it returns full HTML (it wraps the partial HTML content in <html> <body> ... </body> </html> tags).
This is a problem for me. Can anyone please let me know if there is an API in Jsoup that does not sanitize / clean the HTML content in this manner?
Thanks.
You can use the XML parser:
Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat it as HTML, rather creates a simple tree directly from the input.
In other words: it doesn't create the typical HTML structure (html, body, head etc.) and takes the input as it is.
Here's how to use it:
// Using connect()
Document doc = Jsoup.connect("<url>").parser(Parser.xmlParser()).get();

// Using parse()
Document doc = Jsoup.parse("<html>", "<base url>", Parser.xmlParser());
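To make this concrete for the leftnav fragment from the question, here is a minimal sketch, not a definitive implementation. The class name PartialHtmlExample, the helper method names, the sample link inside the div, and the "edited" class are all made up for illustration. The second helper shows a possible alternative that keeps the HTML parser: Jsoup.parseBodyFragment() followed by doc.body().html(), which prints only the contents of the body rather than the whole document.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class PartialHtmlExample {

    // Option 1: parse with the XML parser, so no <html>/<head>/<body>
    // wrapper is created around the fragment.
    static String editWithXmlParser(String fragment) {
        Document doc = Jsoup.parse(fragment, "", Parser.xmlParser());
        // Illustrative edit only: add a (hypothetical) class to the nav div.
        doc.select("#leftnav").addClass("edited");
        return doc.outerHtml();
    }

    // Option 2: keep the HTML parser, but print only what is inside <body>.
    static String editWithBodyFragment(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        doc.select("#leftnav").addClass("edited");
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Sample fragment; the link is an assumption, not from the question.
        String fragment = "<div id=\"leftnav\"><a href=\"/home\">Home</a></div>";
        System.out.println(editWithXmlParser(fragment));
        System.out.println(editWithBodyFragment(fragment));
    }
}
```

Both helpers should return the edited fragment without the surrounding <html> and <body> tags. Which one fits better probably depends on the files: the XML parser takes the markup as-is and applies no HTML-specific rules, while the body-fragment approach still normalizes the markup the way the HTML parser does.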