C# - Using HtmlAgilityPack to extract text that is not between tags and comes after a specific node
HTML code:
<b> car </b> <br></br> car can drive. <br></br> <br></br>
C# code:
HtmlAgilityPack.HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");
if (doc != null)
{
    HtmlNode link = doc.DocumentNode.SelectSingleNode("//b[contains(text(), 'car')]");
    webBrowser1.DocumentText = link.InnerText;
    webBrowser1.AllowNavigation = true;
    webBrowser1.ScriptErrorsSuppressed = true;
    webBrowser1.Visible = true;
}
What I manage to get: car
What I need to get:
car
car can drive.
Any suggestions? I have tried adding the next nodes, but that gave NullReferenceExceptions: "//b[contains(text(), 'car')/br]", "//b[contains(text(), 'car')/br/br]"
Thanks in advance. P.S. I would like to avoid regex.
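XPath is case-sensitive (see here for more on this: Is it possible to ignore case using XPath and C#?), plus the second phrase that contains 'car' is not a child of the b element. So you would have to do something like this, which selects every text node containing 'car' regardless of case:
HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");
// translate() maps upper-case letters to lower-case, making contains() case-insensitive.
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'car')]"))
{
    Console.WriteLine(node.InnerText);
}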
In a console application, this outputs:
car
car can drive.
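If only the text that directly follows one particular b element is wanted, rather than every text node containing 'car', an alternative is to locate the b node first and then walk its siblings. The following is a minimal sketch, not part of the original answer. It assumes the markup from the question is loaded from an inline string instead of the URL, and that the loose text ends up as a sibling of the b element in the parsed tree (how the parser treats the stray </br> tags has not been checked here).

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Inline copy of the markup from the question (assumption: used here
        // instead of loading the page from the URL).
        var doc = new HtmlDocument();
        doc.LoadHtml("<b> car </b> <br></br> car can drive. <br></br> <br></br>");

        // Find the <b> element whose text contains 'car' (case-sensitive).
        HtmlNode bold = doc.DocumentNode.SelectSingleNode("//b[contains(text(), 'car')]");
        if (bold == null)
        {
            // SelectSingleNode returns null when nothing matches.
            Console.WriteLine("No matching <b> element found.");
            return;
        }

        Console.WriteLine(bold.InnerText.Trim());

        // Walk the nodes after <b> and stop at the first text node
        // that contains something other than whitespace.
        for (HtmlNode sibling = bold.NextSibling; sibling != null; sibling = sibling.NextSibling)
        {
            if (sibling.NodeType == HtmlNodeType.Text &&
                !string.IsNullOrWhiteSpace(sibling.InnerText))
            {
                Console.WriteLine(sibling.InnerText.Trim());
                break;
            }
        }
    }
}
```

Checking the result of SelectSingleNode for null before using it also avoids the NullReferenceException that occurs when an XPath expression matches nothing.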