PowerShell regex to replace XML tag values


I'm trying to parse the following XML file in PowerShell without loading it as an XML document via [xml], because the document contains errors.

    <data>
      <company>walter & cooper</company>
      <contact_name>patrick o'brian</contact_name>
    </data>

To load the document, I first need to fix the errors by replacing the special characters as follows:

    &  ->  &amp;
    <  ->  &lt;
    '  ->  &apos;
    etc.
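For reference, .NET already ships a method that performs this mapping for the five predefined XML entities: [System.Security.SecurityElement]::Escape turns & < > " ' into &amp; &lt; &gt; &quot; &apos;. A minimal sketch on the two sample values follows. It escapes every &, including one that already starts an entity (so &amp; would become &amp;amp;), which means it should only be applied to raw text. Strictly speaking, only & and < must be escaped inside element text; ' and " are legal there.

```powershell
# Escape the five characters that have predefined XML entities:
# &  <  >  "  '   become   &amp;  &lt;  &gt;  &quot;  &apos;
$company = [System.Security.SecurityElement]::Escape('walter & cooper')
$contact = [System.Security.SecurityElement]::Escape("patrick o'brian")

$company   # walter &amp; cooper
$contact   # patrick o&apos;brian
```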

I know how to find and replace these characters throughout the document:

    (Get-Content $filename) | ForEach-Object {
        $_ -replace '&', '&amp;' `
           -replace "'", '&apos;' `
           -replace '"', '&quot;'
    } | Set-Content $filename

but that replaces the characters everywhere in the file. I only want to check the characters inside XML elements such as <company> and replace them with XML-safe entities, so that the resulting text is a valid document I can load using [xml].
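One way to restrict the escaping to element text is to let a regex match each innermost element as a whole and escape only the captured text through a MatchEvaluator, so the tags themselves are never touched. The sketch below uses a hypothetical helper, Repair-XmlTextContent (my name, not an existing cmdlet), and assumes that tags carry no attributes, that each element's text sits on one line, and that the text never itself looks like a tag. It escapes & (skipping ones that already start an entity reference), < and >; ' and " are left alone because they are legal in element content.

```powershell
# Hypothetical helper: escape the text between an opening tag and its matching
# closing tag, leaving the markup itself untouched.
function Repair-XmlTextContent {
    param([string]$Xml)

    # <(\w+)>              opening tag without attributes, name in group 1
    # ((?:(?!</?\w+>).)*)  element text that contains no other tag (group 2)
    # </\1>                the matching closing tag
    $pattern = '<(\w+)>((?:(?!</?\w+>).)*)</\1>'

    # PowerShell converts the script block to a MatchEvaluator delegate.
    [regex]::Replace($Xml, $pattern, {
        param($m)
        $name = $m.Groups[1].Value
        # Escape & unless it already begins an entity reference such as &amp;
        $text = $m.Groups[2].Value -replace '&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)', '&amp;'
        $text = $text -replace '<', '&lt;' -replace '>', '&gt;'
        "<$name>$text</$name>"
    })
}

$raw   = "<data>   <company>walter & cooper</company>   <contact_name>patrick o'brian</contact_name> </data>"
$fixed = Repair-XmlTextContent $raw
$doc   = [xml]$fixed

$doc.data.company        # walter & cooper
$doc.data.contact_name   # patrick o'brian
```

To fix a file in place, something like `Repair-XmlTextContent (Get-Content $filename -Raw) | Set-Content $filename` should work; -Raw reads the whole file as one string, so the pattern sees complete elements rather than single lines.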

Something like this should work for each character I need to replace:

    (Get-Content $filename) | ForEach-Object {
        $_ -replace '(?<=\w)(&)(?=.*<\/.*>)', '&amp;' `
           -replace '(?<=\w)('')(?=.*<\/.*>)', '&apos;' `
           -replace '(?<=\w)(")(?=.*<\/.*>)', '&quot;' `
           -replace '(?<=\w)(>)(?=.*<\/.*>)', '&gt;'
        # No rule for '*': it needs no escaping in XML, and &lowast; is an HTML entity that XML does not predefine
    } | Set-Content $filename

This uses a positive look-behind for a word character (\w), a capturing group for the character to replace, and a positive look-ahead requiring a closing tag later on the same line.
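To see what these look-arounds actually match, here is a small sketch that applies three of the patterns above to the sample lines one at a time (Get-Content yields one string per line). Because (?<=\w) demands a word character immediately before the match, the & in "walter & cooper", which follows a space, is not matched, while the quote in "o'brian" is. The > rule also hits the > that closes the opening <company> tag, since that character follows "y" and a closing tag appears later on the line.

```powershell
# The two sample elements, one per line as Get-Content would return them.
$lines = @(
    '<company>walter & cooper</company>',
    "<contact_name>patrick o'brian</contact_name>"
)

$ampPattern  = '(?<=\w)(&)(?=.*<\/.*>)'
$quotPattern = "(?<=\w)(')(?=.*<\/.*>)"
$gtPattern   = '(?<=\w)(>)(?=.*<\/.*>)'

$ampHit   = $lines[0] -match $ampPattern            # False: '&' is preceded by a space
$quotHit  = $lines[1] -match $quotPattern           # True: the quote follows 'o'
$gtResult = $lines[0] -replace $gtPattern, '&gt;'   # the opening tag's '>' is replaced

$gtResult   # <company&gt;walter & cooper</company>
```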

Examples:

Updated: http://regex101.com/r/ay8iv3 | Original: http://regex101.com/r/yo7wb1

