PowerShell regex to replace XML tag values
I'm trying to parse the following XML file using PowerShell without loading it as an XML document using [xml], since the document contains errors.
<data>
    <company>walter & cooper</company>
    <contact_name>patrick o'brian</contact_name>
</data>
To load the document I need to fix the errors by replacing the special characters as follows:
& becomes &amp;, < becomes &lt;, ' becomes &apos;, etc.
I know how to find and replace the characters in the document:
(Get-Content $filename) | ForEach-Object {
    $_ -replace '&', '&amp;' `
       -replace "'", '&apos;' `
       -replace '"', '&quot;'
} | Set-Content $filename
But this replaces the characters everywhere in the file. I'm only interested in checking the characters inside XML tags such as <company>, and replacing them with XML-safe entities so that the resulting text is a valid document that I can load using [xml].
Something like this should work for each character that needs to be replaced:
(Get-Content $filename) | ForEach-Object {
    $_ -replace '(?<=\w)(&)(?=.*<\/.*>)', '&amp;' `
       -replace "(?<=\w)(')(?=.*<\/.*>)", '&apos;' `
       -replace '(?<=\w)(")(?=.*<\/.*>)', '&quot;' `
       -replace '(?<=\w)(>)(?=.*<\/.*>)', '&gt;' `
       -replace '(?<=\w)(\*)(?=.*<\/.*>)', '∗'
} | Set-Content $filename
Each pattern is a positive look-behind for a non-word character, a capturing group for the character to replace, and a positive look-ahead for a closing tag later on the line.
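A quick way to check what the look-behind actually requires is to run -match against one of the sample lines. The sketch below is only an illustration: the variable names are made up, and the \W variant is included purely for comparison with the \w form used in the patterns above.

```powershell
# Sample lines taken from the XML in the question.
$company = '<company>walter & cooper</company>'
$contact = "<contact_name>patrick o'brian</contact_name>"

# \w in the look-behind demands a word character immediately before the match;
# \W demands a non-word character (a space, for example).
$ampAfterWord    = $company -match '(?<=\w)&(?=.*</.*>)'
$ampAfterNonWord = $company -match '(?<=\W)&(?=.*</.*>)'
$aposAfterWord   = $contact -match "(?<=\w)'(?=.*</.*>)"

"& preceded by a word character:     $ampAfterWord"
"& preceded by a non-word character: $ampAfterNonWord"
"' preceded by a word character:     $aposAfterWord"
```

In "walter & cooper" the ampersand follows a space, while in "o'brian" the apostrophe follows a letter, so a single look-behind class will not cover both sample values.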
Examples:
Updated: http://regex101.com/r/ay8iv3 | Original: http://regex101.com/r/yo7wb1
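One possible alternative to a separate look-around per character is to match each element's text as a whole and escape it in a single pass. The following is a minimal sketch, not a definitive solution. It assumes that tags carry no attributes, that each element opens and closes on the same line, that the text contains no literal < character, and that nothing in the file is already escaped (an existing &amp; would be escaped a second time). The function name ConvertTo-EscapedTagText is hypothetical; [regex]::Replace with a match evaluator and [System.Security.SecurityElement]::Escape are standard .NET calls.

```powershell
# Hypothetical helper (not from the original post): escapes only the text
# that sits between <tag> and </tag> on a single line.
function ConvertTo-EscapedTagText {
    param([string]$Line)

    # <(\w+)>   opening tag without attributes, name captured in group 1
    # ([^<]*)   element text, assumed to contain no literal '<'
    # </\1>     closing tag with the same name as the opening tag
    $pattern = '<(\w+)>([^<]*)</\1>'

    # The script block runs once per match and returns the replacement text.
    $evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
        param($m)
        $tag  = $m.Groups[1].Value
        # Escape() converts & < > " ' into the predefined XML entities.
        $text = [System.Security.SecurityElement]::Escape($m.Groups[2].Value)
        "<$tag>$text</$tag>"
    }

    [regex]::Replace($Line, $pattern, $evaluator)
}

# Try it on the sample data from the question, then load the result as XML.
$sample = "<data> <company>walter & cooper</company> <contact_name>patrick o'brian</contact_name> </data>"
$fixed  = ConvertTo-EscapedTagText $sample
$doc    = [xml]$fixed
$doc.data.company
```

Under the same assumptions, the helper could be dropped into the existing pipeline: (Get-Content $filename) | ForEach-Object { ConvertTo-EscapedTagText $_ } | Set-Content $filename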