Perl - Regex: Match text from nested parentheses
How can a regex give me the output
se,dc(fr(lo)),km(ji)(hn),...
from the string
az(se)(dc(fr(lo)))(km(ji)(hn))...
Could someone tell me how to write a regex that obtains the text between parentheses, so that I can achieve the result above without using an external package/library? This is for learning purposes.
This is quite a classic example of a recursive regex (recursion and possessive quantifiers need Perl 5.10 or later):
\(((?:[^()]++|\((?1)\))*+)\)
Explanation
Let's break down the regex:
\(                 # literal (
(                  # start of capturing group 1
  (?:              # start of non-capturing group
    [^()]++        # match one or more characters other than ( and )
    |              # or
    \((?1)\)       # recursively match bracketed (...) content
  )*+              # end of non-capturing group, repeat the whole group 0 or more times
)                  # end of capturing group 1
\)                 # literal )
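The commented layout above can be used almost verbatim with Perl's /x modifier, which ignores whitespace and # comments inside the pattern. A minimal sketch follows. The names $re and @parts and the shortened input string are only illustrative.

```perl
use strict;
use warnings;

# Same pattern as in the answer, spread out with /x so it can carry comments.
my $re = qr{
    \(                # literal (
    (                 # start of capturing group 1
      (?:
          [^()]++     # run of non-bracket characters
        |
          \( (?1) \)  # nested bracketed part: recurse into group 1
      )*+
    )                 # end of capturing group 1
    \)                # literal )
}x;

# List context plus /g returns the group 1 capture of every match.
my @parts = "az(se)(dc(fr(lo)))" =~ /$re/g;
print join("\n", @parts), "\n";
```

Comments inside a /x pattern should avoid $ and @, since Perl interpolates variables before the regex is compiled.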
The 2 literal brackets \( and \) at the beginning and the end make sure that we match only the text inside a bracket. Without them, the regex would instead match portions of text with balanced brackets.
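To make the difference concrete, here is a small sketch that runs both variants once (without /g) against a deliberately unbalanced input. The input string and the names $balanced and $inside are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $text = "az(se)(dc";   # the last bracket is never closed

# Without the outer literal brackets: the longest balanced run from the start.
my ($balanced) = $text =~ /((?:[^()]++|\((?1)\))*+)/;

# With them: only the content of the first complete bracket pair.
my ($inside) = $text =~ /\(((?:[^()]++|\((?1)\))*+)\)/;

print "balanced: $balanced\n";   # az(se)
print "inside:   $inside\n";     # se
```

The first variant stops before the unclosed "(dc" because the bracketed alternative cannot find its closing bracket.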
The (?:[^()]++|\((?1)\))*+ part describes the pattern inside a pair of brackets:
- There can be sequences of non-bracket characters, i.e. anything other than ( and )
- Or a bracketed (...) portion, which starts with (, is followed by (?:[^()]++|\((?1)\))*+ (due to the effect of the (?1) subroutine call), and ends with )
And there can be 0 or many instances of non-bracket sequences and bracketed (...) portions interleaved with each other.
The (?1) is called a subroutine call. It lets you match the sub-pattern delimited by a capturing group, here group 1. In this case, since (?1) sits inside capturing group 1 itself, it creates a recursive effect.
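To see what the recursion is doing, it can help to compare it with a hand-written scan that tracks nesting depth. The sketch below is a swapped-in technique for learning purposes, not the method used in the answer. The helper name top_level_groups is hypothetical, and it assumes the brackets in the input are balanced.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the content of every top-level (...) group.
# Assumes balanced brackets; unbalanced input is not handled.
sub top_level_groups {
    my ($text) = @_;
    my ($depth, $current, @groups) = (0, '');
    for my $ch (split //, $text) {
        if ($ch eq '(') {
            $current .= $ch if $depth > 0;   # keep nested brackets in the content
            $depth++;
        }
        elsif ($ch eq ')') {
            $depth--;
            if ($depth == 0) {               # a top-level group just closed
                push @groups, $current;
                $current = '';
            }
            else {
                $current .= $ch;
            }
        }
        elsif ($depth > 0) {
            $current .= $ch;                 # ordinary character inside a group
        }
    }
    return @groups;
}

print join("\n", top_level_groups("az(se)(dc(fr(lo)))(km(ji)(hn))")), "\n";
```

The depth counter plays the role of the regex engine's recursion stack: each ( goes one level deeper, each ) returns one level, and a group is complete when the depth is back at zero.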
Demo
my $str = "az(se)(dc(fr(lo)))(km(ji)(hn))(()aaa(()())(ff(dd)aa))";
my @arr = $str =~ /\(((?:[^()]++|\((?1)\))*+)\)/g;   # list context + /g: every group 1 capture
print join("\n", @arr), "\n";
Output
se
dc(fr(lo))
km(ji)(hn)
()aaa(()())(ff(dd)aa)
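The question asked for a comma-separated result, which only needs a different join separator. A minimal sketch, using the shorter string from the question. The variable name $csv is just illustrative.

```perl
use strict;
use warnings;

my $str = "az(se)(dc(fr(lo)))(km(ji)(hn))";
my @arr = $str =~ /\(((?:[^()]++|\((?1)\))*+)\)/g;

# Join the top-level captures with commas instead of newlines.
my $csv = join(",", @arr);
print "$csv\n";   # se,dc(fr(lo)),km(ji)(hn)
```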