perl - Regex: Match text from nested parentheses


How can a regex give me the output

se, dc(fr(lo)), km(ji)(hn), ... from the string az(se)(dc(fr(lo)))(km(ji)(hn))...

Could you tell me how to write a regex that obtains the text between parentheses, so that I can achieve the result above without using an external package/library? This is for learning purposes.

This is quite a classic example of a recursive regex:

\(((?:[^()]++|\((?1)\))*+)\) 

Explanation

Let's break down the regex:

\(                 # literal (
(                  # start of capturing group 1
  (?:              # start of non-capturing group
    [^()]++        # match characters other than ( and )
    |              # or
    \((?1)\)       # recursively match bracketed (...) content
  )*+              # end of non-capturing group; repeat the whole group 0 or more times
)                  # end of capturing group 1
\)                 # literal )

The two literal brackets, \( at the beginning and \) at the end, make sure that only the text inside a bracket pair is matched. Without them, the pattern would instead match portions of the text that contain balanced brackets.
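To make that difference concrete, here is a minimal sketch that runs both forms against a shortened input. The variable names and the shortened string are illustrative only, and the second pattern is applied once (no /g) to keep the result easy to read.

```perl
use strict;
use warnings;

# Shortened sample input (an assumption for illustration).
my $str = "az(se)(dc(fr(lo)))";

# With the outer \( and \): group 1 is the text inside each top-level pair.
my @inside = $str =~ /\(((?:[^()]++|\((?1)\))*+)\)/g;

# Without them (first match only): the body pattern on its own consumes a
# whole balanced stretch of text, brackets included.
my ($balanced) = $str =~ /((?:[^()]++|\((?1)\))*+)/;

print "inside:   $_\n" for @inside;
print "balanced: $balanced\n";
```

With this input the first pattern should yield se and dc(fr(lo)), while the second swallows the entire string because all of it is balanced.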

The (?:[^()]++|\((?1)\))*+ part describes the pattern inside a pair of brackets:

  • There can be sequences of non-bracket characters (anything other than ( and )).
  • Or there can be a bracketed (...) portion, which starts with (, is followed by (?:[^()]++|\((?1)\))*+ (due to the effect of the (?1) subroutine call), and ends with ).

There can be 0 or more instances of non-bracket sequences and bracketed (...) portions, interleaved with each other.

The (?1) is called a subroutine call; it lets you re-use the sub-pattern delimited by a capturing group. In this case, since (?1) is inside capturing group 1 itself, it creates a recursive effect.
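The same recursion can also be written with a named group, which some readers find easier to follow than a numbered call. The sketch below assumes Perl 5.10 or later, where (?<name>...) and (?&name) are available; the group name body and the variable names are my own choices, not part of the original answer.

```perl
use strict;
use warnings;

my $str = "az(se)(dc(fr(lo)))(km(ji)(hn))";

# Same pattern as above, spread out with the x flag and using a named
# group. The name "body" is hypothetical.
my $re = qr/
    \(                      # literal opening bracket
    (?<body>                # named capturing group
        (?:
            [^()]++         # run of non-bracket characters
          |
            \( (?&body) \)  # nested pair: call the named group recursively
        )*+
    )
    \)                      # literal closing bracket
/x;

my @found;
while ($str =~ /$re/g) {
    push @found, $+{body};  # text captured by the named group
}
print join("\n", @found), "\n";
```

Here (?&body) plays the role that (?1) plays in the numbered version, so the two patterns should match the same text.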

Demo

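use strict;
use warnings;

my $str = "az(se)(dc(fr(lo)))(km(ji)(hn))(()aaa(()())(ff(dd)aa))";
# In list context with /g, the match returns every capture of group 1.
my @arr = $str =~ /\(((?:[^()]++|\((?1)\))*+)\)/g;
print join("\n", @arr), "\n";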

Output

se
dc(fr(lo))
km(ji)(hn)
()aaa(()())(ff(dd)aa)
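The demo returns only the top-level groups. If the inner levels are wanted as well, one possible approach is to re-apply the same pattern to each captured piece. The following is a sketch under that assumption; inner_groups is a hypothetical helper name, and the input is shortened for readability.

```perl
use strict;
use warnings;

my $re = qr/\(((?:[^()]++|\((?1)\))*+)\)/;

# Hypothetical helper: return the text inside every bracket pair,
# outer pairs before the pairs nested in them, by re-applying the
# pattern to each capture.
sub inner_groups {
    my ($text) = @_;
    my @out;
    for my $piece ($text =~ /$re/g) {
        push @out, $piece, inner_groups($piece);
    }
    return @out;
}

my @all = inner_groups("az(se)(dc(fr(lo)))");
print join("\n", @all), "\n";
```

For this input the list should be se, dc(fr(lo)), fr(lo), lo. The regex itself still does all the bracket balancing; the Perl-level recursion only walks down into each capture.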
