regex - extracting a sentence containing 2 words from a text file in Java -


I'm trying to extract a sentence containing 2 words from a text file. I have used the regex shown in the code below.

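    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // This fragment runs inside a method declared "throws IOException".
    File doc = new File("D:\\myfile.txt");
    System.out.println("Enter the regex pattern to be matched:");
    Scanner keyboard = new Scanner(System.in);
    String regxPat = keyboard.nextLine();
    Pattern p = Pattern.compile(regxPat, Pattern.CASE_INSENSITIVE);

    String line;
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader br = new BufferedReader(new FileReader(doc))) {
        // The file is matched one line at a time
        while ((line = br.readLine()) != null) {
            Matcher m = p.matcher(line);
            // find() returns false when there is no match; calling group()
            // after a failed find() is what throws IllegalStateException
            if (m.find()) {
                System.out.print(m.group());
            }
        }
    }
    // I tried regex = "(he)*([.&&[^\.]]*?)milan(.*?)\."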

If the text is:

"...Thomas Edison was a scientist. He invented the bulb. He was born in Milan, Ohio, and grew up in Port Huron, Michigan. He was the seventh and last child of Samuel Ogden Edison, Jr...."
  • I want the sentence (sentence boundary: a full stop followed by a space) containing the words 'he' and 'milan', i.e. the 3rd sentence. (The order of the words is not important; any sentence containing both words is needed.)
  • I tried the regex pattern above, and many others,
  • but they extract only the part of the sentence after 'milan', or 2 sentences starting from the first 'he'.
  • Please suggest a method to get this task done, using regex or any other method in Java.

(I am working on extracting the relation pattern between 2 entities: in this case the relation pattern is "born in" between the entities "Edison" and "Milan". I need such sentences from the above and from numerous related text files or web documents [like biographies of Edison, or the first 500 Google links for "edison milan"] for further processing.)

My suggestion is to not expect a regular expression to do all of the processing, and to process the text one step at a time.

I want the sentence (sentence boundary: a full stop followed by a space).

Fine. Use the String split method to get the sentences. Use a full stop (period) followed by one or more spaces as the regular expression. I'll leave the construction of the regular expression to you.
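A minimal sketch of this step, assuming the whole text is already in one String. The class and method names (`SentenceSplitter`, `splitSentences`) are hypothetical, and `\\.\\s+` is just one way to write "a period followed by one or more whitespace characters":

```java
import java.util.Arrays;

// Hypothetical helper: splits text into sentences on a full stop followed by whitespace.
class SentenceSplitter {

    // "\\.\\s+" = a literal period followed by one or more whitespace characters.
    // split() consumes the separator, so those periods are not part of the sentences.
    static String[] splitSentences(String text) {
        return text.trim().split("\\.\\s+");
    }

    public static void main(String[] args) {
        String text = "Thomas Edison was a scientist. He invented the bulb. "
                + "He was born in Milan, Ohio, and grew up in Port Huron, Michigan.";
        System.out.println(Arrays.toString(splitSentences(text)));
    }
}
```

The last sentence keeps its period because no whitespace follows it. With this simple rule, an abbreviation such as "Jr." followed by a space is also treated as the end of a sentence.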

with the words 'he' and 'milan'

Fine. Write a method to input the words, and add them to a List<String>.
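One possible version of this method, assuming the words are typed on a single line separated by spaces. `WordInput` and `readWords` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical helper: reads the search words from one line of input into a List<String>.
class WordInput {

    // Reads one line, splits it on whitespace and stores each word in lower case,
    // so that later comparisons can ignore case.
    static List<String> readWords(Scanner in) {
        List<String> words = new ArrayList<>();
        if (!in.hasNextLine()) {
            return words; // no input: empty list
        }
        for (String w : in.nextLine().trim().split("\\s+")) {
            if (!w.isEmpty()) { // a blank line splits into a single empty string
                words.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println("Enter the words to search for, separated by spaces:");
        List<String> words = readWords(new Scanner(System.in));
        System.out.println(words);
    }
}
```

Lower-casing the words here plays the same role as `CASE_INSENSITIVE` in the question's code.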

Write a method to go through the String array created by the split method, splitting each sentence into words. Again, I'll leave the construction of the regular expression to you.
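A sketch of the word-splitting step. The separator regex `[^\\p{L}\\p{N}]+` (any run of characters that are not letters or digits) is an assumed choice, and `SentenceWords` and `toWords` are hypothetical names:

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: splits one sentence into its words.
class SentenceWords {

    // Splits on runs of non-letter/non-digit characters, so punctuation such as the
    // comma in "Milan," is dropped. Words are lower-cased so "He" and "he" compare equal.
    static Set<String> toWords(String sentence) {
        Set<String> words = new LinkedHashSet<>();
        for (String w : sentence.split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) { // leading punctuation yields an empty first element
                words.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(toWords("He was born in Milan, Ohio, and grew up in Port Huron."));
    }
}
```

Because each sentence is split into whole words, 'he' is not found inside "the". That is a pitfall of matching raw substrings the way the question's regex does.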

When you find a sentence containing the first word, loop through the word list, checking whether each word in the list is in the sentence split on word boundaries. If you find all the words, you've found a matching sentence. If you don't find all the words, continue with the next sentence.
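The checking loop could look like this, assuming each sentence has already been turned into a set of lower-cased words. `SentenceMatcher` and `containsAllWords` are hypothetical names:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: checks a sentence (already split into lower-cased words)
// against the list of search words.
class SentenceMatcher {

    // Returns true only if every word in the list is one of the sentence's words.
    static boolean containsAllWords(Set<String> sentenceWords, List<String> words) {
        for (String word : words) {
            if (!sentenceWords.contains(word.toLowerCase(Locale.ROOT))) {
                return false; // a word is missing: continue with the next sentence
            }
        }
        return true; // all words found: this is a matching sentence
    }

    public static void main(String[] args) {
        Set<String> sentenceWords = new HashSet<>(Arrays.asList(
                "he", "was", "born", "in", "milan", "ohio"));
        System.out.println(containsAllWords(sentenceWords, Arrays.asList("he", "milan"))); // prints true
        System.out.println(containsAllWords(sentenceWords, Arrays.asList("he", "bulb")));  // prints false
    }
}
```

A set lookup makes the order of the words irrelevant, which matches the requirement in the question.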

Once you've looped through the split String array of sentences, you'll have either one sentence, more than one sentence, or no sentences that contain your list of words.
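Putting the steps together, here is a minimal end-to-end sketch. It assumes the whole file has already been read into one String. The class and method names are hypothetical, and the sample text is the one from the question. Returning a List<String> covers all three outcomes: the list may be empty or hold one or several sentences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: split into sentences, split each sentence into words,
// and keep the sentences that contain every search word.
class FindSentences {

    static List<String> findMatchingSentences(String text, List<String> words) {
        List<String> matches = new ArrayList<>();
        // Sentence boundary: a full stop followed by one or more whitespace characters.
        for (String sentence : text.trim().split("\\.\\s+")) {
            // Lower-cased whole words of this sentence, split on non-letter/digit runs.
            Set<String> sentenceWords = new HashSet<>(Arrays.asList(
                    sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")));
            boolean allFound = true;
            for (String w : words) {
                if (!sentenceWords.contains(w.toLowerCase(Locale.ROOT))) {
                    allFound = false; // missing word: skip this sentence
                    break;
                }
            }
            if (allFound) {
                matches.add(sentence);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Thomas Edison was a scientist. He invented the bulb. "
                + "He was born in Milan, Ohio, and grew up in Port Huron, Michigan. "
                + "He was the seventh and last child of Samuel Ogden Edison, Jr.";
        List<String> found = findMatchingSentences(text, Arrays.asList("he", "milan"));
        if (found.isEmpty()) {
            System.out.println("No sentence contains all the words.");
        } else {
            System.out.println(found.size() + " matching sentence(s):");
            found.forEach(System.out::println);
        }
    }
}
```

Reading the whole file first (for example with `Files.readString` in Java 11 or later) avoids missing sentences that continue across a line break. A line-by-line loop like the one in the question would miss them.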

