java - WordNetSimalarity in a large dataset of synsets
I use the WordNet Similarity for Java (WS4J) API to measure the similarity between two synsets, like this:
    import edu.cmu.lti.lexical_db.ILexicalDatabase;
    import edu.cmu.lti.lexical_db.NictWordNet;
    import edu.cmu.lti.ws4j.RelatednessCalculator;
    import edu.cmu.lti.ws4j.impl.*;
    import edu.cmu.lti.ws4j.util.WS4JConfiguration;

    public class WordNetSimalarity {
        private static ILexicalDatabase db = new NictWordNet();
        private static RelatednessCalculator[] rcs = {
            new HirstStOnge(db), new LeacockChodorow(db), new Lesk(db), new WuPalmer(db),
            new Resnik(db), new JiangConrath(db), new Lin(db), new Path(db)
        };

        public static double computeSimilarity(String word1, String word2) {
            WS4JConfiguration.getInstance().setMFS(true);
            double s = 0;
            // Note: s is overwritten on every iteration, so only the last
            // calculator's (Path) score is actually returned.
            for (RelatednessCalculator rc : rcs) {
                s = rc.calcRelatednessOfWords(word1, word2);
                // System.out.println(rc.getClass().getName() + "\t" + s);
            }
            return s;
        }
    }
Main class:
    import java.io.File;
    import java.io.PrintStream;
    import java.util.ArrayList;
    import java.util.Scanner;

    public static void main(String[] args) {
        long t0 = System.currentTimeMillis();
        File source = new File("tagsfiltered.txt");
        File target = new File("fich4.txt");
        ArrayList<String> sList = new ArrayList<>();
        try {
            if (!target.exists()) target.createNewFile();
            Scanner scanner = new Scanner(source);
            PrintStream psStream = new PrintStream(target);
            while (scanner.hasNext()) {
                sList.add(scanner.nextLine());
            }
            // Compare every unordered pair of lines exactly once.
            for (int i = 0; i < sList.size(); i++) {
                for (int j = i + 1; j < sList.size(); j++) {
                    psStream.println(sList.get(i) + " " + sList.get(j) + " "
                            + WordNetSimalarity.computeSimilarity(sList.get(i), sList.get(j)));
                }
            }
            psStream.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        long t1 = System.currentTimeMillis();
        System.out.println("Done in " + (t1 - t0) + " msec.");
    }
My database contains 595 synsets, which means computeSimilarity is called 595*594/2 = 176,715 times. Computing the similarity between two words takes more than 5000 ms, so at that rate the whole task needs at least a week (176,715 pairs at roughly 5 s each is about 10 days)!
My question: how can I reduce this running time? How can I improve the performance?
I don't think the language is the issue here. You can exploit parallelism: I think this is a good candidate for MapReduce and Hadoop.
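A minimal sketch of the parallelism idea (short of a full Hadoop job), assuming the WS4J calculators behind computeSimilarity are thread-safe; if they are not, give each thread its own RelatednessCalculator instances. Every pair (i, j) is independent, so a Java parallel stream can spread the outer loop over all available cores. The file names and WordNetSimalarity.computeSimilarity come from the question; the class name ParallelSimilarity is just illustrative.

    import java.io.IOException;
    import java.io.PrintStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.stream.IntStream;

    public class ParallelSimilarity {
        public static void main(String[] args) throws IOException {
            // Same input and output files as in the question.
            List<String> words = Files.readAllLines(Paths.get("tagsfiltered.txt"));

            try (PrintStream out = new PrintStream("fich4.txt")) {
                // Each unordered pair (i, j) is independent, so the outer
                // loop can run in parallel across all available cores.
                IntStream.range(0, words.size()).parallel().forEach(i -> {
                    for (int j = i + 1; j < words.size(); j++) {
                        double s = WordNetSimalarity.computeSimilarity(words.get(i), words.get(j));
                        // PrintStream methods are synchronized, so lines from
                        // different threads are written whole, not interleaved.
                        out.println(words.get(i) + " " + words.get(j) + " " + s);
                    }
                });
            }
        }
    }

On an N-core machine this divides the roughly 10 days of sequential work by about N; MapReduce/Hadoop applies the same idea across several machines, with each mapper handling its own slice of the (i, j) pairs.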