algorithm - Detecting unusual path patterns in a gigantic directed graph -


i have gigantic directed graph (100m+ nodes) of nodes, multiple path instance records between sets of nodes. path taken between 2 nodes may vary, i'd find paths share multiple intermediary nodes except major deviation.

for example, have 10 instances of path between node , node h. 9 of ten path instances travel through nodes c,d,e,f - 1 of instances travels through c,d,z,e,f - want find "odd" instance.

any ideas how begin approach such problem? existing analytical frameworks might suited task?

details based on comments:

  • a pir (path instance record) list of nodes traveled through associated edge traversal times per edge.
  • currently, raw pir records in plain string format - obviously, want store differently based on how choose analyze it.
  • this not route solving problem - never need find possible paths; need analyze taken paths (each of pir).
  • the list of subpaths needs generated pirs.

an example of pir like: nodea;300;nodeb;600;nodec;100;noded;100;nodef

this translates path of a->b-c->d->f; cost/time of each vertice number - instance, cost 300 go a->b, 600 go b->c, , 100 go d->f. cost/time of each traversal differ each time traversal made. so, instance, in 1 pir, may cost 100 go a->b, in next may cost 150 go a->b.

go through list of paths , break them sets based on start , end node. example paths start node , end node b in same set. can same thing subsequences of paths. example every path subsequence a,b,c,d , start node y , end node k in same set. reversing paths required example, don't have set paths k y , set paths y k. can check if subsequence common enough followed checking if path(s) don't have subsequence if there subsequence within path sufficiently close original sequence based on edit distance. if interested in path, can calculate edit distance of path , subsequence, subtract difference in length, , check if result low enough. it's best use subsequence of path such starts , ends same node desired subsequence.

for example, algorithm reach set of paths containing subsequence c,d,e,f, , find there 9 of them. exceeds amount required subsequence common enough (and long enough, want sequences of @ least length k), check paths not included. in case, there one. note, either directly or indirectly, only removal of z, make sequence c,d,z,e,f c,d,e,f. passes (currently vague) requirements "odd", , path containing c,d,z,e,f added list of paths returned.


Comments

Popular posts from this blog

linux - xterm copying to CLIPBOARD using copy-selection causes automatic updating of CLIPBOARD upon mouse selection -

c++ - qgraphicsview horizontal scrolling always has a vertical delta -