performance - Faster way to do a correspondence replace operation in Python?


I'm not sure I'm using the right term for this; I'd call it a merge operation, maybe? Or simple matching?

I have two dictionaries. One of them contains a list of tag ids. The other one is a correspondence between tag ids and tag names. I want to match the ids and include the tag names in the first dict.

So, the first dictionary looks like this:

    >>> myjson
    [
        {"tags" : ["1","3"],"otherdata" : "blah"},
        {"tags" : ["2","4"],"otherdata" : "blah blah"}
    ]

The second dictionary looks like this:

    >>> tagnames
    [
        {"id": "1", "name":"bassoon"},
        {"id": "2", "name":"banjo"},
        {"id": "3", "name":"paw paw"},
        {"id": "4", "name":"foxes"}
    ]

To replace the tag ids in myjson with tag names, I've been doing this:

    data = []
    for j in myjson:
        d = j
        d['tagnames'] = [i['name'] for i in tagnames for y in d['tags'] if y == i['id']]
        data.append(d)

My desired output is this:

    >>> data
    [
        {"tags" : ["1","3"],"otherdata" : "blah", "tagnames" : ["bassoon","paw paw"]},
        {"tags" : ["2","4"],"otherdata" : "blah blah", "tagnames": ["banjo","foxes"]}
    ]

I'm getting the right output, but it seems slow. It's doing a full iteration over every element in myjson times a full iteration over every element in tagnames (is that m x n? n x n?) every time, and that is slow. Maybe there is smarter syntax, or some trick for speeding this up? Something that walks the array once instead of n times?

Oooh, also, it would be cool if you could suggest a way to do the assignment with a slick map or functional approach rather than the outer for loop.

You want to transform your tagnames list into a dictionary:

    tagnames_map = {t['id']: t['name'] for t in tagnames}

Now you can find matching tag names much faster. Your code made in-place changes, so I'll simplify it to:

    for d in myjson:
        d['tagnames'] = [tagnames_map[t] for t in tagnames_map.viewkeys() & d['tags']]

The dict.viewkeys() method (Python 2.7) returns a dictionary view object that acts like a set. We intersect that set against the list of tags, resulting in a set of the tags that are listed in tagnames_map. By doing this you don't have to worry about tags that are missing from the map.
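To put a rough number on the speed question, here is a small sketch that counts the work each approach does on the sample data. The function names `nested_scan` and `dict_lookup` and the counters are made up for illustration and are not part of the original code. With m records, n tag names and k tags per record, the nested version does about m x n x k comparisons, while the dictionary version does n steps to build the map plus about m x k lookups.

```python
myjson = [
    {"tags": ["1", "3"], "otherdata": "blah"},
    {"tags": ["2", "4"], "otherdata": "blah blah"},
]
tagnames = [
    {"id": "1", "name": "bassoon"},
    {"id": "2", "name": "banjo"},
    {"id": "3", "name": "paw paw"},
    {"id": "4", "name": "foxes"},
]


def nested_scan(records, names):
    """Approach from the question: scan every tag name for every record."""
    comparisons = 0
    result = []
    for rec in records:
        found = []
        for entry in names:
            for tag in rec["tags"]:
                comparisons += 1
                if tag == entry["id"]:
                    found.append(entry["name"])
        result.append(found)
    return result, comparisons


def dict_lookup(records, names):
    """Build the id -> name map once, then do one lookup per tag."""
    name_by_id = {entry["id"]: entry["name"] for entry in names}
    lookups = 0
    result = []
    for rec in records:
        found = []
        for tag in rec["tags"]:
            lookups += 1
            if tag in name_by_id:  # skip ids that have no name
                found.append(name_by_id[tag])
        result.append(found)
    return result, lookups


scan_result, scan_count = nested_scan(myjson, tagnames)
map_result, map_count = dict_lookup(myjson, tagnames)
print(scan_count, map_count)  # 16 4
```

The gap is small on four tag names, but it grows with the size of tagnames, because the dictionary lookup does not depend on how many names there are.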

If you are using Python 3, use tagnames_map.keys() directly; in Python 3 the .keys(), .values() and .items() methods have been changed to return dictionary view objects.
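As a sketch of the Python 3 form, assuming a record that also carries an id with no name (the "99" below is a hypothetical value added for illustration), the intersection silently drops the unknown id. Because the intersection is a set, the order of the resulting names is not guaranteed; if the order of the tags matters, a membership test in a plain comprehension keeps it.

```python
tagnames_map = {"1": "bassoon", "2": "banjo", "3": "paw paw", "4": "foxes"}

# "99" is a hypothetical id that has no entry in tagnames_map.
record = {"tags": ["3", "99", "1"], "otherdata": "blah"}

# Python 3: keys() is already a set-like view, so & works directly.
known = tagnames_map.keys() & record["tags"]

# Set-based version: unknown ids are dropped, order is arbitrary.
unordered_names = [tagnames_map[t] for t in known]

# Order-preserving alternative: follow the order of record["tags"].
ordered_names = [tagnames_map[t] for t in record["tags"] if t in tagnames_map]

print(sorted(unordered_names))  # ['bassoon', 'paw paw']
print(ordered_names)            # ['paw paw', 'bassoon']
```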

If you wanted to make a copy instead, use d.copy():

    data = []
    for d in myjson:
        d = d.copy()
        d['tagnames'] = [tagnames_map[t] for t in tagnames_map.viewkeys() & d['tags']]
        data.append(d)

dict.copy() creates a shallow copy; any mutable values are not copied, the new dict simply references the same values. You are not altering those values here, so that is fine.
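A short sketch of what "shallow" means here, using one record from the sample data: adding a key to the copy leaves the original alone, but the tags list is shared between both dicts. If you ever did need to alter the nested values, copy.deepcopy() from the standard library is one option.

```python
import copy

original = {"tags": ["1", "3"], "otherdata": "blah"}

shallow = original.copy()
shallow["tagnames"] = ["bassoon", "paw paw"]  # new key only on the copy

print("tagnames" in original)               # False
print(shallow["tags"] is original["tags"])  # True: same list object

# A deep copy duplicates the nested list as well.
deep = copy.deepcopy(original)
deep["tags"].append("4")
print(original["tags"])                     # ['1', '3']
```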

Running this against your sample input gives:

    >>> pprint(data)
    [{'otherdata': 'blah', 'tagnames': ['bassoon', 'paw paw'], 'tags': ['1', '3']},
     {'otherdata': 'blah blah',
      'tagnames': ['banjo', 'foxes'],
      'tags': ['2', '4']}]
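The question also asked about a map or functional style. One possible sketch, which is not part of the answer above, is to move the per-record work into a small helper and map it over the list. The helper name `with_tagnames` is made up for illustration; it builds a new dict per record, so the input is left unchanged, and it keeps the names in the order of the tags.

```python
myjson = [
    {"tags": ["1", "3"], "otherdata": "blah"},
    {"tags": ["2", "4"], "otherdata": "blah blah"},
]
tagnames = [
    {"id": "1", "name": "bassoon"},
    {"id": "2", "name": "banjo"},
    {"id": "3", "name": "paw paw"},
    {"id": "4", "name": "foxes"},
]

tagnames_map = {t["id"]: t["name"] for t in tagnames}


def with_tagnames(record):
    """Return a shallow copy of record with a 'tagnames' list added."""
    names = [tagnames_map[t] for t in record["tags"] if t in tagnames_map]
    return dict(record, tagnames=names)


# map() is lazy in Python 3, so wrap it in list() to get the results.
data = list(map(with_tagnames, myjson))
```

A list comprehension, `[with_tagnames(d) for d in myjson]`, does the same thing and is usually considered at least as readable.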
