Python pandas DataFrame: filter columns using a list?
I have a large DataFrame: 100000 rows * 10000 columns.
Now I'm given a list of labels (call it list1) that do not match the column labels in the DataFrame exactly; they match only part of those labels. For example, a label in the DataFrame might be "string1,d111", while the labels in list1 might be "d111".
So I want to find the corresponding columns using list1 and sum those columns. What is an efficient way to do this?
DataFrame:
     string1,d111  string2,d222  string3,d333  ......  stringn,dnnn
1    ..            ..            ..            ..
2
3
4
5
6
...

list1: d111, d333, ... dxxx
In [28]: from numpy.random import randn
    ...: from pandas import DataFrame
    ...: df = DataFrame(randn(10, 10),
    ...:                columns=['c_%s' % i for i in range(3)]
    ...:                        + ['d_%s' % i for i in range(3)]
    ...:                        + ['e_%s' % i for i in range(4)])

In [3]: df.filter(regex='d_|e_')
Out[3]:
        d_0       d_1       d_2       e_0       e_1       e_2       e_3
0 -0.022661 -0.504317  0.279227  0.286951 -0.126999 -1.658422  1.577863
1  0.501654  0.145550 -0.864171 -0.374261 -0.399360  1.217679  1.357648
2 -0.608580  1.138143  1.228663  0.427360  0.256808  0.105568 -0.037422
3 -0.993896 -0.581638 -0.937488  0.038593 -2.012554 -0.182407  0.689899
4  0.424005 -0.913518  0.405155 -1.111424 -0.180506  1.211730  0.118168
5  0.701127  0.644692 -0.188302 -0.561400  0.748692 -0.585822  1.578240
6  0.475958 -0.901369 -0.734969  1.090093  1.297208  1.140128  0.173941
7 -0.679514 -0.790529 -2.057733  0.420175  1.766671 -0.797129 -0.825583
8 -0.918645  0.916237  0.992001 -0.440573 -1.875960 -1.223502  0.084821
9  1.096687 -1.414057 -0.268211  0.253461 -0.175931  1.481261 -0.200600
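To connect the df.filter(regex=...) idea back to the question, here is a rough sketch of how list1 might be turned into a column selection and then summed. The small frame, its values, and the names pattern, by_regex, suffixes, and by_suffix are made up for illustration. It also assumes every column label has the form "prefix,suffix" and that "sum these columns" could mean either a per-row or a per-column total, so both are shown.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical stand-in for the large frame: labels look like "stringN,dNNN".
columns = ["string1,d111", "string2,d222", "string3,d333", "string4,d444"]
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)

list1 = ["d111", "d333"]

# Option 1: build one regex from list1 and pass it to DataFrame.filter.
# re.escape protects against regex metacharacters in the labels; the leading
# "," and trailing "$" keep a label such as "d11" from matching "d111".
pattern = ",(?:" + "|".join(re.escape(label) for label in list1) + ")$"
by_regex = df.filter(regex=pattern)

# Option 2: split each column label once and test membership in list1.
# This avoids a very long alternation when list1 has thousands of entries.
suffixes = df.columns.str.split(",").str[-1]
by_suffix = df.loc[:, suffixes.isin(list1)]

# Sum across the selected columns (one total per row) ...
row_totals = by_suffix.sum(axis=1)
# ... or sum each selected column separately (one total per column).
column_totals = by_suffix.sum()

print(by_suffix.columns.tolist())
print(row_totals.tolist())
print(column_totals.tolist())
```

Both options select the same columns here. With around 10000 columns and a long list1, the split-and-isin variant may be the simpler one to reason about, though actual timings would need to be measured on the real data.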