Python - pandas - select/mask the first n elements by value
I am starting with a single DataFrame:

    i,a,b,c,d,e,f
    a,1,3,5,6,4,2
    b,3,4,7,1,0,0
    c,1,3,5,2,0,7

I want to keep the 3 largest values in each row and mask the rest with 0, keeping the order of the columns, so that the resulting DataFrame looks like this:

    i,a,b,c,d,e,f
    a,0,0,5,6,4,0
    b,3,4,7,0,0,0
    c,0,3,5,0,0,7
So far I've been able to sort the DataFrame with:

    a = df.values

and

    a.sort(axis=1)

so that:

    [[1 1 2 3 4 5]
     [0 0 1 1 3 4]
     [0 1 1 3 5 7]]

This gives me a sorted NumPy array, but I lose the information about which column each value came from.
You can rank the values row-wise, filter on the ranks, and then call fillna:
    In [248]: df[df.rank(axis=1, method='min')>3].fillna(0)
    Out[248]:
       i  a  b  c  d  e  f
    0  0  0  0  5  6  4  0
    1  0  3  4  7  0  0  0
    2  0  0  3  5  0  0  7
You can concat the 'i' column back:
    In [268]: pd.concat([df['i'], df[df.rank(axis=1, method='min')>3].fillna(0)[df.columns[1:]]], axis=1)
    Out[268]:
       i  a  b  c  d  e  f
    0  a  0  0  5  6  4  0
    1  b  3  4  7  0  0  0
    2  c  0  3  5  0  0  7
Output of the intermediate DataFrames:
    In [269]: df.rank(axis=1, method='min')
    Out[269]:
       a  b  c  d  e  f
    0  1  3  5  6  4  2
    1  4  5  6  3  1  1
    2  2  4  5  3  1  6

    In [270]: df.rank(axis=1, method='min')>3
    Out[270]:
           a      b      c      d      e      f
    0  False  False   True   True   True  False
    1   True   True   True  False  False  False
    2  False   True   True  False  False   True
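For reference, here is a self-contained sketch of the same ranking idea that can be run as a script. A few things in it are my own assumptions rather than part of the sessions above: the helper name `keep_top_n` is hypothetical, the 'i' column is moved to the index before ranking (depending on the pandas version, ranking a frame that still contains a string column may fail or drop that column), and it ranks in descending order with `method='first'` and keeps `rank <= n`, instead of `rank(method='min') > 3`, so that `n` does not depend on the number of columns.

```python
import pandas as pd

# Sample data from the question; 'i' holds the row labels.
df = pd.DataFrame({
    "i": ["a", "b", "c"],
    "a": [1, 3, 1],
    "b": [3, 4, 3],
    "c": [5, 7, 5],
    "d": [6, 1, 2],
    "e": [4, 0, 0],
    "f": [2, 0, 7],
})


def keep_top_n(frame, n=3, label_col="i"):
    """Hypothetical helper: zero out all but the n largest values per row."""
    # Move the label column out of the way so only numbers are ranked.
    values = frame.set_index(label_col)
    # Rank within each row, largest value first. method="first" breaks ties
    # by column order, so exactly n entries per row get a rank <= n.
    ranks = values.rank(axis=1, method="first", ascending=False)
    # Keep entries ranked within the top n and replace the rest with 0,
    # then restore the label column.
    return values.where(ranks <= n, 0).reset_index()


result = keep_top_n(df, n=3)
print(result)
```

On the sample data this should give the same table as the expected result in the question. With ties at the cut-off, `method='first'` keeps the leftmost tied columns; use a different `method` if you would rather keep or drop all tied values together.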