Python - pandas - get values from MultiIndex columns
I have the following DataFrame df:
h,nu,city,code,code2
0.965392,15,madrid,es,es
0.920614,15,madrid,it,es
0.726219,16,madrid,tn,es
0.739119,17,madrid,fr,es
0.789923,55,dublin,mt,en
0.699239,57,dublin,en,en
0.890462,68,dublin,ar,en
0.746863,68,dublin,pt,en
0.789923,55,milano,it,it
0.699239,57,milano,es,it
0.890462,68,milano,ar,it
0.746863,68,milano,pt,it
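To reproduce this, the data above can be loaded into df roughly as follows. This is only a sketch: reading from an in-memory string with io.StringIO and the name CSV_TEXT are stand-ins chosen for illustration, and the real data may well come from a file.

```python
import io

import pandas as pd

# Sample rows copied from the question; CSV_TEXT is a hypothetical name
# and io.StringIO stands in for whatever file the data really lives in.
CSV_TEXT = """h,nu,city,code,code2
0.965392,15,madrid,es,es
0.920614,15,madrid,it,es
0.726219,16,madrid,tn,es
0.739119,17,madrid,fr,es
0.789923,55,dublin,mt,en
0.699239,57,dublin,en,en
0.890462,68,dublin,ar,en
0.746863,68,dublin,pt,en
0.789923,55,milano,it,it
0.699239,57,milano,es,it
0.890462,68,milano,ar,it
0.746863,68,milano,pt,it
"""

# Parse the CSV text into a DataFrame with one row per line.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.head())
```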
I want to add a new column hcode that holds, for each city, the h value of the row whose code matches the code2 string, so that the resulting DataFrame looks like this:
h,nu,city,code,code2,hcode
0.965392,15,madrid,es,es,0.965392
0.920614,15,madrid,it,es,0.965392
0.726219,16,madrid,tn,es,0.965392
0.739119,17,madrid,fr,es,0.965392
0.789923,55,dublin,mt,en,0.699239
0.699239,57,dublin,en,en,0.699239
0.890462,68,dublin,ar,en,0.699239
0.746863,68,dublin,pt,en,0.699239
0.789923,55,milano,it,it,0.789923
0.699239,57,milano,es,it,0.789923
0.890462,68,milano,ar,it,0.789923
0.746863,68,milano,pt,it,0.789923
So far I have tried a groupby on city and code2, with no results.
You can groupby on 'city' and 'code2', call first on the 'h' column, and reset the index, resulting in the following:
In [172]: gp = df.groupby(['city','code2'])['h'].first().reset_index()
gp
Out[172]:
     city code2         h
0  dublin    en  0.789923
1  madrid    es  0.965392
2  milano    it  0.789923
Then perform a left merge on the original df and select the 'h_y' column (the name comes from the fact that the 'h' columns clash), and ffill this:
In [173]: df['hcode'] = df.merge(gp, left_on=['city', 'code'], right_on=['city', 'code2'], how='left')['h_y'].ffill()
df
Out[173]:
           h  nu    city code code2     hcode
0   0.965392  15  madrid   es    es  0.965392
1   0.920614  15  madrid   it    es  0.965392
2   0.726219  16  madrid   tn    es  0.965392
3   0.739119  17  madrid   fr    es  0.965392
4   0.789923  55  dublin   mt    en  0.965392
5   0.699239  57  dublin   en    en  0.789923
6   0.890462  68  dublin   ar    en  0.789923
7   0.746863  68  dublin   pt    en  0.789923
8   0.789923  55  milano   it    it  0.789923
9   0.699239  57  milano   es    it  0.789923
10  0.890462  68  milano   ar    it  0.789923
11  0.746863  68  milano   pt    it  0.789923
To show what the merge itself produces (here without how='left'):
In [165]: df.merge(gp, left_on=['city', 'code'], right_on=['city', 'code2'])['h_y']
Out[165]:
0    0.965392
1    0.789923
2    0.789923
Name: h_y, dtype: float64
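The 'h_y' name can be seen on a smaller example. The sketch below uses two tiny made-up frames (left and right are illustrative names, not part of the question) that both carry an 'h' column. As far as I know, pandas appends its default suffixes '_x' and '_y' to the clashing column, and with how='left' the rows that find no partner get NaN in 'h_y', which is what the ffill above then fills.

```python
import pandas as pd

# Two small illustrative frames; both have an 'h' column, so the names clash.
left = pd.DataFrame({
    "city": ["madrid", "madrid", "dublin"],
    "code": ["es", "it", "en"],
    "h": [0.965392, 0.920614, 0.699239],
})
right = pd.DataFrame({
    "city": ["madrid", "dublin"],
    "code2": ["es", "en"],
    "h": [0.965392, 0.789923],
})

# Left merge: every row of `left` is kept, matched on (city, code) == (city, code2).
merged = left.merge(right, left_on=["city", "code"],
                    right_on=["city", "code2"], how="left")

# The clashing 'h' columns come back as 'h_x' (left) and 'h_y' (right).
print(merged.columns.tolist())

# Rows without a match in `right` hold NaN in 'h_y'.
print(merged["h_y"].isna().tolist())  # [False, True, False]
```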
Edit
OK, IIUC you can group by city as before, filter each group to the rows where 'code2' equals 'code', and use the result as the lookup to merge against:
In [200]: gp = df.groupby('city')
mask = gp.apply(lambda x: x['code2'] == x['code'])
lookup = df.loc[mask[mask].reset_index(level=0).index]
lookup
Out[200]:
          h  nu    city code code2
5  0.699239  57  dublin   en    en
0  0.965392  15  madrid   es    es
8  0.789923  55  milano   it    it

In [202]: df['hcode'] = df.merge(lookup, left_on=['city', 'code'], right_on=['city', 'code2'], how='left')['h_y'].ffill()
df
Out[202]:
           h  nu    city code code2     hcode
0   0.965392  15  madrid   es    es  0.965392
1   0.920614  15  madrid   it    es  0.965392
2   0.726219  16  madrid   tn    es  0.965392
3   0.739119  17  madrid   fr    es  0.965392
4   0.789923  55  dublin   mt    en  0.965392
5   0.699239  57  dublin   en    en  0.699239
6   0.890462  68  dublin   ar    en  0.699239
7   0.746863  68  dublin   pt    en  0.699239
8   0.789923  55  milano   it    it  0.789923
9   0.699239  57  milano   es    it  0.789923
10  0.890462  68  milano   ar    it  0.789923
11  0.746863  68  milano   pt    it  0.789923
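Note that ffill carries the last seen value across city boundaries: row 4 (dublin, mt) above still shows madrid's 0.965392 rather than the 0.699239 asked for. One possible way around this, offered as a sketch rather than the method used above, is to skip the merge and ffill and instead map each city to the h of its matching row. It assumes every city has exactly one row where code equals code2; with duplicates, map on a non-unique index would fail.

```python
import pandas as pd

# Sample data from the question, rebuilt inline so the snippet runs on its own.
df = pd.DataFrame({
    "h": [0.965392, 0.920614, 0.726219, 0.739119,
          0.789923, 0.699239, 0.890462, 0.746863,
          0.789923, 0.699239, 0.890462, 0.746863],
    "nu": [15, 15, 16, 17, 55, 57, 68, 68, 55, 57, 68, 68],
    "city": ["madrid"] * 4 + ["dublin"] * 4 + ["milano"] * 4,
    "code": ["es", "it", "tn", "fr", "mt", "en", "ar", "pt",
             "it", "es", "ar", "pt"],
    "code2": ["es"] * 4 + ["en"] * 4 + ["it"] * 4,
})

# Keep only the rows whose own code equals the city's code2, then index
# their h values by city. Assumption: one such row per city.
lookup = df.loc[df["code"] == df["code2"]].set_index("city")["h"]

# Look up every row's city in that Series; no ffill is involved, so a
# value cannot leak from one city into the next.
df["hcode"] = df["city"].map(lookup)
print(df)
```

A city with no matching row would simply get NaN in hcode here, which makes the gap visible instead of silently filling it from a neighbouring city.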