How to merge two pandas DataFrames in Python?
I am trying to join two pandas DataFrames using an inner join.

    my_df = pd.merge(df1, df2, how = 'inner', left_on = ['date'], right_on = ['mydate'])

However, I am getting the following errors:

    KeyError: 'mydate'
    TypeError: an integer is required

I believe joining on dates is valid, yet I cannot make this simple join work.
df2 was created using the following:

    df2 = idf.groupby(lambda x: (x.year,x.month,x.day)).mean()

Can anyone please advise? Thanks a lot.
df1

    type      object
    id        object
    date      object
    value    float64

       type        id      date  value
    0   car  pstat001  15/07/15     42
    1  bike  pstat001  16/07/15     42
    2  bike  pstat001  17/07/15     42
    3  bike  pstat004  18/07/15     42
    4  bike  pstat001  19/07/15     32

df2

    mydate     object
    val1      float64
    val2      float64
    val3      float64

            mydate        val1         val2         val3
    0  (2015,7,13)        1074  1871.666667  2800.777778
    1  (2015,7,14)  347.958333   809.416667  1308.458333
    2  (2015,7,15)     202.625      597.375  1008.666667
    3  (2015,7,16)  494.958333         1192  1886.916667

df1.info()

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 3040 entries, 0 to 3039
    Data columns (total 4 columns):
    type     3040 non-null object
    id       3040 non-null object
    date     3040 non-null object
    value    3040 non-null float64
    dtypes: float64(1), object(3)
    memory usage: 118.8+ KB

df2.info()

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 16 entries, 0 to 15
    Data columns (total 4 columns):
    mydate    16 non-null object
    val1      16 non-null float64
    val2      16 non-null float64
    val3      16 non-null float64
    dtypes: float64(3), object(1)
    memory usage: 640.0+ bytes
Your date columns are not of datetime dtype: the one in df1 looks like str, whilst the other holds tuples. You need to convert both to datetime first, and then the merge will work:
    In [75]:
    df1['date'] = pd.to_datetime(df1['date'])
    df1.info()

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 5 entries, 0 to 4
    Data columns (total 4 columns):
    type     5 non-null object
    id       5 non-null object
    date     5 non-null datetime64[ns]
    value    5 non-null int64
    dtypes: datetime64[ns](1), int64(1), object(2)
    memory usage: 200.0+ bytes

    In [76]:
    import datetime as dt
    df2['mydate'] = df2['mydate'].apply(lambda x: dt.datetime(x[0], x[1], x[2]))
    df2.info()

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 4 entries, 0 to 3
    Data columns (total 4 columns):
    mydate    4 non-null datetime64[ns]
    val1      4 non-null float64
    val2      4 non-null float64
    val3      4 non-null float64
    dtypes: datetime64[ns](1), float64(3)
    memory usage: 160.0 bytes

    In [78]:
    my_df = pd.merge(df1, df2, how = 'inner', left_on = ['date'], right_on = ['mydate'])
    my_df

    Out[78]:
       type        id       date  value     mydate        val1      val2  \
    0   car  pstat001 2015-07-15     42 2015-07-15  202.625000   597.375
    1  bike  pstat001 2015-07-16     42 2015-07-16  494.958333  1192.000

              val3
    0  1008.666667
    1  1886.916667
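For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same fix, built from the sample rows shown in the question. Two things in it are my assumptions rather than part of the original answer: the explicit `format="%d/%m/%y"` (I am assuming the strings are day/month/two-digit-year, which removes any day-first ambiguity), and assembling the dates from `year`/`month`/`day` columns with `pd.to_datetime` instead of calling `dt.datetime` row by row. The name `parts` is purely illustrative.

```python
import pandas as pd

# Sample rows copied from the question; the real frames are larger.
df1 = pd.DataFrame({
    "type": ["car", "bike", "bike", "bike", "bike"],
    "id": ["pstat001", "pstat001", "pstat001", "pstat004", "pstat001"],
    "date": ["15/07/15", "16/07/15", "17/07/15", "18/07/15", "19/07/15"],
    "value": [42, 42, 42, 42, 32],
})
df2 = pd.DataFrame({
    "mydate": [(2015, 7, 13), (2015, 7, 14), (2015, 7, 15), (2015, 7, 16)],
    "val1": [1074.0, 347.958333, 202.625, 494.958333],
    "val2": [1871.666667, 809.416667, 597.375, 1192.0],
    "val3": [2800.777778, 1308.458333, 1008.666667, 1886.916667],
})

# Assumption: the strings are day/month/two-digit-year.
df1["date"] = pd.to_datetime(df1["date"], format="%d/%m/%y")

# Split the (year, month, day) tuples into three columns, keeping df2's
# index so the result aligns on assignment, then let pandas assemble them.
parts = pd.DataFrame(
    df2["mydate"].tolist(),
    columns=["year", "month", "day"],
    index=df2.index,
)
df2["mydate"] = pd.to_datetime(parts)

# Both key columns are now datetime, so the inner join can match rows.
merged = pd.merge(df1, df2, how="inner", left_on="date", right_on="mydate")
print(merged[["type", "date", "val1"]])
```

Only 15 and 16 July appear in both sample frames, so the inner join keeps just those two rows. If `idf` has a DatetimeIndex (which the lambda's use of `x.year` suggests, though the question does not show it), grouping by `idf.index.normalize()` instead of the tuple-returning lambda would probably keep datetime keys from the start and make the tuple conversion unnecessary.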