How to merge two pandas DataFrames in Python?


I am trying to join two pandas DataFrames using an inner join.

my_df = pd.merge(df1, df2, how = 'inner', left_on = ['date'], right_on = ['mydate']) 

However, I am getting the following error:

KeyError: 'mydate'
TypeError: an integer is required

I believe joining on dates is valid, so why can't I make this simple join work?

df2 was created using the following:

df2 = idf.groupby(lambda x: (x.year,x.month,x.day)).mean() 

Can anyone please advise? Thanks a lot.

df1 dtypes:

type     object
id       object
date     object
value    float64

df1:

   type  id        date      value
0  car   pstat001  15/07/15  42
1  bike  pstat001  16/07/15  42
2  bike  pstat001  17/07/15  42
3  bike  pstat004  18/07/15  42
4  bike  pstat001  19/07/15  32

df2 dtypes:

mydate   object
val1     float64
val2     float64
val3     float64

df2:

   mydate       val1        val2         val3
0  (2015,7,13)  1074        1871.666667  2800.777778
1  (2015,7,14)  347.958333  809.416667   1308.458333
2  (2015,7,15)  202.625     597.375      1008.666667
3  (2015,7,16)  494.958333  1192         1886.916667

df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3040 entries, 0 to 3039
Data columns (total 4 columns):
type     3040 non-null object
id       3040 non-null object
date     3040 non-null object
value    3040 non-null float64
dtypes: float64(1), object(3)
memory usage: 118.8+ KB

df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 16 entries, 0 to 15
Data columns (total 4 columns):
mydate    16 non-null object
val1      16 non-null float64
val2      16 non-null float64
val3      16 non-null float64
dtypes: float64(3), object(1)
memory usage: 640.0+ bytes

Your date columns are not of datetime dtype: the one in df1 looks like str, whilst the other holds tuples. You need to convert these first, and then the merge will work:

In [75]: df1['date'] = pd.to_datetime(df1['date'])
df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 4 columns):
type     5 non-null object
id       5 non-null object
date     5 non-null datetime64[ns]
value    5 non-null int64
dtypes: datetime64[ns](1), int64(1), object(2)
memory usage: 200.0+ bytes

In [76]: import datetime as dt
df2['mydate'] = df2['mydate'].apply(lambda x: dt.datetime(x[0], x[1], x[2]))
df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
mydate    4 non-null datetime64[ns]
val1      4 non-null float64
val2      4 non-null float64
val3      4 non-null float64
dtypes: datetime64[ns](1), float64(3)
memory usage: 160.0 bytes

In [78]: my_df = pd.merge(df1, df2, how='inner', left_on=['date'], right_on=['mydate'])
my_df

Out[78]:
   type        id       date  value     mydate        val1      val2  \
0   car  pstat001 2015-07-15     42 2015-07-15  202.625000   597.375
1  bike  pstat001 2015-07-16     42 2015-07-16  494.958333  1192.000

          val3
0  1008.666667
1  1886.916667
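For anyone who wants to try this without the original data, here is a minimal end-to-end sketch of the same conversion and merge. The sample rows are made up to resemble the frames in the question (only val1 is kept, for brevity). It also passes an explicit format string to pd.to_datetime, which is my assumption about the date layout (day/month/two-digit year) and not something the question states outright.

```python
import datetime as dt

import pandas as pd

# Made-up sample rows shaped like the frames in the question.
df1 = pd.DataFrame({
    "type": ["car", "bike", "bike"],
    "id": ["pstat001", "pstat001", "pstat001"],
    "date": ["15/07/15", "16/07/15", "17/07/15"],
    "value": [42.0, 42.0, 42.0],
})
df2 = pd.DataFrame({
    "mydate": [(2015, 7, 14), (2015, 7, 15), (2015, 7, 16)],
    "val1": [347.958333, 202.625, 494.958333],
})

# Assumed layout: day/month/two-digit year. Stating it explicitly avoids
# relying on pandas guessing which field is the day.
df1["date"] = pd.to_datetime(df1["date"], format="%d/%m/%y")

# Unpack each (year, month, day) tuple into a datetime, then make sure
# the column ends up with a datetime64 dtype.
df2["mydate"] = pd.to_datetime(df2["mydate"].apply(lambda t: dt.datetime(*t)))

# Both key columns are now datetimes, so the inner join can match them.
my_df = pd.merge(df1, df2, how="inner", left_on="date", right_on="mydate")
print(my_df)
```

Only 15 and 16 July appear in both sample frames, so the inner join keeps just those two rows. If idf has a DatetimeIndex, it may also be possible to avoid the tuples in the first place by grouping on idf.index.normalize() instead of the lambda, since that keeps the group keys as timestamps.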
