SQL Server - RData takes longer to load than querying the database again


I am running RStudio Server on a server with 256 GB of RAM, and MS SQL Server 2012 on another. The database contains data that allows me to build a graph with ~100 million nodes and ~150 million edges.

I have timed how long it takes to build the graph from that data:

  • 1st select query = ~22 million rows = 12 minutes = df1 (dataframe1)
  • 2nd select query = ~30 million rows = 8 minutes = df2
  • 3rd select query = ~32 million rows = 8 minutes = df3
  • 4th select query = ~63 million rows = 70 minutes = df4
  • edges = rbind(df1, df2, df3, df4) = 6 minutes
  • mygraph = graph.data.frame(edges) = 30 minutes

So a little over 2 hours. Since the data is quite stable, I figured I could speed things up by saving mygraph to disk. But when I tried to load it, it just wouldn't. I gave up after a 4 hour wait, thinking something had gone wrong.

So I rebooted the server, deleted the .rstudio folder, and started over, this time saving the dataframes from each SQL query plus the edges dataframe, in both RData and RDS formats (save() and saveRDS(), compress = FALSE every time). After each save, I timed the load() and readRDS() times of the 5 dataframes. The times were pretty much the same for load() and readRDS():

  • df1 = 1.1 GB file = 1 minute
  • df2 = 1.4 GB file = 2 minutes
  • df3 = 1.7 GB file = 6 minutes
  • df4 = 3.1 GB file = 13 minutes
  • edges = 6.8 GB file = 21 minutes
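
For reference, the save-and-time procedure described above can be sketched roughly like this. It is only a minimal illustration: the synthetic dataframe, its column names (node1, node2, attr1, attr2), the row count and the temporary file paths are all assumptions standing in for the real data, which is far larger.

```r
# Hypothetical stand-in for one edge dataframe: 4 string columns,
# each value between 5 and 14 characters long (size is made up).
set.seed(42)
n <- 20000
rand_strings <- function(n) {
  len <- sample(5:14, n, replace = TRUE)
  vapply(len,
         function(k) paste(sample(letters, k, replace = TRUE), collapse = ""),
         character(1))
}
df <- data.frame(
  node1 = rand_strings(n),
  node2 = rand_strings(n),
  attr1 = rand_strings(n),
  attr2 = rand_strings(n),
  stringsAsFactors = FALSE
)

rds_path   <- tempfile(fileext = ".rds")
rdata_path <- tempfile(fileext = ".RData")

# Save uncompressed in both formats, timing each call
t_save_rds   <- system.time(saveRDS(df, rds_path, compress = FALSE))[["elapsed"]]
t_save_rdata <- system.time(save(df, file = rdata_path, compress = FALSE))[["elapsed"]]

# Time reading each file back; load() restores into a separate environment
t_read_rds   <- system.time(df_rds <- readRDS(rds_path))[["elapsed"]]
env <- new.env()
t_load_rdata <- system.time(load(rdata_path, envir = env))[["elapsed"]]

# Both round trips should return the original data unchanged
stopifnot(identical(df, df_rds), identical(df, env$df))

# Elapsed seconds for each step
print(c(save_rds = t_save_rds, save_rdata = t_save_rdata,
        read_rds = t_read_rds, load_rdata = t_load_rdata))
```

Running the read step again in a fresh R session, rather than in the session that just wrote the file, would be the closest match to the situation described next.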

Good enough, I thought. But today, when I started a new session and tried to load(df1) to make some changes to it, I again got the feeling that something was wrong. After 20 minutes of waiting for it to load, I gave up. Memory, disk and CPU shouldn't be issues, as I'm the only one using this server. I have rebooted the server and deleted the .rstudio folder, thinking maybe something in there was hanging my session, but the dataframe still won't load. While load() is supposedly running, iotop shows no disk activity and ps shows this:

    ps -C rsession -o %cpu,%mem,cmd
    %CPU %MEM CMD
    99.5  0.3 /usr/lib/rstudio-server/bin/rsession -u myusername

I have no idea what to try next. It makes no sense to me that loading an RData file would take longer than querying a SQL database that lives on a different server. And if it did, why was it so fast when I was timing the load() and readRDS() times right after saving the dataframes?

It's the first time I ask here at StackOverflow, so sorry if I forgot to mention something important for you to be able to answer this question. If I did, please let me know.


EDIT: Additional info requested by Brandon in the comments. The OS is CentOS 7. The dataframes contain lists of edges in the first 2 columns (col1=node1; col2=node2) and 2 additional columns for edge attributes. All columns are strings, varying between 5 and 14 characters long. I have also added the approximate number of rows of each dataframe to the original post. Thanks!

