Iterate over column names in an R data frame in order to change their type


    library(lubridate)

    # data to build the df
    d1 <- c("1/2/14", "3/5/15", "1/13/11")    # start
    d2 <- c("1/2/15", "4/5/15", "6/18/15")    # stop
    d3 <- c("5/16/08", "1/7/07", "6/22/01")   # start
    d4 <- c("11/29/12", "8/5/14", "1/13/12")  # stop
    a <- c("blah", "blah", "blah")
    b <- c("blah", "blah", "blah")
    c <- c("blah", "blah", "blah")
    f <- c("blah", "blah", "blah")
    colnames <- c("col.a", "col.b", "col.c", "project1.start", "project1.end",
                  "project2.start", "project2.end", "col.f")

    # assemble the df
    df <- data.frame(a, b, c, d1, d2, d3, d4, f)
    names(df) <- colnames

    # change the character columns to date objects so they play nicely with lubridate
    df$project1.start <- mdy(df$project1.start)
    df$project1.end <- mdy(df$project1.end)
    df$project2.start <- mdy(df$project2.start)
    df$project2.end <- mdy(df$project2.end)

But! I want to apply the above mdy iteratively to the dX columns I specify. Imagine that instead of d1-d4 I have d1-d142. There must be an elegant, i.e., non-brute-force, way of doing this!

So, I tried this. I know I'm doing mdy on too many columns, but I was just trying to make it work at all. I've tried loops with seq(), etc., but I know I'm missing the vector-based approach that R expects.

    f <- function(x) {x <- mdy(x)}
    newdf <- apply(df, 2, f)

But this throws

    Warning messages:
    1: All formats failed to parse. No formats found.
    ...
    10: All formats failed to parse. No formats found.

and newdf is bad:

         col.a col.b col.c project1.start project1.end project2.start project2.end col.f
    [1,]    NA    NA    NA             NA           NA             NA           NA    NA
    [2,]    NA    NA    NA             NA           NA             NA           NA    NA
    [3,]    NA    NA    NA             NA           NA             NA           NA    NA
         project1.duration project2.duration
    [1,]                NA                NA
    [2,]                NA                NA
    [3,]                NA                NA

What am I doing that's st00pid?

So, once that is done, I want to do some date math:

    df$project1.duration <- (df$project1.end - df$project1.start)
    df$project2.duration <- (df$project2.end - df$project2.start)

Same here. I want to be able to iterate over the durations of the dX columns, but perhaps I need to reshape the data to make that happen. How do I take a large number of durations of these different, separately coded projects and reassemble them into a df so that I can make a plot of the different durations for each project? In the sample df I have 3 different durations, rows 1:3, and I'd like to be able to compare the rows for each project.

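Your error occurs because apply is applying mdy to every column of the df, not just the "projectX.{start,end}" ones. (apply also converts the data frame to a matrix first, which is why newdf comes back as a matrix.) Also, df[col] is a data.frame and mdy needs a vector, so try df[[col]].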
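To see the two points above in isolation, here is a minimal base R sketch. The two-column data frame is a made-up stand-in for the question's df, and toupper is used only as a harmless placeholder function, not as part of any fix.

```r
# Hypothetical toy data frame standing in for the question's df
df <- data.frame(col.a = c("blah", "blah"),
                 project1.start = c("1/2/14", "3/5/15"),
                 stringsAsFactors = FALSE)

# Single brackets keep the data.frame wrapper
is.data.frame(df["project1.start"])

# Double brackets extract the underlying vector, which is what mdy expects
is.character(df[["project1.start"]])

# apply() calls as.matrix() on the data frame before looping over columns,
# so every column is handed over as character and the result is a matrix
m <- apply(df, 2, toupper)
is.matrix(m)
```

All three checks should come back TRUE, which is consistent with newdf in the question printing with [1,], [2,] row labels rather than as a data frame.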

For example, to convert only the project columns:

    cols <- grep('project', names(df))

    # one-liner
    df[cols] <- lapply(df[cols], mdy)

    # or a loop if you want (use one or the other, not both)
    for (col in cols) {
        df[[col]] <- mdy(df[[col]])
    }
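One thing to watch with a loose pattern like 'project' is that it would also match any projectX.duration columns once they exist. As a hedged sketch, the pattern can be anchored so that only names ending in .start or .end are converted. The toy data frame and the name date_cols are assumptions made for illustration.

```r
library(lubridate)

# Hypothetical toy data frame that already has a non-date project column
df <- data.frame(col.a = c("blah", "blah"),
                 project1.start = c("1/2/14", "3/5/15"),
                 project1.end = c("1/2/15", "4/5/15"),
                 project1.duration = c(NA, NA),
                 stringsAsFactors = FALSE)

# Match only names of the form project<number>.start or project<number>.end
date_cols <- grep("^project[0-9]+\\.(start|end)$", names(df))

# Convert just those columns; lapply passes each column to mdy as a vector
df[date_cols] <- lapply(df[date_cols], mdy)

# Inspect the resulting column classes
sapply(df, class)
```

This scales to d1-d142 without changes, as long as the date columns follow the same naming scheme.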

In regards to calculating per-project data (like duration), you can kludge it like this:

    projects <- paste0('project', 1:2)  # however many projects
    df[paste0(projects, '.duration')] <- df[paste0(projects, '.end')] - df[paste0(projects, '.start')]

However, in the long run (particularly if you have lots of projects or want to calculate lots of stats per project, not just duration) you might consider having your data in long format, i.e.

    project  start  end  duration
    1        ...
    1
    1
    2
    2
    2

(probably with some sort of ID variable so you know which project 2 row went with which project 1 row)

Then you can do mydf$duration <- mydf$end - mydf$start, and if you want it in wide format again you can make use of reshape.
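As a rough sketch of the wide-to-long step, base R's reshape can stack the start and end columns, assuming the date columns have already been converted. The id column, the name wide, and the use of times = 1:2 as the project number are assumptions made for this illustration; the dates are the question's values written in ISO format.

```r
# Hypothetical wide data: one row per id, dates already converted
wide <- data.frame(
  id = 1:3,
  project1.start = as.Date(c("2014-01-02", "2015-03-05", "2011-01-13")),
  project1.end   = as.Date(c("2015-01-02", "2015-04-05", "2015-06-18")),
  project2.start = as.Date(c("2008-05-16", "2007-01-07", "2001-06-22")),
  project2.end   = as.Date(c("2012-11-29", "2014-08-05", "2012-01-13"))
)

# Each list element names the wide columns that collapse into one long column
mydf <- reshape(wide,
                direction = "long",
                idvar = "id",
                varying = list(c("project1.start", "project2.start"),
                               c("project1.end", "project2.end")),
                v.names = c("start", "end"),
                timevar = "project",
                times = 1:2)

# One vectorised subtraction now covers every project
mydf$duration <- mydf$end - mydf$start
mydf
```

With the data in this shape, per-project comparisons become a matter of grouping on the project column, for example boxplot(as.numeric(duration) ~ project, data = mydf).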

