R remove multiple text strings in data.table
I have a vector of words to remove from the data.table dt, as follows.
wordstoremove <- c("simpson", "flander", "nahasapeemapetilon", "spuckler", "wiggum")

dt <- structure(list(
  vid = c("simpsons", "flanders", "nahasapeemapetilons", "spucklers", "wiggums"),
  wr1 = c("homer simpson", "ned flanders", "apu nahasapeemapetilon", "cletus spuckler", "chief wiggum"),
  wr2 = c("bart simpson", "rod flanders", "manjula nahasapeemapetilon", "brandine spuckler", "ralph wiggum"),
  wr3 = c("marge simpson", "todd flanders", "sanjay nahasapeemapetilon", NA, "sarah wiggum")),
  .Names = c("vid", "wr1", "wr2", "wr3"),
  row.names = c(NA, -5L),
  class = c("data.table", "data.frame"))

dt
                   vid                    wr1                        wr2                       wr3
1:            simpsons          homer simpson               bart simpson             marge simpson
2:            flanders           ned flanders               rod flanders             todd flanders
3: nahasapeemapetilons apu nahasapeemapetilon manjula nahasapeemapetilon sanjay nahasapeemapetilon
4:           spucklers        cletus spuckler          brandine spuckler                        NA
5:             wiggums           chief wiggum               ralph wiggum              sarah wiggum
I know I can use the solution from "R remove multiple text strings in data frame".
How can I do this using data.table, so as to minimise copying of data?
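Try this:

library(data.table)

# Build one alternation pattern; collapsing with "s?|" lets every word
# except the last also match with a trailing "s" (e.g. "flanders").
foo <- function(x) gsub(paste0(wordstoremove, collapse = "s?|"), "", x)

# Update every column except the first (vid) by reference.
dt[, names(dt)[-1] := lapply(.SD, foo), .SDcols = names(dt)[-1]]

dt
#                    vid    wr1      wr2    wr3
# 1:            simpsons  homer     bart  marge
# 2:            flanders    ned      rod   todd
# 3: nahasapeemapetilons    apu  manjula sanjay
# 4:           spucklers cletus brandine     NA
# 5:             wiggums  chief    ralph  sarah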
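If the goal is mainly to avoid copies, another option worth trying is a loop over data.table's set(), which also assigns by reference, one column at a time. This is only a sketch and not part of the answer above: the two-row dt is a cut-down stand-in for the table in the question, and pattern and cols are names made up for the example. The pattern also differs slightly from the one above, since it additionally strips the whitespace in front of each removed word.

```r
library(data.table)

wordstoremove <- c("simpson", "flander", "nahasapeemapetilon", "spuckler", "wiggum")

# Cut-down stand-in for the table in the question (illustrative only).
dt <- data.table(
  vid = c("simpsons", "flanders"),
  wr1 = c("homer simpson", "ned flanders"),
  wr2 = c("bart simpson", NA)
)

# One alternation pattern: optional leading whitespace, any of the words,
# then an optional trailing "s".
pattern <- paste0("\\s*(", paste(wordstoremove, collapse = "|"), ")s?")

# Every column except the id column gets cleaned.
cols <- setdiff(names(dt), "vid")

# set() replaces each column in place; NA entries stay NA under gsub().
for (j in cols) {
  set(dt, j = j, value = gsub(pattern, "", dt[[j]]))
}

dt
```

Whether this is noticeably faster than the := form will depend on the size of the table, so it is best timed on the real data.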