R - Keep doubled columns which differ in only 2 letters in a data.frame

I have a data frame in R that consists of around 100 columns. Some of the columns are doubled and differ only in the last 2 letters. I want to keep these columns and delete the columns that are not doubled.

Here is an example:

234-rgz sk    234-rgz pv    556-gft sk    456-hjk sk    456-hjk pv

The output should be:

234-rgz sk    234-rgz pv    456-hjk sk    456-hjk pv

All columns have the same naming convention: a number from 2 to 150, then a "-", followed by 4 or 5 letters, a space, and then "sk" or "pv". I thought of using a regular expression, but that doesn't solve the problem of how to get rid of the single columns. Any help is appreciated!

You can use duplicated on the column names after removing the suffix part. The output is a logical index that can be used to subset the original dataset.

v1 <- colnames(df1)
v2 <- sub('\\s+[^ ]+$', '', v1)  # strip the trailing " sk"/" pv" suffix
indx <- duplicated(v2) | duplicated(v2, fromLast=TRUE)
v1[indx]
#[1] "234-rgz sk" "234-rgz pv" "456-hjk sk" "456-hjk pv"

To subset the columns in the data frame:

df1[indx]
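The reason for calling duplicated twice may not be obvious: a single call only flags the second and later occurrences of a value, so the first member of each pair would be dropped. A minimal sketch, using a hypothetical `prefixes` vector that stands in for the column names with the suffix already removed:

```r
# Hypothetical prefixes, as they would look after stripping " sk"/" pv"
prefixes <- c("234-rgz", "234-rgz", "556-gft", "456-hjk", "456-hjk")

# Scanning left to right flags only the later occurrence of each value
forward <- duplicated(prefixes)
# FALSE  TRUE FALSE FALSE  TRUE

# Scanning right to left flags only the earlier occurrence of each value
backward <- duplicated(prefixes, fromLast = TRUE)
#  TRUE FALSE FALSE  TRUE FALSE

# Combining both marks every value that appears more than once
paired <- forward | backward
#  TRUE  TRUE FALSE  TRUE  TRUE
```

Only the unpaired "556-gft" ends up FALSE, which is why the combined index keeps both members of each pair.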

Another option is to split the column names into substrings and use grep to match the substrings that have a frequency > 1.

tbl <- table(unlist(strsplit(v1, '\\s+.*')))
df1[grep(paste(names(tbl)[tbl>1], collapse="|"), v1)]
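One thing to watch with a pattern built from the names is that grep matches anywhere in the string, so a prefix such as "34-rgz" would also match a column called "234-rgz sk". If that can happen in your data, comparing the prefixes exactly with %in% may be safer. The following is only a sketch: `keep_paired_columns` is a hypothetical helper name, and the example column names are made up to show the partial-match case.

```r
# Hypothetical helper: keep only columns whose prefix (the name without
# its trailing " sk"/" pv" part) occurs more than once.
keep_paired_columns <- function(df) {
  prefixes <- sub("\\s+[^ ]+$", "", colnames(df))
  counts <- table(prefixes)
  paired <- names(counts)[counts > 1]
  # Exact comparison, so "34-rgz" does not match "234-rgz"
  df[prefixes %in% paired]
}

# Made-up data where one prefix is a substring of another
example <- data.frame(1:2, 3:4, 5:6, 7:8)
colnames(example) <- c("34-rgz sk", "34-rgz pv", "234-rgz sk", "5-abcd pv")

result <- keep_paired_columns(example)
colnames(result)
#[1] "34-rgz sk" "34-rgz pv"
```

With the column names from the question, this should select the same columns as the duplicated approach.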

Data

set.seed(24)
df1 <- as.data.frame(matrix(sample(0:9, 5*10, replace=TRUE), ncol=5,
  dimnames=list(NULL, c('234-rgz sk', '234-rgz pv', '556-gft sk',
                        '456-hjk sk', '456-hjk pv'))))
