r - Keep doubled columns which differ in only 2 letters in a data.frame


I have a data frame in R that consists of around 100 columns. Some of the columns are doubled and differ in only 2 letters. I want to keep these columns and delete the columns that are not doubled.

Here is an example:

234-rgz sk    234-rgz pv    556-gft sk    456-hjk sk    456-hjk pv   

The output should be:

234-rgz sk    234-rgz pv    456-hjk sk    456-hjk pv 

All columns follow the same naming convention: a number from 2 to 150, a "-", then 4 or 5 letters, a space, and "sk" or "pv". I thought of using a regular expression, but I don't know how to get rid of the single columns. Please help!

You can use duplicated on the column names after removing the suffix part. The output is a logical index that can be used to subset the original dataset.

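    v1 <- colnames(df1)
    # strip the trailing " sk" / " pv" suffix
    v2 <- sub('\\s+[^ ]+$', '', v1)
    # TRUE for every name whose prefix occurs more than once
    indx <- duplicated(v2) | duplicated(v2, fromLast=TRUE)
    v1[indx]
    #[1] "234-rgz sk" "234-rgz pv" "456-hjk sk" "456-hjk pv"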
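To see why both duplicated calls are needed, here is a small sketch on the stripped prefixes from the example (the vector is typed in by hand here rather than taken from df1). A single call flags only the later occurrence of each pair, so the two directions are combined with `|`.

```r
# Prefixes of the five example columns after the suffix has been removed
v2 <- c("234-rgz", "234-rgz", "556-gft", "456-hjk", "456-hjk")

forward  <- duplicated(v2)                   # flags the 2nd member of each pair
backward <- duplicated(v2, fromLast = TRUE)  # flags the 1st member of each pair

print(forward)             # FALSE  TRUE FALSE FALSE  TRUE
print(backward)            #  TRUE FALSE FALSE  TRUE FALSE
print(forward | backward)  #  TRUE  TRUE FALSE  TRUE  TRUE
```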

To subset the columns in the data frame,

    df1[indx]

Another option is to split the column names into substrings and use grep to match the substrings that have a frequency > 1.

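    # count how often each prefix occurs
    tbl <- table(unlist(strsplit(v1, '\\s+.*')))
    # keep the columns whose name matches a prefix seen more than once
    df1[grep(paste(names(tbl)[tbl>1], collapse="|"), v1)]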
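One caveat with the grep variant: the pattern is not anchored, so a repeated prefix such as "34-rgz" would also match a single column named "234-rgz sk". That cannot happen with the five example columns, but it seems possible when the numbers run from 2 to 150. A minimal sketch of an exact-match alternative using %in% follows; the column names below are hypothetical and chosen only to show the collision.

```r
# Hypothetical column names: "34-rgz" is paired, "234-rgz" is single
v1 <- c("34-rgz sk", "34-rgz pv", "234-rgz sk", "456-hjk sk", "456-hjk pv")

# Remove the suffix, then count how often each prefix occurs
prefix <- sub("\\s+.*$", "", v1)
tbl <- table(prefix)

# Exact matching against the repeated prefixes avoids partial regex matches
keep <- prefix %in% names(tbl)[tbl > 1]
print(v1[keep])
```

With a data frame, the same logical vector can be used as `df1[keep]`.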

Data

    set.seed(24)
    df1 <- as.data.frame(matrix(sample(0:9, 5*10, replace=TRUE), ncol=5,
      dimnames=list(NULL, c('234-rgz sk', '234-rgz pv', '556-gft sk',
                            '456-hjk sk', '456-hjk pv'))))
