R - Assign unique ID based on instances of a pattern


I have chat logs from an experiment, exported in the following format:

    df = data.frame(
      subject = c("string", 1, 2, 3, "string", 2, 3, "string", 1, 1, 3, 4),
      text = c(rep("blah blah blah", 12)),
      period = c(rep("NA", 12))
    )

    > head(df)
      subject           text period
    1  string blah blah blah     NA
    2       1 blah blah blah     NA
    3       2 blah blah blah     NA
    4       3 blah blah blah     NA
    5  string blah blah blah     NA
    6       2 blah blah blah     NA

where "string" is an identifier text that is repeated throughout the column.

I want to write a function that a) recognizes a character pattern in the subject column, and b) assigns a value to period based on each instance of the pattern.

For example, I know I can achieve the first part by running

    > grepl("s+", df$subject, perl = TRUE)
     [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

and from there I would like to achieve the second part by running something that assigns period == 1 to the first instance of TRUE, period == 2 to the second instance of TRUE, and so on. I can't figure out the second part. Any ideas?

I'm not sure regarding your desired output, but assuming you don't have the period column (you made it an empty factor column, whose values are harder to change), using data.table you could do

    df = data.frame(
      subject = c("string", 1:3, "string", 2:3, "string", 1, 1, 3, 4),
      text = "blah blah blah"
    )

    library(data.table)
    setDT(df)[grep("s+", subject), period := seq_len(.N)]
    df
    #     subject           text period
    #  1:  string blah blah blah      1
    #  2:       1 blah blah blah     NA
    #  3:       2 blah blah blah     NA
    #  4:       3 blah blah blah     NA
    #  5:  string blah blah blah      2
    #  6:       2 blah blah blah     NA
    #  7:       3 blah blah blah     NA
    #  8:  string blah blah blah      3
    #  9:       1 blah blah blah     NA
    # 10:       1 blah blah blah     NA
    # 11:       3 blah blah blah     NA
    # 12:       4 blah blah blah     NA

What this does is subset the matched instances, take the length of that subset using the special symbol .N (which is 3 in this case), and assign by reference (using the := operator) the sequence 1, 2, 3 to the period column within that subset.
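For comparison, the same subset-then-number idea can be sketched in base R, without data.table. This is only an illustration under the same assumption that the period column does not exist yet; the helper variable `hits` is a name introduced here, not part of the original answer.

```r
# Same example data as in the answer; stringsAsFactors = FALSE keeps
# subject as a character column regardless of the R version.
df <- data.frame(
  subject = c("string", 1:3, "string", 2:3, "string", 1, 1, 3, 4),
  text = "blah blah blah",
  stringsAsFactors = FALSE
)

# Logical vector marking the rows whose subject matches the pattern.
hits <- grepl("s+", df$subject)

# Start with an all-NA integer column, then number only the matched rows.
df$period <- NA_integer_
df$period[hits] <- seq_len(sum(hits))

df$period
```

Here `sum(hits)` plays the role of .N: it counts the matched rows, and `seq_len()` turns that count into the running index. Unlike `:=`, this assigns to a copy of the column rather than modifying the table by reference.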


Unless you want

    cumsum(grepl("s+", df$subject))
    ## [1] 1 1 1 1 2 2 2 3 3 3 3 3

This, a modification of your solution, converts the logical vector to binary (TRUE becomes 1 and FALSE becomes 0) and performs a cumulative sum.
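Since the question asks for a function, the cumulative-sum approach could be wrapped roughly as follows. This is a minimal sketch: the name `assign_period` and its `pattern` argument are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: labels every row with the number of pattern
# matches seen so far in the subject column (including the current row).
assign_period <- function(df, pattern = "s+") {
  is_marker <- grepl(pattern, df$subject)  # TRUE on each "string" row
  df$period <- cumsum(is_marker)           # TRUE counts as 1, FALSE as 0
  df
}

df <- data.frame(
  subject = c("string", 1:3, "string", 2:3, "string", 1, 1, 3, 4),
  text = "blah blah blah",
  stringsAsFactors = FALSE
)

out <- assign_period(df)
out$period
```

One thing to keep in mind with this sketch: any rows that come before the first match would get period 0, because the cumulative sum has not yet seen a TRUE.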

