R - Assign unique ID based on instances of a pattern
I have chat logs from an experiment, exported in this format:
```r
df = data.frame(
  subject = c("string", 1, 2, 3, "string", 2, 3, "string", 1, 1, 3, 4),
  text = c(rep("blah blah blah", 12)),
  period = c(rep("NA", 12))
)

> head(df)
  subject           text period
1  string blah blah blah     NA
2       1 blah blah blah     NA
3       2 blah blah blah     NA
4       3 blah blah blah     NA
5  string blah blah blah     NA
6       2 blah blah blah     NA
```
where "string" is an identifier text that repeats throughout the column.
I want to write a function that (a) recognizes a character pattern in the subject column, and (b) assigns a value to period based on each instance of that pattern.
For example, I know I can achieve the first part by running

```r
> grepl("s+", df$subject, perl = T)
[1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
```
From there, I'd like to achieve the second part by assigning period = 1 to the first instance of TRUE, period = 2 to the second instance of TRUE, and so on. I can't figure out the second part. Any ideas?
I'm not sure about your desired output, but assuming you don't have a period column yet (you made it an empty factor column, which makes its values harder to change), using data.table you could do
```r
df = data.frame(subject = c("string", 1:3, "string", 2:3, "string", 1, 1, 3, 4),
                text = "blah blah blah")
library(data.table)
setDT(df)[grep("s+", subject), period := seq_len(.N)]
df
#     subject           text period
#  1:  string blah blah blah      1
#  2:       1 blah blah blah     NA
#  3:       2 blah blah blah     NA
#  4:       3 blah blah blah     NA
#  5:  string blah blah blah      2
#  6:       2 blah blah blah     NA
#  7:       3 blah blah blah     NA
#  8:  string blah blah blah      3
#  9:       1 blah blah blah     NA
# 10:       1 blah blah blah     NA
# 11:       3 blah blah blah     NA
# 12:       4 blah blah blah     NA
```
What I did here is subset the matched instances, take the length of the subset using the .N operator (which is 3 in this case), and then assign by reference (using the := operator) a sequence of that length, i.e. 1, 2, 3, to the period column within the subset.
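If you are not using data.table, the same idea (number only the matching rows and leave the rest as NA) can be sketched in base R. This is an illustrative equivalent, not part of the original answer; it assumes the example data above and creates period as an integer column rather than a factor, so new values can be assigned directly.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps subject as character
df <- data.frame(
  subject = c("string", 1:3, "string", 2:3, "string", 1, 1, 3, 4),
  text = "blah blah blah",
  stringsAsFactors = FALSE
)

# Row indices where the pattern matches (the "string" marker rows)
idx <- grep("s+", df$subject)

# Start with an all-NA integer column, then number only the matched rows 1, 2, 3, ...
df$period <- NA_integer_
df$period[idx] <- seq_along(idx)

df$period
# [1]  1 NA NA NA  2 NA NA  3 NA NA NA NA
```

Here `seq_along(idx)` plays the role of `seq_len(.N)`: it produces one consecutive number per matched row.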
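Unless you want this:

```r
cumsum(grepl("s+", df$subject))
## [1] 1 1 1 1 2 2 2 3 3 3 3 3
```

This is a modification of your solution: it converts the logical vector to binary (TRUE becomes 1 and FALSE becomes 0) and then takes the cumulative sum, so every row gets the number of the most recent "string" row at or above it.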
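Since the question asks for a function, the cumsum() approach can be wrapped into one. The name `assign_period` and its arguments are hypothetical, chosen only for illustration; the explicit `as.integer()` step just makes visible the TRUE to 1 / FALSE to 0 conversion that cumsum() would otherwise perform implicitly.

```r
# Hypothetical helper: give every element the ID of the most recent pattern match
# at or before it (1 for the first match, 2 for the second, ...)
assign_period <- function(x, pattern) {
  hits <- grepl(pattern, x)      # TRUE on marker rows, FALSE elsewhere
  as_binary <- as.integer(hits)  # TRUE -> 1, FALSE -> 0
  cumsum(as_binary)              # running count of markers seen so far
}

subject <- c("string", 1:3, "string", 2:3, "string", 1, 1, 3, 4)
assign_period(subject, "s+")
# [1] 1 1 1 1 2 2 2 3 3 3 3 3
```

Applied to the question's data, `df$period <- assign_period(df$subject, "s+")` replaces the whole period column at once, which sidesteps the factor problem. Note that any rows appearing before the first marker would get period 0 under this sketch.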