regex - Return original search terms for grep in R


I have a list of items and a list of search terms, and I am trying to do two things:

  1. Search through the items for matches of any of the search terms, and return TRUE iff a match is found.
  2. For the items where TRUE is returned (i.e., there is a match), return the original search term that was matched in step 1.

So, given the following data frame:

                 items
    1             alex
    2 alex is a person
    3   this is a test
    4            false
    5    this is cathy

and the following list of search terms:

    "alex"      "bob"       "cathy"     "derrick"   "erica"     "ferdinand"

I want to create the following output:

                 items matches original
    1             alex    TRUE     alex
    2 alex is a person    TRUE     alex
    3   this is a test   FALSE     <NA>
    4            false   FALSE     <NA>
    5    this is cathy    TRUE    cathy

Step 1 is straightforward, but I am having trouble with step 2. To create the 'matches' column, I use grepl() to create a variable that is TRUE if the row in d$items matches any of the search terms, and FALSE otherwise.

For step 2, I thought I should be able to use grep() while specifying value = TRUE, as shown in the code below. However, this returns the wrong value: rather than returning the original search term that was matched, grep() returns the value of the item that matched. So I get the following output:

                 items matches         original
    1             alex    TRUE             alex
    2 alex is a person    TRUE alex is a person
    3   this is a test   FALSE             <NA>
    4            false   FALSE             <NA>
    5    this is cathy    TRUE    this is cathy
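The behaviour can be reproduced in isolation. The snippet below is only a minimal sketch on a made-up two-element vector (`x` and `hits` are hypothetical names, not part of the code in this question): with value = TRUE, grep() hands back the elements of `x` that matched, and it keeps no record of which alternative in the pattern was responsible.

```r
# Hypothetical toy vector, not the data frame from the question
x <- c("alex is a person", "this is a test")

# value = TRUE returns the matching elements of x, never the pattern itself
hits <- grep("alex|bob", x, ignore.case = TRUE, value = TRUE)
print(hits)
```

So a single combined pattern cannot say which term matched; each term would have to be tested separately.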

This is the code I am using right now. Any thoughts would be appreciated!

    # dummy data and search terms
    d = data.frame(items = c("alex", "alex is a person", "this is a test", "false", "this is cathy"))
    searchterms = c("alex", "bob", "cathy", "derrick", "erica", "ferdinand")

    # one combined pattern: a search term counts only when it is not between letters
    pattern = paste("(^| |[^A-Za-z])", searchterms, "($| |[^A-Za-z])",
                    sep = "", collapse = "|")

    # return TRUE iff a search term is found in the items column
    d$matches = grepl(pattern, d[,1], ignore.case = TRUE)

    # subset the data to the matching rows
    dmatched = d[d$matches == TRUE, ]

    # here's the problem: this returns the item that matched, not the search term
    dmatched$original = grep(pattern, dmatched[,1], ignore.case = TRUE, value = TRUE)

    d$original[d$matches == TRUE] = dmatched$original

This is not exactly what you want, but you can use qdap's termco function for this. It also covers the case where you have two names in the same sentence:

    library(qdap)
    termco(d$items, 1:nrow(d), searchterms)

    ## > termco(d$items, 1:nrow(d), searchterms)
    ##   nrow(d word.count       alex bob     cathy derrick erica ferdinand
    ## 1      1          1 1(100.00%)   0         0       0     0         0
    ## 2      2          4  1(25.00%)   0         0       0     0         0
    ## 3      3          4          0   0         0       0     0         0
    ## 4      4          1          0   0         0       0     0         0
    ## 5      5          3          0   0 1(33.33%)       0     0         0

To get what you're after with qdap, you can use:

    # raw counts: first two columns are the grouping variable and word.count
    dat <- termco(d$items, 1:nrow(d), searchterms)$raw
    terms <- character()

    # append each term's name to the rows where it was counted once
    for (i in 3:ncol(dat)){
        terms <- paste(terms, ifelse(dat[, i] == 1, colnames(dat)[i], ""))
    }

    d$matches <- as.logical(rowSums(dat[, -c(1:2)]))
    x <- gsub(" ", ", ", clean(Trim(terms)))
    d$original <- replacer(x, "", NA)

    ## > d
    ##              items matches original
    ## 1             alex    TRUE     alex
    ## 2 alex is a person    TRUE     alex
    ## 3   this is a test   FALSE     <NA>
    ## 4            false   FALSE     <NA>
    ## 5    this is cathy    TRUE    cathy
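For comparison, the same result can probably be had in base R by testing each search term separately instead of collapsing them into one pattern. This is only a sketch under some assumptions: it swaps the long character-class pattern from the question for `\\b` word boundaries, it assumes the search terms contain no regex metacharacters, and `hits` is a hypothetical helper matrix that does not appear in the original code.

```r
d <- data.frame(items = c("alex", "alex is a person", "this is a test",
                          "false", "this is cathy"),
                stringsAsFactors = FALSE)
searchterms <- c("alex", "bob", "cathy", "derrick", "erica", "ferdinand")

# hits[i, j] is TRUE when item i contains search term j as a whole word
# (assumption: \\b word boundaries stand in for the question's pattern)
hits <- do.call(cbind, lapply(searchterms, function(term) {
  grepl(paste0("\\b", term, "\\b"), d$items, ignore.case = TRUE)
}))
colnames(hits) <- searchterms

# step 1: did any search term match?
d$matches <- rowSums(hits) > 0

# step 2: recover the matching term(s); several terms are joined with ", "
d$original <- apply(hits, 1, function(row) paste(searchterms[row], collapse = ", "))
d$original[!d$matches] <- NA

print(d)
```

Because each column of `hits` belongs to exactly one search term, the column index recovers the term directly, which a single alternation pattern cannot do.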
