Saturday, June 27, 2015

Extracting usernames from Twitter statuses in R: non-trivial cases like RT@user and @user:

I want to extract usernames from a tweet where these usernames may be:

  • followed by non-alphanumeric characters (for example a colon or a period).
  • not preceded by a white space.

Here is what I have so far:

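    # Split the status on spaces and keep the tokens that contain a mention
    text <- "RT@user1: This is a retweet that mentions @user2."
    tokens <- unlist(strsplit(text, " "))
    mentions.mask <- grepl("@\\w+", tokens)  # TRUE for tokens containing "@word"
    mentions <- tokens[mentions.mask]
    mentions
    # [1] "RT@user1:" "@user2."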

But I would like to get only @user1 and @user2, without the "RT" prefix or the trailing punctuation.

How can I do it?
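One possible approach, sketched here as a suggestion rather than a definitive answer, is to skip the split on spaces and extract the matches themselves instead of whole tokens. `grepl()` only says whether a token contains a match, so indexing with it returns the entire token; `gregexpr()` combined with `regmatches()` returns just the matched substrings. Because `\w+` stops at the first non-word character, the trailing ":" and "." are dropped, and since the match starts at "@", a preceding "RT" is never captured. The helper name `extract_mentions` is hypothetical.

```r
# Hypothetical helper: return the "@word" mentions found in each status.
# gregexpr() locates every "@" followed by one or more word characters
# (letters, digits, underscore) anywhere in the string, with or without a
# preceding space; regmatches() pulls out exactly those substrings.
extract_mentions <- function(statuses) {
  regmatches(statuses, gregexpr("@\\w+", statuses))
}

text <- "RT@user1: This is a retweet that mentions @user2."

# The result is a list with one character vector per input status
mentions <- extract_mentions(text)[[1]]
print(mentions)
# [1] "@user1" "@user2"
```

Because the function is vectorised, it can be applied to a whole character vector of tweets at once; a status without mentions yields `character(0)`. Note that `\w+` is looser than Twitter's own username rules and would also pick up "@example" from an address such as "me@example.com", so the pattern may need tightening for real data.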
