chipper

twitter text extraction utilities

Twitter Text Extraction

A fast extractor for screen names, hashtags, and URLs in tweets, with a tokenizer for the remaining text.

API

Chipper
  #users     => [Array]
  #hashtags  => [Array]
  #urls      => [Array]
  #tokens    => [Array]

  #skip_users
  #skip_hashtags
  #skip_tokens
  #skip_token_pattern

Usage

require 'chipper'

Chipper.skip_users(%w(youtube msn))
Chipper.skip_hashtags(%w(abc24 cnn))
Chipper.skip_tokens(%w(story tv why that get from your))
Chipper.skip_token_pattern '^vimeo$'

tweet = "hi @youtube, could we get #cnn videos so i can #watch it on my @apple tv http://t.co/HM7XoimM"
Chipper.users(tweet)    #=> ["@apple"]
Chipper.hashtags(tweet) #=> ["#watch"]
Chipper.urls(tweet)     #=> ["http://t.co/HM7XoimM"]

# n-gram tokenizer, returns a list of tokens partitioned by stop words, punctuation, urls and hashtags.
Chipper.tokens(tweet)   #=> [["could"], ["get"], ["videos"], ["can"]]

# single method that does all of the above and returns a hash.
Chipper.entities(tweet)
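
For readers without the gem installed, the extraction calls above can be approximated in plain Ruby. This is only a rough sketch: `ChipperSketch`, its constants, and its regular expressions are hypothetical and are not taken from the gem, whose real implementation lives in a native extension and may match differently. The skip lists are hard-coded to the values used in the usage example.

```ruby
# Illustrative approximation only; not the gem's implementation.
# The module name, constants and patterns below are assumptions.
module ChipperSketch
  SKIP_USERS    = %w(youtube msn).freeze
  SKIP_HASHTAGS = %w(abc24 cnn).freeze

  module_function

  # Screen names: "@" followed by word characters, minus the skip list.
  def users(text)
    text.scan(/@\w+/).reject do |user|
      SKIP_USERS.include?(user.delete_prefix("@").downcase)
    end
  end

  # Hashtags: "#" followed by word characters, minus the skip list.
  def hashtags(text)
    text.scan(/#\w+/).reject do |tag|
      SKIP_HASHTAGS.include?(tag.delete_prefix("#").downcase)
    end
  end

  # Only t.co short links are matched, mirroring the documented limitation.
  def urls(text)
    text.scan(%r{https?://t\.co/\w+})
  end
end

tweet = "hi @youtube, could we get #cnn videos so i can #watch it on my @apple tv http://t.co/HM7XoimM"
p ChipperSketch.users(tweet)    #=> ["@apple"]
p ChipperSketch.hashtags(tweet) #=> ["#watch"]
p ChipperSketch.urls(tweet)     #=> ["http://t.co/HM7XoimM"]
```

The sketch reproduces the three documented results for this tweet, but it makes no attempt to cover the tokenizer or the edge cases the real library handles.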

Gotchas

  • skips tokens shorter than 3 characters

  • only handles t.co urls
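
The two gotchas above can be pictured with a small plain-Ruby sketch. The helper names `keep_token?` and `tco_url?`, the constant names, and the URL pattern are assumptions made for illustration; they are not part of the gem's API and the gem may apply these rules differently.

```ruby
# Illustrative only: hypothetical helpers, not part of the chipper API.
MIN_TOKEN_LENGTH = 3
TCO_URL = %r{\Ahttps?://t\.co/\w+\z}

# A token survives only if it has at least three characters.
def keep_token?(word)
  word.length >= MIN_TOKEN_LENGTH
end

# A URL is recognised only if it is a t.co short link.
def tco_url?(url)
  url.match?(TCO_URL)
end

words = %w(so i can watch it on my tv)
p words.select { |w| keep_token?(w) }  #=> ["can", "watch"]

p tco_url?("http://t.co/HM7XoimM")     #=> true
p tco_url?("http://example.com/page")  #=> false
```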

Updating version

  • update ext/src/version.h

  • rake gemspec

License

Creative Commons Attribution - CC BY