
Scraper for Twitter

 
Rahul Dayal Sharma
Ranch Hand
Posts: 47
Hi Everyone,

I'm looking to build a scraper for Twitter and have been searching online for resources that give a rough idea of how to go about it. I found a few gems, such as TweetStream and twitter-scraper, but I haven't seen a single full-fledged implementation to study for reference. Could someone point me in the right direction? I'm familiar with Ruby, but I'm not sure how to proceed with the logic.

 
chris webster
Bartender
Posts: 2407
33
Linux Oracle Postgres Database Python Scala
Try this tutorial to see how to connect to Twitter from Ruby:

http://blog.benmorgan.io/post/79339120263/how-to-use-the-twitter-api-for-ruby-on-rails

Then you should be able to use the Ruby Twitter client to request data from the Twitter streams etc.

I don't use Ruby myself, but I would assume there are decent Twitter clients for Ruby, as there are for most languages. I've used Twitter clients for Python, Java and Scala and they're all fairly similar.

You need to set up an "application" under your Twitter account and request a set of API keys, which you then use as your credentials inside your client program. These give you permission to access Twitter's various stream interfaces and request a feed of live Tweets, for example. The Twitter stuff is documented here:

https://dev.twitter.com/overview/documentation
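
As a rough illustration of the credentials step, here is a minimal Python sketch that reads the API keys from environment variables instead of hard-coding them in the client program. It assumes the usual four values (consumer key and secret, access token and secret); the variable names and the `load_credentials` helper are hypothetical, not part of any Twitter client.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The resulting dict can then be handed to whichever Twitter client you choose, which keeps the secrets out of source control.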
 
Rahul Dayal Sharma
Ranch Hand
Posts: 47
Hi,

Thanks for the link to the blog. I've already registered my application and obtained the two keys and their secrets. I was planning on storing the tweets in a NoSQL database like CouchDB, but MongoDB looks like a good alternative.

I hadn't thought about using Rails to create the scraper, though, and I think it would add some complexity to the design. Apart from Ruby, which language would you recommend for creating a Twitter scraper? And is it easy to create and manage?
 
chris webster
Bartender
Posts: 2407
33
Linux Oracle Postgres Database Python Scala
Well, it's up to you. I like Python for this kind of thing, as it's easy to use, it handles JSON well, and the Twitter clients are pretty straightforward as far as I recall. Also, the PyMongo driver makes it really easy to work with MongoDB.

Have a look at Tweepy: http://tweepy.readthedocs.org/en/v3.5.0/

And PyMongo: https://api.mongodb.org/python/current/tutorial.html

I use the free Anaconda distribution of Python, which includes lots of extra libraries: https://www.continuum.io/downloads

And you might find the Jupyter Notebook (included with Anaconda) is a nice way to explore Python via your browser: https://jupyter.org

But you should be able to do what you want with any mainstream language, as they should all have Twitter clients and MongoDB drivers, so just pick whichever one you feel comfortable with.
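
To give a feel for the JSON and MongoDB side in Python, here is a small sketch that turns one raw tweet (as a JSON string) into a document and upserts it into a collection. It assumes the tweet carries the `id_str`, `text`, `created_at`, `user.screen_name` and `entities.hashtags` fields; the `tweet_to_document` and `save_tweet` helpers are made up for illustration, and `collection` is assumed to be a PyMongo collection object.

```python
import json
from datetime import datetime

# Timestamp layout assumed for the tweet's "created_at" field,
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_to_document(raw_json):
    """Pick out a few fields from a raw tweet and return a plain dict."""
    tweet = json.loads(raw_json)
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return {
        "_id": tweet["id_str"],  # tweet id doubles as the document key
        "text": tweet["text"],
        "user": tweet["user"]["screen_name"],
        "created_at": datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT),
        "hashtags": [tag["text"] for tag in hashtags],
    }


def save_tweet(collection, raw_json):
    """Upsert the tweet so that receiving it twice does not duplicate it."""
    doc = tweet_to_document(raw_json)
    collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    return doc["_id"]
```

With PyMongo installed and a MongoDB server running locally, `collection` could be something like `MongoClient()["twitter"]["tweets"]`, where the database and collection names are placeholders. Using the tweet id as `_id` is just one design choice; it makes re-running the scraper safe at the cost of tying the document key to Twitter's id.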
 