Scraper for Twitter

 
Rahul Dayal Sharma
Ranch Hand
Posts: 48
Hi Everyone,

I'm looking to build a scraper for Twitter and have started looking online for resources that explain how to go about it. I found quite a few gems, such as TweetStream and twitter-scraper, but I haven't seen a single full-fledged implementation to study for reference. I'm familiar with Ruby but am not sure how to proceed with the logic. Can someone please point me to some information on this?

 
chris webster
Bartender
Posts: 2407
Try this tutorial to see how to connect to Twitter from Ruby:

http://blog.benmorgan.io/post/79339120263/how-to-use-the-twitter-api-for-ruby-on-rails

Then you should be able to use the Ruby Twitter client to request data from the Twitter streams etc.

I don't use Ruby myself, but I would assume there are decent Twitter clients for Ruby, as there are for most languages. I've used Twitter clients for Python, Java and Scala and they're all fairly similar.

You need to set up an "application" under your Twitter account and request a set of API keys, which you then use as your credentials inside your client program. These give you permission to access Twitter's various stream interfaces and request a feed of live Tweets, for example. The Twitter stuff is documented here:

https://dev.twitter.com/overview/documentation
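
As a rough illustration of the "API keys as credentials" step, here is a minimal Python sketch. It assumes Tweepy 3.x is installed and that the four values from your Twitter application are kept in environment variables. The variable names, `TwitterCredentials`, `load_credentials` and `build_api` are all made up for this example, so rename them to suit your setup.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TwitterCredentials:
    """The two keys and their secrets issued for a Twitter application."""
    consumer_key: str
    consumer_secret: str
    access_token: str
    access_token_secret: str


# Hypothetical environment variable names (an assumption, not a Twitter convention).
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env: Mapping[str, str] = os.environ) -> TwitterCredentials:
    """Read the credentials from the environment, failing early if any is missing."""
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("missing Twitter credentials: " + ", ".join(missing))
    return TwitterCredentials(**{field: env[name] for field, name in ENV_NAMES.items()})


def build_api(creds: TwitterCredentials):
    """Create an authenticated Tweepy client (assumes Tweepy 3.x)."""
    import tweepy  # imported here so the rest of the module works without Tweepy

    auth = tweepy.OAuthHandler(creds.consumer_key, creds.consumer_secret)
    auth.set_access_token(creds.access_token, creds.access_token_secret)
    return tweepy.API(auth)
```

Keeping the keys out of the source file means you can share or commit the scraper without leaking your credentials. Something like `api = build_api(load_credentials())` would then give you a client to make requests with.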
 
Rahul Dayal Sharma
Ranch Hand
Posts: 48
Hi,

Thanks for the link to the blog. I've already registered my application and obtained the two keys and their secrets. I was planning on storing the tweets in a NoSQL database like CouchDB, but MongoDB looks like a good alternative.

I hadn't thought about using Rails to create the scraper, though, and I think it would add a bit of complexity to the design. Apart from Ruby, which language would you recommend for creating a Twitter scraper? And is it easy to create and manage?
 
chris webster
Bartender
Posts: 2407
Well, it's up to you. I like Python for this kind of thing, as it's easy to use, it handles JSON well, and the Twitter clients are pretty straightforward as far as I recall. Also, the PyMongo driver makes it really easy to work with MongoDB.

Have a look at Tweepy: http://tweepy.readthedocs.org/en/v3.5.0/

And PyMongo: https://api.mongodb.org/python/current/tutorial.html

I use the free Anaconda distribution of Python, which includes lots of extra libraries: https://www.continuum.io/downloads

And you might find the Jupyter Notebook (included with Anaconda) is a nice way to explore Python via your browser: https://jupyter.org

But you should be able to do what you want with any mainstream language, as they should all have Twitter clients and MongoDB drivers, so just pick whichever one you feel comfortable with.
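
To give a feel for the Python route, here is a small sketch of the "handle the JSON, then store it in MongoDB" part. It assumes the tweets arrive as JSON strings with the usual `id_str`, `text`, `user.screen_name` and `created_at` fields, and that `collection` behaves like a PyMongo collection. The function names and the document layout are just illustrative choices, not anything prescribed by Twitter or MongoDB.

```python
import json
from typing import Any, Dict


def tweet_to_document(raw_json: str) -> Dict[str, Any]:
    """Turn one tweet's JSON text into a dict suitable for a document store."""
    tweet = json.loads(raw_json)
    return {
        "_id": tweet["id_str"],  # use the tweet ID as the primary key
        "text": tweet.get("text", ""),
        "user": tweet.get("user", {}).get("screen_name"),
        "created_at": tweet.get("created_at"),
        "raw": tweet,  # keep the full payload for later analysis
    }


def store_tweet(collection, raw_json: str) -> str:
    """Upsert the tweet so that receiving it twice does not create duplicates.

    `collection` is assumed to offer PyMongo's
    `replace_one(filter, replacement, upsert=...)` method.
    """
    doc = tweet_to_document(raw_json)
    collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    return doc["_id"]
```

With PyMongo installed and a MongoDB server running locally, `collection` could be something like `pymongo.MongoClient()["twitter"]["tweets"]` (the database and collection names are made up). You would then call `store_tweet` on each raw tweet your Twitter client hands you.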
 