
Pig - Is there an online reference/shell?

 
Akhilesh Trivedi
Ranch Hand
Posts: 1608
I am not able to understand how many of the Pig operators work, in particular how they operate on a data set. Is there any online material that discusses this?
Also, is there any place where I can create tuples/data sets and try running Pig commands/operators online?
 
Karthik Shiraly
Bartender
Posts: 1210
Regarding trying out Pig online, one option I can think of is Amazon's AWS EMR (Elastic MapReduce). It's a pay-as-you-go web service.
There are public datasets available on their AWS S3 storage service, such as this one.

If you have never tried Pig at all, then start off by running Pig locally in a VM on your machine. Just download, extract, and run it in local mode with "pig -x local". Other than Java, nothing else is required (it already has Hadoop embedded, so you don't even have to install Hadoop in this mode).
Under the extracted directory, there's a /tutorial subdirectory with a simple dataset named excite.log. You can learn Pig by trying it out on that dataset.
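
Since the question was how the operators act on a data set, here is a rough Python analogy of three common ones (FILTER, GROUP, and FOREACH ... GENERATE with COUNT). It is only a sketch to build intuition, not how Pig is implemented. The sample rows are made up, the (user, time, query) layout is an assumption about the tutorial data, and the helper names pig_filter, pig_group, and pig_count are hypothetical.

```python
from collections import defaultdict

# Made-up sample tuples, assuming a (user, time, query) layout.
rows = [
    ("u1", "970916072134", "pig latin"),
    ("u2", "970916072311", ""),
    ("u1", "970916072500", "hadoop"),
    ("u3", "970916073000", "apache pig"),
]

def pig_filter(relation, predicate):
    # Roughly: clean = FILTER raw BY query != '';
    # Keeps only the tuples for which the predicate holds.
    return [t for t in relation if predicate(t)]

def pig_group(relation, key):
    # Roughly: grouped = GROUP clean BY user;
    # Produces one (group, bag) pair per distinct key, where the bag
    # holds every original tuple that shares that key.
    bags = defaultdict(list)
    for t in relation:
        bags[key(t)].append(t)
    return sorted(bags.items())

def pig_count(grouped):
    # Roughly: counts = FOREACH grouped GENERATE group, COUNT(clean);
    # Emits one output tuple per group with the size of its bag.
    return [(group, len(bag)) for group, bag in grouped]

clean = pig_filter(rows, lambda t: t[2] != "")
grouped = pig_group(clean, lambda t: t[0])
counts = pig_count(grouped)
print(counts)  # [('u1', 2), ('u3', 1)]
```

The point to take away is that each operator takes a whole relation (a collection of tuples) and returns a new relation, and that GROUP in particular yields nested bags rather than flat rows. In the local-mode shell you can inspect the real equivalents with DESCRIBE and DUMP.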

You can also find many CSV format datasets on the UCI ML Repository site.

 
Akhilesh Trivedi
Ranch Hand
Posts: 1608
Thanks Karthik. I shall look at them.
 