If you want to try Pig online, one option I can think of is Amazon EMR
(Elastic MapReduce) on AWS. It's a pay-as-you-go web service.
There are public datasets available on the AWS S3 storage service, such as
this one.
If you have never tried Pig at all, then start off by running it locally in a VM on your machine. Just download it, extract it, and run it in local mode with "pig -x local". Other than Java, nothing else is required (Hadoop is already embedded, so you don't even have to install Hadoop in this mode).
Under the extracted directory, there's a /tutorial subdirectory with a simple dataset named excite.log. You can learn Pig by trying it out on that dataset.
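As a rough sketch of what a first local-mode session could look like, the snippet below writes a tiny made-up sample file and a short Pig Latin script, then runs the script with "pig -x local" if pig is on your PATH. The file names sample.log and count_queries.pig, the sample rows, and the field names (user, time, query) are my own assumptions for illustration; to use the tutorial data instead, point the LOAD statement at wherever excite.log sits in your extracted directory.

```shell
# Made-up, tab-separated sample rows (user, time, query); not real excite.log data.
printf 'u1\t970916105432\tpig latin\n' >  sample.log
printf 'u1\t970916105500\thadoop\n'    >> sample.log
printf 'u2\t970916110000\tmapreduce\n' >> sample.log

# A small Pig Latin script that counts queries per user.
cat > count_queries.pig <<'EOF'
-- Load tab-separated records; the field names here are assumptions.
raw = LOAD 'sample.log' USING PigStorage('\t')
      AS (user:chararray, time:chararray, query:chararray);
-- Group the rows by user and count how many queries each user made.
grouped = GROUP raw BY user;
counts = FOREACH grouped GENERATE group AS user, COUNT(raw) AS n;
-- Show the most active users first.
ordered = ORDER counts BY n DESC;
DUMP ordered;
EOF

# Run in local mode only if pig is installed; otherwise just report it.
if command -v pig >/dev/null 2>&1; then
    pig -x local count_queries.pig
else
    echo "pig not found on PATH; script written to count_queries.pig"
fi
```

You can also paste the statements one at a time into the interactive prompt you get from plain "pig -x local", which makes it easier to inspect each intermediate relation as you go.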
You can also find many CSV-format datasets on the
UCI Machine Learning Repository site.