
How to update records in PIG

 
perhir hi
Greenhorn
Posts: 1
I want to update and delete some records in Pig, and I'd like to know how to achieve that. My data looks like this:

ID Name
1 A
2 B
3 C
4 D
5 E
I want to update the value for ID = 3 and delete the record with ID = 5, so that my expected table has records like:

ID NAME
1 A
2 B
3 Z
4 D
How can I achieve the above result?
 
chris webster
Bartender
Posts: 2407
Assuming your data is in files on HDFS, my understanding is that you cannot really do arbitrary in-place updates like you would with SQL in a relational database. You would probably need to read the data and modify the relevant records before writing it all back to HDFS. If you're using Hive or HBase to store your data, then maybe there are other options available, but in-place updates are not really what Hadoop is intended for.
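For example, the usual pattern is to read everything, rewrite the record you want to change, filter out the one you want to drop, and store the result to a new location. Here is a minimal sketch, assuming the records sit in a tab-delimited file on HDFS; the path, the schema and the output directory are just for illustration:

people = LOAD '/data/people.txt' USING PigStorage('\t') AS (id:int, name:chararray);

-- "update": rewrite the name for id 3 using the bincond (?:) operator
updated = FOREACH people GENERATE id, (id == 3 ? 'Z' : name) AS name;

-- "delete": drop the record with id 5
remaining = FILTER updated BY id != 5;

-- write the result out; the original file is left untouched
STORE remaining INTO '/data/people_updated' USING PigStorage('\t');

You end up with a new dataset that reflects the changes, rather than an in-place update of the original file.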
 
ratnesh singh
Greenhorn
Posts: 3
By using the following steps we can update our records in Pig (Hadoop).

1. Start by selecting the File Browser from the top toolbar. The File Browser lets us view the Hortonworks Data Platform (HDP) file store, which is separate from the local file system; in a Hadoop cluster this would be your view of the Hadoop Distributed File System (HDFS). Click the Upload button to select the files we want to upload into the Hortonworks Sandbox environment. When you click the Upload a file button you will get a dialog box; navigate to where you stored Batting.csv on your local disk and select it, then do the same for Master.csv. When you are done you will see the two files in your directory. Now that we have our data files we can start writing our Pig script.

2. The Pig user interface opens in our browser window. On the left is a list of the saved scripts; on the right is the composition area where we will write our script. Below the composition area are buttons to Save, Execute, Explain and perform a syntax check on the current script. At the very bottom are status boxes where we will see logs, error messages and the output of our script.

3. To get started, fill in a name for your script; you cannot save it until we add our first line of code. The first thing we need to do is load the data, which we do with the load statement. The PigStorage function is what does the loading, and we pass it a comma as the field delimiter. The code is:
batting = load 'Batting.csv' using PigStorage(',');

4. The next thing we want to do is name the fields. We use a FOREACH statement to iterate through the batting data object, and GENERATE pulls out selected fields and assigns them names. The new data object we are creating is named runs. Our code is now:
runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;


5. The next line of code is a GROUP statement that groups the elements in runs by the year field, so the grp_data object will be indexed by year. In the next statement, as we iterate through grp_data, we will go through it year by year. Type in the code:
grp_data = GROUP runs by (year);

6. In the next FOREACH statement we find the maximum runs for each year:
max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;

7. Now that we have the maximum runs, we need to join this with the runs data object so we can pick up the player id. The result will be a dataset with Year, PlayerID and Max Runs. At the end we dump the data to the output (the full script is gathered into one sketch just after these steps):
join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;
dump join_data;
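Pulling steps 3 to 7 together, here is the whole script in one place. Treat it as a sketch: the column positions ($0, $1, $8) are assumptions based on the Batting.csv layout used in the Hortonworks tutorial, and the int casts are added so that MAX() sees numeric values.

batting = LOAD 'Batting.csv' USING PigStorage(',');

-- name the fields we care about and cast the numeric ones
runs = FOREACH batting GENERATE $0 AS playerID, (int)$1 AS year, (int)$8 AS runs;

-- group by year and find the highest run total per year
grp_data = GROUP runs BY (year);
max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;

-- join the per-year maximum back to the detail rows to recover the playerID
join_max_run = JOIN max_runs BY ($0, max_runs), runs BY (year, runs);
join_data = FOREACH join_max_run GENERATE $0 AS year, $2 AS playerID, $1 AS runs;

DUMP join_data;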

These are the basic steps for updating your records in Pig. For further details on the Hadoop installation and the underlying concepts, you can work through the full coding part at the following link:
http://alturl.com/soqps
 
abhi k tripathi
Greenhorn
Posts: 8
You can't update values in place with Pig.
The same issue was mentioned in the Pig issue list: https://issues.apache.org/jira/browse/PIG-1693
Pig's scripting language, called Pig Latin, is used for filtering the data.
However, Hortonworks' Pig 0.9 comes with the project-range expression, which can help with this issue.
Check the link here: http://hortonworks.com/blog/new-apache-pig-0-9-features-part-3-additional-features/
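For what it's worth, the project-range expression just lets you project a span of columns without listing each one. A tiny sketch, where the file name and schema are made up for illustration:

wide = LOAD 'wide_data.csv' USING PigStorage(',') AS (id:int, a:int, b:int, c:int, d:int, e:int);

-- keep the id plus everything from b through e in one range projection
trimmed = FOREACH wide GENERATE id, b .. e;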

But you can change some specific values; check this link:
http://stackoverflow.com/questions/18796778/filter-and-change-a-column-in-pig

To learn more about Pig, check these Pig tutorials:
https://www.dezyre.com//hadoop-tutorial/pig-tutorial
 
ruchika sharma
Greenhorn
Posts: 4
Pig is not a database in which you can update or delete data. It basically reads data from HDFS or the local file system and performs operations on it, so you should not try to update or delete that data in place.
 