abhi k tripathi

Greenhorn
since Nov 21, 2015

Recent posts by abhi k tripathi

Hi Anup,

Try the following simple wordcount example to test your environment:
1. Start the HDFS daemons: start-dfs.sh
2. Start the YARN daemons: start-yarn.sh
3. Create an input directory
$ hadoop fs -mkdir /user/hadoop/input
-- If it gives you a "directory not found" error, create the parent directories first (hadoop fs -mkdir -p creates them in one step)
4. Upload the input file
$ hadoop fs -put $HADOOP_INSTALL/LICENSE.txt /user/hadoop/input/License
5. Use the examples jar shipped with the Hadoop installation package to run the job
$ hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/hadoop/input/License /home/hadoop/Output
Note: adjust the input file and output directory paths for your system/OS.
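Once the job finishes, you can inspect the result (assuming the output path from the command above; part-r-00000 is the standard reducer output file name):
$ hadoop fs -cat /home/hadoop/Output/part-r-00000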
8 years ago
Hi Jone,

You can follow the curriculum below to learn Hadoop:

1. Start with an introduction
Big Data, Hadoop, HDFS, YARN, architectures, etc.
2. Set up your own local cluster on Ubuntu Server or CentOS.
You can run it in pseudo-distributed mode.
3. Understand the Hadoop MapReduce framework
The different stages of MapReduce, writing MapReduce code (see the sketch after this list), MapReduce use cases
4. Advanced MapReduce
Combiner and partitioner, map-side/reduce-side joins, using Writable and Comparable, etc.
5. Pig
Installation of Pig; learn Pig Latin: load files, process files, apply queries; define Pig UDFs, Pig APIs, use cases, etc.
6. Apache Hive
Installation of Hive; learn HiveQL: create databases, create tables, partitioned tables, joins, unions, grouping, SerDes; Hive UDFs, use cases, etc.
7. Data migration tools
Learn Flume, learn Sqoop
8. NoSQL databases
Basics and architecture of some popular NoSQL databases such as MongoDB and HBase, use cases, etc.
9. Learn one NoSQL database in depth
10. ZooKeeper
Installation of ZooKeeper, ZooKeeper basics, the ZooKeeper data model, ZNode types, sequential ZNodes, use cases, etc.
11. Project
Pick a dataset from the internet, apply the fundamentals you have learned, and generate some useful outcomes.
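For step 3, here is a minimal sketch of a WordCount mapper, assuming the Hadoop 2.x MapReduce API (org.apache.hadoop.mapreduce):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit (word, 1) for every whitespace-separated token in the line;
        // the framework groups these by word for the reducer to sum.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

Pair it with a reducer that sums the counts and you have the classic wordcount job from the examples jar.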

8 years ago
Hi Rajesh,

You are right, HBase does support multiple versions within a column family, although I am not sure how many versions it keeps by default.

In my view, version support is one of the key benefits of HBase.
In an RDBMS, you can maintain a backup of the database for cases like failure or rollback, but it consumes a lot of space, and you have to restore the whole backup just to check a single change in a column value.
With HBase, you can do it with a single API call. For example:
- to return more than one version, see Get.setMaxVersions()

You can also check the values within a given time range:
- to return versions other than the latest, see Get.setTimeRange()
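A minimal sketch against the HBase 0.94-era client API (the same API the link below documents); the table name "metrics", the row key, and the column contents are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionedGet {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "metrics");      // hypothetical table
        Get get = new Get(Bytes.toBytes("row1"));
        get.setMaxVersions(5);                           // return up to 5 versions per cell
        // get.setTimeRange(minStamp, maxStamp);         // or restrict to a time window
        Result result = table.get(get);
        for (KeyValue kv : result.raw()) {               // one KeyValue per stored version
            System.out.println(kv.getTimestamp() + " -> " + Bytes.toString(kv.getValue()));
        }
        table.close();
    }
}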

You can check the HBase versions examples here:
http://hbase.apache.org/0.94/book/versions.html

HBase is typically used in analytics these days. Being able to see how a value changed in the same field, which is an important aspect of analytics, is something you can do easily with HBase.
If you search Google, you will find multiple scenarios that rely on version support.
8 years ago
Hi Rob,

Define SQOOP_HOME in your .bashrc file; that should resolve the issue.

export SQOOP_HOME=/var/log/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
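Then reload your shell configuration so the change takes effect in the current session:
$ source ~/.bashrc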

Then try the import again (note the single-dash -P to prompt for the password and -m to set the mapper count):
sqoop import --connect jdbc:mysql://192.168.1.15:3306/world --username root -P --table city -m 1
or
sqoop import --connect jdbc:mysql://localhost/world --username root -P --table city -m 1
8 years ago
If I understand correctly, your program is using "," and "\t" to delimit the fields:
scanner.useDelimiter("\t");

If you look at your input data, some records do not follow that syntax:
1,2 23,17,15 11,9 good
12 11,8 12,7,8 ill
14,12,9 8,6,4 24,18 ill


First, filter the data from the input file. You can use the hasNextInt() method to keep only the numeric values, as in the sketch below.
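A minimal sketch, assuming each record should be split on commas and tabs and the non-numeric label skipped (the sample line is hypothetical):

import java.util.Scanner;

public class NumericFilter {
    public static void main(String[] args) {
        String line = "1,2\t23,17,15\t11,9\tgood";     // hypothetical sample record
        Scanner scanner = new Scanner(line);
        scanner.useDelimiter("[,\t]+");                // split on commas and tabs
        while (scanner.hasNext()) {
            if (scanner.hasNextInt()) {
                System.out.println(scanner.nextInt()); // keep numeric fields
            } else {
                scanner.next();                        // skip non-numeric tokens like "good"
            }
        }
        scanner.close();
    }
}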

Hope this helps.

To learn more about how MapReduce works, here are some free tutorials you can look at:
https://www.dezyre.com//hadoop-tutorial/hadoop-mapreduce-tutorial
8 years ago
You can't update values in Pig.
The same issue was raised in the Pig issue tracker: https://issues.apache.org/jira/browse/PIG-1693
Pig's scripting language, Pig Latin, is designed for filtering and transforming data, not updating it in place.
However, Hortonworks' Pig 0.9 release includes the project-range expression, which can help with this.
Check the link here: http://hortonworks.com/blog/new-apache-pig-0-9-features-part-3-additional-features/

But you can change specific values; check this link:
http://stackoverflow.com/questions/18796778/filter-and-change-a-column-in-pig

To learn more about Pig, check these Pig tutorials:
https://www.dezyre.com//hadoop-tutorial/pig-tutorial
8 years ago
This link explains how big companies use big data applications like Hadoop to gain a competitive edge in the market by using their data effectively.
I am sure you will find it very useful when you research the practical applications of Hadoop.

https://www.dezyre.com/article/5-big-data-use-cases-how-companies-use-big-data/155
8 years ago