
Error while copying a file from local file system to HDFS

 
akshay naidu
I just tried my first line of code from the book Hadoop: The Definitive Guide


and I am getting the following error:



Help.
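(The command was presumably the book's first HDFS example, something along these lines; quangle.txt and input/docs are the book's sample paths, and the target username is an assumption:)

hadoop fs -copyFromLocal input/docs/quangle.txt hdfs://localhost/user/YOUR-USERNAME/quangle.txt
# with no port in the URL, the client tries the default port 8020 and fails with a
# java.net.ConnectException if nothing is listening there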
 
Karthik Shiraly
The HDFS URL should be the same as the "fs.defaultFS" value in core-site.xml. In your previous post, you had specified "hdfs://localhost:9000", I guess? If so, the port here should be 9000 too; otherwise it defaults to 8020.
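(A sketch of the corrected command with the explicit port; the paths are assumed from the book's example:)

hadoop fs -copyFromLocal input/docs/quangle.txt hdfs://localhost:9000/user/YOUR-USERNAME/quangle.txt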
 
akshay naidu
core-site.xml =>
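(For reference, a typical fs.defaultFS entry for the localhost:9000 setup discussed above looks like this; the exact file contents here are assumed:)

cat $HADOOP_HOME/etc/hadoop/core-site.xml
# <configuration>
#   <property>
#     <name>fs.defaultFS</name>
#     <value>hdfs://localhost:9000</value>
#   </property>
# </configuration>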

 
akshay naidu
It's already at 9000.
 
Karthik Shiraly
Yes, I know. I was trying to point out that the URL on your command line isn't.
 
akshay naidu
I ran jps, and the response I am getting now is:



There is no DataNode entry; there's a JobHistoryServer instead.
Could this be the problem?
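(For comparison, jps on a healthy Hadoop 2.x pseudo-distributed setup typically lists these daemons; the PIDs below are only illustrative:)

jps
# 4215 NameNode
# 4310 DataNode
# 4502 SecondaryNameNode
# 4688 ResourceManager
# 4793 NodeManager
# 5021 Jps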
 
Karthik Shiraly
One thing at a time. Did you rerun the command with port 9000 in the URL? What output does it give when you do?
 
akshay naidu
akshay naidu wrote:core-site.xml =>



Do I need to make any changes here?
 
Karthik Shiraly
It would be good if you paid more attention to replies. Otherwise, this one-liner back-and-forth will become never-ending and tiring.

You started with the command:


You got this error:


I replied "Did you rerun the command with port 9000 in URL? What output does it give when you do?".

Is it not clear what you should try next?
 
akshay naidu
Very sorry, I thought you had asked me to rerun with port 9000 in core-site.xml.

I did rerun it with port 9000 on the command line, but I'm getting the same error:

 
Karthik Shiraly
Type the command below, and reply back with its output:

It's to verify whether port 9000 is open.
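(Judging from the output posted later in the thread, the command was along these lines; the grep filter is an added convenience:)

sudo netstat -antp | grep 9000
# a LISTEN line on 127.0.0.1:9000 with 'java' in the rightmost column means the NameNode is up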
 
akshay naidu
Here it is:
 
Karthik Shiraly
Looks like the Hadoop daemons aren't running right now (no "java" in the rightmost column).
Can you start all of them, wait for a minute or two, rerun the "netstat" command, and get back with the new output?
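(A sketch of that sequence, using the start scripts that appear later in this thread:)

start-dfs.sh
start-yarn.sh
sleep 120                          # give the daemons a minute or two to bind their ports
sudo netstat -antp | grep java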
 
akshay naidu
 
akshay naidu
I waited for more than 2 minutes.
 
Karthik Shiraly
The OS seems to be set up with IPv6 as the default.

While it's possible to disable IPv6 system-wide, I'm not sure how that would affect other programs.
So what you can do instead is tell only the Hadoop daemons to prefer the IPv4 stack.

Stop all the Hadoop daemons and open your ~/.bashrc file in a text editor. It should already contain an 'export HADOOP_OPTS' line like this:


Insert '-Djava.net.preferIPv4Stack=true ' before the existing value, so it ends up like this:


Save all changes, log out, log back in, and restart all the daemons.
Wait for some time and rerun the same netstat command.
Instead of all the "tcp6 .... java" entries, you should see "tcp .... java" entries, and 127.0.0.1:9000 should be among them.
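(A sketch of the edit; the existing -Djava.library.path value is an assumption based on common Hadoop setup tutorials:)

# before (assumed existing line):
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# after inserting the IPv4 flag:
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_HOME/lib"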
 
akshay naidu


After putting:

Again, after start-dfs.sh and start-yarn.sh, I ran jps and got this:

 
Karthik Shiraly
What are the full contents of ~/.bashrc now? And is this the same .bashrc you edited while following that tutorial?
 
akshay naidu
Here's the .bashrc file. And yes, this is the file I've been editing for the tutorials and for the book:

 
Karthik Shiraly
Where's the "-Djava.net.preferIPv4Stack=true" inserted here? I don't see it.
 
akshay naidu
I did add "-Djava.net.preferIPv4Stack=true" last time too, but I guess it wasn't saved.

Again I changed it and saved it, and the following are the responses.
bashrc =>



sudo netstat -antp =>

 
Karthik Shiraly
All the java PIDs are still the same, which means you haven't shut down and restarted the Hadoop services after that change.

I feel you are not verifying things thoroughly after making a change. First the file was not changed at all. Now the file is changed, but it has no effect because the services were not restarted.
This indicates you are making these changes without understanding why, and without verifying things afterwards.
Use 'echo' to verify that environment variable values have actually changed in memory. If they haven't, log out and log back in, and then reverify.
When you edit a file and exit the editor, always follow up immediately with a 'cat' and verify that the file has in fact changed.
This is basic stuff.

Attention to detail is very important in any engineering field, including software engineering. Please make an attempt to understand why you are doing something, and thoroughly verify that the expected changes have actually taken place.
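(A minimal verification sequence using the commands described above; the stop/start scripts are the standard Hadoop 2.x ones:)

source ~/.bashrc                  # or log out and log back in
echo $HADOOP_OPTS                 # should now include -Djava.net.preferIPv4Stack=true
grep HADOOP_OPTS ~/.bashrc        # confirm the file on disk actually changed
stop-yarn.sh && stop-dfs.sh       # restart so the daemons pick up the new value
start-dfs.sh && start-yarn.sh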
 
akshay naidu
I did restart the Hadoop services.

I did it again:
 
Karthik Shiraly
Can you reply with the output of 'echo $HADOOP_OPTS'? It should contain '-Djava.net.preferIPv4Stack=true'.
 
akshay naidu

 
Karthik Shiraly
preferIPv4Stack usually works. I'm not really sure why it isn't working on your machine.
Do a few checks (collected in one runnable sketch after this list):
1) Run a 'ps -ef | grep java' command after starting all hadoop daemons. Its output should contain a '-Djava.net.preferIPv4Stack=true' for each of the java processes.
Reply back with its output.

2) Run 'cat /proc/sys/net/ipv6/bindv6only'. It should show either 0 or 1. Ideally, it should be 0 (for false), but if it's 1 (true), it's probably what's causing this and needs to be set to 0.

3) Run 'cat /etc/hosts' and reply back with its output. This is to check which IP addresses 'localhost' is mapped to.
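(The three checks as one sequence:)

ps -ef | grep java                    # each Hadoop process should show -Djava.net.preferIPv4Stack=true
cat /proc/sys/net/ipv6/bindv6only     # 0 expected; 1 would force IPv6-only sockets
cat /etc/hosts                        # check which addresses 'localhost' maps to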
 
akshay naidu


I just reinstalled Hadoop, and now port 9000 is there. Now what?
Should I still do those checks?
 
akshay naidu
The Java ConnectException error is gone. I guess reinstalling helped. But now =>



I don't know what this file 'quangle.txt' is, and I have no idea about the addresses either.
 
Karthik Shiraly
How did simply reinstalling Hadoop solve an IPv6/IPv4 problem? I doubt that's the whole story. Was the earlier installed Hadoop an old version? I assumed it was 2.7.0 from the directory name, but it doesn't look like it now.

I don't know what this file 'quangle.txt' is

Well, you started this discussion by trying to copy this file - whatever it is - to HDFS. If you don't know what that file is or why you are copying it, how do you expect any of us to know?

and I have no idea about the addresses either.

Which addresses?
 
akshay naidu
Karthik Shiraly wrote: How did simply reinstalling Hadoop solve an IPv6/IPv4 problem? I doubt that's the whole story. Was the earlier installed Hadoop an old version? I assumed it was 2.7.0 from the directory name, but it doesn't look like it now.

I have no idea. Yes, I did reinstall the same version, viz. 2.7.0. Maybe I missed some steps in the previous installation.

Well, you started this discussion by trying to copy this file - whatever it is - to HDFS. If you don't know what that file is or why you are copying it, how do you expect any of us to know?

I am referring to Hadoop: The Definitive Guide. I was assuming that this 'quangle.txt' is some example text that comes with the Hadoop download. Since I don't know about it, I am trying instead with the NCDC data for the year 1901.
So now =>



Here, 1901.txt is in /media/akshay/MyMedia/HADOOP/1901.txt.

But I am not sure where exactly in HDFS I am supposed to copy it. In the code above, I am trying to copy it into $HADOOP_HDFS_HOME, which is at



I also tried this, but 'no such directory' shows up =>
 
Karthik Shiraly
Data files are usually copied into HDFS subdirectories that follow the naming convention: /user/YOUR-USERNAME/PROJECT-NAME/
For example, if your username is 'hdpuser' and your current Hadoop project is 'climate-analysis', copy the file to: /user/hdpuser/climate-analysis/1901.txt
Since the PROJECT-NAME subdirectory doesn't exist yet, first create the subdirectory (or subdirectories), and then copy the file:
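(A likely form of those two commands; the exact flags are assumed:)

hadoop fs -mkdir -p /user/hdpuser/climate-analysis
hadoop fs -copyFromLocal /media/akshay/MyMedia/HADOOP/1901.txt /user/hdpuser/climate-analysis/1901.txt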

Your file will now be stored in /user/hdpuser/climate-analysis/1901.txt

HDFS file paths behave a lot like regular file paths.
If the full HDFS URL is not specified, it'll just default to whatever is specified in etc/hadoop/core-site.xml (hdfs://localhost:9000 in your case).
If an absolute path is specified, it creates it at that absolute path. For example, this creates /myfiles/mydata under the root path because of the leading slash:
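(An assumed form of such a command:)

hadoop fs -mkdir -p /myfiles/mydata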

If a relative path is specified, it creates it under /user/YOUR-USERNAME/. For example, this creates myfiles2/mydata2 under your /user/hdpuser/ because it's a relative path:
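(An assumed form:)

hadoop fs -mkdir -p myfiles2/mydata2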


You can browse the HDFS filesystem from your browser. Just open http://localhost:50070/explorer.html to see all directories and files stored in HDFS.
 