Big Data Entry Level Job

 
Kyle Jones
Ranch Hand
Posts: 44
Hi guys,

I've just been offered an entry-level big data job starting on Monday.

The job spec listed Java, Linux, Spring (Core + Integration), Hadoop, MapReduce and Pig as the technologies used.

I'm pretty new to a lot of this and want to hit the ground running, as it's a 4-month contract with the possibility of extension.

Any ideas what areas I should be focusing on? I have Java experience, but only a little with the others.

 
K. Tsang
Bartender
Posts: 3508
4 months? What's that going to give you in terms of learning those technologies? Each of them has its own learning curve if you haven't played with or read about it before.

I reckon the first thing to tackle should be Hadoop: installing and configuring it is probably the first skill you'll need.

Also, if you're not familiar with big data in general, I suggest you do some research on that too.

Once you have, ask yourself: why Hadoop and not XYZ for big data?
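
Since the job spec lists Java and MapReduce, it might help to see what a basic MapReduce job actually looks like. This is the canonical WordCount example written against the standard org.apache.hadoop.mapreduce API (a minimal sketch; the input and output paths are whatever you pass as command-line arguments):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // safe here: summing is associative
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Package it as a jar and run it with hadoop jar wordcount.jar WordCount /some/input /some/output; note that the output directory must not exist beforehand or the job will fail.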
 
chris webster
Bartender
Posts: 2407
Don't try to install Hadoop and its various components (Pig, Hive etc) individually on your own, as it's a real configuration nightmare that will waste days/weeks. Instead, install one of the pre-packaged virtual machines from Hortonworks or Cloudera.

For example, I've been using the Hortonworks Sandbox. This gives you an integrated single-node Hadoop installation with tools like Hive, Pig, HCatalog and Hue, plus links to lots of well-structured tutorials. The sandbox runs as a virtual machine, e.g. inside VirtualBox or VMware Player, and you can access a lot of the functionality very easily via the browser-based Hue interface. It's a lot easier than installing all these components by hand, and it's a great resource for learning about Hadoop, even if you expect to use a different Hadoop distribution for your project, as Pig/Hive etc are all pretty much the same across the different Hadoop distributions.
So:

  • Install VMware Player or VirtualBox.
  • Download and install the Hortonworks Sandbox VM.
  • Work through the Hortonworks tutorials on Hadoop, HDFS, Pig, Hive etc.

You should be able to learn enough from this to get started in your new job pretty quickly.
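
Once the sandbox is up, you can also talk to its HDFS from plain Java, which makes a good smoke test before writing any real MapReduce code. A minimal sketch, assuming the sandbox's NameNode is reachable at hdfs://sandbox:8020 (the hostname and port are assumptions; use whatever your VM actually exposes):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "sandbox" and port 8020 are assumptions -- check your VM's network settings.
        conf.set("fs.defaultFS", "hdfs://sandbox:8020");
        FileSystem fs = FileSystem.get(conf);

        // Equivalent of `hadoop fs -ls /`: list everything in the HDFS root.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
        }

        fs.close();
    }
}

If that prints the sandbox's root directory listing, your client configuration and networking are fine.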

Good luck!
     
Kyle Jones
Ranch Hand
Posts: 44
K. Tsang wrote:4 months? What's that going to give you in terms of learning those technologies? Each of them has its own learning curve if you haven't played with or read about it before.

I reckon the first thing to tackle should be Hadoop: installing and configuring it is probably the first skill you'll need.

Also, if you're not familiar with big data in general, I suggest you do some research on that too.

Once you have, ask yourself: why Hadoop and not XYZ for big data?


It's 4 months, and as far as I know it can be extended as long as they're happy with me.

I'm familiar with big data concepts and have been studying how Hadoop works, with HDFS and MapReduce, but I haven't actually used them properly yet.

I assume training will be provided in this.
     
Kyle Jones
Ranch Hand
Posts: 44
chris webster wrote:Don't try to install Hadoop and its various components (Pig, Hive etc) individually on your own, as it's a real configuration nightmare that will waste days/weeks. Instead, install one of the pre-packaged virtual machines from Hortonworks or Cloudera.

For example, I've been using the Hortonworks Sandbox. This gives you an integrated single-node Hadoop installation with tools like Hive, Pig, HCatalog and Hue, plus links to lots of well-structured tutorials. The sandbox runs as a virtual machine, e.g. inside VirtualBox or VMware Player, and you can access a lot of the functionality very easily via the browser-based Hue interface. It's a lot easier than installing all these components by hand, and it's a great resource for learning about Hadoop, even if you expect to use a different Hadoop distribution for your project, as Pig/Hive etc are all pretty much the same across the different Hadoop distributions.

So:

  • Install VMware Player or VirtualBox.
  • Download and install the Hortonworks Sandbox VM.
  • Work through the Hortonworks tutorials on Hadoop, HDFS, Pig, Hive etc.

You should be able to learn enough from this to get started in your new job pretty quickly.

Good luck!


Exactly what I was looking for!

Cheers
     
Kyle Jones
Ranch Hand
Posts: 44
I've been trying to install the Hortonworks Sandbox on my 32-bit MacBook, but the requirements say 64-bit only.

Does anyone know if there's a 32-bit alternative available?

I've actually set it up in VirtualBox, but when you start it, it doesn't fully finish loading, so I'm assuming this is the 32-bit issue.
     
chris webster
Bartender
Posts: 2407
I think you might be out of luck there, as the download instructions seem to specify a 64-bit host operating system as a requirement. Cloudera's QuickStart VM also requires a 64-bit host.

Could you sign up for Amazon Web Services and set yourself up with a 64-bit VM there? Then maybe you could install VirtualBox/VMware and the Hortonworks Sandbox inside your 64-bit VM at Amazon.
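
If you do try Amazon, you can even launch the instance programmatically. A minimal sketch using the AWS SDK for Java, where the AMI ID is a placeholder you'd swap for a real 64-bit Linux image in your region, and the instance type is just a suggestion given the sandbox's memory requirements:

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.RunInstancesRequest;

public class LaunchSandboxHost {
    public static void main(String[] args) {
        // Credentials and region come from the default provider chain
        // (environment variables or ~/.aws/credentials).
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

        RunInstancesRequest request = new RunInstancesRequest()
                .withImageId("ami-xxxxxxxx")   // placeholder: a 64-bit Linux AMI for your region
                .withInstanceType("m3.xlarge") // the sandbox wants 4GB+ of RAM, so go large
                .withMinCount(1)
                .withMaxCount(1);

        Instance instance = ec2.runInstances(request)
                .getReservation().getInstances().get(0);
        System.out.println("Launched instance " + instance.getInstanceId());
    }
}

That only gets you the 64-bit host; you'd still install VirtualBox/VMware and the sandbox on it by hand afterwards.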
     