Big Data Entry Level Job

 
Kyle Jones
Ranch Hand
Posts: 46
Hi guys,

I've just been offered an entry-level big data job starting on Monday.

The job spec listed Java, Linux, Spring (Core + Integration), Hadoop, MapReduce and Pig as the technologies used.

I am pretty new to a lot of this and am looking to hit the ground running, as it's a 4-month contract with the possibility of extension.

Any ideas which areas I should focus on? I have Java experience, but only a little with the others.

 
K. Tsang
Bartender
Posts: 3648
4 months? What's that going to give you in terms of learning those technologies? Each of them has its own learning curve if you haven't played with it or read about it before.

I reckon the first thing should be Hadoop. Installing and configuring Hadoop is probably the first thing you will need to know.

Also if you are not familiar with big data, I suggest you do some research on this too.

Once you have, you should ask, why Hadoop and not XYZ to do big data?
 
chris webster
Bartender
Posts: 2407
Don't try to install Hadoop and its various components (Pig, Hive etc) individually on your own, as it's a real configuration nightmare that will waste days/weeks. Instead, install one of the pre-packaged virtual machines from Hortonworks or Cloudera.

For example, I've been using the Hortonworks Sandbox. This gives you an integrated single-node Hadoop installation with tools like Hive, Pig, HCatalog and Hue, plus links to lots of well-structured tutorials. The sandbox runs as a virtual machine, e.g. inside VirtualBox or VMware Player, and you can access a lot of the functionality very easily via the browser-based Hue interface. It's a lot easier than installing all these components by hand, and it's a great resource for learning about Hadoop even if you expect to use a different Hadoop distribution for your project, as Pig, Hive etc. are pretty much the same across the different Hadoop distributions.
So:

  • Install VMware Player or VirtualBox.
  • Download and install the Hortonworks Sandbox VM.
  • Work through the Hortonworks tutorials on Hadoop, HDFS, Pig, Hive etc.

You should be able to learn enough from this to get started in your new job pretty quickly.

Good luck!
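
Since MapReduce is on the job spec, it may help to see the idea in plain Java before opening the tutorials. The sketch below is only an illustration of the map, shuffle and reduce steps for a word count. It does not use the Hadoop API, and the class and method names (WordCountSketch, map, shuffle, reduce, countWords) are made up for this example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce idea; NOT the Hadoop API.
public class WordCountSketch {

    // "Map" step: turn one line of text into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Shuffle" step: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: sum the values collected for one key.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Run the three steps in sequence, in one process, over the input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, data=2, job=2}
        System.out.println(countWords(Arrays.asList("big data big job", "data job")));
    }
}
```

In a real Hadoop job the map and reduce steps run on different machines and the framework handles the shuffle, so treat this purely as a mental model.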
     
    Kyle Jones
    Ranch Hand
    Posts: 46

    K. Tsang wrote:4 months? What's that going to give you in terms of learning those technologies? Each of them has its own learning curve if you haven't played with it or read about it before.

    I reckon the first thing should be Hadoop. Installing and configuring Hadoop is probably the first thing you will need to know.

    Also if you are not familiar with big data, I suggest you do some research on this too.

    Once you have, you should ask, why Hadoop and not XYZ to do big data?



    It's 4 months and, as far as I know, it can be extended as long as they are happy with me.

    I am familiar with big data concepts and have been studying how Hadoop works with HDFS and MapReduce, but I haven't actually used them properly yet.

    I assume training will be provided in this.
     
    Kyle Jones
    Ranch Hand
    Posts: 46

    chris webster wrote:Don't try to install Hadoop and its various components (Pig, Hive etc) individually on your own, as it's a real configuration nightmare that will waste days/weeks. Instead, install one of the pre-packaged virtual machines from Hortonworks or Cloudera.

    For example, I've been using the Hortonworks Sandbox. This gives you an integrated single-node Hadoop installation with tools like Hive, Pig, HCatalog and Hue, plus links to lots of well-structured tutorials. The sandbox runs as a virtual machine, e.g. inside VirtualBox or VMware Player, and you can access a lot of the functionality very easily via the browser-based Hue interface. It's a lot easier than installing all these components by hand, and it's a great resource for learning about Hadoop even if you expect to use a different Hadoop distribution for your project, as Pig, Hive etc. are pretty much the same across the different Hadoop distributions.
    So:

      • Install VMware Player or VirtualBox.
      • Download and install the Hortonworks Sandbox VM.
      • Work through the Hortonworks tutorials on Hadoop, HDFS, Pig, Hive etc.

    You should be able to learn enough from this to get started in your new job pretty quickly.

    Good luck!



    Exactly what I was looking for!

    Cheers
     
    Kyle Jones
    Ranch Hand
    Posts: 46
    I've been trying to install the Hortonworks Sandbox on my 32-bit MacBook, but the requirements say 64-bit only.

    Does anyone know if there is a 32-bit alternative available?

    I've actually set it up in VirtualBox, but when you start it, it doesn't finish loading, so I am assuming this is the 32-bit issue.
     
    chris webster
    Bartender
    Posts: 2407
    I think you might be out of luck there, as the download instructions seem to specify a 64-bit host operating system is a requirement. Cloudera's Quickstart also requires a 64-bit host.

    Could you sign up for Amazon Web Services and set yourself up with a 64-bit VM there? Then maybe you can install VirtualBox/VMware and Hortonworks inside your 64-bit VM at Amazon.
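
If you want a quick check before downloading a large VM image, something like the sketch below prints what the JVM reports for its architecture. This is only a rough heuristic: os.arch describes the JVM you run it with, not necessarily the hardware or the host OS (a 32-bit JVM on a 64-bit OS will typically still report a 32-bit value), and the "contains 64" rule is an assumption that covers common values such as amd64, x86_64 and aarch64 but not every platform. The class and method names (ArchCheck, looks64Bit) are made up for this example.

```java
// Rough architecture check; a heuristic sketch, not a definitive test.
public class ArchCheck {

    // Assumption: common 64-bit os.arch values ("amd64", "x86_64", "aarch64")
    // contain "64", while common 32-bit values ("x86", "i386") do not.
    static boolean looks64Bit(String arch) {
        return arch != null && arch.contains("64");
    }

    public static void main(String[] args) {
        // os.arch is a standard Java system property describing the JVM's architecture.
        String arch = System.getProperty("os.arch");
        System.out.println("os.arch = " + arch);
        System.out.println(looks64Bit(arch)
                ? "This JVM looks 64-bit."
                : "This JVM looks 32-bit; a 64-bit-only VM image may not run here.");
    }
}
```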