
What is the relationship between Hadoop and ETL?

 
Monica Shiralkar
Ranch Hand
Is there any relation between Hadoop and ETL? If so, how are they related? Thanks.
 
Purvi Barot
Ranch Hand
ETL stands for Extract, Transform, and Load. It is a technique for transforming and aggregating data and loading it into a table. Apache Hadoop is a platform for storing and analyzing large amounts of data and provides various tools for aggregating data. In addition, tools such as Hive, Spark, and Vertica can be used to perform ETL transformations on data.
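
To make the three ETL steps concrete, here is a minimal sketch of a single extract-transform-load pass in plain Java/JDBC. The connection URLs, credentials, and the sales / sales_by_region tables are made-up placeholders for illustration, not anything from a particular product.

import java.sql.*;

// A minimal ETL sketch: extract rows from a source database, transform them
// (here, a simple aggregation), and load the result into a target table.
// All URLs, credentials, and table/column names are hypothetical.
public class SimpleEtlJob {
    public static void main(String[] args) throws SQLException {
        try (Connection src = DriverManager.getConnection("jdbc:postgresql://source-host/sales_db", "user", "pass");
             Connection dst = DriverManager.getConnection("jdbc:postgresql://warehouse-host/dw", "user", "pass")) {

            // Extract + Transform: aggregate raw sales rows by region
            String extract = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region";
            String load = "INSERT INTO sales_by_region (region, total) VALUES (?, ?)";

            try (Statement read = src.createStatement();
                 ResultSet rows = read.executeQuery(extract);
                 PreparedStatement write = dst.prepareStatement(load)) {

                while (rows.next()) {
                    // Load: write each aggregated row into the target table
                    write.setString(1, rows.getString("region"));
                    write.setBigDecimal(2, rows.getBigDecimal("total"));
                    write.executeUpdate();
                }
            }
        }
    }
}

In practice a dedicated ETL tool or framework would handle scheduling, batching, and error handling instead of hand-written JDBC like this.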
 
Tim Moores
Saloon Keeper

Purvi Barot wrote:...and load it into a table.


... or move it out of the DB (the Extract part).
 
Monica Shiralkar
Ranch Hand

Purvi Barot wrote:ETL stands for Extract, Transform, and Load. It is a technique for transforming and aggregating data and loading it into a table. Apache Hadoop is a platform for storing and analyzing large amounts of data and provides various tools for aggregating data. In addition, tools such as Hive, Spark, and Vertica can be used to perform ETL transformations on data.


Thanks
Is the work done by Hadoop MapReduce, in other words, "ETL on extremely large data sets"?
 
Tim Moores
Saloon Keeper
I don't think that would be correct to say. Hadoop mainly concerns itself with the processing and storage of large data sets. That is not what is generally meant by ETL.
 
Tim Holloway
Saloon Keeper
ETL is typically done using external utility programs such as Hitachi/Pentaho PDI or Talend. These are "Swiss Army knife" utilities capable of transferring and/or transforming data sets in a variety of formats, including, but definitely not limited to, CSV files, Excel spreadsheets, XML files, generated data, database tables, NoSQL servers, remote data servers such as Amazon S3, email, FTP, web services, and much more.

Technically, what ETL tools do is "processing and storage" (and fetching), and they're very much tuned for massive data processing, but they don't hold data themselves (they store into things like databases), and they generally don't work well as a generic application framework.

I've used ETL to do things like pull tables once an hour (business hours) from a database, format them as CSV files and then upload them to a remote reporting server via FTP. I worked with a system that pulled gigabyte flat files down from a remote IBM mainframe, converted EBCDIC to ASCII, translated IBM's unique binary number formats to something more Java-friendly, and built Oracle Financials transactions. Generally speaking, anything involving large sets of data that was too awkward for shell scripts and simple utilities like awk and perl but not gnarly enough to demand a custom application program is a potential candidate for me to use ETL tools.
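
As a rough illustration of that first job (dump a table to CSV, then push it to a reporting server over FTP), here is a hand-rolled Java sketch using JDBC plus Apache Commons Net. The host names, credentials, and the report_data table are placeholders I made up; a real ETL tool would supply the scheduling, logging, and retries that are omitted here.

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.sql.*;
import org.apache.commons.net.ftp.FTPClient;

// Sketch of a "table to CSV to FTP" job. All hosts, credentials, and
// table/column names are hypothetical placeholders.
public class TableToCsvFtp {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream csv = new ByteArrayOutputStream();

        // Extract: read the table and format it as CSV in memory
        try (Connection db = DriverManager.getConnection("jdbc:oracle:thin:@db-host:1521/ORCL", "user", "pass");
             Statement st = db.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, name, total FROM report_data");
             PrintWriter out = new PrintWriter(new OutputStreamWriter(csv, StandardCharsets.UTF_8))) {

            out.println("id,name,total");                        // CSV header
            while (rs.next()) {                                   // one CSV line per row
                out.printf("%d,%s,%s%n", rs.getLong("id"), rs.getString("name"), rs.getBigDecimal("total"));
            }
        }

        // Load: upload the CSV to the remote reporting server via FTP
        FTPClient ftp = new FTPClient();
        ftp.connect("reporting-host");
        try {
            ftp.login("reportuser", "secret");
            ftp.storeFile("report.csv", new ByteArrayInputStream(csv.toByteArray()));
            ftp.logout();
        } finally {
            ftp.disconnect();
        }
    }
}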
 
Monica Shiralkar
Ranch Hand
Thanks.
 
Monica Shiralkar
Ranch Hand

Tim Moores wrote:I don't think that would be correct to say. Hadoop mainly concerns itself with the processing and storage of large data sets. That is not what is generally meant by ETL.



Thanks. Would my original question in the thread have made sense had I said between ETL and Big Data (instead of Hadoop)?
 
Monica Shiralkar
Ranch Hand

Tim Holloway wrote:but not gnarly enough to demand a custom application program is a potential candidate for me to use ETL tools.



Thanks.
 
Tim Moores
Saloon Keeper

Monica Shiralkar wrote:Would my original question in the thread have made sense had I said between ETL and Big Data (instead of Hadoop)?


Not really. Big Data is about problems arising from the handling of very large data sets, and the applications and approaches needed to handle those, whereas ETL concerns itself with data transfer in and out of DBs and data transformation, irrespective of data size. Certainly Big Data also needs ETL tools, but those two things are largely orthogonal.
 
Monica Shiralkar
Ranch Hand
But, for example, Pentaho, which is an ETL tool, is used for Big Data.
 
Tim Moores
Saloon Keeper
Yes. As said, both concepts can work together, but they're still different.
 
Monica Shiralkar
Ranch Hand
Thanks
 
Greenhorn
ETL stands for Extraction, Transformation, and Loading. The data comes from different source systems, that is, external sources and operational systems. ETL has its own tools, called ETL tools, which let us change the data into the particular format we want. Both Hadoop and ETL are used to move and transform data from many different sources and load it into various targets. Complex ETL jobs can be deployed and executed in a distributed manner thanks to the programming and scripting frameworks on Hadoop. Hadoop is not an ETL tool, but it acts like a helper: when we work with an ETL tool on very large data, we will most likely use Hadoop MapReduce and HDFS (Hadoop Distributed File System) because they execute in a distributed manner. Hadoop helps the ETL tool process the data, and it does that by using the MapReduce technique. Hadoop is a good platform for ETL because it can serve as an all-purpose staging area and landing zone for big data, while the ETL process feeds traditional warehouses directly.
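
To illustrate the "executed in a distributed manner" part, here is a minimal sketch of the Transform step of an ETL job written as a Hadoop mapper. The CSV field layout and the cleanup rules are assumptions made up for the example; a real job would also define a reducer, a driver, and input/output formats.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Transform step of a hypothetical ETL job as a Hadoop mapper: each input line
// is a raw CSV record; malformed records are dropped and the rest are emitted
// in a normalized form, keyed by the first field. Field layout is assumed.
public class CleanRecordMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split(",");
        if (fields.length < 3) {
            return;                                   // drop malformed records
        }
        String key = fields[0].trim();
        String cleaned = fields[1].trim().toUpperCase() + "," + fields[2].trim();
        context.write(new Text(key), new Text(cleaned));
    }
}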
 