What is the relationship between Hadoop and ETL ?

 
Monica Shiralkar
Ranch Foreman
Is there any relation between Hadoop and ETL? If so, how are they related? Thanks.
 
Purvi Barot
Ranch Hand
ETL stands for Extract, Transform, and Load. It is a technique to execute transformations and aggregations on data and load it into a table. Apache Hadoop is a platform for storing and analyzing large amounts of data, and it provides various tools for aggregating data. In addition, we can use tools like Hive, Spark, and Vertica to perform ETL transformations on data.
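To make the three stages concrete, here is a minimal sketch in plain Python, assuming a small CSV of sales records as the source and an in-memory SQLite database as the target. The data, the column names, and the `sales_by_region` table are all hypothetical; this only illustrates the extract, transform, and load steps on a toy data set, not how Hive, Spark, or Vertica do it.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical source data; in practice this might be a file, an API, or another DB.
RAW_CSV = """region,amount
north, 10.50
south,4.25
north,2.00
east,
south,1.75
"""

def extract(text):
    """Extract: read the raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: skip rows with no amount, normalise values, aggregate per region."""
    totals = defaultdict(float)
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, drop it
        totals[row["region"].strip().lower()] += float(amount)
    return sorted(totals.items())

def load(conn, records):
    """Load: write the aggregated records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
print(conn.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall())
# → [('north', 12.5), ('south', 6.0)]
```

Keeping each stage in its own function mirrors the E, T, and L split, so any one of them (for example, the source) can be swapped without touching the others.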
 
Tim Moores
Saloon Keeper

Purvi Barot wrote:...and load it into a table.


... or move it out of the DB (the Extract part).
 
Monica Shiralkar
Ranch Foreman

Purvi Barot wrote:...Apache Hadoop is a platform for storing and analyzing large amounts of data, and it provides various tools for aggregating data...


Thanks. In other words, is the work done by Hadoop MapReduce "ETL on extremely large data sets"?
 
Tim Moores
Saloon Keeper
I don't think that would be correct to say. Hadoop mainly concerns itself with the processing and storage of large data sets. That is not what is generally meant by ETL.
 
Tim Holloway
Saloon Keeper
ETL is typically done using external utility programs such as Hitachi/Pentaho PDI or Talend. These are "Swiss Army Knife" utilities, capable of transferring and/or transforming data sets in a variety of formats, including, but definitely not limited to, CSV files, Excel spreadsheets, XML files, generated data, database tables, NoSQL servers, remote data servers such as Amazon S3, email, FTP, web services, and much more.

Technically, what ETL tools do is "processing and storage", plus fetching, and they're very much tuned for massive data processing. However, they don't hold data themselves (they store it into things like databases), and they generally don't work well as a generic application framework.

I've used ETL to do things like pull tables from a database once an hour (during business hours), format them as CSV files, and then upload them to a remote reporting server via FTP. I worked with a system that pulled gigabyte-sized flat files down from a remote IBM mainframe, converted EBCDIC to ASCII, translated IBM's unique binary number formats into something more Java-friendly, and built Oracle Financials transactions. Generally speaking, anything involving large sets of data that is too awkward for shell scripts and simple utilities like awk and Perl, but not gnarly enough to demand a custom application program, is a potential candidate for me to use ETL tools.
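The mainframe conversion described above can be sketched in Python, assuming the text fields use EBCDIC code page 037 (one common variant; Python ships a `cp037` codec) and that the "unique binary number format" is IBM packed decimal (COMP-3), which is common but not the only such format. The 7-byte record layout and the helper names `ebcdic_to_text` and `unpack_comp3` are hypothetical; a real job would take the layout from the mainframe's copybook.

```python
from decimal import Decimal

def ebcdic_to_text(data: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes into a Python string (assumes code page 037)."""
    return data.decode(codepage)

def unpack_comp3(data: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field into a Decimal.

    Each byte holds two 4-bit decimal digits, except that the low nibble of
    the last byte is the sign: 0xB or 0xD means negative, 0xA/0xC/0xE/0xF
    positive. `scale` is the number of implied decimal places.
    """
    if not data:
        raise ValueError("empty packed-decimal field")
    digits = []
    for byte in data[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(data[-1] >> 4)
    sign_nibble = data[-1] & 0x0F
    if any(d > 9 for d in digits) or sign_nibble < 0xA:
        raise ValueError(f"invalid packed-decimal field: {data.hex()}")
    value = int("".join(map(str, digits)))
    if sign_nibble in (0xB, 0xD):
        value = -value
    # Shift the decimal point left by `scale` places.
    return Decimal(value).scaleb(-scale)

# Hypothetical record: a 4-byte EBCDIC name followed by a
# 3-byte packed-decimal amount with 2 implied decimal places.
record = bytes([0xC1, 0xC3, 0xD4, 0xC5, 0x12, 0x34, 0x5D])
name = ebcdic_to_text(record[:4])
amount = unpack_comp3(record[4:], scale=2)
print(name, amount)  # → ACME -123.45
```

Using `Decimal` rather than `float` keeps monetary amounts exact, which matters when the output feeds financial transactions.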
 
Monica Shiralkar
Ranch Foreman
Thanks.
 
Monica Shiralkar
Ranch Foreman

Tim Moores wrote:I don't think that would be correct to say. Hadoop mainly concerns itself with the processing and storage of large data sets. That is not what is generally meant by ETL.



Thanks. Would my original question in this thread have made sense had I asked about ETL and Big Data (instead of Hadoop)?
 
Monica Shiralkar
Ranch Foreman

Tim Holloway wrote:but not gnarly enough to demand a custom application program is a potential candidate for me to use ETL tools.



Thanks.
 
Tim Moores
Saloon Keeper

Monica Shiralkar wrote:Would my original question in this thread have made sense had I asked about ETL and Big Data (instead of Hadoop)?


Not really. Big Data is about problems arising from the handling of very large data sets, and the applications and approaches needed to handle those, whereas ETL concerns itself with data transfer in and out of DBs and data transformation, irrespective of data size. Certainly Big Data also needs ETL tools, but those two things are largely orthogonal.
 
Monica Shiralkar
Ranch Foreman
But Pentaho, for example, which is an ETL tool, is used for Big Data.
 
Tim Moores
Saloon Keeper
Yes. As said, both concepts can work together, but they're still different.
 
Monica Shiralkar
Ranch Foreman
Thanks.
 