Unlike Spark RDDs, are Spark DataFrames also used for smaller amounts of data?

 
Ranch Foreman
Spark RDDs are used for processing extremely large datasets on a cluster computing system. Are DataFrames also used only for large datasets, or are they used for smaller amounts of data too, e.g. some configuration to be read from a table in a SQL Server database?
Thanks.
 
Ranch Hand
The difference between RDDs and DataFrames has nothing to do with the volume of data; it is about what kind of data you have and what you want to do with it. You can use RDDs or Spark DataFrames to process a single record from a single file, or massive data sources containing gigabytes of data. You can run Spark on your laptop, in a cloud environment (Azure, AWS, Google Cloud Platform, etc.), or on a huge on-premises Hadoop cluster.

If you are working with structured data that can be represented as a table, then you would probably use DataFrames and the Spark SQL APIs.
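
To make that concrete, here is a rough sketch (PySpark, local mode) that handles the same three records with both APIs. The records, the column names `name` and `value`, and the function name `run_demo` are made up for illustration; it assumes pyspark and a Java runtime are installed.

```python
# Sketch only: tiny structured data handled with both the RDD and DataFrame APIs.
# The records and column names are hypothetical.

ROWS = [("timeout_seconds", "30"), ("retry_count", "3"), ("region", "eu-west")]


def run_demo(rows=ROWS):
    # Imported here so the module can be loaded without Spark present.
    from pyspark.sql import SparkSession

    # local[*] runs Spark inside this one process, e.g. on a laptop.
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("small-data-demo")
        .getOrCreate()
    )
    try:
        # RDD API: plain Python tuples, you index into them yourself.
        rdd = spark.sparkContext.parallelize(rows)
        rdd_result = (
            rdd.filter(lambda kv: kv[0] == "retry_count")
            .map(lambda kv: kv[1])
            .collect()
        )

        # DataFrame API: named columns, queryable through Spark SQL.
        df = spark.createDataFrame(rows, ["name", "value"])
        df.createOrReplaceTempView("config")
        sql_rows = spark.sql(
            "SELECT value FROM config WHERE name = 'retry_count'"
        ).collect()
        sql_result = [row["value"] for row in sql_rows]

        return rdd_result, sql_result
    finally:
        spark.stop()
```

Calling `run_demo()` should give the same single value from both code paths. Neither API cares that there are only three rows; the DataFrame version is simply easier to read when the data already looks like a table.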

You can read data into DataFrames from pretty much any source that Spark can read, and you can write data via DataFrames to any sink that Spark can write to.

Spark data sources

This includes JDBC sources, i.e. SQL databases.
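
For the configuration-table case in the question, something along these lines might work. Treat it as a sketch under assumptions: the host, database, table name, credentials, and the `name`/`value` columns are all hypothetical, the helper names `jdbc_options` and `load_config` are mine, and the Microsoft SQL Server JDBC driver JAR needs to be on Spark's classpath.

```python
# Sketch only: load a small config table from SQL Server into a DataFrame.
# All connection details passed in are hypothetical placeholders.

def jdbc_options(host, database, table, user, password, port=1433):
    """Build the option map for Spark's built-in JDBC data source."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def load_config(spark, options):
    """Read the table via JDBC and return it as a dict of name -> value.

    Assumes the table has columns called `name` and `value`.
    """
    df = spark.read.format("jdbc").options(**options).load()
    # The table is assumed to be small, so collecting to the driver is fine.
    return {row["name"]: row["value"] for row in df.collect()}
```

With an existing `SparkSession` called `spark`, usage would look like `load_config(spark, jdbc_options("dbhost", "appdb", "dbo.app_config", "user", "secret"))`. Once the settings are in a dict on the driver, the rest of the job can use them however it likes.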
 
Monica Shiralkar
Ranch Foreman
Thanks. The reason I had this doubt is that DataFrames are used for structured data, and structured data is more closely associated with relational databases (which handle structured data only) than with NoSQL databases (which deal with both structured and unstructured data). From what I understand, of RDDs and DataFrames, the latter are also used with relational databases, which are associated with limited amounts of data, whereas using RDDs with a limited amount of data from a relational database must be pretty rare.
 