This week's book giveaway is in the Python forum.
We're giving away four copies of High Performance Python for Data Analytics and have Tiago Rodrigues Antao on-line!
See this thread for details.
Win a copy of High Performance Python for Data Analytics this week in the Python forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Paul Clapham
  • Ron McLeod
  • Bear Bibeault
  • Liutauras Vilda
Sheriffs:
  • Jeanne Boyarsky
  • Tim Cooke
  • Junilu Lacar
Saloon Keepers:
  • Tim Moores
  • Tim Holloway
  • Stephan van Hulst
  • Jj Roberts
  • Carey Brown
Bartenders:
  • salvin francis
  • Frits Walraven
  • Piet Souris

When should we use Dataframes instead of Spark SQL and vice versa

 
Ranch Foreman
Posts: 2348
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Spark has RDD(for unstructured data) , and Dataframes and Datasets for structured data. When should we use dataframes instead of Spark SQL and vice versa? Thanks
 
Ranch Hand
Posts: 97
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
There is no performance difference whatsoever. Both methods use exactly the same execution engine and internal data structures. At the end of the day, all boils down to personal preferences.

  • DataFrame queries are much easier to construct programmatically and provide a minimal type safety.


  • Plain SQL queries can be significantly more concise and easier to understand. They are also portable and can be used without any modifications with every supported language. With HiveContext, these can also be used to expose some functionalities which can be inaccessible in other ways (for example UDF without Spark wrappers).
  •  
    Don't get me started about those stupid light bulbs.
    reply
      Bookmark Topic Watch Topic
    • New Topic