
Is Parquet used only as an intermediate format, since it looks more machine-readable than human-readable?

 
Monica Shiralkar
Ranch Foreman
Posts: 2348
12
I read a JSON file into a Spark DataFrame and saved it as a Parquet file so that I could see what it looks like. Below are the JSON file and its Parquet equivalent:

The JSON file:

people.json



The Parquet file:

Inside people.parquet folder :




Although it is readable, the Parquet file looks more machine-readable than human-readable. Is it used only as an intermediate format, and not at the starting or end stage?

thanks
 
Tim Moores
Saloon Keeper
Posts: 6803
162
It's a binary file format, and thus not human readable (not that text file formats always are, but that's a different topic). Many file formats are not meant to be viewed by humans.
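To make the "binary file format" point concrete: a Parquet file starts and ends with the 4-byte marker `PAR1`, with encoded column data and a metadata footer in between. The following is a minimal sketch using only the Python standard library; the function name `looks_like_parquet` is made up for illustration, and it checks only the markers, not whether the file is otherwise valid.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # 4-byte marker found at both ends of a Parquet file


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    Hypothetical helper for illustration: it does not parse the footer
    or validate the file contents in any other way.
    """
    data_path = Path(path)
    # Leading marker + 4-byte footer length + trailing marker = 12 bytes,
    # so anything shorter cannot be a Parquet file.
    if data_path.stat().st_size < 12:
        return False
    with data_path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # whence=2 means "relative to end of file"
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Pointed at one of the part files inside a folder such as people.parquet, this should return True; pointed at a JSON file, it returns False. Everything between the two markers is meant to be decoded by a Parquet reader rather than read in a text editor.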
 
Monica Shiralkar
Ranch Foreman
Posts: 2348
12

Tim Moores wrote:It's a binary file format, and thus not human readable (not that text file formats always are, but that's a different topic). Many file formats are not meant to be viewed by humans.



Thanks. So is it only for the intermediate stage, rather than the initial or final one?
 
Tim Moores
Saloon Keeper
Posts: 6803
162
Not sure what you're asking. It's what the app uses, and it's not meant for human consumption. That's really all that matters.
 
Monica Shiralkar
Ranch Foreman
Posts: 2348
12
Suppose a developer reads JSON/text/CSV, processes the data, and produces the result in JSON/text/CSV format. Since both the input and the output are readable, he or she knows what the input is and what output to expect, and can check whether it is correct. Now suppose the developer reads Parquet instead. Since it is less readable, he or she cannot easily tell what output the program is expected to produce for this data, or verify it. So I thought maybe it is not used as input or output, but at an intermediate stage.
 
Ron McLeod
Marshal
Posts: 3355
492

Tim Moores wrote:Many file formats are not meant to be viewed by humans.


This includes JSON/CSV/XML as well.  These are representations for transport and storage, not for interaction with application users.  Just because the format may be readable by humans doesn't mean that they are intended to be read by humans.

I can't think of any applications that I use (other than ones specifically for software development) which use JSON as an input or output.
 
Monica Shiralkar
Ranch Foreman
Posts: 2348
12
For example, if a developer is working on a REST API that takes JSON input, he can easily see the input, know the expected output, and check whether they match. With Parquet it would still be possible to have a look; it is just a little less readable than a JSON/XML/CSV/text file.
 
Tim Moores
Saloon Keeper
Posts: 6803
162
  • Likes 1
As I understand it, Parquet is for storage, not interchange. So the point about comparing inputs and outputs is moot. (I might be wrong about this.)

Ron McLeod wrote:Just because the format may be readable by humans doesn't mean that they are intended to be read by humans.


Quoted for emphasis. Much harm was done by people assuming that XML was meant to be created or maintained by humans. Much more flexible and easy to use alternatives existed, but we had to live with Struts config files and the like for years. :-(
 
Monica Shiralkar
Ranch Foreman
Posts: 2348
12

Tim Moores wrote:As I understand it, Parquet is for storage, not interchange.



Does that mean it is not for reading as input to a program? What does it mean that Parquet is for storage? I mean, storage of data is normally done on a filesystem (or in a database).

I read further that Parquet is faster than Avro when we are dealing with a lot of columns.
 
Tim Moores
Saloon Keeper
Posts: 6803
162
Of course it's for reading, but by software, not by humans. Databases store their data in files.
 
Monica Shiralkar
Ranch Foreman
Posts: 2348
12
Parquet is faster than Avro/JSON if the processing involves many columns.
For data processing (e.g. using Spark), is Parquet meant for the same kinds of use cases that JSON/Avro have been used for (e.g. as input to Spark), or not?
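One way to picture the "many columns" point: Parquet is a columnar format, while JSON and Avro are row-oriented. The sketch below is plain Python, not real Parquet, and the records and function names are invented for illustration. It only shows the idea that a column layout lets a reader touch the one column it needs, whereas a row layout makes it walk every full record. It assumes every record has the same keys.

```python
# Hypothetical records, invented for illustration (not the poster's data)
rows = [
    {"name": "Ann", "age": 30, "city": "Pune"},
    {"name": "Bob", "age": 45, "city": "Oslo"},
    {"name": "Cy", "age": 27, "city": "Lima"},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def average_age_from_rows(records):
    # Row layout: each full record is visited just to reach one field
    return sum(record["age"] for record in records) / len(records)


def average_age_from_columns(columns):
    # Column layout: only the "age" values are read; other columns are skipped
    ages = columns["age"]
    return sum(ages) / len(ages)


columns = to_columns(rows)
print(average_age_from_rows(rows))        # same answer either way
print(average_age_from_columns(columns))
```

Both functions return the same average; the difference is how much data has to be read to get there. With three columns it does not matter, but with hundreds of columns and a query that needs only a few, skipping the rest is where a columnar layout can pay off.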
 