Suppose a developer reads JSON/text/CSV, processes the data, and produces a result in JSON/text/CSV format. Since both the input and the output are human-readable, he or she can look at the input, know what output to expect, and check whether the program produced it correctly. Now suppose the developer is reading Parquet instead. Since it is far less readable, it is not easy to see what output the program should produce for a given input, or to verify the result by eye. So I thought maybe Parquet is not used for input or output, but only at an intermediate stage.
Tim Moores wrote:Many file formats are not meant to be viewed by humans.
This includes JSON/CSV/XML as well. These are representations for transport and storage, not for interaction with application users. Just because a format may be readable by humans doesn't mean it is intended to be read by humans.
I can't think of any applications that I use (other than ones specifically for software development) which use JSON as an input or output.
For example, if a developer is working on a REST API that takes JSON input, he can easily see the input, know the expected output, and check whether they match. With Parquet it would still be possible to have a look; it is just a little less readable than a JSON/XML/CSV/text file.
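As a toy illustration of that point (the endpoint and field names here are made up, not from any real API): with a readable format like JSON, the developer can write the expected output by hand and compare it directly.

```python
import json

# Hypothetical request body for a REST endpoint that summarizes orders.
request_body = '{"orders": [{"id": 1, "total": 10.5}, {"id": 2, "total": 4.5}]}'

def handle(body: str) -> str:
    """Toy handler: counts orders and sums their totals."""
    orders = json.loads(body)["orders"]
    result = {"count": len(orders), "grand_total": sum(o["total"] for o in orders)}
    return json.dumps(result)

response_body = handle(request_body)

# Both sides are plain text, so the expected output can be written by hand
# and checked against the actual response.
assert json.loads(response_body) == {"count": 2, "grand_total": 15.0}
```

A Parquet file can still be inspected the same way, but it takes a tool (for example pandas' `read_parquet`, or a CLI such as parquet-tools) rather than a text editor.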
As I understand it, Parquet is for storage, not interchange. So the point about comparing inputs and outputs is moot. (I might be wrong about this.)
Ron McLeod wrote:Just because the format may be readable by humans doesn't mean that they are intended to be read by humans.
Quoted for emphasis. Much harm was done by people assuming that XML was meant to be created or maintained by humans. Much more flexible and easy to use alternatives existed, but we had to live with Struts config files and the like for years. :-(
Parquet can be much faster than Avro/JSON when the data has many columns but the processing only needs a few of them: its columnar layout lets the reader skip the columns it doesn't need.
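A toy sketch of that idea in plain Python (this is not Parquet itself, just the row-vs-column layout it is based on): row-oriented storage, as JSON/Avro effectively use, forces the reader to decode every record in full, while column-oriented storage lets it pull just the columns a query needs.

```python
# Row-oriented: one record per row, roughly how JSON lines / Avro lay out data.
rows = [
    {"id": 1, "name": "a", "price": 10.0, "qty": 2},
    {"id": 2, "name": "b", "price": 3.0,  "qty": 5},
]

# Column-oriented: one list per column, roughly how Parquet lays out data.
columns = {
    "id":    [1, 2],
    "name":  ["a", "b"],
    "price": [10.0, 3.0],
    "qty":   [2, 5],
}

# Summing prices from row storage touches every field of every record.
total_rows = sum(r["price"] for r in rows)

# Summing prices from columnar storage reads only the "price" column;
# "id", "name", and "qty" never need to be decoded at all.
total_cols = sum(columns["price"])

assert total_rows == total_cols == 13.0
```

The same effect is why a Spark query like `SELECT price FROM t` over Parquet can avoid reading most of the file, whereas over JSON it must parse every record.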
For data processing (e.g. using Spark), is Parquet meant for the same kinds of use cases that JSON/Avro have been used for (e.g. as input to Spark), or not?