Is this kind of data structured or unstructured/semistructured?

 
Monica Shiralkar
Ranch Foreman
Posts: 2339
If my program can receive JSONs like the ones below, does the data come under structured or unstructured/semi-structured?

JSON1


JSON2



JSON3


JSON4



JSON5


Thanks
 
Ron McLeod
Marshal
Posts: 3348

Monica Shiralkar wrote: If my program can receive JSONs like the ones below, does the data come under structured or unstructured/semi-structured?


What would your program do with the data?
 
Monica Shiralkar
Ranch Foreman
Posts: 2339
Essentially, it filters the data using Spark and sends an email for the filtered results. For example, there will be a filter condition such as temperature > 60; matching records are selected and an email is sent for them.
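As a rough illustration of the pipeline described above, outside Spark and in plain Python: the sketch below parses hypothetical JSON records, keeps those whose `temperature` exceeds 60, and builds one alert email with the standard library's `email.message.EmailMessage` without sending it. The record layout, the `sensor` field and the recipient address are assumptions invented for illustration, not the original JSONs. In Spark the filtering step would normally be a DataFrame expression along the lines of `df.filter(df.temperature > 60)`, assuming a DataFrame `df` with a `temperature` column.

```python
import json
from email.message import EmailMessage

# Hypothetical input: one JSON document per string. The field names
# ("sensor", "temperature", "attributes") are assumptions for illustration.
RAW_RECORDS = [
    '{"sensor": "s1", "temperature": 72}',
    '{"sensor": "s2", "temperature": 55}',
    '{"sensor": "s3", "temperature": 61, "attributes": {"unit": "F"}}',
]

THRESHOLD = 60


def filter_hot(raw_lines, threshold=THRESHOLD):
    """Parse each JSON string and keep records whose temperature exceeds threshold."""
    hot = []
    for line in raw_lines:
        record = json.loads(line)
        # Records without a temperature field are skipped rather than failing.
        temp = record.get("temperature")
        if temp is not None and temp > threshold:
            hot.append(record)
    return hot


def build_alert(records, threshold=THRESHOLD, to_addr="ops@example.com"):
    """Build (but do not send) one email summarising the matching records."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(records)} reading(s) above {threshold}"
    msg["To"] = to_addr  # hypothetical recipient
    msg.set_content("\n".join(json.dumps(r) for r in records))
    return msg


if __name__ == "__main__":
    hot = filter_hot(RAW_RECORDS)
    alert = build_alert(hot)
    print(alert["Subject"])  # 2 reading(s) above 60
    print(alert.get_content())
```

Actually delivering the message would need something like `smtplib.SMTP(host).send_message(alert)`, which is deliberately left out of the sketch.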
 
Paul Clapham
Marshal
Posts: 26282
To me those all looked pretty structured. At least, there's a structure containing the data. But then you used the word "semistructured", which suggested you were using the words "structured" and so on with specific technical meanings, rather than as just ordinary English words.

And now you say it's in the Spark context. So I searched the web, and I found this page: "Spark Unstructured vs semi-structured vs Structured data". It expresses some opinions that are directly related to your question.

 
Ron McLeod
Marshal
Posts: 3348

Monica Shiralkar wrote: Essentially, it filters the data using Spark and sends an email for the filtered results. For example, there will be a filter condition such as temperature > 60; matching records are selected and an email is sent for them.


Then I would say for your application that the data is considered structured, since it sounds like your application knows the data's schema/organization and is able to parse it and recognize the various fields such as temperature.

If however the application had no intimate knowledge of the data and opaquely included it in the notification email, then it could be unstructured data for your application.

I guess if your application only understood a portion of the data, then you might say it is semi-structured data??
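A minimal plain-Python sketch of the three cases described above, assuming a single hypothetical JSON record (the field names are invented, not taken from the original JSONs): the same text is either passed through opaquely, parsed against a fully known layout, or parsed with only one field interpreted and the rest carried along untouched.

```python
import json

# Hypothetical record; the field names are assumptions for illustration.
RAW = '{"sensor": "s1", "temperature": 72, "attributes": {"unit": "F", "site": "north"}}'


def as_unstructured(raw: str) -> str:
    # No knowledge of the content: the text is pasted into the email body as-is.
    return f"Received data:\n{raw}"


def as_structured(raw: str) -> tuple:
    # Full knowledge of the layout: every field is extracted by name.
    rec = json.loads(raw)
    attrs = rec["attributes"]
    return rec["sensor"], rec["temperature"], attrs["unit"], attrs["site"]


def as_semi_structured(raw: str) -> tuple:
    # Partial knowledge: only "temperature" is interpreted; everything else
    # is kept as an opaque dictionary.
    rec = json.loads(raw)
    temperature = rec.pop("temperature")
    return temperature, rec


if __name__ == "__main__":
    print(as_unstructured(RAW))
    print(as_structured(RAW))
    print(as_semi_structured(RAW))
```

Which function you would write depends on what the application needs to do with the data, which is the point of the reply above: the label follows from the application's use, not only from the bytes.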
 
Ron McLeod
Marshal
Posts: 3348
I just read Paul's comments. It sounds like, in the context of Spark, there are specific meanings for structured vs. semi-structured vs. unstructured. My comments are more general.

 
Christopher Webster
Ranch Hand
Posts: 32
Think about your data:

  • It is JSON, so it has a structure (fields), but the structure may vary between different records.
  • In your example, it looks like there is a common structure, with optional fields in the attributes list, so you could define a schema to describe this.
  • You want to check specific fields, so you are treating it as structured data.
  • If you simply wanted to store the JSON as a CLOB, then for your purposes it would be unstructured data.


I really wouldn't get hung up on these abstract "is it A or B?" questions. Think about what you want to do, then proceed on that basis.
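Christopher's second point, that a common structure with optional attributes can be described by a schema, could look like the following plain-Python check. The required fields (`sensor`, `temperature`) and the optional attribute keys (`unit`, `site`) are hypothetical. In Spark itself such a schema would more typically be declared as a `StructType` and handed to the JSON reader, so this is only a sketch of the idea, not of Spark's API.

```python
# Hypothetical schema: required top-level fields with their expected types,
# plus attribute keys that may or may not be present in "attributes".
REQUIRED = {"sensor": str, "temperature": (int, float)}
OPTIONAL_ATTRIBUTES = {"unit": str, "site": str}


def conforms(record: dict) -> bool:
    """Return True if the record matches the hypothetical schema."""
    # Every required field must be present with the expected type.
    for name, expected in REQUIRED.items():
        if not isinstance(record.get(name), expected):
            return False
    # The attributes object is optional; when present it must be a dict.
    attrs = record.get("attributes", {})
    if not isinstance(attrs, dict):
        return False
    for key, value in attrs.items():
        expected = OPTIONAL_ATTRIBUTES.get(key)
        # Unknown attribute keys, or known keys with the wrong type, are rejected.
        if expected is None or not isinstance(value, expected):
            return False
    return True


if __name__ == "__main__":
    print(conforms({"sensor": "s1", "temperature": 72}))
    print(conforms({"sensor": "s2", "attributes": {"unit": "F"}}))
```

Rejecting unknown attribute keys is a deliberately strict choice; a more permissive variant would simply ignore them.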
     
Monica Shiralkar
Ranch Foreman
Posts: 2339
Thanks all.
     
Monica Shiralkar
Ranch Foreman
Posts: 2339

Christopher Webster wrote:
I really wouldn't get hung up on these abstract "is it A or B?" questions. Think about what you want to do, then proceed on that basis.



Yes. Actually, I picked this example because, when I started reading about Datasets/DataFrames and comparing them with RDDs, I read that one should use the former wherever possible for better performance and use the latter only for unstructured data. So I am basically trying to understand for what kind of data one should go for an RDD. It looks like this data too can be handled with Datasets/DataFrames, since it has a structure. So in what kind of case would one use an RDD instead?
     
Monica Shiralkar
Ranch Foreman
Posts: 2339

Paul Clapham wrote: To me those all looked pretty structured. At least, there's a structure containing the data. But then you used the word "semistructured", which suggested you were using the words "structured" and so on with specific technical meanings, rather than as just ordinary English words.

And now you say it's in the Spark context. So I searched the web, and I found this page: "Spark Unstructured vs semi-structured vs Structured data". It expresses some opinions that are directly related to your question.



Thanks. That's useful for me in understanding what I am specifically looking for, i.e. "when to use an RDD?".
     
Monica Shiralkar
Ranch Foreman
Posts: 2339

Ron McLeod wrote:

If however the application had no intimate knowledge of the data and opaquely included it in the notification email, then it could be unstructured data for your application.

I guess if your application only understood a portion of the data, then you might say it is semi-structured data??



Thanks. In general, this makes sense regarding structured, semi-structured and unstructured data. In that case, in the big data world, most cases would fall under the semi-structured category.
     
Monica Shiralkar
Ranch Foreman
Posts: 2339

Paul Clapham wrote: And I found this page: "Spark Unstructured vs semi-structured vs Structured data". It expresses some opinions that are directly related to your question.



Thanks. This means my data is semi-structured. For structured data one has to use Datasets/DataFrames, and for any data that is not structured, an RDD is to be used instead. What I am trying to understand is why exactly an RDD would be required in this example instead of DataFrames/Datasets.
     
Monica Shiralkar
Ranch Foreman
Posts: 2339
Thanks all. I read further on this and found that although DataFrames/Datasets from the Spark SQL API are generally preferred, one would still use RDDs when working with the low-level API.
     