
DB design for multidimensional data

 
Dmitry Gangan
Greenhorn
There is a need to store RF spectrum data in a DB for further processing.
Data is going to be collected from multiple devices (up to 10 in total), and each sample is represented as a WaveForm object.
Up to ~10 objects will be collected from each device every 5 seconds.

That object basically represents the RF spectrum waveform points (x: frequency, y: power) plus some additional data such as: time, device id, device settings.



In general, the main need is to store these WaveForm objects in the DB.
 
Later that data will be used to:
- Draw dynamic graphs of each spectrum over time (an example is in the picture below)
- Continuously process the data to draw graphs of MAX/MIN power values at each frequency for a specified period of time
- Continuously process the data and generate reports/alerts based on it
- Store only the last 30 days of data in the DB

 
Please see the screenshot below with a sample of such a graph generated from the received data:


Currently I'm using PostgreSQL to store the data as JSON and retrieve/graph/process it later using Grafana/Plotly.js.



But I doubt this is the proper way to handle/store such data, as the JSON later has to be parsed to retrieve the data, and I'm not sure how that will impact performance at large scale.
It also makes retrieving/filtering/processing the data with SQL harder.

Questions:
- What do you think is the best DB to store such data in (for the purposes highlighted above)?
- (following from the question above) What is the best way to store/retrieve that data?

I understand that the question might be too general, but I'm not very experienced in this field, and any suggestions/thoughts would be welcome.

Thank you in advance
 
Tim Holloway
Saloon Keeper
I think you'll find PostgreSQL to be about as fast as any modern database, so I wouldn't worry about that part.

The only thing I'd do differently is put the wf_data in a flat table of its own for easier access. You'd then link the data header table to rows in the wf_data table (a foreign key relationship).

The primary advantage of JSON is that you can have a flexible format of hierarchical data. But if all of the data is in identical rows, you don't need that.
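
A minimal sketch of that header/detail split, with hypothetical table and column names (one wf_header row per captured waveform, plus one flat wf_data row per point, linked by a foreign key):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateSchema {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/spectrum", "user", "pass");
             Statement st = conn.createStatement()) {
            // One header row per captured waveform: which device, when, what settings.
            st.execute("CREATE TABLE IF NOT EXISTS wf_header ("
                     + " id BIGSERIAL PRIMARY KEY,"
                     + " device_id INT NOT NULL,"
                     + " captured_at TIMESTAMPTZ NOT NULL,"
                     + " settings_id INT)");
            // One flat row per (frequency, power) point, linked back to its header.
            st.execute("CREATE TABLE IF NOT EXISTS wf_data ("
                     + " header_id BIGINT NOT NULL REFERENCES wf_header(id),"
                     + " frequency_hz DOUBLE PRECISION NOT NULL,"
                     + " power_dbm REAL NOT NULL)");
        }
    }
}
```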
 
Les Morgan
Rancher
Dmitry Gangan,

I tend to agree with Tim Holloway, in that your record format is fixed and the ~10 different sources should be easy to deal with, each with its own unique ID.

The thing you didn't address is the sample rate: are you sampling VLF, LF, HF, VHF, or UHF? And how many inserts do you need per second? Can your hardware support the amount of data you want to throw at it per second, and can your DB handle it as well, even when the buffers are full (that's both hardware and software buffers)? I deal in large data (moderately large: billions of records and terabytes of data) and I assure you: the dynamics of the system change drastically when the software and hardware buffers are full.

Depending on your answer about the buffers, you may consider some in-memory buffering or even parallel writes to separate physical devices.

Les

 
Dmitry Gangan
Greenhorn

Tim Holloway wrote: I think you'll find PostgreSQL to be about as fast as any modern database... The only thing I'd do differently is put the wf_data in a flat table of its own for easier access.



Thank you for the answer.
Having it in a flat table would give more possibilities to process the data using Postgres stored procedures (PL/V8); for instance, I need to draw a graph of the minimal/maximal signal level for a selected time range (see the sketch below).
Currently that is all handled in JavaScript/Plotly.js; with a big time range it puts quite a load on the browser, and it would be much better to process it on the server side.
Example of such graphic:
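
A minimal sketch of that server-side min/max aggregation, assuming the hypothetical wf_header/wf_data split sketched above (all names, dates, and the connection URL are illustrative):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class MinMaxBand {
    public static void main(String[] args) throws Exception {
        // Min/max power per frequency over a time window, computed in the DB
        // instead of in the browser.
        String sql = "SELECT d.frequency_hz,"
                   + "       MIN(d.power_dbm) AS min_dbm, MAX(d.power_dbm) AS max_dbm"
                   + " FROM wf_data d JOIN wf_header h ON h.id = d.header_id"
                   + " WHERE h.captured_at BETWEEN ? AND ?"
                   + " GROUP BY d.frequency_hz ORDER BY d.frequency_hz";
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/spectrum", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, Timestamp.valueOf("2022-01-01 00:00:00"));
            ps.setTimestamp(2, Timestamp.valueOf("2022-01-02 00:00:00"));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%.0f Hz: min %.1f dBm, max %.1f dBm%n",
                            rs.getDouble(1), rs.getDouble(2), rs.getDouble(3));
                }
            }
        }
    }
}
```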


I was thinking of putting it in a flat table for easy access, but the problem is that the frequencies can always change for the same "client", so they cannot be "hardcoded" as table column names.
For instance, a user carrier may be at "1.00 GHz - 1.05 GHz" today, and the next day it can move to "1.10 - 1.15 GHz", etc.

Now we have the following JSON saved in wf_data:
the waveform consists of x,y points for the graph (frequency and power). As the frequencies may always change, I'm not sure there is a good way to store it in a flat manner...




But we know that the device always returns 320 points (the waveform data), so we could have 320 columns (e.g. p1, p2, p3 ... p320), although each column in a row would then have to store an array: [frequency, power]...
Actually, while answering your post I've just come upon one possible option 🙂:
When data is received from the spectrum analyzer, we just get: the central frequency, the span (RF bandwidth), and 320 power level values, one per point (the frequency of each point is not indicated).
After that, my application calculates the frequency of each point and forms the waveform object/JSON:


So theoretically, I can just write the points to the DB without calculating the frequency of each point, and hand that calculation off to the JavaScript/Plotly.js code that will be drawing the graphs, etc. (a sketch of the calculation follows below).
Meaning that in the DB, the wf_data table will have the following columns: id, spectrumId, time, wfSettingsId, p1, p2, p3 ... p320
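
A minimal sketch of that per-point frequency calculation, assuming the 320 points are spaced evenly from one edge of the span to the other (whether the analyzer reports edge-to-edge or bin-center frequencies is an assumption worth checking against its documentation):

```java
public class FrequencyCalc {
    /** Frequencies for an evenly spaced sweep: center - span/2 ... center + span/2. */
    static double[] pointFrequencies(double centerHz, double spanHz, int points) {
        double start = centerHz - spanHz / 2.0;  // left edge of the sweep
        double step = spanHz / (points - 1);     // spacing between adjacent points
        double[] freqs = new double[points];
        for (int i = 0; i < points; i++) {
            freqs[i] = start + i * step;
        }
        return freqs;
    }

    public static void main(String[] args) {
        // e.g. a 1.025 GHz center with a 50 MHz span, 320-point sweep
        double[] f = pointFrequencies(1.025e9, 50e6, 320);
        System.out.printf("first = %.0f Hz, last = %.0f Hz%n", f[0], f[f.length - 1]);
    }
}
```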

Les Morgan wrote: The thing you didn't address is the sample rate... how many inserts do you need per second? Can your hardware support the amount of data you want to throw at it per second, and can your DB handle it as well?


Thank you for the answer.
I'm sampling each spectrum device every 5 seconds (maybe I will decrease this interval later); each sample returns 320 frequency/power points.
Each device has 6 physical ports to sample, and on each port we may sample multiple carriers, so I suppose on average we will have ~10 sources per device.
We're sampling UHF (L-band); generally we are sampling satellite frequencies (mostly Ku/Ka after down-conversion).
I suppose we may have up to 10 devices, and given that on each we will sample up to 10 carriers, that would mean 100 inserts per 5 seconds (each insert carrying ~20 KB of data).
Currently I'm planning to use PreparedStatement batch inserts per device to write all samples from one device at once (as sketched below), so I'm thinking that at this scale a single Postgres DB server can handle it...
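
A minimal sketch of that per-device batch write, following the flattened one-row-per-waveform layout proposed above (not the per-point split from earlier); the 320 p-columns are collapsed into a single bar-separated text column purely to keep the example short, and all names are hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.List;

public class BatchWriter {
    // Hypothetical carrier sample: monitored carrier id, capture time,
    // settings id, and the 320 power values already joined as "p1|p2|...|p320".
    record Waveform(int spectrumId, Instant time, int settingsId, String points) {}

    static void writeSamples(Connection conn, List<Waveform> samples) throws SQLException {
        String sql = "INSERT INTO wf_data (spectrum_id, captured_at, settings_id, points)"
                   + " VALUES (?, ?, ?, ?)";
        conn.setAutoCommit(false);                 // one transaction per device poll
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Waveform wf : samples) {
                ps.setInt(1, wf.spectrumId());
                ps.setTimestamp(2, Timestamp.from(wf.time()));
                ps.setInt(3, wf.settingsId());
                ps.setString(4, wf.points());
                ps.addBatch();                     // queue the row, don't send yet
            }
            ps.executeBatch();                     // all ~10 rows in one round trip
            conn.commit();
        }
    }
}
```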


If anyone is interested, here is the link to the GitHub repository for this tool: https://github.com/dgangan/java-avcom-spectrum
 
Tim Holloway
Saloon Keeper
If I haven't missed anything, you're talking about twenty ~20 KB samples per second being logged, and I think that should be well within the range of what PostgreSQL can handle directly. Failing that, another option would be to log to simple text and do the actual INSERTs offline; the psql utility should be able to handle that easily. My preference for that sort of stuff is the traditional Unix-style bar-separated values (like so: "13255221|0.0074|21.32"). Other popular options include tab separators or, of course, CSV, although that's slightly more work to produce.
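
If capture goes to a text file like that, the rows can also be streamed in from Java with the PostgreSQL JDBC driver's COPY API rather than psql. A minimal sketch, assuming the hypothetical flattened wf_data table and a tab-delimited capture file (the points field itself stays bar-separated so the two delimiters don't collide):

```java
import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/spectrum", "user", "pass");
             Reader in = new FileReader("wf_capture.txt")) {
            // Each capture line: spectrum_id <TAB> captured_at <TAB> p1|p2|...|p320
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            long rows = copy.copyIn(
                "COPY wf_data (spectrum_id, captured_at, points) FROM STDIN", in);
            System.out.println("loaded " + rows + " rows");
        }
    }
}
```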

If text capture isn't fast enough, the next level down would be raw binary capture. And if that's not fast enough, you have to start looking at the insides of the OS and/or hardware. But I don't think you're going to need to get that desperate.

Rather than keeping a separate table for every frequency range, I'd just have one table for all and use selectors to pull the range (the SQL BETWEEN operator is ideal for that). You can also create views that assign a "table name" to a range.
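
A minimal sketch of both ideas against the hypothetical per-point wf_data table from the first sketch (the 950-2150 MHz limits are just an illustrative L-band window):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RangeViews {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/spectrum", "user", "pass");
             Statement st = conn.createStatement()) {
            // Pull one band out of the single shared table with BETWEEN...
            try (ResultSet rs = st.executeQuery(
                    "SELECT frequency_hz, power_dbm FROM wf_data"
                  + " WHERE frequency_hz BETWEEN 950e6 AND 2150e6")) {
                while (rs.next()) { /* feed the graph */ }
            }
            // ...or give that range a stable "table name" via a view.
            st.execute("CREATE OR REPLACE VIEW wf_l_band AS"
                     + " SELECT * FROM wf_data"
                     + " WHERE frequency_hz BETWEEN 950e6 AND 2150e6");
        }
    }
}
```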

If you're talking truly massive amounts of data, you can slice the data ranges into separate files and even separate distributed servers. But that's just icing on the cake.
 
Dmitry Gangan
Greenhorn
Indeed, it may be a good idea to store the values as text in one column, separated by "|" or CSV-like ",".
I suppose the operation of splitting it in JavaScript is quite cheap, and it also avoids having >320 columns in the DB; as I see, PostgreSQL has functions to split text as well (a sketch follows below):
https://www.postgresql.org/docs/9.3/functions-string.html

The other option I was considering is to store the JSON in JSONB format, which would also allow using the JSON/JSONB functions to process the data: https://www.postgresql.org/docs/9.5/functions-json.html
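
A minimal sketch of that server-side splitting with string_to_array plus unnest ... WITH ORDINALITY (PostgreSQL 9.4+), against the hypothetical one-row-per-waveform layout, so the bar-separated points column becomes numbered numeric rows inside the database:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SplitPoints {
    public static void main(String[] args) throws Exception {
        // Explode "p1|p2|...|p320" into one numbered real value per row.
        String sql = "SELECT p.ord, p.val::real AS power_dbm"
                   + " FROM wf_data w,"
                   + "      unnest(string_to_array(w.points, '|'))"
                   + "      WITH ORDINALITY AS p(val, ord)"
                   + " WHERE w.spectrum_id = ?";
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/spectrum", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, 1);   // illustrative carrier id
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("point %d: %.1f dBm%n",
                            rs.getLong(1), rs.getDouble(2));
                }
            }
        }
    }
}
```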

Thank you for the great advice; I will make something out of it and update the forum with the results.
 
Marshal
Please don't quote the whole of the previous post, which adds nothing to the discussion and is liable to removal.
 