Win a copy of The Little Book of Impediments (e-book only) this week in the Agile and Other Processes forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Schema question for storing spreadsheet data

 
Rj Ewing
Ranch Hand
Posts: 93
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
What would be the best way to store a spreadsheet in a db? If the columns will be fairly static, is it appropriate to just use a table to mimic the spreadsheet with a db column for each spreadsheet column? The can be many different datasets in this table, and some will be overwritten at times. This schema seems a bit strange to me, but I am not sure of a better schema. Thoughts?
 
Paul Clapham
Sheriff
Posts: 21581
33
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
"Best" of course is going to depend on whether your spreadsheets have some internal structure, and on how many features of your spreadsheet software you've taken advantage of. For the general case you could start with three columns, namely "RowNumber", "ColumnNumber", and "Data".

But that's excessively simple, since "Data" is much more complex than that. You might also want "DataType" and you might want columns for the cell formatting -- colour, font information, alignment, borders, fill effects, you could go on for quite a while. And then there's information which doesn't apply to cells, like row heights and column widths, and whether rows and columns are hidden. And then groups of cells can be merged. And... you get the picture, but it depends on whether you care about any of that sort of thing. So as I said, it depends on what information you're actually using from the spreadsheet.
 
Rj Ewing
Ranch Hand
Posts: 93
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The spreadsheet data is very simple. Just text and numbers. We have validation profiles which state the given columns datatype, etc. The only thing I need to store in the db are the column values, which I know of the datatype in advanced. So I'm not so sure it would be necessary to keep track of row and cell numbers. Each row in the spreadsheet is essentially metadata about a biological sample. The spreadsheet data will have been validated (ensuring correct datatypes for each column, etc) before I load to the db.

Maybe loading the spreadsheet into a RDBMS isn't the best option and using something like apache metamodel would be more appropriate?
 
Paul Clapham
Sheriff
Posts: 21581
33
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Let's look at the other side then. Why do you want to store that data in a database, or in some other external form? Why not just leave it in the spreadsheets?

If you stored it in a DB table then conceivably other applications could read the data from that table, which is a great deal easier than extracting it from the spreadsheets. So if that's one of your goals then perhaps you should drive the design based on the needs of those other applications.
 
Rj Ewing
Ranch Hand
Posts: 93
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Yes, the uploaded datasets will be queryable. Majority of the time, the data is just regurgitated, but there are times when the user will query based on one or multiple column values, returning the entire row(s) that.

Other times, users will want to download the entire dataset as csv or excel files.

We currently have a fuseki triple store backend, but another client is requesting for their instance to mysql.

Thanks for helping me think through this.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Detail is important. Databases store data, not spreadsheets. Unless the spreadsheet itself is the datum to be stored.

For example: If the spreadsheet was a receipt-form to be stored in the Order table alongside the rest of the order, it would be a BLOB. The database would not know what is in the spreadsheet; it just knows it is there.
Conversely: If the data in the spreadsheet is important, and its being a spreadsheet a mere medium to present the data, the data would be stored in the database's columns and rows instead of (and akin to) the spreadsheet's columns and rows.
Similarly: If the database were to assume the role of the spreadsheet, it would likely require normalization, supporting lookups, and views to bring it all together.
Though: As spreadsheets often support saving as XML, and databases often support XML querying, it might make sense to store the XML in an XML column, and be queryable via an XMLQuery.

So many choices, so much fun.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic