I'm working on a network application that reads web data, parses it and processes it with conditional logic. Now, I'm wanting to store the data so I can run analysis on historic data at some point in the future. The book I have isn't very telling and I'm not able to find conclusive support for a particular method online. What I'm considering now is storing these values in a tab-delimited text file since I don't have any local database on my PC. Ultimately, though, I would like to be able to use Java to take the data directly from the text file (if this is where I should store it for now) and insert it into a table in SQL Server so I can query it. However, since I'm still far off from using SQL Server, is storing in a text file the best practice? Also, would it be impractical to take the data from the text file and insert it into a database table someday? Please share your thoughts. Thanks in advance!
As far as transferring the data from a text file to a database table, that should be pretty easy. Read a line from the text file, extract the data from the line, insert the data into the database table, repeat until no more lines.
But as for your design questions: As usual the design would depend on several things. For example, how much data do you have? If your text file ends up being several megabytes in size then scanning through it to find one particular row out of one million is not going to be practical. And you said it was historic data, which sounds like that means the data is not going to change. But is that a correct assumption? If the data changes frequently then you'd have to rebuild the text file frequently too. But if it changes, say, annually, then that probably wouldn't be a big deal.
You could do worse than to store data in a TSV file. Just remember 2 things:
1. When you create the file make the path absolute, don't use a relative directory path. The concept of "current directory" isn't valid for a webapp.
2. Absolutely do not store the file inside the WAR or the webapp server directories. Storing in the webapp server directory tree pollutes the product. Storing inside the WAR can fail if the WAR hasn't been exploded and if it doesn't the entire file may get deleted if the WAR is updated.
Other options. If your data is hierarchical instead of flat, consider using YAML format. It's easy for Java to parse YAML.
Linux people have an advantage. The SQLite DBMS is almost guaranteed to be installed on a Linux box because a lot of system utilities use it for lightweight storage needs. There is a free version of Microsoft SQL Server, although it's not that lightweight. Alternatively you could install something like Apache Derby for testing purposes.
An IDE is no substitute for an Intelligent Developer.
posted 3 months ago
Gentlemen, thank you for your advice and recommendations – it’s all very useful to me. I like the idea of using Apache Derby a lot. I’ll probably work on building some data models using that and then move them to a local DB when the data becomes large.
Currently, I have about 25 rows of completely flat integer and double data, three columns wide. The data will never change, but will be appended to. I haven’t decided for how long the data will be stored, but probably for about a year. However, I don’t think I would collect more than maybe 100k rows in this timeframe.
Also, I did decide to separate the values with commas; my reasoning is that I’ll more confident about the consistency of my data if I’m parsing at commas instead of blanks.