
non-printable delimiter

 
Dave Robbins
Greenhorn
Posts: 7
Hello All,
Strictly speaking I guess this isn't really a Java question, but I'm writing Java at the moment and you folks know everything, so here goes.

I've got an applet that calls an ASP page back on the server it came from, which runs some database queries and returns the data to the applet. (I know, the whole problem is the ASP, but it wasn't my decision.) The ASP packs the data as a comma-delimited string, but now some of the data fields contain commas and that throws everything off. In the applet I use the String split() function to parse the string. Surely this is a common problem; how do people get around it? I was thinking of trying to use some non-printable character as a delimiter, but I'm not sure how.

Advice
Dave
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
My favorite trick is to let the routine that concatenates a delimited string pick the delimiter. It can scan all the values that will go into the string and find some character that is not in any of the values. Then it puts the delimiter as the FIRST character. Now the parser can take the first character off and use it as the delimiter. Here's a method that demonstrates:
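A minimal sketch of the self-describing delimiter idea described above. This is not Stan's original listing; the class name `SelfDelimited`, the candidate list and the method names are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class SelfDelimited {
    // Candidate delimiters, tried in order. The list is an assumption of this sketch;
    // the last entry is the ASCII unit separator (31).
    private static final char[] CANDIDATES = {',', ';', '|', '\t', '~', '^', '\u001F'};

    // Packs the values into one string whose first character is the chosen delimiter.
    static String join(List<String> values) {
        for (char candidate : CANDIDATES) {
            boolean unused = true;
            for (String value : values) {
                if (value.indexOf(candidate) >= 0) {
                    unused = false;
                    break;
                }
            }
            if (unused) {
                return candidate + String.join(String.valueOf(candidate), values);
            }
        }
        throw new IllegalArgumentException("every candidate delimiter occurs in the data");
    }

    // Reads the delimiter from the first character and splits the rest on it.
    static List<String> split(String packed) {
        if (packed.isEmpty()) {
            throw new IllegalArgumentException("missing delimiter character");
        }
        String delimiter = packed.substring(0, 1);
        // Pattern.quote keeps '|' or '^' literal; limit -1 keeps trailing empty fields.
        return Arrays.asList(packed.substring(1).split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        String packed = join(Arrays.asList("Smith, John", "42", "a;b"));
        System.out.println(packed);        // |Smith, John|42|a;b
        System.out.println(split(packed)); // [Smith, John, 42, a;b]
    }
}
```

In the setup from the question, the ASP side would have to do the `join` part and only `split` would live in the applet. Note that an empty list and a list holding one empty string pack to the same string in this sketch.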
 
Stefan Wagner
Ranch Hand
Posts: 1923
Linux Postgres Database Scala
Seems to be a solid solution, as long as nobody stores the whole ASCII table in the fields, or big binary data.

Another solution you often see is to enclose every value in quotes:
row = "1","Peter","17.34","Smith"
That gives you "," (quote, comma, quote) as the separator between values, plus special treatment for the first and last value, which carry a leading or trailing quote.

Sometimes TAB is a helpful delimiter, especially if users have to enter the data via forms, where Tab moves them to the next field and so is unlikely to end up inside a value.

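ASCII also defines separator characters from former times: I can find RS = (char) 30 (record separator) and GS = (char) 29 (group separator). The other two are FS = (char) 28, the file separator, and US = (char) 31, the unit separator, which is the one meant for separating fields.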

http://www.math.grin.edu/~stone/courses/fundamentals/characters.html
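As a rough sketch of how the ASCII separators could be used from the applet side: fields are split on the unit separator (31) and rows on the record separator (30). The class name `AsciiSeparators` and the payload layout are assumptions; the server would have to emit the same two characters.

```java
import java.util.ArrayList;
import java.util.List;

final class AsciiSeparators {
    static final char UNIT_SEPARATOR = '\u001F';   // ASCII 31, between fields
    static final char RECORD_SEPARATOR = '\u001E'; // ASCII 30, between rows

    // Splits a payload into rows, and each row into its fields.
    static List<String[]> parse(String payload) {
        List<String[]> rows = new ArrayList<>();
        if (payload.isEmpty()) {
            return rows;
        }
        // Neither character is a regex metacharacter, so split() treats them literally.
        // The limit -1 keeps empty trailing fields instead of silently dropping them.
        for (String row : payload.split(String.valueOf(RECORD_SEPARATOR), -1)) {
            rows.add(row.split(String.valueOf(UNIT_SEPARATOR), -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        String payload = "1" + UNIT_SEPARATOR + "Smith, John"
                + RECORD_SEPARATOR + "2" + UNIT_SEPARATOR + "Doe";
        for (String[] fields : parse(payload)) {
            System.out.println(String.join(" / ", fields));
        }
    }
}
```

Commas, semicolons and tabs in the data pass through untouched; the only values that would break this are ones that themselves contain character 30 or 31.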
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
I've really only used comma-separated values from MS products, such as Excel's "save as CSV". They put quotes around strings and no quotes around numbers. Quotes around everything, as you showed, is even easier. The problem is always quotes inside quoted strings, so you get into some escape character syntax. I've always hated escape characters. Didn't need 'em in assembler or COBOL or REXX and don't want to get involved with 'em now. Of course there's really no choice, but I still avoid them when I can. Hence the extra effort in the delimited string thing.

I've mostly used this delimited trick with business data, but the risk of somebody making a string with every candidate delimiter is non-zero. Hey, if the String encoding stays Unicode there are thousands of unprintable characters you could use, aren't there? BTW: You can nest these delimited strings, too. That's kinda cool sometimes.
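For comparison, here is a small sketch of parsing one line in the Excel-style convention, where a quote inside a quoted string is written as two quotes. The class name `QuotedCsvLine` is hypothetical, and the sketch assumes a single line with no embedded line breaks.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedCsvLine {
    // Splits one CSV line on commas, honouring double-quoted fields.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);   // commas are ordinary text in here
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote stands for one literal quote
                    i++;
                } else {
                    inQuotes = false;    // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());  // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Smith, John\",\"He said \"\"hi\"\"\",17.34";
        System.out.println(parse(line)); // [1, Smith, John, He said "hi", 17.34]
    }
}
```

This is the escape syntax that String.split() alone cannot cope with, which is part of the appeal of the delimiter-picking trick.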
 
Stefan Wagner
Ranch Hand
Posts: 1923
Linux Postgres Database Scala
Yes, "," (quote, comma, quote) will be very unusual in user input, but not impossible.
Think of a CSV file made from forum postings.

Databases often allow csv-export or -import.
Mostly separated by ';'.

The usability depends on the circumstances - perhaps we should google for 'evil csv hacks'

Unprintable characters have the drawback that you can't easily edit such files or create them from scratch.

Avoiding escape sequences is a good rule of thumb.
 
Lu Battist
Ranch Hand
Posts: 104
Here's an idea from MIME email attachment encodings. Use a delimiter of a randomly generated string of a fixed size.
Something like:
--axfjs30098sljc8900s--

The longer you make it, the less likely anything will clash with it, even in binary data. I think it's overkill for normal database text, but it's safe.
Personally, if I can't use tab or quoted comma delimiters ("text","moretext") then I like to use a double pipe (text||moretext||etc).
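A minimal sketch of the random boundary idea, assuming the boundary has a fixed length and is sent as the first characters of the packed string so the reader knows what to split on. The class name `BoundaryDelimited`, the length of 16 and the alphabet are all assumptions.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class BoundaryDelimited {
    private static final int BOUNDARY_LENGTH = 16; // assumed fixed size
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    private static String newBoundary() {
        StringBuilder sb = new StringBuilder(BOUNDARY_LENGTH);
        for (int i = 0; i < BOUNDARY_LENGTH; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Packs the values as: boundary, value, boundary, value, ...
    static String join(List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("need at least one value");
        }
        while (true) {
            String boundary = newBoundary();
            String packed = boundary + String.join(boundary, values);
            // Retry in the unlikely case that the boundary clashes with the data.
            if (split(packed).equals(values)) {
                return packed;
            }
        }
    }

    // Reads the boundary from the first BOUNDARY_LENGTH characters, then splits on it.
    static List<String> split(String packed) {
        String boundary = packed.substring(0, BOUNDARY_LENGTH);
        String body = packed.substring(BOUNDARY_LENGTH);
        return Arrays.asList(body.split(Pattern.quote(boundary), -1));
    }

    public static void main(String[] args) {
        String packed = join(Arrays.asList("text", "more||text", "a,b"));
        System.out.println(split(packed)); // [text, more||text, a,b]
    }
}
```

The round-trip check inside `join` is a design choice of this sketch: it also catches the case where the end of one value plus the start of the boundary happens to look like the boundary.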
 