
NX: reading the Datafile

 
lydie prevost
Ranch Hand
Posts: 32
Hello,
I am confused about how I am supposed to read the data file.
Here is the text from my instructions:
All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.

Right now I can read everything this way, but I don't use any encoding.
For the header I read it this way:

Then for the records I use this:


But nowhere did I use this 8 bit US-ASCII.
I don't know which character encoding it is.
All I found is:

US-ASCII Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1 ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8 Eight-bit UCS Transformation Format
UTF-16BE Sixteen-bit UCS Transformation Format, big-endian byte order
UTF-16LE Sixteen-bit UCS Transformation Format, little-endian byte order
UTF-16 Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark


So here are my questions:
1) Which one is it? Is it US-ASCII (7 bits), ISO-8859-1, or UTF-8?
2) When I read the schema, which is a mix of int and char for the field name and field length: if I use an encoding, should I also read the integers as chars and then convert them?
3) In that case I cannot use BufferedReader and I have to read the chars one by one?
4) They also say DataInputStream for the header, and the DataInputStream Javadoc says:
A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream.

which means that it is machine dependent....
I am all confused because the header, the schema and the records are all written in the same file, with the same encoding I suppose....

Can you help me understand this point?
Thank you
- Lydie
 
Philippe Maquet
Bartender
Posts: 1872
Bonsoir Lydie,
Welcome to JavaRanch and this forum!
1) Which one is it? Is it US-ASCII (7 bits), ISO-8859-1, or UTF-8?

There is no real difference between US-ASCII (7 bits) and the "8 bit US-ASCII" stated in the instructions.
Think of the fact that a byte (the smallest "normally" manageable memory unit) uses 8 bits, hence 7 bits use a full byte anyway. In English, one bit is useless (and unused), just because English writers don't know the joy of using the weird accented letters and other funny characters we use in French and other European languages, which are all coded using the 8th bit. So, at the binary level, US-ASCII and ISO-8859-1 are compatible: any pure US-ASCII text has exactly the same bytes in both.
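To make that compatibility concrete, here is a minimal sketch (not part of the original posts). The class name CharsetCompat and the sample strings are made up for illustration, and it assumes a modern JDK rather than the one in use when this thread was written. It encodes the same text under both charset names and compares the bytes.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class CharsetCompat {
    /** Returns true if the text encodes to identical bytes in US-ASCII and ISO-8859-1. */
    static boolean sameBytes(String text) {
        try {
            return Arrays.equals(text.getBytes("US-ASCII"), text.getBytes("ISO-8859-1"));
        } catch (UnsupportedEncodingException e) {
            // Both charsets are required to exist on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Plain English text: every character fits in 7 bits.
        System.out.println(sameBytes("Hotel"));      // prints "true"
        // An accented letter needs the 8th bit; US-ASCII replaces it with '?'.
        System.out.println(sameBytes("H\u00f4tel")); // prints "false"
    }
}
```

As long as the data file only holds 7-bit characters, either charset name gives the same result.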
2) When I read the schema, which is a mix of int and char for the field name and field length: if I use an encoding, should I also read the integers as chars and then convert them?

The provided file is a binary one, so you should read the expected primitives as such (readShort(), readInt(), ...) and text data as bytes (byte[]). NIO (and its Charset class) is forbidden in the latest versions of the instructions, but you don't need it: you can convert a byte[] to a String using the String constructor that accepts a charset name as its second parameter. To convert a String to a byte[], String.getBytes(String charsetName) does the job too.
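A rough sketch of that approach follows (not part of the original posts). The byte layout used here (a 4-byte int, a 2-byte short, then one fixed-width text field) is purely hypothetical and is not the layout of the assignment's data file. The class and method names are invented for the example, and it assumes a modern JDK for try-with-resources and UncheckedIOException.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;

class FieldReader {
    /** Decodes a fixed-width field: the bytes before the first null (or the whole field). */
    static String decodeField(byte[] raw) {
        int len = 0;
        while (len < raw.length && raw[len] != 0) {
            len++;
        }
        try {
            return new String(raw, 0, len, "US-ASCII");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // US-ASCII exists on every JVM
        }
    }

    /** Encodes text into exactly 'width' bytes, null padded (longer text is truncated). */
    static byte[] encodeField(String text, int width) {
        try {
            byte[] src = text.getBytes("US-ASCII");
            byte[] out = new byte[width]; // zero-filled by default
            System.arraycopy(src, 0, out, 0, Math.min(src.length, width));
            return out;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the hypothetical layout: int, short, then one text field of fieldWidth bytes. */
    static String readSample(byte[] data, int fieldWidth) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int magic = in.readInt();          // primitives are read as primitives
            short fieldCount = in.readShort();
            byte[] raw = new byte[fieldWidth];
            in.readFully(raw);                 // text is read as raw bytes
            return magic + ":" + fieldCount + ":" + decodeField(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0, 0, 1, 1, 0, 2, 'H', 'i', 0, 0};
        System.out.println(readSample(data, 4)); // prints "257:2:Hi"
    }
}
```

Only the text bytes ever go through a charset; the numbers are never treated as characters.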
3) In that case I cannot use BufferedReader and I have to read the chars one by one?

I'd avoid using any Reader (meant for reading *text*) with a binary file, and anyway you don't need one (see 2)).
A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way.
(...)
which means that it is machine dependent....

What do you mean exactly?
What I can tell you for sure (it's a question often asked about that part of the instructions) is that DataInputStream and DataOutputStream are format-compatible with RandomAccessFile, which you'll probably prefer to both of them (you'll have to read from the file but also write to it, so RAF looks handier).
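A small sketch of that compatibility (not part of the original posts): values written with DataOutputStream are read back, and then overwritten in place, with RandomAccessFile. The class name, the temporary file and the values are invented for the example, and it assumes a modern JDK.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class RafCompat {
    /**
     * Writes an int and a short with DataOutputStream, then uses RandomAccessFile
     * to read them back and to overwrite the int with value + 1.
     * Returns { int read, short read, int read after the overwrite }.
     */
    static int[] roundTrip(int value, short count) {
        try {
            File file = File.createTempFile("datafile", ".db");
            file.deleteOnExit();
            try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
                out.writeInt(value);
                out.writeShort(count);
            }
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                int readValue = raf.readInt();
                short readCount = raf.readShort();
                raf.seek(0);             // jump back to the start of the file
                raf.writeInt(value + 1); // update in place, no second stream needed
                raf.seek(0);
                return new int[] {readValue, readCount, raf.readInt()};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = roundTrip(42, (short) 7);
        System.out.println(result[0] + " " + result[1] + " " + result[2]); // prints "42 7 43"
    }
}
```

Both classes implement the same DataInput / DataOutput contracts, which is why the bytes line up.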
Regards,
Phil.
[ April 30, 2004: Message edited by: Philippe Maquet ]
 
lydie prevost
Ranch Hand
Posts: 32
ReBonjour Philippe!
What you are telling me is that I can use "raf" even if the instructions say that the format used is that of DataInputStream (character encoding 8 bit US ASCII). Is that right?
Sorry I have a hard time with IO....
 
Andrew Monkhouse
author and jackaroo
Marshal Commander
Posts: 12014
Hi Lydie,
What you are telling me is that I can use "raf" even if the instructions say that the format used is that of DataInputStream

That's right.
I suspect that this is just Sun's way of telling you that you do not have to worry about whether the datafile was created on a little endian machine while you are working on a big endian machine or vice versa. You know that the data file can be read / written with the standard Java classes.
But you are free to use any other Java class that suits your needs.
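To illustrate the byte-order point, here is a minimal sketch (not part of the original posts; the class name is made up and a modern JDK is assumed). DataOutputStream.writeInt is specified to write the high byte first, so the result does not depend on the machine running the code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ByteOrderDemo {
    /** Returns the 4 bytes that DataOutputStream.writeInt produces for the value. */
    static byte[] encode(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value); // always high byte first, whatever the CPU
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encode(0x01020304)) {
            System.out.print(b + " "); // prints "1 2 3 4 "
        }
        System.out.println();
    }
}
```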
Regards, Andrew
 
Philippe Maquet
Bartender
Posts: 1872
Hi Lydie,
As usual, I cannot find anything to add to what Andrew already wrote.
Regards,
Phil.
[ May 03, 2004: Message edited by: Philippe Maquet ]
 