NX: reading the Datafile

 
Ranch Hand
Posts: 32
Hello,
I am confused about how I have to read the data file.
Here is the text from my instructions:

All numeric values are stored in the header information use the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.


Right now I can read everything, but I don't use any encoding.
For the header I read it this way:

Then for the records I use this:


But nowhere did I use this 8-bit US-ASCII.
I don't know which character encoding it is.
All I found is:


US-ASCII     Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1   ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8        Eight-bit UCS Transformation Format
UTF-16BE     Sixteen-bit UCS Transformation Format, big-endian byte order
UTF-16LE     Sixteen-bit UCS Transformation Format, little-endian byte order
UTF-16       Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark



So here are my questions:
1) Which one is it? Is it US-ASCII (7 bits), ISO-8859-1, or UTF-8?
2) When I read the schema, which is a mix of int and char for the field name and field length: if I use a character encoding, should I also read the integers as chars and then convert them?
3) Then I cannot use BufferedReader and I have to read the chars one by one?
4) They also say DataInputStream for the header, and the Java documentation for DataInputStream says:

A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream.


which means that it is machine dependent...
I am all confused, because the header, the schema, and the records are all written in the same file, with the same encoding I suppose...

Can you help me understand this point?
Thank you
- Lydie
 
Bartender
Posts: 1872
Bonsoir Lydie,
Welcome to JavaRanch and this forum!

1) Which one is it? Is it US-ASCII (7 bits), ISO-8859-1, or UTF-8?


There is no real difference between US-ASCII (7 bits) and the "8 bit US-ASCII" stated in the instructions.
Think of the fact that a byte (the smallest "normally" manageable memory unit) uses 8 bits, hence 7 bits use a full byte anyway. In English, one bit is useless (and unused), just because English writers don't know the joy of using the weird accented letters and other funny characters we use in French and other European languages, which are all coded with the 8th bit set. So, at the binary level, US-ASCII and ISO-8859-1 are compatible: every US-ASCII byte decodes to the same character in ISO-8859-1.
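
The compatibility claim can be checked with a small sketch. It decodes the same bytes with both charset names through the String constructor; the class and method names (CharsetCompat, decode) are made up for this illustration and are not part of the assignment.

```java
import java.io.UnsupportedEncodingException;

class CharsetCompat {

    // Decodes raw bytes with a named charset. Both names used below are
    // charsets that every Java platform is required to support.
    static String decode(byte[] raw, String charsetName) {
        try {
            return new String(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes below 0x80 give the same String in both charsets.
        byte[] plain = { 'n', 'a', 'm', 'e' };
        System.out.println(decode(plain, "US-ASCII").equals(decode(plain, "ISO-8859-1")));

        // A byte with the 8th bit set is only meaningful in ISO-8859-1,
        // where 0xE9 is U+00E9 (e with acute accent).
        byte[] accented = { (byte) 0xE9 };
        System.out.println(decode(accented, "ISO-8859-1").charAt(0) == '\u00E9');
    }
}
```

If the data file really holds only 7-bit characters, as the instructions suggest, either charset name should yield the same Strings.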

2) When I read the schema, which is a mix of int and char for the field name and field length: if I use a character encoding, should I also read the integers as chars and then convert them?


The provided file is a binary one. So you should read the expected primitives as such (readShort(), readInt(), ...), and text data as bytes (byte[]). Without using NIO (and its Charset class), as NIO is now forbidden in the latest versions of the instructions, you can convert a byte[] to a String using the String constructor that accepts a charset name as its second parameter. And to convert a String to a byte[], String.getBytes(String charsetName) works just as well.
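
A minimal sketch of that approach, reading primitives with DataInputStream and text as a byte[] that is then converted with a charset name. The layout used here (one int, one short giving the field length, then the field bytes) is hypothetical and only stands in for the real schema; BinaryFieldReader, decodeField, sampleFile and read are names invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BinaryFieldReader {

    // Converts a fixed-width field to a String, stopping at the first
    // null byte when the value is shorter than the field.
    static String decodeField(byte[] raw) {
        int end = 0;
        while (end < raw.length && raw[end] != 0) {
            end++;
        }
        try {
            return new String(raw, 0, end, "US-ASCII");
        } catch (IOException e) { // UnsupportedEncodingException is an IOException
            throw new UncheckedIOException(e);
        }
    }

    // Builds a tiny in-memory "file" in the hypothetical layout.
    static byte[] sampleFile() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(257);                      // hypothetical header value
            out.writeShort(8);                      // hypothetical field length
            out.write("name".getBytes("US-ASCII")); // 4 bytes of text
            out.write(new byte[4]);                 // null padding up to 8 bytes
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the primitives as primitives and the text as raw bytes.
    static String read(byte[] file) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            int headerValue = in.readInt();
            int fieldLength = in.readShort();
            byte[] raw = new byte[fieldLength];
            in.readFully(raw);
            return headerValue + ":" + decodeField(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read(sampleFile())); // prints 257:name
    }
}
```

The integers never pass through a charset at all; only the byte[] holding the text does.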

3) Then I cannot use BufferedReader and I have to read the chars one by one?


I'd avoid using any Reader (aimed at reading *text*) with a binary file, and anyway you don't need one (see 2)).

A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way.
(...)
which means that it is machine dependent...


What do you mean exactly?
What I can tell you for sure (it's a question often asked about that part of the instructions) is that DataInputStream and DataOutputStream are format-compatible with RandomAccessFile, which you'll probably prefer to both of them (you'll have to read from the file, but also write to it, so RAF looks handier).
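
The format compatibility can be tried out with a short sketch: a value written by DataOutputStream is read and updated in place through RandomAccessFile, then read back with DataInputStream. The class name RafCompat, the method roundTrip and the temporary file are assumptions made for the example, not anything from the assignment.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class RafCompat {

    // Writes an int with DataOutputStream, increments it in place through
    // RandomAccessFile, and reads the result back with DataInputStream.
    static int roundTrip(int value) {
        try {
            File file = File.createTempFile("datafile", ".db");
            file.deleteOnExit();

            try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
                out.writeInt(value);
            }

            try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                int read = raf.readInt(); // same 4-byte format as writeInt above
                raf.seek(0);              // go back to the start of the file
                raf.writeInt(read + 1);   // overwrite in place
            }

            try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
                return in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(41)); // prints 42
    }
}
```

The seek() call is what makes RandomAccessFile handy here: a single object can both read a record and write it back at the same position.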
Regards,
Phil.
[ April 30, 2004: Message edited by: Philippe Maquet ]
 
lydie prevost
Ranch Hand
Posts: 32
ReBonjour Philippe!
What you are telling me is that I can use "raf" even if the instructions say that the format used is that of DataInputStream (character encoding 8 bit US ASCII). Is that right?
Sorry, I have a hard time with IO...
 
author and jackaroo
Posts: 12200
Hi Lydie,

What you are telling me is that I can use "raf" even if the instructions say that the format used is that of DataInputStream


That's right.
I suspect that this is just Sun's way of telling you that you do not have to worry about whether the datafile was created on a little endian machine while you are working on a big endian machine or vice versa. You know that the data file can be read / written with the standard Java classes.
But you are free to use any other Java class that suits your needs.
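
The byte-order point can be illustrated with a small sketch: DataOutputStream.writeInt is documented to write the high byte first, whatever machine the code runs on. BigEndianCheck and encodeInt are names invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class BigEndianCheck {

    // Returns the four bytes that DataOutputStream.writeInt produces for a value.
    static byte[] encodeInt(int value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            new DataOutputStream(buffer).writeInt(value);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // High byte first, regardless of the hardware's native byte order.
        System.out.println(Arrays.toString(encodeInt(0x01020304))); // prints [1, 2, 3, 4]
    }
}
```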
Regards, Andrew
 
Philippe Maquet
Bartender
Posts: 1872
Hi Lydie,
As usual, I cannot find anything to add to what Andrew already wrote.
Regards,
Phil.
[ May 03, 2004: Message edited by: Philippe Maquet ]