Kevin Simonson wrote:This has puzzled me for a long time. If I have any kind of text file (the source code for a Java program, for example), use a {Scanner} object to read it into {String} objects in a Java program, and then write those {String} objects to a file with a {PrintWriter} object, each of the {char} components of the {String} objects takes up sixteen bits in the Java program, but each takes up only eight bits in the source file the {Scanner} object reads from, and only eight bits in the destination file the {PrintWriter} writes to. Why store each {char} with sixteen bits of memory in the Java program, when there were only eight bits where the {char} originated, and there will be only eight bits where the {char} ultimately gets stored?
Furthermore, in my own particular application I've discovered that if a {String} has individual characters that correspond to numbers higher than 127, and I do a {println()} on the {PrintWriter} object for that {String}, those characters get written as {63}s, and a lot of information appears to get lost. Is there some other way to write {String}s like this to files so that information doesn't get lost?
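The {63}s described above are consistent with an encoder substituting '?' (code 63) for characters its charset cannot represent. The sketch below reproduces that effect in memory. It assumes the {PrintWriter} in question ended up using an ASCII-like charset, which the post does not confirm; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not taken from the original poster's program.
class QuestionMarkDemo {
    // US-ASCII covers only code points 0..127. String.getBytes replaces
    // anything else with the charset's replacement byte, which is '?' (63).
    static byte[] encodeAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // UTF-8 can represent every Unicode code point, using 1 to 4 bytes each.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // the last char is U+00E9 (233), above 127
        System.out.println(Arrays.toString(encodeAscii(text))); // [99, 97, 102, 63]
        System.out.println(Arrays.toString(encodeUtf8(text)));  // [99, 97, 102, -61, -87]
    }
}
```

The second line of output shows that nothing is lost under UTF-8: the one {char} above 127 simply takes two bytes instead of one.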
SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6 - OCEJPAD 6
How To Ask Questions How To Answer Questions
Experience keeps a dear School, but Fools will learn in no other.
---
Benjamin Franklin - Postal official and Weather observer
Rob Spoor wrote:Well, String does have constructors that take a byte[], and its getBytes method returns a byte[] as well. However, you need to specify the encoding/charset to use (like StandardCharsets.US_ASCII).
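As a rough illustration of the byte[] conversions mentioned here, this sketch converts a {String} to bytes and back with two different charsets. The class and method names are hypothetical; only `String.getBytes(Charset)` and `new String(byte[], Charset)` come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class ByteRoundTrip {
    // Encode to bytes with the given charset, then decode with the same one.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "na\u00efve \u20ac5"; // contains U+00EF and U+20AC

        // UTF-8 can encode both characters, so the text survives unchanged.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8)));

        // US-ASCII cannot, so both become '?' on the way out.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));
    }
}
```

The choice of charset decides whether the round trip is lossless, which is why the API asks for one explicitly.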
Kevin Simonson wrote:Rob Spoor and Tim Holloway have provided a lot of interesting information, and they've detailed to some extent the history of {String} objects and how each of their {char}s got to be sixteen bits long in memory, but it looks to me like they haven't answered my main question: why does each {char} in a {String} object take up sixteen bits in memory, while on disk each takes up only eight bits? What good does it do to store a very diverse set of characters in a Java program, when they can't be stored anywhere on disk with their diversity intact?
Paul Clapham wrote:Sure they can. You just haven't heard about encodings, or charsets, yet. Generally (but not always), system designers don't need to deal with everything in Unicode, but only a subset. So they choose an encoding which maps that subset of Unicode characters into bytes, to save space in the output. Here's the relevant tutorial from Oracle's series of Unicode tutorials: Character and Byte Streams; you should probably read some of the other tutorial pages related to that one.
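To make the idea of an encoding covering only a subset of Unicode more concrete, here is a small sketch that asks a charset whether it can represent a piece of text and how many bytes the result takes. The class and method names are invented for this example; `Charset.newEncoder()` and `CharsetEncoder.canEncode(CharSequence)` are standard JDK calls.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class EncodingCoverage {
    // True if every character of the text has a mapping in the charset.
    static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Number of bytes the text occupies once encoded with the charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            System.out.println(cs + ": fits=" + fits(text, cs)
                    + ", bytes=" + encodedLength(text, cs));
        }
    }
}
```

A single-byte charset such as ISO-8859-1 is compact but covers few characters, while UTF-8 and UTF-16 cover all of Unicode at the cost of using more than one byte for some or all characters.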
Kevin Simonson wrote:I have been thinking about writing a Java program that implements an editor (possibly based on Emacs). Since it works with {String} objects, the document it stores may contain characters that won't be preserved if I just use a {PrintWriter} object and its {println()} method to write them to disk.
People have been expending a lot of energy telling me there are other ways to do disk I/O that store sixteen bits to disk for each {char} in a {String}, but so far nobody's told me what one of those ways is.
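One possible answer, sketched under assumptions: keep using {PrintWriter} and {Scanner}, but build them with an explicit charset instead of relying on the platform default. With UTF-16BE every {char} is written as exactly sixteen bits; UTF-8 is usually smaller and also preserves every character. The class name, method names, and temporary file below are made up for the example, and this is not necessarily how an editor should structure its I/O.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

// Hypothetical helper class for illustration only.
class CharsetFileIo {
    // Write one line with println(), encoding it with the given charset.
    static void writeLine(Path file, String line, Charset charset) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(file, charset))) {
            out.println(line);
        }
    }

    // Read the first line back, decoding with the same charset.
    static String readLine(Path file, Charset charset) throws IOException {
        try (Scanner in = new Scanner(file, charset.name())) {
            return in.nextLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String line = "\u03c0 \u2248 3.14159"; // Greek pi, "almost equal" sign
        Path file = Files.createTempFile("charset-demo", ".txt");
        try {
            Charset[] charsets = { StandardCharsets.UTF_8, StandardCharsets.UTF_16BE };
            for (Charset cs : charsets) {
                writeLine(file, line, cs);
                System.out.println(cs + ": " + Files.size(file) + " bytes, intact="
                        + line.equals(readLine(file, cs)));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Whichever charset is chosen, the reader has to use the same one as the writer; an editor would typically either fix one encoding for its own files or let the user pick it per document.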