Alan Moore

Ranch Hand
since May 06, 2004
Cows and Likes
Cows: 0 received in total (0 in the last 30 days); 0 given in total
Likes: 2 received in total (0 in the last 30 days); 0 given in total (0 in the last 30 days)

Recent posts by Alan Moore

Also, UTF-8 is not a superset of ISO-8859-1.
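
A quick way to see this (a minimal sketch, Java 7 or later for `StandardCharsets`; the class name is made up): é has the same code point, U+00E9, in both, but ISO-8859-1 encodes it as the single byte 0xE9 while UTF-8 needs two bytes. ISO-8859-1 bytes are therefore not valid UTF-8 in general.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1VsUtf8 {
    public static void main(String[] args) {
        String s = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(latin1)); // [-23]      (0xE9)
        System.out.println(Arrays.toString(utf8));   // [-61, -87] (0xC3 0xA9)

        // Decoding the ISO-8859-1 byte as UTF-8 fails: 0xE9 on its own is not
        // a complete UTF-8 sequence, so the decoder substitutes U+FFFD.
        String misread = new String(latin1, StandardCharsets.UTF_8);
        System.out.println(misread.equals("\uFFFD")); // true
    }
}
```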
10 years ago
The problem is that you aren't using UTF-8. You're using the system default encoding, which doesn't support those two characters. There are four parts to the problem: (1) You have to save the source file in an encoding that supports those characters. (2) You have to tell the compiler to read the source file with the same encoding. (3) When you write the output, you have to specify an appropriate encoding. (4) If you're writing to the console, you have to tell it to use the same encoding; if you write to a file, you have to make sure the editor you use to read the file uses the right encoding.

You can eliminate the first two parts of the problem by using Unicode escapes (such as \u00e9 for é) for any non-ASCII characters. As for the other two parts, the easiest solution is to avoid the console; it's too difficult to work with and too system-dependent. Just write to a file and specify the correct encoding. You can then read the result with any reasonably good editor (but try to avoid Windows Notepad).
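
Here is a minimal sketch of both ideas, assuming Java 7 or later. The class name, the sample string and the file name `output.txt` are hypothetical. The non-ASCII characters are written as Unicode escapes, so the source file is pure ASCII. The Writer is given the encoding explicitly.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriteWithEncoding {
    // The non-ASCII characters are Unicode escapes, so this source file is
    // pure ASCII and compiles the same whatever encoding javac assumes.
    static final String TEXT = "caf\u00e9 \u4e2d\u6587";

    // Writes text to the named file, encoding it explicitly as UTF-8
    // instead of relying on the platform default encoding.
    static void writeUtf8(String fileName, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(fileName), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        writeUtf8("output.txt", TEXT); // hypothetical output file
    }
}
```

Open `output.txt` in an editor set to UTF-8 and the characters should display correctly, regardless of the console's encoding.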
10 years ago
AFAIK, it only matters when the regex is coming from something outside the program, like a file or the console, where it can be difficult or impossible to input control characters like linefeeds. Then you have to use the escape sequence.
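
This sketch assumes the question was whether to write a linefeed in a regex as `"\n"` or `"\\n"`; the class name is hypothetical. In source code, both forms work. A pattern read from a file arrives as a literal backslash followed by `n`. Only the regex engine's own escape handling can turn that into a linefeed.

```java
import java.util.regex.Pattern;

public class NewlineEscape {
    public static void main(String[] args) {
        String input = "a\nb";

        // In source code both forms work: "a\nb" puts a real linefeed into the
        // pattern, while "a\\nb" passes backslash + n to the regex engine,
        // which translates that escape itself.
        System.out.println(Pattern.compile("a\nb").matcher(input).matches());  // true
        System.out.println(Pattern.compile("a\\nb").matcher(input).matches()); // true

        // Text read from a file or the console is never processed by javac.
        // If the file contains a, backslash, n, b, you get this 4-char string,
        // and only the regex engine's escape handling turns it into a linefeed.
        String fromOutside = "a\\nb";
        System.out.println(fromOutside.length());                                  // 4
        System.out.println(Pattern.compile(fromOutside).matcher(input).matches()); // true
    }
}
```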
10 years ago
FileReader decodes the contents of the file using the system default encoding. What encoding that is depends on the operating system and the locale of whatever computer the code is running on. That means it will be different on different machines, so you shouldn't use the default if you're dealing with anything other than pure ASCII.

Your file contains both ASCII characters and Japanese ideograms, so the encoding has to be one that supports both character sets; the most likely candidates are Shift_JIS and UTF-8. I would try UTF-8 first, by wrapping the FileInputStream in an InputStreamReader and specifying "UTF-8" as its encoding. And when you create the BufferedWriter, use an OutputStreamWriter and specify "UTF-8" again. If that doesn't work, try "Shift_JIS" for the Reader (but leave the Writer set to UTF-8).
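
Here is a minimal sketch of that setup, assuming Java 7 or later (for try-with-resources). The class name, method name and file names are hypothetical. It decodes the input with whatever charset you pass in and always writes UTF-8.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class Transcode {
    // Copies a text file line by line, decoding the input with the given
    // charset and always encoding the output as UTF-8.
    static void copy(String inFile, String outFile, String inCharset) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream(inFile), inCharset));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(outFile), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine(); // platform line separator
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names: try "UTF-8" first, then "Shift_JIS".
        copy("input.txt", "output.txt", "UTF-8");
    }
}
```

If the output still looks wrong, change only the third argument to "Shift_JIS" and leave the Writer side alone.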

This is just my best guess, based on experience; I can't get enough out of your posts to be more definite. If you still have problems, remember to check "Disable HTML" and "Disable smilies" when you post again (in fact, do what I did and set them to be disabled by default in your "My Profile" page).
10 years ago
As I said in your other thread, Big5 works fine on my JDK 1.6 machine. I think we're going to need to see your code before we can help you.
10 years ago
Big5 works fine on my machine (WinXP, JDK 1.6).
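
To check Big5 on your own setup, a small round-trip test like this one is a reasonable start. It is a sketch; the class name and sample characters are arbitrary choices. If the charset is missing from the JRE, `Charset.isSupported` returns false.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class Big5Check {
    // Encodes s with the named charset, decodes it again, and reports whether
    // the original string survived the round trip.
    static boolean roundTrips(String s, String charsetName) throws UnsupportedEncodingException {
        byte[] bytes = s.getBytes(charsetName);
        return new String(bytes, charsetName).equals(s);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        if (!Charset.isSupported("Big5")) {
            System.out.println("Big5 is not available in this JRE");
            return;
        }
        String sample = "\u4e2d\u6587"; // two Chinese characters present in Big5
        System.out.println("Big5 bytes: " + sample.getBytes("Big5").length); // 4
        System.out.println("round trip OK: " + roundTrips(sample, "Big5"));  // true
    }
}
```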
10 years ago
First, some corrections:

The number of bytes that make up a character is not fixed; it depends on the encoding that's being used to convert between bytes and characters. (Read this.) You're probably thinking of the way Java strings are stored; they use the UTF-16 encoding, which uses two bytes per character (usually; see the article), but that has nothing to do with how the text is stored on disk.
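
To make the "depends on the encoding" point concrete, this sketch counts the bytes needed for a few characters. It assumes Java 7 or later and uses a hypothetical class name. It uses UTF-16BE rather than UTF-16 because Java's UTF-16 encoder prepends a two-byte byte-order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BytesPerChar {
    // Number of bytes needed to encode s in the given charset.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String a = "A";                 // U+0041
        String eAcute = "\u00e9";       // U+00E9
        String zhong = "\u4e2d";        // U+4E2D, a CJK ideograph
        String emoji = "\uD83D\uDE00";  // U+1F600, outside the BMP

        System.out.println(byteCount(a, StandardCharsets.UTF_8));         // 1
        System.out.println(byteCount(eAcute, StandardCharsets.UTF_8));    // 2
        System.out.println(byteCount(zhong, StandardCharsets.UTF_8));     // 3
        System.out.println(byteCount(zhong, StandardCharsets.UTF_16BE));  // 2

        // Outside the BMP, one character takes two UTF-16 code units
        // (a surrogate pair), which is why it's only "usually" two bytes.
        System.out.println(emoji.length());                               // 2
        System.out.println(emoji.codePointCount(0, emoji.length()));      // 1
        System.out.println(byteCount(emoji, StandardCharsets.UTF_16BE));  // 4
    }
}
```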

It used to be the case that Macs favored the carriage-return ('\r' or '\u000D') for line separators, but as of version 10 (OS X), Mac OS is based on Unix (specifically the BSD-derived Darwin), and Unix systems, like Linux, prefer the linefeed ('\n' or '\u000A').

However, it doesn't matter that much what the operating system thinks a line separator should be, because it's the application (in this case, your Java program) that has to read and write the files. Virtually every modern application will accept any of the three major styles of line separator ("\n", "\r", or "\r\n"). BufferedReader is no exception; you can use a different separator at the end of every line, and BufferedReader will handle them correctly.
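
You can check the BufferedReader claim with a short test that ends each line with a different separator. This is a sketch assuming Java 7 or later; the class and method names are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class MixedSeparators {
    // Splits text into lines using BufferedReader.readLine(), which accepts
    // "\n", "\r", and "\r\n" as line terminators.
    static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Lines ended with LF, CR, and CR+LF respectively
        System.out.println(readLines("one\ntwo\rthree\r\nfour")); // [one, two, three, four]
    }
}
```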

There is, unfortunately, one very important exception: Windows Notepad. It refuses to recognize anything except the DOS/Windows-style carriage-return+linefeed ("\r\n") line separator. If it encounters a linefeed or carriage-return by itself, Notepad renders it as a rectangle instead of a line break. That's probably not the cause of your problem, since you're using a Mac, but it's useful to know about (not to mention infuriating).

Now that all that's out of the way, we'll need some more info before we can help you. Like, how exactly are you reading and writing the files? What's the exact code you use to construct the Reader and Writer? How do you write the line separators? Do you use BufferedWriter#newLine(), or do you explicitly write a "\r"? And how do you view the contents of the files?
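
For reference, this sketch contrasts the two approaches; the class name is hypothetical and it needs Java 7 or later. `BufferedWriter#newLine()` writes the platform's separator as given by `System.lineSeparator()`. Writing `"\n"` yourself produces a bare linefeed everywhere. The result is printed with the separators shown as visible `\r` and `\n` escapes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class LineSeparators {
    // Builds three lines: the first ended with newLine(), the second with an explicit "\n".
    static String build() throws IOException {
        StringWriter sw = new StringWriter();
        try (BufferedWriter w = new BufferedWriter(sw)) {
            w.write("first");
            w.newLine();    // platform separator: System.lineSeparator()
            w.write("second");
            w.write("\n");  // always a bare linefeed, on every platform
            w.write("third");
        }
        return sw.toString();
    }

    public static void main(String[] args) throws IOException {
        // Make the separators visible instead of printing them raw.
        System.out.println(build().replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```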
10 years ago
Well, static imports from the default package were never possible, but it used to be possible to do regular imports from there. The default package is convenient for school lessons and throwaway examples (like most of the code you see in forums like this), but it never should have been used in real-world apps; it defeats the purpose of having packages in the first place.
10 years ago
That regex will match "0/0/00". The year part is okay, but the month and day parts should match a leading zero only if it's followed by another, non-zero digit.
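
The original regex isn't shown here, so this is only a sketch of the idea. It assumes a month/day/two-digit-year format; the pattern and class name are hypothetical. The month and day parts each allow an optional leading zero only in front of a digit from 1 to 9.

```java
import java.util.regex.Pattern;

public class DateRegex {
    // Hypothetical month/day/two-digit-year pattern. "0?[1-9]" allows a leading
    // zero only before a non-zero digit, so "0" and "00" are rejected.
    static final Pattern DATE = Pattern.compile(
            "(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])/[0-9]{2}");

    static boolean isDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1/5/99", "01/05/99", "12/31/07", "0/0/00", "00/10/99", "13/01/99"};
        for (String s : samples) {
            System.out.println(s + " -> " + isDate(s));
        }
    }
}
```

It doesn't check the number of days in each month, so "2/31/99" still matches.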
10 years ago
Actually, "[pial]" matches exactly one of the letters 'p', 'i', 'a', or 'l' each time it's applied. The order in which they're listed has no effect on the order in which they're matched. What you need is a regex that will optionally match one or more of those letters, but only if they're in the correct order.
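
Here is a sketch of that idea with a hypothetical class name. Making each letter optional and listing the letters in the required order gives `p?i?a?l?`. That pattern also matches the empty string, which is why the matching is only "optional".

```java
public class OrderedLetters {
    // Each letter is optional, but they can only appear in the order p, i, a, l.
    static final String ORDERED = "p?i?a?l?";

    public static void main(String[] args) {
        String[] samples = {"pial", "pl", "ial", "", "lap", "pp"};
        for (String s : samples) {
            // String.matches requires the whole string to match
            System.out.println("\"" + s + "\" -> " + s.matches(ORDERED));
        }
        // For comparison, a character class accepts the letters in any order.
        System.out.println("lap".matches("[pial]+")); // true
    }
}
```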
10 years ago
The line separator is there; the problem is with the program you're using to view the text: Windows Notepad. Every other program in the world will accept any of the three most common line separators: "\n" (linefeed only), "\r" (carriage-return only), or "\r\n" (carriage-return + linefeed). But Notepad only recognizes "\r\n", the traditional line separator for DOS/Windows systems. Just use something other than Notepad to view the text, and don't worry about the line separators.
10 years ago
You want to match an asterisk or tilde that isn't adjacent to letters, plus any adjacent whitespace?
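
If that's the requirement, here is one possible sketch; it reflects my own interpretation, and the pattern and class name are hypothetical. A lookbehind and a lookahead using `\p{L}` reject letters on either side. `\s*` takes in the surrounding whitespace. Each match is replaced by a single space.

```java
import java.util.regex.Pattern;

public class StandaloneMarkers {
    // Hypothetical pattern: an asterisk or tilde with no letter immediately
    // before or after it, together with any whitespace around it.
    static final Pattern MARKER = Pattern.compile("\\s*(?<!\\p{L})[*~](?!\\p{L})\\s*");

    // Replaces each standalone marker (and its surrounding whitespace) with one space.
    static String collapse(String s) {
        return MARKER.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("a * b")); // a b
        System.out.println(collapse("a*b"));   // a*b (adjacent to letters, left alone)
        System.out.println(collapse("x ~ y")); // x y
    }
}
```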
10 years ago

Originally posted by Ilja Preuss:
Which, on the other hand, has nothing to do with the compiler at all.

No, what makes the compiler complain is when you try to use a single backslash to escape something in a regex, like "\(". The Java compiler tries to treat \( as one of the String-literal escape sequences, and of course it fails.
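
Here is a minimal sketch of the difference, with a hypothetical class name. The single-backslash version is left commented out because javac rejects it. Doubling the backslash puts a real backslash into the string for the regex engine to interpret. `Pattern.quote` is another option when the whole string should be matched literally.

```java
import java.util.regex.Pattern;

public class EscapeParen {
    public static void main(String[] args) {
        // Pattern.compile("\(");   // does not compile: illegal escape character in string literal

        // The string below is backslash + '(', which the regex engine reads as a literal '('.
        Pattern p = Pattern.compile("\\(");
        System.out.println(p.matcher("(").matches()); // true

        // Pattern.quote makes the whole string literal, with no escaping needed.
        System.out.println(Pattern.compile(Pattern.quote("f(x)")).matcher("f(x)").matches()); // true
    }
}
```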
10 years ago
Are you trying to compress multiple spaces within quotes down to a single space? You can't use \s for that because it matches all ASCII whitespace characters, which includes linefeeds ("\n"). Try matching a literal space character instead.
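
Here is a sketch of one way to do it, with a hypothetical class name. It assumes double quotes are balanced and never escaped. The pattern matches two or more literal spaces. A lookahead then requires an odd number of quote characters between the match and the end of the input, which means the spaces are inside a quoted section.

```java
import java.util.regex.Pattern;

public class CompressQuotedSpaces {
    // Two or more literal spaces, but only where an odd number of double quotes
    // follows (i.e. inside a quoted section). Assumes balanced, unescaped quotes.
    // A literal space is used instead of \s so linefeeds are left untouched.
    static final Pattern QUOTED_SPACES =
            Pattern.compile(" {2,}(?=[^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static String compress(String s) {
        return QUOTED_SPACES.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(compress("say  \"a   b\"  now")); // say  "a b"  now
    }
}
```

The lookahead rescans to the end of the input for every candidate, so this is fine for lines or small strings but not ideal for very large inputs.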
10 years ago
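I would recommend using the findWithinHorizon() method, with a horizon of zero (which means the search is not limited to any number of characters) and a regex that matches one whole record. If you don't want the delimiters returned as part of the record, you can use lookbehind and lookahead to match them without consuming them.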
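
To illustrate, here is a sketch with a made-up record format where each record is wrapped in braces. The class name, method name and format are all hypothetical, and it needs Java 7 or later for the diamond operator. The lookbehind and lookahead require the braces without including them in the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class RecordScanner {
    // Hypothetical record format: each record is enclosed in braces, e.g. {a}{b c}.
    // The lookbehind and lookahead require the braces but leave them out of the match.
    static final Pattern RECORD = Pattern.compile("(?<=\\{)[^}]+(?=\\})");

    static List<String> records(String input) {
        List<String> result = new ArrayList<>();
        Scanner sc = new Scanner(input);
        String rec;
        // A horizon of 0 means the search is not limited to any number of characters;
        // findWithinHorizon returns null once no further match exists.
        while ((rec = sc.findWithinHorizon(RECORD, 0)) != null) {
            result.add(rec);
        }
        sc.close();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(records("{a}{b c}\n{d}")); // [a, b c, d]
    }
}
```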
10 years ago