To understand what this does, you have to know a little bit about how computers deal with text and what character encoding is.
Computers ultimately store everything in their memory as ones and zeros, or, at a slightly higher level of abstraction, as numbers. In the memory of a computer, text is also represented as numbers. Of course, to do this, you have to have an agreement about which number means which character. That agreement is a character encoding. For example, take ASCII, which is one of the oldest character encodings. In ASCII, the number 65 means the letter 'A', 66 means 'B', 67 means 'C', etc.
ASCII is very limited: it uses 7 bits per character, so it can represent only 128 different characters. Over the years, people have invented many other character encodings to be able to represent more than just the standard Latin alphabet plus a few extra characters.
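To make the number-to-character agreement concrete, here is a minimal sketch that encodes a short string as ASCII and prints the resulting byte values. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiDemo {
    // Encode the text as ASCII and return the byte values in readable form.
    // (Hypothetical helper, not part of the code in the question.)
    static String asciiCodes(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.toString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(asciiCodes("ABC")); // prints [65, 66, 67]
    }
}
```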
One of the most used character encodings nowadays is UTF-8, which is a specific encoding for characters from the Unicode character set. Note that UTF-8 is a variable-length encoding; each character takes up between 1 and 4 bytes when encoded with UTF-8.
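The variable length is easy to see by counting bytes. The sketch below is only an illustration with a made-up helper name; the sample characters are written as Unicode escapes so that the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Number of bytes the text occupies when encoded as UTF-8.
    // (Hypothetical helper for illustration.)
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // Latin letter A: prints 1
        System.out.println(utf8Length("\u00e9"));       // e with acute accent: prints 2
        System.out.println(utf8Length("\u4e2d"));       // the first character of the string in the question: prints 3
        System.out.println(utf8Length("\uD83D\uDE00")); // an emoji outside the Basic Multilingual Plane: prints 4
    }
}
```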
Now, let's look at your line of code. To start, I'll tell you that this line of code is most likely wrong, and you'll see why.
It is doing two things:
1. "中文字".getBytes() - This takes the string "中文字" and returns an array of bytes that represent the string encoded with the default character encoding of the system. You said that the default encoding of your system is Cp1252 (a Microsoft Windows-specific encoding).
2. new String(bytes, "UTF-8") - This takes the bytes and creates a new String out of them, decoding the bytes as UTF-8. This is wrong, because the bytes were encoded using Cp1252 and not UTF-8, as we saw in the first step. You will get a string that is likely to contain wrong characters. Note that byte sequences that are not valid UTF-8 do not cause an exception here; this constructor silently replaces them with the replacement character U+FFFD.
So, summarizing what this does:
1. Take a string and convert it to bytes using the default character encoding (Cp1252)
2. Convert those bytes back to a string, telling the computer that the bytes were encoded with UTF-8 - which is wrong, because in step 1 you encoded them with Cp1252 instead of UTF-8
Just saying that the bytes are UTF-8 doesn't make them UTF-8. It's as if I write down a sentence in Dutch and then tell my English friend, "Can you read this? It's written in English".
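The mismatch can be reproduced on any machine by naming both charsets explicitly instead of relying on the platform default. This is a rough sketch, not the code from the question: the class and method names are invented, and it assumes the JVM provides the "windows-1252" charset (the standard name for Cp1252). Cp1252 has no Chinese characters at all, so in this particular case the text is already lost in the encoding step, before the wrong decoding even happens.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MismatchDemo {
    // Assumption: this charset is available in the running JVM.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Encode with one charset, decode with another (hypothetical helper).
    static String encodeThenDecode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587\u5b57"; // the string from the question, written as escapes

        // What the line in the question effectively does on a Cp1252 system.
        String broken = encodeThenDecode(original, CP1252, StandardCharsets.UTF_8);
        // Same charset in both directions: the text survives.
        String intact = encodeThenDecode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println(original.equals(broken)); // prints false
        System.out.println(original.equals(intact)); // prints true
    }
}
```

A round trip only preserves the text when both steps use the same encoding and that encoding can represent every character in the string.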
I don't know what the intention was of the person who wrote this code, but (s)he probably did not understand character encoding very well and didn't really know what (s)he was doing. It looks like a case of cargo cult programming, where the programmer just copied and pasted a "magic formula" without understanding.
Probably (s)he wanted to store the text as UTF-8 in the database. The best way to do that is by configuring the database to store strings using UTF-8 (this has nothing to do with the Java code) and then just do pstmt.setString(1, "中文字"); without the unnecessary manual (and wrong) conversion. You do not need to set the default character encoding to UTF-8 for this.
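As a sketch of what the plain version could look like: the table name messages and the column name body are hypothetical, and the method assumes you already have an open Connection to a database that is configured for UTF-8. The string is passed as-is and the JDBC driver takes care of the encoding.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class StoreTextDemo {
    // Hypothetical table and column names, for illustration only.
    static final String INSERT_SQL = "INSERT INTO messages (body) VALUES (?)";

    // Inserts the text unchanged; no getBytes() / new String(...) conversion.
    // Returns the number of rows affected.
    static int insertText(Connection conn, String text) throws SQLException {
        try (PreparedStatement pstmt = conn.prepareStatement(INSERT_SQL)) {
            pstmt.setString(1, text);
            return pstmt.executeUpdate();
        }
    }
}
```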