To understand what this does, you have to know a little bit about how computers deal with text and what character encoding is.
Computers ultimately store everything in their memory as ones and zeros, or, at a slightly higher level of abstraction, as numbers. In a computer's memory, text is also represented as numbers. Of course, to do this, you have to have an agreement about which number means which character. That agreement is a character encoding. For example, take ASCII, which is one of the oldest character encodings. In ASCII, the number 65 means the letter 'A', 66 means 'B', 67 means 'C', etc.
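You can see this mapping directly in Java, where a `char` can be treated as a number (a minimal sketch; any character would do):

```java
public class AsciiDemo {
    public static void main(String[] args) {
        // In ASCII (and in Unicode, which includes ASCII as a subset),
        // 'A' is 65, 'B' is 66, 'C' is 67.
        System.out.println((int) 'A'); // prints 65
        System.out.println((char) 66); // prints B
    }
}
```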
ASCII is very limited: it uses 7 bits per character, so it can represent only 128 different characters. Over the years, people have invented many other character encodings besides ASCII, to be able to represent more than just the standard Latin alphabet plus a few extra characters.
One of the most used character encodings nowadays is UTF-8, which is a specific encoding for characters from the Unicode character set. Note that UTF-8 is a variable-length encoding; each character takes up between 1 and 4 bytes when encoded with UTF-8.
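The variable length is easy to observe in Java (a sketch using `java.nio.charset.StandardCharsets`; the example characters are arbitrary):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    public static void main(String[] args) {
        // ASCII characters take 1 byte in UTF-8.
        System.out.println("A".getBytes(StandardCharsets.UTF_8).length);  // 1
        // Many accented Latin characters take 2 bytes.
        System.out.println("é".getBytes(StandardCharsets.UTF_8).length);  // 2
        // Most CJK characters take 3 bytes.
        System.out.println("中".getBytes(StandardCharsets.UTF_8).length); // 3
        // Characters outside the Basic Multilingual Plane take 4 bytes.
        System.out.println("😀".getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```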
Now, let's look at your line of code. To start, I'll tell you that this line of code is most likely wrong, and you'll see why.
It is doing two things:
1. "中文字".getBytes() - This takes the string "中文字" and returns an array of bytes that represent the string encoded with the default character encoding of the system. You said that the default encoding of your system is Cp1252 (a Microsoft Windows-specific encoding).
2. new String(bytes, "UTF-8") - This takes the bytes and creates a new String out of them, decoding the bytes using the UTF-8 character set. This is wrong, because the bytes were encoded using Cp1252 and not UTF-8, as we saw in the first step. You will get a string that is likely to contain wrong characters; byte sequences that are not valid UTF-8 are replaced with the Unicode replacement character U+FFFD (this String constructor does not throw on invalid input, it silently substitutes).
So, summarizing what this does:
1. Take a string and convert it to bytes using the default character encoding (Cp1252)
2. Convert those bytes back to a string, telling the computer that the bytes were encoded with UTF-8 - which is wrong, because in step 1 you encoded them with Cp1252 instead of UTF-8
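The two steps above can be reproduced without changing your system's default encoding by naming Cp1252 explicitly (a sketch; the original line of code relies on the platform default instead of naming the charset):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        // Cp1252 cannot represent Chinese characters at all, so getBytes()
        // silently replaces each one with '?' (byte 0x3F). The original
        // text is already lost before the UTF-8 decoding even happens.
        byte[] bytes = "中文字".getBytes(cp1252);
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints ???

        // Characters that Cp1252 *can* encode fare no better: 'é' becomes
        // the single byte 0xE9, which is not a valid UTF-8 sequence, so
        // decoding yields the Unicode replacement character U+FFFD.
        byte[] e = "é".getBytes(cp1252);
        System.out.println(new String(e, StandardCharsets.UTF_8)); // prints �
    }
}
```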
Just saying that the bytes are UTF-8 doesn't make them UTF-8. It's as if I write down a sentence in Dutch and then tell my English friend, "Can you read this? It's written in English".
I don't know what the intention was of the person who wrote this code, but (s)he probably did not understand character encoding very well and didn't really know what (s)he was doing. It looks like a case of cargo cult programming, where the programmer just copied and pasted a "magic formula" without understanding.
Probably (s)he wanted to store the text as UTF-8 in the database. The best way to do that is to configure the database to store strings as UTF-8 (this has nothing to do with the Java code) and then simply call pstmt.setString(1, "中文字"); without the unnecessary manual (and incorrect) conversion. You do not need to set the default character encoding to UTF-8 for this.
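If you ever do need the bytes themselves (for example, to write them to a file or a socket), the rule is simply to use the same, explicitly named charset for both encoding and decoding. A minimal sketch:

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    public static void main(String[] args) {
        String original = "中文字";

        // Encode and decode with the SAME explicit charset; never rely on
        // the platform default, which varies from machine to machine.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8, StandardCharsets.UTF_8);

        System.out.println(decoded.equals(original)); // prints true
        System.out.println(utf8.length);              // prints 9 (3 bytes per character)
    }
}
```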
It depends. If the string x contains only characters that take up one byte each, then it can be up to 100 characters in length. But if the string contains characters that are 4 bytes when encoded with UTF-8, then you can fit at most 25 of those characters into that database field.
UTF-8 is a variable length encoding. Some characters (such as the letters of the Latin alphabet) take up only 1 byte per character. But other characters may take up 2, 3 or 4 bytes each.
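So when checking whether a string fits in a byte-limited column, measure the encoded length, not String.length() (a sketch; the column limit of 100 comes from your schema, not from Java):

```java
import java.nio.charset.StandardCharsets;

public class ColumnSizing {
    public static void main(String[] args) {
        String x = "中文字";
        // String.length() counts UTF-16 code units, not bytes.
        System.out.println(x.length());                                // 3
        // The encoded size is what matters for a byte-limited column.
        System.out.println(x.getBytes(StandardCharsets.UTF_8).length); // 9
    }
}
```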
You should also keep in mind that 'character' isn't a well-defined concept.
The characters that Jesper is referring to are actually called 'code points'. Most code points correspond to a visual character on your screen, but some code points actually modify other code points, to make a combined character. That means that some visual characters are made up of multiple code points, which in turn can be made up of a variable number of bytes, depending on the encoding used.
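Java makes this distinction visible: String.length() counts UTF-16 code units, codePointCount counts code points, and even that can exceed the number of visual characters. A sketch:

```java
public class CodePoints {
    public static void main(String[] args) {
        // An emoji is one code point but two UTF-16 code units
        // (a surrogate pair), so length() reports 2.
        String emoji = "😀";
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1

        // 'e' followed by a combining acute accent: two code points,
        // but rendered as the single visual character "é".
        String combined = "e\u0301";
        System.out.println(combined.codePointCount(0, combined.length())); // 2
    }
}
```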