Example of Using -Xprof to improve code

Norm Radder
While looking at the questions and answers on various forums, I saw a way to improve one of my programs. It's a search program that looks through HTML files that I have downloaded from various sites and saved on my hard drive. I'm on a dial-up connection and sometimes find using Google a problem. And I enjoy writing and using my own code. For many of the folders of HTML I have (such as the Java Tutorial), I can invoke my search program (as an applet) from the browser while I'm looking at the pages.

One of the items I saw that I wanted to try was the java -Xprof option. I used it with the search program and found heavy usage in the toUpperCase() method. The other question I saw was how to determine what language a String is in. This led me to think of Strings as sequences of characters, each with a value from 0 to 64K. Eureka!!

A part of the Xprof output for my search program follows. This search took 10.6 seconds and looked at 979 files.

Searched 979 files in 808 dir, total time=9593, average=9, duration= 10609

Flat profile of 10.62 secs (220 total ticks): Thread-3

Interpreted + native Method
2.4% 0 + 5
1.0% 0 + 2
13.5% 17 + 11 Total interpreted

Compiled + native Method
13.9% 29 + 0 java.lang.Character.toUpperCaseEx
12.0% 25 + 0 sun.nio.cs.SingleByteDecoder.decodeArrayLoop
11.5% 24 + 0 java.lang.String.codePointAt
9.6% 20 + 0 java.lang.String.toUpperCase
7.2% 15 + 0 java.lang.String.indexOf

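Three of the compiled lines above (Character.toUpperCaseEx, String.codePointAt and String.toUpperCase, together 35% in the percentage column) show that toUpperCase() is very expensive. I uppercase everything to make finding strings easier; I use indexOf(), for example, to do the searching.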

I thought about my application and realized that I was only interested in a-z being uppercased to A-Z. So I created a small class that would only uppercase those 26 letters:
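The class itself is not shown in this post. The following is only a minimal sketch of the idea, not the original code: the class name `AsciiUpperCase` is hypothetical, and the lookup array with a length check and a zero-member check is a guess based on how a later reply describes it.

```java
// Hypothetical sketch: names and structure are assumptions, not the original class.
final class AsciiUpperCase {
    // Only the slots for 'a'..'z' are filled; every other slot stays 0.
    private static final char[] UPPER_CHAR_ARRAY = new char['z' + 1];

    static {
        for (char c = 'a'; c <= 'z'; c++) {
            UPPER_CHAR_ARRAY[c] = (char) (c - ('a' - 'A'));
        }
    }

    private AsciiUpperCase() {}

    /** Uppercases a-z only; every other character is copied unchanged. */
    static String toUpperCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            // Range check, then zero-member check: 0 means "no mapping".
            if (c < UPPER_CHAR_ARRAY.length && UPPER_CHAR_ARRAY[c] != 0) {
                chars[i] = UPPER_CHAR_ARRAY[c];
            }
        }
        return new String(chars);
    }
}
```

A call such as `AsciiUpperCase.toUpperCase(text)` would then stand in for `text.toUpperCase()` before the `indexOf()` search.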

I then used that class's toUpperCase() method to uppercase the strings before searching them. Elapsed time dropped by roughly 25%: the following shows the same search taking 7.9 seconds.

Searched 979 files in 808 dir, total time=6908, average=7, duration= 7891

Flat profile of 7.93 secs (154 total ticks): Thread-3

Interpreted + native Method
3.8% 0 + 5
3.8% 5 + 0 java.awt.EventQueue.postEventPrivate
2.3% 0 + 3
1.5% 0 + 2
22.9% 17 + 13 Total interpreted

Compiled + native Method
20.6% 27 + 0 NormsTools.UC_a_to_z.toUpperCase
12.2% 16 + 0 java.lang.String.indexOf
12.2% 16 + 0 sun.nio.cs.SingleByteDecoder.decodeArrayLoop
6.1% 8 + 0 NormsDev.SearchFiles.SearchStrings.searchFile
5.3% 7 + 0

The question: What have I overlooked? Are there characters that my toUpperCase() method will miss? Am I interpreting the data incorrectly?
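One way to explore the "which characters will be missed" question is to compare an a-z-only mapping against the JDK's own per-character mapping over the whole 16-bit range. This is a rough sketch with hypothetical names (`UpperCaseDiff`, `asciiUpper`, `missedChars`). It only covers single-char mappings, so it does not catch cases where `String.toUpperCase()` changes the length of the string (for example "ß" becoming "SS") or locale-specific rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for exploring the question; not part of the original program.
final class UpperCaseDiff {
    private UpperCaseDiff() {}

    /** The a-z-only mapping under discussion. */
    static char asciiUpper(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c - ('a' - 'A')) : c;
    }

    /** All chars for which Character.toUpperCase disagrees with asciiUpper. */
    static List<Character> missedChars() {
        List<Character> missed = new ArrayList<>();
        for (int i = 0; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (Character.toUpperCase(c) != asciiUpper(c)) {
                missed.add(c);
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        List<Character> missed = missedChars();
        System.out.println(missed.size() + " chars are uppercased by the JDK but not by the a-z mapping");
        // Accented Latin letters such as e-acute (U+00E9) are among them.
        System.out.println("contains U+00E9: " + missed.contains('\u00e9'));
    }
}
```

If the saved pages contain only English text, the missed characters may never appear, but accented letters in other languages would be matched case-sensitively.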

Sean Collins
Sorry, I realise this is an old post - was looking for -Xprof experiences and found your case-changing code.

I remember doing something similar recently, but have a lingering impression my original problem with toUpper/toLower might have been confused by charsets. If you're doing this on your own PC, the code-free way of doing it would be to not check if the character was in your array, but fill in all the places (like UPPER_CHAR_ARRAY[(int)'!'] = '!') in the 256-place array and assign them anyway:
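The listing that went with this suggestion is not shown here, so the following is only a sketch of what it might look like. The class name `Latin1UpperTable` is hypothetical, and the guard for characters at or above 256 is an added assumption, since a Java `char` can exceed the table size even when the files were read with a single-byte charset.

```java
// Hypothetical sketch of a fully populated 256-entry table; not the original code.
final class Latin1UpperTable {
    private static final char[] UPPER_CHAR_ARRAY = new char[256];

    static {
        // Every slot maps to itself, e.g. UPPER_CHAR_ARRAY['!'] = '!' ...
        for (int i = 0; i < 256; i++) {
            UPPER_CHAR_ARRAY[i] = (char) i;
        }
        // ... except a-z, which map to A-Z.
        for (char c = 'a'; c <= 'z'; c++) {
            UPPER_CHAR_ARRAY[c] = (char) (c - ('a' - 'A'));
        }
    }

    private Latin1UpperTable() {}

    static String toUpperCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            // No zero-member check needed: every slot holds a valid result.
            if (c < 256) {
                chars[i] = UPPER_CHAR_ARRAY[c];
            }
        }
        return new String(chars);
    }
}
```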

That saves you a .length lookup in the loop, and a zero-member check. Now that I check the JVM spec, char is a 16-bit type, so I'm sure you could spare a 64K-entry array (about 128 KB as a char[])! Just assign all 64K array members the same value as their index, except for your lower case characters, and lose the range check:
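Again the original listing is not shown, so this is a minimal sketch of the 64K-entry idea under the same assumptions. `FullUpperTable` is a hypothetical name, and only a-z are remapped.

```java
// Hypothetical sketch: one table slot per possible char value, so no range check.
final class FullUpperTable {
    private static final char[] UPPER = new char[Character.MAX_VALUE + 1]; // 65,536 entries

    static {
        // Identity mapping for every char ...
        for (int i = 0; i < UPPER.length; i++) {
            UPPER[i] = (char) i;
        }
        // ... then overwrite the 26 lower case ASCII letters.
        for (char c = 'a'; c <= 'z'; c++) {
            UPPER[c] = (char) (c - ('a' - 'A'));
        }
    }

    private FullUpperTable() {}

    static String toUpperCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            // A char is always a valid index into a 65,536-entry array.
            chars[i] = UPPER[chars[i]];
        }
        return new String(chars);
    }
}
```

Whether the unconditional table lookup actually beats a simple `c >= 'a' && c <= 'z'` comparison would need measuring. The larger table is less cache-friendly, so the result could go either way.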

I did this for a search engine (only a hobby on my desk!) last year, but saved the pages in a 'canonical form', so I didn't need to uppercase them every time I needed to search. For your needs, you could just double up your storage and save an uppercased copy, or you could use javax.swing.text.html.HTMLEditorKit to strip out the non-text (it wasn't 100% for me, but not bad), uppercase what's left and compress the originals. That would probably save time and space, at the expense of a little bit of pre-processing.

Heh. That's enough! I often come to CodeRanch, never joined it until now. I'd better get back to looking for -Xprof...
Campbell Ritchie
Welcome to the Ranch Sean Collins.