Not all stories need to be success stories. Reality is also not like that. We would like to share a true, disappointing story (but a phenomenal learning experience) that may be beneficial to you.
This is a story about optimizing memory utilization of a web application. This application was configured with a lot of memory (4GB) just to service handful of transactions/sec. Thus, we set out to study the memory utilization patterns of this application. We captured heap dumps of this application using ‘jmap’ tool. We uploaded the captured heap dump to HeapHero tool. HeapHero is a heap dump analysis tool just like Eclipse MAT, JProfiler, Yourkit. HeapHero tool profiled the memory and provided statistics on total classes, total objects, heap size, histogram view of large objects residing in the memory. On top of these traditional metrics, HeapHero reported the total amount of memory wasted due to inefficient programming practices. In modern computing, considerable amount memory is wasted because of inefficient programming practices such as: Duplicate object creation, suboptimal data type definitions (declaring ‘double’ and assigning only ‘float’ values), over allocation and underutilization of data structures and several other practices.
This application was no exception to it. HeapHero reported that application is wasting 56% of memory due to inefficient programming practices. Yes, it’s eyebrow raising 56%. It reported that 30% of application’s memory is wasted because of duplicate strings.
Fig:HeapHero tool reporting amount of memory wasted due to inefficient programming
String Deduplication Since Java 8 update 20 a new JVM argument ‘-XX:+UseStringDeduplication’ was introduced. When an application is launched with this argument, JVM will eliminate the duplicate strings from the application’s memory during garbage collection. However please be advised that ‘-XX:+UseStringDeduplication’ argument will work only with G1 GC algorithm. You can activate G1 GC algorithm by passing ‘-XX:+UseG1GC’.
We got excited. We thought just by introducing ‘-XX:+UseG1GC -XX:+UseStringDeduplication’ JVM argument, we would be able to save 30% of memory without any code refactoring. Wow, isn’t it wonderful? To verify this theory, we conducted two different tests in our performance lab:
Test 2: Passing ‘-XX:+UseG1GC -XX:+UseStringDeduplication’
We enabled Garbage collection logs on the application to study the memory usage pattern. Analyzed Garbage Collection logs using the free online garbage collection log analysis tool – GCeasy. We were hoping that in the test run #2 we would be able to see 30% reduction in the memory consumption, because of elimination of duplicate strings. However, the reality was quite different. We didn’t see any difference in the memory usage. Both test runs were consistently showing the same amount of memory utilization. See the heap usage graphs generated by the GCeasy tool by analyzing the garbage collection logs.
Fig: GCeasy Heap usage graph with ‘-XX:+UseG1GC’
Fig: GCeasy heap usage graph with ‘-XX:+UseG1GC -XX:+UseStringDeduplication’
In Test run #1 heap usage hovering around 1500mb all through the test, in test run #2 also heap usage was hovering around 1500mb. Disappointingly we didn’t see the anticipated 30% reduction in the memory usage, despite introducing ‘-XX:+UseG1GC -XX:+UseStringDeduplication’ JVM arguments.
Why there wasn’t reduction in heap usage? ‘Why there wasn’t reduction in heap usage?’ – this question really puzzled us. Did we configure JVM arguments rightly? Doesn’t ‘-XX:+UseStringDeduplication’ do its job correctly? Is the analysis report from the GCeasy tool is correct? All these questions troubled our sleep. After detailed analysis, we figured out the bitter truth. Apparently ‘-XX:+UseStringDeduplication’ will eliminate duplicate strings that are present in the old generation of the memory only. It will not eliminate duplicate strings in the young generation. Java memory has 3 primary regions: young generation, old generation, Metaspace. Newly created objects go into the young generation. Objects that survived for longer period are promoted into the old generation. JVM related objects and metadata information are stored in Metaspace. Thus stating in other words ‘-XX:+UseStringDeduplication’ will only remove duplicate strings that are living for a longer period. Since this is a web application, most of the string objects were created and destroyed immediately. It was very clear from the following statistics reported in the GCeasy log analysis report:
Fig: Object creation/promotion stats reported by Gceasy
Average object creation rate of this application is: 44.93 mb/sec, whereas average promotion rate (i.e. from young generation to old generation) is only 918 kb/sec. It’s indicative that very small of the percentage of objects are long living. Even in these 918 kb/sec promoted objects, string objects are going to be a smaller portion. Thus the amount of duplicate strings removed by ‘-XX:+UseStringDeduplication’ was very negligible. Thus, sadly we didn’t see the expected reduction in memory.
(a). ‘-XX:+UseStringDeduplication’ will be useful only if application has a lot of long-lived duplicate strings. It wouldn’t yield fruitful results for applications when majority of the objects are short-lived. Unfortunately, most modern web applications, micro-service applications objects are short-lived.
(b). Another famous option recommended in the industry to eliminate duplicate strings is to use String#intern() function. However, String#intern() isn’t going to be useful for this application. Because in String#intern() you end up creating the string objects and then eliminating it right after. If a string is short-lived by nature, you don’t need to do this step, as regular garbage collection process will eliminate the strings. Also, String#intern() has a possibility to add (very little) latency overhead to the transaction and CPU overhead.
(c). Given the current situation best way to eliminate duplicate strings from the application is to refactor the code to make sure duplicate strings are not even created. HeapHero points out the code paths where a lot of duplicate of strings are created. Using those pointers, we are going to continue our journey to refactor the code to reduce memory consumption.
All of the world's problems can be solved in a garden - Geoff Lawton. Tiny ad:
RavenDB is an Open Source NoSQL Database that’s fully transactional (ACID) across your database