UseStringDeduplication – pros and cons

 
Let me start this article with an interesting statistic (based on the research conducted by the JDK development team):

+ 25% of a Java application's memory is filled with strings.
+ 13.5% of a Java application's memory is taken up by duplicate strings.
+ The average String length is 45 characters.

Yes, you read that right: on average, 13.5% of a Java application's memory is wasted on duplicate strings 😊. To figure out how much memory your own application is wasting because of duplicate strings, you may use tools like HeapHero, which can report how much memory is wasted due to duplicate strings and other inefficient programming practices.

What are duplicate strings?
First, let’s understand what a duplicate string is. Suppose a program creates two string objects, string1 and string2, that have the same contents, i.e. “Hello World”, but are stored in two different objects. string1.equals(string2) will return ‘true’, but ‘string1 == string2’ will return ‘false’. This is what we call duplicate strings.
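
A minimal sketch of that situation follows. It assumes the two strings are created with `new String(...)`, and the class and method names are only illustrative:

```java
final class DuplicateStringDemo {
    // Creates two distinct String objects that hold the same characters.
    static String[] makeDuplicates() {
        String string1 = new String("Hello World");
        String string2 = new String("Hello World");
        return new String[] {string1, string2};
    }

    public static void main(String[] args) {
        String[] pair = makeDuplicates();
        // Same contents, so equals() reports true.
        System.out.println(pair[0].equals(pair[1]));
        // Different objects, so the reference comparison reports false.
        System.out.println(pair[0] == pair[1]);
    }
}
```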

Why are there so many duplicate strings?
There are several reasons why an application ends up with a lot of duplicate strings. In this section, let’s review the two most common patterns:

# 1. Developers create new string objects for every request instead of referencing/reusing a ‘public static final’ String literal. The ‘Hello World’ example above is better written with the String literal pattern, in which every use refers to one shared constant.
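
A minimal sketch of the String literal pattern; the class, constant, and method names here are hypothetical:

```java
final class Greetings {
    // One shared constant; every caller reuses the same String object.
    public static final String HELLO_WORLD = "Hello World";

    // Stands in for per-request code that needs the greeting text.
    static String greetingForRequest() {
        return HELLO_WORLD;
    }

    public static void main(String[] args) {
        String string1 = greetingForRequest();
        String string2 = greetingForRequest();
        // Both variables point at the same object, so no duplicate is created.
        System.out.println(string1 == string2);
    }
}
```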



# 2.
Suppose you are building a banking/e-commerce application and you store the currency (i.e. ‘USD’, ‘EUR’, ‘INR’, …) for every transaction record in the database. Say a customer logs in to your application and views the transaction history page. Your application will read all transactions pertaining to this customer from the database. Suppose this customer lives in the USA (so most, if not all, of their transactions are in USD). Since every transaction record has a currency, your application will create a ‘USD’ string object for every transaction record read from the database. If this customer has thousands of transactions, you will end up with thousands of duplicate ‘USD’ string objects in memory, and that is for just this one customer.
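
The currency scenario can be imitated without a database. The sketch below is only a simulation: `readCurrencies` is a hypothetical stand-in for code that maps database rows, and it assumes each row yields a freshly created String, which is how decoded column values typically arrive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class CurrencyDuplicates {
    // Simulates reading `rows` transaction records; each one gets its own "USD" object.
    static List<String> readCurrencies(int rows) {
        char[] raw = {'U', 'S', 'D'};
        List<String> currencies = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            currencies.add(new String(raw)); // a new String object per record
        }
        return currencies;
    }

    // Counts String objects by identity (==), not by contents.
    static int distinctObjects(List<String> values) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(values);
        return identities.size();
    }

    // Counts distinct contents (equals/hashCode).
    static int distinctValues(List<String> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        List<String> currencies = readCurrencies(1000);
        System.out.println("String objects:  " + distinctObjects(currencies));
        System.out.println("Distinct values: " + distinctValues(currencies));
    }
}
```

The gap between the two counts is the number of duplicate objects held in memory for this one customer.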

Similarly, your application could be reading multiple columns (customer name, address, state, country, account number, IDs, …) from databases multiple times, and there could be duplicates among them. If your application reads and writes XML/JSON with external applications, it manipulates a lot of strings. All these operations can/will create duplicate strings.

The JDK team has recognized this problem since Java’s origin (mid 1990s) and has come up with multiple solutions so far. The latest addition to this list is ‘-XX:+UseStringDeduplication’.

-XX:+UseStringDeduplication
The least-effort way to eliminate duplicate strings is to pass the ‘-XX:+UseStringDeduplication’ JVM argument. When you pass this argument at application startup, the JVM tries to eliminate duplicate strings as part of the garbage collection process. During garbage collection the JVM inspects all the objects in memory; as part of that process, it tries to identify duplicate strings among them and eliminate them.

Does that mean that just passing the ‘-XX:+UseStringDeduplication’ JVM argument will immediately save 13.5% of memory? Sounds very easy, right? We wish it were that easy, but there are some catches to this solution. Let’s discuss them.
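
To confirm that the argument actually reached the JVM, one option is to inspect the startup arguments from inside the application. This is a rough sketch with a hypothetical class name; it only checks for the literal flag among the arguments the JVM reports and does not prove that deduplication is active.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class DedupFlagCheck {
    static final String FLAG = "-XX:+UseStringDeduplication";

    // True if the literal flag appears among the given JVM arguments.
    static boolean isRequested(List<String> jvmArgs) {
        return jvmArgs.contains(FLAG);
    }

    public static void main(String[] args) {
        // Example launch (after compiling):
        //   java -XX:+UseG1GC -XX:+UseStringDeduplication DedupFlagCheck
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("String deduplication requested: " + isRequested(jvmArgs));
    }
}
```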


(1). Works only with G1 GC algorithm

There are several garbage collection algorithms (Serial, Parallel, CMS, G1,…). In Java 8, ‘-XX:+UseStringDeduplication’ works only with the G1 GC algorithm, so if you are using some other GC algorithm, you need to switch to G1 to use it. (JDK 18 and later extended string deduplication to additional collectors.)
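
If you are not sure which collector your application runs with, the standard management API can list the active collectors. The sketch below assumes that HotSpot reports G1 collector names beginning with "G1" (for example "G1 Young Generation"); that naming is a JVM implementation detail, not a guaranteed contract, and the class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class CollectorCheck {
    // Assumption: G1 collector beans have names that start with "G1".
    static boolean looksLikeG1(List<String> collectorNames) {
        for (String name : collectorNames) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    // Names of the collectors the running JVM reports.
    static List<String> currentCollectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = currentCollectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("Looks like G1: " + looksLikeG1(names));
    }
}
```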

(2). Works only on long lived objects
‘-XX:+UseStringDeduplication’ eliminates duplicates only among strings that live for a longer period of time; it doesn’t eliminate duplicates among short-lived string objects. If objects are short-lived, they are going to die soon, so there is little point in spending resources to deduplicate them. A real-life case study conducted on a major Java web application didn’t show any memory relief when ‘-XX:+UseStringDeduplication’ was used. However, ‘-XX:+UseStringDeduplication’ can be of value if your application has a lot of caches (since cached objects typically tend to be long-lived).

(3). -XX:StringDeduplicationAgeThreshold
By default, strings become eligible for deduplication once they have survived 3 GC runs. This can be changed by passing the ‘-XX:StringDeduplicationAgeThreshold’ JVM argument, for example ‘-XX:StringDeduplicationAgeThreshold=5’.



(4). Impact on GC Pause Times
Since String Deduplication is performed during garbage collection, it has the potential to impact GC pause time. However, the assumption is that a high enough deduplication success rate will balance out most or all of this impact, because deduplication can reduce the amount of work needed in other phases of a GC pause (such as fewer objects to evacuate) as well as reduce the GC frequency (due to reduced pressure on the heap). To analyze the impact on GC pause time, you may consider using tools like GCeasy.

(5). Only underlying char[ ] is replaced
In Java 8, the java.lang.String class has two fields: a char[ ] named value, which holds the characters, and an int named hash, which caches the hash code.

‘-XX:+UseStringDeduplication’ doesn’t eliminate the duplicate string object itself. It only replaces the underlying char[ ]. Deduplicating a String object is conceptually just a re-assignment of the value field, i.e., aString.value = anotherString.value.
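
The re-assignment can be modeled in plain Java. The sketch below is a conceptual model only: `ModelString` is a hypothetical stand-in for java.lang.String, and the real work happens inside the JVM, not in application code.

```java
import java.util.Arrays;

final class DedupModel {
    // Hypothetical stand-in for java.lang.String with its two Java 8 fields.
    static final class ModelString {
        char[] value;
        int hash;

        ModelString(String contents) {
            this.value = contents.toCharArray();
        }
    }

    // If both objects hold equal characters in separate arrays,
    // point the duplicate at the canonical array. Returns true if it did so.
    static boolean deduplicate(ModelString canonical, ModelString duplicate) {
        if (canonical.value != duplicate.value
                && Arrays.equals(canonical.value, duplicate.value)) {
            duplicate.value = canonical.value; // aString.value = anotherString.value
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ModelString a = new ModelString("USD");
        ModelString b = new ModelString("USD");
        System.out.println("Deduplicated: " + deduplicate(a, b));
        // Two objects still exist, but they now share one character array.
        System.out.println("Same array: " + (a.value == b.value));
        System.out.println("Same object: " + (a == b));
    }
}
```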

Each string object takes at least 24 bytes (the exact size of a string object depends on the JVM configuration, but 24 bytes is a minimum). Thus, this feature saves less memory if there are a lot of short duplicate strings, because the per-object overhead remains after deduplication.
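
A rough back-of-the-envelope estimate shows why short strings benefit less. The numbers below are assumptions, not measurements: a 16-byte array header, 2 bytes per char, and 8-byte object alignment (typical of a 64-bit Java 8 HotSpot JVM with compressed references), plus the 24-byte String object mentioned above.

```java
final class DedupSavingsEstimate {
    static final int STRING_OBJECT_BYTES = 24; // minimum quoted in the text
    static final int ARRAY_HEADER_BYTES = 16;  // assumption, see lead-in
    static final int BYTES_PER_CHAR = 2;       // char[] storage in Java 8
    static final int ALIGNMENT = 8;            // assumption, see lead-in

    // Estimated size of a char[] of the given length, rounded up to the alignment.
    static long charArrayBytes(int length) {
        long raw = ARRAY_HEADER_BYTES + (long) BYTES_PER_CHAR * length;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Fraction of one duplicate's footprint that deduplication can reclaim:
    // the array goes away, the String object stays.
    static double savedFraction(int length) {
        long array = charArrayBytes(length);
        return (double) array / (array + STRING_OBJECT_BYTES);
    }

    public static void main(String[] args) {
        System.out.printf("3 chars:  array %d bytes, reclaimable %.0f%%%n",
                charArrayBytes(3), 100 * savedFraction(3));
        System.out.printf("45 chars: array %d bytes, reclaimable %.0f%%%n",
                charArrayBytes(45), 100 * savedFraction(45));
    }
}
```

Under these assumptions, a duplicate 3-character string such as ‘USD’ gives back about half of its footprint, while a duplicate of average length (45 characters) gives back roughly four fifths.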

(6). Java 8 update 20

The ‘-XX:+UseStringDeduplication’ feature is supported only from Java 8 update 20 onwards. Thus, if you are running an older version of Java, you will not be able to use this feature.

(7). -XX:+PrintStringDeduplicationStatistics
If you would like to see String deduplication statistics, such as how long it took to run, how many duplicate strings were eliminated, and how much memory you saved, you may pass the ‘-XX:+PrintStringDeduplicationStatistics’ JVM argument. The statistics will be printed in the error console.

Conclusion:
If your application is using G1 GC and running on Java 8 update 20 or later, you may consider enabling ‘-XX:+UseStringDeduplication’. You might get fruitful results, especially if there are a lot of duplicate strings among long-lived objects. However, do thorough testing before enabling this argument in a production environment.



 