Why OOME when populating a List from a file, but not from new String()?

 
Greenhorn
Posts: 2
Given the test case below:
The first block succeeds in populating a List with 5 million String objects.
The second block fails with an OutOfMemoryError when trying to populate the List with 5 million String objects.

The question is:  Why?

The only difference, as far as I understand, is:
In block one, the hard-coded String is added in a loop 50 million times,
and
in block two, 50 million lines are added as Strings to the List.

Both blocks add the same String objects, in terms of size - and both are new String objects, not references.

The only thing I can think of is I/O object overhead, i.e., all 5 million line objects are held on the heap until Files.readAllLines() completes.

I guess I am missing a fundamental concept here.

PS: I know ways to do what I need to do (process in batches, increase the heap, etc.), but I'd like to understand the underlying reason in my use case.

OOMExample.java
--------------------
import java.util.*;
import java.nio.file.*;   // Path, Paths and Files live in java.nio.file, not java.nio

public class OOMExample {

    public static void main(String[] args) throws java.io.IOException {

        // File containing 5 million lines of the same string (26 letters in the alphabet)
        String fname = "/home/user/5MillionsLinesOfAlphabet.dat";
        List<String> list = new ArrayList<>();

        // Populate List with 5 million new Strings (not references)
        // Result: No OOME.
        String str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        for (int i = 0; i < 5000000; i++) {
            list.add(new String(str));
        }
        System.out.println("List size: " + list.size());
        list.clear();

        // Read file into List.
        // Result: OOME with -Xmx512m
        Path path = Paths.get(fname);
        list = Files.readAllLines(path);
        System.out.println("List size: " + list.size());
    }
}

makedataset.sh
-----------------
#!/bin/bash
for ((i=1;i<=5000000;i++));
do
   echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  >> 5MillionsLinesOfAlphabet.dat
done

 
Marty Moose
Greenhorn
Posts: 2
Corrections to description:

In a few places I wrote 50 million. The use case is 5 million in all scenarios.

< 50 million times
> 5 million times.
 
author
Sheriff
Posts: 23586
One possibility: the 5 million strings in the file are, on average, longer than the one string that you are copying from the string pool. (EDIT: Okay, I didn't read the original post completely; you can ignore this first part.)

And...

Marty Moose wrote:
Both blocks add the same String objects, in terms of size - and both are new String objects, not references.



This isn't quite true. Yes, you are creating new String objects. However, since Strings are immutable, the character array that holds the characters internally never changes, so I believe that, as an implementation detail, the String(String) constructor simply reuses the same character array. In other words, the new String objects all hold references to the same backing array, whereas each line read from the file gets a backing array of its own.
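To put rough numbers on that difference, here is a back-of-the-envelope sketch. The object sizes are assumptions, not something stated in this thread: a 64-bit HotSpot JVM with compressed references and the Java 8 String layout (a char[] reference plus a cached int hash). The class and method names are made up for illustration, and the ArrayList's own reference array is not counted.

```java
public class StringFootprintEstimate {

    // Assumed sizes: 64-bit HotSpot, compressed oops, Java 8 String layout.
    static final long OBJECT_HEADER = 12; // mark word + compressed class pointer
    static final long ARRAY_HEADER = 16;  // object header + 4-byte array length
    static final long REFERENCE = 4;      // compressed reference
    static final long ALIGNMENT = 8;      // objects are padded to 8-byte multiples

    // Round a size up to the next multiple of the alignment.
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // The String object itself: header + reference to char[] + int hash.
    static long stringObjectBytes() {
        return align(OBJECT_HEADER + REFERENCE + 4);
    }

    // The backing char[]: array header + 2 bytes per char.
    static long charArrayBytes(int length) {
        return align(ARRAY_HEADER + 2L * length);
    }

    // All Strings point at one shared backing array (the new String(str) case).
    static long sharedArrayTotal(long count, int length) {
        return count * stringObjectBytes() + charArrayBytes(length);
    }

    // Every String owns its backing array (the lines-read-from-a-file case).
    static long privateArrayTotal(long count, int length) {
        return count * (stringObjectBytes() + charArrayBytes(length));
    }

    public static void main(String[] args) {
        long count = 5_000_000L;
        int length = 26;
        System.out.println("Shared array:   "
                + sharedArrayTotal(count, length) / 1_000_000 + " MB");
        System.out.println("Private arrays: "
                + privateArrayTotal(count, length) / 1_000_000 + " MB");
    }
}
```

Under those assumptions the estimate comes out at roughly 120 MB for the shared case and 480 MB for the private case, which would explain why only the second block runs into a 512 MB limit. A different JVM version or different flags would shift the numbers.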

As a suggestion, instead of using the constructor that takes a String, how about using one that takes something mutable, such as a character array? The String class can't assume that the array won't change, so it has to make its own copy of the characters.
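A minimal sketch of that suggestion, in case it helps. The class name, the fillWithCopies helper and the command-line argument are made up for illustration; the only library call that matters is the String(char[]) constructor, which copies the array's contents into the new String.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyingStrings {

    // Builds a list of `count` Strings, each created through the char[]
    // constructor so that every String gets its own backing storage.
    static List<String> fillWithCopies(String template, int count) {
        List<String> list = new ArrayList<>(count);
        char[] chars = template.toCharArray();
        for (int i = 0; i < count; i++) {
            list.add(new String(chars)); // copies the characters each time
        }
        return list;
    }

    public static void main(String[] args) {
        // Small default so the sketch runs anywhere; pass 5000000 to
        // mirror the test case from the original post.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        List<String> list = fillWithCopies("ABCDEFGHIJKLMNOPQRSTUVWXYZ", count);
        System.out.println("List size: " + list.size());
    }
}
```

Running it as `java -Xmx512m CopyingStrings 5000000` should make the first block of the original test case behave much more like the readAllLines() block, since the characters are no longer shared.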

Henry
 