This week's book giveaway is in the Kotlin forum.
We're giving away four copies of Kotlin in Action and have Dmitry Jemerov & Svetlana Isakova on-line!
See this thread for details.
Win a copy of Kotlin in Action this week in the Kotlin forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

LZ78 and FileReader  RSS feed

 
Charlie Murphy
Greenhorn
Posts: 9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
hi all, i am making a compression program based on the original LZ78 and am having a problem getting my String variable that holds a line of a file to read in another line after the 1st one. It processes the first line fine but it simply wont continue reading in new lines after exiting the inner loop. i get a StringIndexOutOfBounds exception as a result of not having any more input to read at runtime. heres my code:



how do i make it continue reading new lines in after the 1st one is processed?
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You are reading in the second line, the error probably occurs at this line 28 (in the post), where you are getting the charAt(i). The cause is: you never reset i to 0, so you are trying to read characters from the index where the previous line ends, not from the start of the new line.

So in between your while statements you should re-set i to 0:


You might also consider using a for loop to get the characters in the String:


You will also probably have to reset the value of other variables. You have a lot of variables that are intialized outside of the first while loop, and which you continue to append to in the inner while loop. I don't know exactly how your compression works, but I would suggest moving the variables w, tuple, node, and K to their smallest possible scope. For example, K only needs to exist inside the inner loop (if it needs to exist at all). And I think the others only need to exist for each line. So you might change the code to something like this:
 
Rob Spoor
Sheriff
Posts: 21090
85
Chrome Eclipse IDE Java Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I think the I/O forum is a better place for this. Moving it there.
 
Charlie Murphy
Greenhorn
Posts: 9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Steve Luke wrote:You are reading in the second line, the error probably occurs at this line 28 (in the post), where you are getting the charAt(i). The cause is: you never reset i to 0, so you are trying to read characters from the index where the previous line ends, not from the start of the new line.


Thanks Steve that fixed it! It obvious your a gun :mrgreen:

You wouldnt happen to know what the Exception is for Out Of Memory would you?

i looked through the java api, but couldnt find anything. currently my code is:



but as i found out, that goes to the catch block with any kind of runtime exception (ie. NullPointer, SIOutOfBounds etc)

i just want the one for Memory!
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
OUtOfMemoryError is an Error, not an Exception. Errors tend to be un-recoverable, so you generally don't catch them.

OOME means that you have consumed all the memory available to the JVM - probably because wherever you are storing your values never releases resources. You can't fix it at runtime. You can increase the memory available to the JVM, see this 'Tuning' guide. You should also take a long hard look at the code and see if there isn't a memory leak in your code - where you store and hold values that should be thrown away and garbage collected.
 
Charlie Murphy
Greenhorn
Posts: 9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Steve Luke wrote:OUtOfMemoryError is an Error, not an Exception. Errors tend to be un-recoverable, so you generally don't catch them.


I see.

heres a screenshot of the dictionary that my program stores:



the insertion basically goes:

1. Seen a B? no, add to dictionary, output <0, B>
2. Seen a A? no, add to dictionary, output <0, A>
3. Seen a D? no, add to dictionary, output <0, D>
4. Seen a B? yes, add to String word, move along
5. Seen a B followed by an A? no, add BA to dictionary output <1, A>
6. Seen a D? yes, add to String word, move along
7. Seen a D followed by a G? no, add DG to dictionary output <3, G>

and so on. So the dict just keeps growing as new words get added.

There must be something i can do as most publications about LZ78 state the 2 main ways to prevent the dictionary from growing forever are to a. limit the size of the dictionary or b. check to see if theres available memory to insert another item (in my case a node containing a String, an Integer and 2 pointers).

while a. is an obvious solution and easy to implement, it means when deciding on the MAXSIZE of the dictionary, I have to ensure that I select a size that caters to all memory sizes. There are so many people around still using Celerons with 256MB of memory that I would have to use that as a basis and declare a size that will fit into that.

but more elegant is a program that extends to the full capability of the system your working on - if your working on a quadcore with 4GB of memory or an Amstrad it wont matter - the program will still run ( granted it might take a few years to run on the Amstrad, but the point stands)

what i basically want in pseudocode is:



I see what your saying about memory leaks and while my program is far from robust, Thats not the reason i want to throw the exception. I want the dictionary to be a memory hog while its compressing so i get the best possible compression ratio and only throw away the dictionary when i dont have a choice.
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
"but more elegant is a program that extends to the full capability of the system your working on "

Java will only consume as much memory as you allow it to using the command line arguments. The only way to make it consume all the resources of the system would be to set the value to something reasonably high. See the tuning article I linked to above to see how to assign more memory to the JVM.

Once you have assigned an appropriate amount of memory you can look into monitoring available memory. See using MemoryMXBeans java.lang.management.MemoryMXBean to check available versus used memory. Another option would be to use Runtime, like this blog shows, or an external JAR such as SIGAR.

The problem with doing a check like isThereEnoughMemoryLeft(); is that you may not know the amount of memory your node will take. If the String being stored has been interned then the memory usage might be (Integer) + (Integer) + 2(Integer) - or next to nothing, just a reference for the String, the Integer value, and the other 2 references. If the String has not been interned then the memory usage might be [(Integer) + (Integer) + (Byte * length of String)] + (Integer) + 2(Integer) (the String is internally a character array. So your node would have a reference to the String, the String a reference to the character array, and the character array a slot for each character in the String), which could be relatively small to relatively large, depending on the size of the String - you wouldn't know until the String is made, and it could be at that point that you run out of memory.

Another option would be to get a warning about when a specific amount of total available memory for your application has been reached. Let's say if you get within 1024 bytes of the maximum memory, then dump some/part of your dictionary. I think threshold notification is built into the MXBean API, and could be implemented as a polling service otherwise.


There are a lot of elegant solutions for maintaining large amounts of data. Everything from holding a portion of the hard-disk as a memory-mapped area (slow but could be nearly infinite in size) to time-managed caches where nodes that haven't been used in a while get re-assigned to softer references that are more likely to be garbage collected when memory runs low. This may save you from having to completely dump the dictionary. You should Google Java Object Caching and see if there is a good implementation for you (or at least look up Java Soft Reference to see if that helps).
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!