Help coderanch get a
new server
by contributing to the fundraiser
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Overhead of Collection Classes

 
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi there!

So i am currently preparing for a class that I am taking starting in January and I need to decide on a language of choice for the quarter. Java is naturally my instinctual choice because it is what I know and understand best. With that, I am unsure of the limitations of Java with respect to Java Collections.

In particular, suppose we have a defined matrix of indexes and documents. The indexes can be thought of words and the documents can be thought of as text documents. I want to be able to perform a search on every document in my document collection for each index in my dictionary (words). This being a very computationally heavy (and potentially memory heavy) task, I am unsure if utilizing classes from the Collections classes make sense in doing so. I am accustomed to using Java for higher level computation. This is more lower level computation and I was curious if anyone had any input with respect to how Collections (List, Set, etc . . .) perform in this environment.

I appreciate everyone's time and effort in answering this question.

Anthony

** Maybe this should have gone in the General Java forum? I am sorry if I posted in the wrong area. **
 
Marshal
Posts: 79535
380
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
A "defined matrix" sounds a bit like a Map. You can find out the overhead for the different types of Collection from their implementations. For example,

The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.

appears in the ArrayList documentation but there is rather less about LinkedList

The memory usage of the Collections classes is probably less than you think; there was a post about a similar topic yesterday, here. I don't know about the speed, nor about the memory consumption, of the low-level searches. I suspect you will go into quadratic complexity if you do straight searches. What about a Map whose key is a word and whose value is a List of locations where that word was found. Then you can search your documents in linear time, insert those values into your Map in amortised constant time, and search in constant time. You can get Maps which work in two directions somewhere: try Apache Commons first.
 
Campbell Ritchie
Marshal
Posts: 79535
380
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Yes, I think "Java in General" would have been better. Moving.
 
Ranch Hand
Posts: 92
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Anthony Andras wrote:
In particular, suppose we have a defined matrix of indexes and documents. The indexes can be thought of words and the documents can be thought of as text documents. I want to be able to perform a search on every document in my document collection for each index in my dictionary (words).



So you want to know which words in a document are also present in the dictionary, right?

The best way is to hold the dictionary in a HashSet. Then checking whether a word is in the HashSet is a constant, or O(1), operation.

This means that walking through a document word by word and checking whether each word is in the dictionary will be a linear, or O(N), operation. This is because the actual "is the word in the dictionary"-check is O(1).

This is as fast as it gets really and you can do it in Java easily.
 
And then the flying monkeys attacked. My only defense was this tiny ad:
We need your help - Coderanch server fundraiser
https://coderanch.com/t/782867/Coderanch-server-fundraiser
reply
    Bookmark Topic Watch Topic
  • New Topic