
removing numerics from a TreeSet

 
Bob Matthews
Ranch Hand
Posts: 136
I am using a TreeSet to tokenize a string.
The output is sorted with numerics first, followed by words,
e.g. 13 26 45 and before etc.

Is there a tidy way to remove the numerics?

The last bit of my code is:

    // ... earlier code (not shown) opens the try block and fills the TreeSet "words"

    // print the words, separating them with a space
    for (String word : words) {
        System.out.print(word + " ");
    }
} catch (FileNotFoundException fnfe) {
    System.err.println("Cannot read the input file - pass a valid file name");
}

Bob M
 
Ivan Jozsef Balazs
Rancher
Posts: 999
It is a problem to modify a collection while traversing it (for example, calling remove inside a for-each loop), because that will typically throw a ConcurrentModificationException.

A simple way to avoid this in our case is to work in two steps: first collect the elements to remove into a separate list,
then remove them from the set with the "remove" method declared in the Set interface.
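A minimal sketch of the two-step approach, assuming the tokens are already in a TreeSet<String> and that "numeric" means a token made only of the digits 0 to 9. The class and method names are hypothetical. If the set contains no numerics, the list stays empty and nothing is removed, so the same code works whether or not a given string has numbers in it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TwoStepRemoval {
    // Removes every element that consists only of ASCII digits.
    // Step 1 only reads the set; step 2 modifies it after the iteration
    // has finished, so no ConcurrentModificationException can occur.
    static void removeDigitStrings(Set<String> words) {
        List<String> toRemove = new ArrayList<>();
        for (String word : words) {              // step 1: collect
            if (word.matches("[0-9]+")) {
                toRemove.add(word);
            }
        }
        for (String word : toRemove) {           // step 2: remove by value
            words.remove(word);
        }
    }

    public static void main(String[] args) {
        Set<String> words = new TreeSet<>(Arrays.asList("13", "26", "45", "and", "before"));
        removeDigitStrings(words);
        System.out.println(words); // prints [and, before]
    }
}
```

Step 2 could equally be written as a single `words.removeAll(toRemove)` call.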


 
Bob Matthews
Ranch Hand
Posts: 136
OK

Do you mean "remove element by value"?

My problem is that I am repeatedly doing this exercise with a different string each time, and I do not know whether it contains any numerics or not.

If it doesn't, I don't need to do anything further, but if it does, I want to remove them.

Bob M
 
Ivan Jozsef Balazs
Rancher
Posts: 999
Is your problem how to remove something from the set or how to pick the entries to be removed or both?
 
Winston Gutkowski
Bartender
Posts: 10575
Bob Matthews wrote: I am using a TreeSet to tokenize a string.
The output is sorted with numerics first, followed by words,
e.g. 13 26 45 and before etc.

I suspect that's because it's simply using String's natural order, which compares character values one by one and will place "numerics" first because the digit characters '0' to '9' are lower in the collating sequence than letters.

However, that has nothing to do with whether the String is a valid number or not - it will place "1A" before "AA" as well.
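To see the natural order at work, here is a small demo with made-up sample words. Note that it also puts every capitalised word before every lower-case one, which may matter for headline text.

```java
import java.util.Arrays;
import java.util.TreeSet;

class NaturalOrderDemo {
    public static void main(String[] args) {
        // String.compareTo compares char values one by one:
        // '0'..'9' are 48..57, 'A'..'Z' are 65..90, 'a'..'z' are 97..122.
        TreeSet<String> words = new TreeSet<>(
                Arrays.asList("before", "AA", "1A", "45", "and", "Zebra"));
        System.out.println(words); // prints [1A, 45, AA, Zebra, and, before]
    }
}
```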

Bob Matthews wrote: Is there a tidy way to remove the numerics?

Well, one way would be to iterate through the Set and remove any word whose first character is not a letter. You might want to look at the Character class API to see how you might do that.
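One possible sketch of that idea, using an explicit Iterator (whose remove() is the safe way to delete during iteration) and Character.isLetter. The class and method names are hypothetical, and "first character is not a letter" is taken literally, so "10A" is removed too.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

class RemoveNonLetterStarts {
    // Removes every word whose first character is not a letter.
    // Iterator.remove() deletes the element just returned by next(),
    // which avoids the ConcurrentModificationException that calling
    // words.remove(...) inside a for-each loop would typically cause.
    static void removeNonWords(Set<String> words) {
        for (Iterator<String> it = words.iterator(); it.hasNext(); ) {
            String word = it.next();
            if (word.isEmpty() || !Character.isLetter(word.charAt(0))) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Set<String> words = new TreeSet<>(
                Arrays.asList("12", "435", "10A", "as", "before", "criteria"));
        removeNonWords(words);
        System.out.println(words); // prints [as, before, criteria]
    }
}
```

On Java 8 or later the loop can be shortened to `words.removeIf(w -> w.isEmpty() || !Character.isLetter(w.charAt(0)))`, which uses the same iterator-based removal internally.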

However, an even better idea might be not to put those "numerics" into the Set to begin with.

Programming is a bit like medicine in that respect: "Prevention" is usually far better than "cure".

You might also want to have a look at the StringsAreBad page.

HIH

Winston
 
Campbell Ritchie
Marshal
Posts: 56525
Turn the array into a Stream and filter it, possibly with a regex that matches numbers.
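A minimal sketch of the Stream approach (Java 8+), assuming the tokens come from splitting the text on whitespace and that a "number" is a run of digits. The helper name is hypothetical.

```java
import java.util.Arrays;
import java.util.TreeSet;
import java.util.stream.Collectors;

class StreamFilter {
    // Hypothetical helper: split on whitespace, drop empty tokens and
    // digit-only tokens, and collect the rest into a sorted TreeSet.
    static TreeSet<String> wordsWithoutNumbers(String text) {
        return Arrays.stream(text.split("\\s+"))
                .filter(w -> !w.isEmpty())        // leading whitespace yields ""
                .filter(w -> !w.matches("\\d+"))  // drop digit-only tokens
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(wordsWithoutNumbers("12 criteria as 435 before"));
        // prints [as, before, criteria]
    }
}
```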
 
Bob Matthews
Ranch Hand
Posts: 136
My input is a string of headline text.
After applying my TreeSet code I end up with a string such as "12 435 as before criteria ..."

I would rather not change the input string but leave it as it is.
I am happy with the ordered output string.

Now, all I want to do is remove the "12 435 " from the left side of the output string.

My code so far is the following:



I'm just not sure how to finish the task.

Bob M
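Taking the request literally, one possible sketch strips the leading run of digit-only tokens from the finished output string with a regex. It assumes, as in the printing loop above, that every word is followed by a space, and it only catches pure digit runs ("10A" would stay). The class and method names are hypothetical; filtering before adding to the set, as suggested in the replies, is usually the cleaner fix.

```java
class StripLeadingNumbers {
    // Removes one or more leading "digits followed by whitespace" tokens.
    // A string with no leading numbers is returned unchanged.
    static String stripLeadingNumbers(String output) {
        return output.replaceFirst("^(\\d+\\s+)+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingNumbers("12 435 as before criteria "));
        // prints "as before criteria " (trailing space kept)
    }
}
```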
 
Paul Clapham
Sheriff
Posts: 22819
Seems to me the best thing would be to just not put those non-words into the TreeSet.
 
Bob Matthews
Ranch Hand
Posts: 136
And how do I keep digits out of the TreeSet?

Bob M
 
Paul Clapham
Sheriff
Posts: 22819
Before you add a word to the TreeSet, examine it to see if it's numeric. If it is, then don't add it. You already have the logic in your code, but it only rejects empty strings. Change it to reject numeric strings instead.
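A minimal sketch of "check before adding". Since the original tokenizing code is not shown, this assumes the words come from splitting the text on whitespace, and it treats a word as numeric when every character is a digit. The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

class FilterBeforeAdding {
    // Hypothetical tokenizer loop: only non-empty, non-numeric tokens are
    // ever added, so nothing has to be removed from the set afterwards.
    static Set<String> collectWords(String headlines) {
        Set<String> words = new TreeSet<>();
        for (String word : headlines.split("\\s+")) {
            if (!word.isEmpty() && !isNumeric(word)) {
                words.add(word);
            }
        }
        return words;
    }

    // "Numeric" here means every character is a digit. allMatch is true for
    // an empty string, which is why collectWords checks isEmpty() first.
    static boolean isNumeric(String word) {
        return word.chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        System.out.println(collectWords("12 as 435 before criteria")); // prints [as, before, criteria]
    }
}
```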
 
Bob Matthews
Ranch Hand
Posts: 136
Hi

How about this?

if((!word.equals("")) && (!word.matches("[0-9]+")))


Bob M
 
Campbell Ritchie
Marshal
Posts: 56525
That will find “natural numbers” (unsigned runs of digits such as 12) all right, but not real numbers such as 3.14, nor all integers (a negative one such as -5 will not match).
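For comparison, here is a sketch of a wider pattern that also accepts an optional sign and an optional fractional part. What counts as a "number" is an assumption; this pattern still ignores exponents ("1e6") and thousands separators ("1,000").

```java
import java.util.regex.Pattern;

class NumberPatterns {
    // Digits only: matches "12", but not "-5" or "3.14".
    static final Pattern NATURAL = Pattern.compile("[0-9]+");
    // Optional sign, digits, optional fractional part: also matches "-5", "+7", "3.14".
    static final Pattern DECIMAL = Pattern.compile("[-+]?[0-9]+(\\.[0-9]+)?");

    public static void main(String[] args) {
        for (String token : new String[] {"12", "-5", "3.14", "1,000", "10A"}) {
            System.out.printf("%-6s natural=%b decimal=%b%n", token,
                    NATURAL.matcher(token).matches(),
                    DECIMAL.matcher(token).matches());
        }
    }
}
```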
 
Campbell Ritchie
Marshal
Posts: 56525
Don't use word.equals(""); use word.isEmpty() instead.
 
Winston Gutkowski
Bartender
Posts: 10575
Bob Matthews wrote: Now, all I wish to do is to remove the "12 435 " from the left side of the output string...

OK, but what if your 'headline' is "Murder at 10A Rillington Place"?

What is "10A" in that case? A "numeric"? It'll certainly be sorted like one, i.e. it'll sort before any normal English word.

Do you want "10A" to be removed from your TreeSet? And what about "A1" (as in: "IMF upgrades US credit rating to A1")?

This is what I mean by 'numbers' - a String is NOT a number, and never will be: "37" + "37" is "3737"; not "74".
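A short illustration of that point: a String only becomes a number once it is parsed. The isInteger helper is a hypothetical example of testing "is this token a valid number?" by trying to parse it, rather than by looking at where it sorts.

```java
class StringVsNumber {
    // Returns true only if the whole token parses as an int (within int range);
    // tokens such as "10A" or "A1" that merely contain digits return false.
    static boolean isInteger(String token) {
        try {
            Integer.parseInt(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("37" + "37");                                     // prints 3737
        System.out.println(Integer.parseInt("37") + Integer.parseInt("37")); // prints 74
        System.out.println(isInteger("10A") + " " + isInteger("A1"));       // prints false false
    }
}
```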

What you are seeing is the natural sort order for Strings, and you will have to decide exactly what kinds of Strings you want removed from your Set, and (possibly more importantly), what ones you want to keep.
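As a sketch of how that decision changes the result, here are two hypothetical "keep" policies applied to tokens from the examples above. Neither is presented as the right one; the point is that "10A" and "A1" land differently depending on the rule chosen.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class KeepPolicies {
    // Policy 1: keep tokens that start with a letter ("A1" kept, "10A" dropped).
    static final Predicate<String> STARTS_WITH_LETTER =
            t -> !t.isEmpty() && Character.isLetter(t.charAt(0));
    // Policy 2: keep tokens that contain at least one letter (both kept).
    static final Predicate<String> HAS_ANY_LETTER =
            t -> t.chars().anyMatch(Character::isLetter);

    static List<String> keep(List<String> tokens, Predicate<String> policy) {
        return tokens.stream().filter(policy).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("12", "10A", "A1", "Rillington");
        System.out.println(keep(tokens, STARTS_WITH_LETTER)); // prints [A1, Rillington]
        System.out.println(keep(tokens, HAS_ANY_LETTER));     // prints [10A, A1, Rillington]
    }
}
```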

Winston
 