I am using a TreeSet to tokenize a string. The output is sorted with numerics first, followed by words,
e.g. 13 26 45 and before etc.
Is there a tidy way to remove the numerics?
The last bit of my code is:
        // print the words, separating them with a space
        for (String word : words) {
            System.out.print(word + " ");
        }
    } catch (FileNotFoundException fnfe) {
        System.err.println("Cannot read the input file - pass a valid file name");
    }
Modifying a collection while you are traversing it is a problem, because it can throw a ConcurrentModificationException.
A simple way to avoid this in our case is to collect the elements to remove in a first step,
and then to remove them in a second step using the "remove" method that TreeSet inherits from Set.
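The two-step approach described above could be sketched roughly as follows. The class name, the removeNumerics helper and the isNumeric rule (a non-empty string made only of digits) are assumptions for illustration; they are not taken from the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TwoStepRemoval {

    // Assumed rule: a "numeric" is a non-empty string consisting only of digits.
    static boolean isNumeric(String word) {
        return !word.isEmpty() && word.chars().allMatch(Character::isDigit);
    }

    static void removeNumerics(Set<String> words) {
        // Step 1: collect the elements to remove, without touching the set.
        List<String> toRemove = new ArrayList<>();
        for (String word : words) {
            if (isNumeric(word)) {
                toRemove.add(word);
            }
        }
        // Step 2: the traversal is finished, so removing is now safe.
        for (String word : toRemove) {
            words.remove(word);
        }
    }

    public static void main(String[] args) {
        Set<String> words = new TreeSet<>(Arrays.asList("13", "26", "45", "and", "before"));
        removeNumerics(words);
        System.out.println(words); // prints [and, before]
    }
}
```

Step 2 could also be written as words.removeAll(toRemove); the explicit loop just makes the second pass visible.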
Bob Matthews wrote: I am using a TreeSet to tokenize a string.
The output is sorted with numerics first, followed by words,
e.g. 13 26 45 and before etc.
I suspect that's because it's simply using String's natural order, which will place "numerics" first because digit characters have lower character values than letters.
However, that has nothing to do with whether the String is a valid number or not - it will place "1A" before "AA" as well.
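A small sketch may make the ordering easier to see. The sample words and the sorted helper are made up for illustration; the TreeSet is created without a Comparator, so it falls back on String's natural order (String.compareTo).

```java
import java.util.Arrays;
import java.util.TreeSet;

public class NaturalOrderDemo {

    // Hypothetical helper: put the given words in a TreeSet with no Comparator,
    // so they are ordered by String.compareTo.
    static TreeSet<String> sorted(String... words) {
        return new TreeSet<>(Arrays.asList(words));
    }

    public static void main(String[] args) {
        // Digits sort before upper-case letters, which sort before lower-case letters.
        System.out.println(sorted("before", "AA", "45", "1A", "and", "13"));
        // prints [13, 1A, 45, AA, and, before]
    }
}
```

Note that "1A" lands between "13" and "45" even though it is not a number at all; only the characters are compared, one position at a time.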
Is there a tidy way to remove the numerics?
Well, one way would be to iterate through the Set and remove any word whose first character is not a letter. You might want to look at the Character class API to see how you might do that.
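As a rough sketch of that idea, the loop below uses an explicit Iterator, whose remove() method is the supported way to delete elements during traversal. The class and method names are invented for illustration, and the rule "first character is not a letter" is just the one suggested above; it may not be the rule you finally want.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

public class RemoveNonWords {

    // Removes every entry that is empty or whose first character is not a letter.
    static void removeNonWords(Set<String> words) {
        Iterator<String> it = words.iterator();
        while (it.hasNext()) {
            String word = it.next();
            if (word.isEmpty() || !Character.isLetter(word.charAt(0))) {
                it.remove(); // safe: removal goes through the iterator itself
            }
        }
    }

    public static void main(String[] args) {
        Set<String> words = new TreeSet<>(Arrays.asList("13", "10A", "A1", "and", "before"));
        removeNonWords(words);
        System.out.println(words); // prints [A1, and, before]
    }
}
```

With this rule "10A" is removed but "A1" is kept, which is worth checking against what you actually want.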
However, an even better idea might be not to put those "numerics" into the Set to begin with.
Programming is a bit like medicine in that respect: "Prevention" is usually far better than "cure".
You might also want to have a look at the StringsAreBad page.
Before you add a word to the TreeSet, examine it to see if it's numeric. If it is, then don't add it. You already have the logic in your code, but it only rejects empty strings. Change it to reject numeric strings instead.
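Since the tokenizing part of the original code is not shown, the following is only a sketch of what such a check might look like. The tokenize method, the whitespace split and the digits-only isNumeric rule are all assumptions.

```java
import java.util.Set;
import java.util.TreeSet;

public class FilterBeforeAdd {

    // Assumed rule: a "numeric" is a non-empty string consisting only of digits.
    static boolean isNumeric(String word) {
        return !word.isEmpty() && word.chars().allMatch(Character::isDigit);
    }

    // Hypothetical tokenizer: split on whitespace and keep only non-numeric words.
    static Set<String> tokenize(String text) {
        Set<String> words = new TreeSet<>();
        for (String word : text.split("\\s+")) {
            // Reject empty strings (as before) and numeric strings (the new check).
            if (word.isEmpty() || isNumeric(word)) {
                continue;
            }
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12 apples and 435 pears"));
        // prints [and, apples, pears]
    }
}
```

Because the numerics never enter the set, there is nothing to remove afterwards and no risk of a ConcurrentModificationException.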
Bob Matthews wrote:Now, all I wish to do is to remove the "12 435 " from the left side of the output string...
OK, but what if your 'headline' is "Murder at 10A Rillington Place"?
What is "10A" in that case? A "numeric"? It'll certainly be sorted like one - i.e. it'll sort before any normal English word.
Do you want "10A" to be removed from your TreeSet? And what about "A1" (as in: "IMF upgrades US credit rating to A1")?
This is what I mean by 'numbers' - a String is NOT a number, and never will be: "37" + "37" is "3737"; not "74".
What you are seeing is the natural sort order for Strings, and you will have to decide exactly what kinds of Strings you want removed from your Set, and (possibly more importantly), what ones you want to keep.
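To illustrate the difference between a String and a number, here is a minimal sketch. The isInteger helper is a made-up name, and "parses as an int" is only one possible definition of "numeric"; you may well want a different one.

```java
public class StringsAreNotNumbers {

    // One possible definition of "numeric": Integer.parseInt accepts the string.
    static boolean isInteger(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("37" + "37");                                     // prints 3737
        System.out.println(Integer.parseInt("37") + Integer.parseInt("37")); // prints 74
        System.out.println(isInteger("10A")); // prints false
        System.out.println(isInteger("A1"));  // prints false
    }
}
```

Under this definition neither "10A" nor "A1" counts as a number, so both would stay in the Set, even though "10A" still sorts ahead of the ordinary words.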
Winston