removing numerics from a TreeSet

I am using a TreeSet to tokenize a string.
The output is sorted with numerics first, followed by words,
e.g. 13 26 45 and before etc.

Is there a tidy way to remove the numerics?

Last bit of my code is :-

    // print the words, separating them with a space
    for (String word : words) {
        System.out.print(word + " ");
    }
} catch (FileNotFoundException fnfe) {
    System.err.println("Cannot read the input file - pass a valid file name");
}

Bob M
Modifying a collection while you are iterating over it is a problem: it typically fails with a ConcurrentModificationException.

A simple way to avoid this in our case is to work in two steps: first collect the elements to remove in a separate list,
and then, once the iteration is over, remove them from the Set with its "remove" method.
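
The two steps described above could look roughly like the sketch below. The class and method names are made up for illustration, and treating a word as "numeric" when it consists only of the digits 0-9 is an assumption; adjust the test to whatever you actually want removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TwoStepRemoval {
    // Hypothetical helper: removes every all-digit word from the given set.
    static void removeNumerics(Set<String> words) {
        // Step 1: collect the elements to remove, without touching the set.
        List<String> toRemove = new ArrayList<>();
        for (String word : words) {
            if (word.matches("[0-9]+")) { // assumption: "numeric" means digits only
                toRemove.add(word);
            }
        }
        // Step 2: the iteration over the set is finished, so removing is safe now.
        for (String word : toRemove) {
            words.remove(word);
        }
    }

    public static void main(String[] args) {
        Set<String> words = new TreeSet<>(Arrays.asList("13", "26", "45", "and", "before"));
        removeNumerics(words);
        System.out.println(words); // prints [and, before]
    }
}
```

If the string contains no numerics, the list stays empty and the second loop simply does nothing, so no special case is needed.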


OK

Do you mean "remove element by value"?

My problem is that I repeat this exercise with a different string each time, and I do not know in advance whether it contains any numerics.

If it doesn't, I don't need to do anything further; if it does, I want to remove them.

Bob M
Is your problem how to remove something from the set, how to pick the entries to be removed, or both?
Bob Matthews wrote: I am using a TreeSet to tokenize a string.
The output is sorted with numerics first, followed by words,
e.g. 13 26 45 and before etc.


I suspect that's because it's simply using String's natural order, which will place "numerics" first because numeric characters are lower in the collating sequence than letters.

However, that has nothing to do with whether the String is a valid number or not - it will place "1A" before "AA" as well.
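
A quick way to see the ordering described above is to drop a few mixed tokens into a TreeSet and print it. This is only a small demonstration; the class name and the sample tokens are invented for the example.

```java
import java.util.Arrays;
import java.util.TreeSet;

class NaturalOrderDemo {
    // Builds a TreeSet that uses String's natural (character-by-character) order.
    static TreeSet<String> sorted(String... tokens) {
        return new TreeSet<>(Arrays.asList(tokens));
    }

    public static void main(String[] args) {
        // Digits sort before upper-case letters, which sort before lower-case letters.
        System.out.println(sorted("and", "AA", "26", "before", "1A", "13"));
        // prints [13, 1A, 26, AA, and, before]
    }
}
```

Note that "1A" lands between "13" and "26": the comparison looks only at characters, not at whether the token is a valid number.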

Bob Matthews wrote: Is there a tidy way to remove the numerics?


Well, one way would be to iterate through the Set and remove any word whose first character is not a letter. You might want to look at the Character class API to see how you might do that.
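
As a rough sketch of that idea, Collection.removeIf (Java 8 and later) does the iteration and the safe removal in one call, and Character.isLetter tests the first character. The class and method names are hypothetical, and dropping empty strings as well is an extra assumption.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class FirstCharFilter {
    // Removes every word that is empty or does not start with a letter.
    static void keepWordsStartingWithLetter(Set<String> words) {
        words.removeIf(w -> w.isEmpty() || !Character.isLetter(w.charAt(0)));
    }

    public static void main(String[] args) {
        Set<String> words = new TreeSet<>(
                Arrays.asList("13", "1A", "45", "A1", "and", "before"));
        keepWordsStartingWithLetter(words);
        System.out.println(words); // prints [A1, and, before]
    }
}
```

Be aware that this rule also throws away tokens such as "1A", which may or may not be what you want.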

However, an even better idea might be not to put those "numerics" into the Set to begin with.

Programming is a bit like medicine in that respect: "Prevention" is usually far better than "cure".

You might also want to have a look at the StringsAreBad page.

HIH

Winston
Turn the array into a Stream and filter it, possibly with a regex that matches numbers.
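
A minimal sketch of the Stream approach follows. It assumes the tokens come from splitting the text on whitespace and that a "number" is a run of the digits 0-9; both are guesses, since the original tokenizing code is not shown. The names StreamFilter and wordsOf are invented.

```java
import java.util.Arrays;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StreamFilter {
    // Assumption: a token counts as a number if it consists of digits only.
    private static final Pattern DIGITS_ONLY = Pattern.compile("[0-9]+");

    static TreeSet<String> wordsOf(String text) {
        return Arrays.stream(text.split("\\s+"))               // array of tokens -> Stream
                .filter(t -> !t.isEmpty())                     // drop empty tokens
                .filter(t -> !DIGITS_ONLY.matcher(t).matches()) // drop all-digit tokens
                .collect(Collectors.toCollection(TreeSet::new)); // sorted, no duplicates
    }

    public static void main(String[] args) {
        System.out.println(wordsOf("before 12 and 435 criteria as"));
        // prints [and, as, before, criteria]
    }
}
```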
My input is a string of headline text.
After applying my TreeSet code I end up with a string such as "12 435 as before criteria ...".

I would rather not alter the input string, but leave it as is.
I am happy with the sorted output string.

Now all I wish to do is remove the "12 435 " from the left side of the output string.

My code so far is the following:-



Just not sure how to finish the task

Bob M
Seems to me the best thing would be to just not put those non-words into the TreeSet.

And how do I not put digits into the TreeSet?

Bob M
Before you add a word to the TreeSet, examine it to see if it's numeric. If it is, then don't add it. You already have the logic in your code, but it only rejects empty strings. Change it to reject numeric strings instead.
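
Since the tokenizing code itself is not shown in this thread, here is only a guess at what the loop might look like with the extra check added. The names tokenize and isNumeric are hypothetical, as is splitting on whitespace.

```java
import java.util.Set;
import java.util.TreeSet;

class SkipNumericsOnAdd {
    // Hypothetical tokenizer: numeric tokens never enter the set.
    static Set<String> tokenize(String text) {
        Set<String> words = new TreeSet<>();
        for (String word : text.split("\\s+")) {
            // Reject empty strings AND numeric strings before adding.
            if (word.isEmpty() || isNumeric(word)) {
                continue;
            }
            words.add(word);
        }
        return words;
    }

    // Assumption: "numeric" means non-empty and made up of digit characters only.
    static boolean isNumeric(String word) {
        return !word.isEmpty() && word.chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12 435 as before criteria"));
        // prints [as, before, criteria]
    }
}
```

The input string is left untouched; the check only decides what goes into the TreeSet.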
Hi,

How about this?

if((!word.equals("")) && (!word.matches("[0-9]+")))


Bob M
That will find "natural numbers" all right, but not negative integers (such as "-5") or real numbers (such as "3.14").

Don't use word.equals(""); use word.isEmpty() instead.
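
Putting the last two remarks together, the condition might be sketched like this. The pattern below accepts an optional sign and an optional decimal part; exactly which forms should count as numbers (for example "1e5", "1,000" or ".5" are not matched here) is an assumption you would need to settle yourself. The class and method names are invented.

```java
import java.util.regex.Pattern;

class NumericCheck {
    // Assumption: a number is an optional sign, digits, and an optional ".digits" part.
    private static final Pattern NUMBER = Pattern.compile("[-+]?[0-9]+(\\.[0-9]+)?");

    // True if the word should go into the TreeSet.
    static boolean shouldKeep(String word) {
        return !word.isEmpty() && !NUMBER.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"12", "-5", "3.14", "10A", "before", ""}) {
            System.out.println("\"" + w + "\" keep? " + shouldKeep(w));
        }
    }
}
```

With this pattern "12", "-5" and "3.14" are rejected, "10A" and "before" are kept, and the empty string is rejected by isEmpty().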
Bob Matthews wrote: Now, all I wish to do is to remove the "12 435 " from the left side of the output string...


OK, but what if your 'headline' is "Murder at 10A Rillington Place"?

What is "10A" in that case? A "numeric"? It'll certainly be sorted like one, i.e. it'll sort before any normal English word.

Do you want "10A" to be removed from your TreeSet? And what about "A1" (as in: "IMF upgrades US credit rating to A1")?

This is what I mean by 'numbers' - a String is NOT a number, and never will be: "37" + "37" is "3737"; not "74".

What you are seeing is the natural sort order for Strings, and you will have to decide exactly what kinds of Strings you want removed from your Set, and (possibly more importantly), what ones you want to keep.
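
To make that decision concrete, here is a small comparison of three possible rules applied to the examples above. None of them is "the" right answer; the three method names are invented, and each one is just one candidate definition of what you might want removed.

```java
class NumericPolicies {
    // Rule 1: the whole token is digits ("37", but not "10A" or "A1").
    static boolean allDigits(String w) {
        return w.matches("[0-9]+");
    }

    // Rule 2: the token starts with a digit ("37" and "10A", but not "A1").
    static boolean startsWithDigit(String w) {
        return !w.isEmpty() && Character.isDigit(w.charAt(0));
    }

    // Rule 3: the token contains a digit anywhere ("37", "10A" and "A1").
    static boolean containsDigit(String w) {
        return w.chars().anyMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"37", "10A", "A1", "Place"}) {
            System.out.printf("%-6s allDigits=%b startsWithDigit=%b containsDigit=%b%n",
                    w, allDigits(w), startsWithDigit(w), containsDigit(w));
        }
    }
}
```

Only rule 1 keeps both "10A" and "A1" in the set; rule 2 drops "10A" but keeps "A1"; rule 3 drops both.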

Winston