There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
fred rosenberger wrote: Have you considered that a regex may not be the right way to tackle this task?
fred rosenberger wrote: So I would think the correct question would be "How do I extract only uppercase letters from a string?" There are many ways, but which one you'd use depends on the specific details.
Why not iterate through the string, character by character, and only print/keep upper case letters?
Why not do a substitution, replacing every character that isn't an uppercase letter with an empty string?
Why does it HAVE to be a regular expression?
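Both of Fred's non-regex-first suggestions can be sketched in a few lines. This is an illustration, not code from the thread; the class name `UppercaseLetters`, its method names and the sample input are made up. Note that `Character.isUpperCase` also accepts non-ASCII uppercase letters, while the `[^A-Z]` substitution keeps ASCII A-Z only.

```java
class UppercaseLetters {

    // Option 1: walk the string one character at a time and keep the uppercase ones.
    static String byLoop(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isUpperCase(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Option 2: substitute every character that is not A-Z with the empty string.
    // [^A-Z] is ASCII-only, unlike Character.isUpperCase above.
    static String bySubstitution(String s) {
        return s.replaceAll("[^A-Z]", "");
    }

    public static void main(String[] args) {
        String input = "Hello World, this is JAVA";
        System.out.println(byLoop(input));         // HWJAVA
        System.out.println(bySubstitution(input)); // HWJAVA
    }
}
```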
Carey Brown wrote: Sounds like you've got it under control now. Here are some Java 8 ways to do it (I'm still on my Java 8 learning curve).
Output:
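Carey's original snippets and output did not survive in this copy of the thread. As a stand-in, here is one Java 8 stream way to keep only the uppercase letters, using `String.chars()`; the class and method names are hypothetical and this is not a reconstruction of Carey's code.

```java
import java.util.stream.Collectors;

class UppercaseLettersStream {

    // String.chars() (Java 8) yields an IntStream of the string's char values.
    static String upperOnly(String s) {
        return s.chars()
                .filter(Character::isUpperCase)             // keep uppercase chars
                .mapToObj(c -> String.valueOf((char) c))    // back to one-char strings
                .collect(Collectors.joining());             // concatenate with no separator
    }

    public static void main(String[] args) {
        System.out.println(upperOnly("Hello World, this is JAVA")); // HWJAVA
    }
}
```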
Stephan van Hulst wrote: Personally I think I would have used a regex after all:
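Stephan's code is missing here. Judging from his later description ("a simple regex that describes the exact pattern for an upper case word surrounded by word boundaries") and Junilu's remark that it used a while loop, it was probably close to the sketch below. The pattern `\b[A-Z]+\b`, the class name and the sample sentence are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UppercaseWordsRegex {

    // Assumed definition: one or more A-Z letters between word boundaries.
    private static final Pattern UPPER_WORD = Pattern.compile("\\b[A-Z]+\\b");

    static List<String> find(String sentence) {
        List<String> words = new ArrayList<>();
        Matcher m = UPPER_WORD.matcher(sentence);
        while (m.find()) {          // advance to the next match, if any
            words.add(m.group());   // text of the whole match
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(find("the QUICK brown FOX, jumps OVER it")); // [QUICK, FOX, OVER]
    }
}
```

A mixed-case word such as "Hello" is not matched, because there is no word boundary between "H" and "e".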
Junilu Lacar wrote: Stephan's solution is a little cleaner, IMO. I still don't think you need a (complicated) regex, though:
or
The only difference between the two is whether you're trimming a leading space or a trailing space.
Simpler yet, you can use Collectors:
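Junilu's snippets are missing from this copy. The sketch below shows the idea he describes: split on spaces, filter the uppercase words, and let `Collectors.joining(" ")` place the separators, so there is no leading or trailing space left to trim. The names `isAllUpper` and `upperWords` and the sample input are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class UppercaseWordsJoining {

    // A word counts as uppercase if it is non-empty and every character is uppercase.
    static boolean isAllUpper(String word) {
        return !word.isEmpty() && word.chars().allMatch(Character::isUpperCase);
    }

    static String upperWords(String sentence) {
        return Arrays.stream(sentence.split(" "))         // words = runs between spaces
                .filter(UppercaseWordsJoining::isAllUpper)
                .collect(Collectors.joining(" "));        // delimiter only between words
    }

    public static void main(String[] args) {
        System.out.println(upperWords("the QUICK brown FOX jumps OVER it")); // QUICK FOX OVER
    }
}
```

Because the word boundary here is a plain space, punctuation stays attached: "FOX," fails the uppercase check because the comma is not an uppercase character.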
Mike London wrote:
Did you spend some time playing around to get the expected result, as I would have, or did you type the code above straight off?
Junilu Lacar wrote: If you want to strictly define words as any sequence of [A-Za-z] and ignore any non-word chars like commas, semicolons, apostrophes, and other kinds of punctuation, you can do this:
The expression on line 4 uses the \p{Alpha} POSIX character class. You can find more like these to experiment with here: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html -- which one you use all depends on how you want to define a "word" and what characters to consider as word boundaries.
The use of symbolic names is just to clarify intent.
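The code this refers to is missing. A sketch in the same spirit, with a `WORD_BOUNDARY` symbolic constant and a stream all the way through (no explicit while loop), might look like this. `Pattern.splitAsStream` exists since Java 8; the constant values, the `UPPER_WORD` name and the sample sentence are assumptions.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class UppercaseWordsSplitStream {

    // Hypothetical symbolic names: any run of non-letters separates words.
    private static final String WORD_BOUNDARY = "[^\\p{Alpha}]+";
    private static final String UPPER_WORD = "[A-Z]+";

    static String upperWords(String sentence) {
        return Pattern.compile(WORD_BOUNDARY)
                .splitAsStream(sentence)                   // split straight into a Stream
                .filter(word -> word.matches(UPPER_WORD))  // also drops any empty pieces
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(upperWords("the QUICK, brown FOX; jumps OVER it")); // QUICK FOX OVER
    }
}
```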
There are three kinds of actuaries: those who can count, and those who can't.
Junilu Lacar wrote: Mike,
After reviewing this entire thread, I just realized that you never actually posted your own solution code before others and myself started giving you ours. If this is homework, then the cat's already out of the bag and there's no taking those solutions back. If you're going to submit any of these solutions as your homework, there's nothing we can do about it now. However, I would caution you that these are public forums and instructors are pretty good at finding plagiarized work.
We do have a standing policy about students doing their own homework. Just sayin'...
If this isn't homework, then no harm, no foul.
Stephan van Hulst wrote: I'm not actually quite sure how this is more clear than a simple regex that describes the exact pattern for an upper case word surrounded by word boundaries.
Junilu Lacar wrote: OP: good to know that this isn't homework. Thank you for clearing that up.
As you can see, there are as many ways to skin a cat as there are to paint a still life of a fruit bowl. Beauty is in the eye of the beholder, so take all these suggestions, understand them, and decide for yourself how to use them to improve your own coding style.
Junilu Lacar wrote:
Stephan van Hulst wrote: I'm not actually quite sure how this is more clear than a simple regex that describes the exact pattern for an upper case word surrounded by word boundaries.
I made no such claim, nor was one intended to be implied. The solution you offered used a while loop. I just showed an alternative that didn't use an explicit while() loop but used a stream all the way instead. The context of the WORD_BOUNDARY symbolic constant was "define words as any sequence of [A-Za-z]".
OP should note that the "[^\\p{Alpha}]" expression is equivalent to "[^A-Za-z]", so either one could be used. It's up to you to decide which you find more readable or consider "simpler". I actually think that "[^A-Za-z]" is more straightforward, but I wanted to point OP to the other character classes he might consider using.
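The equivalence holds because, by default, Java's `\p{Alpha}` is US-ASCII only. The sketch below (class name, method names and sample text are made up) checks that on a string containing a non-ASCII letter, and shows that compiling with `Pattern.UNICODE_CHARACTER_CLASS` makes `\p{Alpha}` accept letters such as 'ï', at which point the two expressions are no longer interchangeable.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class AlphaClassComparison {

    static String[] splitPosix(String s) { return s.split("[^\\p{Alpha}]+"); }

    static String[] splitAscii(String s) { return s.split("[^A-Za-z]+"); }

    // With UNICODE_CHARACTER_CLASS, \p{Alpha} means any Unicode alphabetic character.
    static String[] splitUnicode(String s) {
        return Pattern.compile("[^\\p{Alpha}]+", Pattern.UNICODE_CHARACTER_CLASS).split(s);
    }

    public static void main(String[] args) {
        String text = "it's a TEST, na\u00efve";  // "naïve"
        System.out.println(Arrays.toString(splitPosix(text)));                  // [it, s, a, TEST, na, ve]
        System.out.println(Arrays.equals(splitPosix(text), splitAscii(text))); // true
        System.out.println(Arrays.toString(splitUnicode(text)));                // [it, s, a, TEST, naïve]
    }
}
```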
luck, db
There are no new questions, but there may be new answers.
Darryl Burke wrote: What's wrong with sentence.replaceAll("(?:\\b)[a-z]*(?:\\b)", "")? And if the double spaces left behind are an issue, that could be chained to .replace("  ", " ").
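Darryl's one-liner deletes all-lowercase words instead of collecting uppercase ones. Wrapped in a runnable class (the class and method names and the sample sentence are hypothetical), it behaves as below. The chained replace collapses the double spaces, but a leading or trailing space can remain when the first or last word is lowercase, so a `.trim()` may also be wanted. Mixed-case words such as "Hello" are kept, since they are not made only of a-z.

```java
class RemoveLowercaseWords {

    static String removeLowercaseWords(String sentence) {
        return sentence
                .replaceAll("(?:\\b)[a-z]*(?:\\b)", "")  // delete words made only of a-z
                .replace("  ", " ");                      // collapse double spaces left behind
    }

    public static void main(String[] args) {
        // Prints " QUICK FOX OVER " (note the leading and trailing space).
        System.out.println("[" + removeLowercaseWords("the QUICK brown FOX jumps OVER it") + "]");
    }
}
```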
Carey Brown wrote: These were the three test cases I attributed to Junilu. Did I get this wrong?
Stephan van Hulst wrote: My point is that one of those concerns is not part of the requirement. The requirement is literally "find all words that match my definition of a word". That you split the input along some possibly incompatible word boundary first is not necessary.
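To make the "possibly incompatible word boundary" concrete, the sketch below compares splitting on spaces first with searching for the word definition directly. The input "HELLO, big WORLD" and all names are illustrative assumptions; the point is that the comma stays glued to "HELLO" in the split version, so that word is lost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class BoundaryComparison {

    // Split on single spaces first, then test each piece against the word definition.
    static List<String> splitThenFilter(String sentence) {
        return Arrays.stream(sentence.split(" "))
                .filter(w -> w.matches("[A-Z]+"))
                .collect(Collectors.toList());
    }

    // Search for the word definition directly; no separate split step.
    static List<String> findDirectly(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("\\b[A-Z]+\\b").matcher(sentence);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "HELLO, big WORLD";
        System.out.println(splitThenFilter(s)); // [WORLD]  ("HELLO," keeps its comma)
        System.out.println(findDirectly(s));    // [HELLO, WORLD]
    }
}
```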
Junilu Lacar wrote:
Carey Brown wrote: These were the three test cases I attributed to Junilu. Did I get this wrong?
All three versions are functionally equivalent. Look further down the thread from there. There are, I think, a couple more snippets where the word boundary is defined not as " " but as [^A-Za-z].