
Java Regex find all words between second word and first decimal

 
Rob Bank
Greenhorn
Posts: 5
Consider the following CSV file:
1.,John,Johnsson,1.31.22,+52.39,28
2.,Robert,Robertsson,Boston,2.08.03,+1.29.20,26
3.,Mick,Mickelsson,New,York,2.10.03,+1.31.20,24

Any suggestions for a Java regex that would find nothing on the first line,
",Boston" or "Boston," on the second, and
",New,York" or "New,York," on the third?

Ideally it would find any number of words between the second word and the comma preceding the first time value that follows.



In particular, the "?=^,{3}" part does not seem to do what I intended.


Cheers
 
Stephan van Hulst
Saloon Keeper
Posts: 8109
Welcome to CodeRanch!

Your pattern only replaces strings that begin with a zero-width match at the start of the input followed by three consecutive commas.
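
To illustrate the point about the lookahead, here is a minimal sketch. It assumes the fragment from the question sits inside parentheses as (?=^,{3}). The class name LookaheadDemo, the method names, and the alternative pattern ^(?:[^,]*,){3} are made up for this illustration and do not come from the original posts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadDemo {
    // Assumed form of the fragment from the question: a lookahead around "^,{3}".
    // It succeeds only at the start of the input, and only if ",,," follows.
    static final Pattern LOOKAHEAD = Pattern.compile("(?=^,{3})");

    // Hypothetical alternative: three leading fields, each terminated by a comma.
    static final Pattern THREE_FIELDS = Pattern.compile("^(?:[^,]*,){3}");

    /** Returns true if the lookahead succeeds anywhere in the input. */
    static boolean lookaheadMatches(String input) {
        return LOOKAHEAD.matcher(input).find();
    }

    /** Returns the first three fields including their commas, or null if there are fewer. */
    static String firstThreeFields(String input) {
        Matcher matcher = THREE_FIELDS.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String line = "2.,Robert,Robertsson,Boston,2.08.03,+1.29.20,26";
        System.out.println(lookaheadMatches(line));
        System.out.println(lookaheadMatches(",,,x"));
        System.out.println(firstThreeFields(line));
    }
}
```

With the sample line the lookahead never succeeds, because the line does not start with three commas, while the alternative pattern matches "2.,Robert,Robertsson,".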

You could write patterns that describe the various elements of your input, and then use a scanner to parse them:
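
A minimal sketch of this approach follows. It is not the code originally posted here: the two patterns, the field names, and the use of plain String for the cities and times are assumptions chosen to fit the three sample lines. The meaning of the last column is not stated in the thread, so it is kept under the neutral, hypothetical name lastValue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

final class Participant {
    // Hypothetical field patterns; adjust them to the real data.
    private static final Pattern RANK = Pattern.compile("\\d+\\.");
    private static final Pattern TIME = Pattern.compile("\\+?\\d+(?:\\.\\d+)+");

    final int rank;
    final String firstName;
    final String lastName;
    final List<String> cities;  // String stands in for a dedicated City type
    final String time;          // kept as text in this sketch
    final String difference;    // kept as text in this sketch
    final int lastValue;        // assumption: the final column is an integer

    private Participant(int rank, String firstName, String lastName, List<String> cities,
                        String time, String difference, int lastValue) {
        this.rank = rank;
        this.firstName = firstName;
        this.lastName = lastName;
        this.cities = cities;
        this.time = time;
        this.difference = difference;
        this.lastValue = lastValue;
    }

    /** Parses one CSV line; throws a NoSuchElementException subtype on malformed input. */
    static Participant parse(String line) {
        try (Scanner scanner = new Scanner(line)) {
            scanner.useDelimiter(",");

            String rankToken = scanner.next(RANK);
            int rank = Integer.parseInt(rankToken.substring(0, rankToken.length() - 1));
            String firstName = scanner.next();
            String lastName = scanner.next();

            // Every token before the first time-like token counts as a city word.
            List<String> cities = new ArrayList<>();
            while (scanner.hasNext() && !scanner.hasNext(TIME)) {
                cities.add(scanner.next());
            }

            String time = scanner.next(TIME);
            String difference = scanner.next(TIME);
            int lastValue = Integer.parseInt(scanner.next().trim());
            return new Participant(rank, firstName, lastName, cities, time, difference, lastValue);
        }
    }

    public static void main(String[] args) {
        Participant p = parse("3.,Mick,Mickelsson,New,York,2.10.03,+1.31.20,24");
        System.out.println(p.cities);
    }
}
```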

Now you have a strongly typed object whose members you can manipulate. If you're interested in the cities (or whatever the strings between the names and the times mean) and nothing else, just operate on that field of the object.
 
Rob Bank
Greenhorn
Posts: 5
Thanks. It feels like I am close, and I'm dealing with very small data, i.e. small computations, so I'm keen on getting this done as a one-liner. On the other hand, I won't get any closer than this, so in the meantime I'm grateful for your alternative solution and will give that a go.

Building a bit further on my approach, though. First, I noticed I left the multi-line mode part out. Also, these are the ones I've been working on:

and


Building on your comment below, I think this second one allows the three commas not to be consecutive.
So, as I see it, there are two options here. Either incorporate a lookahead, which I am unable to do successfully ("The preceding token is not quantifiable");

or, since the second capturing group (.*?) does match as intended, try to extract that group instead of using a lookahead and the full match (by adding \2 or similar; I'm too new to regex to know whether something like this is fundamentally possible);


Cheers,
Rob
 
Rob Bank
Greenhorn
Posts: 5
After looking more closely at your suggestion, Stephan: wow, that's really elegant and strong! In my current code I'm also doing some other text modifications (before the step I'm trying to accomplish with the regex line discussed here), and I can conclude that moving that stuff into this Participant object would make things much more transparent and likely less buggy. I also very much appreciate the level of customization! Wow.
 
Rob Bank
Greenhorn
Posts: 5




Since I was unable to find a workable solution to "City cannot be resolved to a type" in the suggested object, I continued on the one-liner. So for anyone with a similar problem, this does it:

I.e. the way to reference a capturing group in the replacement was replaceAll("…", "$1") and not replaceAll("…\1", "").
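
As a hedged illustration of that difference: in a Java pattern, \1 is a backreference that has to match the same text a second time, whereas $1 in the replacement string inserts whatever group 1 captured. The sketch below is not the exact one-liner from this post. The pattern and the names MiddleWords and extract are assumptions that happen to work for the three sample lines.

```java
final class MiddleWords {
    // Hypothetical pattern, in multi-line mode:
    //   ^(?:[^,\n]*,){3}  skips the first three comma-terminated fields of a line
    //   (.*?)             lazily captures everything up to ...
    //   \d+\.\d+.*$       ... the first "digits.digits" value and the rest of the line
    static final String REGEX = "(?m)^(?:[^,\\n]*,){3}(.*?)\\d+\\.\\d+.*$";

    /** Replaces each matching line with the text captured by group 1. */
    static String extract(String csv) {
        return csv.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("2.,Robert,Robertsson,Boston,2.08.03,+1.29.20,26"));
        System.out.println(extract("3.,Mick,Mickelsson,New,York,2.10.03,+1.31.20,24"));
    }
}
```

A line that does not match the pattern is returned unchanged, so it may be worth checking the input first if malformed lines are possible.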
 
Stephan van Hulst
Saloon Keeper
Posts: 8109
I used the class City as an example. You have to declare and implement it yourself if you want a strongly typed object, or you can replace all mentions of City with String.
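
A minimal sketch of what such a class could look like, assuming a city is nothing more than a non-empty name. The accessor name() and the validation rule are illustrative choices, not something prescribed in this thread.

```java
import java.util.Objects;

final class City {
    private final String name;

    City(String name) {
        Objects.requireNonNull(name, "name");
        if (name.trim().isEmpty()) {
            throw new IllegalArgumentException("City name must not be blank");
        }
        this.name = name.trim();
    }

    String name() {
        return name;
    }

    // Two cities are considered equal when their names are equal.
    @Override
    public boolean equals(Object other) {
        return other instanceof City && ((City) other).name.equals(name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }

    @Override
    public String toString() {
        return name;
    }
}
```

Compared with a bare String, a type like this gives one place to put validation and makes method signatures say what they expect.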
 
Rob Bank
Greenhorn
Posts: 5
Well, I come from a non-programming background and have done some minor projects out of my own interest for a couple of years: from very simple batch scripts to VBA, VB scripting, HTML+CSS, Matlab and SQL, and now I've taken on Java. My earlier productions took shape by trial and error (making most of the code really ugly, by the way), and I'm struggling to understand the object-oriented fundamentals, especially in Java. I thought I could make use of this stuff, but it turned out to be too big a piece to chew. Asking about it at this point would be sandbox-level and not really a fit for questions here. However, having an example customized to my project will definitely help in learning the fundamentals, so thumbs up.
 
Stephan van Hulst
Saloon Keeper
Posts: 8109
I find the quickest route to understanding OO is to try and write solutions to challenges you've set yourself, using OO principles as you understand them. Then put them up for review. When you have experienced programmers commenting on why you should or should not be doing certain things, it really helps.

The most important thing you can take away from this topic is that you should always try to model your data as strongly typed objects. Avoid the String class for anything other than names and identifiers. That's why in my example I used City instead of String. In retrospect, I should also have used PersonalName to encapsulate firstName and lastName.

As you've noted yourself, once you have classes that accurately represent the concepts you want to work with, you can easily reuse them for different purposes. Once you've parsed a line of CSV to a Participant, you can not only get the collection of cities from it, but also the duration and difference, as strongly typed Duration objects.
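
As a rough sketch of the Duration idea: the helper below assumes the time columns read as hours.minutes.seconds or minutes.seconds, with an optional leading plus sign. The thread does not state the format, so treat that reading, and the names RaceTimes and parse, as assumptions.

```java
import java.time.Duration;

final class RaceTimes {
    /**
     * Parses values such as "1.31.22" or "+52.39" into a Duration.
     * Assumption: the parts are [hours.]minutes.seconds.
     */
    static Duration parse(String text) {
        String unsigned = text.startsWith("+") ? text.substring(1) : text;
        String[] parts = unsigned.split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Unexpected time value: " + text);
        }
        // Accumulate in base 60: each part is worth 60 times the one after it.
        long seconds = 0;
        for (String part : parts) {
            seconds = seconds * 60 + Long.parseLong(part);
        }
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        System.out.println(parse("1.31.22"));
        System.out.println(parse("+52.39"));
    }
}
```

Once both columns are Duration values, they can be compared, added, or subtracted without any further string handling.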

Keep it up!
 