
Java Regex find all words between second word and first decimal

 
Greenhorn
Posts: 28
Considering the following csv file;
1.,John,Johnsson,1.31.22,+52.39,28
2.,Robert,Robertsson,Boston,2.08.03,+1.29.20,26
3.,Mick,Mickelsson,New,York,2.10.03,+1.31.20,24

Suggestions on a Java-regex that would find nothing on the first line,
",Boston" or "Boston," on the second and
",New,York" or "New,York," on the third?

Optimally it would find any number of words between the second word and the comma preceding the first following time value.



Particularly the "?=^,{3}"-part seems to not do as intended.


Cheers
 
Saloon Keeper
Posts: 8225
Welcome to CodeRanch!

Your pattern only replaces strings that start with a zero width match on the start of the input followed by three commas.

You could write patterns that describe the various elements of your input, and then use a scanner to parse them:

Now you have a strongly typed object of which you can manipulate the members. If you're interested in the cities (or whatever the strings between the names and the times mean) and nothing else, just operate on that field of the object.
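
A minimal sketch of that scanner-based approach, assuming the line layout from the first post (id, first name, last name, zero or more words, duration, difference, points). This is not the listing from the original post: the class name `ParticipantSketch`, the three token patterns and the field names are all assumptions made for illustration, and the time values are kept as plain strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical model class; names and patterns are assumptions, not the original listing.
final class ParticipantSketch {
    // One pattern per kind of token found between the commas.
    private static final Pattern ID = Pattern.compile("\\d+\\.");
    private static final Pattern WORD = Pattern.compile("[^,\\d+][^,]*");
    private static final Pattern TIME = Pattern.compile("\\+?\\d+(\\.\\d+)+");

    final String id;
    final String firstName;
    final String lastName;
    final List<String> cities;
    final String duration;
    final String difference;

    private ParticipantSketch(String id, String firstName, String lastName,
                              List<String> cities, String duration, String difference) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
        this.cities = cities;
        this.duration = duration;
        this.difference = difference;
    }

    // Reads the tokens in order; next(Pattern) throws if a token has the wrong shape.
    static ParticipantSketch parse(String lineOfCsv) {
        try (Scanner scanner = new Scanner(lineOfCsv).useDelimiter(",")) {
            String id = scanner.next(ID);
            String firstName = scanner.next(WORD);
            String lastName = scanner.next(WORD);
            List<String> cities = new ArrayList<>();
            // Every further word before the first time value is collected here.
            while (scanner.hasNext(WORD)) {
                cities.add(scanner.next(WORD));
            }
            String duration = scanner.next(TIME);
            String difference = scanner.next(TIME);
            return new ParticipantSketch(id, firstName, lastName, cities, duration, difference);
        }
    }
}
```

With this sketch, `ParticipantSketch.parse(line).cities` is empty for the first sample line and holds `New` and `York` for the third.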
 
Rob Bank
Greenhorn
Posts: 28
Thanks. It feels like I am close, and I'm dealing with very small data, i.e. small computations, so I'm keen on getting this done as a one-liner. On the other hand, I won't get any closer than this, so in the meantime I'm grateful for your alternative solution and will give that a go.

Building a bit further on my approach though. First, I noticed I left the multi-line mode part out. Also, these are the ones I've been working on;

and


Building on your comment below, I think this second one allows the three commas not to be consecutive.
So, as I see it, there are two options here. Either incorporate a lookahead, which I am unable to do successfully (The preceding token is not quantifiable);

or, since the second Capturing Group (.*?) does match as intended, try to extract that instead of using lookahead and rather than using the full match (by adding \2 or similar, I'm too noob on regex to know whether something like this is fundamentally possible);


Cheers,
Rob
 
Rob Bank
Greenhorn
Posts: 28
After having looked more closely at your suggestion Stephan; wow, that's really elegant and strong! In my current code I'm doing some other text modifications also (before what I'm trying to accomplish with my regex line discussed here), and can conclude that transferring that stuff to this Participant object would make things much more transparent and likely less buggy. Appreciate very much also the level of customization!! Wow.
 
Rob Bank
Greenhorn
Posts: 28

Rob Bank wrote:After having looked more closely at your suggestion Stephan; wow, that's really elegant and strong! In my current code I'm doing some other text modifications also (before what I'm trying to accomplish with my regex line discussed here), and can conclude that transferring that stuff to this Participant object would make things much more transparent and likely less buggy. Appreciate very much also the level of customization!! Wow.



Since I was unable to deploy a workable solution to "City cannot be resolved to a type" in the suggested Object, I continued on the oneliner. So for anyone with a similar problem, this does it;

I.e. the way to reference a Capturing Group was replaceAll("…", "$1") and not replaceAll("…\1", "")
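
For readers with the same problem, here is a hedged sketch of that capturing-group idea. It is not the exact pattern from the post above; the regex and the names `TeamPartSketch` and `teamPart` are assumptions, written for the three sample lines from the first post. In Java, `$1` in the replacement string refers to the first capturing group, whereas `\1` is a back-reference that only has meaning inside the pattern itself.

```java
final class TeamPartSketch {
    // Three leading fields (id, first name, last name), then group 1:
    // any number of ",word" items that do not start with a digit or '+',
    // then the rest of the line starting at the first time value.
    private static final String LINE =
            "^[^,]*,[^,]*,[^,]*((?:,[^,\\d+][^,]*)*),.*$";

    // Replaces the whole line with whatever group 1 captured.
    static String teamPart(String line) {
        return line.replaceAll(LINE, "$1");
    }

    public static void main(String[] args) {
        System.out.println(teamPart("2.,Robert,Robertsson,Boston,2.08.03,+1.29.20,26"));
        System.out.println(teamPart("3.,Mick,Mickelsson,New,York,2.10.03,+1.31.20,24"));
    }
}
```

A line that does not match the pattern at all is returned unchanged, so it is worth checking the input shape first.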
 
Stephan van Hulst
Saloon Keeper
Posts: 8225
I used the class City as an example. You have to declare and implement it yourself if you want a strongly typed object, or you can replace all mentions of City with String.
 
Rob Bank
Greenhorn
Posts: 28
Well, I come from a no-programming background and have for a couple of years done some minor projects out of own interest. From very simple batch scripts to VBA to VB scripting to HTML+CSS to Matlab to SQL and now I’ve taken on Java. My earlier productions have taken shape by trial-and-error (making most code really ugly btw) and I’m struggling with understanding the object oriented fundamentals, especially in Java. Thought I could make use of this stuff but turns out it was too big a piece to chew. Me asking about it at this point would be sandbox-level, not subject to questions here. However, having an example customized to my project will definitely help in learning the fundamentals so thumbs up.
 
Stephan van Hulst
Saloon Keeper
Posts: 8225
I find the quickest route to understanding OO is to try and write solutions to challenges you've set yourself, using OO principles as you understand them. Then put them up for review. When you have experienced programmers commenting on why or why not you should be doing certain things, it really helps.

The most important thing you can take away from this topic is that you should always try to model your data as strongly typed objects. Avoid the String class for anything other than names and identifiers. That's why in my example I used City instead of String. In retrospect, I should also have used PersonalName to encapsulate firstName and lastName.

As you've noted yourself, having a class that accurately represents the concepts that you want to work with, you can easily reuse them for different purposes. Once you've parsed a line of CSV to a Participant, you can not only get the collection of cities from it, but also the duration and difference, as strongly typed Duration objects.

Keep it up!
 
Rob Bank
Greenhorn
Posts: 28
Appreciate the cheering. It lowered the threshold for asking this sandbox-level question; after a ton of research and attempts on this I'm just marginally closer and could use a push to a hallelujah moment.

How would I go about invoking Participant, stating what to parse into it, and making it a list from a main in another class? As a start, just to reference firstName, I'm thinking something like;
, when gives;
[1.,John,Johnsson,1.31.22,+52.39,28, 2.,Robert,Robertsson,Boston,2.08.03,+1.29.20,26, 3.,Mick,Mickelsson,New,York,2.10.03,+1.31.20,24  ... ]

lines show error "Type mismatch: cannot convert from List<String> to List<Participant>"

Cheers,
Rob
 
Stephan van Hulst
Saloon Keeper
Posts: 8225
Hi Rob,

Java is a strongly typed language, meaning that you can't perform implicit conversions between two unrelated types. A String is not a Participant, and therefore a simple assignment will not convert a String to a Participant. You need to explicitly call the parse() method on each line of CSV to convert it.

Let's say that you're only interested in printing the ID, last name and the duration of each participant. This is how you could do it:

Now, since Java 8 we prefer a functional style of programming, and you can achieve the same thing like this:
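
A rough sketch of both styles of conversion, under stated assumptions: `Runner` is a hypothetical stand-in for Participant that keeps only the first three fields, and its `parse` method simply splits on commas. The point is only that every String has to pass through `parse` explicitly before it can go into a list of the model type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for Participant, reduced to three fields.
final class Runner {
    final String id;
    final String firstName;
    final String lastName;

    Runner(String id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    // Simplified parsing: assumes at least three comma-separated fields.
    static Runner parse(String lineOfCsv) {
        String[] fields = lineOfCsv.split(",");
        return new Runner(fields[0], fields[1], fields[2]);
    }
}

final class ConversionSketch {
    // Pre-Java-8 style: an explicit loop that parses each line.
    static List<Runner> withLoop(List<String> lines) {
        List<Runner> runners = new ArrayList<>();
        for (String line : lines) {
            runners.add(Runner.parse(line));
        }
        return runners;
    }

    // Functional style: map every line through the same parse method.
    static List<Runner> withStream(List<String> lines) {
        return lines.stream()
                .map(Runner::parse)
                .collect(Collectors.toList());
    }
}
```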

In both cases, you still need a method to format (convert to String) a Duration:
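
One possible way to write such a formatting method, together with a small parser for the dotted time values in the sample data. Both method names and the assumed input shapes (`h.mm.ss` or `mm.ss`, optionally prefixed with `+`) are illustrative guesses, not the code from the original post.

```java
import java.time.Duration;

final class DurationFormatSketch {
    // Parses "h.mm.ss" or "mm.ss", with an optional leading '+' (assumed input format).
    static Duration parse(String text) {
        String[] parts = text.replace("+", "").split("\\.");
        long seconds = 0;
        for (String part : parts) {
            // Each further part is a smaller unit: hours -> minutes -> seconds.
            seconds = seconds * 60 + Long.parseLong(part);
        }
        return Duration.ofSeconds(seconds);
    }

    // Formats a Duration as HH:MM:SS.
    static String format(Duration duration) {
        long total = duration.getSeconds();
        return String.format("%02d:%02d:%02d", total / 3600, (total % 3600) / 60, total % 60);
    }

    public static void main(String[] args) {
        System.out.println(format(parse("2.08.03")));
    }
}
```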
 
Rob Bank
Greenhorn
Posts: 28
Thanks Stephan,

Haven't gotten into typing mode yet but it seems like the weekend is saved
 
Rob Bank
Greenhorn
Posts: 28
Hope you don't mind two fast ones;
  • Why the colon in %s:? This would imply lastName's are followed by a colon, no?
  • Row 5-6: in my head, the ending parentheses shouldn't be there. E.g. id is a variable, not a method, no? "The method id() is undefined for the type Participant". Removing them and making the variables public in Participant does make the code compilable, however incompletely, nothing after it compiles. Hmm...
    Stephan van Hulst
    Saloon Keeper
    Posts: 8225

    Rob Bank wrote:Why the colon in %s:? This would imply lastName's are followed by a colon, no?


    Yes. I just figured that "2. Robertsson: 02:08:03" was a nice way to format an object in this example. You can make it whatever you need.

    Row 5-6: in my head, the ending parentheses shouldn't be there. E.g. id is a variable, not a method, no?


    True, and if the example method I wrote was part of the Participant class, you could directly reference the fields. However, I wrote the example as if it was an external class making use of the Participant class. In that case, you need to expose your fields through getter methods, such as id(), lastName() and duration(). When I first wrote Participant I didn't include those getters for brevity's sake, but normally you would.
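
To illustrate the difference, here is a small sketch with private fields exposed through accessor methods and a separate class that uses them. The class names `Competitor` and `Report` are made up for this example, and the duration is kept as an already formatted string for brevity.

```java
// Hypothetical model class with private fields and accessor methods.
final class Competitor {
    private final String id;
    private final String lastName;
    private final String duration;

    Competitor(String id, String lastName, String duration) {
        this.id = id;
        this.lastName = lastName;
        this.duration = duration;
    }

    // Accessors: the only way for other classes to read the private fields.
    String id() { return id; }
    String lastName() { return lastName; }
    String duration() { return duration; }
}

// An external class: it cannot touch the fields directly, so it calls the accessors.
final class Report {
    static String describe(Competitor competitor) {
        // The colon is literal text in the format string, not part of the data.
        return String.format("%s %s: %s",
                competitor.id(), competitor.lastName(), competitor.duration());
    }
}
```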

    nothing after it compiles


    What do you mean? Does it give you a compilation error?
     
    Rob Bank
    Greenhorn
    Posts: 28
    Thank you!

    My understanding was that the %s: needed to match the string it represents, now I know you can manipulate strings with it.

    I do most of the stuff in my project under this if-statement
    By the poorly worded "nothing after it compiles" I mean that this
    piece of code breaks the if, which I guess should happen since it is not yet correctly implemented.

    In my first post, as I was looking for a simple regex, I thought that my data structure visualization was sufficient. So let me provide the non-misleading structure;
    A,1.,John,Johnsson,1.31.22,100
    A,2.,Robert,Robertsson,Boston,1.41.22,+10.00,95
    A,3.,Mick,Mickelsson,New,York,2.32.30,+1.31.20,92


    I added the first value course to the Participant object. It takes values A, B or C. I also added point (I award 0 - 100 points per participant, a sort of "goodness-of-performance" measure).
    This may be me misunderstanding the intention, but I removed both the if (this.cities.contains(null))... and || cities == null arguments since not every participant has a city. I've read up on getters but I'm not sure how to apply them in this context; here's my best shot;
    And in the main, since I understand the pre-Java-8-style better;

    Besides the getter implementation, I am thinking that the parsing is underperforming, maybe I'm misusing or incorrectly referencing line / lineOfCsv?
     
    Rob Bank
    Greenhorn
    Posts: 28
    By the way, in case you're wondering, I put your method for formatting a Duration in the main.
     
    Rob Bank
    Greenhorn
    Posts: 28
    And for the sake of not being misleading a second time, +1.31.20 should be +1.01.08
     
    Rob Bank
    Greenhorn
    Posts: 28
    And the COURSE_PATTERN should be ([0-9])(.,[A-Z]).
     
    Bartender
    Posts: 3617
    Why is ".....,New,York,........." added as two cities: "New", and "York" ?
     
    Carey Brown
    Bartender
    Posts: 3617
    A,1.,John,Johnsson,1.31.22,100
    A,2.,Robert,Robertsson,Boston,1.41.22,+10.00,95
    A,3.,Mick,Mickelsson,New,York,2.32.30,+1.31.20,92

    First line would be missing difference.
    Characters [A-C] are in the first field, not the second.
     
    Rob Bank
    Greenhorn
    Posts: 28
    At URL read and corresponding txt file write, .replaceAll(" +", ",") gives ...,New,York,...

    My understanding was that Duration difference = scanDuration(scanner, () -> new ParseException("Missing difference.", 0)); adds a zero in case this.difference = null? In any case, if that's so, I guess I should remove || difference == null, or an easy alternative would be to hardcode difference 0:00:00 at file generation for each course winner.

    According to regex101, the first capturing group ([0-9]) matches A, B or C. Also, see my update before your post.

    Also, I noticed I was missing this. on line 20 but the issue remains.
     
    lowercase baba
    Bartender
    Posts: 12592
    I wouldn't do it with a single regular expression.  I'd break it up into pieces...Have something that finds the position at the end of the second word. have something that finds the first time value after that point. then have something that finds everything in between.

    While writing a single regular expression may look cool, in six months will you remember everything about it?  What will you do if your requirements change?  What will the next person who comes along be able to figure out about your regex?
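
The three steps described above could be sketched roughly as follows. The class and method names are invented for the example, and it assumes the layout from the first post, where the id, first name and last name are the first three comma-separated fields and a time value looks like digits, a dot, and more digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StepwiseSketch {
    // A comma directly followed by the start of a time value such as 2.08.03.
    private static final Pattern TIME_START = Pattern.compile(",\\d+\\.\\d");

    static String between(String line) {
        // Step 1: find the end of the second word, i.e. the third comma
        // (the fields before it are id, first name and last name).
        int start = -1;
        for (int i = 0; i < 3; i++) {
            start = line.indexOf(',', start + 1);
            if (start < 0) {
                return "";
            }
        }
        // Step 2: find the first time value at or after that position.
        Matcher time = TIME_START.matcher(line);
        if (!time.find(start)) {
            return "";
        }
        // Step 3: everything in between is the part we are after.
        return line.substring(start, time.start());
    }
}
```

Each step can be tested and changed on its own, which is the maintainability argument made above.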

     
    Rob Bank
    Greenhorn
    Posts: 28
    Sorry & Thanks Carey,

    Doublechecked the COURSE_PATTERN, you're right.

    Switching to e.g. (.)(,[0-9].,) once I get home. I'll test the code on a file not having any null fields also.
     
    Rob Bank
    Greenhorn
    Posts: 28
    Hmm, yeah that's not going to work for id's >9 but that should be easy to fix.
     
    Rob Bank
    Greenhorn
    Posts: 28
    Eureka
    Much appreciated!
     
    Rob Bank
    Greenhorn
    Posts: 28
    This got really ugly, long, and buggy - I believe I'm just scratching the surface of what the Participant object could provide in terms of flexibility.

    So I've previously written points to new lines based on course, producing the unwanted feature of different points in case of a shared place;
    A,1.,A1,John,Johnsson,1.31.22,+0.00,100
    A,2.,A2,Robert,Robertsson,Boston,1.41.22,+10.00,95
    A,2.,A2,Mick,Mickelsson,New,York,2.32.30,+1.01.08,92

    Now I look to iterate through the list to find these Mick Mickelsson's and raise their points. To simplify, I added a coursePlace Participant object. The below stuff copes well with the above three-lined file, but (i) not if there are more than two participants sharing a place and (ii) not if there is additionally, say, in course B, a shared place.

    Any thoughts on how to do this bugfree and more elegantly, or just bugfree?



     
    Stephan van Hulst
    Saloon Keeper
    Posts: 8225
    The code is buggy and ugly because you're not separating your concerns. You need to write classes and methods that are responsible for smaller bits of logic. Right now you have one heap of code that does everything. A few pointers:

  • Don't declare variables until you need them. People used to declare variables all at the top because that's what the compiler required. It's horribly out of date now.
  • Declare your readers and writers in try-with-resources statements. It will make sure they get released after you're done.
  • Write your identifiers out in full. What does mPoint mean?
  • Use proper casing. If a word is a separate word in the English language, capitalize it if it appears in the middle of an identifier. Example: getFirstName, not getfirstName.
  • Avoid nesting many block statements. This is called the "arrow anti-pattern". You can do this by writing more methods and reordering your logic.
  • If one code is concerned with reading, and other code is performing business logic, they don't belong in the same method body.

    Now, I tried to rewrite some of the code, but there are a few questions I have first:

    As Carey has noted, the list of cities is not really a list of cities, but just one city name split into separate words. Why did this happen? I'm not interested in technical answers, I want to know why the URL was translated that way.

    What does the first letter (A, B or C) represent? The code says "course", but that doesn't make sense to me.

    How are points calculated?

    Why are two participants in the same place, when they have different durations?
     
    Rob Bank
    Greenhorn
    Posts: 28
    Thanks for the pointers! I learned a great deal.
  • Ok
  • Ok (nice to learn that try blocks take care of the closing by default!)
  • mPoint is short for matcherPoint. I see a lot of Pattern p = … and Matcher m = …  so this seemed like a suitable naming - switching.
  • Ok
  • I’ll think about what would be logical to split into classes/methods and with that as base maybe try some GuardClause conversions of remaining superfluous nesting.
  • Ok. If I understand correctly, e.g. lines 25-33 should be put outside the if-bracket. I guess this is readability related / common practice more than computationally significant?

    Since why something happened seems like a question longing for a technical answer, I’m not sure what info you are after. But cities is not really cities either - it can be a club or a city or whatever the results uploader decides to put there. I have seen up to three words used on occasion; however mostly it is null. Let’s call it a property of the participant that in theory can be anything between zero and n strings. Heck, come to think of it, there might even be integers there. There will not be a + sign though. Yes?

    Course & letter is the official naming, where the letter stands for level of challenge in alphabetically ascending order. If you’re feeling invincible, run the toughest Course A.

    Point calculation:
    A: 100, 95 and 90 for the top 3, then -2 points per place, down to a minimum of 1 point per participant
    B = 0.88*A rounded to even numbers, down to a minimum of 1 point per participant
    C = 0.77*A rounded to even numbers, down to a minimum of 1 point per participant
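
As a rough illustration, the rule above could be written like this. Two things are assumptions on my part: that "rounded to even numbers" means rounding to the nearest even integer, and that `Math.round` is acceptable for the exact halfway cases, which the rule does not specify. The class and method names are made up.

```java
final class PointsSketch {
    // Course A: 100, 95, 90 for places 1 to 3, then 2 points less per place, never below 1.
    static int pointsForCourseA(int place) {
        if (place == 1) {
            return 100;
        }
        if (place == 2) {
            return 95;
        }
        return Math.max(90 - 2 * (place - 3), 1);
    }

    // Courses B and C scale the course A points and round to an even number.
    static int points(char course, int place) {
        int base = pointsForCourseA(place);
        double factor;
        switch (course) {
            case 'A': return base;
            case 'B': factor = 0.88; break;
            case 'C': factor = 0.77; break;
            default: throw new IllegalArgumentException("Unknown course: " + course);
        }
        // Assumption: "rounded to even numbers" = nearest even integer.
        long nearestEven = 2 * Math.round(base * factor / 2.0);
        return (int) Math.max(nearestEven, 1);
    }
}
```

Under these assumptions, second place on course B works out as 0.88 * 95 = 83.6, which rounds to 84.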

    Why are two… Oops, because of my manual summary mistake. This is what it would look like on file;
    A,1.,A1,John,Johnsson,1.31.22,+0.00,100
    A,2.,A2,Robert,Robertsson,Boston,1.41.22,+10.00,95
    A,2.,A2,Mick,Mickelsson,New,York, 1.41.22,+10.00,92
     
    Carey Brown
    Bartender
    Posts: 3617
    You are parsing the same CSV line twice. Why not parse once, make a list, and from then on refer to the list?
     
    Carey Brown
    Bartender
    Posts: 3617
    Adding these two methods (sorry about the indentation) to Participant cleans up your other large block of code.
    What if your 'participant' and 'participant2' refer to the same object?

    I haven't run across mPoint.reset(line) before. What does reset() do? Javadocs were unhelpful.
    The resulting modified string is not being used.


     
    Stephan van Hulst
    Saloon Keeper
    Posts: 8225

    Rob Bank wrote:(nice to learn that try blocks take care of the closing by default!)


    Not any old try-block, try-with-resources. The syntax is different. Make sure to read up on them.
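
For reference, a minimal sketch of the try-with-resources syntax. The resource is declared in parentheses right after `try`, and it is closed automatically when the block ends, whether normally or through an exception. The class and method names are invented, and a `StringReader` can stand in for a file reader when trying it out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ReadLinesSketch {
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        // The reader declared in the parentheses is closed when the block exits.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Rethrown unchecked only to keep the example's signature simple.
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

A plain `try { ... } finally { ... }` block does not close anything on its own; only resources declared in the parentheses are handled this way.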

    mPoint is short for matcherPoint. I see a lot of Pattern p = … and Matcher m = …  so this seemed like a suitable naming - switching.


    Not really. This is called "Systems Hungarian" notation. Don't prefix your identifiers with a shorthand for the type. The type is clear from the declaration. Name it after what concept it represents in your code. In this case, pointMatcher. However, if you refactor your code so that you only declare variables when you need them, I'm not even sure that you need to differentiate it from other matchers.

    If I understand correctly, e.g. lines 25-33 should be put outside the if-bracket. I guess this is readability related / common praxis more than computationally significant?


    It's a basic principle of object oriented programming. Separate your concerns. If I'm maintaining code and I'm looking for a bug that has to do with how your model classes interact with each other, I don't want to be interrupted by lines that read from a file. You can greatly simplify your code by converting your entire CSV data to a business model first, and then perform logic on your business model. For instance, you can write a class named RaceResult that consists of a list of Participant objects. First parse the entire file to a RaceResult object, and add methods to it that process the list of participants.

    But cities is not really cities either - it can be a club or a city or whatever the results uploader decides to put there. I have seen up to three words used on occasion; however mostly it is null. Let’s call it a property of the participant that in theory can be anything between zero and n strings.


    However, it's still ONE thing, right? Even if the name is something like "New York" or "Run Forrest Run", it's still ONE string that should not contain commas in the middle. I understand that the data comes from a URL (why not from an HTTP request body?) and that the + signs are converted to commas. Why not just to spaces?

    Course & letter is the official naming, where the letter stands for level of challenge in alphabetically ascending order. If you’re feeling invincible, run the toughest Course A.


    I guess in my code I would call it challengeRating.

    A,2.,A2,Mick,Mickelsson,New,York, 1.41.22,+10.00,92


    So there's actually a lot of redundant data in your CSV. All you need is the participant name, the team name, the challenge rating and the duration. The rest you can calculate. This is how I would format the input data per participant:

    Now, if the CSV really were formatted like that, here's the code I would use to process it all:

    Using this code, I found that you probably have another mistake in your CSV. Robert and Mick have the same finish time and challenge rating, but they have a different number of points. Why?
     
    Rob Bank
    Greenhorn
    Posts: 28
    Thanks a million guys. Been offline a while and will be looking closer at this starting tomorrow. Studying it & relating theory should take a few days.

    Meanwhile just a quick response on the questions.
    matcherPoint.reset(line) sets the matching region of line to start,end (matcherPoint is initiated to "" i.e. matcher.region(0,0))

    Yes, ONE thing / property. The choice of commas; can't recall any particular reason for it at the start of the project, other than that's the data separation formatting on file orders and generations at work and that it would be the best format considering a possible future database setup. I’m basically reading the source code of a URL. As far as I understand from some googling, that is the HTML request body, no? Don’t have access to my code currently so can’t say how it’s done technically.

    The initial thought of calculating point went way over my head so I settled for listing them – great. Calculating place also fixes an issue I’ve been having with incorrect place after removing some performances due to certain criteria – great. Since you didn’t know about that, I’m curious as to why you prefer calculating place and difference instead of reading them. I guess I will find that out while looking closer at your code though.

    Robert and Mick sharing place but having different point was an example of what I was trying to fix with my arrow anti-pattern loop. And while it fixed just that, the issues I had were;

    ... (i) not if there are more than two participants sharing a place and (ii) not if there is additionally, say, in course B, a shared place.

     
    Stephan van Hulst
    Saloon Keeper
    Posts: 8225

    Rob Bank wrote:Yes, ONE thing / property. The choice of commas; can't recall any particular reason for it at the start of the project, other than that's the data separation formatting on file orders and generations at work and that it would be the best format considering a possible future database setup.


    How so? How does distributing ONE property over multiple comma separated values help with a future database setup? It just makes processing more difficult, as you've seen at the start when you made this topic. I really recommend keeping the team name as one value without commas, and if the name itself contains a comma, you escape it or encode it in some way. Maybe CSV already has an escape sequence for this.
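
On the escaping question: the common CSV convention (described in RFC 4180) is to wrap a field in double quotes when it contains a comma, a double quote or a line break, and to double any quote characters inside it. A small sketch of that rule follows; the class and method names are made up, and a real project would more likely rely on an existing CSV library.

```java
final class CsvQuoteSketch {
    // Quotes a single field only when it contains a character that needs it.
    static String quote(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Embedded double quotes are escaped by doubling them.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quote("New York"));
        System.out.println(quote("Run, Forrest, Run"));
    }
}
```

A team name such as "New York" then stays a single field, and the reading side has to honor the same quoting rule when splitting lines.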

    I’m basically reading the source code of a URL. As far as I understand from a googling, that is the HTML request body, no? Don’t have access to my code currently so can’t say how it’s done technically.


    A URL doesn't have source code. Did you mean the source code of a web page? A URL is just the address of a web page. The source code of a web page is also not a request body. A request body is part of an HTTP POST request that you send from a web page to a server when you submit a form or perform an AJAX request.

    You can get data from a URL in the form of query parameters, but when you perform a request that triggers a change on the server, it's better to send the data in the request body.

    If you can, please show us how your application currently gets the data, because it's very confusing.

    I’m curious as to why you prefer calculating place and difference instead of reading them. I guess I will find that out while looking closer at your code though.


    Unless you're caching calculations, you want data to be normalized. That means in its simplest form, without redundancy. The place number, points and difference are redundant because you can calculate them from the other properties. Why? As you've seen, you had to alter your example data several times now, because they contained conflicting information. When you normalize data, you can't have conflicts.

    Robert and Mick sharing place but having different point was an example of what I was trying to fix with my arrow anti-pattern loop.


    And now you don't have to fix it at all, because those points are calculated, rather than stored. You can easily generate an overview of points per person like this:
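
The original listing is not reproduced here, but the underlying idea of calculating the place instead of storing it can be sketched as follows. Assumptions: participants are identified by a unique name, equal finish times share a place, and the place after a tie skips ahead (1, 2, 2, 4). The method would be applied separately to each challenge rating, and points can then be derived from the place.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PlacesSketch {
    // Maps each name to its place; equal durations get the same place.
    static Map<String, Integer> places(Map<String, Duration> finishTimes) {
        List<Map.Entry<String, Duration>> sorted = new ArrayList<>(finishTimes.entrySet());
        sorted.sort(Map.Entry.comparingByValue());

        Map<String, Integer> result = new LinkedHashMap<>();
        int place = 0;
        Duration previous = null;
        for (int i = 0; i < sorted.size(); i++) {
            Duration current = sorted.get(i).getValue();
            // Only move to a new place when the time differs from the previous one.
            if (!current.equals(previous)) {
                place = i + 1;
            }
            result.put(sorted.get(i).getKey(), place);
            previous = current;
        }
        return result;
    }
}
```

Because the place is derived from the finish times on every run, it works for any number of participants sharing a place and cannot contradict the stored durations.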
     
    Stephan van Hulst
    Saloon Keeper
    Posts: 8225
    I didn't get it earlier, but are you scraping the contents from a web page? That has nothing to do with a URL, nor request bodies. You're just processing HTML. If that's the case, I don't understand why words in a team name would be separated by + signs.
     
    Rob Bank
    Greenhorn
    Posts: 28
    Ahaa, I assumed you had a typo in the previous post with the plus not being preceded by a space. It is not a plus sign, the plus is just the regex one or more;

    At URL read and corresponding txt file write, .replaceAll(" +", ",") gives ...,New,York,...
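
The distinction can be shown with a tiny sketch (class and method names invented for the example). `String.replaceAll` treats its first argument as a regular expression, so `" +"` means "one or more spaces", while `String.replace` treats the same text literally as a space followed by a plus sign.

```java
final class SpacesToCommasSketch {
    // Regex: every run of one or more spaces becomes a single comma.
    static String regexReplace(String text) {
        return text.replaceAll(" +", ",");
    }

    // Literal: only the exact two-character sequence " +" is replaced.
    static String literalReplace(String text) {
        return text.replace(" +", ",");
    }

    public static void main(String[] args) {
        System.out.println(regexReplace("Mickelsson New   York 2.10.03"));
        System.out.println(literalReplace("Mickelsson New York"));
    }
}
```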



    HTTP POST and AJAX what? In a rush so in layman terms, I read a web-page and get the text that includes all HTML-tags. I filter out the text I want as CSV using a set of .replaceAll(). Sorry for the confusion, I did not put much effort into the phrasing as by "reading a web address" I assumed it was clear that I'm reading the contents "under it". I still don't have access to my code so I can't comment on the technical way in which the data is read.

    Why so? My earlier comment was based on the assumption that you were talking about the + sign in regex terms. I.e. that you were querying my choice of separating each object (not sure I'm using the right word here, but with each object I mean each firstName, each lastName, each duration and so on..) with comma instead of space. Sure, it sounds more straightforward to extract teamNames by implementing other separation of their words than the other data's comma if I were to use it, but I'm looking to dump it altogether. So, as I see it, I can either try to implement some .replaceAll at initial file generation to exclude the teamNames being read (which if can be done, I have no clue how to), or utilize the DURATION_PATTERN for which space / comma is indifferent, apart from that the pattern would have to look slightly different.

    Thanks, I get your point in principle on redundancy.
     
    Rob Bank
    Greenhorn
    Posts: 28
    Let me rephrase the clumsily written second last sentence.
    So, as I see it, I can either try to implement some .replaceAll when I'm modifying the total HTML string I have read, to exclude the teamNames (which if can be done, I have no clue how to), or utilize the DURATION_PATTERN for which space / comma is indifferent, apart from that the pattern would have to look slightly different.
     
    Carey Brown
    Bartender
    Posts: 3617
    So, it seems like you're doing a crude minimalist parse of HTML to generate a CSV file, and now are attempting to parse the CSV file. In the process some information is being lost, such as "New York" now appearing as "New,York". I'd be interested in seeing a few snippets of some raw HTML that you are processing, maybe there's a better way. What HTML parser are you using? JSoup?
     
    Stephan van Hulst
    Saloon Keeper
    Posts: 8225
    Ohhh my bad. I've been working with C# and the replace method doesn't take a regular expression.

    I'm with Carey. Show us the raw HTML and let us know what the ultimate goal of your application is.
     