
get the last two segments from a dot separated string

 
Andrew Cane
Ranch Hand
Posts: 91
I have something like:
xxx.x.x.x.x.33423.AMDAC-4

The number of dot-separated segments at the front of the string may vary, but that doesn't matter, since I only need to extract the last two segments from the string (in this case: 33423 and AMDAC-4). How do I do this efficiently? I need to process hundreds of thousands of these strings every day. It is guaranteed that the segments will always be separated by dots only (no whitespace in the string). Thanks.
 
Junilu Lacar
Sheriff
Posts: 11494
Check out String.split()
 
Andrew Cane
Ranch Hand
Posts: 91
I was actually hoping for another solution, since creating an array of strings every time I do this seems like a waste of memory. If every string consists of 10 segments, 100K strings would produce a million string array elements. Is this the only way? Thanks.
 
Junilu Lacar
Sheriff
Posts: 11494
Why would you keep them all in memory? I'd imagine you'd use a loop and do something with the parts that you need, discarding everything else by de-referencing them and letting garbage collection take care of memory management. Besides, this sounds like speculative optimization on your part. If you really want to know if performance is taking a hit, use a profiler, not your gut. I suspect it's not going to be anywhere near as bad as you think it would be if you just select the right program structure and processing strategy.
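As a rough illustration of the loop described above, here is a minimal sketch. The class name, method name, and the callback parameter are made up for this example; only String.split() comes from the thread. The array produced for each string is a local variable, so it becomes eligible for garbage collection as soon as the iteration ends.

```java
import java.util.function.BiConsumer;

// Hypothetical helper for illustration only.
class SplitLoopSketch {

    // Splits each input on literal dots and hands the last two segments to the handler.
    // Inputs with fewer than two segments are skipped.
    static void processAll(Iterable<String> inputs, BiConsumer<String, String> handler) {
        for (String s : inputs) {
            String[] parts = s.split("\\.");   // note: trailing empty segments are dropped by split
            if (parts.length >= 2) {
                handler.accept(parts[parts.length - 2], parts[parts.length - 1]);
            }
            // 'parts' goes out of scope here, so nothing accumulates across iterations
        }
    }
}
```

Whether the short-lived arrays matter in practice is something a profiler can answer better than a guess.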
 
Junilu Lacar
Sheriff
Posts: 11494
Also, split() takes a regex and a limit. Given the right regex, you'll only have two strings to deal with: the part that you want and the part that you can discard. The limit would be 2, since a limit of 1 returns the whole string unsplit. If performance is really a big deal, then you can compile a Pattern once and reuse it to process all the strings that you need to process.
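One possible reading of this hint, as a sketch only: the regex below is my own guess at a suitable pattern, not necessarily the one the poster had in mind, and the class and method names are hypothetical. It matches only the dot that sits in front of the last two segments, so the split yields at most two pieces, and a limit of 2 caps the array at two elements in any case.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class TailSplitSketch {

    // A dot followed by exactly one more dot before the end of the input,
    // i.e. the separator in front of the last two segments. Compiled once and reused.
    private static final Pattern BEFORE_LAST_TWO = Pattern.compile("\\.(?=[^.]*\\.[^.]*$)");

    // Returns the last two segments still joined by their dot, e.g. "33423.AMDAC-4".
    // If the input has two segments or fewer, nothing matches and the input is returned as is.
    static String tail(String s) {
        String[] parts = BEFORE_LAST_TWO.split(s, 2);
        return parts[parts.length - 1];
    }
}
```

For example, tail("xxx.x.x.x.x.33423.AMDAC-4") returns "33423.AMDAC-4", leaving one more split (or an indexOf) to separate the two values.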
 
Ron McLeod
Bartender
Posts: 1603
You could use a combination of String#replace and String#split like this:


Output:
Input: xxx.x.x.x.x.33423.AMDAC-4 ==> seg#1=33423, seg#2=AMDAC-4
Input: xxx.x.x.x.x.33423.AMDAC-4.monkey.bars ==> seg#1=monkey, seg#2=bars
Input: banana ==> did not match expected pattern
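A minimal sketch of this kind of replace-then-split approach, not necessarily the code that was originally posted. It assumes String#replaceAll (String#replace does not take a regex), and the class name, method name, and regex are my own choices for illustration.

```java
// Hypothetical helper for illustration only.
class ReplaceSplitSketch {

    // Reduces the input to "<second-to-last>.<last>" with a regex replacement,
    // then splits the result on the remaining dot.
    static String describe(String input) {
        String[] seg = input
                .replaceAll("^.*\\.([^.]+)\\.([^.]+)$", "$1.$2")
                .split("\\.");
        if (seg.length != 2) {
            return "Input: " + input + " ==> did not match expected pattern";
        }
        return "Input: " + input + " ==> seg#1=" + seg[0] + ", seg#2=" + seg[1];
    }
}
```

Calling describe() on the three inputs shown above produces lines in the same format as that output.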

 
Junilu Lacar
Sheriff
Posts: 11494
Ron, for future reference, the Ranch is NotACodeMill -- we think it's better to let people figure out the solution themselves. Giving helpful hints and constructive criticism is fine but spoon feeding solutions is strongly discouraged.
 
Junilu Lacar
Sheriff
Posts: 11494
And BTW, I have already tested a 16-character regex pattern that would be appropriate for what the OP needs to do. A method to process one string and return the desired portion of it would be 2 or 3 lines long. That's all you'd really need if you use the Pattern class.
 
Ron McLeod
Bartender
Posts: 1603
Ok - understood. I really only offered up one line of solution; the rest of the code was there to demonstrate it, but I get what you are saying.

Thanks for the advice.
 
Junilu Lacar
Sheriff
Posts: 11494
Ron McLeod wrote: Thanks for the advice.

Thanks for understanding

Seems I misunderstood the OP's requirements: the regex string that would give the last two segments as separate strings is 18 characters long, at least the one I came up with that works. Hints: Check out Pattern.matcher(), Matcher.matches(), and Matcher.group().
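For readers following those hints, here is one way they might fit together. This is a sketch under my own assumptions: the regex is not the 18-character pattern mentioned above (that one was never posted), and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class MatcherSketch {

    // Optional prefix ending in a dot, then two non-empty, dot-free segments captured as groups 1 and 2.
    private static final Pattern LAST_TWO = Pattern.compile("(?:.*\\.)?([^.]+)\\.([^.]+)");

    // Returns {secondToLast, last}, or null when the input has fewer than two segments
    // or one of the last two segments is empty.
    static String[] lastTwo(String s) {
        Matcher m = LAST_TWO.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }
}
```

The Pattern is compiled once in a static field, so each call only pays for creating a Matcher and running the match.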
 
Campbell Ritchie
Marshal
Posts: 56587
Another way to do it: use the lastIndexOf method (twice) and then the substring method.
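A minimal sketch of that suggestion, with a hypothetical class and method name. It uses no regex and allocates only the two result strings plus the small array that holds them.

```java
// Hypothetical helper for illustration only.
class LastIndexOfSketch {

    // Returns {secondToLast, last}, or null when the input contains no dot at all.
    static String[] lastTwo(String s) {
        int last = s.lastIndexOf('.');
        if (last < 0) {
            return null;
        }
        // Search backwards from just before the last dot; -1 means there is no earlier dot,
        // in which case the second-to-last segment starts at index 0.
        int prev = s.lastIndexOf('.', last - 1);
        return new String[] { s.substring(prev + 1, last), s.substring(last + 1) };
    }
}
```

For the sample input, lastTwo("xxx.x.x.x.x.33423.AMDAC-4") yields "33423" and "AMDAC-4".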
 
Jayesh A Lalwani
Rancher
Posts: 2762
Really, if you have an application that is hyperfocused on performance and needs to handle a lot of operations like these, you should try to avoid parsing Strings like these altogether. There are no good ways to parse a string, only ways that are not bad. The question to ask yourself when you are hyperfocused on performance is not "How can I parse this String faster?" but "How do I avoid parsing this String?"

Chances are that when you start looking at performance, you will find a lot of things that can be improved before you think about improving your parsing of input data. If and when you have absolutely identified that parsing this data is a bottleneck, and the performance gained by improving your input parsing far outweighs the effort spent on the optimization, then you have 2 options:

a) Change the wire format to carry structured (and compressed) data. Don't send a string that looks like xxxx.xxx.xxx.xxxx.xxx. Send an object that contains 5 strings. You can serialize the objects. You can implement your own serialization. Or you can use something like Google protobuf, which provides a good amount of compression in the wire format.
b) Change the input stream so that it's not protocol agnostic.
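To make option a) a little more concrete, here is a rough sketch of a hand-rolled, length-prefixed record format. Everything in it (the class, the method names, the layout of a count followed by writeUTF fields) is an assumption made for illustration; it is not protobuf and not a format anyone in this thread proposed. The point is only that a receiver can skip unwanted fields by their length prefix instead of scanning for delimiters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wire format for illustration only: a 2-byte field count, then each field via writeUTF.
class WireFormatSketch {

    static byte[] encode(String... segments) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(segments.length);
            for (String s : segments) {
                out.writeUTF(s);                      // 2-byte length prefix + modified UTF-8 bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Skips all but the last two fields using their length prefixes, then decodes only those two.
    static String[] decodeLastTwo(byte[] record) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            int count = in.readUnsignedShort();
            if (count < 2) {
                return null;
            }
            for (int i = 0; i < count - 2; i++) {
                int remaining = in.readUnsignedShort();   // byte length written by writeUTF
                while (remaining > 0) {
                    int skipped = in.skipBytes(remaining);
                    if (skipped <= 0) {
                        throw new EOFException("truncated record");
                    }
                    remaining -= skipped;
                }
            }
            return new String[] { in.readUTF(), in.readUTF() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only helps when both ends of the wire can agree on the format, which is exactly the precondition of option a).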

Let me explain here, after I add a caveat: you should really be doing this as a last resort. No one except the people who build low-level frameworks does this.

Generally, when we build a system that has different pieces of software talking to each other, we think of the software in layers. The lower layers are generally made to be reusable and agnostic to the requirements. So, for example, if you are implementing some code that is responsible for parsing a file that contains data structured in a certain format, the most usual way of building it is to use one of Java's input IO classes to read parts of the file into memory, parse the data in memory, and chuck out the data you don't need while keeping the data that you do need. There are many advantages to this design, including but not limited to this one: your business logic (the logic to parse the data) is separated from the mechanism of reading the data (reading the file). This is what you should do 99.9% of the time.

There is, however, one disadvantage: performance. The IO classes are blind to the structure of the data and cannot optimize reading the data based on that structure. For example, in your case, if Java's input stream reader knew that you wanted to chuck out the data in between the first 4 dots, it would have just chucked it out for you while it was reading the characters off the stream, right? You wouldn't even have to parse the String. By breaking the principle of keeping things separate, you gain some performance benefit.
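To give a feel for what a structure-aware reader might look like, here is a small sketch. It assumes one dot-separated record per line, which is my assumption and not something stated in the thread, and the class and method names are hypothetical. It never builds the full input string: as characters arrive, it keeps only the current and previous segment and drops everything older.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for illustration only; assumes newline-separated records.
class SegmentAwareReaderSketch {

    // Returns {secondToLast, last} for every record that contains at least one dot.
    static List<String[]> readLastTwo(Reader in) {
        List<String[]> result = new ArrayList<>();
        StringBuilder prev = new StringBuilder();
        StringBuilder cur = new StringBuilder();
        int dots = 0;
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c == '.') {
                    // The current segment becomes the previous one; reuse the old buffer for the next segment.
                    StringBuilder tmp = prev;
                    prev = cur;
                    cur = tmp;
                    cur.setLength(0);
                    dots++;
                } else if (c == '\n') {
                    emit(result, prev, cur, dots);
                    dots = 0;
                } else if (c != '\r') {
                    cur.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (dots > 0 || cur.length() > 0) {
            emit(result, prev, cur, dots);   // last record without a trailing newline
        }
        return result;
    }

    // Records the pair if the line had at least two segments, then clears both buffers.
    private static void emit(List<String[]> out, StringBuilder prev, StringBuilder cur, int dots) {
        if (dots >= 1) {
            out.add(new String[] { prev.toString(), cur.toString() });
        }
        prev.setLength(0);
        cur.setLength(0);
    }
}
```

In practice the Reader passed in should be buffered (for example a BufferedReader), since the sketch reads one character at a time. Note how the parsing logic is now tangled up with the reading logic, which is the trade-off described above.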

Again, most people don't need to do this kind of optimization. There are places where this kind of optimization makes sense. For example, if you look at the Tomcat code, you will find streams that have native knowledge of the HTTP protocol. The Tomcat people did this because they wanted Tomcat to serve very high loads. Another example is Netty, which is a framework that allows you to implement your own high-performance protocols. For example, I prototyped my own "REST"* server using Netty that reduced overhead by a factor of 4 compared to Tomcat by using techniques like these. We ended up not using it because we decided that completing the implementation was not worth it. Really, is it worth it to bring the overhead of the parser down from 4 ms to 1 ms? Usually, it's not worth it!

*REST is in quotes because the server was really a pared-down HTTP server that appeared to support the HTTP protocol, but supported only the features required for REST calls (for example, no session management).
 
Andrew Cane
Ranch Hand
Posts: 91
Yes, but the problem is, I'm just a vendor. While I've been saying that to my manager, in the end, the customer has the final say, and they refuse to change how they do things, so.....
Thanks for the reply.
 