How to merge consecutive values into a range?

 
Greenhorn
Posts: 23
Sorted string input: 1, 2, 2A, 2B, 2C, 3, 3A, 4, 5, 6, 32C, 32D, 50, 51/1, 51/2, 60, 61, 62, 200-2E, 200-2F, 200-2G, 200-2H, 201C, 201/21P, 201/21R, 201/21S, 300, 300A, 301-2A, 542/2K, 542/2L, 583-1, 583-585D, 583-585E, 605, 605A, 605B, 605C, 800A.
The question is about merging consecutive values into a range. E.g. 4, 5, 6 are consecutive values, so the range is [4-6], and 2A, 2B, 2C are consecutive values, so the range is 2[A-C]. No other values can be in a range.


output:
1  
2
2[A-C]
3
3A
[4-6]
32[C-D]
50
51/1
51/2
[60-62]
200-2[E-H]
201C
201/21[P-S]
300
300A
301-2A
542/2[K-L]
583-1
583-585[D-E]
605
605[A-C]
800A





code output= merged: [1, 2, 2A, 2B, 2C, 3, 3A, [4-6], 32C, 32D, 50, 51/1, 51/2, [60-62], 200-2E, 200-2F, 200-2G, 200-2H, 201C, 201/21P, 201/21R, 201/21S, 300,300A,301-2A,542/2K, 542/2L,583-1, 583-585D, 583-585E, 605, 605A, 605B,605C 800A]



For example, I merged the consecutive values 4, 5, 6 into the range [4-6].
I have managed to put consecutive numbers into this "[]" format, but I couldn't do the same for the letters. The order of the sorted list must not be disturbed.

 
Marshal
Posts: 3838
Welcome to the Ranch!

I think we need more precise specification of the requirements. Generally, the input strings need to be split into parts on which the detection of consecutive ranges will happen. For example, "4" contains just one part. "2A" contains two: "2" and "A". "583-585D" contains how many? I can think of "583", "585" and "D", but perhaps "583-585" needs to stay together. What are all the rules that apply here?

Should the ranges be merged only for the last part of the string, or for the other parts too? For example, should "3A", "4A" and "5A" be merged into "[3-5]A"?

Why aren't "51/1" and "51/2" merged into "51/[1-2]" in the example output?

Without precise specification, no one can even tell whether a solution does meet all the requirements or not.
 
kiraz cevik
Greenhorn
Posts: 23
Thanks.
The expected output only combines ranges for the trailing letters. In "3A", "4A" and "5A" the numbers are consecutive but the letters are not, so they will not be merged. For example: 3A, 4B, 5C => [3-5][A-C]

Values that are pure numbers can be in a range, e.g. [8-11]. Values that only differ by a letter at the end can be in a range. If the trailing letters are consecutive, they must be merged, e.g. 200-2E, 200-2F, 200-2G, 200-2H => 200-2[E-H].
But 200-2E, 200-2F, 200-2H => 200-2[E-F], 200-2H, because the sequence is broken.

"51/1" and "51/2" do not need to be merged into "51/[1-2]".

I have to check the end of the value.

If there is a letter at the end of the value and the next value ends in the consecutive letter, I must include them in the range.
 
Martin Vashko
Marshal
Posts: 3838
What we know so far:

Only the trailing part of the string has to be processed. The entire trailing part contains either digits or letters.

Questions:
  • What about uppercase and lowercase letters - are they the same or different? Is it ok to transform "a" and "B" to "[a-b]"? Or "[A-B]" Or not at all?
  • The trailing part is made of all consecutive digits/letters found at the end of the string? Consider "1234" and "1235". What's the desired output: "[1234-1235]" or "123[4-5]"?
  • Does the trailing part contain more than one letter at all? Consider the sequence "A01", "A02", ..., "A30". Is "A[01-30]" the desired output?

  • "51/1" and "51/2" "51 / [1-2]" No need to edit it.


    Sorry, I don't understand this. Can you explain it in more detail, please?
     
    kiraz cevik
    Greenhorn
    Posts: 23
    Just think about this string, please.
    Sorted string input: 1, 2, 2A, 2B, 2C, 3, 3A, 4, 5, 6, 32C, 32D, 50, 51/1, 51/2, 60, 61, 62, 200-2E, 200-2F, 200-2G, 200-2H, 201C, 201/21P, 201/21R, 201/21S, 300, 300A, 301-2A, 542/2K, 542/2L, 583-1, 583-585D, 583-585E, 605, 605A, 605B, 605C, 800A.
    All possibilities should be handled according to this string. I've written all the necessary examples in the array.
    There is no lower case. I wrote what the output ranges should be.
    I've listed consecutive numbers, e.g. 60, 61, 62 => [60-62].
    The consecutive letters at the end of the values should be put together: 20A, 20B => 20[A-B]. To be included in a range, the numbers must be the same and the letters must be consecutive.
     
    Marshal
    Posts: 66575
    Welcome to the Ranch again.
    I am afraid that last example doesn't add anything to the discussion. Please explain the rules you are using; something like 200‑2F might mean 200F 201F 202F or it might include 200A 201B 202C and 202F. You haven't made the rules clear yet.
    When you make the rules really clear, those rules will help explain how to write your code.
     
    kiraz cevik
    Greenhorn
    Posts: 23
    Thanks.

    200F, 201F, 202F are not pure numbers. 200, 201, 202 => [200-202]

    Only consecutive letters should be placed in the range.
    2A, 2B, 2C =>2[A-C]

    32C, 32D =>32[C-D]
     
    200-2E, 200-2F, 200-2G, 200-2H =>200-2[E-H]

    201/21P, 201/21R, 201/21S   =>  201/21[P-S]

    542/2K, 542/2L  => 542/2[K-L]

    583-585D, 583-585E => 583-585[D-E]

    605A, 605B,605C =>605[A-C]

    To be included in a range, the numbers must be the same and the letters must be consecutive. I've written all the possibilities.

     
     
    Campbell Ritchie
    Marshal
    Posts: 66575
    So what's 200‑2 and how does it differ from 200, 201, 202?
     
    Marshal
    Posts: 14501
    It seems to me you have these  rules:

    General pattern:
    number[chars][A-Za-z]

    number has to be consecutive to have a sequence
    If chars is present, then they have to be equals() to be in a sequence
    If the trailing letter is present, they have to be consecutive to have a sequence

    A sequence is then abbreviated using: first + "-" + last

    Where first is the first value in the sequence and last is the last value in the sequence.
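As a rough illustration of the pattern above, here is one way the three parts could be pulled out of a value in Java. This is a sketch and not code posted in the thread: the class `ValueParts`, its `parse` method and the regular expression are all assumptions, and it assumes at most one trailing letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a value such as "200-2E" into the three parts
// of the pattern number[chars][letter].
final class ValueParts {
    // Leading digits, then (lazily) any non-letter characters, then at most one trailing letter.
    private static final Pattern PATTERN =
            Pattern.compile("(\\d+)([^A-Za-z]*?)([A-Za-z]?)");

    final int number;     // leading number, e.g. 200
    final String chars;   // middle part, e.g. "-2" (may be empty)
    final String letter;  // trailing letter, e.g. "E" (may be empty)

    private ValueParts(int number, String chars, String letter) {
        this.number = number;
        this.chars = chars;
        this.letter = letter;
    }

    static ValueParts parse(String value) {
        Matcher m = PATTERN.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised value: " + value);
        }
        return new ValueParts(Integer.parseInt(m.group(1)), m.group(2), m.group(3));
    }

    @Override
    public String toString() {
        return number + "|" + chars + "|" + letter;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"4", "2A", "51/1", "200-2E", "583-585D"}) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

With the parts separated like this, the three rules become comparisons between the fields of two neighbouring values.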
     
    kiraz cevik
    Greenhorn
    Posts: 23
    This is required for a project. These are the numbers of something. I need to merge the numbers into ranges so they take up less space.
     
    Martin Vashko
    Marshal
    Posts: 3838
    Is this homework, or a real-world problem? I ask because in a real-world problem, I'd expect that the example you listed in the first post might not represent all possible patterns in the input data.

    You mentioned "[8-11]" earlier. Does it mean that "Z8", "Z9", "Z10" and "Z11" will be merged into "Z[8-11]"? Even if the token length doesn't match?

    I understand that all trailing digits are processed as a token. Therefore, "ABC1234" and "ABC1235" will be merged into "ABC[1234-1235]". What about consecutive letters? Will "123AZ" and "123BA" be merged into "123[AZ-BA]" (because, assuming an English alphabet, BA is the next item in sequence after AZ)? Similarly, should "8Z", "8AA" be merged into "8[Z-AA]"? (See how columns in Excel beyond the first 26 are named if you have trouble understanding these two examples.)

    I still don't understand why "51/1" and "51/2" in the example weren't merged into "51/[1-2]". Did you make a mistake in your original post, or is it part of the requirement?

     
    Junilu Lacar
    Marshal
    Posts: 14501
    So how does 300A, 301B, 302C get merged? Would it be [300-301][A-C]? (Edited: [300-302][A-C].)
     
    kiraz cevik
    Greenhorn
    Posts: 23
    Yes, thank you, @Junilu Lacar.
    English is not my native language. I'm sorry I couldn't explain it well.
     
    Junilu Lacar
    Marshal
    Posts: 14501

    Martin Vashko wrote:
    I still don't understand why "51/1" and "51/2" in the example weren't merged into "51/[1-2]". Did you make a mistake in your original post, or is it part of the requirement?


    It seems my interpretation is correct or at least very close. Those items are not merged because the "chars" part of the pattern does not equals().

    So I think this:

    51/1, 52/1, 53/1 ==> not merged

    51/1A, 51/1B, 51/1C ==> 51/1[A-C]

    51/1A, 52/1B, 53/1C ==> [51-53]/1[A-C]

    @OP please confirm my understanding is correct.
     
    kiraz cevik
    Greenhorn
    Posts: 23
    It is a real-world problem. "Z8", "Z9", "Z10" and "Z11" => there are no leading letters.
    The letters are always at the end. There will only be one letter at the end of the value.
     
    Martin Vashko
    Marshal
    Posts: 3838

    Junilu Lacar wrote:So how does 300A, 301B, 302C get merged? Would it be [300-301][A-C]?


    Assuming this is true, what would we do with this sequence: 300A, 300B, 300C, 301A, 301B, 301C, 302A, 302B, 302C?

    Edit: and why not [300-302][A-C]?
     
    kiraz cevik
    Greenhorn
    Posts: 23
    Thanks, yes, Junilu Lacar.
    It must be [300-302][A-C].
     
    Junilu Lacar
    Marshal
    Posts: 14501

    Martin Vashko wrote:

    Junilu Lacar wrote:So how does 300A, 301B, 302C get merged? Would it be [300-301][A-C]?


    Assuming this is true, what would we do with this sequence: 300A, 300B, 300C, 301A, 301B, 301C, 302A, 302B, 302C?


    I see what you're getting at but that's probably getting a little too fractal. It seems the rules are more straightforward: 300[A-C], 301[A-C], 302[A-C] is how I would expect that to be merged. You're probably thinking that would now look like a pattern that would be again merged to [300-302][A-C], right? I think you only do one level of merge.
     
    Junilu Lacar
    Marshal
    Posts: 14501

    Martin Vashko wrote:

    Junilu Lacar wrote:So how does 300A, 301B, 302C get merged? Would it be [300-301][A-C]?


    Assuming this is true, what would we do with this sequence: 300A, 300B, 300C, 301A, 301B, 301C, 302A, 302B, 302C?

    Edit: and why not [300-302][A-C]?



    Sorry, that's what I meant to write.
     
    Bartender
    Posts: 3668
    Two other quick questions:

    1) are the letters always consecutive? Or can we have, say, 1A, 1C, 1Q?
    2) are the letters always in alphabetic order, like 1A, 1B, 1C? Or is 1C, 1A, 1B possible?

    Sorry if these questions are already answered, I might have missed them.
     
    kiraz cevik
    Greenhorn
    Posts: 23
    I sorted the string.

    The order of the string should not be broken.
     
    Junilu Lacar
    Marshal
    Posts: 14501
    Piet, by my understanding,

    1A, 1C, 1Q would not be merged

    1C, 1A,1B ==> 1C, 1[A-B]

     
    Martin Vashko
    Marshal
    Posts: 3838

    Piet Souris wrote:Two other quick questions:

    1) are the letters always consecutive? Or can we have, say, 1A, 1C, 1Q?
    2) are the letters always in alphabetic order, like 1A, 1B, 1C? Or is 1C, 1A, 1B possible?

    Sorry if these questions are already answered, I might have missed them.


    1) They may or may not be consecutive. When they are, all consecutive items in a sequence are merged using brackets, as shown in the first post.
    2) The list of strings is sorted before being processed. Now this might pose a problem: "8A, 9A, 10A" would be sorted into "10A, 8A, 9A", making sequence detection more complicated.
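To illustrate the sorting concern in point 2: plain String ordering compares character by character, so "10A" lands before "8A". A comparator that looks at the leading number first avoids that. A minimal sketch, where `NaturalOrderDemo`, `leadingNumber` and `BY_NUMBER_THEN_TEXT` are made-up names and every value is assumed to start with a digit:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NaturalOrderDemo {
    // Numeric value of the leading digits of a value such as "10A".
    // Assumption: the value starts with at least one digit.
    static int leadingNumber(String value) {
        int end = 0;
        while (end < value.length() && Character.isDigit(value.charAt(end))) {
            end++;
        }
        return Integer.parseInt(value.substring(0, end));
    }

    // Orders by the leading number first, then by the full text as a tie-breaker.
    static final Comparator<String> BY_NUMBER_THEN_TEXT = (a, b) -> {
        int byNumber = Integer.compare(leadingNumber(a), leadingNumber(b));
        return byNumber != 0 ? byNumber : a.compareTo(b);
    };

    public static void main(String[] args) {
        List<String> values = new ArrayList<>(List.of("8A", "9A", "10A"));

        values.sort(Comparator.naturalOrder());
        System.out.println(values); // prints [10A, 8A, 9A]

        values.sort(BY_NUMBER_THEN_TEXT);
        System.out.println(values); // prints [8A, 9A, 10A]
    }
}
```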
     
    kiraz cevik
    Greenhorn
    Posts: 23
    I'm sorting the string first.
    Then I'm merging the consecutive values.
    I'm waiting for your help.
     
    kiraz cevik
    Greenhorn
    Posts: 23

    Junilu Lacar wrote:Piet, by my understanding,

    1A, 1C, 1Q would not be merged

    1C, 1A,1B ==> 1C, 1[A-B]


    exactly
     
    kiraz cevik
    Greenhorn
    Posts: 23

    Piet Souris wrote:Two other quick questions:

    1) are the letters always consecutive? Or can we have, say, 1A, 1C, 1Q?
    2) are the letters always in alphabetic order, like 1A, 1B, 1C? Or is 1C, 1A, 1B possible?

    Sorry if these questions are already answered, I might have missed them.



    The letters are always in alphabetical order.
     
    Junilu Lacar
    Marshal
    Posts: 14501
    Here's how I would approach it.

    Track the three parts of the pattern. Only the number part is required. If the number changes and is consecutive to the previous number in the input, you have at least a number sequence. If the optional chars part is present, it needs to be equal between values. At the same time, track the optional letter part. If you have a sequence in either the number or letter part or both, do the range merge once one of them is broken.

    That is

    1C, 2D, 3F ==> [1-2][C-D], 3F

    1C, 2D, 4E ==> [1-2][C-D], 4E

    The middle chars part, if present will end a sequence when it changes between consecutive input values.
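A minimal sketch of this kind of single pass, where a run is extended while each value follows the previous one and is flushed as soon as it breaks. It is deliberately narrower than the description above: it only handles the two cases the original poster listed (pure consecutive numbers, and an identical prefix with consecutive trailing letters) and does not produce the combined [1-2][C-D] form. `SimpleRangeMerger` and its methods are hypothetical names, and letters are compared with plain A-Z character arithmetic.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleRangeMerger {
    // True when 'next' directly continues a run that currently ends in 'current'.
    static boolean follows(String current, String next) {
        if (current.matches("\\d+") && next.matches("\\d+")) {
            // Pure numbers: the next value must be exactly one higher.
            return Long.parseLong(next) == Long.parseLong(current) + 1;
        }
        if (current.length() < 2 || next.length() != current.length()) {
            return false;
        }
        int lastIndex = current.length() - 1;
        char a = current.charAt(lastIndex);
        char b = next.charAt(lastIndex);
        // Same prefix, and trailing uppercase letters that are neighbours in A-Z.
        return current.regionMatches(0, next, 0, lastIndex)
                && a >= 'A' && b <= 'Z' && b == a + 1;
    }

    // Renders one run; 'first' and 'last' are the values at its two ends.
    static String render(String first, String last) {
        if (first.equals(last)) {
            return first;
        }
        if (first.matches("\\d+")) {
            return "[" + first + "-" + last + "]";
        }
        int lastIndex = first.length() - 1;
        return first.substring(0, lastIndex)
                + "[" + first.charAt(lastIndex) + "-" + last.charAt(lastIndex) + "]";
    }

    // Walks the already-sorted list once and emits one string per run.
    static List<String> merge(List<String> sorted) {
        List<String> out = new ArrayList<>();
        int start = 0;
        while (start < sorted.size()) {
            int end = start;
            while (end + 1 < sorted.size() && follows(sorted.get(end), sorted.get(end + 1))) {
                end++;
            }
            out.add(render(sorted.get(start), sorted.get(end)));
            start = end + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("3", "3A", "4", "5", "6", "32C", "32D",
                "51/1", "51/2", "200-2E", "200-2F", "200-2H");
        System.out.println(merge(input));
        // prints [3, 3A, [4-6], 32[C-D], 51/1, 51/2, 200-2[E-F], 200-2H]
    }
}
```

Under these assumptions the values 1, 2 from the first post would come out as [1-2], although the expected output lists them separately. The thread never states the rule that keeps them apart.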
     
    Saloon Keeper
    Posts: 6462
    This all seems less than concise. I see this as you have an input stream that can be compressed to an output stream in such a way as that the output stream could later be uncompressed into the original input stream.

    The compression process would start with first sorting the stream which would require breaking each element down into three components: number, chars, and letters. Or, as a regular expression: "(([0-9]+)([^a-zA-Z0-9]*))([a-zA-Z]+)". The sorting order would be based, first on the numeric value of group(2) followed by the character order of group(3) followed by character order of group(4).

    Ranges would be based on entries with identical numeric values and identical characters, followed by sequential letter values.

    So, where have I gone wrong? And if I've gone wrong can someone correct this in clearly stated rules? This seems to be very elusive here.
     
    Junilu Lacar
    Marshal
    Posts: 14501

    Carey Brown wrote:Or, as a regular expression: "(([0-9]+)([^a-zA-Z0-9]*))([a-zA-Z]+)".



    That regex will not match input like 51/1 or 51-1 and similar input.

    I got better match results with the sample inputs that OP gave with this: (([0-9]+)([^0-9]*[^a-zA-Z]*))([a-zA-Z]?)
     
    Junilu Lacar
    Marshal
    Posts: 14501

    Junilu Lacar wrote:I got better match results with the sample inputs that OP gave with this: (([0-9]+)([^0-9]*[^a-zA-Z]*))([a-zA-Z]?)


    I don't think even this is sufficient still. Best approach is to write some automated tests that you can use to break down different cases and validate against a candidate regex.
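One lightweight way to do that without a test framework is a small table of sample values run against each candidate expression. The sketch below uses the two expressions quoted in this thread; `RegexCheck`, `split` and the constant names are hypothetical, and the sample values are taken from the first post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexCheck {
    // The two candidate expressions quoted in the thread.
    static final String FIRST  = "(([0-9]+)([^a-zA-Z0-9]*))([a-zA-Z]+)";
    static final String SECOND = "(([0-9]+)([^0-9]*[^a-zA-Z]*))([a-zA-Z]?)";

    // Returns "number|chars|letter" for a full match, or null when the regex does not match.
    static String split(String regex, String value) {
        Matcher m = Pattern.compile(regex).matcher(value);
        return m.matches() ? m.group(2) + "|" + m.group(3) + "|" + m.group(4) : null;
    }

    public static void main(String[] args) {
        String[] samples = {"4", "2A", "51/1", "200-2E", "583-585D"};
        for (String s : samples) {
            System.out.println(s + " -> " + split(FIRST, s) + " / " + split(SECOND, s));
        }
    }
}
```

Running it shows that the first expression rejects values without a trailing letter, such as 51/1, while the second accepts 2A but puts the A into the chars group (2|A|) instead of the letter group, because the greedy [^0-9]* consumes it first.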
     
    Piet Souris
    Bartender
    Posts: 3668
    I am experimenting with a TreeMap with a suitable Comparator. A bit clumsy right now, but this is how far I am.
     
    kiraz cevik
    Greenhorn
    Posts: 23
    There will be no lower case [a-z] after the numbers.
     
    kiraz cevik
    Greenhorn
    Posts: 23

    Piet Souris wrote:I am experimenting with a TreeMap with a suitable Comparator. A bit clumsy right now, but this is how far I am.



    I looked at the output of the code.
    Close to what I want.
     
    Martin Vashko
    Marshal
    Posts: 3838

    I looked at the output of the code.
    Close to what I want.


    That's good news! Can you tell us how this solution differs from what you need?
     
    kiraz cevik
    Greenhorn
    Posts: 23

    Piet Souris wrote:I am experimenting with a TreeMap with a suitable Comparator. A bit clumsy right now, but this is how far I am.


    code output:
    1: [ ]
    2: [ , A, B, C]
    3: [ , A]
    4: [ ]
    5: [ ]
    6: [ ]
    32: [C, D]
    50:[]
    51/1:[]
    51/2:[]
    60: [ ]
    61: [ ]
    62: [ ]
    200-2:[E, F, G, H]
    201:[C]
    201/21:[P, R, S]
    300:[ , A]
    301-2:[A]
    542/2:[K, L]
    583-1:[]
    583-585:[D, E]
    605:[ ,A, B, C]
    800:[A]

    Why does it constantly produce [] and not merge into ranges?
    How do I combine the letters into ranges as follows?
    output:
    1  
    2
    2[A-C]
    3
    3A
    [4-6]
    32[C-D]
    50
    51/1
    51/2
    [60-62]
    200-2[E-H]
    201C
    201/21[P-S]
    300
    300A
    301-2A
    542/2[K-L]
    583-1
    583-585[D-E]
    605
    605[A-C]
    800A

     
    Piet Souris
    Bartender
    Posts: 3668
    hi Kiraz,
    the map that I created was only meant to be an intermediate result. It was my intention to go on processing. My further ideas were:

    create an enum

    and have a new class:


    Now, I would have a method that processes my intermediate results in the map, and creates a List<FinalToken>. Suppose you have this entry in the map:

    Because of the empty char at the start of the list (coming from the Token 300 without any chars) I create two FinalTokens: with 300 and type JUST_A_NUMBER, and 300 with list [A, B, C], with type NUMBER_WITH_ChAR.
    The last one will output the toString result: 300[A-C].
    So I have a List<FinalToken>. My last step is to go from top to bottom, and when I encounter type JUST_A_NUMBER, I will check if the next couple of elements are also of type JUST_A_NUMBER and have consecutive ints. Then I will output the [300 - 350] form.

    Well, sounds more complicated than it is (I hope), but I have not yet had time (or courage) to implement all this (well, I do have the enum).

    This was what I had in mind. Wondering if others have come up with something simpler.
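For readers following along, here is one possible shape of the first two steps described above: grouping the values by prefix, then turning one map entry into output strings. It is not the code from this post. It uses a LinkedHashMap over already-sorted input in place of the TreeMap with a Comparator, plain strings in place of FinalToken and the enum, and A-Z character arithmetic. `GroupedTokens`, `group` and `render` are made-up names, and the final step that merges consecutive pure numbers is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupedTokens {
    // Step 1: group each (non-empty) value under its prefix.
    // An empty string in the list stands for "no trailing letter".
    static Map<String, List<String>> group(List<String> sorted) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String value : sorted) {
            char last = value.charAt(value.length() - 1);
            boolean hasLetter = last >= 'A' && last <= 'Z';
            String prefix = hasLetter ? value.substring(0, value.length() - 1) : value;
            groups.computeIfAbsent(prefix, k -> new ArrayList<>())
                  .add(hasLetter ? String.valueOf(last) : "");
        }
        return groups;
    }

    // Step 2: turn one map entry into output strings,
    // e.g. prefix 300 with letters ["", A, B, C] gives 300 and 300[A-C].
    static List<String> render(String prefix, List<String> letters) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < letters.size()) {
            String first = letters.get(i);
            int j = i;
            // Extend the run while the next letter directly follows the current one.
            while (!first.isEmpty() && j + 1 < letters.size()
                    && !letters.get(j + 1).isEmpty()
                    && letters.get(j + 1).charAt(0) == letters.get(j).charAt(0) + 1) {
                j++;
            }
            out.add(j == i ? prefix + first
                           : prefix + "[" + first + "-" + letters.get(j) + "]");
            i = j + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups =
                group(List.of("300", "300A", "300B", "300C", "301-2A"));
        groups.forEach((prefix, letters) -> System.out.println(render(prefix, letters)));
    }
}
```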
       
     
    Carey Brown
    Saloon Keeper
    Posts: 6462
     
    kiraz cevik
    Greenhorn
    Posts: 23

    Carey Brown wrote:I don't understand how P, R, S is a sequence when it is missing the 'Q'.



    Our alphabet does not have Q but can be added.
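If the letter set really has no Q, the check for "consecutive" cannot rely on character arithmetic and has to look the letters up in an explicit alphabet. A small sketch, in which `LetterSequence`, `isNext` and the exact alphabet string are assumptions (the thread only says that Q is missing):

```java
final class LetterSequence {
    // Assumed alphabet: the English uppercase letters without Q.
    // The thread does not list the full letter set, so this is a guess.
    static final String ALPHABET_WITHOUT_Q = "ABCDEFGHIJKLMNOPRSTUVWXYZ";

    // True when 'next' comes directly after 'current' in the given alphabet.
    static boolean isNext(char current, char next, String alphabet) {
        int index = alphabet.indexOf(current);
        return index >= 0
                && index + 1 < alphabet.length()
                && alphabet.charAt(index + 1) == next;
    }

    public static void main(String[] args) {
        System.out.println(isNext('P', 'R', ALPHABET_WITHOUT_Q));           // prints true
        System.out.println(isNext('P', 'R', "ABCDEFGHIJKLMNOPQRSTUVWXYZ")); // prints false
    }
}
```

With such an alphabet, 201/21P, 201/21R, 201/21S form one run and can be written as 201/21[P-S], as in the expected output.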
     
    kiraz cevik
    Greenhorn
    Posts: 23

    Piet Souris wrote:hi Kiraz,
    the map that I created was only meant to be an intermediate result. It was my intention to go on processing. My further ideas were:

    create an enum

    and have a new class:


    Now, I would have a method that processes my intermediate results in the map, and creates a List<FinalToken>. Suppose you have this entry in the map:

    Because of the empty char at the start of the list (coming from the Token 300 without any chars) I create two FinalTokens: with 300 and type JUST_A_NUMBER, and 300 with list [A, B, C], with type NUMBER_WITH_ChAR.
    The last one will output the toString result: 300[A-C].
    So I have a List<FinalToken>. My last step is to go from top to bottom, and when I encounter type JUST_A_NUMBER, I will check if the next couple of elements are also of type JUST_A_NUMBER and have consecutive ints. Then I will output the [300 - 350] form.

    Well, sounds more complicated than it is (I hope), but I have not yet had time (or courage) to implement all this (well, I do have the enum).

    This was what I had in mind. Wondering if others have come up with something simpler.
       


    Thank you. Do I create an enum class separate from the code you originally wrote? Is the FinalToken class created separately?
     
    Piet Souris
    Bartender
    Posts: 3668

    Carey Brown wrote:I don't understand how P, R, S is a sequence when it is missing the 'Q'.


    I asked a question about that. The answer was (if I recall correctly) that these letters are always consecutive. Otherwise I guess it will be even more complex....
     