Rish Gupta

Greenhorn

Posts: 29

posted 4 years ago

Hi, I am not sure what these are called but here's what I'm trying to do.. please bear with me, i'm not good at explaining.. Say we have a string S='abcabc' now we need to figure out how many similar strings we will get if we go cyclic. for example using the above case:

0th cycle : abcabc //this is the oroginal string

1st cycle : bcabca //here i th character = i+1 (so we are shifting by one) and putting the first character in the end

2nd cycle : cabcab

3rd cycle : abcabc //here we get the same string as the original string so counter is incremented by 1

4th cycle : bcabca

5th cycle : cabcab

6th cycle : abcabc //here we are back to where we started from so counter is again incremented by 1 and program exits returning 2

I hope you get what I'm trying to do and below is the code that I came up with:

The problem is that the above code is time consuming (O of N^2), I am trying to achieve it in O of N..

any ideas would be appreciated. TIA

0th cycle : abcabc //this is the oroginal string

1st cycle : bcabca //here i th character = i+1 (so we are shifting by one) and putting the first character in the end

2nd cycle : cabcab

3rd cycle : abcabc //here we get the same string as the original string so counter is incremented by 1

4th cycle : bcabca

5th cycle : cabcab

6th cycle : abcabc //here we are back to where we started from so counter is again incremented by 1 and program exits returning 2

I hope you get what I'm trying to do and below is the code that I came up with:

The problem is that the above code is time consuming (O of N^2), I am trying to achieve it in O of N..

any ideas would be appreciated. TIA

posted 4 years ago

What's N? Is N the number of strings or is N the number of characters. If it's number of strings, you are already O(n) . If it's number of characters, You cannot get O(n) using a brute force method. Equals method of string is O(n). So any method that cycles the string and then compares it with the original string will be O(n^2)

posted 4 years ago

As Jayesh says, a brute force approach will always be O(n²). I suspect you can get that down quite a bit by eliminating strings that can't possibly rotate by understanding some of the properties that such a string must possess: one of which is that unless ALL characters are the same, its length cannot be prime (because it must be made up of subsequences of length > 1 that repeat). Or possibly easier: if the length is prime, then all letters MUST be the same.

So, if you factorize the length, you'll then be able to work out the lengths of subsequences that you need to check for, which still may take some time, but NOT O(n²), and if you start with the smallest first (* - see below), you'll hopefully be able to eliminate them the quickest.

That said, I've never tried the problem; but that's how I'd start going about it if I was looking for an

Winston

(*) I'm not by any means sure that you

- 1

Rish Gupta wrote:The problem is that the above code is time consuming (O of N^2), I am trying to achieve it in O of N..

any ideas would be appreciated. TIA

As Jayesh says, a brute force approach will always be O(n²). I suspect you can get that down quite a bit by eliminating strings that can't possibly rotate by understanding some of the properties that such a string must possess: one of which is that unless ALL characters are the same, its length cannot be prime (because it must be made up of subsequences of length > 1 that repeat). Or possibly easier: if the length is prime, then all letters MUST be the same.

So, if you factorize the length, you'll then be able to work out the lengths of subsequences that you need to check for, which still may take some time, but NOT O(n²), and if you start with the smallest first (* - see below), you'll hopefully be able to eliminate them the quickest.

That said, I've never tried the problem; but that's how I'd start going about it if I was looking for an

*efficient*method (as opposed to a simple one ).

Winston

(*) I'm not by any means sure that you

*should*start out with the shortest first but, just off the top of my head, it "feels" right.

"Leadership is nature's way of removing morons from the productive flow" - Dogbert

Articles by Winston can be found here

Maxim Karvonen

Ranch Hand

Posts: 121

12

posted 4 years ago

- 2

Winston already offered one idea. You don't need to find all offsets, you may get a result using a smallest one.

Another idea is that dealing with "looping" strings is not always efficient. We may try to "unroll" your string into an infinite sequence. Or maybe finite, try to figure it yourself. Not a very new idea, some tasks on programming contests are also based on it. For your example you will get

I think, you should get the idea. Your original string is substring of some other string. And at this point the only thing you need is some efficient (O(N)) substring algorithm. I prefer Knuth–Morris–Pratt algorithm for such tasks (it's very simple in implementation), but you may use any other substring algorithm with same complexity.

Another idea is that dealing with "looping" strings is not always efficient. We may try to "unroll" your string into an infinite sequence. Or maybe finite, try to figure it yourself. Not a very new idea, some tasks on programming contests are also based on it. For your example you will get

I think, you should get the idea. Your original string is substring of some other string. And at this point the only thing you need is some efficient (O(N)) substring algorithm. I prefer Knuth–Morris–Pratt algorithm for such tasks (it's very simple in implementation), but you may use any other substring algorithm with same complexity.

posted 4 years ago

- 1

I was thinking ( and I think that's what WInston is going for) is that if you split the string using the factors of the length, you don;t need to roll it

FOr example, if you have

If you had

2 parts of 3 characters aba bab no match

3 parts of 2 characters ab ab ab match on 2th and 4th roll

6 parts of 1 character a b a b a b no match

always a match on 6th roll

So, total matches = 3

If you had

2 parts of 3 characters aaa aaa match on 3rd toll

3 parts of 2 characters ab ab ab match on 2th and 4th roll

6 parts of 1 character a a a a a a match on 1st and 5th roll

always a match on 6th roll

So, total matches = 6

You can figure out an algorithm. Trying to estimate the cost:- for each

I am not good at this math, so I might be wrong. Just from gut feel I think it would be N Log N.

FOr example, if you have

*abcabc*The length is 6. So, you can split it into 2 parts of 3 characters or 3 parts of 2 chanracters, or 6 parts of 1 characters. So, let's spit into 2:-*abc*and*abc*. Now, since both these parts are equal, you know that if you would have rolled the string 6 times, it would come up to be the same as the original string on the 3rd rol. Splitting into 3 parts of 2 characters give you*ab*,*ca*and*bc*. They are not equal. So, you will not get it on the 2nd and 4th roll. if you split it into 6 parts of 1 characters, you again don't find that they are not equal, so you don't get the same string on 1st, and 5th roll. You always get the same string on the last 6th roll. So, you get the string on the 3rd and 6th roll. So, the answer is 2If you had

*ababab*2 parts of 3 characters aba bab no match

3 parts of 2 characters ab ab ab match on 2th and 4th roll

6 parts of 1 character a b a b a b no match

always a match on 6th roll

So, total matches = 3

If you had

*aaaaaa*2 parts of 3 characters aaa aaa match on 3rd toll

3 parts of 2 characters ab ab ab match on 2th and 4th roll

6 parts of 1 character a a a a a a match on 1st and 5th roll

always a match on 6th roll

So, total matches = 6

You can figure out an algorithm. Trying to estimate the cost:- for each

*f*that is a factor of N, you will have to run a string comparison N/f times. So, if number of factors is F, you will run string comparison F*N/F times = N. However, the strings will be small. So, the cost will be like O(N*N/F). SInce, F = N/Log N, you should get O(N log N).I am not good at this math, so I might be wrong. Just from gut feel I think it would be N Log N.

Rish Gupta

Greenhorn

Posts: 29

posted 4 years ago

K, So I did what you guys suggested (big thanks for all your inputs) and here is what I came up with.. could you confirm if this was at least Oof(N log N) if not OofN..

@Maxim - I din't really get what you suggested.. actually I have never heard of Knuth–Morris–Pratt algo.. I'll try to do some reading online on it and see if I can come up with something better..

thanks a lot for everyone's input

@Maxim - I din't really get what you suggested.. actually I have never heard of Knuth–Morris–Pratt algo.. I'll try to do some reading online on it and see if I can come up with something better..

thanks a lot for everyone's input

Rish Gupta

Greenhorn

Posts: 29

posted 4 years ago

Whoa hoss. You're going too fast too soon. For starters, that code isn't right, because it doesn't take into account the fact that your "string" is cyclic.

Deal with ONE thing at a time - and personally, I'd say that the first thing to deal with is the comparison itself. And next you need to deal with the

Question 1: If I have a "cyclic" string (CS), and I want to compare it with a

Question 2: If remove an arbitrary number of characters from the

Answering those two questions

Winston

Rish Gupta wrote:K, So I did what you guys suggested (big thanks for all your inputs) and here is what I came up with.. could you confirm if this was at least Oof(N log N) if not OofN..

Whoa hoss. You're going too fast too soon. For starters, that code isn't right, because it doesn't take into account the fact that your "string" is cyclic.

Deal with ONE thing at a time - and personally, I'd say that the first thing to deal with is the comparison itself. And next you need to deal with the

*context of your problem*.

Question 1: If I have a "cyclic" string (CS), and I want to compare it with a

*substring*(SS) to see if SS is

*repeated*in CS, what is the absolute

*maximum*number of characters that I have to compare?

Question 2: If remove an arbitrary number of characters from the

*start*of my CS string, and place them at the end,

*in the same sequence*, should my program return a different result or not?

Answering those two questions

*correctly*will give you a good head start on your problem (but it's only the start; if you want

*efficient*, this is NOT a simple problem). And if you're into amateur magic, Question 2 covers

__exactly__the same ground as

*cutting*(but NOT shuffling) a deck of cards.

Winston

Articles by Winston can be found here

Rish Gupta

Greenhorn

Posts: 29

posted 4 years ago

Hi Winston,

I tested the code with about 20 different inputs (S string) and it has given me the right answer each time, could you give me an example that the code might not work for.. Maybe I'm thinking in just one direction. I'm not shifting the string to check if it is cyclic or not, I'm doing something similar to what you asked in question 1 to which I think the answer would be N-1 worst case scenario.

Basically here's what i designed the code to do:

It gets all the factors of string length (let's take 'abcabc' as an example, it'll return 1, 2, 3 and 6)

Now we start with 1, the first factor, so the code compares 'a' to 'b' and since they are different 1 is eliminated.

Next factor is 2, so the code compares 'ab' to 'ca' and since they are different 2 is eliminated.

Next factor is 3, so we compare 'abc' to 'abc', match is found and since we have reached the end of the string we have an answer which is string length divided by 3.. So final answer is 2, which is the right answer .

(If there were 3 more characters say 'abcabcxxx', 'abc' would have been compared again to xxx and answer printed only if it were a match otherwise the code would have moved on to the next factor)

I hope it makes sense, it would be a big help if you gave an example for which the code might not work except empty string, I haven't tested for it.

For starters, that code isn't right, because it doesn't take into account the fact that your "string" is cyclic.

Hi Winston,

I tested the code with about 20 different inputs (S string) and it has given me the right answer each time, could you give me an example that the code might not work for.. Maybe I'm thinking in just one direction. I'm not shifting the string to check if it is cyclic or not, I'm doing something similar to what you asked in question 1 to which I think the answer would be N-1 worst case scenario.

Basically here's what i designed the code to do:

It gets all the factors of string length (let's take 'abcabc' as an example, it'll return 1, 2, 3 and 6)

Now we start with 1, the first factor, so the code compares 'a' to 'b' and since they are different 1 is eliminated.

Next factor is 2, so the code compares 'ab' to 'ca' and since they are different 2 is eliminated.

Next factor is 3, so we compare 'abc' to 'abc', match is found and since we have reached the end of the string we have an answer which is string length divided by 3.. So final answer is 2, which is the right answer .

(If there were 3 more characters say 'abcabcxxx', 'abc' would have been compared again to xxx and answer printed only if it were a match otherwise the code would have moved on to the next factor)

I hope it makes sense, it would be a big help if you gave an example for which the code might not work except empty string, I haven't tested for it.

posted 4 years ago

No, but I'd definitely try it with different lengths of string and different patterns; especially ones that "almost" cycle. And sorry for saying your code was wrong; I must have misread your

Just FYI, you can use the modulus operator for cyclic ops, which saves a bit of code (probably not time though), viz:although once you've done the factoring, you shouldn't need to "wrap" your comparisons at all (I left out the "ignorecase" stuff because you can deal with that by just converting the whole string to lower/upper case)

Actually, it will be length(CS) - length(SS) because you already know that SS is equal to itself.

Yup that's about right. I presume that you don't bother with 6, since any string must "rotate" to itself. And notice that it doesn't matter whether you have 'abcabc' or 'bcabca' (which is a "rotation") because the first three characters still repeat. That's what I was referring to in Question 2.

You just have to be a bit careful with string lengths like 20, which is 2 x 2 x 5, to make sure that you get ALL factors (ie, 1, 2, 4, 5 and 10).

You could also shave a bit of time off by "smart" checking. For example, for a string of length 20, when checking your 'length 2' substring, you could check position 10 first, because if it doesn't match, then you know that both 2 AND 10 are eliminated; and if you then check positions 4 , 8, 12 and 16 and find a mismatch, you know that 4 is eliminated too. However, keeping track of all the stuff so that you don't repeat checks may be more trouble than it's worth.

I'm afraid my maths isn't good enough to give you the exact O() formula for time, but it will be something like f * O(n), where f is the number of factors in n - and I suspect there's a general formula for that too.

HIH

Winston

- 1

Rish Gupta wrote:I tested the code with about 20 different inputs (S string) and it has given me the right answer each time, could you give me an example that the code might not work for.

No, but I'd definitely try it with different lengths of string and different patterns; especially ones that "almost" cycle. And sorry for saying your code was wrong; I must have misread your

`compare()`method. However, the advice is sound:

*do one thing at a time*; and don't move on until you know that what you've got is working.

Just FYI, you can use the modulus operator for cyclic ops, which saves a bit of code (probably not time though), viz:although once you've done the factoring, you shouldn't need to "wrap" your comparisons at all (I left out the "ignorecase" stuff because you can deal with that by just converting the whole string to lower/upper case)

doing something similar to what you asked in question 1 to which I think the answer would be N-1 worst case scenario.

Actually, it will be length(CS) - length(SS) because you already know that SS is equal to itself.

Basically here's what i designed the code to do:

It gets all the factors of string length (let's take 'abcabc' as an example, it'll return 1, 2, 3 and 6)

Now we start with 1, the first factor, so the code compares 'a' to 'b' and since they are different 1 is eliminated.

Next factor is 2, so the code compares 'ab' to 'ca' and since they are different 2 is eliminated.

Next factor is 3, so we compare 'abc' to 'abc', match is found and since we have reached the end of the string we have an answer which is string length divided by 3.. So final answer is 2, which is the right answer .

(If there were 3 more characters say 'abcabcxxx', 'abc' would have been compared again to xxx and answer printed only if it were a match otherwise the code would have moved on to the next factor).

Yup that's about right. I presume that you don't bother with 6, since any string must "rotate" to itself. And notice that it doesn't matter whether you have 'abcabc' or 'bcabca' (which is a "rotation") because the first three characters still repeat. That's what I was referring to in Question 2.

You just have to be a bit careful with string lengths like 20, which is 2 x 2 x 5, to make sure that you get ALL factors (ie, 1, 2, 4, 5 and 10).

You could also shave a bit of time off by "smart" checking. For example, for a string of length 20, when checking your 'length 2' substring, you could check position 10 first, because if it doesn't match, then you know that both 2 AND 10 are eliminated; and if you then check positions 4 , 8, 12 and 16 and find a mismatch, you know that 4 is eliminated too. However, keeping track of all the stuff so that you don't repeat checks may be more trouble than it's worth.

I'm afraid my maths isn't good enough to give you the exact O() formula for time, but it will be something like f * O(n), where f is the number of factors in n - and I suspect there's a general formula for that too.

HIH

Winston

Articles by Winston can be found here

posted 4 years ago

It also occurs to me that there might also be cases where you don't want to go in strict order of size. For a case like length 40 for example, it might be better to check 2, then 5 (rather than 4) since if 5 repeats even once, you've just eliminated 4 as a possibility.

However, without a mathematician to explain the actual process (to me as well ), I suspect you could tie yourself in knots with that sort of stuff.

Winston

- 2

Winston Gutkowski wrote:You could also shave a bit of time off by "smart" checking...

It also occurs to me that there might also be cases where you don't want to go in strict order of size. For a case like length 40 for example, it might be better to check 2, then 5 (rather than 4) since if 5 repeats even once, you've just eliminated 4 as a possibility.

However, without a mathematician to explain the actual process (to me as well ), I suspect you could tie yourself in knots with that sort of stuff.

Winston

Articles by Winston can be found here

posted 4 years ago

Got it. (I think )

Once you've eliminated '1' (all characters the same), check

Furthermore, if you find

If R is more than

So, taking your example of 'abcabc', let's double it up and introduce a mismatch: 'abcabcabc

Length is now 12, so factors are 1, 2, 3, 4 and 6.

Going by the procedure above and having eliminated '1', we next start with 3 (the highest

Since r is > 0, we know that '2' can't possibly repeat, and since p * (r + 1) is 9, we also know that 4 can't repeat either. Finally, 9 is greater than 12 / 2, so we also know that no other length can possibly repeat.

But if the string is 'abc

Phew. Fun exercise.

And if anyone can shoot down my logic, please let me know.

Winston

- 3

Winston Gutkowski wrote:You could also shave a bit of time off by "smart" checking...

Got it. (I think )

Once you've eliminated '1' (all characters the same), check

__prime__factors in

*descending*order, keeping a

__count__of repetitions even if you don't find a complete set of them, because if you find even

*one*, you've just eliminated

__all__factors below that number.

Furthermore, if you find

*more*than one, the

*total*number of characters taken up with repetitions, R = p * (r+1) (p=prime, r=# of repetitions), is important to know.

If R is more than

__twice__the value of any

*other*factor, you can eliminate those too; and if it's more than half the length of the entire string, you can eliminate any other possibility of a repetition.

So, taking your example of 'abcabc', let's double it up and introduce a mismatch: 'abcabcabc

**x**bc'.

Length is now 12, so factors are 1, 2, 3, 4 and 6.

Going by the procedure above and having eliminated '1', we next start with 3 (the highest

*prime*factor), and discover that there are 2 "repetitions" of 'abc' (but not 3, so it's not a full set).

Since r is > 0, we know that '2' can't possibly repeat, and since p * (r + 1) is 9, we also know that 4 can't repeat either. Finally, 9 is greater than 12 / 2, so we also know that no other length can possibly repeat.

But if the string is 'abc

**x**bcabcxbc', there are

*no*repetitions of 3, so we next check 2 (if there were any other prime factors > 2, we'd do those first, in

*descending*order), then 4, and finally 6 where we find our cycle.

Phew. Fun exercise.

And if anyone can shoot down my logic, please let me know.

Winston

Articles by Winston can be found here