• Post Reply Bookmark Topic Watch Topic
  • New Topic

Dummy algorithm in Java to calculate the longuest repeted substring  RSS feed

 
Frank Poupard
Ranch Hand
Posts: 65
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello, I try to implement a way to find the longuest repeted substring given a string (known as the LRS problem). I intend to do it in a very simple fashion no matter the algorithm's efficiency.

I came up with the following code, which compiles but does not print anything. To me, it was supposed to print any substring longer than 2 ; even if I wanted only the longuest one (did not manage to arrange it). But it does not work at all...Any insight about what is wrong and what I should do ? Thanks in advance.

 
Peter Muster
Ranch Hand
Posts: 74
5
Eclipse IDE Python Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Your code seems flawed in multiple places.

The unconditional break statement causes your code to finish once the inner for loop has finished once. Your variable i will never be anything but 0. If you remove the break statement however, B will be printed indefinitely because the condition of your while loop will never change if it is true once. This is because you do not change any of the values that are evaluated by the while loop inside the loop. You don't set isElsewhere, i or j inside the loop so it will execute indefinitely. Also if statements without a block are bad practice in my opinion. They make the code harder to read and it's difficult to tell in flawed code if they were meant to be made without a block or if it was a mistake.

So instead of

I'd write


Using a debugger to step through the code will help you understand how it works, especially in an example as easy as this (easy as meaning without input parameters, external resources, recursion or any other confusing constructs).
 
Campbell Ritchie
Marshal
Posts: 55772
163
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Christian Pflugradt wrote:Your code seems flawed in multiple places. . . .
t least you can see where it is flawed. Because of the incorrect indentation it is very difficult to follow the structure of the code. I think OP thinks line 19 has some relationship to line 15, but it hasn't. You are right to suggest additional {}.
 
Winston Gutkowski
Bartender
Posts: 10573
65
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Frank Poupard wrote:Hello, I try to implement a way to find the longuest repeted substring given a string (known as the LRS problem). I intend to do it in a very simple fashion no matter the algorithm's efficiency.

Fair enough; but, as Einstein said: Everything should be as simple as possible, but no simpler.

One thing I can assure you of: You will NOT solve this problem by coding. So, my advice: Stop.

As you may have gathered from looking at the Wikipedia pages on it, this is not a simple problem. I don't know how to solve it optimally, and I've been programming for nearly 40 years - and most of the fastest solutions seem to have been developed in the '90's.

However, there are a few things I can glean simply from the requirements:
1. For a source String of length n, the substring I'm looking for will be <= n-1 characters long if overlapping substrings are allowed, or n/2 if they aren't (see below).
2. In order to quickly accept or eliminate candidates, I'll need a fast way of storing and finding substrings - and in Java, HashMaps (←click) are often the best solution for stuff like that.
3. The number of possible substrings of any length in a String of length n is a triangular number - n (for length 1) + n-1 (for length 2) + n-2 (for length 3) +.... and so on.
4. If there are no repeated characters (substrings of length 1) in the string, then there can't possibly be any repeating substrings; and I think this property holds true for any length of substring.
If so, we can start at x=1, and simply build maps of substrings of length x until we find a length that does not contain any duplicates, at which point we know that the longest possible one was x-1.

How do we know whether there are any duplicates? Simple: keep a list of the start points of each substring we check and, if it has already been found (ie, our map already contains it as a key), ADD it to the existing list. Then, any substring in our map that has more then 1 start point may be a duplicate (see below).

The only other thing you have to decide (or you may have been told) is whether a "duplicate" can overlap an existing substring or not. If not, it adds an extra step to the procedure, but it also means you only have to look for substrings up to length n/2.

Now I'm pretty sure that this isn't the fastest way to do it, but I'm also pretty sure it will work. And note that I've done it without writing a line of code.

So, a lesson for the future: You don't solve problems by coding, you solve them by thinking.

Hope it helps.

Winston
 
Frank Poupard
Ranch Hand
Posts: 65
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks all for your insightful comments. I'm working on it and will post something when I come up with some decent code.
 
Winston Gutkowski
Bartender
Posts: 10573
65
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Frank Poupard wrote:Thanks all for your insightful comments. I'm working on it and will post something when I come up with some decent code.

You're welcome. And I look forward to seeing your results.

Winston
 
It is sorta covered in the JavaRanch Style Guide.
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!