Hi guys/gals, I'm having an issue with a bruteforce problem. I've been provided a corpus file containing a novel amount of text and then a smaller file contain 7 or so search words (some are within that text). I need to use brute force implementation of looking for a sequence of characters within the text. Apparently I should have a loop that goes through each character index in the corpus and compares the substring at that index to the search string. The idea is to find the first matching index of the search string within the corpus string. I have written the for loop using the general algorithm for brute force but can't seem to figure it out. It's been hours and this is kind of a last resort. Another one of the issues is that I've never had to work with ArrayList and a string regarding parameters within a for loop. If anyone has any advice, ideas I would really appreciate it.
Welcome to the Ranch! (And thanks for posting such readable code.)
Jaydon Cobb wrote:Apparently I should have a loop that goes through each character index in the corpus and compares the substring at that index to the search string.
You might want to start by getting that clarified. It looks to me like your "corpus" is a list of strings, which seems reasonable. But then I don't know what a "character index" is with respect to that corpus. So that's why I say that clarification is necessary.
I also notice that the big string you're creating to search into, you're converting that all to lower case. But then when you're searching into it, you're converting the terms from your corpus to upper case. That guarantees that you won't ever find anything. But I think there's more problems than that one. For example your inner loop which increments j never actually uses j anywhere in the loop code. Probably not right. Anyway clarification of the requirements would help.
So if my assumptions about your requirements are correct, you are searching for all occurrences of your search string in a larger string.
Here's my thinking/approach:
Let's say we have the following string to search: "this is a very long string to try to test!" and we are searching for the string: "tr".
We essentially have to search every possible substring of length 2 within this larger string, since we know any matching substring will be the same length just by definition.
So we'd start at index 0 through 2. This gives us "th". We compare and find no matches, so we increment the counter and go to index 1 through 3. This gives us "hi". We continue on like this.
Eventually we'd get matches on "string" and "try" and it'd return 2.
You can set up this process through a single for loop.
There's no need to convert everything to lowercase. ".equalsIgnoreCase" is a method that exists for this purpose.
Finally I'd break this apart into multiple different methods, even from what you have there. I'd make a method that takes in a String to search through, a String to search for, and returns a number of occurrences.
Here's a code sample with some TODO comments. Try to fill these in, and it'll work.
Some more general coding advice:
This while loop looks weird t me, since it seems like it'll just break out of the loop immediately on the first iteration.
I'd write this with a for each:
I don't think printing an ArrayList will give you any meaningful output, it probably just prints a memory address (maybe I'm wrong here?)
That's a lot of code written to read a file into a String. Try writing an implementation using a Scanner and you might find this easier to make (if you really want to, you can write this method in a single line you just have to take advantage of the delimiter feature)
I'd declare this type as a List<String> rather than ArrayList<String> to make it more general:
That seems like an awfully complicated way to see if one String is contained in another String. Especially when the String class already includes a method specifically designed for that. Are you required to avoid using that method?
Could you hold this puppy for a sec? I need to adjust this tiny ad:
Try Free Java/.NET Libraries for Word Excel PowerPoint and PDF