• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Paul Clapham
  • Jeanne Boyarsky
  • Junilu Lacar
  • Henry Wong
Sheriffs:
  • Ron McLeod
  • Devaka Cooray
  • Tim Cooke
Saloon Keepers:
  • Tim Moores
  • Stephan van Hulst
  • Frits Walraven
  • Tim Holloway
  • Carey Brown
Bartenders:
  • Piet Souris
  • salvin francis
  • fred rosenberger

Struggling with bruteforce algo

 
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi guys/gals, I'm having an issue with a bruteforce problem. I've been provided a corpus file containing a novel amount of text and then a smaller file contain 7 or so search words (some are within that text). I need to use brute force implementation of looking for a sequence of characters within the text. Apparently I should have a loop that goes through each character index in the corpus and compares the substring at that index to the search string. The idea is to find the first matching index of the search string within the corpus string. I have written the for loop using the general algorithm for brute force but can't seem to figure it out. It's been hours and this is kind of a last resort. Another one of the issues is that I've never had to work with ArrayList and a string regarding parameters within a for loop. If anyone has any advice, ideas I would really appreciate it.

Here is said code, for loop (non-working, just returns not found) is at the bottom:
https://hasteb.in/texacuju.gradle
 
Saloon Keeper
Posts: 7175
65
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Most people on this site will not follow an unknown link. Could you please cut and paste the code into a post and UseCodeTags (<-link).
 
Jaydon Cobb
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
 
Marshal
Posts: 25594
69
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Welcome to the Ranch! (And thanks for posting such readable code.)

Jaydon Cobb wrote:Apparently I should have a loop that goes through each character index in the corpus and compares the substring at that index to the search string.



You might want to start by getting that clarified. It looks to me like your "corpus" is a list of strings, which seems reasonable. But then I don't know what a "character index" is with respect to that corpus. So that's why I say that clarification is necessary.

I also notice that the big string you're creating to search into, you're converting that all to lower case. But then when you're searching into it, you're converting the terms from your corpus to upper case. That guarantees that you won't ever find anything. But I think there's more problems than that one. For example your inner loop which increments j never actually uses j anywhere in the loop code. Probably not right. Anyway clarification of the requirements would help.
 
Jaydon Cobb
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator


EDIT: nvm fixed
 
Rancher
Posts: 179
15
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
So if my assumptions about your requirements are correct, you are searching for all occurrences of your search string in a larger string.

Here's my thinking/approach:
Let's say we have the following string to search: "this is a very long string to try to test!" and we are searching for the string: "tr".
We essentially have to search every possible substring of length 2 within this larger string, since we know any matching substring will be the same length just by definition.
So we'd start at index 0 through 2. This gives us "th". We compare and find no matches, so we increment the counter and go to index 1 through 3. This gives us "hi". We continue on like this.
Eventually we'd get matches on "string" and "try" and it'd return 2.
You can set up this process through a single for loop.
There's no need to convert everything to lowercase. ".equalsIgnoreCase" is a method that exists for this purpose.
Finally I'd break this apart into multiple different methods, even from what you have there. I'd make a method that takes in a String to search through, a String to search for, and returns a number of occurrences.

Here's a code sample with some TODO comments. Try to fill these in, and it'll work.






Some more general coding advice:


This while loop looks weird t me, since it seems like it'll just break out of the loop immediately on the first iteration.


I'd write this with a for each:



I don't think printing an ArrayList will give you any meaningful output, it probably just prints a memory address (maybe I'm wrong here?)



That's a lot of code written to read a file into a String. Try writing an implementation using a Scanner and you might find this easier to make (if you really want to, you can write this method in a single line you just have to take advantage of the delimiter feature)


I'd declare this type as a List<String> rather than ArrayList<String> to make it more general:
 
Paul Clapham
Marshal
Posts: 25594
69
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
That seems like an awfully complicated way to see if one String is contained in another String. Especially when the String class already includes a method specifically designed for that. Are you required to avoid using that method?
 
Could you hold this puppy for a sec? I need to adjust this tiny ad:
Try Free Java/.NET Libraries for Word Excel PowerPoint and PDF
htttp://www.e-iceblue.com/free-apis.html
    Bookmark Topic Watch Topic
  • New Topic