Dave Tolls wrote:It's not the title, it's the description that's being abbreviated.
Liutauras Vilda wrote:1. Consider converting embeddedAuthor to upper case (or lower case) and only then looking for "BY", so you wouldn't need to check "By", "by" and "bY".
2. I'd follow Carey's advice right away and write tests first; otherwise you won't notice when you break something while fixing something else.
3. An array probably isn't the right data structure for that task. Consider using a Map (look at the HashMap implementation). You might think of a structure such as Map<String, List<String>> for the mapping. It might not fit; this needs careful thought.
4. Since you are going to use System.out.print... a lot, create a simple method with a short name for debugging, so your code is less cluttered.
So you could write:
debug("Author", i);
debug("Book title", i);
...
Will most likely join the discussion again later.
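A minimal sketch of points 1 and 4 above. The thread does not show the original helper, so the class name, the method bodies and the sample string here are all assumptions made for illustration:

```java
import java.util.Locale;

class DebugSketch {
    // Hypothetical helper: builds the debug text in one place.
    static String format(String label, Object value) {
        return "[debug] " + label + ": " + value;
    }

    // Short name, so call sites stay uncluttered.
    static void debug(String label, Object value) {
        System.out.println(format(label, value));
    }

    // Normalise the case once, then search for a single spelling of the marker.
    // Caveat: upper-casing can change the length of some non-English text,
    // so the returned index is only reliable for plain ASCII input.
    static int indexOfIgnoreCase(String text, String marker) {
        return text.toUpperCase(Locale.ROOT).indexOf(marker.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String embeddedAuthor = "Some Title bY Some Author"; // made-up sample
        int i = indexOfIgnoreCase(embeddedAuthor, "by");
        debug("Index of marker", i); // prints: [debug] Index of marker: 11
    }
}
```

Removing the debug output later then means changing one method rather than hunting for every print statement.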
Campbell Ritchie wrote:I can't see the word “by” in the example in the first post.
Carey Brown wrote:
Campbell Ritchie wrote:I can't see the word “by” in the example in the first post.
How would you differentiate "by" with "bystander"?
Liutauras Vilda wrote:
Carey Brown wrote:
Campbell Ritchie wrote:I can't see the word “by” in the example in the first post.
How would you differentiate "by" with "bystander"?
" by ", wouldn't work?
peter m hayward wrote:
Carey Brown wrote:
Campbell Ritchie wrote:I can't see the word “by” in the example in the first post.
How would you differentiate "by" with "bystander"?
I find the index with Index = embeddedAuthor.indexOf("by"); in the string, hence there is no need to worry about words that contain "by", such as "baby", as it only looks for "by". But, as has been pointed out, converting to a given case removes the need to check for variants such as "By", "bY" or "BY", so I will be converting everything to one case.
Carey Brown wrote:
Liutauras Vilda wrote:
Carey Brown wrote:
Campbell Ritchie wrote:I can't see the word “by” in the example in the first post.
How would you differentiate "by" with "bystander"?
" by ", wouldn't work?
Yes, that should work, but that's exactly my point. The devil's in the details.
Campbell Ritchie wrote:
Dave Tolls wrote:. . . tabs in the original line. . . .
Full stops, commas, tabs, double spaces, or anything like that can be used to change the text into a regular grammar, which can easily be parsed with regexes.
Otherwise, it is really easy to do by hand, because we are used to free grammars; we speak in them all the time.
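To illustrate the point about delimiters: if the incoming lines did contain a reliable separator such as a tab (the thread never confirms that they do), splitting becomes a one-liner. The field order and sample line below are invented for the sketch:

```java
import java.util.regex.Pattern;

class DelimitedLineSketch {
    // Assumption: fields are separated by a single tab character.
    private static final Pattern TAB = Pattern.compile("\t");

    static String[] fields(String line) {
        // A negative limit keeps trailing empty fields instead of dropping them.
        return TAB.split(line, -1);
    }

    public static void main(String[] args) {
        String line = "Some Title\tSome Author\tSome description"; // made-up sample
        for (String field : fields(line)) {
            System.out.println(field);
        }
    }
}
```

Without such a separator, the text is a free grammar and no single pattern will split it reliably.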
Carey Brown wrote:
peter m hayward wrote:
Carey Brown wrote:
Campbell Ritchie wrote:I can't see the word “by” in the example in the first post.
How would you differentiate "by" with "bystander"?
I find the index with Index = embeddedAuthor.indexOf("by"); in the string, hence there is no need to worry about words that contain "by", such as "baby", as it only looks for "by". But, as has been pointed out, converting to a given case removes the need to check for variants such as "By", "bY" or "BY", so I will be converting everything to one case.
In your code you have
Index = embeddedAuthor.indexOf("by");
This will find "by" embedded in "byabc", "abcby", and "abcbyxyz". indexOf() doesn't look for whole words.
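One possible way to match "by" only as a whole word is a regex with word boundaries. This is a sketch, not the code from the thread; the class and method names are made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordSketch {
    // \b matches at a word boundary, so "baby" and "bystander" are skipped.
    // CASE_INSENSITIVE covers "By", "bY" and "BY" without converting the text.
    private static final Pattern BY_WORD =
            Pattern.compile("\\bby\\b", Pattern.CASE_INSENSITIVE);

    // Returns the index of the first whole-word "by", or -1 if there is none.
    static int indexOfBy(String text) {
        Matcher m = BY_WORD.matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println("abcbyxyz".indexOf("by"));        // 3: plain substring match
        System.out.println(indexOfBy("abcbyxyz"));           // -1: not a whole word
        System.out.println(indexOfBy("by peter m hayward")); // 0: works at the start too
    }
}
```

Unlike searching for " by " with spaces, this also matches at the start or end of the string. It still cannot tell a "by" that belongs to a title from the one that introduces the author.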
peter m hayward wrote:
Ha! I just realised I posted an early version of the code, and you are 100% correct, which is why I changed it to " by " in the later version, as I noticed it found "by" in "baby".
Dave Tolls wrote:Ah, well...magic?
For some reason I thought there might be tabs in the original line. I see there's nothing actually stated to that effect.
Carey Brown wrote:
peter m hayward wrote:
Ha! I just realised I posted an early version of the code, and you are 100% correct, which is why I changed it to " by " in the later version, as I noticed it found "by" in "baby".
So what if you have "by peter m hayward" ?
Liutauras Vilda wrote:@OP
I'm curious, if not a secret, is it some kind of industrial application or something else?
So, what is your next plan?
Campbell Ritchie wrote:
peter m hayward wrote:. . . mysql stuff . . .
In which case, don't you have the description, author's name, book title and everything else in the database? So who needs to dissect such Strings? You can get those details from the database and work out their total length, and then you have the description by itself to shorten. Who needs to look for "by"?
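A rough sketch of that idea, assuming the title, author, ISBN and description are already available as separate values. The 80-character limit comes from the thread; the line layout, separators and names below are invented:

```java
class DescriptionBudgetSketch {
    static final int MAX_WIDTH = 80; // limit mentioned in the thread

    // Cuts the description down to at most 'budget' characters.
    static String shorten(String description, int budget) {
        if (budget <= 0) {
            return "";
        }
        return description.length() <= budget
                ? description
                : description.substring(0, budget);
    }

    // Hypothetical layout: "<title> by <author> <isbn> <description>".
    static String buildLine(String title, String author, String isbn, String description) {
        String fixed = title + " by " + author + " " + isbn;
        // Whatever is left after the fixed part and one separating space
        // is the room available for the description.
        int budget = MAX_WIDTH - fixed.length() - 1;
        String shortDescription = shorten(description, budget);
        return shortDescription.isEmpty() ? fixed : fixed + " " + shortDescription;
    }

    public static void main(String[] args) {
        String line = buildLine("Some Title", "Some Author", "0000000000",
                "A long description that will be cut so that the whole line fits the limit.");
        System.out.println(line.length() + ": " + line);
    }
}
```

If the fixed part alone is longer than 80 characters, this sketch returns it unshortened, so that case would need its own rule.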
Yesterday, I wrote:. . . I suspect you may actually have an impossible task. . . .
Maybe the publishers' websites will have details in a form you can scrape; otherwise I still think this might be a task impossible to automate. Sorry.
peter m hayward wrote:that is in the incoming data
peter m hayward wrote:and the isbn
peter m hayward wrote:A few family members and I are running an online book store
peter m hayward wrote:I am trying to solve a string width issue where the object is to have a string that is no more than 80 characters in length and must include basic information
Liutauras Vilda wrote:
There is an ISBN database where all the info about a book can be found in a nice format. They have an API to pull it.