
Removing one array of strings from another

 
james falk
Ranch Hand
Posts: 55
Hi all,

Happy Thanksgiving, for those of you in the States...

I have been working on a program all night, and have got it down to one last step: I have to remove an array of strings from another array of strings.
I have what's called an 'ignore list' to go from, which is in an array called exactly that. I also have an array called tempArray, which has a list of words in it, some of which are on the ignore list. Here is the code I have so far:


It's throwing a null pointer exception where indicated, and I can only deduce that tempArray[i] equals null at some point, and can therefore not be compared using the .equals() method. I tried incorporating this:



instead, but that didn't work. My strategy was to replace all the instances of a word on the ignore list with a null value, and then use another method I wrote to take out all the nulls. Kinda stumped here. Any thoughts?

 
james falk
Ranch Hand
Posts: 55
I asked too soon...I ended up changing around a bit like this:


to get it to work. Sorry for any inconvenience, but hopefully this will get someone out of a similar bind. Cheers!
 
Wesleigh Pieters
Ranch Hand
Posts: 81


You will still get a NullPointerException if you call .equals() on a reference that hasn't been initialised (that is, one that is still null).

However, if you first check whether the object itself is null, you will get what you are looking for.



Edit: I am also pretty sure that you could do the null check and a case-insensitive comparison in one loop.
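The original snippets are not preserved in this thread, so the following is only a minimal sketch of the null check described above, combined with a case-insensitive comparison in a single pair of loops. The array names tempArray and ignoreList come from the first post; the class and method names are hypothetical.

```java
import java.util.Arrays;

class NullGuardDemo {

    // Sets every entry of tempArray that matches an ignore-list word
    // (ignoring case) to null, and returns how many entries were marked.
    static int markIgnored(String[] tempArray, String[] ignoreList) {
        int marked = 0;
        for (int i = 0; i < tempArray.length; i++) {
            for (int j = 0; j < ignoreList.length; j++) {
                // Test for null first: && short-circuits, so equalsIgnoreCase
                // is never called on a null reference.
                if (tempArray[i] != null && tempArray[i].equalsIgnoreCase(ignoreList[j])) {
                    tempArray[i] = null;
                    marked++;
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        String[] tempArray = {"The", "fox", null, "A"};
        String[] ignoreList = {"the", "a"};
        System.out.println(markIgnored(tempArray, ignoreList));
        System.out.println(Arrays.toString(tempArray));
    }
}
```

Once an entry has been set to null, the guard skips it for the remaining ignore-list words, so an entry is never counted twice.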
 
Winston Gutkowski
Bartender
Posts: 10575
james aggeles wrote: Sorry for any inconvenience, but hopefully this will get someone out of a similar bind. Cheers!

No probs, but there are a few things you could do to make your code cleaner:
1. Use Lists, not arrays.
2. One of the steps in your program is to attempt to find a word in your "ignore" list. Why not make that a method? Viz:

Indeed, if you use a List you don't even need to do this, because List already has methods to do exactly what you want.
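As a rough illustration of the List approach (not the snippet from the post, which is not preserved here), the sketch below uses List.contains for the lookup and List.removeAll for the removal. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IgnoreWithLists {

    // Returns a copy of words without any entry that appears in ignoreList.
    static List<String> withoutIgnored(List<String> words, List<String> ignoreList) {
        List<String> result = new ArrayList<>(words); // copy, so the caller's list is untouched
        result.removeAll(ignoreList);                 // drops every element contained in ignoreList
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "quick", "fox", "the");
        List<String> ignoreList = Arrays.asList("the", "a");
        // contains() does the "is this word on the ignore list?" lookup
        // without a hand-written loop.
        System.out.println(ignoreList.contains("the"));
        System.out.println(withoutIgnored(words, ignoreList));
    }
}
```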

Winston
 
james falk
Ranch Hand
Posts: 55
Thanks guys. I will look into lists and learn what I can. I definitely could have used one loop for both problems, Wesleigh.

I am stuck on one final detail for this project, and although it's not really related to the arrays I was talking about earlier, maybe you could help out. The program takes the text off a webpage and parses it to determine what words are used, and how often. As per my earlier question, sometimes there is an ignore list that uses a certain constructor, sometimes not. I have got the program to work in all instances but one: when there's no text at all, like when something like " -- " gets edited and then put into one of the arrays. If there's no text, my program will print out that there was a 'word' that is blank, no space, just "", if you get the picture. But it counts it nevertheless, and I need it to not do that. I tried this:


But no luck. Suggestions?

edit: I just looked over this code, and I can see where it's not right. I have been up a very long time at this point, so I think I am gonna lay my head down for awhile and get back to this. Any thoughts on my question would be great though. Thanks a lot!
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
james aggeles wrote: If there's no text, my program will print out that there was a 'word' that is blank, no space, just "", if you get the picture. But it counts it nevertheless, and I need it to not do that.

Well, I think you need to define first exactly what a "word" is, because you clearly have some restrictions. And I hate to say, but simply requiring them to be alphabetic isn't likely to work, because you'll have words like "shan't" and "cut-off" (not to mention "fo'c's'le") that don't comply.
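One way to pin down such a definition is to write it as a single rule. The sketch below is only an assumed rule, not anything posted in the thread: a word is one or more letters, optionally followed by groups of one apostrophe (straight or curly) or hyphen plus more letters. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class WordRule {

    // Assumed rule: letters, then zero or more groups of a single
    // apostrophe (' or \u2019) or hyphen followed by more letters.
    private static final Pattern WORD =
            Pattern.compile("\\p{L}+(?:['\u2019-]\\p{L}+)*");

    // True only when the whole string satisfies the rule above.
    static boolean isWord(String s) {
        return s != null && WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"shan't", "cut-off", "fo'c's'le", "--", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isWord(s));
        }
    }
}
```

Under this rule "shan't", "cut-off" and "fo'c's'le" are accepted, while "--" and the empty string are rejected.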

Winston
 
Wesleigh Pieters
Ranch Hand
Posts: 81
Can you not stop it from reading that in and adding it to the original array in the first place?

Are you getting an array-out-of-bounds error?

Also, your for (int i = 0; i < count; i++) { loop will possibly not iterate over the entire array, because count is set by the number of elements that are not equal to "". If the empty elements are near the beginning of the array, the loop will stop early and miss valid entries near the end.
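The code being discussed is not preserved here, so this is only a sketch of one way to avoid the problem described: count the entries to keep, then loop over the whole source array while using a separate index for the destination. All names are hypothetical.

```java
import java.util.Arrays;

class EmptyStringFilter {

    // Returns a new array containing only the non-null, non-empty entries.
    static String[] removeEmpty(String[] words) {
        // First pass: count how many entries survive, to size the result.
        int kept = 0;
        for (int i = 0; i < words.length; i++) {
            if (words[i] != null && !words[i].isEmpty()) {
                kept++;
            }
        }

        // Second pass: iterate over the full source length (not just `kept`),
        // writing survivors at their own index in the result.
        String[] result = new String[kept];
        int next = 0;
        for (int i = 0; i < words.length; i++) {
            if (words[i] != null && !words[i].isEmpty()) {
                result[next++] = words[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"", "blah", "", "blah", "end"};
        System.out.println(Arrays.toString(removeEmpty(words)));
    }
}
```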
 
Joel Christophel
Ranch Hand
Posts: 250
Winston Gutkowski wrote:
james aggeles wrote: If there's no text, my program will print out that there was a 'word' that is blank, no space, just "", if you get the picture. But it counts it nevertheless, and I need it to not do that.

Well, I think you need to define first exactly what a "word" is, because you clearly have some restrictions. And I hate to say, but simply requiring them to be alphabetic isn't likely to work, because you'll have words like "shan't" and "cut-off" (not to mention "fo'c's'le") that don't comply.
Winston


For this, I've used the following in one of my programs, and it seems to work well. *Note that curly apostrophes and quotes are used.


This basically decides when punctuation is part of a word and when it is part of the sentence structure. If it's part of the structure, it is removed.
 
james falk
Ranch Hand
Posts: 55
I wrote a pretty simple method to take off the punctuation, and it seems to work. To answer your question, a 'word' is something between spaces. I then strip off the punctuation and put them in an array. The problem is, there are spaces in the document that look like this: "blah blah blah -- blah blah blah", so the program sees "--" as a word, and then when the punctuation is removed I am left with an empty space in the array. And when I say empty space, I mean it: "" <-- empty space. I was hoping I could just iterate through the array and take out all the empty spaces. I will keep working on it. I am really grateful for your input, so thanks for the help!

edit: I also should have mentioned that punctuation is not needed at all, so words like "can't" will just end up as "cant", which I know isn't exactly rigorous, but it's all the project calls for. I am pretty stumped on this one. I looked over your program, Joel, and I don't know all the syntax (for example, I have never used (String x : arrayName) before), so I am not sure how to integrate it into my program. I really just need to delete the placeholders in the array that contain the empty spaces, which is why I wrote that little addendum I posted earlier, but something in the program isn't right. Somehow it's not recognizing the entries that contain "" in the array, and therefore won't sort through them. Is there another syntax to define when a string contains nothing? Null? Or is null different?

Here's the part of the program that creates the initial array:



Is there something simple I can do here to make sure the instances of "--" don't get used in the array?
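The array-building code from this post is not preserved, so the following is only a sketch under the assumptions stated in the thread: a 'word' is whatever sits between whitespace, all punctuation is stripped, and anything left empty (such as "--") is skipped before it ever reaches the array. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordExtractor {

    // Splits text on whitespace, strips ASCII punctuation from each token,
    // and keeps only tokens that still contain something afterwards.
    static String[] extractWords(String text) {
        List<String> words = new ArrayList<>();
        // Enhanced for loop: token takes each element of the split array in turn.
        for (String token : text.trim().split("\\s+")) {
            String stripped = token.replaceAll("\\p{Punct}", ""); // "can't" -> "cant", "--" -> ""
            if (!stripped.isEmpty()) {
                words.add(stripped);
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "blah blah blah -- can't stop.";
        System.out.println(Arrays.toString(extractWords(text)));
    }
}
```

Filtering at this stage means no later pass is needed to delete empty entries.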
 
Wesleigh Pieters
Ranch Hand
Posts: 81
null is different.

Like I said, your solution overlooks the placement of those elements when it tries to build the second array. Easiest would be an ArrayList and then using its remove method, or, if you don't want duplicates, a Set etc.

http://docs.oracle.com/javase/7/docs/api/

Edit: your new array should be the length of your old array minus count, but an ArrayList would still be better imo.
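To make the difference between null and "" concrete, here is a small sketch (the class and method names are hypothetical). A null reference points to no String at all, while "" is a real String object of length zero, so the two need different tests.

```java
class NullVersusEmpty {

    // Classifies a reference as "null", "empty" or "text".
    static String describe(String s) {
        if (s == null) {
            return "null";   // no String object exists; calling a method on s would throw
        }
        if (s.isEmpty()) {
            return "empty";  // a real String with length 0, i.e. ""
        }
        return "text";
    }

    public static void main(String[] args) {
        System.out.println(describe(null));
        System.out.println(describe(""));
        System.out.println(describe("--"));
        // Calling equals on the literal is safe even when the argument is null.
        System.out.println("".equals(null));
    }
}
```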
 
james falk
Ranch Hand
Posts: 55
Thanks for the API link, but I don't see anything about initializing the ArrayList. I googled it but didn't see anyone creating an ArrayList and then putting an array into it directly. Do I have to create a for loop to do this with the .add command?
 
Wesleigh Pieters
Ranch Hand
Posts: 81
james aggeles wrote: Thanks for the API link, but I don't see anything about initializing the ArrayList. I googled it but didn't see anyone creating an ArrayList and then putting an array into it directly. Do I have to create a for loop to do this with the .add command?


there are a few ways you could do this but easiest i think



then you can just check if it contains and remove.



just out of my head so you may need to fix syntax etc and test
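The two snippets from this post are not preserved, so what follows is only a sketch of the general idea, with hypothetical names: copy the array into an ArrayList via Arrays.asList, then check contains and remove until nothing is left.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayToList {

    // Arrays.asList gives a fixed-size view of the array; copying it into
    // an ArrayList produces a list that can grow and shrink.
    static List<String> toMutableList(String[] array) {
        return new ArrayList<>(Arrays.asList(array));
    }

    // Removes every occurrence of unwanted, checking contains before each remove.
    static void removeEvery(List<String> list, String unwanted) {
        while (list.contains(unwanted)) {
            list.remove(unwanted); // remove(Object) deletes only the first match
        }
    }

    public static void main(String[] args) {
        List<String> words = toMutableList(new String[]{"blah", "", "blah", ""});
        removeEvery(words, "");
        System.out.println(words);
    }
}
```

No explicit for loop with add is needed to get the array into the list.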
 
Winston Gutkowski
Bartender
Posts: 10575
Wesleigh Pieters wrote: then you can just check if it contains and remove.

<nitpick>
Logically, Wesleigh's code is fine; however, you'll probably find that

works quite a bit quicker, especially with a RandomAccess list like ArrayList.
</nitpick>
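The snippet from this post is not preserved, so the sketch below should not be read as Winston's code. It only shows two standard ways to avoid a separate contains scan before every removal; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FasterRemoval {

    // remove(Object) already reports whether it found and removed something,
    // so the extra contains() scan before each removal is unnecessary.
    static void removeEvery(List<String> list, String unwanted) {
        while (list.remove(unwanted)) {
            // keep going until remove() returns false
        }
    }

    // removeAll removes every element contained in the given collection
    // with a single call.
    static void removeEveryInOneCall(List<String> list, String unwanted) {
        list.removeAll(Collections.singleton(unwanted));
    }

    public static void main(String[] args) {
        List<String> first = new ArrayList<>(Arrays.asList("blah", "", "blah", ""));
        removeEvery(first, "");
        System.out.println(first);

        List<String> second = new ArrayList<>(Arrays.asList("blah", "", "blah", ""));
        removeEveryInOneCall(second, "");
        System.out.println(second);
    }
}
```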

Winston
 
Wesleigh Pieters
Ranch Hand
Posts: 81
thanks Winston, I would always like to learn better ways, I am fairly new to programming and Java.
 
Winston Gutkowski
Bartender
Posts: 10575
Joel Christophel wrote: For this, I've used the following in one of my programs, and it seems to work well...

Fine, but it's incredibly procedural. The code might answer the question "what do I do with a word?", but it doesn't answer the question "what is a word?" (at least, not clearly). It also seems to be combining the functions of deciding what is a word, and what is inside quotation marks; and that's usually not a good idea. You might find this article useful on that subject.

My suggestion would be to deal with quotations completely separately; and at the very least I'd take all the stuff inside your
for (String x : arrayName) { ...
loop and put it in a method.
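Joel's code is not preserved here, so this is only a sketch of the shape being suggested: the loop stays a one-liner and the per-word decision lives in its own method. The rule inside trimPunctuation (strip leading and trailing ASCII punctuation, keep interior apostrophes and hyphens) is an assumption made for illustration, as are all names except arrayName.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoopBodyAsMethod {

    // Hypothetical per-word step: removes punctuation at the start and end
    // of a token but leaves interior characters such as ' and - alone.
    static String trimPunctuation(String raw) {
        return raw.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
    }

    // The loop itself only iterates and collects; the rule is in the method above.
    static List<String> cleanAll(String[] arrayName) {
        List<String> result = new ArrayList<>();
        for (String x : arrayName) { // x takes each element of arrayName in turn
            String word = trimPunctuation(x);
            if (!word.isEmpty()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] arrayName = {"\"can't,\"", "--", "cut-off."};
        System.out.println(cleanAll(arrayName));
    }
}
```

With the rule isolated like this, changing what counts as part of a word means editing one method rather than the loop.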

You seem to be well on the way to writing a procedural parser, which would be fine if you were writing it in C; but you're not. Java is an object-oriented language, so my advice is to StopCoding for a bit, and sit down with a pencil and paper and think hard about what you're trying to do (Note: NOT how you're going to do it).

Parsers are tricky things, but you'll find that they lend themselves very nicely to object-orientation and layering. Unfortunately you have to break out of the "I've got this character and this situation, what do I do now?" way of thinking.

HIH

Winston
 
Winston Gutkowski
Bartender
Posts: 10575
Wesleigh Pieters wrote: thanks Winston, I would always like to learn better ways, I am fairly new to programming and Java.

You're welcome; and don't worry about it too much, because what you wrote tells me that your approach is spot on - ie, you're thinking about WHAT needs to be done, not HOW you're going to do it. And believe me, that's a great lesson to learn...and a hard one to teach.

Keep it up.

Winston
 
Joel Christophel
Ranch Hand
Posts: 250
Winston Gutkowski wrote:
Joel Christophel wrote: For this, I've used the following in one of my programs, and it seems to work well...

It also seems to be combining the functions of deciding what is a word, and what is inside quotation marks; and that's usually not a good idea.


I agree with the procedural aspect, and that's easy to fix. But the code, without getting overly in-depth, is defining when an apostrophe is not part of a word.
 
Winston Gutkowski
Bartender
Posts: 10575
Joel Christophel wrote: I agree with the procedural aspect, and that's easy to fix...

Ooof. You think so? Being an old procedural programmer myself, it took me ten years; and strangely enough, it was the exact problem that you're facing that made me realize that the way I was thinking about it was totally wrong.

But the code, without getting overly in-depth, is defining when an apostrophe is not part of a word.

I'm well aware of the intricacies of parsers, having written a few myself; and I still say that your structure is wrong.

Parsing is a layered process - some of my mathematician colleagues here might even say a recursive one - but you can actually unroll some of that recursion if you deal with each 'layer' (or parsing rule) individually. But trying to attack it by treating your source (or text) as simply a linear stream of identical characters is, I'm afraid, not the way to go.

Without writing a small novella, it's difficult to explain exactly what you're doing wrong; but your code above tells me that you're not isolating the issues properly. The only advice I can give you is: rather than look at it from the 'outside' - "I am a program, I have some text that I need to parse" - try and see if you can turn the problem around in your head - "I am some text, and I need to be parsed. Parse me.".

Sorry I can't be more specific (and I'll be happy to try and explain a bit further if you have some specific questions); but trust me, this was part of my "Eureka" moment in Object-oriented thinking, and I've never looked back since.

Winston
 
Campbell Ritchie
Marshal
Posts: 56578
Winston Gutkowski wrote: . . . Parsing is a layered process - some of my mathematician colleagues here might even say a recursive one . . .
I am not a mathematician, but my parser is definitely recursive. Since it is written in reversible FORTH, it is completely incomprehensible to the human eye.
 