Forums Register Login

Binary search tree with two values per node

+Pie Number of slices to send: Send
Hello,

I am looking to write a binary tree in which each node holds both a string and an int value. I want to sort via the string. The integer would strictly be used to record the number of times the string appears. Any suggestions on how to do this, or do you know of code that already exist similar to what I'm looking to do that cam be studied?

Thanks
+Pie Number of slices to send: Send
 

Scotty Steven wrote:Hello,

I am looking to write a binary tree in which each node holds both a string and an int value. I want to sort via the string.

Thanks



You mean a 'String'? Remember Java is case-sensitive.

Scotty Steven wrote:

The integer would strictly be used to record the number of times the string appears.

Thanks



So that means the String's' can repeat and the tree can hold duplicates. You have not specified how you would handle inserts/updates/deletes of duplicate nodes. Does an equal node traverse the right subtree or does it traverse the left subtree of this node? How does a delete operation correct the count value in the affected nodes ( i.e for cases when you have count > 1 ). What is your algorithm ( let us know in plain text ) going to be?

Scotty Steven wrote:

Any suggestions on how to do this, or do you know of code that already exist similar to what I'm looking to do that cam be studied?

Thanks



What have you tried? Plus I think we require more details about your problem statement and we need to know your algorithm. Implementation comes much later.

Chan.



+Pie Number of slices to send: Send
 

Chan wrote:

You mean a 'String'? Remember Java is case-sensitive.



Yes, I mean 'String'. I understand that Java is case-sensitive. The sentence I wrote was not in code, so I sometime fall back to my default English language which is also case-sensitive.

Chan wrote:

So that means the String's' can repeat and the tree can hold duplicates. You have not specified how you would handle inserts/updates/deletes of duplicate nodes.



No. I guess I wasn't clear enough when I said "The integer would strictly be used to record the number of times the string appears." If a duplicate String appears, instead of inserting the String that already exists, the count integer would advance one. So, the first time the word "programming" appears, it is inserted into the tree along with an integer which would hold the starting value of 1. The next time the word "programming" appears, when the search algorithm finds the duplication, it would trigger the count integer to advance to 2 and would not insert the duplicate into the tree. The compared Strings would be setup to not be case-sensitive.

Chan wrote:

Does an equal node traverse the right subtree or does it traverse the left subtree of this node? What is your algorithm ( let us know in plain text ) going to be?



Let's go with if greater than, go right. If lesser than, go left.

How does a delete operation correct the count value in the affected nodes ( i.e for cases when you have count > 1 ).



Fair enough. In this project, it will be insert only. This is strictly to keep track of the number of times a String appears in a text file that will be read by the program. No deletions will occur. However, I think a re-balancing of the tree should occur after a couple of insertions.

Scotty Steven wrote:


Any suggestions on how to do this, or do you know of code that already exist similar to what I'm looking to do that cam be studied?

Thanks

Chan wrote:

What have you tried? Plus I think we require more details about your problem statement and we need to know your algorithm. Implementation comes much later.

Chan

.

I haven't coded anything yet. I am trying to figure out how to make this work and looking for a starting point. I know how I'd like it to work, but can not quite figure out how to begin. As such, I decided to request help from people more knowledgeable then myself on the subject. I thought someone could point me towards my goal, or perhaps knew of existing code that I could use as a starting point, but would require reworking it to make it fit.
+Pie Number of slices to send: Send
 

Scotty Steven wrote:

Chan wrote:

So that means the String's' can repeat and the tree can hold duplicates. You have not specified how you would handle inserts/updates/deletes of duplicate nodes.



No. I guess I wasn't clear enough when I said "The integer would strictly be used to record the number of times the string appears." If a duplicate String appears, instead of inserting the String that already exists, the count integer would advance one. So, the first time the word "programming" appears, it is inserted into the tree along with an integer which would hold the starting value of 1. The next time the word "programming" appears, when the search algorithm finds the duplication, it would trigger the count integer to advance to 2 and would not insert the duplicate into the tree. The compared Strings would be setup to not be case-sensitive.




Now that makes sense. So with each insert, you only increment the count if the String is equal. With this the following would be moot.

Scotty Steven wrote:

Chan wrote:

Does an equal node traverse the right subtree or does it traverse the left subtree of this node? What is your algorithm ( let us know in plain text ) going to be?



Let's go with if greater than, go right. If lesser than, go left.




Cause you are simply incrementing the count, if you get an equal node. So you don't need to traverse any sub tree. Or is there something I haven't understood?

Scotty Steven wrote:

Chan Ag wrote:How does a delete operation correct the count value in the affected nodes ( i.e for cases when you have count > 1 ).



Fair enough. In this project, it will be insert only. This is strictly to keep track of the number of times a String appears in a text file that will be read by the program. No deletions will occur. However, I think a re-balancing of the tree should occur after a couple of insertions.




Cool. But delete ( regardless of whether you are going to decrement the count or delete the node altogether ) would also not be complicated cause the tree doesn't have duplicate nodes.

Scotty Steven wrote:
I haven't coded anything yet. I am trying to figure out how to make this work and looking for a starting point. I know how I'd like it to work, but can not quite figure out how to begin. As such, I decided to request help from people more knowledgeable then myself on the subject. I thought someone could point me towards my goal, or perhaps knew of existing code that I could use as a starting point, but would require reworking it to make it fit.



Have you tried this and this link already? Do you find them helpful?
1
+Pie Number of slices to send: Send
Do you need a tree with Strings and counts? Or do you need a tree with Strings and counts of Strings?

I meant the two to be different. You may find a much easier solution if you work out that difference.
1
+Pie Number of slices to send: Send
Here is the project in a nut shell. Some students have other students write papers for them. This practice has been going on for decades. I am theorizing that each writer has their own style and as such, repetitively use the same words over and over. What I am setting out to do is create a program that will count the writers use of each word in document, and compare that to other written papers by the same student. I'm theorizing that a pattern should start to appear and any deviation might indicate a change in authors.

So, as part of my research, I am looking at 100's or even thousands of papers, one at a time by multiple authors and extracting the word counts from each as part of the data collection stage of my thesis. As such, some of these papers will be ten's of thousands of words long (I will be starting with published authors to establish if the theory is worthy of further research). Therefore, searching the tree efficiently to populate the words and increment the word counts efficiently is important and therefore sorting is important.

Once the tree has been completely populated, the tree needs to be sorted from largest number of occurrences of a word to smallest.
1
+Pie Number of slices to send: Send
While I'm interested in the code implementation, I'd really like to see the results of your research. Sounds fascinating.
1
+Pie Number of slices to send: Send
You need to look up a data structure for holding your Strings/Counts. Have you looked at Java's Collections API to see which type you think best fits your needs? There are two different base-interfaces you can start from, depending on the answer to Campbell's question (if I understood it correctly). Once you have the base interface chosen, then you can narrow down an implementation which does what you need - I think there is already a good implementation for you in either of the two branches you choose, and there are variations if you are working in a multi-threaded environment.
+Pie Number of slices to send: Send
 

Steve Luke wrote: . . . Have you looked at Java's Collections API . . . ? . . .

No. no, no, don't look there, because I happen to know one page in that link has a little application which counts words in text. It is really long and complicated: about ten lines.
+Pie Number of slices to send: Send
 

Earlier, I wrote: . . . about ten lines.

No, I was mistaken. even if you miss out comments and blank lines, it comes to twelve lines.

Now I know what the application is for, I would suggest you do the counting first and the sorting afterwards.
+Pie Number of slices to send: Send
There are algorithms which people use to try and identify authors' styles. It is much more complicated than counting words, I am afraid.
+Pie Number of slices to send: Send
 

No. no, no, don't look there, because I happen to know one page in that link has a little application which counts words in text. It is really long and complicated: about ten lines.

Earlier, I wrote:
. . . about ten lines.
No, I was mistaken. even if you miss out comments and blank lines, it comes to twelve lines.



Okay. Got anything more specific? I have been educating myself on Binary Search trees, which seem to fit what I'm looking for. I've been teaching myself by playing with one that has to do with numbers and traversing the tree with pre-ordered,in-ordered, and post-ordered, but that is as far as I've gotten.

Now I know what the application is for, I would suggest you do the counting first and the sorting afterwards.



I agree.

There are algorithms which people use to try and identify authors' styles.



I am aware that there are more complicated was of doing this. They explore word use patterns, but ignore reuse of words. With students checking their work through grammar correction sites such as Grammerly.com, those style tests are becoming less reliable. My thesis is to show that a much simpler way exists; that the number of occurrences/some word count will also show the likelihood that a particular word belongs to an author.
1
+Pie Number of slices to send: Send
 

Scotty Steven wrote: . . . Got anything more specific? . . .

Not unless you manage to find the little app called Freq or similar on the Map page.

. . . My thesis is to show that a much simpler way exists; that the number of occurrences/some word count will also show the likelihood that a particular word belongs to an author.

Oh, it's a thesis? Go for it!!

While I wasn't looking for the Map page, I noticed there is a sorted map interface, in the same “trail” of the Java Tutorials. I had quite forgotten about that; you might be able to use a sorted map to do the whole thing in one fell swoop. The more I read in the Java Tutorials, the easier the whole thing seems to be.
No, tomorrow we rule the world! With this tiny ad:
a bit of art, as a gift, the permaculture playing cards
https://gardener-gift.com


reply
reply
This thread has been viewed 3065 times.
Similar Threads
Binary Search Tree
trees
tree datastructure algorithm
B+ Tree in Java
Datastructure: Minimum weight of a binary tree
More...

All times above are in ranch (not your local) time.
The current ranch time is
Mar 29, 2024 02:54:18.