There are three kinds of actuaries: those who can count, and those who can't.
There are three kinds of actuaries: those who can count, and those who can't.
Piet Souris wrote:Given an ‘a’ at level 1 (so in fact I have a word that starts with an ‘a’), and I ask what possible outcomes I have one level down, I see the following possibilities:
1) The ‘empty’ character, meaning end-of-word
2) ‘b’
Both with probability 0.5.
Piet Souris wrote:
If I define my own notation, I’d say that the ‘entropy’, given an ‘a’ at level 1, is:
- { 0.5 * log_2(0.5) + 0.5 * log_2(0.5) } = - log_2(0.5) = - (-1) = 1
So H(a, 1, 1) (is entropy, given an ‘a’ at level 1 and going 1 level deep) = 1.
Alicia Perry wrote:I have a trie data structure that stores a sequence of English words.
In each node I saved:
the character that it represents the level at which the node is located a counter that indicates how many words "pass" for that node the entropy of the next character of the node.
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
Winston Gutkowski wrote:If I'm right, then emprically, a TrieNode needs only three things:
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
There are three kinds of actuaries: those who can count, and those who can't.
Winston Gutkowski wrote:Actually, if I'm being really picky, it probably only needs two: - content, and childList...
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
There are three kinds of actuaries: those who can count, and those who can't.
Winston Gutkowski wrote:Actually, you've stored a bit more than that, and I wonder if some of the things that you store aren't redundant.
Winston Gutkowski wrote:Also: DON'T make instance variables public. EVER.
Winston Gutkowski wrote:Well, Java has a very useful structure called an IdentityHashMap, that allows you to quickly retrieve values for specific objects. So, you could either set up one for each value, or you could create a NodeInfo class that contains all "non-essential" information about a TrieNode, and store that in an IdentityHashMap<TrieNode, NodeInfo>.
Then, whenever you encounter a TrieNode and you want to know its "count", you simply look it up in the Map.
Some of these values (eg, 'count') could also probably be calculated while the Trie is being built; although I have to admit that it "smells" like a value that might be better calculated via a method as required - but TBH I don't know.
Piet Souris wrote:
Not sure about your last reply. The way I look at it is what I already wrote. Given that I have an 'a' at level 1, then according to me there are two possibilities: either we're dealing with the word 'a' or we are dealing with the word 'ab'. Two possibilities therefore, with an entropy of 1. Now, in order to get that entropy, we must calculate the p for every child, taking that childs 'count' into consideration.
And that's why I talked about a special kind of null-character, so that the formula would only have to look at the children, and not also to inspect the 'isEnd' variable.
Winston Gutkowski wrote:
In fact, they include an even "simpler" structure:
where, instead of a List, each Node points to its "first" child; but that child may have "siblings".
Piet Souris wrote:Anyway, I've written my own version, according to what I understand of it.
I think it is about Alicia's version. but I added my own 'TrieEndOfWord class, to see if it works
There are three kinds of actuaries: those who can count, and those who can't.
There are three kinds of actuaries: those who can count, and those who can't.