# How to check whether two trees have identical structure ?

Sanjay Murugesan
Greenhorn
Posts: 13
I am trying to figure out a way to check whether two multiway trees are identical or not in java. When i say identical, i mean whether two trees have the same physical layout ( given that rearranging the nodes is allowed )..

For instance

I dont care abt the values of each node in the tree and am concerned only abt finding whether two trees have identical structure. Can anyone suggest me what data structure I can use to store the above data so that I can later compare them to see whether they are identical.

What i am planning to do is to convert each tree into some numeric values and then compare them. For instance..

|-- D
|
A--|-- C
|
|-- B can be converted to 13000 [ 1 stands for first node... 3 stands for 3 child node of first node.. followed by 3 zeros since none of the child nodes ( D, C , B) have further branches.

But my algorithm is coming to be little complex when tree becomes complex. One tree is getting mapped to too many numeric values and hence comparision is becoming tedious.
Any suggestions/pointer/help would be appreciated.

Thanks
[ May 16, 2005: Message edited by: Sanjay Murugesan ]

Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Do you know how to do a recursive walk through a tree? If each node has right & left children it might look like:

Or do your nodes have collections of children? You showed one with three.

What you need is some way to track the progress through the tree. Let's see if we can make visit() return something ...

Try that with pencil & paper on some of your sample trees and see if it gives a string representation of the structure with none of the content.

Does that give you something to try?

Sanjay Murugesan
Greenhorn
Posts: 13
Hi Stan..

Really appreciate your quick reply..I totally understand what you are trying to hint here through your code..

But I think I didn't make my point across properly in my post..I am quite comfortable with the tree parsing stuff and infact am confident of converting any idea into code.. But what i am looking for is some good idea that would accomplish my task of mapping the tree into some output String or number.. Let me tell what i am trying to achieve here by comparing two trees..
I am trying to dynamically come up with a flowmap ( an jpeg image ) that consists of number of blocks with lines connecting them just like the one i had drawn as sample. And the text on each block of jpeg image is editable and hence can be changed run time.

I will be forming a tree structure during run time basen on some run time parameters and once tree is formed i need to select a flowmap image ( from the bunch i have ) that represent the tree struture i formed..

Now, in the run time I can form any of the following tree structure

but since the flow ( represented by tree ) is similar for all the above trees, i should be selecting the same flowmap image for all the above.. In other words all the above trees should be mapped to same flowmap image. I cant afford to have three flowmap images for the same flow in the server.
Now my question is how can i do the mapping such that all 3 gives same output string that I can map to some flowmap image ( and ofcourse no other tree structure should generate this output string ) ..One of the mapping that i can think off top of my head now for sample tree 3 is assigning

1**1 ( 1 to the power of 1 ) to A = 1

2**1 to B = 2

2**2 to C = 4

2**3 to D = 8

3**1 to E = 3

3**2 to F = 9

3**1 to G = 3

3**2 to H = 9

--------------

So the total for graph 3 comes to 39 .. Same way sample 1) and 2) will give a count of 39 which i can map to a particular flowmap image..

But the mapping i had used above wont work for all the scenarios and there may be a possibility that totally different tree can give the same output number 39. So i am looking for some good mapping alogrithm that will generate same unique output string for similar trees and different for non-similar trees.

I think i should involve some prime numbers for mapping.what do you say?

[ May 16, 2005: Message edited by: Sanjay Murugesan ]
[ May 16, 2005: Message edited by: Sanjay Murugesan ]

Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
The trace I sketched above would be ambiguous. You could uniquely name each node by the path you took to get there ... left, middle or right branch. Your last graph would come out

A=X origin
B=X.Left
C=X.Middle
D=X.Right
E=X.Middle.Left
F=X.Middle.Right
G=X.Right.Left
H=X.Right.Right

Left Right and Middle could be 1,2,3 or however many children one node can have. Maybe make a sorted list of nodes for each tree and compare the lists.

I would also try to do this on the fly without recording the structures by walking both trees at once.

Wow, is it that easy?

Geoffrey Falk
Ranch Hand
Posts: 171
1
Interesting problem. You need to define some "canonical form" that all trees of the same structure map to the same tree. You make sure the subtrees are canonical form, and sort them somehow...

There is a beautiful solution, which this margin was too small to contain!

Geoffrey

Stefan Wagner
Ranch Hand
Posts: 1923
@Stan:
I guess your solution doesn't find similar trees, only identical ones.

@Sanjay:
Both trees are identical, if you find for every subtree of tree A a subtree in tree B, and no additional subtree.

If you remove every identic subtree, you only need to compare both to be empty at last.

Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Um, Stephan, define similar and identical? We might or might not agree here.

I think the looonnnggg and short solutions I suggested will both find identical structure, ie each corresponding node has the same number of children or the same right & left branches, while ignoring the content completely. Was that the question?

Ryan McGuire
Ranch Hand
Posts: 1085
4
Originally posted by Stan James:
Was that the question?

No, the task was to find trees that are identical even if you ignore left versus right.

Case 1: Obviously if your trees only have one node each, they're "similar".

Case 2: If one tree has a parent node and a child to the left whiloe the second tree has a parent and a child to the right, they are "similar" but not "identical".

I think Sanjay is looking for a way to determine if two trees are similar.

I think Geoffrey hit the nail on the head with his "canonical form" suggestion. (And the "margin" comment was beautiful.)

EDIT:
Wait, I think a modification of some of Stan's earlier code should do the trick, at least for binary trees:

To extend this to N-ary trees, you need some way to compare all combinations of children between nodeA and nodeB. i.e. You need to be able to find a 1-1 pairing between children of nodeA and children of nodeB where each pair returns true for visit(). If there is such a pairing, then visit(nodeA, nodeB) is true.

Ryan
[ May 18, 2005: Message edited by: Ryan McGuire ]

Stefan Wagner
Ranch Hand
Posts: 1923
@Stan:
Ryan explained what I meant and what was mentioned in the thread-opening.

btw.: my name is Stefan, though Stephan is somewhat similar

Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Woops, sorry about the name. I scrolled back to look and must have changed the spelling in my head while scrolling back. And I really did miss the question. Rats.

My recursive method could be adapted to not care whether it goes right or left. I think it would have to use brute force to try every possible path through the tree. Comparing A's child-1 to B's child-1 might fail while comparing A's child-1 to B's child-2 would match.

I'm stumped on what the canonical description of the flow would look like to make the three samples given near the top work out. Maybe you could walk the tree once and annotate each node with the total number of children that can be reached from it, then convert them to text in that order?
[ May 19, 2005: Message edited by: Stan James ]

Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Ok, build a list of paths to the leaves. The path is the number of children for each node along the way. The three trees in the sample at the top yield the same list:

3
3.2
3.2
3.2
3.2

Of course they add the entries in different orders. We have to sort them before comparing the lists. If you can have more than 9 children, pad the numbers with leading zeros to make the strings sort right.

Here's a node that makes such a list. It was a few minutes work; it puts a zero on the end for its own number of children (that's a feature for one-node trees) and doesn't put in the dots. I'll leave that to the reader along with sorting and comparing the two lists.

I used insertion-order-preserving linked lists and sets to facilitate unit testing. Some kind of sorting List for the results would be a nice feature. A sorting Set is right out becuse it doesn't allow duplicates.
[ May 19, 2005: Message edited by: Stan James ]

Byron Estes
Ranch Hand
Posts: 313
You could put the result in a hash table where the path signature/pattern is the key and the value is the number of times the path signature/pattern occured in the tree. This could be put into an class where you override the
the equals method to allow/encapsulate the comparison.

...or if you want to make it really fast you could build a master path pattern for the entire tree by sorting the path signatures by their occurence (primary) and then by their signature pattern itself (secondary). If you then built a string out of them in this order, you have what I think would be a decent key that you could then store in a database with your flows, images etc. Then you could evaluate a situation, build it's key and then use the power of the database to find various alternatives.