Yes you are completely right, I think we need a tree to maintain the structure, and insert to the tree in a depth first order from left to right (while reading from right to left from the expression), keeping in mind that a parent node can have "n" childs where "n" is the number of operands the operator can have, and operands can't have childs.
for example for the second expression:
baz PAIR biz PAIR boz PAIR buz bez PAIR SET ”
we start parsing from right to left
we reached to an end because we have only operands at the bottom (buz and bez) left that can't have childs and there are still tokens in the expression so it is ill-formed.
for the first expression:
foo bar PAIR baz PAIR biz PAIR boz PAIR buz bez PAIR SET PAIR SET
we could build a full tree structure and consumed all the expression tokens, so its well-formed.
what about this solution??
drawing the tree was very difficult