I like the Quiotix parser because it provides a slick visitor interface, which I prefer to walking the DOM. I have a description of the visitor pattern and a link to Quiotix
HERE. In short, you parse an HTML string and write a visitor that extracts whatever attributes you want from the nodes.
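
To make the visitor idea concrete, here is a minimal sketch in plain Java. It does not use the Quiotix API: `Node`, `Tag`, `Text`, `NodeVisitor`, and `AttributeCollector` are hypothetical names made up for illustration, and the tree is built by hand rather than parsed from a string. With a real parser you would get the tree from the parse step and only write the visitor part.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VisitorSketch {
    // Hypothetical visitor interface (NOT the Quiotix API): one callback per node type.
    public interface NodeVisitor {
        void visitTag(Tag tag);
        void visitText(Text text);
    }

    // Every node knows how to hand itself to a visitor.
    public interface Node {
        void accept(NodeVisitor visitor);
    }

    public static final class Tag implements Node {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Tag(String name) { this.name = name; }

        Tag attr(String key, String value) { attributes.put(key, value); return this; }
        Tag child(Node node) { children.add(node); return this; }

        @Override
        public void accept(NodeVisitor visitor) {
            visitor.visitTag(this);            // visit this tag first (pre-order)
            for (Node c : children) {
                c.accept(visitor);             // then descend into its children
            }
        }
    }

    public static final class Text implements Node {
        final String value;
        Text(String value) { this.value = value; }

        @Override
        public void accept(NodeVisitor visitor) { visitor.visitText(this); }
    }

    // A visitor that collects the values of one attribute from every tag that has it.
    public static final class AttributeCollector implements NodeVisitor {
        final String attributeName;
        final List<String> values = new ArrayList<>();

        AttributeCollector(String attributeName) { this.attributeName = attributeName; }

        @Override
        public void visitTag(Tag tag) {
            String v = tag.attributes.get(attributeName);
            if (v != null) values.add(v);
        }

        @Override
        public void visitText(Text text) { /* text nodes carry no attributes */ }
    }

    public static List<String> collect(Node root, String attributeName) {
        AttributeCollector collector = new AttributeCollector(attributeName);
        root.accept(collector);
        return collector.values;
    }

    // Hand-built stand-in for a parsed document:
    // <body><a href="/a">first</a><p><a href="/b">second</a></p><img src="x.png"></body>
    public static Node sampleTree() {
        return new Tag("body")
            .child(new Tag("a").attr("href", "/a").child(new Text("first")))
            .child(new Tag("p")
                .child(new Tag("a").attr("href", "/b").child(new Text("second"))))
            .child(new Tag("img").attr("src", "x.png"));
    }

    public static void main(String[] args) {
        System.out.println(collect(sampleTree(), "href")); // prints [/a, /b]
    }
}
```

The traversal lives in `accept`, so the visitor contains only the extraction logic. That is the part I find nicer than walking the DOM by hand.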