
HTML comparator in Java

 
Manish Ramrakhiani
Ranch Hand
Posts: 44
Hello guys,

I need to compare two HTML files from Java.

Please note that these HTML files have some JavaScript code embedded in them.

Which classes or libraries does Java provide for this, or do you know of any freeware that would help?
Please write back.

Thanks,
Manish Ramrakhiani.

 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
Compare in what way? The source? The output? As a DOM?

In any case, there are Java "diff" implementations out there, and Java XML diffs as well--did none of those meet your needs?
 
Manish Ramrakhiani
Ranch Hand
Posts: 44
Thanks for replying, David.

I need to compare the output of both HTML files.

To be precise, I need to compare the UI part.

Could you please tell me briefly what options I have?
I am still a greenhorn in Java.

Thanks,
Manish Ramrakhiani.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
I guess I'm asking how detailed the comparison needs to be: are you simply checking for the presence of specific fields or other UI elements? Is their order significant? Is their appearance significant? Answering those questions may impact what the best solution is.
 
Manish Ramrakhiani
Ranch Hand
Posts: 44
Well,
No, the order is not important.
Nor is the appearance.
What matters is whether all the UI elements present in, say, HTML 1 are also present in HTML 2.

However, I would be curious to know all the options even when the order and appearance are important too.

Thanks a lot again.
I desperately need to achieve this.

Thanks,
Manish Ramrakhiani
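
A minimal sketch of the check described above (is every UI element of the first file also present in the second, ignoring order and appearance), using only the HTML parser that ships with the JDK's Swing packages. The class name UiElementCheck, the choice of input, select and textarea as "UI elements", and the tag-plus-name signature are all assumptions made for illustration. Elements that the embedded JavaScript would create at runtime are not visible to this kind of static parse.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of any library mentioned in this thread.
final class UiElementCheck {

    // Collects a signature such as "input[name=user]" for each form control.
    static Set<String> uiElements(String html) {
        Set<String> found = new TreeSet<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                record(tag, attrs);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                record(tag, attrs); // empty elements such as <input> arrive here
            }

            private void record(HTML.Tag tag, MutableAttributeSet attrs) {
                if (tag == HTML.Tag.INPUT || tag == HTML.Tag.SELECT
                        || tag == HTML.Tag.TEXTAREA) {
                    Object name = attrs.getAttribute(HTML.Attribute.NAME);
                    found.add(tag + "[name=" + name + "]");
                }
            }
        };
        try {
            // true = ignore any charset directive inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    // Signatures present in the first document but absent from the second.
    static Set<String> missingFrom(String first, String second) {
        Set<String> missing = uiElements(first);
        missing.removeAll(uiElements(second));
        return missing;
    }
}
```

Calling UiElementCheck.missingFrom(html1, html2) returns an empty set when nothing from the first file is missing in the second. Because the signatures are kept in a set, order and duplicates are deliberately ignored, which matches the requirement stated in this post but would need changing if order mattered.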
 
Mazer Lao Tzu
Ranch Hand
Posts: 35
I have used an HTML parser called NekoHTML in the past. Its API is similar to an XML parser, but it handles a lot of oddities of HTML that make it invalid for most XML parsers.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
I agree; using an HTML parser would probably be the easiest and most robust solution. Preferably one that allows CSS selectors, which would make testing for the presence of tags trivial.

Another option, although there is a performance penalty, would be to use something like Selenium, which actually drives a browser. It also allows CSS selectors to find elements.
 
Manish Ramrakhiani
Ranch Hand
Posts: 44
Thanks, all, for replying.
I will see what suits me best.

I am wondering whether the JDK has something to achieve this?

Please let me know if anybody has any clue.

Thanks,
Manish Ramrakhiani.
 
Manish Ramrakhiani
Ranch Hand
Posts: 44
Guys,
Just to clear things up:
I want to compare the source code of two HTML files.

Thanks,
Manish Ramrakhiani.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
That's not really what you said before--before you said you wanted to check for the presence of elements, which is a different thing.

Do you want to compare the DOM, the text, or what? There are a bunch of diff implementations in Java, some XML diff implementations which might be usable for well-formed XML, and so on. But without knowing what you want *precisely* it's impossible to help further.
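
To make the "text" option concrete, here is a rough sketch of what a plain source diff does: it treats each file as a list of lines and reports lines that appear only on one side. The class name LineDiff and the output prefixes are made up for illustration, and the approach (a longest-common-subsequence table) is a textbook one, not the algorithm of any particular diff library. An existing diff implementation would be a better choice for real use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class LineDiff {

    // Returns lines prefixed with "  " (common), "- " (only in a) or "+ " (only in b).
    static List<String> diff(List<String> a, List<String> b) {
        int n = a.size();
        int m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add("  " + a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a.get(i++)); // dropping a's line loses nothing
            } else {
                out.add("+ " + b.get(j++));
            }
        }
        while (i < n) out.add("- " + a.get(i++));
        while (j < m) out.add("+ " + b.get(j++));
        return out;
    }
}
```

A diff like this is sensitive to whitespace and attribute order, so two files that render identically can still differ line by line. That is why the question of DOM versus text matters.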
 
Manish Ramrakhiani
Ranch Hand
Posts: 44
Sorry to bother you guys,

By DOM, do you mean that I need to convert the HTML files into some node-like structure?
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
It depends on what you're actually trying to do, as I've said.
 
Manish Ramrakhiani
Ranch Hand
Posts: 44
It would be great if I could convert the HTML files into some tree-like structure and then compare them.

I hope I am making sense.
Really sorry, David, but I am kind of new to all this.

Thanks,
Manish.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
Again, it depends on what kind of comparisons you're trying to do. If the HTML is well-formed (relatively unusual for HTML, but if you control the HTML, it's doable) then many XML parsers might work. The previously-mentioned NekoHTML handles more HTML and allows standard XML querying.
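
For the well-formed case, a sketch of the XML route using only the JDK: parse the markup into a W3C DOM and query it with XPath. The class name XhtmlQuery and the choice to collect input names are assumptions for illustration. It also assumes the markup has no DOCTYPE and no default namespace, since either would need extra handling. Ordinary non-well-formed HTML makes the parser throw, which is where a tolerant parser such as NekoHTML comes in.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class XhtmlQuery {

    // Returns the name attribute of every <input> element in well-formed markup.
    static Set<String> inputNames(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//input/@name", doc, XPathConstants.NODESET);
            Set<String> names = new TreeSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeValue());
            }
            return names;
        } catch (Exception e) {
            // Typically a SAXParseException when the input is not well-formed.
            throw new IllegalArgumentException("Input could not be parsed as XML", e);
        }
    }
}
```

Comparing two files then reduces to comparing the two returned sets, for example with containsAll.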
 
Manish Ramrakhiani
Ranch Hand
Posts: 44
Thanks a lot, David.
I read some material on NekoHTML; it seems interesting.

Could you please give me some pointers or references on how to use it? Some examples would be great.

Thanks again
Manish.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
Er...

http://nekohtml.sourceforge.net/usage.html

And look at the samples provided with the project.
 
Manish Ramrakhiani
Ranch Hand
Posts: 44
Thanks again David.

I came across HtmlCleaner:
http://htmlcleaner.sourceforge.net/

It seems interesting.

However, I have a question: don't some JDK classes, such as
HTMLDocument, HTMLDocument.Iterator, HTML.Tag, and HTMLReader, do the same task?

I mean, are they not capable of converting HTML into a DOM, or perhaps into XML?

If not, what exactly is the use of those classes?

Thanks,
Manish Ramrakhiani.
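
For what it is worth, the classes named in this post belong to javax.swing.text.html, the document model Swing uses to display HTML in components such as JEditorPane. They build a Swing Element tree rather than a W3C DOM or XML. The sketch below shows how they could still be used to list tags. The class name SwingHtmlTags is made up, and as far as I know the bundled parser targets an old HTML dialect (3.2), so treat it as an illustration rather than a recommendation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper for illustration only.
final class SwingHtmlTags {

    // Lists the name attribute of each <input> tag, in document order.
    static List<String> inputNames(String html) {
        try {
            HTMLEditorKit kit = new HTMLEditorKit();
            HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
            // Avoids a ChangedCharSetException when the page declares a charset.
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
            kit.read(new StringReader(html), doc, 0);

            List<String> names = new ArrayList<>();
            // getIterator works for non-block tags such as INPUT.
            for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.INPUT);
                    it.isValid(); it.next()) {
                Object name = it.getAttributes().getAttribute(HTML.Attribute.NAME);
                names.add(String.valueOf(name));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read HTML", e);
        }
    }
}
```

The result is a flat list of tags and attributes, which is enough for a presence check but is not a tree you can hand to XML tooling.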
 