
Saving the same HTML page produces different size files

 
Jack Bush
Ranch Hand
Posts: 235
Hi Java Specialist,

Could someone explain why the same web page, read by two Java programs (DnldURL.java and copyURL.java) and saved from Internet Explorer (Save file), turns out to have different sizes? Details of both Java programs follow:

DnldURL.java can be found at http://www.devdaily.com/java/edu/pj/pj010011/pj010011.shtml.

Output from command line:

java DnldURL > DnldURL.out (27,589 bytes); DnldURL looks up http://www.xyz.com.


--------------------------------------------------------------------------

copyURL.java can be found at http://www.javaworld.com/javatips/jw-javatip19.html.

Output from command line:

java copyURL http://www.xyz.com copyURL.out (27,495 bytes)


---------------------------------------------------------------------------

The same web page turned out to be 112,782 bytes when saved as a file using Internet Explorer 7.0.

I am running JDK 1.6.0_06 on Windows XP.

Can anyone explain the difference in size, even though the pages all look the same? I would like to convert this page to XML before parsing it. Which Java program suits this requirement best?

This question has also been posted on http://forums.sun.com/thread.jspa?threadID=5339311.

Thanks a lot,

Jack
 
Joe Ess
Bartender
Posts: 9406
There are several factors that can influence an HTML file's size.
Server-side code may check the user agent and render client-specific HTML.
With IE you will get different sizes depending on whether you save the page as "complete" (with accompanying images) or as "source only", because the complete version has to have the image subdirectory prepended to all the image references.
I've also seen IE and Firefox add the occasional random tag to fix problems in the source, so the "view source" copy would not match the on-disk copy.
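Another factor that may apply to the two Java programs is how the copy is written out. The sketch below is an illustration only, assuming Java 7 or later. The class name PageSizeDemo, its method names and the sample page are made up for the example, and it does not reproduce DnldURL.java or copyURL.java. It shows that a byte-for-byte copy keeps the original size, while a line-by-line copy that rewrites each line ending (for example "\n" becoming "\r\n") grows by one byte per line. The open method shows how a User-Agent header could be set to check whether the server sends client-specific HTML.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not taken from either program in the question.
public class PageSizeDemo {

    // Copies the stream byte for byte, so the result has exactly the input's size.
    public static byte[] copyBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Reads line by line and writes each line back with the given terminator.
    // ISO-8859-1 maps every byte to one char, so only the line endings can change.
    public static byte[] copyLines(InputStream in, String lineSeparator) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(line.getBytes(StandardCharsets.ISO_8859_1));
            out.write(lineSeparator.getBytes(StandardCharsets.ISO_8859_1));
        }
        return out.toByteArray();
    }

    // Opens a URL with an explicit User-Agent header (the value is the caller's choice).
    // Not called from main, so the demo runs without network access.
    public static InputStream open(String url, String userAgent) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        // Made-up five-line page with Unix line endings.
        byte[] page = "<html>\n<body>\nhi\n</body>\n</html>\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] raw = copyBytes(new ByteArrayInputStream(page));
        byte[] rewritten = copyLines(new ByteArrayInputStream(page), "\r\n");
        System.out.println(raw.length + " vs " + rewritten.length); // prints "33 vs 38"
    }
}
```

If the difference between two saved copies equals the number of lines in the page, line-ending conversion is a likely suspect. Whether that is what happened here would need checking against the actual files.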
 
Jack Bush
Ranch Hand
Posts: 235
I would very much appreciate it if you did not respond to my question, if this is how you feel about it.

Jack
 
Paul Clapham
Sheriff
Posts: 22185
Originally posted by Jack Bush:
I would very much appreciate it if you did not respond to my question, if this is how you feel about it.
Jack, did you post that in response to a post that was subsequently deleted? Because right now it looks like it's responding to Joe Ess's post, which doesn't seem to express any negative feelings towards your question at all.
 
Joe Ess
Bartender
Posts: 9406
Jack, are you referring to the signature line that appears underneath my post? That's something someone said about me.
 
Jack Bush
Ranch Hand
Posts: 235
Joe,

Looks like you enjoyed this comment made about you. I misunderstood and thought that it was meant for me. Thank you for your suggestion then.

As for the line [How To Ask Questions On JavaRanch], who was it meant for? Please give a specific example from this post if it was for me.

Apologies for the misunderstanding, but I am surprised to learn that you like tagging an offensive comment about yourself around.

Paul, looks like this is a misunderstanding.

Jack
 
Joe Ess
Bartender
Posts: 9406
I am surprised to learn that you like tagging an offensive comment about yourself around


Sometimes one can steal someone's thunder by taking ownership of an insult. To tell the truth, in the exchange where those words were posted, I was being a bit of a jerk.
As for our FAQ, HowToAskQuestionsOnJavaRanch, it's a good idea to read through it. The better the question you ask, the more help we can be.
 
Paul Clapham
Sheriff
Posts: 22185
Originally posted by Jack Bush:
Paul, looks like this is a misunderstanding.
Yes, it does. I have seen Joe's signature so often that it went right past me. It didn't occur to me you were taking it as part of the post. Anyway, it looks like the misunderstanding is cleared up now. Thanks all.
 