
Saving the same HTML page produces different size files

Hi Java Specialists,

Could someone explain why the same web page, read by two Java programs (DnldURL.java, copyURL.java) and saved with Internet Explorer (Save file), turns out to have different sizes? Details for each Java program are below:

DnldURL.java can be referenced from http://www.devdaily.com/java/edu/pj/pj010011/pj010011.shtml.

Output from command line:

java DnldURL > DnldURL.out (27,589 bytes) - DnldURL looked up http://www.xyz.com.


--------------------------------------------------------------------------

copyURL.java can be referenced from http://www.javaworld.com/javatips/jw-javatip19.html.

Output from command line:

java copyURL http://www.xyz.com copyURL.out (27,495 bytes)


---------------------------------------------------------------------------

The same web page turned out to be 112,782 bytes when saved as a file using Internet Explorer 7.0.

I am running JDK 1.6.0_06 on Windows XP.

Can anyone explain the difference in size, even though all three copies look the same? I would like to convert this page to XML before parsing it. Which Java program suits this requirement best?

This question has also been posted on http://forums.sun.com/thread.jspa?threadID=5339311.

Thanks a lot,

Jack
There are several factors that can influence an HTML file's size.
You can have server-side code checking the user agent and rendering client-specific HTML.
With IE you will get different sizes depending on whether you save the file as "complete" (with accompanying images) or as "source only", because the complete version needs to have the image subdirectory prepended to all the image references.
I've also seen IE and Firefox add the occasional random tag to fix problems in the source. The "view source" copy would not match the on-disk copy.
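To make the size effects concrete, here is a minimal sketch. It is not the code of DnldURL or copyURL; the class name PageSizeSketch and its methods are made up for illustration. It contrasts a byte-for-byte copy with a line-by-line copy, assuming one of the programs uses a readLine()/println() style loop (I have not checked that either actually does). Such a loop rewrites every line ending with the platform separator, which alone can change the byte count on Windows. The fetch method shows how to send a chosen User-Agent header so you can check whether the server returns client-specific HTML.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class PageSizeSketch {

    // Copies the stream byte for byte, so the result has exactly
    // the number of bytes that the stream delivered.
    static byte[] copyBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Copies line by line, the way a readLine()/println() loop does.
    // Every original line ending is replaced by lineSeparator, and a
    // separator is appended after the last line even if the source had none.
    // ISO-8859-1 maps each byte to one char, so only line endings change the size.
    static byte[] copyLines(InputStream in, String lineSeparator) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append(lineSeparator);
        }
        return sb.toString().getBytes(StandardCharsets.ISO_8859_1);
    }

    // Downloads a page with an explicit User-Agent header and returns the raw bytes.
    // The caller picks the header value; comparing the sizes returned for two
    // different values shows whether the server varies its output per client.
    static byte[] fetch(String address, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        try (InputStream in = conn.getInputStream()) {
            return copyBytes(in);
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, a 30-byte page with two "\n" line breaks and no trailing newline stays at 30 bytes through copyBytes, but comes out of copyLines with "\r\n" at 34 bytes. If the goal is to parse the page afterwards, the byte-for-byte copy is the safer starting point, because it keeps exactly what the server sent.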
I would very much appreciate it if you did not respond to my question, if this is how you feel about it.

Jack
 

Originally posted by Jack Bush:
I would very much appreciate it if you did not respond to my question, if this is how you feel about it.

Jack, did you post that in response to a post that was subsequently deleted? Because right now it looks like it's responding to Joe Ess's post, which doesn't seem to express any negative feelings towards your question at all.
Jack, are you referring to the signature line that appears underneath my post? That's something someone said about me.
Joe,

It looks like you enjoyed this comment made about you. I misunderstood and thought that it was meant for me. Thank you for your suggestion, then.

As for the line [How To Ask Questions On JavaRanch], who was it meant for? Please give a further example from this post if it was for me.

Apologies for the misunderstanding, but I am surprised to learn that you like tagging an offensive comment about yourself around.

Paul, looks like this is a misunderstanding.

Jack
 

I am surprised to learn that you like tagging an offensive comment about yourself around



Sometimes one can steal someone's thunder by taking ownership of an insult. To tell the truth, in the exchange where those words were posted, I was being a bit of a jerk.
As for our FAQ, HowToAskQuestionsOnJavaRanch, it's a good idea to read through it. The better the question you ask, the more help we can be.
 

Originally posted by Jack Bush:
Paul, looks like this is a misunderstanding.

Yes, it does. I have seen Joe's signature so often that it went right past me. It didn't occur to me you were taking it as part of the post. Anyway, it looks like the misunderstanding is cleared up now. Thanks all.