
replicating the way a browser works

 
mj zammit
Ranch Hand
Posts: 49
Hey,
The Java application I am building in a sense replicates the way a browser works.
I am using HTTPClient to help me do so.
When I fetch the HTML of the web page www.naturenet.com, something also tries to retrieve that page's CSS files at the same time. Those requests fail because the CSS files are referenced by relative URLs rather than absolute ones. I noticed that the CSS files are referenced in the page's HTML <link> tags.
Does this mean that when retrieving the HTML page of this site, it also scans through the retrieved HTML and tries to fetch all the href values from the <link> tags?
Also, if this is being done, does that mean that as soon as I get the HTML I must change all these relative href values to absolute ones so that HTTPClient can retrieve them?

Any comments will be greatly appreciated, since they have always helped me move in the right direction.
 
Joe Ess
Bartender
Posts: 9361
Originally posted by mj zammit:

Does this mean that when retrieving the html page of this site it will also scan through the html retrieved and try to get all the href values from the link tag?


No. You have to read the HTML and make a separate request for each resource.
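To illustrate "read the HTML and make a separate request for each resource", here is a minimal sketch of the first half: pulling the href values out of <link> tags and resolving them against the page URL with java.net.URI. The class and method names, the sample HTML, and the index.html path are made up for the example, and the regex is deliberately naive (a real HTML parser would be more robust). Each resulting absolute URL would then be fetched with its own HTTPClient request, which is left out here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTTPClient.
class LinkResolver {
    // Naive pattern: the quoted href value inside a <link ...> tag.
    private static final Pattern LINK_HREF = Pattern.compile(
            "<link\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns every <link href> found in the HTML, resolved against pageUrl.
    static List<URI> resolveLinkHrefs(String html, URI pageUrl) {
        List<URI> resources = new ArrayList<>();
        Matcher m = LINK_HREF.matcher(html);
        while (m.find()) {
            // Relative hrefs become absolute; absolute hrefs pass through unchanged.
            resources.add(pageUrl.resolve(m.group(2)));
        }
        return resources;
    }

    public static void main(String[] args) {
        // Sample markup invented for the example.
        String html = "<html><head>"
                + "<link rel=\"stylesheet\" href=\"css/main.css\">"
                + "<link rel=\"stylesheet\" href=\"/print.css\">"
                + "</head><body></body></html>";
        URI page = URI.create("http://www.naturenet.com/index.html");
        for (URI resource : resolveLinkHrefs(html, page)) {
            System.out.println(resource); // each one needs its own request
        }
    }
}
```

Note that the base URL should include a path (at least a trailing slash), since resolving a relative path against a base with an empty path can give odd results with java.net.URI.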
 
mj zammit
Ranch Hand
Posts: 49
Okay.
When I run my application, I notice that these CSS files are requested automatically whenever I request a specific web page. How can I intercept those requests so that I can change the relative values to absolute ones before the files are retrieved?
Where can I look to solve this problem?
 
Joe Ess
Bartender
Posts: 9361
I'm not sure I understand your application. You say you are using HTTPClient and when you request a page with it, HTTPClient also requests the CSS files which are linked within that page?
 
mj zammit
Ranch Hand
Posts: 49
It should be noted that my Java application is a web server.
It accepts client requests, parses them, and passes the URL (e.g. www.naturenet.com) to a proxy class that retrieves the HTML using HTTPClient.
What I need to understand is why the CSS files are being requested automatically when all I ask for is the web page.
 
Joe Ess
Bartender
Posts: 9361
Originally posted by mj zammit:
It should be noted that my java application is a web server.


Are you invoking your application with a web browser? If so, it's the web browser making further requests to your web server in order to fetch the resources referenced in the HTML it received from your app.
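A small sketch of why those extra requests land on your server: a browser resolves a relative href against the URL the page was loaded from, not against the site the HTML originally came from. The localhost:8080 address, the /fetch?url=... path, and the css/style.css href are assumptions made up for the example.

```java
import java.net.URI;

class BrowserResolutionDemo {
    // Resolves an href against the URL the page was loaded from,
    // which is roughly what a browser does for relative references.
    static URI resolveLikeBrowser(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href);
    }

    public static void main(String[] args) {
        String href = "css/style.css"; // hypothetical relative href from a <link> tag

        // Page served by the Java web server (hypothetical address):
        System.out.println(resolveLikeBrowser(
                "http://localhost:8080/fetch?url=www.naturenet.com", href));
        // prints http://localhost:8080/css/style.css

        // Same page loaded directly from the original site:
        System.out.println(resolveLikeBrowser("http://www.naturenet.com/", href));
        // prints http://www.naturenet.com/css/style.css
    }
}
```

In the first case the CSS request goes to the Java web server, which has no such file unless it forwards or rewrites the request.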
 
mj zammit
Ranch Hand
Posts: 49
Yes, I am invoking my application with a web browser.
I see...
So I must correct the href values in the HTML before I send it to the web browser?
 
Joe Ess
Bartender
Posts: 9361
Originally posted by mj zammit:
So i must correct the href values on the html before i send it to the Web browser?


That would be one way to do it.
I have to ask, is there a particular reason you are approaching your application in this manner (giving a web server a page to download) instead of doing the more "traditional" transparent proxy? With a proxy you don't have to correct anything; you just handle the subsequent requests from the browser as-is.
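For the href-correcting route, a minimal sketch under some assumptions: the HTML is already in a String, the URL it was fetched from is known, and a naive regex over href attributes is good enough for a prototype. The class and method names and the sample markup are hypothetical.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for rewriting hrefs before the HTML goes to the browser.
class HrefRewriter {
    // Group 1: attribute prefix through the opening quote; group 2: the quote;
    // group 3: the href value.
    private static final Pattern HREF = Pattern.compile(
            "(\\bhref\\s*=\\s*([\"']))(.*?)\\2", Pattern.CASE_INSENSITIVE);

    // Replaces every quoted href value with its absolute form relative to pageUrl.
    static String absolutizeHrefs(String html, URI pageUrl) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(3);
            String absolute;
            try {
                absolute = pageUrl.resolve(value.trim()).toString();
            } catch (IllegalArgumentException e) {
                absolute = value; // not a valid URI: leave it untouched
            }
            String replacement = m.group(1) + absolute + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<link rel=\"stylesheet\" href=\"css/main.css\">";
        URI page = URI.create("http://www.naturenet.com/index.html");
        System.out.println(absolutizeHrefs(html, page));
    }
}
```

This pattern also rewrites the href of ordinary <a> links, so clicking one would take the browser straight to the original site and bypass your server. It does not touch src attributes (images, scripts), which would need the same treatment.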
 
mj zammit
Ranch Hand
Posts: 49
I am fetching the HTML page so that I can apply some transformations to it.
I am using this architecture mainly because of time constraints; I did not have the resources to learn about transparent proxies.
Also, what I need at the moment is a working prototype...
 