I've been trying to find ... really just good reads that help explain the details of what is necessary when interacting with a web server from a Java point of view.
I've played a little with the Apache APIs, some with jutil, and of course native Java, and most of it I've been able to muddle through and understand, but I got stuck trying to access pages from a web site that require authorization. I HAVE successfully written a couple of classes that do in fact authenticate to the server, and I receive the OK response to my credentials. But that's where it stops, because after I've authenticated and then try to read pages that are customized for my login, the pages I get back are as if I never logged in. So I'm assuming that when I log in, I need to somehow keep that login token and also SOMEHOW pass it back to the server when I make further requests from it ... and it's that connection between post-authentication and the HOW of pulling down information based on authentication that I cannot seem to locate any information on. It's like everyone out there talks about the process of getting info from the server, and they also talk a lot about how to log into a server, but I can't find anything that discusses, with any significant explanation, how to maintain a logged-in session while querying the server.
Any enlightenment would be wonderful, and thank you for taking the time to help.
Joe Areeda wrote: I think what you're missing is cookies. Here's one of many intros: https://docs.oracle.com/javase/tutorial/networking/cookies/
Yeah that isn't an overly complicated read... oh boy!
No, what I NEED is a solid example of authenticating to a web site, then pulling up different pages from that site with that login token being sent with each page query.
Tim Moores wrote: For that kind of programmatic web access, I generally advise using the HtmlUnit library.
I did download and experiment with this library, but I'm still at a loss for being able to authenticate to a website, then make new requests to that site while making sure those requests are sent to the web server with my authenticated credentials. The documentation for HtmlUnit seems very ... limited ... perhaps? With no clear examples of what I am trying to do.
Maybe this will help clarify a little: Let's say you're using Chrome and you've gone to a site which asks you to log in. That URL might be something like this https://www.mywebsite.com/auth/login ... now that you're logged in, you need to pull up a page such as https://www.mywebsite.com/myprofile.html ... now with a web browser, I don't have to think about my credentials being passed to the second-page query, it just happens for me - magically it would seem.
This is the behavior I need to model in Java...
Hopefully you were already on the same page as this, but I thought maybe I would state it just in case.
I have various pieces of code using it somewhere around here, and will try to dig those out, but likely not today. The way to provide authentication credentials is shown in http://stackoverflow.com/questions/29760463/htmlunit-basic-auth-issues
And then, cookies need to be enabled, but a brief web search only finds people having trouble doing that. I'll check that later as well.
Instead, it operates more in a hit-and-run mode. Every HTTP(S) request you make connects to the server, pushes the request payload up and receives a response back. Then it disconnects. Repeat over and over again.
Technically. In actual fact, there's generally a "keep-alive" mechanism that reduces some of the work for repeated connections, but as far as the app code is concerned, every request is a fresh connection.
Because of that, you cannot know the requester's identity merely because of the connection the way you would if you logged in as a time-sharing user on a traditional computer system. So instead, every request has to carry something that identifies who is making it.
The preferred way to do that is to include an identity cookie in the request (jsessionid). The nice thing about this is that the built-in java.net HTTP classes can manage cookies for you automatically, once a CookieManager has been installed as the cookie handler. As, of course, do all major web client apps (browsers etc.)
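To make the cookie handling concrete, here is a minimal sketch of "log in, then fetch a personalized page" using the JDK's own java.net.http.HttpClient (Java 11+) with a CookieManager attached. Everything about the server is an assumption for illustration: a throwaway local HttpServer stands in for the real site, the paths /auth/login and /myprofile.html are borrowed from the example URLs earlier in the thread, and the cookie value abc123 and the form fields user/pass are made up. A real site will have its own login URL and field names.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.CookieManager;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SessionCookieDemo {

    // Drains the request, then writes a small text response.
    static void reply(HttpExchange ex, int status, String text) throws IOException {
        ex.getRequestBody().readAllBytes();
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        ex.sendResponseHeaders(status, body.length);
        ex.getResponseBody().write(body);
        ex.close();
    }

    // Hypothetical stand-in for the real web site (assumption, not a real server).
    static HttpServer startFakeSite() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        // "Login" always succeeds here and hands out a session cookie.
        server.createContext("/auth/login", ex -> {
            ex.getResponseHeaders().add("Set-Cookie", "JSESSIONID=abc123; Path=/");
            reply(ex, 200, "OK");
        });
        // The profile page is only served when the session cookie comes back.
        server.createContext("/myprofile.html", ex -> {
            String cookie = ex.getRequestHeaders().getFirst("Cookie");
            boolean known = cookie != null && cookie.contains("JSESSIONID=abc123");
            reply(ex, known ? 200 : 403, known ? "profile for joe" : "not logged in");
        });
        server.start();
        return server;
    }

    // Logs in, then requests the profile page with the same client.
    public static String fetchProfile(boolean withCookies) throws Exception {
        HttpServer server = startFakeSite();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            HttpClient.Builder builder = HttpClient.newBuilder();
            if (withCookies) {
                builder.cookieHandler(new CookieManager()); // stores and resends cookies
            }
            HttpClient client = builder.build();

            HttpRequest login = HttpRequest.newBuilder(URI.create(base + "/auth/login"))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString("user=joe&pass=secret"))
                    .build();
            client.send(login, HttpResponse.BodyHandlers.ofString());

            HttpRequest profile = HttpRequest.newBuilder(URI.create(base + "/myprofile.html")).build();
            return client.send(profile, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchProfile(true));
        System.out.println(fetchProfile(false));
    }
}
```

The only difference between the two calls is whether a CookieManager is attached. Without one, the second request arrives with no cookie and the server treats it as a stranger, which matches the "as if I never logged in" symptom described above.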
But sometimes that's not possible. There may be legal or physical constraints on using cookies. Or you may be working in a shop where the resident "genius" fatuously declares "Cookies are BAD because 'X' and are expressly forbidden in all our apps". Where "X" is generally some ignorant statement.
Anyway, where cookies cannot go, there's a technique known as "URL rewriting". Well-designed webapps use this as a fallback mechanism: instead of carrying the session ID in a cookie, it's appended to the URL itself. For example, "https://coderanch.com/forums/posts/reply/1234;jsessionid=99e3ad7c". This is done by feeding the bare URL to the URL rewriting method (encodeURL), which is on HttpServletResponse (I think). You then put the rewritten links into the response (web page) that you send out, so that clicking on them will ensure an identification for the next request.
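As a rough illustration of what a rewritten URL looks like, the sketch below splices a session ID into a URL the way the example above shows. The helper withSessionId is hypothetical and written only for this post; it is not the servlet API, and a real container decides for itself whether rewriting is needed at all.

```java
public class UrlRewriteSketch {

    /**
     * Hypothetical helper (not part of any servlet API): inserts
     * ";jsessionid=<id>" after the path, ahead of any query string or fragment.
     */
    public static String withSessionId(String url, String sessionId) {
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = query;
        }
        if (fragment >= 0 && fragment < cut) {
            cut = fragment;
        }
        return url.substring(0, cut) + ";jsessionid=" + sessionId + url.substring(cut);
    }

    public static void main(String[] args) {
        // prints https://coderanch.com/forums/posts/reply/1234;jsessionid=99e3ad7c
        System.out.println(withSessionId("https://coderanch.com/forums/posts/reply/1234", "99e3ad7c"));
        // prints /forums/list;jsessionid=99e3ad7c?page=2
        System.out.println(withSessionId("/forums/list?page=2", "99e3ad7c"));
    }
}
```

From the client's side, the practical consequence is that links scraped out of a returned page should be followed exactly as they appear, session suffix included.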
Regardless of whether the session ID comes in from a cookie or from a URL, however, the session ID determines which user made the request. The session ID itself carries no data. It's just a "random" hash key into the server-side sessions dictionary, which allows the session ID to resolve to something resembling or containing an HttpSession object, thus allowing the server APIs to find things when they're needed, such as the remoteUser login ID from the JEE security manager or session-scope application objects.
You should never attempt to cache the session ID on the client side. It's only a hash key and it's only guaranteed valid until the next HTTP(S) request is made to the server. In particular, it is a documented security feature in Tomcat that when an application user switches to https, a new session ID is generated and replaces the one that had been used previously. That keeps "man-in-the-middle" attackers from accessing secured resources by using an unsecured session ID.
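The "hash key into a dictionary" idea can be sketched in a few lines. This is a toy model, not how any particular container implements sessions: the class SessionStoreSketch, its methods, and the attribute name "remoteUser" are all made up for illustration.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SessionStoreSketch {

    // Session ID -> per-session attributes. The ID is only the lookup key.
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Creates a session for a user and returns its random, meaningless ID. */
    public String create(String remoteUser) {
        byte[] raw = new byte[16];
        random.nextBytes(raw);
        StringBuilder id = new StringBuilder();
        for (byte b : raw) {
            id.append(String.format("%02x", b)); // 16 bytes -> 32 hex characters
        }
        Map<String, Object> attributes = new ConcurrentHashMap<>();
        attributes.put("remoteUser", remoteUser);
        sessions.put(id.toString(), attributes);
        return id.toString();
    }

    /** Resolves a session ID back to the user, or null if the ID is unknown. */
    public String remoteUser(String sessionId) {
        Map<String, Object> attributes = (sessionId == null) ? null : sessions.get(sessionId);
        return (attributes == null) ? null : (String) attributes.get("remoteUser");
    }

    public static void main(String[] args) {
        SessionStoreSketch store = new SessionStoreSketch();
        String id = store.create("joe");
        System.out.println(id + " -> " + store.remoteUser(id));
        System.out.println("bogus -> " + store.remoteUser("bogus"));
    }
}
```

Nothing about the user can be read out of the ID itself. All the client can usefully do is hand it back and let the server do the lookup.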
Or in other words, the cookie doesn't go out to the client and stay there. Every HttpServletResponse updates the jsessionid cookie.
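On the client side, the practical upshot is to let a cookie store track the ID rather than copying it into a variable of your own. The sketch below feeds two made-up Set-Cookie headers to a java.net.CookieManager and shows that the later value replaces the earlier one. The host name and cookie values are placeholders, and no network traffic is involved.

```java
import java.net.CookieManager;
import java.net.HttpCookie;
import java.net.URI;
import java.util.List;
import java.util.Map;

public class SessionIdRefreshSketch {

    /** Simulates two responses from the same site, each setting JSESSIONID. */
    public static List<HttpCookie> cookiesAfterTwoResponses() throws Exception {
        CookieManager manager = new CookieManager();
        URI site = URI.create("http://www.mywebsite.com/"); // placeholder host

        // First response: the server issues a session ID.
        manager.put(site, Map.of("Set-Cookie", List.of("JSESSIONID=first; Path=/")));
        // Later response: the server replaces it with a new one.
        manager.put(site, Map.of("Set-Cookie", List.of("JSESSIONID=second; Path=/")));

        return manager.getCookieStore().getCookies();
    }

    public static void main(String[] args) throws Exception {
        for (HttpCookie cookie : cookiesAfterTwoResponses()) {
            System.out.println(cookie.getName() + "=" + cookie.getValue());
        }
    }
}
```

Only one JSESSIONID survives, and it is the most recent one, so a client that goes through the cookie store never ends up resending a stale ID.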
When container-managed security is in control, the trigger that brings up the login form or dialog is a request by an unauthenticated user for a secured URL. When a URL that requires authorization is requested, the server sidelines the request, presents the login form and processes the login. Then, if the login succeeds, the sidelined original request is re-activated internally so that the login process is completely transparent to the webapp. In fact, there aren't even any hooks to let a webapp know that a user has logged in or out in the J2EE spec. Partly because if you're using something like a single sign-on security system, no such event may ever occur within the webapp.
A login form does have a URL, based on its WAR resource path, but attempting to log in by using that URL directly will not work because the server context is not set up properly for login. The page will render, but the proper backend processing will not be dispatched.
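For a programmatic client, that means the order of requests matters: ask for the secured page first, and only then submit the credentials. The sketch below plays out that sequence against a toy local server that imitates the sidelining behaviour. The names j_security_check, j_username and j_password are the standard servlet form-login names, but everything else is an assumption for illustration: the server logic, the /secure/profile path, the credentials, and the 400 response for an out-of-order login.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.CookieManager;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class FormLoginFlowSketch {
    // Toy server state: sessions parked at the login form, and logged-in sessions.
    static final Set<String> pending = ConcurrentHashMap.newKeySet();
    static final Set<String> loggedIn = ConcurrentHashMap.newKeySet();

    static void reply(HttpExchange ex, int status, String text) throws IOException {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        ex.sendResponseHeaders(status, body.length);
        ex.getResponseBody().write(body);
        ex.close();
    }

    static String sessionOf(HttpExchange ex) {
        String cookie = ex.getRequestHeaders().getFirst("Cookie");
        return (cookie != null && cookie.startsWith("JSESSIONID=")) ? cookie.substring(11) : null;
    }

    // Hypothetical stand-in for a container with form-based security.
    static HttpServer startFakeContainer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/secure/", ex -> {
            String sid = sessionOf(ex);
            if (sid != null && loggedIn.contains(sid)) { reply(ex, 200, "secured page"); return; }
            // Unauthenticated: park the request and show the login form instead.
            sid = "s" + System.nanoTime();
            pending.add(sid);
            ex.getResponseHeaders().add("Set-Cookie", "JSESSIONID=" + sid + "; Path=/");
            reply(ex, 200, "login form");
        });
        server.createContext("/j_security_check", ex -> {
            String sid = sessionOf(ex);
            String form = new String(ex.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            // A login with no parked request behind it has nowhere to go.
            if (sid == null || !pending.remove(sid)) { reply(ex, 400, "no sidelined request"); return; }
            if (!form.equals("j_username=joe&j_password=secret")) { reply(ex, 403, "bad credentials"); return; }
            loggedIn.add(sid);
            ex.getResponseHeaders().add("Location", "/secure/profile"); // resume the original request
            ex.sendResponseHeaders(303, -1);
            ex.close();
        });
        server.start();
        return server;
    }

    /** Submits the login, optionally requesting the secured page first. */
    public static String run(boolean requestSecuredPageFirst) throws Exception {
        HttpServer server = startFakeContainer();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            HttpClient client = HttpClient.newBuilder()
                    .cookieHandler(new CookieManager())
                    .followRedirects(HttpClient.Redirect.NORMAL)
                    .build();
            if (requestSecuredPageFirst) {
                client.send(HttpRequest.newBuilder(URI.create(base + "/secure/profile")).build(),
                        HttpResponse.BodyHandlers.ofString());
            }
            HttpRequest login = HttpRequest.newBuilder(URI.create(base + "/j_security_check"))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString("j_username=joe&j_password=secret"))
                    .build();
            return client.send(login, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```

With the secured page requested first, the login post is redirected back to it and the client ends up with the secured content. Posting the credentials cold gets rejected, because the toy server has no parked request to resume. How a real container responds in that second case varies, so treat the 400 here as a placeholder.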