
How do I copy and paste to a text file?

 
carl sjostrom
Greenhorn
Posts: 7
Every day I use I.E. and go to about 30 web pages and I click "Select All" then "Copy" and then I "Paste Unformatted Text" into my word processor and then I "save as text file" after which I can use C to parse through the text file to get the data I am after.

Needless to say that navigating to 30 pages per day is very tedious, so I would like to write a JAVA program to automate this process for me.

This does not sound like a difficult thing to do but since all I know about JAVA I learned from Starbucks, I will need a little help to know where to get the information I need to start this project.
Any help is appreciated as I have spent hours wandering the Internet without getting any smarter.

thanx

carl
 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 66304
Well, JSP isn't a technology that's going to help you with this, so I'm moving this to Java in General (intermediate) for further discussion.

There are classes in the java.net package that will help you out with this, but first you'll need to get the basics of Java programming under your belt.
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
This Sun Tutorial will give you some idea what it takes to read a web page into a Java program.

That part leaves you with HTML in a giant string. Then you'll need to parse the HTML to find the bits of data you want. I like the Quiotix Parser for this task. This part is non-trivial, but if you've done sophisticated stuff in C maybe you'll pick it right up.

Oh, and Welcome to the Ranch! The best approach around here is to take a shot at the smallest part of the problem you can think of, and if you get stuck, post some code for us all to look at. Why not try some of the exercises in the Sun network tutorial and see if you can retrieve one page into a String? Maybe write a class Page to make this work:
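
A minimal sketch of what such a class could look like is below. Only the class name Page comes from the post; the constructor, the getHtml method, the UTF-8 assumption, and the example.com placeholder URL are assumptions made for illustration, not the code the poster had in mind.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: fetches one page and keeps its raw HTML in a String. */
class Page {
    private final String html;

    Page(URL url) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the stream even if reading fails.
        // UTF-8 is assumed here; a real page may declare another charset.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        html = sb.toString();
    }

    String getHtml() {
        return html;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; pass the page you want as the first argument.
        String address = args.length > 0 ? args[0] : "http://example.com/";
        Page page = new Page(URI.create(address).toURL());
        System.out.println(page.getHtml());
    }
}
```

Because the constructor takes a java.net.URL, the same class also reads a local file through a file: URL, which is handy for trying it out without a network connection.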
 
Matt Gatten
Greenhorn
Posts: 7
You could also use the URL and InputStream classes and read the stream. Then you can 'scrape' everything between the <body> tags. Just a thought. I'm still sort of a newbie myself, so maybe someone else could post a sample. I've done it a few times, but all of my code is at work. Hope that helps.
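
As a rough illustration of the "everything between the <body> tags" idea, here is a small sketch that works on HTML already held in a String. The class name BodyScraper, the method bodyOf, and the sample markup are made up for this example. A regular expression is used only because it is short; it is fragile on messy real-world HTML, where a proper parser is the safer choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls out the markup between <body> and </body>. */
class BodyScraper {
    // DOTALL lets '.' span line breaks; CASE_INSENSITIVE also accepts <BODY>.
    private static final Pattern BODY = Pattern.compile(
            "<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed content of the body element, or "" if none is found. */
    static String bodyOf(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        String html = "<html><head><title>t</title></head>"
                + "<body class=\"x\">\n<p>Price: 42</p>\n</body></html>";
        System.out.println(bodyOf(html));
    }
}
```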
 
Keith Lynn
Ranch Hand
Posts: 2409
Here is a sample class I wrote that contacts two different URLs, reads the HTML, and parses it. I hope this helps.
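
The following is not the class from this post, just a minimal sketch of one way to parse HTML with what ships in the JDK: the Swing HTML parser (ParserDelegator with an HTMLEditorKit.ParserCallback). The class name TextExtractor and the method textOf are hypothetical. It collects the text runs and drops the tags, which is roughly what "Paste Unformatted Text" does by hand.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: returns the text of an HTML document without tags. */
class TextExtractor {
    static String textOf(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of text the parser finds between tags.
                text.append(data).append('\n');
            }
        };
        // 'true' tells the parser to ignore any charset declared in the page.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.print(textOf(html));
    }
}
```

The callback class has further hooks (for example handleStartTag) if you need to know which element a piece of text came from.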

 
Stuart Ash
Ranch Hand
Posts: 637
If I may suggest, I think HtmlUnit might be just what you need. It lets you connect to any URL in an object-oriented way and pull out exactly the data you need. This might solve both of your problems in one go, so you would not have to copy and paste first and then parse.

Let us know how you finally did it.
 
carl sjostrom
Greenhorn
Posts: 7
So much help and so little time.
Thanks for all the information. It is more than I expected, and it will keep me busy (happy) for a long time. As bad as I am at programming, I just love it more than anything.

thanx again

carl
 
carl sjostrom
Greenhorn
Posts: 7
Okay it works (sort of).
Java is easier than I thought it would be.
And here is my next problem.
My simple program will download the text of a web page and save it to a .txt file, as long as I do not need a password to access the page.
So how do I include my user id and password info to get access to the pages that are password protected?

For now I am trying to get into my Yahoo! mailbox.
Here is what my code looks like so far.
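
For the password question, one standard mechanism is HTTP Basic authentication, where the user id and password travel in an Authorization request header. The sketch below assumes the target server accepts Basic authentication; a webmail site that logs you in through an HTML form and cookies would need a different approach. The class name BasicAuthFetch and both method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: opens a page protected by HTTP Basic authentication. */
class BasicAuthFetch {
    /** Builds the Authorization header value: "Basic " + base64("user:password"). */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Opens the page with the credentials attached; the caller closes the stream. */
    static InputStream open(String address, String user, String password)
            throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        return conn.getInputStream();
    }

    public static void main(String[] args) {
        // Only prints the header; no network access is needed for this demo.
        System.out.println(basicAuthHeader("user", "pass"));
    }
}
```

Basic authentication sends the password merely encoded, not encrypted, so it should only be used over https.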

 
Tony Morris
Ranch Hand
Posts: 1608
You should note that if your code is deployed on an application server, it contains a potential resource leak. Specifically, if anything after your opening of the PrintWriter throws an exception, your PrintWriter will remain open indefinitely. There is one specific method call that explicitly declares that it may throw an exception, which makes it all the more plausible.

The correct solution to this common issue (at least in my experience) is to use an embedded try/finally. Avoid the common but erroneous idiom of assigning the PrintWriter reference to null, then using 'nullness' to determine if the PrintWriter was opened.

Here is something related:
http://jqa.tmorris.net/GetQAndA.action?qids=38&showAnswers=true
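
One reading of the try/finally advice above, sketched under assumptions: the writer is opened before the try block, so the finally block never needs a null check, and close() runs whether or not the body throws. The class and method names are hypothetical. The second method shows the try-with-resources form available since Java 7, which has the same effect with less code.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

/** Hypothetical helper: writes text to a file without leaking the writer. */
class SafeWrite {
    /** Classic idiom: open first, then try/finally around the use. */
    static void save(String fileName, String text) throws IOException {
        PrintWriter out = new PrintWriter(new FileWriter(fileName));
        try {
            out.print(text); // if this or later work throws, finally still runs
        } finally {
            out.close();     // no null check needed: 'out' is always assigned here
        }
    }

    /** Java 7+ equivalent: the writer is closed automatically. */
    static void saveWithResources(String fileName, String text) throws IOException {
        try (PrintWriter out = new PrintWriter(new FileWriter(fileName))) {
            out.print(text);
        }
    }
}
```

Note that PrintWriter does not throw on write errors; call checkError() if you need to detect them.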
 