
Reading from multiple files using Scanner to extract data

 
Iain Emsley
Ranch Hand
Posts: 60
Hi,

I'm trying to write a programme which searches through a file directory and extracts the email addresses from each file (for insertion into a database for an internal login system, so it never sees the actual files), and I have got myself out of my depth. Separately, my two scripts work: one reads the file directory and returns the list, and the pattern-matching one finds the email addresses successfully. But I have been unable to get the two to work together.

What I want to do is to look at the foo directory and return bar.list/baz.list and then associate all the email addresses inside the files with the filename to get
user1@host.com bar.list
user2@host.edu baz.list
user1@host.com baz.list (same user as in bar.list but is associated with this file as well)
Eventually I also want to strip the .list part, but right now I'm trying to get the associations working and printing out. In the next stage I'll need to insert this into a MySQL database. Am I best off storing this as a HashMap?
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
The indentation here is jumping around somewhat randomly, so it's hard to tell exactly what you intend. (This is why mixing spaces and tabs is bad: it gives different results in different environments.) But this line looks very suspicious:

That final semicolon acts as an empty statement - this line is equivalent to

So your loop statement really has no effect on anything, which is probably not what you want. It's unfortunate that Java even allows this syntax without identifying it as an error.
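To make the empty-statement problem concrete, here is a minimal sketch. The code being discussed is not shown above, so the class, method names, and file names below are hypothetical and only illustrate the general pitfall: a semicolon directly after the loop header becomes the entire loop body, and the braces that follow are an ordinary block that runs once.

```java
public class EmptyStatementDemo {

    // Buggy version: the ';' after the loop header is the whole loop body,
    // so the block underneath runs exactly once, after the loop has finished.
    static int countWithStraySemicolon(String[] names) {
        int count = 0;
        int i;
        for (i = 0; i < names.length; i++);
        {
            count++;
        }
        return count;
    }

    // Fixed version: the braces are the loop body, so it runs once per element.
    static int countCorrectly(String[] names) {
        int count = 0;
        for (int i = 0; i < names.length; i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical file names, used only as sample data.
        String[] names = {"bar.list", "baz.list", "qux.list"};
        System.out.println(countWithStraySemicolon(names)); // prints 1
        System.out.println(countCorrectly(names));          // prints 3
    }
}
```

Both methods compile without error, which is why this kind of mistake is easy to miss.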

That seems like a reasonable way to do this, if you don't care what order the results come out in. If you do care, then either a TreeMap or a LinkedHashMap might be better. But if you don't care, then HashMap is fastest.
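As a rough illustration of the ordering difference, the sketch below fills each map type with the sample addresses from the first post. The class and helper names are made up for this example, it assumes Java 8 or later (for computeIfAbsent), and it assumes each email maps to a list of file names, since one address can appear in more than one file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {

    // Records that 'email' was seen in 'file', creating the list on first use.
    static void add(Map<String, List<String>> map, String email, String file) {
        map.computeIfAbsent(email, k -> new ArrayList<>()).add(file);
    }

    // Inserts the same three associations into whichever map is passed in.
    static Map<String, List<String>> fill(Map<String, List<String>> map) {
        add(map, "user2@host.edu", "baz.list");
        add(map, "user1@host.com", "bar.list");
        add(map, "user1@host.com", "baz.list");
        return map;
    }

    public static void main(String[] args) {
        // TreeMap: keys sorted alphabetically.
        // {user1@host.com=[bar.list, baz.list], user2@host.edu=[baz.list]}
        System.out.println(fill(new TreeMap<>()));

        // LinkedHashMap: keys in the order they were first inserted.
        // {user2@host.edu=[baz.list], user1@host.com=[bar.list, baz.list]}
        System.out.println(fill(new LinkedHashMap<>()));

        // HashMap: same contents, but the iteration order is unspecified.
        System.out.println(fill(new HashMap<>()));
    }
}
```

Because all three implement Map, the choice can be changed later by editing only the line that constructs the map.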
 
Iain Emsley
Ranch Hand
Posts: 60
I'll certainly think about the TreeMap or LinkedHashMap for the output once I've sorted the loop. I've rewritten the file finder, since I ran into an issue with Scanner needing to read a string, and I couldn't get new Scanner(new File()); to see the directory that my previous code relied on. But I'm still having issues with the input string, which I've converted from an array.
On compilation I get:
Exception in thread "main" java.io.FileNotFoundException: [M:\foo\MAIN\BEDEWORK.LIST, M:\foo\MAIN\BEDEWORKSPRIVATE.LIST, M:\foo\MAIN\DONNA-NEW.LIST, M:\foo\MAIN\DONNA-NEWER.LIST, M:\foo\MAIN\DONNA-TEST.LIST] (The filename, directory name, or volume label syntax is incorrect)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(Unknown Source)
at java.util.Scanner.<init>(Unknown Source)
at org.stfc.bedework.FindEmail.main(FindEmail.java:17)
I'm assuming that the compiler is not seeing the Windows path, which I believe Java prefers to receive as M:\\foo\\MAIN\\BEDEWORK.LIST. Is there a way of changing the incoming file stream to represent the real path, or have I over-engineered the Scanner file input? (All I want it to do at the moment is read the foo directory, find the .list files, and then scan them for email addresses and another pattern.)
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671

The names chosen here make no sense at all. The first thing is a filter, and the second thing is an array, which is like a list. Why is the first thing called a list, while the second is called a filter? This may seem like nitpicking, but I think it may indicate a misunderstanding of what these objects are, which ties into what happens next:

Here is where things go seriously wrong. (I assume the "new Scanner" line is the one that's throwing the exception.) Why are you using Arrays.toString()? You had a nice array, where each file was a separate element. By calling Arrays.toString() you lump them all into a single long string, wrapped in square brackets and joined by commas and spaces. What is that good for? The call to new Scanner(new File(name)) expects name to refer to a single file. The fact that name is, instead, a concatenation of several different file names is why you're getting an error here. That's why the FileNotFoundException gives you a list of files - it should be looking for one file at a time, but you've passed it a string with many file names in it.
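A small sketch of what Arrays.toString() does to a File array may help here. The class name and the two file names are placeholders, not the poster's code; the point is only that the result is one bracketed string, which new File(...) then treats as a single (nonexistent) path.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;

public class ToStringDemo {

    // Joins every element into one string of the form "[a, b, ...]".
    static String joined(File[] files) {
        return Arrays.toString(files);
    }

    public static void main(String[] args) {
        // Placeholder file names; they do not need to exist for this demo.
        File[] files = {new File("bar.list"), new File("baz.list")};

        String name = joined(files);
        System.out.println(name); // prints [bar.list, baz.list]

        // The whole bracketed string is treated as ONE file name, so unless a
        // file with that literal name exists, opening it fails in the same way
        // as the stack trace in the question.
        try {
            new Scanner(new File(name)).close();
        } catch (FileNotFoundException e) {
            System.out.println("Not found: " + e.getMessage());
        }
    }
}
```

Note that the exception is thrown when the program runs, not when it is compiled, and that doubled backslashes are only needed inside Java string literals in source code; paths returned by File already contain the real single backslashes.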

I suggest you rewrite this with no Arrays.toString() at all. The badly named listFilter is an array containing Files. Try writing a loop to access each element (each File) in the array, and create a Scanner for each File.
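One possible shape for that loop is sketched below, under several assumptions: the class and method names are invented, the email pattern is a deliberately simple stand-in for whatever pattern the original code uses, the directory is taken from the first command-line argument, and Java 8 or later is assumed. It opens one Scanner per File and records which file each address came from.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class FindEmailSketch {

    // Simplified pattern for illustration only; real addresses can be more varied.
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    // Pulls every match out of one Scanner and associates it with fileName.
    static void addEmails(Scanner scanner, String fileName,
                          Map<String, Set<String>> emailToFiles) {
        String email;
        // A horizon of 0 means "search the rest of the input".
        while ((email = scanner.findWithinHorizon(EMAIL, 0)) != null) {
            emailToFiles.computeIfAbsent(email, k -> new TreeSet<>()).add(fileName);
        }
    }

    // Visits each .list file in dir, one Scanner per file.
    static Map<String, Set<String>> collect(File dir) throws FileNotFoundException {
        Map<String, Set<String>> emailToFiles = new TreeMap<>();
        File[] listFiles =
                dir.listFiles((d, name) -> name.toLowerCase().endsWith(".list"));
        if (listFiles == null) {
            return emailToFiles; // dir is not a directory, or could not be read
        }
        for (File file : listFiles) {
            try (Scanner scanner = new Scanner(file)) {
                addEmails(scanner, file.getName(), emailToFiles);
            }
        }
        return emailToFiles;
    }

    public static void main(String[] args) throws FileNotFoundException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        // Prints one "email filename" pair per line.
        collect(dir).forEach((email, files) ->
                files.forEach(f -> System.out.println(email + " " + f)));
    }
}
```

A TreeMap of TreeSets is used here only so the printed output is sorted; any of the Map types discussed earlier would work. The resulting pairs could then be inserted into the database one row at a time.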
 