
scanning a text file for email addresses

 
David Borchgrevink
so i have this exercise:

write a program that scans a text file for possible e-mail addresses. Addresses look like this:

someone@somewhere.net
Read tokens from the input file one by one using hasNext() and next(). With the default delimiters of Scanner, an entire e-mail address will be returned as one token. Examine each token using the indexOf() method of String. If a token contains an at sign @ followed some characters later by a period, regard it as a possible e-mail address and write it to the output file.

Programs such as this scan through web pages looking for e-mail addresses that become the targets of spam. Because of this, many web pages contain disguised e-mail addresses that can't easily be automatically extracted


and above that it says to modify a program from our chapter, which is this:



this is the first time hasNext() and next() have been introduced to me, so whereas scan.nextInt() looks for integers, does hasNext() look for strings or characters? the wording in the exercise text is confusing to me.

so basically i should create a text file with a ton of strings and within that jumble of text, stick a few email addresses; then when the program asks the user for the input file name, use that text file's name, correct? pretty sure i have it up to that point, but using indexOf() is what i'm having trouble wrapping my head around.

do i look for the index of "@" and "."? i could conceptually see how i could say if the indexOf(".") is three characters before the end then i know it's a .com or .net or .org or whatever. but how would i use indexOf("@") when the "user name" AND the "provider" (i.e. @yahoo or @google or @whatever) can be any length? we haven't had any discussion of input/output in class, so i am totally green to this.

even if it's not code, if someone could help me wrap my head around how this would work, i'd really appreciate it. thanks in advance
 
Joel Christophel
David Borchgrevink wrote:
whereas scan.nextInt() looks for integers, does hasNext() look for strings or characters?

Just to get this straight, nextInt() is to next() as hasNextInt() is to hasNext(). hasNext() returns true if there is another token left to read, and next() returns that token as a String. Tokens are separated by delimiters, and your assignment tells you to use Scanner's default delimiter, which is whitespace (spaces, tabs, and line breaks). Therefore, each call to next() hands you the next chunk of text up to the following whitespace.
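
To make the token idea concrete, here is a minimal sketch. The class name TokenDemo and the helper tokens() are made up for illustration, and it scans a String rather than a file so it runs on its own; a Scanner built from a File behaves the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenDemo {
    // Hypothetical helper: collects every whitespace-separated token in the text.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Scanner scan = new Scanner(text);   // default delimiter is whitespace
        while (scan.hasNext()) {            // true while another token remains
            result.add(scan.next());        // returns that token as a String
        }
        scan.close();
        return result;
    }

    public static void main(String[] args) {
        // Spaces, a tab, and a newline all act as delimiters.
        System.out.println(tokens("write to\tsomeone@somewhere.net\nfor details"));
        // prints [write, to, someone@somewhere.net, for, details]
    }
}
```

Note that the whole e-mail address comes back as a single token, which is what the exercise relies on.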

David Borchgrevink wrote:
so basically i should create a text file with a ton of strings and within that jumble of text, stick a few email addresses; then when the program asks the user for the input file name, use that text file's name correct?

Correct, but make sure to add spaces (or other whitespace, such as line breaks) to your jumble of text, or else the Scanner will see the whole thing as one token.

David Borchgrevink wrote:
do i look for the index of "@" and "."? i could conceptually see how i could say if the indexOf(".") is three spaces before the end then i know it's a .com or .net or .org or whatever. but how would i use indexOf("@") when the "user name" AND the "provider" (i.e. @yahoo or @google or @whatever) have an infinite number of lengths?

You're actually making this more complex than your prompt asks. All it asks is that you make sure the String contains an @ sign and, at least one character after the @ sign, a period. You don't need to know how long the user name or the provider is, because indexOf() tells you where each character sits.
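
One way that check might look is sketched below. EmailCheck and isPossibleEmail() are hypothetical names, and this is only one reading of "followed some characters later by a period": it requires at least one character between the @ and the period.

```java
class EmailCheck {
    // Hypothetical helper: true if the token has an '@' that is followed,
    // at least one character later, by a '.'.
    static boolean isPossibleEmail(String token) {
        int at = token.indexOf("@");          // -1 if there is no '@'
        if (at < 0) {
            return false;
        }
        // Start looking two positions past the '@' so that at least one
        // character sits between the '@' and the '.'.
        int dot = token.indexOf(".", at + 2);
        return dot >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isPossibleEmail("someone@somewhere.net")); // true
        System.out.println(isPossibleEmail("first.last@example"));    // false: period is before the @
        System.out.println(isPossibleEmail("x@.com"));                // false: nothing between @ and period
    }
}
```

In the exercise you would call a check like this on each token returned by next() and write the token to the output file whenever it returns true.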
 
David Borchgrevink
okay. that definitely helps. this was also the first time i had seen the term "token(s)" so that threw me off as well, but i'll see what i can tackle now that i know that. thank you for the response!
 
Joel Christophel
David Borchgrevink wrote:
okay. that definitely helps. this was also the first time i had seen the term "token(s)" so that threw me off as well, but i'll see what i can tackle now that i know that. thank you for the response!


You're welcome! If you have any more questions, feel free to post back.

Here's some clarification about next() and nextInt(): they both read the next token, but next() returns it as a String and nextInt() returns it as an int. If you use nextInt() and the token can't be parsed as an int, it will throw an InputMismatchException. You can call hasNextInt() first to check.
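
A small sketch of the difference, with made-up names (NextIntDemo, sumInts, nextIntFails) and a String as input so it is self-contained:

```java
import java.util.InputMismatchException;
import java.util.Scanner;

class NextIntDemo {
    // Hypothetical helper: adds up the tokens that are ints and skips the rest.
    static int sumInts(String text) {
        Scanner scan = new Scanner(text);
        int sum = 0;
        while (scan.hasNext()) {
            if (scan.hasNextInt()) {
                sum += scan.nextInt();   // safe: the next token is an int
            } else {
                scan.next();             // consume the non-int token as a String
            }
        }
        scan.close();
        return sum;
    }

    // Hypothetical helper: true if nextInt() rejects the first token.
    static boolean nextIntFails(String text) {
        Scanner scan = new Scanner(text);
        try {
            scan.nextInt();
            return false;
        } catch (InputMismatchException e) {
            return true;                 // the token was not an int
        } finally {
            scan.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInts("3 apples and 4 pears")); // 7
        System.out.println(nextIntFails("apples"));          // true
    }
}
```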
 