
Reading text from a file and using a delimiter

 
Jeremy Wages
Ranch Hand
Posts: 141
2
Hello,

I need to read text from a file (I've completed this part).  The file contains 8 lines, each with a question followed by an answer.  Ex:
Do you have a dog? Yes
Do you like cheese? No

etc.

We're supposed to use a delimiter to create a new line after the question mark.  With my current code it replaces the question mark with the new line, so it prints like this:

What is your favorite color
Blue

but I want it to be:

What is your favorite color?
Blue

Is this possible?  Maybe I'm looking in the wrong places in the documentation?

Here is my code:


Thanks for any help!

 
Carey Brown
Saloon Keeper
Posts: 3323
46
Without knowing more about your project, it could be as simple as
 
Jeremy Wages
Ranch Hand
Posts: 141
2
Carey Brown wrote:Without knowing more about your project, it could be as simple as



Thank you! That works.  Why does it work though?  The delimiter is creating a space in place of the question mark; is System.out.println(fileIn.next() + "?"); just saying I want to include the question mark?
 
Norm Radder
Rancher
Posts: 2240
28
The delimiter is creating a space in place of the question mark

Can you post some output that shows the space you are talking about?  I do not think delimiters are returned by the scanner as part of the scanned String.
 
Jeremy Wages
Ranch Hand
Posts: 141
2
Norm Radder wrote:
The delimiter is creating a space in place of the question mark

Can you post some output that shows the space you are talking about?  I do not think delimiters are returned by the scanner as part of the scanned String.


I shouldn't have said space.  It creates a new line at the question mark like it is supposed to, but I wanted to include the question mark so my question still had its punctuation.  Two questions from my text file read as:

What is your name? John
Where are you from? The United States of America

I wanted it to display as:
What is your name?
John
Where are you from?
The United States of America

Before the solution I was getting this output:
What is your name
John
Where are you from
The United States of America
 
Norm Radder
Rancher
Posts: 2240
28
I think the StringTokenizer class has methods and settings that will return the delimiter as a String.

It creates a new line at the question mark

Not really.  The parsed String does not end with a new line.
The new line comes from the println method.
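
A minimal sketch of the StringTokenizer idea, assuming the three-argument constructor with returnDelims set to true is the setting meant here. The class and method names are made up for illustration, and the sample line is the one quoted earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Splits a line at '?' and, because returnDelims is true,
    // hands the '?' back as a token of its own.
    static List<String> tokens(String line) {
        StringTokenizer st = new StringTokenizer(line, "?", true);
        List<String> result = new ArrayList<>();
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Quotes are added only to make the token boundaries visible.
        for (String token : tokens("What is your name? John")) {
            System.out.println("\"" + token + "\"");
        }
    }
}
```

The answer token keeps its leading space, so it would probably need a trim() before display.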
 
Jeremy Wages
Ranch Hand
Posts: 141
2
Norm Radder wrote:I think the StringTokenizer class has methods and settings that will return the delimiter as a String.

It creates a new line at the question mark

Not really.  The parsed String does not end with a new line.
The new line comes from the println method.


I'll definitely check the StringTokenizer class out.  It'll probably be useful in upcoming assignments! Thank you!
 
Carey Brown
Saloon Keeper
Posts: 3323
46
Jeremy Wages wrote:
Carey Brown wrote:Without knowing more about your project, it could be as simple as

Thank you! That works.  Why does it work though?  The delimiter is creating a space in place of the question mark; is System.out.println(fileIn.next() + "?"); just saying I want to include the question mark?

This doesn't change how the delimiter is parsed in the scanner. All it is doing is taking one string (the result of next()) and appending another string to it ("?"); that string just happens to be a question mark.
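
As a rough sketch of that point, assuming the "\\? " delimiter and the sample text from earlier in the thread (the class and method names are invented for illustration, and a String stands in for the file):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class AppendDemo {
    // Collects what println(fileIn.next() + "?") would print, one entry per call.
    static List<String> printed(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner fileIn = new Scanner(text)) {
            fileIn.useDelimiter("\\? ");          // the delimiter itself is unchanged
            while (fileIn.hasNext()) {
                out.add(fileIn.next() + "?");     // plain string concatenation
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "What is your name? John\n"
                    + "Where are you from? The United States of America\n";
        for (String s : printed(text)) {
            System.out.println(s);
        }
    }
}
```

The scanner still drops the "? " it matched; the "?" on screen is simply the literal that was appended to each token.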
 
Jeremy Wages
Ranch Hand
Posts: 141
2
Carey Brown wrote:
This doesn't change how the delimiter is parsed in the scanner. All it is doing is taking one string (the result of next()) and appending another string to it ("?"); that string just happens to be a question mark.


Yeah, I figured that out.  It also prints a "?" as the last line.  I'm trying to solve that now.
 
Carey Brown
Saloon Keeper
Posts: 3323
46
Perhaps this snippet of code would help you understand how the parsing is behaving. Notice the double quotes in the output.
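
A sketch along these lines, assuming the original "\\? " delimiter and two sample lines from the thread held in a String rather than a file. The names QuoteDemo and quotedTokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class QuoteDemo {
    // Wraps every token in double quotes so its exact start and end are visible.
    static List<String> quotedTokens(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            in.useDelimiter("\\? ");
            while (in.hasNext()) {
                out.add("\"" + in.next() + "\"");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "What is your name? John\n"
                    + "Where are you from? The United States of America\n";
        for (String token : quotedTokens(text)) {
            System.out.println(token);
        }
    }
}
```

Because the line break is not part of the delimiter, an answer and the question on the following line come back as one token with the line break inside it, and the last token ends with a line break. That would also explain a lone "?" on the last line when "?" is appended to every token.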

 
Carey Brown
Saloon Keeper
Posts: 3323
46
Here is a fuller example with a delimiter as a somewhat complex regular expression. Your pattern was "\\? ", a question mark followed by a space. What you want is one that looks for a space that is preceded by a question mark but doesn't include the question mark in the delimiter, that is, doesn't strip off the question mark. To do that requires the "look behind" syntax: "(?<=\\?) ". In order to parse the answers (e.g. "John") as a separate parse result, you need to explicitly handle the new-lines, so the combined pattern is: "(?<=\\?) |\\n|\\r\\n". As you can see, regular expressions are powerful but tricky.


The output is:

The quotes were added in the print statement to show where the delimited text began and ended. As you can see you get alternating strings of questions followed by answers.

Edit: I'm basing this on the assumption that your text file contains both questions and answers similar to that shown for the "lines" variable.
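
A minimal sketch of the combined pattern described above, assuming the input is held in a String shaped like the "lines" variable mentioned. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LookBehindDemo {
    // Delimiter: a space that follows '?', or a line break ("\n" or "\r\n").
    // The look-behind only checks for the '?', so the '?' stays in the token.
    static List<String> tokens(String lines) {
        List<String> out = new ArrayList<>();
        try (Scanner in = new Scanner(lines)) {
            in.useDelimiter("(?<=\\?) |\\n|\\r\\n");
            while (in.hasNext()) {
                out.add(in.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String lines = "What is your name? John\n"
                     + "Where are you from? The United States of America\n";
        for (String token : tokens(lines)) {
            System.out.println("\"" + token + "\"");
        }
    }
}
```

Spaces that do not follow a question mark are not delimiters here, so multi-word questions and answers stay in one piece.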
 
Jeremy Wages
Ranch Hand
Posts: 141
2
Carey Brown wrote:Here is a fuller example with a delimiter as a somewhat complex regular expression. Your pattern was "\\? ", a question mark followed by a space. What you want is one that looks for a space that is preceded by a question mark but doesn't include the question mark in the delimiter, that is, doesn't strip off the question mark. To do that requires the "look behind" syntax: "(?<=\\?) ". In order to parse the answers (e.g. "John") as a separate parse result, you need to explicitly handle the new-lines, so the combined pattern is: "(?<=\\?) |\\n|\\r\\n". As you can see, regular expressions are powerful but tricky.


The output is:

The quotes were added in the print statement to show where the delimited text began and ended. As you can see you get alternating strings of questions followed by answers.

Edit: I'm basing this on the assumption that your text file contains both questions and answers similar to that shown for the "lines" variable.


That assumption is correct.  I'm not quite understanding the delimiter though.  "(?<=\\?) "  The first question mark throws me off. 
 
Jeremy Wages
Ranch Hand
Posts: 141
2
Carey Brown wrote:Perhaps this snippet of code would help you understand how the parsing is behaving. Notice the double quotes in the output.



I thought this example would output similar to what my program was originally outputting except each line would be surrounded by \(text here)\, but I was wrong.
 
Carey Brown
Saloon Keeper
Posts: 3323
46
Jeremy Wages wrote:I'm not quite understanding the delimiter though.  "(?<=\\?) "  The first question mark throws me off. 

In regular expressions, things that start with "(?" are control constructs. One of the most often used controls is "(?i)", which says to ignore case in the regular expression. "(?<=...)" says look behind for a match of "..." (to be filled in) but don't include it in the matched result. In your case "..." is replaced with an escaped question mark, hence "(?<=\\?)".
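
Both constructs can be tried in isolation with java.util.regex. This is only an illustrative sketch; the class and method names are invented, and the sample strings come from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ControlDemo {
    // "(?i)" switches on case-insensitive matching for the rest of the pattern.
    static boolean isYes(String input) {
        return Pattern.matches("(?i)yes", input);
    }

    // Returns the text matched by "(?<=\\?) ": only the space,
    // because the look-behind checks for the '?' without consuming it.
    static String matchedText(String line) {
        Matcher m = Pattern.compile("(?<=\\?) ").matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isYes("YES"));
        System.out.println("[" + matchedText("What is your name? John") + "]");
    }
}
```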

Regular expressions can get very nasty. Pity those who have to maintain them.

Here's a nice regex cheat sheet.
http://www.rexegg.com/regex-quickstart.html
 
Carey Brown
Saloon Keeper
Posts: 3323
46
I did some research and came up with this delimiter pattern that works
This still has the look behind syntax I was describing but I've added a "non-capturing group". What this means is a subpattern that is grouped and matched but not captured for later reference; because it is part of the delimiter, whatever it matches gets thrown away. In your case you want to match a pattern for your answers but then throw the answers away and only keep the questions. The general syntax is "(?: ... )" where "..." is replaced with a pattern for the group you want to throw away. If you use a pattern like "(?:.*)" it will match all the characters that make up the answer. Unfortunately it will also leave the new-line, which gets sucked in for the next match. So, to absorb the new-lines as well you have to match "(\\n|\\r\\n)?". So, when this is all put together the results are:
It is often best to document complex regular expressions like this:
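
A sketch of what such documentation might look like. The exact pattern is an assumption pieced together from the parts described above (look behind, a group for the answer, an optional new-line), and the class, constant and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DocumentedDelimiter {
    // Assumed pattern, assembled piece by piece with a comment per piece.
    static final String ANSWER_DELIMITER =
          "(?<=\\?)"          // look behind: must follow a '?', which stays in the token
        + "(?:.*)"            // non-capturing group: the rest of the line (the answer)
        + "(?:\\n|\\r\\n)?";  // optional line break, so it is not left for the next token

    // Returns only the questions; the answers are swallowed by the delimiter.
    static List<String> questions(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            in.useDelimiter(ANSWER_DELIMITER);
            while (in.hasNext()) {
                out.add(in.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "What is your name? John\n"
                    + "Where are you from? The United States of America\n";
        for (String q : questions(text)) {
            System.out.println(q);
        }
    }
}
```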

There's another approach that may be simpler for you and that is using String.split(). Example:
This will result in an array of length 2 where index 0 has the question and index 1 has the answer. Just ignore the answer.
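
The String.split() idea could be sketched as follows, assuming each line has already been read as a String and that the look behind pattern from earlier in the thread is reused. The limit argument of 2 and the names SplitDemo and splitQuestionAndAnswer are additions for illustration.

```java
class SplitDemo {
    // Splits at the first space that follows a '?'.
    // The limit of 2 keeps any later "? " inside the answer part.
    static String[] splitQuestionAndAnswer(String line) {
        return line.split("(?<=\\?) ", 2);
    }

    public static void main(String[] args) {
        String[] parts = splitQuestionAndAnswer("What is your name? John");
        System.out.println(parts[0]); // the question, '?' included
        System.out.println(parts[1]); // the answer
    }
}
```

A line without a "? " sequence yields an array of length 1, so the length is worth checking before touching index 1.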
 
Junilu Lacar
Sheriff
Posts: 11493
180
The look behind regex solution is one of those that I'd consider as "too clever for its own good". As Carey's sig ironically says, regex can be a problem in itself, and that's in terms of understandability mostly. You can't look at that regex and say "Ah, yes, that makes a lot of sense!" unless you live, breathe, and dream in regex, which very few people do.

A more straightforward solution would be to go with the simpler and more obvious \\? as the delimiter, then make the assumption that the input will conform to the specifications, which the more complex regex also does. You can then use two calls to next() per input line, the first one to read the question, sans punctuation, same as before. You'd still tack the question mark back on for the display, as before. The second call to next() would just get displayed with appropriate new lines around it. It's one extra line of code, potentially. If you really want just one line of code, you can do this:

That format string might not look very readable but at least it's easier to eyeball and grok than the look behind regex.
 
Junilu Lacar
Sheriff
Posts: 11493
180
By the way, the second call should probably be nextLine() instead of just next(), so you read in the rest of the line after the question mark.
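
A sketch of the next() plus nextLine() approach, assuming "\\?" as the delimiter and one question and answer per line with no blank lines. The class and method names are made up. One thing to watch: next() stops in front of the delimiter without consuming it, so nextLine() returns the rest of the line starting at the question mark, and the sketch strips that off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class NextLineDemo {
    // Returns the display lines: each question with its '?', then its answer.
    static List<String> display(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            in.useDelimiter("\\?");
            while (in.hasNext()) {
                String question = in.next();   // text before the '?'
                String rest = in.nextLine();   // remainder of the line, '?' first
                String answer = rest.replaceFirst("^\\?\\s*", "");
                out.add(question + "?");       // tack the question mark back on
                out.add(answer);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "What is your name? John\n"
                    + "Where are you from? The United States of America\n";
        for (String line : display(text)) {
            System.out.println(line);
        }
    }
}
```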
 
Junilu Lacar
Sheriff
Posts: 11493
180
If it were me, I'd forgo brevity for clarity and write
 
Junilu Lacar
Sheriff
Posts: 11493
180
Carey Brown wrote:It is often best to document complex regular expressions like this:

Yeah, that again illustrates the trouble with regex; it forces you to do things like that. In certain circles, that would be considered as piling more "stuff" on what's already "stuff", to use a less coarse word. This would actually be a good example of how comments can be a code smell.
 