
Cannot find the carriage return character

 
Miguel Capo
Greenhorn
Posts: 14
Hi there,

I am new to working with encodings etc.

I am having the following problem, and it would be great if someone could give me a hand.

I have a UTF-8 encoded XML file. I need to parse the file and then search and replace in only some of its nodes. When I search for a simple character like 'a' and replace it with 'b', everything works fine.
I am having a problem searching for the carriage return in the string. From what I can see, the CR is stored as the 3 bytes "E2 80 A9". I am trying to use regular expressions for this task, and I just can't find an expression that matches that carriage return.

I have tried:

scanText = scanText.replaceAll("\\\\r" , "<br />");
scanText = scanText.replaceAll("\\p{Zp}" , "<br />");

The character appears as "?", which I am guessing is because it was not replaced and the DOS console cannot display it properly.

Thanks
 
Alan Moore
Ranch Hand
Posts: 262
According to the XML standard, there shouldn't be any carriage returns in an XML file. Only \n (linefeed) should be used to separate lines[1]. However, to be safe, you can look for all three of the common separators (\r\n, \r and \n). I've never found a use for those \p{Z} Unicode separator constructs.

[1] http://www.w3.org/TR/2006/REC-xml-20060816/#NT-S
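
For reference, here is a minimal sketch of that idea, not a definitive fix. It assumes the file contents have already been decoded as UTF-8 into a String. One detail worth knowing: the bytes E2 80 A9 quoted in the question are the UTF-8 encoding of U+2029 (PARAGRAPH SEPARATOR), not of a carriage return (U+000D), so the pattern below also covers U+2028 and U+2029. Also, as written in the question, the Java literal "\\\\r" is the regex for a backslash followed by the letter r, not for a carriage return. The class and method names (LineBreakDemo, toBr, decodeUtf8) are made up for this example.

```java
import java.nio.charset.StandardCharsets;

class LineBreakDemo {
    // CRLF is listed first so that it is replaced as one break, not two.
    // The character class then covers a lone CR, a lone LF, and the Unicode
    // LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029).
    static final String BREAKS = "\\r\\n|[\\r\\n\\u2028\\u2029]";

    // Replace every line or paragraph break with an HTML line break.
    static String toBr(String text) {
        return text.replaceAll(BREAKS, "<br />");
    }

    // Decode raw bytes explicitly as UTF-8 instead of relying on the
    // platform default charset.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'a', then the three bytes from the question, then 'b'.
        byte[] raw = {'a', (byte) 0xE2, (byte) 0x80, (byte) 0xA9, 'b'};
        String s = decodeUtf8(raw);
        System.out.println(Integer.toHexString(s.charAt(1))); // prints 2029
        System.out.println(toBr(s));                          // prints a<br />b
    }
}
```

If \p{Zp} still does not match after this, one thing to check (this is a guess, since the reading code was not posted) is whether the file is being read with the platform default charset rather than UTF-8.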
 
Ulf Dittmer
Rancher
Posts: 42970
Originally posted by Alan Moore:
According to the XML standard, there shouldn't be any carriage returns in an XML file. Only \n (linefeed) should be used to separate lines[1].


But I think they could occur in CDATA sections, so in the general case they would still need to be handled, right?

Generally, using regexps to process XML is not such a good idea. Sooner or later that will come back to haunt you. Use an XML library and iterate through the contents instead.
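
As a rough sketch of what iterating through the contents could look like, the following uses the standard JAXP DOM API to walk the tree and apply the replacement only to text (and CDATA) directly inside elements with a given name. The class name TextNodeRewriter, its methods, and the element name used in the usage note are assumptions for illustration; the original post does not say which nodes need changing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;

class TextNodeRewriter {
    // Parse an XML string into a DOM tree. Checked exceptions are wrapped
    // to keep the example short.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Walk the tree and apply the regex replacement to text and CDATA nodes
    // whose parent element is named elementName. Other nodes are untouched.
    static void replaceInElements(Node node, String elementName,
                                  String regex, String replacement) {
        if (node instanceof Text) { // CDATASection is a subtype of Text
            Node parent = node.getParentNode();
            if (parent != null && elementName.equals(parent.getNodeName())) {
                Text text = (Text) node;
                text.setData(text.getData().replaceAll(regex, replacement));
            }
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInElements(children.item(i), elementName, regex, replacement);
        }
    }
}
```

A call such as TextNodeRewriter.replaceInElements(doc, "p", "\\p{Zp}", " ") would then touch only the text inside p elements. Note that a replacement like "<br />" set on a text node is plain text, so it will be escaped as &lt;br /&gt; when the document is serialized. Real br elements would have to be created with Document.createElement and inserted into the tree instead.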
 