
Cannot find the carriage return character

 
Miguel Capo
Greenhorn
Posts: 14
Hi there,

I am new to working with encodings etc.

I am having the following problem, and it would be great if someone could give me a hand.

I have a UTF-8 encoded XML file. I need to parse it and then search and replace within only some of its nodes. When I search for a simple character like 'a' and replace it with 'b', everything works fine.
I am having a problem searching for the carriage return in the string. From what I can see, the CR is stored as the 3 bytes "E2 80 A9". I am trying to use regular expressions for this task, and I just can't come up with an expression that will find that carriage return.

I have tried:

scanText = scanText.replaceAll("\\\\r" , "<br />");
scanText = scanText.replaceAll("\\p{Zp}" , "<br />");

The text appears as ?, which I am guessing is because the character is not being replaced and the DOS console cannot display it properly.

Thanks
 
Alan Moore
Ranch Hand
Posts: 262
According to the XML standard, there shouldn't be any carriage returns in an XML file. Only \n (linefeed) should be used to separate lines[1]. However, to be safe, you can look for all three of the common separators (\r\n, \r, and \n). I've never found a use for those \p{Z} Unicode separator constructs.

[1] http://www.w3.org/TR/2006/REC-xml-20060816/#NT-S
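
For reference, a minimal sketch of the "look for all the common separators" idea, assuming Java. Two points it relies on: the bytes E2 80 A9 are the UTF-8 encoding of U+2029 (PARAGRAPH SEPARATOR), not of \r, and the Java literal "\\\\r" is the regex \\r, which matches a backslash followed by the letter r. The class and method names are made up for illustration, and the sketch assumes the file was decoded as UTF-8 when it was read.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class LineBreakReplacer {
    // \r\n is listed first so a Windows line ending becomes one <br />, not two.
    // U+2028 and U+2029 are the Unicode line and paragraph separators.
    private static final Pattern LINE_BREAK =
            Pattern.compile("\\r\\n|[\\r\\n\\u2028\\u2029]");

    // Hypothetical helper: replaces each separator with an HTML break.
    static String toHtmlBreaks(String text) {
        return LINE_BREAK.matcher(text).replaceAll("<br />");
    }

    public static void main(String[] args) {
        // 'a', the three bytes from the question, 'b', CRLF, 'c'
        byte[] raw = {'a', (byte) 0xE2, (byte) 0x80, (byte) 0xA9, 'b', '\r', '\n', 'c'};
        // Decode explicitly as UTF-8; the platform default charset may differ.
        String scanText = new String(raw, StandardCharsets.UTF_8);
        System.out.println(toHtmlBreaks(scanText));
    }
}
```

If the string was decoded with a different charset, the three bytes will not arrive as U+2029 and no pattern of this kind will match them, so the decoding step is worth checking first.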
 
Ulf Dittmer
Rancher
Posts: 42970
Originally posted by Alan Moore:
According to the XML standard, there shouldn't be any carriage returns in an XML file. Only \n (linefeed) should be used to separate lines[1].


But I think they could occur in CDATA sections, so in the general case they would still need to be treated, right?

Generally, using regexps to process XML is not such a good idea. Sooner or later that will come back to haunt you. Use an XML library and iterate through the contents instead.
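
A rough sketch of the "XML library" route, assuming the JDK's built-in DOM parser is acceptable. The class and method names are hypothetical, and the replacement here is plain text; inserting a real <br /> element would mean splitting the text node, which this sketch does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TextNodeWalker {
    // Recursively visits the tree and rewrites only text and CDATA nodes,
    // so element names and attributes are never touched.
    static void replaceInText(Node node, String target, String replacement) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            node.setNodeValue(node.getNodeValue().replace(target, replacement));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInText(children.item(i), target, replacement);
        }
    }

    // Parses an XML string; a real program would parse the file directly.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<doc><p>one\u2029two</p><q><![CDATA[three\u2029four]]></q></doc>");
        replaceInText(doc.getDocumentElement(), "\u2029", " ");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Restricting the walk to particular elements is then a matter of checking node.getParentNode().getNodeName() before replacing, which is harder to get right with a regex over the raw file.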
 