
Regular expression to strip unwanted characters

 
Hafeez Pallikonda Khader
Greenhorn
Posts: 9
The code below gives me the wrong result. Basically, I want only a-z and _ as the starting character.


 
Henry Wong
author
Sheriff
Posts: 23295
Hafeez Pallikonda Khader wrote: The below code gives me wrong result. Basically i want only a-z and _ as the starting character



It would help us a bit if you gave us more details -- probably starting by giving us an example of what is right.

Henry
 
Richard Tookey
Bartender
Posts: 1166
You say what the output is, but not exactly what you expect for that example, and there is ambiguity in your specification. Do you mean you want to get rid of leading characters if they are not in your valid set? If so, then I would expect your desired output to be "St #", but this does not make sense when looking at your regex. Could you say exactly what you expect as the output for the example " #19 98St # "?
 
Ulf Dittmer
Rancher
Posts: 42972
output is 1998St which is wrong

What is the correct output, then? A regexp like "^[^a-z_]+" would remove all leading characters that are not lowercase letters (a-z) or _; that's how I interpret "i want only a-z and _ as the starting character".

Edit: ... which is just about what Richard said.
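A minimal Java sketch of that anchored regexp, run on Richard's example input. The class and method names are hypothetical. Note that with a-z alone the capital S is also outside the allowed set, so the result starts at the "t"; the second method assumes upper-case letters should be allowed too, which is what the expected "St" suggests.

```java
public class LeadingStrip {
    // Ulf's regexp: ^ anchors the match at index 0 of the input, and
    // [^a-z_]+ consumes every character up to the first a-z or _.
    static String stripLeadingLowerCase(String input) {
        return input.replaceAll("^[^a-z_]+", "");
    }

    // Assumption: upper-case letters are also acceptable starting characters.
    static String stripLeadingAnyCase(String input) {
        return input.replaceAll("^[^A-Za-z_]+", "");
    }

    public static void main(String[] args) {
        String input = " #19 98St # ";
        // Brackets make the trailing " # " visible; it is not at the start, so it stays.
        System.out.println("[" + stripLeadingLowerCase(input) + "]"); // [t # ]
        System.out.println("[" + stripLeadingAnyCase(input) + "]");   // [St # ]
    }
}
```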
 
Hafeez Pallikonda Khader
Greenhorn
Posts: 9
I expect the output to be "St"

Surely 1998 does not fall in a-z, and it is not an _ either.
 
Ulf Dittmer
Rancher
Posts: 42972
Then the regexp I mentioned does that. If you want to remove the special characters everywhere (and not just as starting characters, as you said initially), remove the first "^" from the regexp; it causes the regexp to match only at the beginning of the string.
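Dropping the anchor lets the pattern match anywhere, so replaceAll() removes every run of disallowed characters, not just the leading one. A sketch with hypothetical names, again assuming upper-case letters may need to be kept to get "St":

```java
public class StripEverywhere {
    // Without the leading ^ the pattern can match at any position, so every
    // run of characters outside a-z and _ is removed.
    static String stripAllLowerCase(String input) {
        return input.replaceAll("[^a-z_]+", "");
    }

    // Assumption: A-Z is allowed as well.
    static String stripAllAnyCase(String input) {
        return input.replaceAll("[^A-Za-z_]+", "");
    }

    public static void main(String[] args) {
        String input = " #19 98St # ";
        System.out.println(stripAllLowerCase(input)); // t
        System.out.println(stripAllAnyCase(input));   // St
    }
}
```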
 
Hafeez Pallikonda Khader
Greenhorn
Posts: 9
It's not about special characters at the start. Please look at my edited code.
 
Richard Tookey
Bartender
Posts: 1166
Hafeez Pallikonda Khader wrote: Its not about special characters at the start. Please look at my edited code.


If you just want "St" as a result, your regex is very, very wrong. I can't work out from what you have posted what the general specification is! You seem to want to keep only alphabetic characters and the underscore, in which case the regex should simply be "[^a-z_]+", but the lack of a decent specification means I'm just guessing!
 
Ulf Dittmer
Rancher
Posts: 42972
Please don't edit your previous posts like that. Now none of the following posts make sense any more. If you have new code, just put it in a new post.
 
Hafeez Pallikonda Khader
Greenhorn
Posts: 9
Sure. I'll open a new post with the code
 
Richard Tookey
Bartender
Posts: 1166
Hafeez Pallikonda Khader wrote: Sure. I'll open a new post with the code


No!!! Just add a reply containing the new code! And please reread ALL the previous responses.
 
Henry Wong
author
Sheriff
Posts: 23295

I think there is a misunderstanding of regular expressions here. The regex defines what to match, and the replacement string defines what to replace it with. Neither defines what the result should look like.

This means that if the first character is not a-z, it will be replaced. If the second character is also not a-z, that part of the regex doesn't apply to it. The ^ means the beginning of the input -- not the beginning of the output.

Henry
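The OP's original regex is not shown in the thread, so the single-character pattern below is only an assumed stand-in to illustrate Henry's point: replaceAll() matches against the input, and once the first character has been removed, the ^ does not get a second chance at the new first character. Class and method names are hypothetical.

```java
public class AnchorDemo {
    // Assumed single-character pattern: ^ only matches at index 0 of the
    // input, so at most one character is removed, even if the next one is
    // also outside a-z and _.
    static String stripFirstChar(String input) {
        return input.replaceAll("^[^a-z_]", "");
    }

    // With + the anchored match extends over the whole leading run instead.
    static String stripLeadingRun(String input) {
        return input.replaceAll("^[^a-z_]+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripFirstChar("#1abc"));  // 1abc
        System.out.println(stripLeadingRun("#1abc")); // abc
    }
}
```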
 
Ulf Dittmer
Rancher
Posts: 42972
Posted on behalf of Hafeez:

The code below gives me the wrong result. Basically, after the replaceAll operation, I want:

  • letters (a-z) or underscore (_) as the starting character
  • following that, letters (a-z), numbers (0-9), underscore (_), hyphen (-) and period (.)
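One possible way to express both rules in a single replaceAll() call, sketched under the assumption that upper-case letters are acceptable as well (the expected "St" suggests so). This is not necessarily the regex change Henry posted later in the thread. The first alternative can only match at index 0 and removes everything before the first letter or underscore; the second removes any later character outside the allowed set. The result therefore either starts with a letter or _, or is empty. The class and method names are hypothetical.

```java
public class NameFilter {
    // Alternative 1: ^[^A-Za-z_]+      strips everything before the first valid start character
    // Alternative 2: [^A-Za-z0-9_.-]+  strips any later character outside the allowed set
    // The hyphen is placed last in the class so it is taken literally.
    // Assumption: upper-case letters are allowed alongside a-z.
    private static final String INVALID = "^[^A-Za-z_]+|[^A-Za-z0-9_.-]+";

    static String sanitize(String input) {
        return input.replaceAll(INVALID, "");
    }

    public static void main(String[] args) {
        System.out.println(sanitize(" #19 98St # ")); // St
        System.out.println(sanitize("9-abc.d e"));    // abc.de
        System.out.println(sanitize("123"));          // prints an empty line
    }
}
```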


Richard Tookey
Bartender
Posts: 1166
Hafeez Pallikonda Khader wrote:
The below code gives me wrong result. Basically i want after replace all operation.

  • letters(a-z) and Underscore(_) as the starting character
  • Following that, there can be letters(a-z), numbers(0-9), Underscore(_), Hyphen(-) and period(.)


Though I can't be certain, because I'm still not sure what the OP wants, I think replaceAll() is the wrong method to use. I think find() should be used with the regex "([a-z_][a-z0-9_.-]*)", with group(1) giving the desired result.

(UD: edited to make clear that the quote is from Hafeez, not from me)
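A sketch of Richard's find() approach in Java. One detail matters here: in java.util.regex a class written as "[a-z0-9_-.]" is parsed as containing a range from '_' to '.', which runs backwards and is rejected with a PatternSyntaxException, so the hyphen is placed last, where it is literal. Allowing upper-case letters is again an assumption, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstValidRun {
    // Group 1: one letter or _, then any number of letters, digits, _, . or -.
    // Assumption: upper-case letters are allowed alongside a-z.
    private static final Pattern NAME = Pattern.compile("([A-Za-z_][A-Za-z0-9_.-]*)");

    // Returns the first matching run in the input, or "" if there is none.
    static String firstValidRun(String input) {
        Matcher m = NAME.matcher(input);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(firstValidRun(" #19 98St # ")); // St
        System.out.println(firstValidRun("9-abc.d e"));    // abc.d
    }
}
```

Note that find() returns only the first valid run: for "9-abc.d e" it stops at the space and returns "abc.d".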
     
Henry Wong
author
Sheriff
Posts: 23295
Richard Tookey wrote:
Though I can't be certain because I'm still not certain what the OP wants, I think replaceAll() is the wrong method to use. I think find() should be used with regex "([a-z_][a-z0-9_.-]*)" with group(1) giving the desired result.


Maybe we are speculating on what the OP wants differently, but I think the original code should work. It just needs a slight change to the regex. Here...



Henry
     
Richard Tookey
Bartender
Posts: 1166
Henry Wong wrote:
Maybe we are speculating on what the OP wants differently, but I think the original code should work. It just needs a slight change to the regex. Here...

Henry


Possibly! Without a decent specification or several examples of input and desired output, I don't know what the OP wants.
     
Hafeez Pallikonda Khader
Greenhorn
Posts: 9
Works perfectly. Thanks a lot!


     
Hafeez Pallikonda Khader
Greenhorn
Posts: 9
I'm doing this to filter invalid XML element names, FYI.
     