Cleaning Text

 
Ranch Hand
I have an array of Strings. Most of the strings contain only letters, but some have digits or punctuation attached to them, and some are entirely numbers. How would I write a loop, playing around with the delimiters/tokens, so that I end up only with strings that are words made of letters (hyphenated words are acceptable)?
 
Ranch Hand
You can use the String.replaceAll method:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29
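A minimal sketch of that suggestion, assuming the goal is to strip every character that is neither a letter nor a hyphen. The class and method names are made up for illustration, and the regex is one possible choice rather than the only one:

```java
import java.util.Arrays;

public class ReplaceAllDemo {

    // Hypothetical helper: removes every character that is not a
    // Unicode letter (\p{L}) or a hyphen. Strings are immutable, so the
    // cleaned text is returned as a new String.
    static String keepLettersAndHyphens(String s) {
        return s.replaceAll("[^\\p{L}-]", "");
    }

    public static void main(String[] args) {
        String[] words = {"hello,", "well-known", "42", "abc123!"};
        for (int i = 0; i < words.length; i++) {
            // Store the result back into the array; replaceAll does not
            // modify the original String.
            words[i] = keepLettersAndHyphens(words[i]);
        }
        System.out.println(Arrays.toString(words));
    }
}
```

Note that a purely numeric entry such as "42" becomes an empty string with this approach, so you may still want to skip empty results afterwards.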
 
Joel Christophel
Ranch Hand
This is exactly what I want, but how would I ignore all numbers?
 
Bartender
Joel Christophel wrote:This is exactly what I want, but how would I ignore all numbers?




http://docs.oracle.com/javase/tutorial/essential/regex/
http://www.regular-expressions.info/

Of course, you want to ignore not just numbers, but other stuff too. So look at that, at the other answer, at the docs we both linked to, and spend some time working it out for yourself. Post again if you get stuck.
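For readers who get stuck, here is one possible direction, offered as a sketch rather than the answer. It assumes that any token containing something other than letters and single internal hyphens should be dropped entirely instead of cleaned; the class name, method names, and pattern are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class WordFilter {

    // True only for tokens made of letters, optionally joined by single
    // hyphens ("well-known"). String.matches tests the whole string.
    static boolean isWord(String s) {
        return s.matches("\\p{L}+(-\\p{L}+)*");
    }

    // Hypothetical helper: keeps the tokens that look like words and
    // drops everything else (numbers, punctuation, mixed tokens).
    static List<String> keepWords(String[] tokens) {
        List<String> result = new ArrayList<String>();
        for (String token : tokens) {
            if (isWord(token)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"hello", "well-known", "42", "abc123", "-"};
        System.out.println(keepWords(tokens));
    }
}
```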
 
Marshal
You could also loop through the String char by char (this might not work for Chinese, Japanese or Korean text), checking each char with the appropriate methods of the Character class. If the particular char fulfils your requirements, append it to a StringBuilder. Beware: if you create new StringBuilder('a'), you do not get a StringBuilder containing the letter a; the char is widened to an int and used as the initial capacity.
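The loop described here might look roughly like this. It is a sketch that assumes "fulfils your requirements" means "is a letter or a hyphen"; the class and method names are invented for the example:

```java
public class CharLoopDemo {

    // Hypothetical helper: copies only letters and hyphens into a
    // StringBuilder, one char at a time.
    static String lettersOnly(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c) || c == '-') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("abc123!"));

        // The pitfall: there is no StringBuilder(char) constructor, so
        // 'a' is widened to the int 97 and selects StringBuilder(int capacity).
        StringBuilder empty = new StringBuilder('a');
        System.out.println(empty.length());   // length is 0, not 1

        // Passing a String gives the content you probably wanted.
        StringBuilder withA = new StringBuilder("a");
        System.out.println(withA.length());
    }
}
```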
 
Jeff Verdegan
Bartender
Campbell Ritchie wrote:You could also loop through the String char by char (this might not work for Chinese Japanese or Korean text),


Why not?
 
Rancher
Campbell Ritchie wrote:You could also loop through the String char by char (this might not work for Chinese Japanese or Korean text), checking each char with the appropriate methods of the Character class. If the particular char fulfils your requirements, append it to a StringBuilder.

Or put the whole string in a StringBuilder and loop through that removing the characters you don't want.
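A sketch of that variant, again assuming letters and hyphens are the characters to keep (names are illustrative). Iterating from the end avoids skipping characters when indices shift after a deletion:

```java
public class StripInPlaceDemo {

    // Hypothetical helper: loads the whole String into a StringBuilder
    // and deletes unwanted characters from it.
    static String stripInPlace(String s) {
        StringBuilder sb = new StringBuilder(s);
        // Walk backwards so deleting index i never affects the indices
        // that are still to be visited.
        for (int i = sb.length() - 1; i >= 0; i--) {
            char c = sb.charAt(i);
            if (!Character.isLetter(c) && c != '-') {
                sb.deleteCharAt(i);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripInPlace("a1b2-c3!"));
    }
}
```

Each deleteCharAt call shifts the remaining characters, so for long strings the append-only approach is likely to be cheaper.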
 
Campbell Ritchie
Marshal
Jeff Verdegan wrote: . . . Why not?
Because rather than isLetter() returning true, you will have isSupplementaryCodePoint() returning true. I don’t know whether that will miss Chinese, etc. letters.
 
Jeff Verdegan
Bartender
Campbell Ritchie wrote:
Jeff Verdegan wrote: . . . Why not?
Because rather than isLetter() returning true, you will have isSupplementaryCodePoint() returning true. I don’t know whether that will miss Chinese, etc. letters.


I see. I thought you just meant that you couldn't iterate a Chinese, etc. String char-by-char. Didn't twig to the isLetter() part. Thanks.
 
Sheriff
Campbell Ritchie wrote:
Jeff Verdegan wrote: . . . Why not?
Because rather than isLetter() returning true, you will have isSupplementaryCodePoint() returning true. I don’t know whether that will miss Chinese, etc. letters.


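Well, no, the common Chinese characters are in the BMP, i.e. they are "ordinary" Unicode characters that fit in a single char. And although there isn't a Chinese alphabet, isLetter() does return true for them, because ideographs belong to Unicode's "Other Letter" (Lo) category. Only the rarer ideographs outside the BMP need the code point methods.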
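The behaviour discussed in this exchange is easy to check rather than presume. The sketch below uses U+4E2D as an example of a BMP ideograph and U+20000 as an example of a supplementary one (both chosen for illustration), and shows a loop that walks the String by code point instead of by char; the class and method names are invented:

```java
public class CodePointDemo {

    // Hypothetical helper: like a char-by-char loop, but steps through
    // the String one code point at a time so that characters outside
    // the BMP (stored as surrogate pairs) are tested as a whole.
    static String lettersOnly(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp) || cp == '-') {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);   // 1 for BMP, 2 for supplementary
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A BMP ideograph is a letter as far as isLetter(char) is concerned.
        System.out.println(Character.isLetter('\u4E2D'));

        // A supplementary ideograph is stored as two surrogate chars;
        // each surrogate on its own is not a letter...
        System.out.println(Character.isLetter('\uD840'));

        // ...but the full code point is.
        System.out.println(Character.isLetter(0x20000));
        System.out.println(Character.isSupplementaryCodePoint(0x20000));
    }
}
```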
 
Greenhorn
Joel Christophel wrote:This is exactly what I want, but how would I ignore all numbers?


When I try to run the program given, I get the error "Cannot invoke replaceAll(String, String) on the array type String[]" inside the for-each loop, at the point where I try to do the replacement. I am using Eclipse Europa with Java 1.5 installed. I don't know what the problem is. Can someone explain it to me?
 
Rancher
You must have tried to call the replaceAll() method on the array variable str and not on the String variable s. If not, post the code you tried.
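Since the original program is not shown in the thread, the following is only a guess at its shape, reusing the variable names str (the array) and s (the loop variable) mentioned above. The class name, method name, and regex are assumptions:

```java
public class ForEachDemo {

    // Hypothetical reconstruction: cleans every element of the array.
    static String[] clean(String[] str) {
        String[] result = new String[str.length];
        int i = 0;
        for (String s : str) {
            // str.replaceAll(...) does not compile: str is a String[],
            // and arrays have no replaceAll method.
            // Call it on the element s instead, and keep the return
            // value, because Strings are immutable.
            result[i++] = s.replaceAll("[^a-zA-Z-]", "");
        }
        return result;
    }

    public static void main(String[] args) {
        String[] cleaned = clean(new String[] {"a1", "b-c!"});
        System.out.println(java.util.Arrays.toString(cleaned));
    }
}
```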
 
Jagannath Duraisamy
Greenhorn
John Jai wrote:You must have tried to call the replaceAll() method on the array variable str and not on the String variable s. If not, post the code you tried.


Yes, my dear John, I found it. Thanks.
 