
Help with string - array

 
P Derlyuk
Ranch Hand
Posts: 33
How do I separate a string like "ATGCCACTATGGTAG" into an array like [ATG, CCA, CTA, TGG, TAG]?
Any help would be great!
 
Wayan Saryada
Ranch Hand
Posts: 105
Hello,

You could split the string in a loop: on each pass, take a substring of three characters and add it to a List. When all the characters have been read, convert the List into an array.
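
A minimal sketch of that loop, for illustration only. The class and method names are hypothetical, and keeping a short trailing chunk (when the length is not a multiple of three) is an assumption the question does not address.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CodonListSplitter {
    // Hypothetical helper: cuts dna into chunks of three characters.
    static String[] split(String dna) {
        List<String> codons = new ArrayList<>();
        for (int i = 0; i < dna.length(); i += 3) {
            // Math.min keeps a trailing chunk of one or two characters
            // instead of throwing StringIndexOutOfBoundsException.
            codons.add(dna.substring(i, Math.min(i + 3, dna.length())));
        }
        // Convert the List into an array once all characters are read.
        return codons.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints [ATG, CCA, CTA, TGG, TAG]
        System.out.println(Arrays.toString(split("ATGCCACTATGGTAG")));
    }
}
```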

You might also try a regular expression with the String.split() method.
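
One possible regular expression, as a sketch rather than the only way to do it: a zero-width look-behind combined with \G (the end of the previous match) splits after every third character. The class name is hypothetical, and this relies on the behaviour of the standard desktop JDK regex engine.

```java
import java.util.Arrays;

class CodonRegexSplitter {
    // Hypothetical helper: splits at every position that is preceded by
    // exactly three characters counted from the end of the previous match.
    static String[] split(String dna) {
        return dna.split("(?<=\\G...)");
    }

    public static void main(String[] args) {
        // prints [ATG, CCA, CTA, TGG, TAG]
        System.out.println(Arrays.toString(split("ATGCCACTATGGTAG")));
    }
}
```

The loop is probably easier to read and to debug; the regex is mainly useful as a one-liner.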
 
P Derlyuk
Ranch Hand
Posts: 33
I did something along those lines.
Thanks!

 
Henry Wong
author
Marshal
Posts: 21197
P Derlyuk wrote: I did something along those lines.
Thanks!

So you (1) used a for loop and the substring() method to get the three-letter components, so that you could (2) build a string list of components separated by spaces, so that you could (3) then call split() to get an array of the components? Would it not have been easier to just use a for loop and the substring() method to get the three-letter components?

Henry

 
Campbell Ritchie
Sheriff
Posts: 49411
There is a problem with substring() and very long Strings, which may bite you if your DNA represents a whole organism, even one as simple as Caenorhabditis elegans. In older Java versions, a substring keeps a reference to the backing array of the long String, so the whole array stays in memory, which can be unnecessarily expensive. You can sort that out by using
... new String(dna.substring(0, 3));
That problem should not occur from Java 7 update 6 onwards, where substring() copies only the characters it needs.

There is another problem which will occur if you use the + operator on Strings repeatedly: memory fills up. Every use of + creates several objects, and after a few thousand uses this starts to strain your memory. Garbage collection will reclaim that memory, but you can watch your program become slower and slower. At 10000 bp you can see the delay; at 1000000 bp you can leave the program to chunter away to itself while you have dinner.
Suggested solution: put the String into a StringBuilder whose capacity is dna.length() + dna.length() / 3 (you cannot do this for Strings ≥ 1610612736 bp because the int arithmetic overflows). Insert " " at every 3rd place. I suggest you start inserting 3 places from the end and count backwards; it is easier, because earlier positions do not shift. For a length that is a multiple of 3, I think the first insertion offset is dna.length() - 3, i.e. just after the character at index dna.length() - 4.
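
A rough sketch of that backwards insertion, with hypothetical class and method names. Starting at the largest multiple of 3 below the length is my assumption, so that a sequence whose length is not a multiple of 3 is still grouped from the front.

```java
class CodonSpacer {
    // Hypothetical helper: returns dna with a space between each group of three.
    static String insertSpaces(String dna) {
        // Capacity covers the original characters plus the inserted spaces.
        StringBuilder sb = new StringBuilder(dna.length() + dna.length() / 3);
        sb.append(dna);
        // Largest multiple of 3 strictly below the length (0 for an empty String).
        int start = (dna.length() - 1) / 3 * 3;
        // Counting backwards means earlier offsets are not shifted by the inserts.
        for (int i = start; i > 0; i -= 3) {
            sb.insert(i, ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints ATG CCA CTA TGG TAG
        System.out.println(insertSpaces("ATGCCACTATGGTAG"));
    }
}
```

Calling split(" ") on the result would then give the array, although each insert still shifts the characters after it, so this is not necessarily fast for very long sequences.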

But, as previously stated, you are better off creating an array of length dna.length() / 3 and using new String(dna.substring(i, i + 3)) to populate it. You can predict the length of the array, so you don’t need to go via a List.
Don’t copy-and-paste from this post because I have used nbsp characters.
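
As a sketch of the pre-sized array approach (hypothetical names; it assumes that one or two leftover characters at the end should simply be ignored, since the array length is dna.length() / 3):

```java
import java.util.Arrays;

class CodonArraySplitter {
    // Hypothetical helper: fills an array whose length is known in advance.
    static String[] split(String dna) {
        String[] codons = new String[dna.length() / 3];
        for (int i = 0; i < codons.length; i++) {
            int from = 3 * i;
            // new String(...) forces a copy; this only matters on old JVMs
            // where substring() shares the backing array, and is harmless otherwise.
            codons[i] = new String(dna.substring(from, from + 3));
        }
        return codons;
    }

    public static void main(String[] args) {
        // prints [ATG, CCA, CTA, TGG, TAG]
        System.out.println(Arrays.toString(split("ATGCCACTATGGTAG")));
    }
}
```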
 