
Aligning extremely long Strings

 
Jester Mangekyou
Greenhorn
Posts: 2
Dear all,
thank you for looking at my first post. My problem has multiple components. The first is dealing with long strings (from files gigabytes in size) and finding common substrings that allow the long strings to be aligned as efficiently as possible. The substrings do not necessarily appear in a co-linear arrangement, i.e. substring a at the start of string 1 is not necessarily found at the start of string 2. Another pertinent point is that gaps can be introduced in either sequence in order to gain an overall alignment; however, penalties are incurred for this. The aim is to find the best alignment of the two strings whilst incurring the fewest gaps. This is a bioinformatics problem related to aligning genome sequences.
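To make the gap-penalty idea concrete, here is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence with a linear gap penalty), keeping only two rows of the table in memory. The class name `GlobalAlignmentScore` and the values of `MATCH`, `MISMATCH` and `GAP` are assumptions chosen for illustration, not anything specified in the post. It runs in time proportional to the product of the two lengths, so it is only practical for short sequences, and it assumes a co-linear alignment, so it does not handle rearranged substrings.

```java
final class GlobalAlignmentScore {
    // Hypothetical scoring scheme, chosen only for illustration.
    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = -2;

    /**
     * Returns the best global alignment score of a and b.
     * Only two rows of the dynamic-programming table are kept,
     * so memory use is proportional to b.length().
     */
    static int score(CharSequence a, CharSequence b) {
        int n = a.length();
        int m = b.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];

        // Aligning an empty prefix of a against j characters of b costs j gaps.
        for (int j = 0; j <= m; j++) {
            prev[j] = j * GAP;
        }

        for (int i = 1; i <= n; i++) {
            curr[0] = i * GAP; // i characters of a against an empty prefix of b
            for (int j = 1; j <= m; j++) {
                int pair = (a.charAt(i - 1) == b.charAt(j - 1)) ? MATCH : MISMATCH;
                int diagonal = prev[j - 1] + pair; // align the two characters
                int up = prev[j] + GAP;            // gap in b
                int left = curr[j - 1] + GAP;      // gap in a
                curr[j] = Math.max(diagonal, Math.max(up, left));
            }
            // Swap rows instead of allocating a new array each time.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        System.out.println(score("GATTACA", "GATTACA")); // prints 7
        System.out.println(score("GATTACA", "GATACA"));  // prints 4 (six matches, one gap)
    }
}
```

Recovering the alignment itself, rather than just its score, needs either the full table or a divide-and-conquer scheme such as Hirschberg's algorithm.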

The other component involves how to deal with long strings whilst maintaining efficiency and minimising memory consumption. I have some experience in Java (1 term at uni) and hope to learn from you all.
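On the memory side, one common trick is to avoid `String` altogether, since each Java `char` takes two bytes. The sketch below packs each base into two bits, 32 bases per `long`. It assumes the sequence contains only the four upper-case letters A, C, G and T; real genome files often contain other symbols such as N, which this sketch rejects. The class name `PackedDna` is hypothetical.

```java
final class PackedDna {
    private final long[] words; // 32 bases per long, 2 bits each
    private final int length;   // number of bases (limited to int range here)

    PackedDna(CharSequence seq) {
        length = seq.length();
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            long code = encode(seq.charAt(i));
            // i >>> 5 selects the word, (i & 31) * 2 the bit offset within it.
            words[i >>> 5] |= code << ((i & 31) * 2);
        }
    }

    private static long encode(char c) {
        switch (c) {
            case 'A': return 0L;
            case 'C': return 1L;
            case 'G': return 2L;
            case 'T': return 3L;
            default:
                throw new IllegalArgumentException("Unsupported base: " + c);
        }
    }

    /** Returns the base stored at position i. */
    char charAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        int code = (int) ((words[i >>> 5] >>> ((i & 31) * 2)) & 3L);
        return "ACGT".charAt(code);
    }

    int length() {
        return length;
    }

    public static void main(String[] args) {
        PackedDna dna = new PackedDna("GATTACA");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < dna.length(); i++) {
            sb.append(dna.charAt(i));
        }
        System.out.println(sb); // prints GATTACA
    }
}
```

This uses a quarter of a byte per base instead of two bytes, at the cost of a little bit arithmetic on every access.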
Thank you for your attention
karthalikirens
 
Winston Gutkowski
Bartender
Posts: 10575
Jester Mangekyou wrote: thank you for looking at my first post. My problem has multiple components, dealing with long strings (from files Gbs in size), and then finding common substrings that will allow the long strings to be aligned in the most efficient manner

Funny how questions come in bunches, isn't it? If you're simply interested in a fast substring finder, you might want to have a look at Boyer-Moore, or its Horspool variant (which is a lot simpler, but has a worse worst-case running time). Alternatively, there is good old KMP (Knuth-Morris-Pratt).
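For reference, a minimal sketch of the Horspool variant over byte arrays might look like the following. The class and method names are hypothetical, and it assumes the text fits in a single `byte[]`, which will not hold for multi-gigabyte files without chunking.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class HorspoolSearch {
    /**
     * Returns the index of the first occurrence of pattern in text,
     * or -1 if there is none (Boyer-Moore-Horspool).
     */
    static int indexOf(byte[] text, byte[] pattern) {
        int n = text.length;
        int m = pattern.length;
        if (m == 0) {
            return 0;
        }
        if (m > n) {
            return -1;
        }

        // Bad-character table: how far the window may slide, keyed by the
        // text byte aligned with the last position of the pattern.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern[i] & 0xFF] = m - 1 - i;
        }

        int pos = 0;
        while (pos <= n - m) {
            // Compare the window right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            pos += shift[text[pos + m - 1] & 0xFF];
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] text = "GATTACAGATTACA".getBytes(StandardCharsets.US_ASCII);
        byte[] pattern = "ACAG".getBytes(StandardCharsets.US_ASCII);
        System.out.println(indexOf(text, pattern)); // prints 4
    }
}
```

With a four-letter alphabet such as DNA the skips tend to be short, so the advantage over a naive scan is smaller than it is on natural-language text.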

Winston
 
Jester Mangekyou
Greenhorn
Posts: 2
Winston,
Yes, problems are always like that... it's what makes it fun, though. Thank you for the information. I will check out these algorithms and post back an update.
Thank you
 
Campbell Ritchie
Marshal
Posts: 56570
... and welcome to the Ranch
 