Alisha Burke

Greenhorn

Posts: 27

posted 11 years ago

I think this is an intermediate question so I will post my question here. I am currently in my second year course in Bioinformatics. As part of my subject requirement, I have been asked to come up with a Dynamic Programming for global alignment with traceback algorithm using Java Programming.

Right now, I have completed the first part of my program(the matrix) and its working properly. However, I have no idea on how to approach the 2nd part of the program (optimal alignment with traceback).

As my program is modeled for biologist, I would do my best to explain it in simple English terms. First off, the users must be able to enter 2 DNA sequence termed SEQUENCE 1 and SEQUENCE 2 (with no gap penalty and with linear gap penalty options imposed) include traceback algorithm.

For example: SEQUENCE 1: AAGT

SEQUENCE 2: AGT

Gap Penalty: -2

Match Score: 1 (if x=y ie AA or GG then match score is +1)

-1(if x ≠y ie AG or GT then mismatch score is -1)

Program will then produce as follows:

X A A G T

X 0-2-4-6-8

A-2 1-1-3-5

G-4-1 0 0-2

T-6-3-2-1 1

Note X stands for gap.

The second part of the program (the one that I am stuck with) is suppose to produce the optimal global alignment from the matrix.

So in this example, there are 2 optimal alignments:

A � G T - A G T

| | | and | | |

A A G T A A G T

Note: | stands for a match

- stands for a gap

How do I get the optimal global alignment is from the matrix with the traceback algorithm. Basically the traceback algorithm serves as a pointer that lead us back to the parent cells (in simple words we must find the path of choices that led to this optimal alignment)

The pointers start from the final cell (last row last column) and it follows any path back to point 0,0.

So the previous matrix must include the arrows which serves as pointers as shown below:

X A AG T

X 0 ←-2 -4-6-8

A -2 d 1←d -1-3-5

G -4 -1 0 d 0-2

T -6 -3 -2-1 d 1

Note: If arrows are pointing upwards ↑ or to the left ← means there is a gap which is

represented by a �-�

If arrows are pointing diagonally which I indicated as a d in the matrix (cant attach

a diagonal arrow) means there is a match and is represented by a � | �

So this is the part I am having problem with (traceback algorithm with the optimal alignment).

What I have thought so far:

>>Saving the pointers in each cell by using the if-statements as follows:

M[i,j] = max{M[i-1,j-1] + s(S1,S2[j]), M[i-1,j]+g, M[I,j-1] + g} ;

Initialize P[i,j]=�

If(M[i,j]==M[i-1,j-1]+s(S1[i],S2[j])) then P[i,j] = P[i,j]

If(M[i,j]==M[i-1, j]+g) then P[i,j] = P[i,j]

If(M[i,j]==M[i,j-1]+g) then P[i,j] = P[i,j]

However, as several optimal alignments may exist for a pair of sequences I am a bit lost on finding the right way to approach this problem. For example if there are two arrows branching off from one cell as you can see in the previous example (ie diagnol arrow(d) and left arrow(← , then I would firstly have to save this cell in memory, then I would have to traceback one arrow/pointer until it reaches Cell 0,0 then when that�s done I have to return again to the cell where there�s the branch off and continue to traceback the 2nd pointer until it reaches Cell 0,0.

This is quite a simplified version of the problem. There may be more than a hundred branches in the program so that�s why I would need to tackle this problem accordingly.

Hope you could give me some ideas on this�. I can attach my source code of my 1st part of the program if ask. Thank you.

I have been editing my post as the matrix and arrows are not showing properly here.... Can't seem to do anything about it though...if you have any further questions on my explanation do ask..

[ January 09, 2006: Message edited by: Alisha Burke ]

[ January 09, 2006: Message edited by: Alisha Burke ]

Right now, I have completed the first part of my program(the matrix) and its working properly. However, I have no idea on how to approach the 2nd part of the program (optimal alignment with traceback).

As my program is modeled for biologist, I would do my best to explain it in simple English terms. First off, the users must be able to enter 2 DNA sequence termed SEQUENCE 1 and SEQUENCE 2 (with no gap penalty and with linear gap penalty options imposed) include traceback algorithm.

For example: SEQUENCE 1: AAGT

SEQUENCE 2: AGT

Gap Penalty: -2

Match Score: 1 (if x=y ie AA or GG then match score is +1)

-1(if x ≠y ie AG or GT then mismatch score is -1)

Program will then produce as follows:

X A A G T

X 0-2-4-6-8

A-2 1-1-3-5

G-4-1 0 0-2

T-6-3-2-1 1

Note X stands for gap.

The second part of the program (the one that I am stuck with) is suppose to produce the optimal global alignment from the matrix.

So in this example, there are 2 optimal alignments:

A � G T - A G T

| | | and | | |

A A G T A A G T

Note: | stands for a match

- stands for a gap

How do I get the optimal global alignment is from the matrix with the traceback algorithm. Basically the traceback algorithm serves as a pointer that lead us back to the parent cells (in simple words we must find the path of choices that led to this optimal alignment)

The pointers start from the final cell (last row last column) and it follows any path back to point 0,0.

So the previous matrix must include the arrows which serves as pointers as shown below:

X A AG T

X 0 ←-2 -4-6-8

A -2 d 1←d -1-3-5

G -4 -1 0 d 0-2

T -6 -3 -2-1 d 1

Note: If arrows are pointing upwards ↑ or to the left ← means there is a gap which is

represented by a �-�

If arrows are pointing diagonally which I indicated as a d in the matrix (cant attach

a diagonal arrow) means there is a match and is represented by a � | �

So this is the part I am having problem with (traceback algorithm with the optimal alignment).

What I have thought so far:

>>Saving the pointers in each cell by using the if-statements as follows:

M[i,j] = max{M[i-1,j-1] + s(S1,S2[j]), M[i-1,j]+g, M[I,j-1] + g} ;

Initialize P[i,j]=�

If(M[i,j]==M[i-1,j-1]+s(S1[i],S2[j])) then P[i,j] = P[i,j]

If(M[i,j]==M[i-1, j]+g) then P[i,j] = P[i,j]

If(M[i,j]==M[i,j-1]+g) then P[i,j] = P[i,j]

However, as several optimal alignments may exist for a pair of sequences I am a bit lost on finding the right way to approach this problem. For example if there are two arrows branching off from one cell as you can see in the previous example (ie diagnol arrow(d) and left arrow(← , then I would firstly have to save this cell in memory, then I would have to traceback one arrow/pointer until it reaches Cell 0,0 then when that�s done I have to return again to the cell where there�s the branch off and continue to traceback the 2nd pointer until it reaches Cell 0,0.

This is quite a simplified version of the problem. There may be more than a hundred branches in the program so that�s why I would need to tackle this problem accordingly.

Hope you could give me some ideas on this�. I can attach my source code of my 1st part of the program if ask. Thank you.

I have been editing my post as the matrix and arrows are not showing properly here.... Can't seem to do anything about it though...if you have any further questions on my explanation do ask..

[ January 09, 2006: Message edited by: Alisha Burke ]

[ January 09, 2006: Message edited by: Alisha Burke ]