Pancoska's Algorithm
To improve on previous results in choosing the most specific siRNAs, we studied the basis of Pancoska's algorithm as detailed in his paper (P. Pancoska, Z. Moravek, U. Moll, Efficient RNA interference depends on global context of the target sequence: quantitative analysis of silencing efficiency using Eulerian graph representation of siRNA, Nucleic Acids Res. 32 (4) (2004) 1469-1479).

First, a nucleotide sequence is represented as an Eulerian graph Γ in which the bases A, T, G, C are the vertices and each edge represents a connection between consecutive bases in the sequence. From this graph an adjacency matrix is constructed whose rows and columns are labeled by the vertices A, T, G, C and whose entry Γij is the number of edges between i and j. This matrix will be symmetric. Then Γ' is constructed by subtracting, in each column, the off-diagonal entries from the diagonal entry. A basis for the graph consists of 13 independent cycles, each represented by a matrix e, and from these 13 matrices f can be derived. A 13-dimensional vector representing the proportions of the 13 independent cycles is then constructed by multiplying Γ' by the f matrices.

For each target segment there is a corresponding 13-dimensional vector, and the n vectors for the n target sequences are arranged as the rows of a matrix. Several variables are then defined that measure the similarity between these sequences, and these measures are ultimately translated into an efficiency measure for the siRNA. Pancoska tested this algorithm on the human lamin A/C gene, the human CD54 gene, and the PTEN mRNA sequence. The following is a diagram from Pancoska's paper.
[Figure: diagram from Pancoska et al. (2004)]
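
The first two steps described above, building the adjacency matrix Γ and deriving Γ' from it, can be sketched in Python. This is only an illustration of our reading of the description, not Pancoska's implementation. The function names `adjacency_matrix` and `gamma_prime` are hypothetical, edges are counted as undirected so that Γ comes out symmetric as stated above, and Γ' is assumed to keep the off-diagonal entries unchanged while replacing each diagonal entry with the diagonal minus the sum of the off-diagonal entries in its column. The 13 cycle matrices e and f are not given here, so the projection onto them is not shown.

```python
BASES = "ATGC"  # vertex order used for both rows and columns

def adjacency_matrix(seq):
    """Return the 4x4 matrix counting edges between consecutive bases.

    Assumption: edges are undirected, so an A-T step is counted in
    both gamma[A][T] and gamma[T][A], which keeps the matrix symmetric.
    """
    index = {base: k for k, base in enumerate(BASES)}
    gamma = [[0] * 4 for _ in range(4)]
    seq = seq.upper()
    for a, b in zip(seq, seq[1:]):  # every pair of consecutive bases
        i, j = index[a], index[b]
        gamma[i][j] += 1
        if i != j:  # a self-loop such as A-A is counted only once
            gamma[j][i] += 1
    return gamma

def gamma_prime(gamma):
    """Return a copy of gamma with modified diagonal entries.

    Assumption: in each column j, the sum of the off-diagonal entries
    is subtracted from the diagonal entry; other entries are unchanged.
    """
    n = len(gamma)
    result = [row[:] for row in gamma]
    for j in range(n):
        off_diagonal = sum(gamma[i][j] for i in range(n) if i != j)
        result[j][j] = gamma[j][j] - off_diagonal
    return result

if __name__ == "__main__":
    g = adjacency_matrix("ATGCA")
    for row in gamma_prime(g):
        print(row)
```

For an mRNA target written with U instead of T, the `BASES` string would need to be changed accordingly; any character outside `BASES` raises a `KeyError`.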