Special Session 40: Applications of dynamical systems in medicine and biology

Distances on DNA sequences
Magdalena Nowak
Jan Kochanowski University of Kielce
Poland
Co-Author(s): Taras Banakh, Judyta B\k{a}k, Grzegorz Czerwonka, Joanna Garbuli\'{n}ska-W\k{e}grzyn, Piotr Lisicki, Micha\l{} Pop\l{}awski
Abstract:
We introduce several distances based on the Chaos Game Representation method. We use the pseudometric $d_k$, which counts the differences in the numbers of corresponding $k$-mers in two DNA sequences $g, g'$, to define a metric $d(g,g')=\sum_{k=1}^{\infty} \frac{d_k(g,g')}{2^k}$, which combines the frequency analysis of all possible $k$-mers. We compare this approach with known genetic distances such as the Levenshtein edit distance, Average Nucleotide Identity, and Mash distance. We also introduce distances invariant under the operations of inversion or nucleotide complementarity.
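
As a minimal sketch of the construction above, the following Python code computes one plausible reading of $d_k$ and the resulting sum $d$. The abstract does not give the exact formula for $d_k$, so the code assumes $d_k(g,g')$ is the sum, over all $k$-mers $w$, of the absolute difference between the number of occurrences of $w$ in $g$ and in $g'$ (overlapping occurrences counted, no normalization). The function names `kmer_counts`, `d_k` and `d` are illustrative and not taken from the authors. Under this assumption $d_k$ vanishes once $k$ exceeds the length of both sequences, so the infinite series reduces to a finite sum.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq (empty Counter if len(seq) < k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d_k(g, h, k):
    """Assumed form of d_k: sum of absolute differences of k-mer counts."""
    cg, ch = kmer_counts(g, k), kmer_counts(h, k)
    # A Counter returns 0 for k-mers that are absent from a sequence.
    return sum(abs(cg[w] - ch[w]) for w in cg.keys() | ch.keys())


def d(g, h):
    """d(g, h) = sum over k >= 1 of d_k(g, h) / 2^k.

    Terms with k > max(len(g), len(h)) are zero, so the sum is finite.
    """
    kmax = max(len(g), len(h))
    return sum(d_k(g, h, k) / 2 ** k for k in range(1, kmax + 1))


if __name__ == "__main__":
    # d_1 cannot separate "AC" from "CA" (same nucleotide counts),
    # which is why each d_k alone is only a pseudometric.
    print(d_k("AC", "CA", 1))  # 0
    print(d("AC", "CA"))       # 0.5
```

In this reading, two distinct sequences of length $n$ always differ in their $n$-mer counts (or in some shorter $k$-mer count if the lengths differ), which is what lets the weighted sum separate sequences that an individual $d_k$ cannot.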