| Abstract: |
| We introduce several distances based on the Chaos Game Representation method. We use the pseudometric $d_k$, which counts the differences in the numbers of corresponding $k$-mers in two DNA sequences $g,g'$, to define a metric $d(g,g')=\sum_{k=1}^{\infty} \frac{d_k(g,g')}{2^k}$ that combines the analysis of the frequency of occurrence of all possible $k$-mers. We compare this approach with known genetic distances such as the Levenshtein edit distance, Average Nucleotide Identity, and Mash distance.
We also introduce distances invariant under the operations of inversion or nucleotide complementarity. |
|
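
As an illustration of the metric $d$ defined in the abstract, the following is a minimal Python sketch. It assumes that $d_k$ is the sum, over all $k$-mers, of the absolute differences between their occurrence counts in the two sequences; the abstract does not fix this normalization, so the exact form is an assumption. The function names `kmer_counts`, `d_k` and `d` are hypothetical. Under this assumption $d_k$ vanishes once $k$ exceeds the length of both sequences, so the infinite series reduces to a finite sum.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every k-mer (overlapping windows of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d_k(g, h, k):
    """Assumed form of d_k: L1 difference between the k-mer count vectors."""
    cg, ch = kmer_counts(g, k), kmer_counts(h, k)
    # Counter returns 0 for k-mers that are absent from a sequence.
    return sum(abs(cg[w] - ch[w]) for w in cg.keys() | ch.keys())


def d(g, h):
    """Weighted sum of d_k / 2^k; terms with k > max length are zero."""
    kmax = max(len(g), len(h))
    return sum(d_k(g, h, k) / 2 ** k for k in range(1, kmax + 1))


if __name__ == "__main__":
    print(d("ACGT", "ACGA"))
```

Summing over all $k$ up to the sequence length is what distinguishes $d$ from the individual pseudometrics: for a single $k$, two different sequences can share the same $k$-mer counts, whereas the full-length term separates any two distinct sequences.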