Nucleotide sequences contain motifs that have been preserved through evolution because they are important to the structure or function of the molecules. Analysis of DNA binding sites is an important problem in both experimental biology and computational methods. To find DNA binding sites that bind specific transcription factors, we develop a robust mixed effect mixture model (RMEMM). The DNA sequences are represented by a mixed effect model of position-specific frequencies that accounts for the relationship between frequencies at different positions. The results show that the mean effect is similar to a position-specific scoring matrix (PSSM), providing a new view of the sequence. The model is robust to outliers and to data whose distribution has moderately heavy tails.
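For readers unfamiliar with the PSSM that the mean effect is compared against, the following is a minimal sketch of the standard log-odds construction from aligned binding sites. It is not the RMEMM itself. The uniform background of 0.25, the pseudocount of 1, the function names, and the toy sequences are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def position_frequency_matrix(sites, pseudocount=1.0):
    """Per-position base frequencies for aligned sites of equal length."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    matrix = []
    for pos in range(length):
        # Start every base at the pseudocount so no frequency is zero.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        matrix.append({base: counts[base] / total for base in ALPHABET})
    return matrix


def pssm(sites, background=0.25, pseudocount=1.0):
    """Log2-odds of each base at each position against a uniform background."""
    freqs = position_frequency_matrix(sites, pseudocount)
    return [
        {base: math.log2(col[base] / background) for base in ALPHABET}
        for col in freqs
    ]


def score(site, matrix):
    """Sum the per-position log-odds for a candidate site."""
    return sum(col[base] for base, col in zip(site, matrix))


# Hypothetical toy alignment, not data from the paper.
example_sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
example_pssm = pssm(example_sites)
```

A candidate site that resembles the aligned sites receives a higher score than one that does not, for example `score("TATAAT", example_pssm)` compared with `score("GGGGGG", example_pssm)`.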
Robust mixture model clustering of DNA binding sites.