Background: For the purpose of protein-DNA recognition analysis,
we classified amino acids on the basis of the geometry and statistics of protein-DNA contacts. However, crisp classification methods do
not capture the diversity of amino acid properties. Amino
acid residues have a variety of properties and can simultaneously belong to different classes. We therefore classified amino acids
using different types of fuzzification. Methods: Voronoi-Delaunay tessellation was used to determine the spatial relationship between the amino
acids of proteins and DNA nucleotides. Amino acids
were classified using the statistics of contacts and the statistics of contact
areas between amino acids and nucleotides. Classical hierarchical and non-hierarchical methods with different distance measures were used for crisp classification
of amino acids. A general
variational approach was used for fuzzy classification of amino acids.
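
To illustrate what a fuzzy classification produces, the following is a minimal sketch using fuzzy c-means, a standard technique that minimizes a weighted within-class distance functional. It is a stand-in only: the abstract does not specify the variational functional actually used. The function name `fuzzy_c_means`, the fuzzifier `m`, the deterministic initialization, and the toy 2-D feature vectors are all assumptions made for illustration and are not real contact statistics.

```python
import math


def memberships(points, centers, m):
    """Membership of every point in every class; each row sums to 1."""
    rows = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        zeros = [k for k, dk in enumerate(d) if dk == 0.0]
        if zeros:
            # The point coincides with one or more centers: share full
            # membership among those centers.
            rows.append([1.0 / len(zeros) if k in zeros else 0.0
                         for k in range(len(centers))])
        else:
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
            total = sum(inv)
            rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(points, n_classes, m=2.0, n_iter=100, tol=1e-9):
    """Plain fuzzy c-means; returns (centers, membership rows)."""
    n, dim = len(points), len(points[0])
    # Assumption: deterministic start from evenly spaced data points.
    step = max(1, n // n_classes)
    centers = [list(points[(k * step) % n]) for k in range(n_classes)]
    for _ in range(n_iter):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(n_classes):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[i] * points[i][j] for i in range(n)) / total
                 for j in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


# Hypothetical toy data: two compact groups and one point midway between them.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10), (5, 5)]
centers, u = fuzzy_c_means(points, 2)
for p, row in zip(points, u):
    print(p, [round(v, 3) for v in row])
```

In this toy setting the midway point receives a split membership, whereas a crisp method would have to assign it to exactly one class. That is the behavior motivating the fuzzy approach described above.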
Results: Using the proposed mathematical model, we showed that
about 30% of all contacts between amino acids and nucleotides in
protein-DNA complexes are not random. Crisp classification methods revealed clustering invariants of amino acids.
Fuzzy classification methods showed that six classes are optimal
for the protein-DNA recognition task. Conclusions: We plan to use
the fuzzy amino acid classification data to construct a substitution matrix for DNA-binding protein sequences. This research is
funded by RFBR grants 12-07-00634-a and 14-04-00639-a.
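
The crisp classification step named in Methods (hierarchical clustering under different distance measures) can be sketched as follows. This is an illustrative assumption, not the authors' procedure: average linkage, the two distance functions, the placeholder labels `r1` to `r6`, and the toy contact profiles are all hypothetical. Groupings that come out the same under both distance measures are one possible reading of what a "clustering invariant" means here.

```python
from itertools import combinations


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def agglomerative(profiles, n_clusters, distance):
    """Average-linkage agglomerative clustering.

    profiles maps a label to a feature vector; the result is a set of
    frozensets, one per cluster.
    """
    clusters = [frozenset([label]) for label in profiles]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(distance(profiles[a], profiles[b])
                   for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)


# Hypothetical contact-frequency profiles (placeholder labels, toy numbers).
profiles = {
    "r1": (0.9, 0.1, 0.0), "r2": (0.8, 0.2, 0.0),
    "r3": (0.1, 0.8, 0.1), "r4": (0.0, 0.9, 0.1),
    "r5": (0.1, 0.1, 0.8), "r6": (0.0, 0.2, 0.8),
}
by_euclidean = agglomerative(profiles, 3, euclidean)
by_manhattan = agglomerative(profiles, 3, manhattan)
# Clusters that appear under both distance measures.
invariant = by_euclidean & by_manhattan
print(sorted(sorted(c) for c in invariant))
```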