
Article Details


Cluster analysis of coronavirus sequences using computational sequence descriptors: With applications to SARS, MERS and SARS-CoV-2 (CoVID-19)

Author(s):

Marjan Vračko, Subhash C. Basak*, Tathagata Dey and Ashesh Nandy  

Abstract:


Background: Study of 573 genome sequences belonging to SARS, MERS and SARS-CoV-2 (CoVID-19) viruses.

Objective: To compare virus sequences originating from different parts of the world.

Methods: Alignment-free methods for representing the sequences and chemometric methods for analyzing the clusters.

Results: The majority of genome sequences cluster according to virus type, but some are outliers.

Conclusion: We identify 71 sequences that tend to belong to more than one cluster.
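
Illustration: the abstract does not specify which sequence descriptors were used, so the following is only a minimal sketch of the general idea of an alignment-free comparison. It assumes normalised dinucleotide (k-mer) frequencies as the descriptor and plain Euclidean distance between descriptor vectors. The function names `kmer_descriptor` and `euclidean` and the toy sequences are hypothetical and are not taken from the article.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_descriptor(seq, k=2):
    """Return the normalised k-mer frequency vector of a nucleotide sequence.

    Assumption: k-mer frequencies stand in for the article's descriptors,
    which the abstract does not specify.
    """
    seq = seq.upper()
    # Fixed ordering of all 4**k possible k-mers over the alphabet ACGT.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, T contribute to the total.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy sequences invented for illustration; real genomes are ~30,000 bases.
a = kmer_descriptor("ACGTACGTAC")
b = kmer_descriptor("ACGTACGTAC")
c = kmer_descriptor("AAAAAAAAAA")

print(euclidean(a, b))  # identical sequences give distance 0.0
print(euclidean(a, c))  # different composition gives a positive distance
```

Because every sequence maps to a vector of the same length regardless of genome length, the vectors can be compared directly or passed to standard clustering and principal component analysis routines without a prior alignment step.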

Keywords:

SARS-CoV-2 (CoVID-19), SARS, MERS, mathematical representation of sequences, clustering, Euclidean distance, Mahalanobis distance, principal component analysis, alignment-free sequence descriptors.

Affiliation:

Theoretical Department, National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana; Centre for Interdisciplinary Research and Education, Kolkata


