Towards a Systematic Investigation of Deep Learning Approaches for Bacterial Taxonomic Classification Using the 16S rRNA Gene

EasyChair Preprint 9925 • 3 pages • Date: April 4, 2023

Abstract

Modern bacterial taxonomy revolves around bioinformatics-based analysis, leading to deeper insights into microbial communities and their composition. The 16S ribosomal RNA (16S rRNA) gene is a frequently used and well-established phylogenetic marker for in silico bacterial classification. With the rise of sequence data, novel machine learning methods are required to deal with the increasing complexity involved in analyses. In this project, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and attention-based deep learning models were proposed to serve as efficient alternative approaches to bacterial classification. Machine learning models were trained and evaluated with a manually curated 16S dataset. Two sequence encoding strategies, k-mer and one-hot encoding, were studied and evaluated with the CNN- and RNN-based models, respectively. Although a one-hot encoding approach allows for a greater variety of experimental comparisons, k-mer encoding showed superior results. The performance of deep learning models was compared against the conventional machine learning-based Ribosomal Database Project (RDP) Classifier in terms of accuracy and training time. The CNN model with 8-mer encoding showed 96.33% test accuracy at the genus level, 0.17 percentage points higher than the RDP Classifier, demonstrating the potential of deep learning approaches for bacterial classification.

Keyphrases: 16S rRNA, Bacterial Classification, Convolutional Neural Network, Recurrent Neural Network, machine learning
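
For readers unfamiliar with the two encoding strategies named in the abstract, the following is a minimal illustrative sketch in Python. It is not the authors' implementation: the abstract does not say whether k-mers are represented as count vectors or as token sequences, nor how ambiguous bases are handled, so the count-vector form, the function names `one_hot_encode` and `kmer_counts`, and the treatment of non-ACGT symbols are all assumptions made here for illustration.

```python
from itertools import product

BASES = "ACGT"  # assumed alphabet; ambiguity codes such as N are not modeled


def one_hot_encode(seq):
    """Map each base to a 4-element indicator vector.

    Assumption: symbols outside A/C/G/T become an all-zero row.
    """
    index = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(BASES)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def kmer_counts(seq, k):
    """Count overlapping k-mers as a fixed-length vector of size 4**k.

    Assumption: windows containing symbols outside A/C/G/T are skipped.
    """
    vocab = {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}
    counts = [0] * len(vocab)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in vocab:
            counts[vocab[kmer]] += 1
    return counts


example = "ACGTAC"  # toy sequence, not taken from the paper's dataset
one_hot = one_hot_encode(example)      # one row per base, four columns
counts_2mer = kmer_counts(example, 2)  # 16 entries, one per possible 2-mer
```

Under this count-vector assumption, an 8-mer encoding yields a fixed-length input of 4^8 = 65,536 features regardless of sequence length, whereas one-hot encoding keeps one row per base and so preserves position at the cost of a length-dependent input.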