Caribou pipeline for the alignment-free bacterial identification and classification in metagenomics sequencing data using machine learning
Description: | This dataset contains the sequencing data used to train the models of the Caribou pipeline, which we developed for alignment-free bacterial identification and classification in metagenomics sequencing data using machine learning. The datasets were derived from the GTDB v.202 database (https://data.gtdb.ecogenomic.org/releases/release202/202.0/). Training used the species representative genomes, whereas the benchmark datasets used non-representative whole genomes. We also simulated sequencing reads so that performance could be evaluated and compared on both whole genomes and sequencing reads. We provide: the trained CNN models and their encoding files; the datasets used for training, validation and testing of the models, randomly sampled from representative genomes; and the datasets used for benchmarking the method against state-of-the-art methods, randomly sampled from non-representative whole genomes and simulated reads. |
Notes: | These data were used in Nicolas de Montigny's master's thesis (https://archipel.uqam.ca/18182/) |
Authors: | de Montigny, Nicolas; University of Quebec at Montreal; 0000-0002-3708-4055
Kembel, Steven W.; University of Quebec at Montreal; 0000-0001-5224-0952
Diallo, Abdoulaye Baniré; University of Quebec at Montreal |
Keywords: | metagenomics alignment-free DNA classification classification models machine learning neural networks bacterial genomes taxonomic classification |
Field of Research: | Computer and information sciences > Artificial intelligence (AI) > Machine learning |
Publication Date: | 2024-12-12 |
Publisher: | Federated Research Data Repository / dépôt fédéré de données de recherche |
Funder: | Natural Sciences and Engineering Research Council of Canada (NSERC) |
URI: | https://doi.org/10.20383/103.01160 |
Related Identifiers: |
This dataset is part/subset of
This dataset is derived from
This dataset is derived from
|
The entire dataset can be downloaded using Globus Transfer. This method requires a Globus account and a software installation.
Access to this dataset is subject to the following terms:
Creative Commons Attribution 4.0 International (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Citation
de Montigny, N., Kembel, S. W., & Diallo, A. B. (2024). Caribou pipeline for the alignment-free bacterial identification and classification in metagenomics sequencing data using machine learning. Federated Research Data Repository. https://doi.org/10.20383/103.01160