Applications Of Machine Learning To The Problem Of ... - PubMed

Abstract

Antimicrobial resistance (AMR) remains one of the most challenging phenomena of modern medicine. Machine learning (ML) is a subfield of artificial intelligence that focuses on the development of algorithms that learn how to accurately predict outcome variables using large sets of predictor variables that are typically not hand selected and are minimally curated. Models are parameterized using a training data set and then applied to a test data set on which predictive performance is evaluated. The application of ML algorithms to the problem of AMR has garnered increasing interest in the past 5 years due to the exponential growth of experimental and clinical data, heavy investment in computational capacity, improvements in algorithm performance, and increasing urgency for innovative approaches to reducing the burden of disease. Here, we review the current state of research at the intersection of ML and AMR with an emphasis on three domains of work. The first is the prediction of AMR using genomic data. The second is the use of ML to gain insight into the cellular functions disrupted by antibiotics, which forms the basis for understanding mechanisms of action and developing novel anti-infectives. The third focuses on the application of ML for antimicrobial stewardship using data extracted from the electronic health record. Although the use of ML for understanding, diagnosing, treating, and preventing AMR is still in its infancy, the continued growth of data and interest ensures it will become an important tool for future translational research programs.
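The train-then-test workflow described in the abstract can be sketched as follows. This is a minimal illustration, not a method from the review: the toy data, the single-feature "stump" classifier, and all function names (`make_toy_data`, `train_test_split`, `fit_stump`, `accuracy`) are assumptions made for the example. The toy data are noise-free, so held-out accuracy is perfect; real data would not behave this way.

```python
import random

def make_toy_data(n=200, seed=1):
    """Hypothetical isolates: 5 binary features; label 1 = resistant.
    In this toy data set, feature 2 fully determines the label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.randint(0, 1) for _ in range(5)]
        rows.append((x, x[2]))
    return rows

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_stump(train):
    """Choose the (feature index, value) rule with the most correct
    predictions on the training set only."""
    best = None
    for j in range(len(train[0][0])):
        for value in (0, 1):
            # Rule: predict resistant when feature j equals `value`.
            correct = sum((x[j] == value) == bool(y) for x, y in train)
            if best is None or correct > best[0]:
                best = (correct, j, value)
    return best[1], best[2]

def predict(model, x):
    j, value = model
    return int(x[j] == value)

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)

data = make_toy_data()
train, test = train_test_split(data)
model = fit_stump(train)            # parameters come from training data only
test_accuracy = accuracy(model, test)  # performance is judged on held-out data
```

The point of the split is that `test_accuracy` is computed on rows the model never saw while its parameters were being chosen.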

Keywords: antibiotic resistance; antimicrobial stewardship; drug discovery; machine learning; mechanisms of action; whole-genome sequencing.

Figures

FIG 1 General overview of machine learning analyses. (A) ML analyses are capable of integrating a wide variety of data types. These include raw images or instrument traces, pathogen genomic sequences, data obtained from wet lab experiments, and information contained in the electronic health record. The latter encompasses clinical pathology results, free text notes, and structured data such as demographics, comorbidities, procedures, allergies, medication exposures, and hospital encounters. These inputs must be carefully cleaned and the relevant features extracted or engineered. Multiple validation checks are often necessary to ensure the data remain accurate after preprocessing. Next, the data are split into a training set used to define the model parameters and a held-out portion used to test model performance. (B) There are three broad categories of ML analyses. The first two are supervised and unsupervised learning. In supervised learning, the training data contain labels denoting the outcome of interest (e.g., antibiotic resistance phenotypes). The model trains on these data and then predicts the predefined outcomes of interest on test data. Unsupervised models are trained on data that do not contain labels for the outcome of interest. The model therefore searches on its own for relationships between variables and then predicts these relationships on unlabeled test data. Typical uses for unsupervised learning include clustering high-dimensional data and outlier detection. The final category of ML analyses is reinforcement learning. These models comprise an “agent” that interacts with its environment over time. The state of the environment is provided to the agent, and the agent then chooses an action, e.g., an antibiotic treatment choice, from a set of available options. It then assesses the impact of that action on the environment through a reward function. The purpose of the reinforcement learning agent is to learn a set of actions for different states (i.e., a “policy”) that maximizes the cumulative reward.
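The agent, state, action, reward, and policy vocabulary in the caption can be illustrated with a small sketch. It makes several simplifying assumptions: the states, actions, and reward table are hypothetical placeholders, rewards are deterministic, and each decision is a single step with no state transitions (a contextual bandit, which is a special case of reinforcement learning). The epsilon-greedy rule used here is one common exploration strategy, not one the review prescribes.

```python
import random

# Hypothetical states and actions, for illustration only.
STATES = ["state_a", "state_b", "state_c"]
ACTIONS = ["drug_x", "drug_y", "drug_z"]
# Hypothetical environment: the one action that earns reward 1 in each state.
EFFECTIVE = {"state_a": "drug_y", "state_b": "drug_z", "state_c": "drug_x"}

def reward(state, action):
    """Reward function: 1.0 if the chosen action is effective, else 0.0."""
    return 1.0 if EFFECTIVE[state] == action else 0.0

def learn_policy(n_steps=3000, epsilon=0.2, seed=0):
    """Epsilon-greedy agent that estimates the mean reward of every
    (state, action) pair and returns the greedy policy it ends up with."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}      # value estimates
    counts = {s: {a: 0 for a in ACTIONS} for s in STATES}
    cumulative_reward = 0.0
    for _ in range(n_steps):
        state = rng.choice(STATES)            # environment presents a state
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)      # explore: try a random action
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
        r = reward(state, action)
        cumulative_reward += r
        counts[state][action] += 1
        # Incremental running mean of the observed rewards for this pair.
        q[state][action] += (r - q[state][action]) / counts[state][action]
    # The policy maps each state to the action with the highest estimate.
    policy = {s: max(ACTIONS, key=lambda a: q[s][a]) for s in STATES}
    return policy, cumulative_reward

policy, cumulative_reward = learn_policy()
```

A full reinforcement learning problem would also model how each action changes the next state, so the agent would have to weigh delayed rewards as well as immediate ones.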

FIG 2 Schematic overview of the process of training a machine learning (ML) model to predict antimicrobial susceptibility testing (AST) results from whole-genome sequences. (A) ML models rely on data sets containing tens to thousands of isolates with paired whole-genome sequences and phenotypic AST results. Data sets are divided into training and test sets, where the training set is used to fit the parameters of the model and the test set is used to evaluate the model's accuracy. An optimal data set contains a balance of resistant and susceptible examples for each organism-drug combination. (B) The data inputs are usually quality-controlled sequencing reads or assembled genomes, which are transformed into overlapping subsequences of length k, referred to as k-mers. A typical length for a k-mer feature is 13 to 31 nucleotides, but a length of 6 nucleotides is shown here for clarity. There are 4^k possible k-mers (e.g., 4,096 when k = 6), and the counts of each k-mer present in a given sequence are tallied for each isolate in the data set. The selected features are then merged with the phenotypic AST data, and this matrix is used as the input for a supervised machine learning model. The model analyzes the matrix to find the features, e.g., k-mers, that best predict resistance or susceptibility to a given antibiotic.
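The k-mer featurization described in panel B can be sketched as below. This is a simplified illustration under stated assumptions: the function names (`count_kmers`, `all_kmers`, `build_feature_matrix`) and the toy sequences are made up for the example, only the A/C/G/T alphabet is enumerated (k-mers containing other characters such as N get no column), and reverse complements are not merged, which real pipelines often do.

```python
from collections import Counter
from itertools import product

def count_kmers(sequence, k):
    """Tally overlapping k-mers using a sliding window with step 1."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def all_kmers(k):
    """Enumerate all 4**k possible k-mers over the A/C/G/T alphabet."""
    return ["".join(p) for p in product("ACGT", repeat=k)]

def build_feature_matrix(isolates, k):
    """Turn (sequence, phenotype) pairs into rows of k-mer counts.

    Returns the column names (one per possible k-mer) and a list of
    (count_vector, phenotype) rows suitable for a supervised model.
    """
    columns = all_kmers(k)
    rows = []
    for sequence, phenotype in isolates:
        counts = count_kmers(sequence, k)
        rows.append(([counts.get(kmer, 0) for kmer in columns], phenotype))
    return columns, rows

# Toy isolates with hypothetical phenotypes (R = resistant, S = susceptible).
isolates = [("ACGTACGT", "R"), ("AAAA", "S")]
columns, rows = build_feature_matrix(isolates, k=2)
```

Because the number of columns grows as 4^k, a dense matrix like this one is only practical for small k. At the 13 to 31 nucleotide lengths mentioned in the caption, a sparse representation that stores only observed k-mers would be the more realistic choice.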

Publication types

  • Review

MeSH terms

  • Anti-Bacterial Agents* / pharmacology
  • Anti-Infective Agents* / pharmacology
  • Artificial Intelligence
  • Drug Resistance, Bacterial
  • Humans
  • Machine Learning
  • Translational Research, Biomedical

Substances

  • Anti-Bacterial Agents
  • Anti-Infective Agents

Grants and funding

  • R00 GM118907/GM/NIGMS NIH HHS/United States
  • R01 AI146194/AI/NIAID NIH HHS/United States
