DeepG4: A Deep Learning Approach To Predict Cell-type Specific ...
G4 predictions with DeepG4
We then evaluated the prediction performance of DeepG4. In terms of AUROC, DeepG4 obtained excellent predictions of active G4 regions from HaCaT cells on the testing set (Fig 3A; AUROC = 0.988). On an independent ChIP-seq experiment done with the same cell line (GEO accession GSE99205), DeepG4 also showed very high prediction performance (AUROC = 0.986; Fig 3A). We then evaluated the ability of DeepG4 trained on one cell line (HaCaT) to predict G4s in another cell line (e.g. K562). We first browsed the genome where G4 regions were mapped by ChIP-seq as active in K562. For instance, we looked around the oncogene KRAS, which is known to be regulated by a G4 in its promoter (Fig 3B). ChIP-seq mapped one active G4 region in the promoter of KRAS, which was also predicted with a high score by DeepG4 (score > 0.95). On the left side of KRAS, another active G4 region was mapped experimentally within the CASC1 gene and was also predicted by DeepG4. At another locus, ChIP-seq mapped three main active G4 regions, located inside the genes C5orf28 (TMEM267), C5orf34 and PAIP1 (Fig 3C). These three regions were also predicted as active G4 regions with a high score (score > 0.95). DeepG4 also mistakenly predicted, with a medium score, two other regions within C5orf34 (score ≈ 0.6, red stars) that were not mapped by ChIP-seq.
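As an illustration of the AUROC metric used above, the following is a minimal sketch in plain Python. It is not the evaluation code used for DeepG4; the function name `auroc` and the toy labels and scores are assumptions made for the example. It relies on the standard interpretation of AUROC as the probability that a randomly chosen positive sequence receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 (1 = active G4 region), scores: model outputs.
    Tied scores receive their average rank, so a tie counts as 1/2.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at index i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks of this block are i+1 .. j; use their average.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_block = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_block
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs
# are ordered correctly, so the AUROC is 0.75.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
print(auroc(toy_labels, toy_scores))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of positives from negatives.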
Fig 3. A) Prediction performance of DeepG4. The model was trained and evaluated using HaCaT cell data. Predictions were evaluated on the testing set of sequences (same experiment as the training set), and also on an independent set of sequences (from a different ChIP-seq experiment). The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUROC) were plotted. B) Genome browser view of HaCaT-trained DeepG4 predictions and G4 ChIP-seq around the KRAS gene in K562 cells. C) Genome browser view of HaCaT-trained DeepG4 predictions and G4 ChIP-seq around the C5orf34 gene in K562 cells. D) Prediction performance of DeepG4 trained using HaCaT data and evaluated on other cell lines. E) Genome-wide prediction performance of DeepG4 trained using HaCaT data and evaluated on other cell lines. Predictions are computed for every 200-base bin of the genome. The area under the precision-recall curve (AUPR) is plotted. F) Prediction performance of DeepG4* trained using HaCaT data and evaluated on other cell lines. DeepG4* is identical to DeepG4 except that chromatin accessibility is not used as input. G) Genome-wide prediction performance of DeepG4* trained using HaCaT data and evaluated on other cell lines. H) Comparison of DeepG4 and DeepG4* prediction performances, in terms of accuracy and false discovery rate (FDR). I) Comparison of DeepG4 and DeepG4* genome-wide prediction performances, in terms of accuracy and FDR. J) Comparison of DeepG4 and DeepG4* promoter prediction performances, in terms of AUPR, accuracy and FDR.
https://doi.org/10.1371/journal.pcbi.1009308.g003
Overall, DeepG4 trained on HaCaT cell line data predicted active G4 regions well in other cell lines. For instance, the AUROC was very high for HEKnp (AUROC = 0.97; Fig 3D). AUROCs were also very good for K562, HeLaS3 and H1975 (K562: AUROC = 0.963; HeLaS3: AUROC = 0.948; H1975: AUROC = 0.948), while 293T and A549 showed good but slightly lower performance (293T: AUROC = 0.921; A549: AUROC = 0.912). We then evaluated predictions over the whole genome in an unbiased way. For this purpose, we split the genome into 200-base bins and evaluated DeepG4's ability to discriminate between bins corresponding to active G4 regions (tens of thousands of bins) and all other bins (millions of bins). Despite this highly imbalanced data, DeepG4 showed good prediction performance as measured by AUPR for HaCaT (AUPR = 0.291, independent experiment), K562 (AUPR = 0.309), 293T (AUPR = 0.176), A549 (AUPR = 0.124) and H1975 (AUPR = 0.129) (Fig 3E). For some cell lines, predictions were poorer (HEKnp: AUPR = 0.019; HeLaS3: AUPR = 0.08).
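The genome-wide evaluation described above can be sketched as follows. This is only an illustrative sketch in plain Python, not the pipeline used for DeepG4. The function names (`make_bins`, `label_bins`, `average_precision`), the rule that a bin is positive when it overlaps a ChIP-seq peak, and the toy coordinates and scores are all assumptions made for the example. The AUPR is approximated here by average precision, which is one common estimator; the text does not state which estimator was used.

```python
def make_bins(chrom_length, bin_size=200):
    """Tile a chromosome with half-open [start, end) bins of bin_size bases."""
    return [(start, min(start + bin_size, chrom_length))
            for start in range(0, chrom_length, bin_size)]


def label_bins(bins, peaks):
    """Label a bin 1 if it overlaps any peak (assumed rule), else 0.

    Both bins and peaks are half-open (start, end) intervals.
    """
    return [int(any(b_start < p_end and p_start < b_end
                    for p_start, p_end in peaks))
            for b_start, b_end in bins]


def average_precision(labels, scores):
    """Average precision: mean of precision values at each true positive.

    Bins are ranked by decreasing score; ties keep their input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive bin")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


# Toy example: a 1,000-base "chromosome" with one hypothetical peak.
bins = make_bins(1000)                    # five 200-base bins
labels = label_bins(bins, [(250, 380)])   # only the bin (200, 400) is positive
scores = [0.1, 0.4, 0.2, 0.7, 0.05]       # hypothetical per-bin predictions
print(average_precision(labels, scores))  # 0.5: the positive bin is ranked 2nd
```

When reading AUPR values on data this imbalanced, it helps to remember that a classifier ranking bins at random has an expected AUPR roughly equal to the fraction of positive bins, which is small when tens of thousands of positives are compared with millions of negatives.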
We previously hypothesized that chromatin accessibility could help to produce cell-type specific predictions. To test this hypothesis, chromatin accessibility was removed from the DeepG4 model (yielding an alternative model called DeepG4*). Removing chromatin accessibility substantially lowered cell-type specific prediction performance. For instance, the AUROC for HaCaT (independent) was 0.939 for DeepG4* compared to 0.986 for DeepG4, which represented an important difference (Fig 3F). We also found a large difference for HEKnp (DeepG4*, AUROC = 0.854; DeepG4, AUROC = 0.970). In terms of accuracy and false discovery rate (FDR), DeepG4* performed slightly less well than DeepG4 (Fig 3H). Regarding genome-wide predictions, removing chromatin accessibility also substantially lowered prediction performance (Fig 3G). For instance, for HaCaT (independent), we obtained an AUPR of 0.120 with DeepG4* and an AUPR of 0.291 with DeepG4. In terms of accuracy, DeepG4* performed less well than DeepG4, but it performed slightly better in terms of FDR (Fig 3I). We also assessed predictions on promoters, to distinguish promoters with active G4 regions from promoters without active G4 regions. DeepG4* performed less well than DeepG4 in terms of AUPR and accuracy, but slightly better in terms of FDR (Fig 3J).
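For reference, accuracy and FDR can both be derived from a confusion matrix once scores are converted to binary calls. The sketch below is in plain Python and is not taken from the DeepG4 code; the function names, the toy data and the 0.5 decision threshold are assumptions made for illustration, since the threshold used for these metrics is not stated in this section.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Return (tp, fp, tn, fn) after calling a region positive if score >= threshold.

    The 0.5 default is an assumed threshold for this example only.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all regions that are classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def false_discovery_rate(tp, fp):
    """Fraction of predicted positives that are wrong: FP / (FP + TP)."""
    predicted_positives = tp + fp
    return fp / predicted_positives if predicted_positives else 0.0


# Toy example with hypothetical labels and scores.
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.2, 0.1, 0.3]
tp, fp, tn, fn = confusion_counts(labels, scores)
print(accuracy(tp, fp, tn, fn))      # 4 of 6 regions are correct
print(false_discovery_rate(tp, fp))  # 0.5: one of two positive calls is wrong
```

Because FDR depends only on the regions called positive whereas accuracy depends on all regions, the two metrics can move in different directions, which is consistent with a model being less accurate overall while having a slightly lower FDR.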
These results demonstrated the ability of DeepG4 to accurately predict cell-type specific active G4 regions from DNA sequence and chromatin accessibility. Moreover, the results revealed the importance of incorporating chromatin accessibility into DeepG4 for cell-type specific predictions.