GRCh37/38 (NCBI) vs hg19/hg38 (UCSC) - Biostars
Are there any major differences between the GRCh38 (NCBI) and hg38 (UCSC) databases, aside from GRCh38 using a 1-based coordinate system while UCSC uses a 0-based one? Are there any pros or cons to using one versus the other? I am guessing that any identifier conversion software (e.g., BioMart) should choose one database over the other? Also, where does Ensembl come into play? Is the Ensembl database just a subset of the GRCh38 (NCBI) database? Any clarification would be greatly appreciated.
Tags: ncbi, ucsc, grch38, hg38
(Question written 11.3 years ago by pwg46, updated 4.0 years ago by Ram)

GRCh37/hg19 and GRCh38 are genome builds rather than annotations, which describe where features are in a given genome build. The actual sequences you'll get from NCBI/UCSC/Ensembl will be identical, but their annotations will be different and (importantly) updated at different frequencies.

NCBI's annotation is the RefSeq dataset (the "refGene" track in UCSC), which is essentially a subset of the UCSC and Ensembl annotations. UCSC's annotations are kind of a mess. You'll find genes with the same ID on multiple strands and multiple chromosomes, which makes them a bit useless. Ensembl's annotations typically contain more features than UCSC's (so a bit more noise), but they're otherwise much better put together (e.g., you'll never find a gene ID on different strands or different chromosomes) and their IDs are typically easier to map to other things (e.g., gene names, GO and pathway memberships). Ensembl also updates its annotation fairly often and versions everything nicely, so it's quite convenient to report which version you used in a paper (reproducibility is always a good thing). Given the choice, use the Ensembl annotation.
BTW, don't forget that the various sources can use different names for chromosomes (e.g., chr1 in UCSC is just 1 in Ensembl), so don't mix and match them.
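To make the naming point concrete, here is a minimal Python sketch that converts between UCSC-style names (`chr1`, `chrM`) and Ensembl-style names (`1`, `MT`). The function names are hypothetical, and the sketch assumes you only care about the primary chromosomes (1-22, X, Y and the mitochondrion); alternate contigs and patches do not follow a simple prefix rule and would need a lookup table instead.

```python
# Primary chromosome names in Ensembl style (no "chr" prefix).
PRIMARY = [str(n) for n in range(1, 23)] + ["X", "Y"]


def ucsc_to_ensembl(name: str) -> str:
    """Convert a UCSC-style primary chromosome name to Ensembl style."""
    if name == "chrM":  # the mitochondrion is "MT" in Ensembl
        return "MT"
    if name.startswith("chr") and name[3:] in PRIMARY:
        return name[3:]
    # Refuse anything else rather than guess at a mapping.
    raise ValueError(f"not a UCSC primary chromosome name: {name!r}")


def ensembl_to_ucsc(name: str) -> str:
    """Convert an Ensembl-style primary chromosome name to UCSC style."""
    if name == "MT":
        return "chrM"
    if name in PRIMARY:
        return "chr" + name
    raise ValueError(f"not an Ensembl primary chromosome name: {name!r}")


if __name__ == "__main__":
    for ucsc_name in ["chr1", "chrX", "chrM"]:
        print(ucsc_name, "->", ucsc_to_ensembl(ucsc_name))
```

Raising an error on unknown names is a deliberate choice here: silently passing through an unrecognised contig is how mixed naming schemes end up in the same file. Renaming only fixes the labels; for hg19 in particular, the chrM sequence is not the same as the MT sequence in GRCh37, so check the mitochondrial records before treating the two as interchangeable.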
(Answer written 11.3 years ago by Devon Ryan)

I see. Thank you for your answer. So, right now I am using the Ensembl and UniProt databases. Would there be any reason to include the UCSC database if I am working with an identifier conversion tool? For example, say I am trying to map Ensembl transcript (ENST) identifiers to UniProt. Would I get different mappings converting directly from ENST->UniProt (both the Ensembl and UniProt databases have data files which do so) than converting from ENST->UCSC->UniProt?
(Reply written 11.3 years ago by pwg46, updated 4.0 years ago by Ram)

You might get more ambiguous mappings going via UCSC (or not, it's hard to say).
(Reply written 11.3 years ago by Devon Ryan, updated 4.0 years ago by Ram)

Okay. So, in general, do you think it would be wise to stick with the Ensembl database only and not mix the two (Ensembl and UCSC) in identifier conversion software?
(Reply written 11.3 years ago by pwg46, updated 4.0 years ago by Ram)

Yeah, you'll normally just have more headaches by mixing the two, and Ensembl IDs are typically among the better-supported ones.
(Reply written 11.3 years ago by Devon Ryan, updated 4.0 years ago by Ram)

No need to map IDs between resources yourself; Ensembl has good cross-references to many other databases, including UniProt. You can access those either via BioMart or with the API.
(Reply written 11.3 years ago by Jean-Karim Heriche, updated 4.0 years ago by Ram)

The UCSC Genome Browser just released an "NCBI RefSeq" track that is based entirely on coordinates and alignments provided by the RefSeq group. These new tracks should avoid the issue of genes mapping to multiple locations, etc. You can read more about it on our website: https://genome.ucsc.edu/goldenPath/newsarch.html#030317.
Matthew Speir, UCSC Genome Bioinformatics Group
(Reply written 8.8 years ago by Matthew_UCSC, updated 4.0 years ago by Ram)

In addition to BioMart and the Perl API, you can use the Ensembl REST API to map Ensembl IDs to cross-reference entries and vice versa.
(Answer written 11.2 years ago by Denise CS)
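As a rough illustration of the REST route mentioned in this thread, the sketch below asks the Ensembl REST `xrefs/id` endpoint for the cross-references attached to a stable ID and keeps the UniProt ones. It uses only the Python standard library. The helper names are hypothetical, and the filter assumes UniProt records carry a `dbname` beginning with `Uniprot/` and an accession in `primary_id`; check a live response before relying on either.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def xrefs_url(stable_id: str, all_levels: bool = False) -> str:
    """Build the xrefs/id URL for an Ensembl stable ID (e.g. an ENST)."""
    url = REST_SERVER + "/xrefs/id/" + urllib.parse.quote(stable_id)
    if all_levels:
        # Ask for xrefs on linked features too (e.g. the translation),
        # since protein cross-references may not sit on the transcript itself.
        url += "?all_levels=1"
    return url


def fetch_xrefs(stable_id: str, all_levels: bool = False) -> list:
    """Fetch cross-references as a list of dicts (needs network access)."""
    request = urllib.request.Request(
        xrefs_url(stable_id, all_levels),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def uniprot_accessions(xrefs: list) -> list:
    """Keep accessions from records whose dbname starts with 'Uniprot/'.

    Assumption: UniProt entries are labelled this way in the response.
    """
    return sorted(
        {
            record["primary_id"]
            for record in xrefs
            if record.get("dbname", "").startswith("Uniprot/")
        }
    )


if __name__ == "__main__":
    # ENST00000288602 is used purely as an example stable ID.
    records = fetch_xrefs("ENST00000288602", all_levels=True)
    print(uniprot_accessions(records))
```

Filtering on the client side keeps the request simple and lets you see which other databases are cross-referenced before narrowing down. A transcript can map to zero, one or several UniProt accessions, so the result is best treated as a set, not a single value.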