Taxon genetics
This section provides an overview of the genetic information available for species found within the Balearic region. The genetic data are sourced from two major public databases: the National Center for Biotechnology Information's GenBank [1, 2] and the Barcode of Life Data Systems (BOLDSystem) [3]. The genetic information in Balearica covers species reported within the Balearic Archipelago, but it includes sequence numbers and IDs from around the world. If you are interested only in genetic data from samples collected within the Balearic area, activate the "Balearic Islands only" toggle to filter the results. Balearica relies on the metadata available from each source to assign the geographic origin of a genetic sequence, so Balearic samples may be underrepresented when detailed geographic information is missing from the public portals.
We are constantly updating the platform with new species sequences. However, please note that some information may still be incomplete. If you come across any gaps or inconsistencies, we encourage you to contact us.
Warning
When the "Balearic Islands only" toggle is activated, the filter includes only taxa whose metadata provide geographic coordinates located within the Balearic area.
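The coordinate-based filter described in this warning can be sketched as follows. This is a minimal illustration, not the platform's implementation: the bounding box values, the `SequenceRecord` class, and the `is_balearic` function are all assumptions made for the example, and Balearica's actual boundary definition may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed bounding box (decimal degrees) roughly enclosing the archipelago.
# Hypothetical values chosen for illustration only.
BALEARIC_BBOX = {"lat_min": 38.5, "lat_max": 40.2, "lon_min": 1.0, "lon_max": 4.5}


@dataclass
class SequenceRecord:
    """Hypothetical record: an external ID plus optional coordinates."""
    external_id: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def is_balearic(record, bbox=BALEARIC_BBOX):
    """Return True only when coordinates exist and fall inside the box."""
    if record.latitude is None or record.longitude is None:
        # Missing metadata: the record cannot be assigned to the area.
        return False
    return (
        bbox["lat_min"] <= record.latitude <= bbox["lat_max"]
        and bbox["lon_min"] <= record.longitude <= bbox["lon_max"]
    )


records = [
    SequenceRecord("A1", 39.57, 2.65),  # sampled on Mallorca
    SequenceRecord("A2", 41.39, 2.17),  # sampled on the mainland
    SequenceRecord("A3"),               # no coordinates in the source
]
balearic_only = [r.external_id for r in records if is_balearic(r)]
```

Record `A3` is excluded even if it was collected on the islands, which is how missing coordinates lead to underrepresentation of Balearic samples.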
List of genetic metadata
Balearica includes data from the most commonly used genetic markers for taxonomic and systematic purposes (Table 1). Users can filter the data by selecting the genetic marker of interest.
An innovative feature offered by Balearica is the ability to explore genetic information across different taxonomic levels. This feature enhances the flexibility of your search by providing not only the information for the specific taxon you are interested in but also for the taxonomic entities that lie below it. For example, when searching for the genus Aeshna, you will be able to access all genetic data from the lower taxonomic levels within that genus, such as species or subspecies.
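The idea of collecting entries for a taxon together with everything below it can be sketched as a recursive walk over the taxonomy. The `children` and `entries` dictionaries, the entry IDs, and the `entries_for_taxon` function are hypothetical and only illustrate the behaviour; they do not reflect how Balearica stores its data.

```python
# Hypothetical taxonomy: each taxon maps to the taxa directly below it.
children = {
    "Aeshna": ["Aeshna mixta", "Aeshna affinis"],
    "Aeshna mixta": [],
    "Aeshna affinis": [],
}

# Hypothetical genetic entries attached to individual taxa.
entries = {
    "Aeshna mixta": ["EX001", "EX002"],
    "Aeshna affinis": ["EX003"],
}


def entries_for_taxon(taxon):
    """Collect entries for a taxon and for every taxon below it."""
    collected = list(entries.get(taxon, []))
    for child in children.get(taxon, []):
        collected.extend(entries_for_taxon(child))
    return collected


genus_entries = entries_for_taxon("Aeshna")
species_entries = entries_for_taxon("Aeshna affinis")
```

Searching at the genus level returns the entries of both species, while searching for a single species returns only its own.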
The genetic metadata list also includes a column labeled "External ID", which corresponds to the identifier reported in the original source (listed in the "Source" column).
Warning
The "Total" button in the genetic metadata list refers to the overall number of entries stored in the platform, not the sum of the per-marker counts listed there. This is because a single genetic entry can contain multiple markers.
For example, a given taxon can have three entries:
- Entry 1 (marker: COI)
- Entry 2 (markers: COI, COII)
- Entry 3 (markers: COI, COII)
In this case: Total = 3; COI = 3; COII = 2
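The same counting can be reproduced in a few lines. This is an illustrative sketch only; the `entries` structure is an assumption that mirrors the three entries in the example.

```python
from collections import Counter

# The three entries from the example, each with one or more markers.
entries = [
    {"id": "Entry 1", "markers": ["COI"]},
    {"id": "Entry 2", "markers": ["COI", "COII"]},
    {"id": "Entry 3", "markers": ["COI", "COII"]},
]

# "Total" counts entries, not markers.
total = len(entries)

# Each entry contributes once to every marker it contains.
per_marker = Counter(m for e in entries for m in set(e["markers"]))

# Summing the per-marker counts overshoots the number of entries.
sum_of_markers = sum(per_marker.values())
```

Here `sum_of_markers` is 5 while `total` is 3, which is why the two figures should not be expected to match.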
Table 1. Genetic markers, their abbreviations, and their synonyms in the accessed public repositories.
| Full name | Abbreviation | List of synonyms | 
|---|---|---|
| 12S ribosomal RNA | 12S rRNA | rrn12S; 12S_rRNA; 12S ribosomal RNA; rRNA 12S; 12SrRNA; 12S | 
| 16S ribosomal RNA | 16S rRNA | 16S_rRNA; 16SribosomalRNA; 16SrDNA; 16S rRNA; rRNA 16S; rrn16S; 16SrRNA; 16S | 
| 18S ribosomal RNA | 18S rRNA | rRNA-18S; 18S-5P; 18Srrn; rRNA_18s; 18srRNA; rrn18S; 18S-3P; 18S-V4; 18S | 
| 5.8S ribosomal RNA | 5.8S rRNA | 5-8S; 5 8S rRNA | 
| Cytochrome b | cytochrome b | CYTB; Cyt.b; Cyt b; Cyt-b; CYTB5E | 
| Ribulose-1,5-bisphosphate carboxylase large subunit | RubisCo | rbcLa; rbcL; rbcL-like; rbcL-ct | 
| Internal Transcribed Spacer | ITS | ITS-2; ITS-1; ITS1; ITS1 rRNA; ITS2 rRNA; ITS2 | 
| Cytochrome oxidase | COX | COI-LIKE; COX6B; COX19-1; COX19-2; COX15; COX15_1; COX6A; COX6B2; COX11; COX10; COX6B-1_0; Cox4i1; COII-COIII | 
| Cytochrome C Oxidase Subunit I | COI | COX1; COI-5P; COXI; CO1; -CO1; CO I; Cox1_1; Cox1_2; COI-3P; COI-5PNMT1; cox1-1; cox1-2; cox1-i5 | 
| Cytochrome C Oxidase Subunit II | COII | COX2; COXII; CO2; CO II; cytochrome c oxidase subunit II; cox2b; cox2-1; cox2-2; cCOII | 
| Cytochrome C Oxidase Subunit III | COIII | COX3; COXIII; CO III; cytochrome c oxidase subunit III; cox3-2; cox3-1 | 
| ATP synthase F0 subunit 6 | ATP6 | Atp6-ps; atp6-2; atp6-1 | 
| ATP synthase F0 subunit 8 | ATP8 | Atp8_2; Atp8_1; ATP8A2; ATP8B4; ATP8B2; atp8-1; atp8-2 | 
| NADH dehydrogenase subunit 1 | NADH1 | ND1; nad1; NADH-1; NADH 1; nd1-i2 | 
| NADH dehydrogenase subunit 2 | NADH2 | ND2; nad2; nad2-i2; NADH-2; NADH 2 | 
| NADH dehydrogenase subunit 3 | NADH3 | ND3; nad3; NADH-3; ndh3; NADH 3; nad3-c; nad3-b; nad3-a | 
| NADH dehydrogenase subunit 4 | NADH4 | NADH 4L; NADH 4; nad4b; nad4a; ND4L; nad4L-2; NADH-4L; ND4; NADH4L; nad4l; nad4; NADH-4; Nad4L_1; nad4L-1; Nad4L_2; ND4-0; nad4-3; nad4-1; nad4-2 | 
| NADH dehydrogenase subunit 5 | NADH5 | ND5; nad5; NADH-5; NADH 5; nad5-i6; NADH-dehydrogenase subunit 5, ND5; nad5-I1; nad5-I2; nd5-i1 | 
| Maturase K | matk | matK-like | 
| Histone H3 | H3 | H33_4; H32_5; H32_3 | 
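
A synonym table like Table 1 is typically used to map the labels found in the source repositories onto one abbreviation per marker. The sketch below shows one way this could work, using a small subset of the synonyms from the table. The `SYNONYMS` dictionary, the `normalize_marker` function, and the case-insensitive matching are assumptions for illustration, not a description of Balearica's pipeline.

```python
# Subset of Table 1: abbreviation -> synonyms seen in public repositories.
SYNONYMS = {
    "COI": ["COX1", "COI-5P", "COXI", "CO1"],
    "COII": ["COX2", "COXII", "CO2"],
    "cytochrome b": ["CYTB", "Cyt.b", "Cyt b", "Cyt-b"],
    "NADH1": ["ND1", "nad1", "NADH-1"],
}

# Reverse lookup: every known spelling (lower-cased) -> abbreviation.
LOOKUP = {}
for abbreviation, names in SYNONYMS.items():
    for name in [abbreviation, *names]:
        LOOKUP[name.casefold()] = abbreviation


def normalize_marker(label):
    """Return the abbreviation for a source label, or None if unknown."""
    return LOOKUP.get(label.strip().casefold())


coi = normalize_marker("cox1")
cytb = normalize_marker(" Cyt-b ")
unknown = normalize_marker("EF1a")
```

Labels that are not in the lookup return `None`, so unrecognised markers can be reviewed rather than silently assigned.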
GenBank Marker string
Our query strings used to download genetic information from GenBank can be broken down into two main sections:
1) Organism Search: The first part of the query specifies the organism. This is represented as 'taxa'[Organism], where 'taxa' is substituted with the desired organism name.
2) Genetic Marker Search: The second part of the query specifies the genetic marker. This is represented as 'marker'[GENE], where 'marker' is replaced with the marker name (e.g., "16S rRNA", "cytb", "atp6").
GenBank strings
Query strings used (in each case the taxon name is placed before [Organism]):
- Ribosomal RNA (rRNA): '[Organism] AND ("12s rrna"[GENE] OR "16s rrna"[GENE] OR "18s rrna"[GENE] OR "5.8s rrna"[GENE] OR "5,8s rrna"[GENE])'
- Cytochrome b (cytb): '[Organism] AND ("cyt-b"[GENE] OR "cytb"[GENE] OR "cytochrome-b"[GENE])'
- RuBisCO large subunit (rbcL): '[Organism] AND ("rubisco"[GENE] OR "rbcl"[GENE])'
- Internal Transcribed Spacer region (ITS): '[Organism] AND ("its"[GENE] OR "its1"[GENE] OR "its2"[GENE])'
- Cytochrome c oxidase subunit I (COI): '[Organism] AND ("coi"[GENE] OR "co1"[GENE] OR "cox"[GENE] OR "COX1"[GENE] OR "coxi"[GENE])'
- Cytochrome c oxidase subunit II (COII): '[Organism] AND ("cox2"[GENE] OR "coii"[GENE] OR "co2"[GENE] OR "coxii"[GENE])'
- ATP synthase subunits 6 and 8: '[Organism] AND ("atp6"[GENE] OR "atp8"[GENE])'
- NADH dehydrogenase (complex I) subunits: '[Organism] AND ("nad1"[GENE] OR "nad2"[GENE] OR "nad3"[GENE] OR "nad4"[GENE] OR "nad5"[GENE] OR "nadh1"[GENE] OR "nadh2"[GENE] OR "nadh3"[GENE] OR "nadh4"[GENE] OR "nadh5"[GENE] OR "nd1"[GENE] OR "nd2"[GENE] OR "nd3"[GENE] OR "nd4"[GENE] OR "nd5"[GENE])'
- Histone H3 and matK: '[Organism] AND ("histone3"[GENE] OR "h3"[GENE] OR "hist3"[GENE] OR "matk"[GENE])'
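
The strings above all follow the same pattern, so they can be assembled from a taxon name and a list of gene terms. The sketch below only builds the text of the query and makes no request to GenBank. The `build_query` function and the `MARKER_GROUPS` dictionary are hypothetical, and wrapping the taxon name in double quotes is an assumption about how the name is substituted.

```python
# A few of the gene-term groups listed above (subset for illustration).
MARKER_GROUPS = {
    "cytb": ["cyt-b", "cytb", "cytochrome-b"],
    "rbcL": ["rubisco", "rbcl"],
    "ATP": ["atp6", "atp8"],
}


def build_query(taxon, gene_terms):
    """Join the organism part and the OR-ed gene terms with AND."""
    genes = " OR ".join(f'"{term}"[GENE]' for term in gene_terms)
    # Assumption: the taxon name is quoted before the [Organism] tag.
    return f'"{taxon}"[Organism] AND ({genes})'


query = build_query("Aeshna mixta", MARKER_GROUPS["cytb"])
print(query)
```

The resulting text could then be used as the search term in a GenBank nucleotide search.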
Shared Genetic Information Between GenBank and BOLDSystem
Many BOLDSystem sequences are retrieved from GenBank and are identified by the label "Mined from GenBank, NCBI". To avoid duplication, Balearica includes only the GenBank record for these sequences. This ensures that genetic information is not stored redundantly while maintaining the integrity and accuracy of the dataset.
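A minimal sketch of this deduplication rule is shown below, assuming each BOLDSystem record exposes the label in some metadata field. The field name `institution`, the record IDs, and the `merge_sources` function are hypothetical and used only to illustrate the rule.

```python
MINED_LABEL = "Mined from GenBank, NCBI"

# Hypothetical records; the field carrying the label is an assumption.
bold_records = [
    {"external_id": "BOLD-001", "institution": "Example Museum"},
    {"external_id": "BOLD-002", "institution": "Mined from GenBank, NCBI"},
]
genbank_records = [{"external_id": "GB-001"}]


def merge_sources(genbank, bold, label_field="institution"):
    """Keep all GenBank records and only BOLD records not mined from GenBank."""
    kept_bold = [r for r in bold if r.get(label_field) != MINED_LABEL]
    return genbank + kept_bold


merged_ids = [r["external_id"] for r in merge_sources(genbank_records, bold_records)]
```

The record `BOLD-002` is dropped because its sequence is already represented by the GenBank copy.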
[1] E. W. Sayers, J. Beck, E. E. Bolton, J. R. Brister, J. Chan, D. C. Comeau, R. Connor, M. DiCuccio, C. M. Farrell, M. Feldgarden, A. M. Fine, K. Funk, E. Hatcher, M. Hoeppner, M. Kane, S. Kannan, K. S. Katz, C. Kelly, W. Klimke, S. Kim, and S. T. … Sherry. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 52(D1):D33–D43, 2024. doi:10.1093/nar/gkad1044.
[2] Dennis A. Benson, Mark Cavanaugh, Karen Clark, Ilene Karsch-Mizrachi, David J. Lipman, James Ostell, and Eric W. Sayers. GenBank. Nucleic Acids Research, 41(D1):D36–D42, 2012. doi:10.1093/nar/gks1195.
[3] S. Ratnasingham, C. Wei, D. Chan, J. Agda, J. Agda, L. Ballesteros-Mejia, H. Ait Boutou, Z. M. El Bastami, E. Ma, R. Manjunath, D. Rea, C. Ho, A. Telfer, J. McKeown, M. Rahulan, C. Steinke, J. Dorsheimer, M. Milton, and P. D. N. Hebert. BOLD v4: a centralized bioinformatics platform for DNA-based biodiversity data. In DNA Barcoding: Methods and Protocols, chapter 26, pages 403–441. Springer US, New York, NY, 2024.