Taxon genetics

This section provides an overview of the genetic information for species found within the Balearic region. The genetic data are sourced from two major public databases: the National Center for Biotechnology Information's GenBank¹,² and the Barcode of Life Data Systems (BOLDSystem)³. Note that the genetic information in Balearica covers the species reported within the Balearic Archipelago, but it includes sequence counts and IDs from around the world. If you are interested only in genetic data from samples collected within the Balearic area, activate the "Balearic Islands only" toggle to filter the results. Be aware that Balearica relies on the metadata available at the source to assign the geographic origin of a given genetic sequence. Balearic samples may therefore be underrepresented when detailed geographic information is missing from the public portals.

We are constantly updating the platform with new species sequences. However, please note that some information may still be incomplete. If you come across any gaps or inconsistencies, we encourage you to contact us.

Warning

When the "Balearic Islands only" toggle is activated, the filter includes only taxa whose metadata provide geographic coordinates located within the Balearic area.
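The coordinate-based filter can be pictured with the minimal sketch below. It is an illustration only, not Balearica's implementation: the bounding box limits, the record fields `lat` and `lon`, and the function name are all assumptions made for the example. It also shows why a sequence without coordinates drops out of the filtered view.

```python
from typing import Optional, TypedDict

# Rough, illustrative bounding box around the Balearic Islands (decimal degrees).
# These limits are an assumption for the example, not the platform's actual area.
BALEARIC_BBOX = {"lat_min": 38.5, "lat_max": 40.2, "lon_min": 1.0, "lon_max": 4.5}


class SequenceRecord(TypedDict):
    external_id: str          # placeholder identifier
    lat: Optional[float]      # None when the source gives no coordinates
    lon: Optional[float]


def in_balearic_area(record: SequenceRecord) -> bool:
    """Return True only when the record has coordinates inside the box."""
    lat, lon = record["lat"], record["lon"]
    if lat is None or lon is None:
        return False  # missing metadata: the record cannot be assigned to the area
    return (
        BALEARIC_BBOX["lat_min"] <= lat <= BALEARIC_BBOX["lat_max"]
        and BALEARIC_BBOX["lon_min"] <= lon <= BALEARIC_BBOX["lon_max"]
    )


records: list[SequenceRecord] = [
    {"external_id": "A", "lat": 39.6, "lon": 2.9},    # inside the box
    {"external_id": "B", "lat": 48.8, "lon": 2.3},    # outside the box
    {"external_id": "C", "lat": None, "lon": None},   # no coordinates reported
]
balearic_only = [r["external_id"] for r in records if in_balearic_area(r)]
print(balearic_only)
```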

List of genetic metadata

Balearica includes data from the genetic markers most commonly used for taxonomic and systematic purposes (Table 1). Users can filter the data by selecting the marker of interest.

An innovative feature offered by Balearica is the ability to explore genetic information across different taxonomic levels. This feature enhances the flexibility of your search by providing not only the information for the specific taxon you are interested in but also for the taxonomic entities that lie below it. For example, when searching for the genus Aeshna, you will be able to access all genetic data from the lower taxonomic levels within that genus, such as species or subspecies.
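As a rough sketch of how such a search could gather a taxon together with everything below it, the example uses a small child-to-parent mapping. The taxon names other than Aeshna are placeholders, and the data structure and function name are hypothetical; the platform's actual storage is not described here.

```python
from collections import defaultdict

# Hypothetical child -> parent links; names below the genus are placeholders.
PARENT = {
    "Aeshna sp. A": "Aeshna",
    "Aeshna sp. B": "Aeshna",
    "Aeshna sp. B subsp. x": "Aeshna sp. B",
    "Aeshna": "Aeshnidae",
}

# Invert the mapping so each taxon lists its direct children.
CHILDREN = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)


def taxon_and_descendants(taxon):
    """Return the taxon followed by every taxon below it (depth-first)."""
    result = [taxon]
    for child in CHILDREN.get(taxon, []):
        result.extend(taxon_and_descendants(child))
    return result


scope = taxon_and_descendants("Aeshna")
print(scope)
```

Genetic entries attached to any name in `scope` would then be shown for a search on the genus.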

The genetic metadata list also includes a column labeled "External ID", which corresponds to the identifier reported in the original source (listed in the "Source" column).

Warning

The "Total" button in the genetic metadata list shows the overall number of entries stored in the platform, not the sum of the counts for the individual markers. The two can differ because a single genetic entry can contain multiple markers.

For example, a given taxon can have three entries:

Entry 1 (marker: COI)
Entry 2 (markers: COI, COII)
Entry 3 (markers: COI, COII)

In this case: Total = 3; COI = 3; COII = 2
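The same arithmetic can be reproduced in a few lines of Python. The dictionary layout is an assumption made for illustration; only the three entries and their markers come from the example above.

```python
from collections import Counter

# The three entries from the example; the field names are illustrative.
entries = [
    {"id": "Entry 1", "markers": ["COI"]},
    {"id": "Entry 2", "markers": ["COI", "COII"]},
    {"id": "Entry 3", "markers": ["COI", "COII"]},
]

# "Total" counts entries, regardless of how many markers each one carries.
total = len(entries)

# Per-marker counts: each entry contributes once to every marker it contains.
per_marker = Counter(marker for entry in entries for marker in set(entry["markers"]))

print(total, dict(per_marker))
```

Here the per-marker counts add up to 5, while the total is 3, because Entries 2 and 3 are counted under both COI and COII.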

Table 1. Genetic markers, their abbreviations, and their synonyms in the public repositories accessed.

Full name	Abbreviation	List of synonyms
12S ribosomal RNA	12S rRNA	rrn12S; 12S_rRNA; 12S ribosomal RNA; rRNA 12S; 12SrRNA; 12S
16S ribosomal RNA	16S rRNA	16S_rRNA; 16SribosomalRNA; 16SrDNA; 16S rRNA; rRNA 16S; rrn16S; 16SrRNA; 16S
18S ribosomal RNA	18S rRNA	rRNA-18S; 18S-5P; 18Srrn; rRNA_18s; 18srRNA; rrn18S; 18S-3P; 18S-V4; 18S
5.8S ribosomal RNA	5.8S rRNA	5-8S; 5 8S rRNA
Cytochrome b	cytochrome b	CYTB; Cyt.b; Cyt b; Cyt-b; CYTB5E
Ribulose-1,5-bisphosphate carboxylase large subunit	RubisCo	rbcLa; rbcL; rbcL-like; rbcL-ct
Internal Transcribed Spacer	ITS	ITS-2; ITS-1; ITS1; ITS1 rRNA; ITS2 rRNA; ITS2
Cytochrome oxidase	COX	COI-LIKE; COX6B; COX19-1; COX19-2; COX15; COX15_1; COX6A; COX6B2; COX11; COX10; COX6B-1_0; Cox4i1; COII-COIII
Cytochrome C Oxidase Subunit I	COI	COX1; COI-5P; COXI; CO1; -CO1; CO I; Cox1_1; Cox1_2; COI-3P; COI-5PNMT1; cox1-1; cox1-2; cox1-i5
Cytochrome C Oxidase Subunit II	COII	COX2; COXII; CO2; CO II; cytochrome c oxidase subunit II; cox2b; cox2-1; cox2-2; cCOII
Cytochrome C Oxidase Subunit III	COIII	COX3; COXIII; CO III; cytochrome c oxidase subunit III; cox3-2; cox3-1
ATP synthase F0 subunit 6	ATP6	Atp6-ps; atp6-2; atp6-1
ATP synthase F0 subunit 8	ATP8	Atp8_2; Atp8_1; ATP8A2; ATP8B4; ATP8B2; atp8-1; atp8-2
NADH dehydrogenase subunit 1	NADH1	ND1; nad1; NADH-1; NADH 1; nd1-i2
NADH dehydrogenase subunit 2	NADH2	ND2; nad2; nad2-i2; NADH-2; NADH 2
NADH dehydrogenase subunit 3	NADH3	ND3; nad3; NADH-3; ndh3; NADH 3; nad3-c; nad3-b; nad3-a
NADH dehydrogenase subunit 4	NADH4	NADH 4L; NADH 4; nad4b; nad4a; ND4L; nad4L-2; NADH-4L; ND4; NADH4L; nad4l; nad4; NADH-4; Nad4L_1; nad4L-1; Nad4L_2; ND4-0; nad4-3; nad4-1; nad4-2
NADH dehydrogenase subunit 5	NADH5	ND5; nad5; NADH-5; NADH 5; nad5-i6; NADH-dehydrogenase subunit 5, ND5; nad5-I1; nad5-I2; nd5-i1
Maturase K	matk	matK-like
Histone H3	H3	H33_4; H32_5; H32_3
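One way to picture how the synonyms in Table 1 collapse onto a single abbreviation is a lookup table, sketched below with a few rows from the table. The case-insensitive matching, the names `SYNONYMS` and `normalize_marker`, and the choice to return `None` for unknown labels are assumptions for illustration, not a description of Balearica's code.

```python
# A few rows of Table 1: abbreviation -> synonyms found in the repositories.
SYNONYMS = {
    "COI": ["COX1", "COI-5P", "COXI", "CO1"],
    "COII": ["COX2", "COXII", "CO2"],
    "12S rRNA": ["rrn12S", "12S_rRNA", "12S"],
}

# Build a flat lookup; matching ignores case and surrounding whitespace
# (an assumption made for this sketch).
LOOKUP = {}
for abbreviation, synonyms in SYNONYMS.items():
    LOOKUP[abbreviation.casefold()] = abbreviation
    for synonym in synonyms:
        LOOKUP[synonym.casefold()] = abbreviation


def normalize_marker(label):
    """Map a repository label to its Table 1 abbreviation, or None if unknown."""
    return LOOKUP.get(label.strip().casefold())


print(normalize_marker("cox1"), normalize_marker("COI-5P"), normalize_marker("rrn12S"))
```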

GenBank Marker string

Our query strings used to download genetic information from GenBank can be broken down into two main sections:

1) Organism Search: The first part of the query specifies the organism. This is represented as 'taxa'[Organism], where 'taxa' is substituted with the desired organism name.

2) Genetic Marker Search: The second part of the query focuses on the genetic marker. This is represented as 'marker'[GENE], where 'marker' is substituted with the marker name (e.g., "16S rRNA", "cytb", "atp6").
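The two parts can be assembled programmatically, as in the sketch below. The function name and the double quotes placed around the organism name are assumptions; the `[Organism]` and `[GENE]` field tags and the `AND`/`OR` structure follow the description above.

```python
def build_genbank_query(taxon: str, marker_terms: list[str]) -> str:
    """Combine an organism clause with an OR-joined list of gene terms."""
    # Part 2: one "term"[GENE] clause per marker spelling, joined with OR.
    gene_clause = " OR ".join(f'"{term}"[GENE]' for term in marker_terms)
    # Part 1: the organism clause, then AND with the gene clause.
    return f'"{taxon}"[Organism] AND ({gene_clause})'


query = build_genbank_query("Aeshna", ["cyt-b", "cytb", "cytochrome-b"])
print(query)
```

A string built this way could then be submitted to an NCBI Entrez search, for example through the E-utilities `esearch` endpoint.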

GenBank strings

Query strings used (in each, the organism name precedes [Organism]):

Ribosomal RNA (rRNA): '[Organism] AND ("12s rrna"[GENE] OR "16s rrna"[GENE] OR "18s rrna"[GENE] OR "5.8s rrna"[GENE] OR "5,8s rrna"[GENE])'
Cytochrome b (cytb): '[Organism] AND ("cyt-b"[GENE] OR "cytb"[GENE] OR "cytochrome-b"[GENE])'
RuBisCO large subunit (rbcL): '[Organism] AND ("rubisco"[GENE] OR "rbcl"[GENE])'
Internal Transcribed Spacer region (ITS): '[Organism] AND ("its"[GENE] OR "its1"[GENE] OR "its2"[GENE])'
Cytochrome c oxidase subunit I (COI): '[Organism] AND ("coi"[GENE] OR "co1"[GENE] OR "cox"[GENE] OR "COX1"[GENE] OR "coxi"[GENE])'
Cytochrome c oxidase subunit II (COII): '[Organism] AND ("cox2"[GENE] OR "coii"[GENE] OR "co2"[GENE] OR "coxii"[GENE])'
ATP synthase subunits 6 and 8: '[Organism] AND ("atp6"[GENE] OR "atp8"[GENE])'
NADH dehydrogenase (complex I) subunits: '[Organism] AND ("nad1"[GENE] OR "nad2"[GENE] OR "nad3"[GENE] OR "nad4"[GENE] OR "nad5"[GENE] OR "nadh1"[GENE] OR "nadh2"[GENE] OR "nadh3"[GENE] OR "nadh4"[GENE] OR "nadh5"[GENE] OR "nd1"[GENE] OR "nd2"[GENE] OR "nd3"[GENE] OR "nd4"[GENE] OR "nd5"[GENE])'
Histone H3 and matK: '[Organism] AND ("histone3"[GENE] OR "h3"[GENE] OR "hist3"[GENE] OR "matk"[GENE])'

Shared Genetic Information Between GenBank and BOLDSystem

Many BOLDSystem sequences are retrieved from GenBank, and are identified by the label "Mined from GenBank, NCBI". To avoid data duplication, we include only the records provided by GenBank in Balearica. This ensures that genetic information is not redundantly stored while maintaining the integrity and accuracy of the dataset.
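The deduplication rule can be sketched as a simple filter. The record layout, the `source_label` field, and the placeholder identifiers are hypothetical; only the label text "Mined from GenBank, NCBI" comes from the description above.

```python
MINED_LABEL = "Mined from GenBank, NCBI"


def merge_sources(genbank_records, bold_records):
    """Keep every GenBank record plus the BOLD records not mined from GenBank."""
    bold_original = [
        record for record in bold_records
        if MINED_LABEL not in record.get("source_label", "")
    ]
    return genbank_records + bold_original


# Placeholder records; the identifiers are not real accession numbers.
genbank = [{"external_id": "GB-0001", "source": "GenBank"}]
bold = [
    {"external_id": "BOLD-0001", "source": "BOLDSystem", "source_label": MINED_LABEL},
    {"external_id": "BOLD-0002", "source": "BOLDSystem", "source_label": ""},
]

merged_ids = [record["external_id"] for record in merge_sources(genbank, bold)]
print(merged_ids)
```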

¹ E. W. Sayers, J. Beck, E. E. Bolton, J. R. Brister, J. Chan, D. C. Comeau, R. Connor, M. DiCuccio, C. M. Farrell, M. Feldgarden, A. M. Fine, K. Funk, E. Hatcher, M. Hoeppner, M. Kane, S. Kannan, K. S. Katz, C. Kelly, W. Klimke, S. Kim, and S. T. … Sherry. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 52(D1):D33–D43, 2024. doi:10.1093/nar/gkad1044.
² D. A. Benson, M. Cavanaugh, K. Clark, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and E. W. Sayers. GenBank. Nucleic Acids Research, 41(D1):D36–D42, 2012. doi:10.1093/nar/gks1195.
³ S. Ratnasingham, C. Wei, D. Chan, J. Agda, J. Agda, L. Ballesteros-Mejia, H. Ait Boutou, Z. M. El Bastami, E. Ma, R. Manjunath, D. Rea, C. Ho, A. Telfer, J. McKeown, M. Rahulan, C. Steinke, J. Dorsheimer, M. Milton, and P. D. N. Hebert. BOLD v4: a centralized bioinformatics platform for DNA-based biodiversity data. In DNA Barcoding: Methods and Protocols, chapter 26, pages 403–441. Springer US, New York, NY, 2024.