Taxonomic Occurrence Services

A taxon record stores the hierarchical classification of a taxon, its scientific names and synonyms, and other relevant data retrieved from the respective source system. All components and data types in the taxon model, as well as a comprehensive list of all taxon-related endpoints, are documented in the API endpoint reference. The available fields can also be retrieved from:

https://api.biodiversitydata.nl/v2/taxon/metadata/getFieldInfo

Base URL

The base URL for taxon-specific services is https://api.biodiversitydata.nl/v2/taxon

Data Source Systems

Currently, Naturalis provides taxon data from these sources:

Catalogue of Life (COL)
Dutch Species Register (NSR)
Dutch Caribbean Species Register (DCSR)

The field sourceSystem.code stores the source system of a taxon (COL, NSR, and DCSR respectively).

Available Services

Query

Querying for taxonomic data can be done using the /taxon/query/ endpoint, which accepts human-readable query strings and JSON encoded QuerySpec parameters.
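As a minimal sketch of the human-readable form, the snippet below builds a query URL from field/value pairs using only the Python standard library. The helper names build_query_url and fetch_json are hypothetical and not part of the API; the field name is the one used in the examples further down this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.biodiversitydata.nl/v2/taxon"


def build_query_url(conditions):
    """Build a /taxon/query/ URL with one query parameter per field condition."""
    return f"{BASE_URL}/query/?{urlencode(conditions)}"


def fetch_json(url):
    """Hypothetical helper: GET the URL and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


url = build_query_url({"defaultClassification.genus": "Crocus"})
print(url)
# result = fetch_json(url)  # uncomment to actually send the request
```

The request itself is left commented out so that the snippet can be run without network access.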

Retrieving large quantities of data

Note that the query service returns at most 10,000 records per query. For larger quantities, we offer a /taxon/download service, which returns the data as a gzipped JSON stream.
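The choice between the two services could be sketched as follows. This is an illustration under assumptions: we assume that /taxon/download accepts the same query parameters as /taxon/query, and the helpers pick_service, build_url and save_stream are hypothetical names. The expected record count could, for instance, come from the count service described under Aggregation.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.biodiversitydata.nl/v2/taxon"
MAX_QUERY_RECORDS = 10_000  # record limit of the query service


def pick_service(expected_records):
    """Return 'query' while the result fits the limit, otherwise 'download'."""
    return "query" if expected_records <= MAX_QUERY_RECORDS else "download"


def build_url(service, conditions):
    """Assumption: both services take the same human-readable parameters."""
    return f"{BASE_URL}/{service}/?{urlencode(conditions)}"


def save_stream(url, path):
    """Copy the (gzipped) response body to a local file without unpacking it."""
    with urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)


service = pick_service(25_000)
print(build_url(service, {"sourceSystem.code": "COL"}))
```

Writing the stream to disk unchanged avoids holding a large result in memory; the file can be unpacked afterwards with the gzip module.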

Data Access

Several access methods retrieve taxa that match a given identifier. The services /taxon/find/ and /taxon/findByIds/ retrieve taxa according to their id fields (see the Identifiers section below).
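A small sketch of how such lookups might be addressed. We assume here that the id is appended to the path and that /taxon/findByIds/ takes a comma-separated list; please check the endpoint reference for the exact form. The ids shown are made up for illustration.

```python
from urllib.parse import quote

BASE_URL = "https://api.biodiversitydata.nl/v2/taxon"


def find_url(taxon_id):
    """Assumed form: /taxon/find/{id}."""
    return f"{BASE_URL}/find/{quote(taxon_id, safe='@')}"


def find_by_ids_url(taxon_ids):
    """Assumed form: /taxon/findByIds/{id1},{id2},..."""
    joined = ",".join(quote(taxon_id, safe="@") for taxon_id in taxon_ids)
    return f"{BASE_URL}/findByIds/{joined}"


# Hypothetical ids, following the {sourceSystemId}@{sourceSystem.code} pattern
print(find_url("12345@NSR"))
print(find_by_ids_url(["12345@NSR", "67890@COL"]))
```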

Aggregation

A count aggregation of query results (using query parameters or a QuerySpec object) can be done using the /taxon/count/ endpoint.

For a specific field, /taxon/getDistinctValues/ returns all distinct values that occur in the data for that field.

Nested aggregation over two fields can be done with /taxon/getDistinctValuesPerGroup/.

/taxon/countDistinctValues/ and /taxon/countDistinctValuesPerGroup/ do the same as the above, but return only the counts instead of any data.

For more information and examples on aggregation queries, please also refer to the advanced queries section.
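The aggregation endpoints above could be addressed along these lines. This is a sketch under assumptions: we assume the field names are passed as path segments (group first, then field), and all helper names are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.biodiversitydata.nl/v2/taxon"


def count_url(conditions):
    """Count the records matching the given field conditions."""
    return f"{BASE_URL}/count/?{urlencode(conditions)}"


def distinct_values_url(field, counts_only=False):
    """Distinct values of one field, or only their count."""
    service = "countDistinctValues" if counts_only else "getDistinctValues"
    return f"{BASE_URL}/{service}/{field}"


def distinct_values_per_group_url(group, field, counts_only=False):
    """Nested aggregation: distinct values of `field` per value of `group`."""
    service = "countDistinctValuesPerGroup" if counts_only else "getDistinctValuesPerGroup"
    return f"{BASE_URL}/{service}/{group}/{field}"


print(count_url({"defaultClassification.genus": "Crocus"}))
print(distinct_values_url("sourceSystem.code"))
print(distinct_values_per_group_url("sourceSystem.code", "defaultClassification.genus", counts_only=True))
```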

DwCA Download

Download services offer bulk retrieval of taxonomic data. Instead of JSON format, download services return zip files containing the data. The zip files are formatted according to the Darwin Core archive standard for the exchange of biodiversity data (also see below). While collection download services offer pre-compiled datasets, dynamic download services produce Darwin Core archives for the results of any query for taxon or specimen data types.

Species Collection DwCA Download

The endpoint for downloading a collection of taxa (species) is /taxon/dwca/getDataSet/, followed by the name of a specific dataset. The names of predefined datasets can be retrieved with the endpoint /taxon/dwca/getDataSetNames/.

Dynamic DwCA Download

Dynamic download queries follow the same syntax as regular queries with the query endpoint. Suppose we have a simple query for taxa that are in the genus Crocus:

https://api.biodiversitydata.nl/v2/taxon/query/?defaultClassification.genus=Crocus

Inserting the path segment dwca before query returns the zipped archive instead:

https://api.biodiversitydata.nl/v2/taxon/dwca/query/?defaultClassification.genus=Crocus
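The rewrite from a regular query URL to its dynamic download counterpart can be sketched in a few lines. The function name to_dwca_url is hypothetical; the two URLs are the ones shown above.

```python
def to_dwca_url(query_url):
    """Turn a /taxon/query/ URL into the matching /taxon/dwca/query/ URL."""
    marker = "/taxon/query/"
    if marker not in query_url:
        raise ValueError("not a taxon query URL: " + query_url)
    # Replace only the first occurrence so the query string stays untouched
    return query_url.replace(marker, "/taxon/dwca/query/", 1)


query_url = "https://api.biodiversitydata.nl/v2/taxon/query/?defaultClassification.genus=Crocus"
print(to_dwca_url(query_url))
```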

Darwin Core Archives

The following files are contained in the zip archives generated by the download services:

A core data file in CSV format named Taxa.txt. This file contains a tabular representation of the data, with the first row defining the column names.
A descriptor file named meta.xml, which maps the columns in the core data file to their respective TDWG terms. Each column in the data is thus mapped to a specific concept defined by the TDWG consortium.
A metadata file named eml.xml, formatted according to the Ecological Metadata Language (EML) specification. Metadata in this file includes a description of the dataset and details about the source institution.
Extension data: the CSV file Vernacular_names.txt gives information about common taxon names. The mapping of its columns to TDWG terms is also provided in meta.xml.
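A downloaded archive with the layout described above could be read as sketched below, using only the Python standard library. We assume a comma-delimited, UTF-8 encoded core file and a meta.xml that uses the namespace of the Darwin Core text guidelines; the function names are hypothetical.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by Darwin Core archive descriptor files
DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def read_core_rows(archive, core_name="Taxa.txt", delimiter=","):
    """Return the core data file as a list of dicts keyed by column name."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(core_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter=delimiter))


def read_core_terms(archive):
    """Return {column index: term URI} for the core file, read from meta.xml."""
    with zipfile.ZipFile(archive) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
    core = root.find("dwc:core", DWC_TEXT_NS)
    if core is None:
        return {}
    return {
        int(field.get("index")): field.get("term")
        for field in core.findall("dwc:field", DWC_TEXT_NS)
        if field.get("index") is not None
    }
```

Both functions accept a file path or a file-like object, so they also work on an archive that is held in memory.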

Metadata

Metadata services provide miscellaneous information about taxon records. This includes detailed information about a taxon's fields and paths. A description of all taxon metadata services can be found here.

Identifiers

The field sourceSystemId of a taxon holds the identifier used in the source database. A unique identifier of the form {sourceSystemId}@{sourceSystem.code} is stored in the field id. The recordURI is a direct link to the database entry in the source system.
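The composition of the id field can be illustrated with two small helpers. Both function names and the example identifier are hypothetical; only the {sourceSystemId}@{sourceSystem.code} pattern comes from the description above.

```python
def make_id(source_system_id, source_system_code):
    """Compose a record id as {sourceSystemId}@{sourceSystem.code}."""
    return f"{source_system_id}@{source_system_code}"


def split_id(taxon_id):
    """Split a record id into (sourceSystemId, sourceSystem.code)."""
    # Split at the last '@' in case the source identifier itself contains one
    source_system_id, separator, code = taxon_id.rpartition("@")
    if not separator:
        raise ValueError("id has no '@' separator: " + taxon_id)
    return source_system_id, code


taxon_id = make_id("12345", "COL")  # made-up identifier
print(taxon_id)
print(split_id(taxon_id))
```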

Classification

The taxon is classified according to its source system. Taxon records contain two types of classification: systemClassification and defaultClassification. The systemClassification is the verbatim classification as found in the taxonomic source system. During the import process, this classification is converted to satisfy the biodiversity information standards of the Taxonomic Databases Working Group (TDWG, see here). The TDWG-compliant classification is termed defaultClassification here.

Names, Descriptions and Synonyms

Each taxon has an acceptedName that represents this taxon in the source system. The acceptedName block stores additional taxonomic information, such as species and genus names and the authors who coined the taxon name. For example, we can retrieve all taxa that have been described by Linnaeus:

https://api.biodiversitydata.nl/v2/taxon/query/?acceptedName.author=Linnaeus

Furthermore, taxon records can have a list of synonyms, descriptions, references and vernacular names (common names). Suppose we would like to search for passion flowers, without having any prior knowledge (e.g. without knowing that their genus is Passiflora):

{
  "conditions": [
    {
      "field": "vernacularNames.name",
      "operator": "MATCHES",
      "value": "passion flower"
    }
  ],
  "size": 100
}

The result set contains at least three species of the genus Passiflora. Note that our datasets also include vernacular names in different languages; searching for the term “passiebloem” also yields a Passiflora taxon.
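To send the QuerySpec shown above to the query endpoint, the JSON document has to be URL-encoded. The sketch below assumes that it is passed in a query parameter named _querySpec; this parameter name is an assumption and should be checked against the endpoint reference. The helper name is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.biodiversitydata.nl/v2/taxon"

query_spec = {
    "conditions": [
        {
            "field": "vernacularNames.name",
            "operator": "MATCHES",
            "value": "passion flower",
        }
    ],
    "size": 100,
}


def build_query_spec_url(spec, param_name="_querySpec"):
    """Serialise the QuerySpec to compact JSON and URL-encode it.

    The parameter name is an assumption, not confirmed by this page.
    """
    encoded = urlencode({param_name: json.dumps(spec, separators=(",", ":"))})
    return f"{BASE_URL}/query/?{encoded}"


url = build_query_spec_url(query_spec)
print(url)

# Decode the parameter again to confirm that the round trip is lossless
decoded = json.loads(parse_qs(urlsplit(url).query)["_querySpec"][0])
assert decoded == query_spec
```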