Specimen occurrence services

Specimen records constitute the core of the data served by the NBA. Museum objects and observations can represent a wide variety of entities, such as plants, animals or single parts thereof, DNA samples, fossils, rocks or meteorites. Species occurrences are also described as specimens in our data. The specimens in our document store are therefore described with an extensive data model. All components and data types in the Specimen model, as well as a comprehensive list of all specimen-related endpoints, are documented in the API endpoint reference. Below, the major components of specimen records are introduced, with examples of how to query them. A list of available fields can be found at

https://api.biodiversitydata.nl/v2/specimen/metadata/getFieldInfo

Base URL

The base URL for specimen-specific services is https://api.biodiversitydata.nl/v2/specimen

Data source systems

Specimen occurrence data are harvested from four main data sources: (i) the CRS (Collection Registration System for zoological and geological specimens), (ii) BRAHMS (http://herbaria.plants.ox.ac.uk/bol/) for botanical specimens including fungi, (iii) the Xeno-canto database of bird sounds, and (iv) species occurrences from Observation International. The source system of each record is stored in the path sourceSystem.code. The query

https://api.biodiversitydata.nl/v2/specimen/query/?sourceSystem.code=BRAHMS

will return all plant and fungi specimens.
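As a minimal sketch, a human-readable query URL like the one above can be assembled from field/value pairs. The helper name simple_query_url is hypothetical and not part of the NBA; only the base URL and the sourceSystem.code field come from this page.

```python
from urllib.parse import urlencode

# Base URL for specimen services, as given in this document.
BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"


def simple_query_url(params: dict[str, str]) -> str:
    """Build a human-readable query URL from field/value pairs.

    Hypothetical helper for illustration; values are URL-encoded.
    """
    return f"{BASE_URL}/query/?{urlencode(params)}"


print(simple_query_url({"sourceSystem.code": "BRAHMS"}))
```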

Available services

Query

Querying for specimens can be done using the /specimen/query/ endpoint, which accepts human-readable query strings and JSON-encoded QuerySpec parameters.

Retrieving large quantities of data

Note that the query service returns at most 10,000 records per query. For larger quantities, we offer two services that can return the complete set of records matching your query: a download service and a BatchQuery service.

Download service

The download service returns records as a gzipped JSON stream. For example, to retrieve the entire botany collection, the /specimen/download service can be used with the query

{
  "conditions" : [
    { "field" : "collectionType", "operator" : "=", "value" : "Botany" }
  ]
}
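The following sketch shows how such a request could be prepared and how a gzipped response body could be decoded. It rests on two assumptions that this page does not confirm: that the download service accepts the QuerySpec through the _querySpec parameter, and that the decompressed stream holds one JSON document per line. Both helper names are hypothetical.

```python
import gzip
import json
from urllib.parse import urlencode

BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"


def download_url(query_spec: dict) -> str:
    """Build a download URL for a QuerySpec.

    Assumption: the service takes the spec via the _querySpec parameter.
    """
    encoded = urlencode({"_querySpec": json.dumps(query_spec)})
    return f"{BASE_URL}/download/?{encoded}"


def parse_gzipped_records(payload: bytes) -> list[dict]:
    """Decompress a gzipped payload and parse it into records.

    Assumption: one JSON document per line after decompression.
    """
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


botany = {
    "conditions": [
        {"field": "collectionType", "operator": "=", "value": "Botany"}
    ]
}
print(download_url(botany))
```

For a full collection, the response is likely too large to hold in memory, so a streaming decompressor would be a better fit than the one-shot gzip.decompress used here.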

BatchQuery service

To retrieve results in batches, use the BatchQuery service. This service returns the number of records indicated by the size parameter, as well as a token. The token can be used to retrieve subsequent batches until the returned result no longer contains a token, which means you have retrieved the last available batch. For example, to retrieve the entire botany collection, the /specimen/batchQuery service can be used with the initial query:

{
  "conditions" : [
    { "field" : "collectionType", "operator" : "=", "value" : "Botany" }
  ],
  "size" : 1000
}

The results will include a token, which can be used in subsequent requests using the parameter _token instead of the _querySpec parameter.
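The token loop described above could be sketched as follows. The response keys "resultSet" and "token" are assumptions about the JSON layout, and iter_batches and http_fetch are hypothetical names; the _querySpec and _token parameters are the ones named on this page. The fetch function is passed in so that the loop can be tried without network access.

```python
import json
import urllib.request
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"


def http_fetch(url: str) -> dict:
    """Fetch a URL and decode the JSON body (plain standard library)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_batches(query_spec: dict, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield records batch by batch until no token is returned.

    Assumption: each response carries its records under "resultSet"
    and, while more batches remain, a "token" entry.
    """
    params = {"_querySpec": json.dumps(query_spec)}
    while True:
        result = fetch(f"{BASE_URL}/batchQuery/?{urlencode(params)}")
        yield from result.get("resultSet", [])
        token = result.get("token")
        if not token:
            break  # last batch reached
        # Subsequent requests send the token instead of the QuerySpec.
        params = {"_token": token}
```

A real run would call iter_batches(spec, http_fetch) and consume the generator lazily, so that only one batch is held in memory at a time.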

Data access

Several access methods offer convenient retrieval of specimens that match a certain identifier or belong to a certain collection. The services /specimen/findByUnitID/, /specimen/find/ and /specimen/findByIds/ retrieve specimens according to their unitID or id fields (see Identifiers below). The service /specimen/getNamedCollections/ returns all available collections of species (e.g. Mammalia), and /specimen/getIdsInCollection/ returns the identifiers of all specimens that are part of a given collection.

Aggregation

A count aggregation of query results (using query parameters or a QuerySpec object) can be done using the /specimen/count/ endpoint.

For a specific field, /specimen/getDistinctValues/ returns all distinct values that occur in the data for that field.

Nested aggregation over two fields can be done with /specimen/getDistinctValuesPerGroup/.

/specimen/countDistinctValues/ and /specimen/countDistinctValuesPerGroup/ do the same as the above, but return only the counts instead of any data.

For more information and examples on aggregation queries, please also refer to the advanced queries section.

DwC-A download

Download services offer bulk retrieval of specimen occurrence data. Instead of JSON format, download services return zip files containing the data. The zip files are formatted according to the Darwin Core archive standard for the exchange of biodiversity data (also see below). While collection download services offer pre-compiled datasets, dynamic download services produce Darwin Core archives for the results of any query for taxon or specimen data types.

Specimen collection DwC-A download

The endpoint for specimen collection downloads is /specimen/dwca/getDataSet/ followed by the name of a specific dataset. The names of predefined datasets can be retrieved with the endpoint /specimen/dwca/getDataSetNames/. A dataset, for instance tunicata, can then be downloaded as follows:

https://api.biodiversitydata.nl/v2/specimen/dwca/getDataSet/tunicata

Dynamic DwC-A download

Dynamic download queries follow the same syntax as regular queries against the query endpoint. Suppose we have a simple query for specimens of the genus Crocus:

https://api.biodiversitydata.nl/v2/specimen/query/?identifications.defaultClassification.genus=Crocus

Simply adding the path dwca in front of query will return the zipped archive:

https://api.biodiversitydata.nl/v2/specimen/dwca/query/?identifications.defaultClassification.genus=Crocus
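The rewrite from a regular query URL to its dynamic download counterpart is mechanical, as this small sketch illustrates. The function name to_dwca_url is hypothetical; the two path segments are the ones shown above.

```python
def to_dwca_url(query_url: str) -> str:
    """Turn a /specimen/query/ URL into the matching /specimen/dwca/query/ URL."""
    marker = "/specimen/query/"
    if marker not in query_url:
        raise ValueError(f"not a specimen query URL: {query_url}")
    # Replace only the first occurrence so query values stay untouched.
    return query_url.replace(marker, "/specimen/dwca/query/", 1)


crocus = (
    "https://api.biodiversitydata.nl/v2/specimen/query/"
    "?identifications.defaultClassification.genus=Crocus"
)
print(to_dwca_url(crocus))
```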

Darwin Core archives

The zip archives generated by the download services contain the following files:

  • A core data file in csv format named Occurrence.txt. This file contains a tabular representation of the data with the first row defining the column names.
  • A descriptor file named meta.xml which maps the columns in the core data file to their respective TDWG term. Each column in the data is thus mapped to a specific concept termed by the TDWG consortium.
  • A metadata file named eml.xml formatted according to the Ecological Metadata Language (EML) specification. Metadata in this file includes a description of the dataset and details about the source institution.
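Once an archive has been downloaded, its core data file can be read with the standard library, for example as sketched below. The comma delimiter is an assumption based on the "csv format" description; the delimiter actually used is declared in meta.xml and should be checked there. The function name read_occurrences is hypothetical.

```python
import csv
import io
import zipfile


def read_occurrences(archive, delimiter: str = ",") -> list[dict]:
    """Read Occurrence.txt from a Darwin Core archive into a list of rows.

    `archive` may be a file path or a binary file-like object.
    Assumption: comma-delimited text with column names in the first row.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open("Occurrence.txt") as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter=delimiter))
```

Each row is returned as a dictionary keyed by the column names from the first row of the file.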

Metadata

Specimen metadata services provide miscellaneous information about specimen records. This includes detailed information about a specimen’s fields and paths. A description of all specimen metadata services can be found here.

Identifiers

Specimen records have several identifiers. The field unitID is the identifier from the specific source system. Since uniqueness across source systems is not ensured, the field id, consisting of {unitID}@{sourceSystem.code}, serves as the unique identifier within the NBA. Further, the field unitGUID holds a persistent uniform resource locator (PURL, see also PURL services).
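The composition of the id field can be illustrated with two small helpers. Both function names are hypothetical, and the unitID "EXAMPLE.123" is a made-up value rather than a real record.

```python
def specimen_id(unit_id: str, source_system_code: str) -> str:
    """Compose the id field as {unitID}@{sourceSystem.code}."""
    return f"{unit_id}@{source_system_code}"


def split_specimen_id(specimen_id_value: str) -> tuple[str, str]:
    """Split an id back into (unitID, sourceSystem.code).

    Splits on the last "@", so a unitID that itself contains "@" survives.
    """
    unit_id, _, code = specimen_id_value.rpartition("@")
    return unit_id, code


# "EXAMPLE.123" is a hypothetical unitID used only for illustration.
print(specimen_id("EXAMPLE.123", "CRS"))
```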

Collection types

All specimens are categorised into different subcollections (e.g. mammals, aves, petrology or paleobotany, birdsounds, observations, …). The following query retrieves the names of all available collections and their specimen counts.

https://api.biodiversitydata.nl/v2/specimen/getDistinctValues/collectionType

Gathering events

The gatheringEvent field of a specimen holds all relevant information about the process of obtaining the specimen. This includes the finder, date, exact location, and information about the estimated specimen age (biostratigraphy/lithostratigraphy). One use case is the retrieval of all specimens that were collected between 1750 and 1800 by P. Miller:

{
  "conditions": [
    {
      "field": "gatheringEvent.dateTimeBegin",
      "operator": "BETWEEN",
      "value": [
        "1750",
        "1800"
      ]
    },
    {
      "field": "gatheringEvent.gatheringPersons.fullName",
      "operator": "EQUALS",
      "value": "Miller, P"
    }
  ],
  "sortFields": [
    {
      "path": "gatheringEvent.dateTimeBegin",
      "sortOrder": "ASC"
    }
  ]
}

As a second example, we query for all specimens that are classified within the family Passifloraceae and that have latitude and longitude coordinates (fields gatheringEvent.siteCoordinates.latitudeDecimal and gatheringEvent.siteCoordinates.longitudeDecimal):

{
  "conditions": [
    {
      "field": "identifications.defaultClassification.family",
      "operator": "EQUALS",
      "value": "Passifloraceae"
    },
    {
      "field": "gatheringEvent.siteCoordinates.longitudeDecimal",
      "operator": "NOT_EQUALS",
      "value": null
    },
    {
      "field": "gatheringEvent.siteCoordinates.latitudeDecimal",
      "operator": "NOT_EQUALS",
      "value": null
    }
  ]
}
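QuerySpec objects like the two above can also be built programmatically, which makes it easier to vary the family or to reuse conditions. The sketch below reproduces the second example and encodes it into a query URL; the helper names are hypothetical, and passing the spec to /specimen/query/ through the _querySpec parameter is assumed from the parameter name mentioned in the BatchQuery section. Python's None is serialised as JSON null.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"


def condition(field: str, operator: str, value) -> dict:
    """Build a single QuerySpec condition."""
    return {"field": field, "operator": operator, "value": value}


def georeferenced_spec(family: str) -> dict:
    """QuerySpec for specimens of a family that have both coordinates set."""
    coords = "gatheringEvent.siteCoordinates"
    return {
        "conditions": [
            condition("identifications.defaultClassification.family", "EQUALS", family),
            condition(f"{coords}.longitudeDecimal", "NOT_EQUALS", None),
            condition(f"{coords}.latitudeDecimal", "NOT_EQUALS", None),
        ]
    }


def query_url(spec: dict) -> str:
    """Encode a QuerySpec into a query URL (assumes the _querySpec parameter)."""
    return f"{BASE_URL}/query/?{urlencode({'_querySpec': json.dumps(spec)})}"


print(json.dumps(georeferenced_spec("Passifloraceae"), indent=2))
```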

Identifications

A crucial piece of information about a biological specimen is its identification, i.e. its assignment to an existing taxonomic classification. The identifications field of a specimen can contain one or more species identifications. Multiple identifications are possible if, for instance, a specimen has been re-identified, e.g. using a new identification key or DNA barcoding. Concretions containing multiple fossil species will also have multiple identifications. To indicate that one identification is more reliable than the others, one identification of a specimen can have the identifications.preferred flag set to true. Identifications usually store the taxonomic rank, the scientific name (identifications.scientificName) and the higher-level classification (identifications.defaultClassification) of the specimen. The person who identified the specimen, the date, references to scientific publications, the type status and vernacular (common) taxon names are also stored in the identifications block.

Furthermore, the

https://api.biodiversitydata.nl/v2/specimen/metadata/queryWithNameResolution

service allows searching for specimens by names other than the assigned taxonomic classification(s). The service accepts queries with a nameResolutionRequest clause, which triggers a sub-query for synonyms and/or vernacular names in the Catalogue of Life and the Dutch Species Register. Scientific names are extracted from the resulting records and subsequently used for the main specimen search. Example: the common vampire bat was previously classified as Phyllostoma rotundum, but the accepted scientific name is now Desmodus rotundus. If we search for the previous name:

{
  "conditions": [
    {
      "field": "identifications.scientificName.fullScientificName",
      "operator": "MATCHES",
      "value": "Phyllostoma rotundum"
    }
  ],
  "fields": [
    "identifications.scientificName.fullScientificName"
  ],
  "size": 1000
}

We won’t find any specimens of the vampire bat. If, however, we include a nameResolutionRequest clause and send the query to

https://api.biodiversitydata.nl/v2/specimen/metadata/queryWithNameResolution

the search covers synonyms as well:

{
    "conditions": [ ],
    "nameResolutionRequest": {
        "searchString": "Phyllostoma rotundum",
        "nameTypes": [
            "ACCEPTED_NAME",
            "SYNONYM"
        ],
        "searchType": "STARTS_WITH",
        "useCoL": false,
        "fuzzyMatching": false,
        "from": 0,
        "size": 100
    }
}

The results will now contain multiple vampire bat specimens.

Please note: when the parameter useCoL is set to true, the service looks up the name specified by searchString using an external service of the Catalogue of Life. Currently, this only works for the name types ACCEPTED_NAME and SYNONYM; trying to resolve a name of type VERNACULAR_NAME will cause an error when useCoL is true. When useCoL is false, the service uses the internal taxon index to look up the specified name, which supports all three name types, including VERNACULAR_NAME.
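The restriction on useCoL can be checked on the client side before a request is sent. The sketch below builds a query with a nameResolutionRequest clause, using the same keys as the example above, and rejects the unsupported combination early. The function name and the choice to raise ValueError are illustrative assumptions, not NBA behaviour.

```python
# Name types that the Catalogue of Life lookup supports, per the note above.
COL_NAME_TYPES = {"ACCEPTED_NAME", "SYNONYM"}


def name_resolution_spec(
    search_string: str,
    name_types: list[str],
    use_col: bool = False,
    size: int = 100,
) -> dict:
    """Build a QuerySpec with a nameResolutionRequest clause.

    Raises ValueError for name types that useCoL=true cannot resolve.
    """
    unsupported = set(name_types) - COL_NAME_TYPES
    if use_col and unsupported:
        raise ValueError(
            f"useCoL does not support name types: {sorted(unsupported)}"
        )
    return {
        "conditions": [],
        "nameResolutionRequest": {
            "searchString": search_string,
            "nameTypes": list(name_types),
            "searchType": "STARTS_WITH",
            "useCoL": use_col,
            "fuzzyMatching": False,
            "from": 0,
            "size": size,
        },
    }


spec = name_resolution_spec("Phyllostoma rotundum", ["ACCEPTED_NAME", "SYNONYM"])
```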

Multimedia

A specimen record can link to one or more multimedia items. Multimedia information is stored in the multimedia data type. Additionally, links to the associated multimedia records are also stored within specimens, in the fields associatedMultiMediaUris.accessUri and associatedMultiMediaUris.format.