Specimen occurrence services

Specimen records constitute the core of the data served by the NBA. Museum objects and observations can represent a wide variety of entities, such as plants, animals or parts thereof, DNA samples, fossils, rocks or meteorites. Species occurrences are also described as specimens in our data. The specimens in our document store are therefore described with an extensive data model. All components and data types in the Specimen model, as well as a comprehensive list of all specimen-related endpoints, are documented in the API endpoint reference. Below, the major components of specimen records are introduced, with examples of how to query them. A list of available fields can be found at

https://api.biodiversitydata.nl/v2/specimen/metadata/getFieldInfo

Base URL

The base URL for specimen-specific services is https://api.biodiversitydata.nl/v2/specimen

Data source systems

Specimen occurrence data are harvested from four main data sources: (i) the CRS (Collection Registration System) for zoological and geological specimens, (ii) BRAHMS (http://herbaria.plants.ox.ac.uk/bol/) for botanical specimens, including fungi, (iii) the Xeno-canto database of bird sounds, and (iv) species occurrences from Observation International. The source system of each record is stored in the field sourceSystem.code. The query

https://api.biodiversitydata.nl/v2/specimen/query/?sourceSystem.code=BRAHMS

selects all plant and fungus specimens.
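As a minimal sketch in Python (standard library only), a human-readable query URL like the one above can be assembled from field-path/value pairs. The helper name query_url is hypothetical; it only builds the URL and does not contact the API.

```python
from urllib.parse import urlencode

# Base URL for specimen services, as given in this document.
BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"


def query_url(params: dict) -> str:
    """Build a human-readable /query/ URL from field-path/value pairs.

    Field paths such as "sourceSystem.code" contain dots, so they are passed
    as dict keys rather than keyword arguments. No request is sent here.
    """
    return f"{BASE_URL}/query/?{urlencode(params)}"


# The BRAHMS example from the text above.
botany_url = query_url({"sourceSystem.code": "BRAHMS"})
```

Passing such a URL to urllib.request.urlopen and decoding the body with json.load would retrieve the results.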

Available services

Query

Querying for specimens can be done using the /specimen/query/ endpoint, which accepts human-readable query strings as well as JSON-encoded QuerySpec parameters.
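A QuerySpec is a JSON document, so it can also be sent as a request body. The sketch below assumes that /specimen/query/ accepts a QuerySpec as a JSON body via HTTP POST (check the API endpoint reference for the exact transport); queryspec_request is a hypothetical helper that only builds the request object.

```python
import json
from urllib.request import Request

QUERY_ENDPOINT = "https://api.biodiversitydata.nl/v2/specimen/query/"


def queryspec_request(spec: dict) -> Request:
    """Wrap a QuerySpec dict in a POST request with a JSON body.

    Assumption: the endpoint accepts the QuerySpec as the request body.
    """
    body = json.dumps(spec).encode("utf-8")
    return Request(
        QUERY_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


spec = {
    "conditions": [
        {"field": "sourceSystem.code", "operator": "EQUALS", "value": "BRAHMS"}
    ],
    "size": 10,
}
request = queryspec_request(spec)
# urllib.request.urlopen(request) would send it; nothing is sent here.
```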

Retrieving large quantities of data

Note that the query service can retrieve at most 10,000 records with a single query. For larger quantities, we offer a download service, /specimen/download/, which returns the data as a gzipped JSON stream. It can be used, for example, to retrieve the entire botany collection.
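Because the download service returns a gzipped JSON stream, a client can decompress it incrementally instead of loading everything into memory. A minimal sketch, assuming the response body consists of gzip-compressed bytes; save_decompressed is a hypothetical helper that accepts any binary file-like object, such as the response returned by urllib.request.urlopen. The demo uses an in-memory stream, so no network access is needed.

```python
import gzip
import io
import shutil
from typing import BinaryIO


def save_decompressed(compressed: BinaryIO, target: BinaryIO, chunk_size: int = 1 << 16) -> None:
    """Decompress a gzip stream chunk by chunk into `target`."""
    with gzip.GzipFile(fileobj=compressed, mode="rb") as stream:
        shutil.copyfileobj(stream, target, chunk_size)


# Local demonstration with an in-memory stream (no network access).
demo_payload = b'[{"unitID": "example"}]'
demo_target = io.BytesIO()
save_decompressed(io.BytesIO(gzip.compress(demo_payload)), demo_target)
```

In practice, compressed would be the response for a /specimen/download/ URL (assuming it takes the same query parameters as /specimen/query/), and target a file opened with open(path, "wb").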

Data access

Several access methods offer convenient retrieval of specimens matching a certain identifier or belonging to a certain collection. The services /specimen/findByUnitID/, /specimen/find/ and /specimen/findByIds/ retrieve specimens by their unitID or id fields (see Identifiers below). The service /specimen/getNamedCollections/ returns all available named collections (e.g. Mammalia), and /specimen/getIdsInCollection/ returns the identifiers of all specimens that are part of a given collection.
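A sketch of URL builders for these access services. It assumes that each identifier or collection name is passed as a single path segment and that findByIds takes a comma-separated list of ids; the helper names and the example ids used with them are hypothetical.

```python
from urllib.parse import quote

BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"


def _segment(value: str) -> str:
    # Percent-encode one path segment; keep '@' (used in ids) and ',' readable.
    return quote(value, safe="@,")


def find_by_unit_id_url(unit_id: str) -> str:
    return f"{BASE_URL}/findByUnitID/{_segment(unit_id)}"


def find_url(record_id: str) -> str:
    return f"{BASE_URL}/find/{_segment(record_id)}"


def find_by_ids_url(record_ids: list[str]) -> str:
    # Assumption: several ids are joined into one comma-separated segment.
    return f"{BASE_URL}/findByIds/{_segment(','.join(record_ids))}"


def ids_in_collection_url(collection: str) -> str:
    return f"{BASE_URL}/getIdsInCollection/{_segment(collection)}"
```

Encoding each segment keeps characters such as spaces or slashes in a unitID from breaking the URL path.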

Aggregation

Query results (selected with query parameters or a QuerySpec object) can be counted with the /specimen/count/ endpoint.

For a specific field, /specimen/getDistinctValues/ returns all distinct values of that field present in the data.

Nested aggregation over two fields can be done with /specimen/getDistinctValuesPerGroup/.

/specimen/countDistinctValues/ and /specimen/countDistinctValuesPerGroup/ do the same as the two services above, but return only the counts instead of the values themselves.
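The aggregation services can be addressed with one generic URL builder. A minimal sketch, assuming fields are appended as path segments (for the ...PerGroup services: the group field first, then the aggregated field) and that /count/ accepts the same query parameters as /query/; aggregation_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"


def aggregation_url(service: str, *fields: str, params: Optional[dict] = None) -> str:
    """Build an aggregation URL such as /getDistinctValues/{field}.

    Assumption: for the ...PerGroup services the group field comes first,
    followed by the field whose values are aggregated.
    """
    path = "/".join(quote(field, safe="") for field in fields)
    url = f"{BASE_URL}/{service}/{path}"
    if params:
        # Optional query-string filters, e.g. for /count/.
        url += "?" + urlencode(params)
    return url
```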

For more information and examples on aggregation queries, please also refer to the advanced queries section.

DwC-A download

Download services offer bulk retrieval of specimen occurrence data. Instead of JSON format, download services return zip files containing the data. The zip files are formatted according to the Darwin Core archive standard for the exchange of biodiversity data (also see below). While collection download services offer pre-compiled datasets, dynamic download services produce Darwin Core archives for the results of any query for taxon or specimen data types.

Specimen collection DwC-A download

The endpoint for specimen collection downloads is /specimen/dwca/getDataSet/, followed by the name of a specific dataset. The names of the predefined datasets can be retrieved with the endpoint /specimen/dwca/getDataSetNames/. A dataset, for instance tunicata, can then be downloaded as follows:

https://api.biodiversitydata.nl/v2/specimen/dwca/getDataSet/tunicata
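A minimal sketch for saving a predefined dataset to disk. The helpers dataset_url and download_dataset are hypothetical; the dataset name is assumed to be one returned by /specimen/dwca/getDataSetNames/, and the archive is streamed to the file rather than held in memory.

```python
import shutil
from urllib.parse import quote
from urllib.request import urlopen

DWCA_BASE = "https://api.biodiversitydata.nl/v2/specimen/dwca"


def dataset_url(name: str) -> str:
    """URL of a predefined DwC-A dataset, e.g. "tunicata"."""
    return f"{DWCA_BASE}/getDataSet/{quote(name, safe='')}"


def download_dataset(name: str, destination: str) -> None:
    """Stream the zipped archive to a local file without buffering it in memory."""
    with urlopen(dataset_url(name)) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
```

Calling download_dataset("tunicata", "tunicata.zip") would perform the network request and write the zip file.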

Dynamic DwC-A download

Dynamic download queries follow the same syntax as regular queries to the query endpoint. Suppose we have a simple query for specimens of the genus Crocus:

https://api.biodiversitydata.nl/v2/specimen/query/?identifications.defaultClassification.genus=Crocus

Simply inserting the path segment dwca before query returns the zipped archive:

https://api.biodiversitydata.nl/v2/specimen/dwca/query/?identifications.defaultClassification.genus=Crocus
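The rewrite from a regular query URL to its DwC-A counterpart can be automated. A small sketch; to_dwca_url is a hypothetical helper that inserts the dwca segment and leaves the query string untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def to_dwca_url(query_url: str) -> str:
    """Turn .../specimen/query/?... into .../specimen/dwca/query/?..."""
    parts = urlsplit(query_url)
    marker = "/specimen/query"
    if marker not in parts.path:
        raise ValueError(f"not a specimen query URL: {query_url}")
    path = parts.path.replace(marker, "/specimen/dwca/query", 1)
    return urlunsplit(parts._replace(path=path))
```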

Darwin Core archives

The following files are contained in the zip archives generated by the download services:

  • A core data file in CSV format named Occurrence.txt. This file contains a tabular representation of the data, with the first row defining the column names.
  • A descriptor file named meta.xml, which maps the columns in the core data file to their respective TDWG terms. Each column in the data is thus mapped to a specific concept defined by the TDWG consortium.
  • A metadata file named eml.xml, formatted according to the Ecological Metadata Language (EML) specification. Metadata in this file includes a description of the dataset and details about the source institution.
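A minimal sketch for reading such an archive with the Python standard library. It uses meta.xml to locate the core file, its delimiter and the column-to-term mapping, and returns each row keyed by the short term name (e.g. genus). Assumptions: the core file is UTF-8 encoded, and if meta.xml omits ignoreHeaderLines, one header row is skipped, as described above; read_occurrences is a hypothetical helper.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile


def _local(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}field'."""
    return tag.rsplit("}", 1)[-1]


def read_occurrences(archive) -> list[dict[str, str]]:
    """Return the core rows of a DwC-A, keyed by the short TDWG term name.

    `archive` is a path or binary file-like object of the downloaded zip.
    """
    with zipfile.ZipFile(archive) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
        core = next(el for el in root.iter() if _local(el.tag) == "core")
        location = next(el for el in core.iter() if _local(el.tag) == "location").text
        # Map column index -> short term name, e.g. ".../dwc/terms/genus" -> "genus".
        terms = {
            int(el.get("index")): el.get("term").rsplit("/", 1)[-1]
            for el in core.iter()
            if _local(el.tag) == "field" and el.get("index") is not None
        }
        delimiter = core.get("fieldsTerminatedBy", ",")
        delimiter = "\t" if delimiter == "\\t" else delimiter
        # Assumption: default to one header row, as the column names come first.
        skip = int(core.get("ignoreHeaderLines", "1"))
        text = zf.read(location).decode("utf-8")
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))[skip:]
    return [
        {terms[i]: value for i, value in enumerate(row) if i in terms}
        for row in rows
    ]
```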

Metadata

Specimen metadata services provide miscellaneous information about specimen records. This includes detailed information about a specimen's fields and paths. A description of all specimen metadata services can be found in the API endpoint reference.

Identifiers

Specimen records have several identifiers. The field unitID is the identifier from the specific source system. Since unitIDs are not guaranteed to be unique across source systems, the field id, composed as {unitID}@{sourceSystem.code}, provides an identifier that is unique across all sources. Further, the field unitGUID holds a persistent uniform resource locator (PURL, see also PURL services).
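The id composition described above can be expressed in two small helpers. This is a sketch; make_id and split_id are hypothetical names, and splitting on the last "@" is a defensive choice in case a unitID itself contains that character.

```python
def make_id(unit_id: str, source_system_code: str) -> str:
    """Compose the cross-source identifier {unitID}@{sourceSystem.code}."""
    return f"{unit_id}@{source_system_code}"


def split_id(record_id: str) -> tuple[str, str]:
    """Split an id back into (unitID, sourceSystem.code).

    rpartition splits on the last '@', so a unitID containing '@' stays intact.
    """
    unit_id, sep, source = record_id.rpartition("@")
    if not sep:
        raise ValueError(f"no '@' in id: {record_id!r}")
    return unit_id, source
```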

Collection types

All specimens are categorised into different subcollections (e.g. mammals, aves, petrology or paleobotany, birdsounds, observations, …). The following query retrieves the names of all available collections and their specimen counts.

https://api.biodiversitydata.nl/v2/specimen/getDistinctValues/collectionType
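A sketch for ranking the collections returned by this query. It assumes the response body is a JSON object that maps each collectionType value to its specimen count; top_collections is a hypothetical helper, and the counts in the usage line are made up.

```python
import json


def top_collections(response_body: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n largest collections from a getDistinctValues response.

    Assumption: the body maps each collectionType value to its count.
    """
    counts = json.loads(response_body)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up counts, for illustration only.
example_top = top_collections('{"Aves": 10, "Mammalia": 30, "Petrology": 5}', 2)
```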

Gathering events

The gatheringEvent field of a specimen holds all relevant information about the process of obtaining the specimen. This includes the collector, date, exact location, and information about the estimated age of the specimen (biostratigraphy/lithostratigraphy). A use case could be, for instance, the retrieval of all specimens that were collected between 1750 and 1800 by P. Miller:

{
  "conditions": [
    {
      "field": "gatheringEvent.dateTimeBegin",
      "operator": "BETWEEN",
      "value": [
        "1750",
        "1800"
      ]
    },
    {
      "field": "gatheringEvent.gatheringPersons.fullName",
      "operator": "EQUALS",
      "value": "Miller, P"
    }
  ],
  "sortFields": [
    {
      "path": "gatheringEvent.dateTimeBegin",
      "sortOrder": "ASC"
    }
  ]
}

As a second example, we query for all specimens that are classified within the family Passifloraceae and that have latitude/longitude coordinates (fields gatheringEvent.siteCoordinates.latitudeDecimal and gatheringEvent.siteCoordinates.longitudeDecimal):

{
  "conditions": [
    {
      "field": "identifications.defaultClassification.family",
      "operator": "EQUALS",
      "value": "Passifloraceae"
    },
    {
      "field": "gatheringEvent.siteCoordinates.longitudeDecimal",
      "operator": "NOT_EQUALS",
      "value": null
    },
    {
      "field": "gatheringEvent.siteCoordinates.latitudeDecimal",
      "operator": "NOT_EQUALS",
      "value": null
    }
  ]
}
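Both QuerySpecs above can be generated from parameters, which makes it easy to vary the collector, date range or family. A minimal sketch; the function names are hypothetical, and the field paths and operators are taken from the examples above.

```python
import json


def collected_by_between(person: str, start: str, end: str) -> dict:
    """QuerySpec for specimens gathered by `person` between two dates, oldest first."""
    return {
        "conditions": [
            {"field": "gatheringEvent.dateTimeBegin", "operator": "BETWEEN", "value": [start, end]},
            {"field": "gatheringEvent.gatheringPersons.fullName", "operator": "EQUALS", "value": person},
        ],
        "sortFields": [{"path": "gatheringEvent.dateTimeBegin", "sortOrder": "ASC"}],
    }


def family_with_coordinates(family: str) -> dict:
    """QuerySpec for specimens of `family` that have both decimal coordinates."""
    coordinate_fields = (
        "gatheringEvent.siteCoordinates.longitudeDecimal",
        "gatheringEvent.siteCoordinates.latitudeDecimal",
    )
    return {
        "conditions": [
            {"field": "identifications.defaultClassification.family", "operator": "EQUALS", "value": family},
            # Python None is serialised as JSON null, as in the example above.
            *({"field": f, "operator": "NOT_EQUALS", "value": None} for f in coordinate_fields),
        ]
    }


payload = json.dumps(family_with_coordinates("Passifloraceae"), indent=2)
```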

Identifications

A crucial part of the information about a biological specimen is its identification, i.e. its assignment to an existing taxonomic classification. The identifications field of a specimen can contain one or more species identifications. Multiple identifications are possible if, for instance, a specimen has been re-identified using a new identification key or DNA barcoding. Concretions containing multiple fossil species will also have multiple identifications. To indicate that one identification is more reliable than the others, one identification of a specimen can have the identifications.preferred flag set to true. Identifications usually store the taxonomic rank, the scientific name (identifications.scientificName) and the higher-level classification (identifications.defaultClassification) of the specimen. The identifications block also stores the person who identified the specimen, the identification date, references to scientific publications, the type status and vernacular (common) taxon names.
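A minimal sketch for picking the preferred identification from a specimen record parsed from the API's JSON. It assumes the record contains an identifications list whose entries may carry a boolean preferred flag. Falling back to the first identification when none is flagged is a choice made for this sketch, not an NBA rule, and preferred_identification is a hypothetical helper.

```python
from typing import Optional


def preferred_identification(specimen: dict) -> Optional[dict]:
    """Return the identification flagged preferred, else the first one, else None."""
    identifications = specimen.get("identifications") or []
    for identification in identifications:
        if identification.get("preferred") is True:
            return identification
    return identifications[0] if identifications else None
```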

Furthermore, the

https://api.biodiversitydata.nl/v2/specimen/metadata/queryWithNameResolution

service allows for searching for specimens by names other than the assigned taxonomic classification(s). The service accepts queries with a nameResolutionRequest clause, which triggers a sub-query for synonyms and/or vernacular names in the Catalogue of Life and the Dutch Species Register. Scientific names are extracted from the resulting records and subsequently used for the main specimen search. Example: if we are not sure of the accepted scientific name of the European badger, we could search directly for specimens that contain 'badger' as part of the specimen record, using

https://api.biodiversitydata.nl/v2/specimen/query/

with the following QuerySpec:

{
  "conditions" : [
    { "field" : "identifications.vernacularNames.name", "operator" : "MATCHES", "value" : "badger" }
  ],
  "size": 1000
}

However, as the vernacularNames field is not mandatory, this query returns only a small number of records. If we instead employ the name resolution request, using

https://api.biodiversitydata.nl/v2/specimen/metadata/queryWithNameResolution

with the following QuerySpec:

{
  "conditions": [],
  "nameResolutionRequest": {
    "searchString": "badger",
    "nameTypes": [ "VERNACULAR_NAME" ],
    "matchWholeWords": true,
    "useCoL": false,
    "size": 100
  },
  "size": 1000
}

we get a much more complete set of badger specimens in the results.

When useCoL is set to true, the service finds scientific names by matching the search string against the Catalogue of Life name usage API (https://api.catalogue.life/nameusage), rather than by querying the Dutch Species Register and a local copy of the Catalogue of Life. As the external service searches more recent data, this may improve results. Be aware, however, that with useCoL set to true your query partly depends on an external service whose availability we do not control.

Please note that nameResolutionRequest currently supports only MATCHES and CONTAINS matching (selected by setting matchWholeWords to true or false, respectively), which means that you may get a wide array of matching names in your results.
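A sketch of a builder for request bodies like the badger example above, exposing the matchWholeWords and useCoL switches as parameters. The function name and its defaults are hypothetical and simply reproduce that example; only the VERNACULAR_NAME name type appears in this document, so other values are not covered here.

```python
def name_resolution_query(search: str, *, whole_words: bool = True, use_col: bool = False,
                          name_types=("VERNACULAR_NAME",), resolution_size: int = 100,
                          size: int = 1000, conditions=None) -> dict:
    """Build a queryWithNameResolution body like the badger example above."""
    return {
        "conditions": list(conditions or []),
        "nameResolutionRequest": {
            "searchString": search,
            "nameTypes": list(name_types),
            "matchWholeWords": whole_words,  # True: MATCHES, False: CONTAINS
            "useCoL": use_col,  # True: use the external CoL name usage API
            "size": resolution_size,  # size of the name sub-query
        },
        "size": size,  # size of the main specimen query
    }
```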

To gain insight into the inner workings of nameResolutionRequest and see the results of the sub-query, you can run your query against

https://api.biodiversitydata.nl/v2/specimen/metadata/explainNameResolution

The output shows the intermediate results, as well as the precise effect of the different query parameters.

Multimedia

A specimen record can link to one or more multimedia items. Multimedia information is stored in the multimedia data type. Links to the associated multimedia records are also stored within the specimen record itself, in the fields associatedMultiMediaUris.accessUri and associatedMultiMediaUris.format.
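A minimal sketch for listing a specimen's media links from its parsed JSON record. It assumes associatedMultiMediaUris is a list of objects with accessUri and format keys, as the field paths above suggest; entries without an accessUri are skipped, and multimedia_links is a hypothetical helper.

```python
def multimedia_links(specimen: dict) -> list[tuple]:
    """List (accessUri, format) pairs from a specimen's associatedMultiMediaUris."""
    return [
        (item.get("accessUri"), item.get("format"))
        for item in specimen.get("associatedMultiMediaUris") or []
        if item.get("accessUri")
    ]
```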