Specimen Occurrence Services
Specimen records constitute the core of the data served by the NBA.
Museum objects and observations can represent a wide variety of entities,
such as plants, animals or parts thereof, DNA samples, fossils, rocks or meteorites.
Species occurrences are also described as specimens in our data.
The specimens in our document store are therefore described with an extensive data model.
All components and data types in the Specimen model,
as well as a comprehensive list of all specimen-related endpoints,
are documented in the API endpoint reference.
Below, the major components of specimen records are introduced
and examples are given on how to query them.
A list of available fields can be found at:
https://api.biodiversitydata.nl/v2/specimen/metadata/getFieldInfo
Base URL
The base URL for specimen-specific services is:
https://api.biodiversitydata.nl/v2/specimen
Data Source Systems
Specimen occurrence data are harvested from three main data sources:
- the CRS (Collection Registration System for zoological and geological specimens);
- BRAHMS for botanical specimens, including fungi; and
- the Xeno-canto database of wildlife sounds.
This information is stored in the path sourceSystem.code. The query:
https://api.biodiversitydata.nl/v2/specimen/query/?sourceSystem.code=BRAHMS
will return all botanical and fungal specimens from the BRAHMS source system.
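As a minimal sketch in Python (standard library only), the same human-readable query URL can be built programmatically. The helper names specimen_query_url and fetch_json are hypothetical; only the base URL and the /query/ path come from this page. The fields are passed as a dict because dotted paths such as sourceSystem.code are not valid Python identifiers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL for specimen services, as documented on this page.
BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"


def specimen_query_url(params: dict) -> str:
    """Build a human-readable query URL from {field path: value} pairs."""
    # urlencode leaves dots unescaped, so dotted field paths stay readable.
    return f"{BASE_URL}/query/?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Perform a GET request and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


url = specimen_query_url({"sourceSystem.code": "BRAHMS"})
# fetch_json(url) would perform the actual request.
```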
Available Services
Query
Querying for specimens can be done using the query endpoint, which accepts human-readable query strings and JSON-encoded QuerySpec parameters.
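The QuerySpec variant can be sketched the same way: serialize the JSON object and pass it in the _querySpec parameter, the parameter name this page uses for QuerySpec requests. This assumes the query endpoint accepts _querySpec on a GET request; query_spec_url is a hypothetical helper, and the "size" field mirrors the one used in the name resolution example further down.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"


def query_spec_url(query_spec: dict) -> str:
    """Serialize a QuerySpec dict to JSON and pass it as _querySpec."""
    # Compact separators keep the URL short; urlencode percent-escapes the JSON.
    encoded = json.dumps(query_spec, separators=(",", ":"))
    return f"{BASE_URL}/query/?{urlencode({'_querySpec': encoded})}"


spec = {
    "conditions": [
        {"field": "sourceSystem.code", "operator": "=", "value": "BRAHMS"}
    ],
    "size": 10,
}
url = query_spec_url(spec)

# Round trip: decoding the parameter yields the original QuerySpec.
decoded = json.loads(parse_qs(urlsplit(url).query)["_querySpec"][0])
```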
Retrieving large quantities of data
Note that the query service returns at most 10,000 records per query. For larger quantities, we offer two services that return the complete set of records matching your query: a download service and a BatchQuery service.
Download Service
The download service returns records as a gzipped JSON stream. For example, to retrieve the entire botany collection, the download service can be used with the query:
{
"conditions" : [
{ "field" : "collectionType", "operator" : "=", "value" : "Botany" }
]
}
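A minimal Python sketch of consuming such a response, assuming the raw response body has already been read into bytes. The exact layout of the decompressed stream is not specified here, so parse_download (a hypothetical helper) accepts either a single JSON array or one JSON document per line.

```python
import gzip
import json
from typing import List


def parse_download(body: bytes) -> List[dict]:
    """Decompress a gzipped download response and return its records.

    Assumption: the decompressed text is either one JSON array or one
    JSON document per line; both layouts are accepted.
    """
    text = gzip.decompress(body).decode("utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to newline-delimited JSON, skipping blank lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


# Made-up payloads in both layouts.
records = [{"unitID": "A"}, {"unitID": "B"}]
as_array = gzip.compress(json.dumps(records).encode("utf-8"))
as_lines = gzip.compress(b'{"unitID": "A"}\n{"unitID": "B"}\n')
```

For a collection as large as Botany, wrapping the response in gzip.GzipFile and processing it incrementally would avoid holding the whole payload in memory.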
BatchQuery Service
To retrieve results in batches, use the BatchQuery service. This service returns the number of records indicated by the size parameter, together with a token. The token can be used to retrieve subsequent batches until a token is no longer present in the returned result, which means you have retrieved the last available batch. For example, to retrieve the entire botany collection, the BatchQuery service can be used with the initial query:
{
"conditions" : [
{ "field" : "collectionType", "operator" : "=", "value" : "Botany" }
],
"size" : 1000
}
The results will include a token, which can be used in subsequent requests via the parameter _token instead of the _querySpec parameter.
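The token loop can be sketched in Python as below. The BatchQuery endpoint path is not given here, so the HTTP call is abstracted into a hypothetical fetch function; the response field names resultSet and token are likewise assumptions to check against the API endpoint reference. The behavior taken from this page is: send _querySpec first, then _token, and stop when no token comes back.

```python
import json
from typing import Callable, Dict, Iterator, List

# Hypothetical: fetch(params) performs one GET against the BatchQuery
# endpoint with the given query parameters and returns the decoded JSON.
Fetch = Callable[[Dict[str, str]], dict]


def iter_batches(fetch: Fetch, query_spec: dict) -> Iterator[List]:
    """Yield one batch of records per request until no token is returned.

    Assumed response shape: records under "resultSet", continuation
    token under "token".
    """
    params = {"_querySpec": json.dumps(query_spec)}
    while True:
        response = fetch(params)
        yield response.get("resultSet", [])
        token = response.get("token")
        if not token:
            break  # no token: the last available batch has been retrieved
        # Subsequent requests send _token instead of _querySpec.
        params = {"_token": token}


# Stand-in fetch serving three canned batches, keyed by the token sent.
_pages = {
    None: {"resultSet": ["a", "b"], "token": "t1"},
    "t1": {"resultSet": ["c"], "token": "t2"},
    "t2": {"resultSet": ["d"]},
}
seen = []


def fake_fetch(params: Dict[str, str]) -> dict:
    seen.append(params)
    return _pages[params.get("_token")]


spec = {
    "conditions": [{"field": "collectionType", "operator": "=", "value": "Botany"}],
    "size": 1000,
}
batches = list(iter_batches(fake_fetch, spec))
```

Using a generator keeps memory bounded: each batch can be written out before the next request is made.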
Data Access
Several access methods offer convenient retrieval of specimens matching a certain identifier or belonging to a certain collection. Dedicated services retrieve specimens according to their unitID or id fields (see Identifiers below). The service:
/specimen/getNamedCollections/
returns all available collections of species (e.g. Mammalia), and a second service returns the identifiers of all specimens that are part of a given collection.
Aggregation
A count aggregation of query results (using query parameters or a QuerySpec object) can be done with the /specimen/count/ endpoint.
For a specific field, /specimen/getDistinctValues/ returns all distinct values present in the data for that field.
Nested aggregation over two fields can be done with /specimen/getDistinctValuesPerGroup/.
/specimen/countDistinctValues/ and /specimen/countDistinctValuesPerGroup/ do the same as the above, but return only the counts instead of the values themselves.
For more information and examples on aggregation queries, please also refer to the advanced queries section.
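A small Python sketch for building aggregation URLs. The /count/ and /getDistinctValues/ paths come from this page; the helper names are hypothetical, and passing the field name as a trailing path segment follows the collectionType example in the Collection Types section.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"


def count_url(params: dict) -> str:
    """/specimen/count/ with human-readable query parameters."""
    return f"{BASE_URL}/count/?{urlencode(params)}"


def distinct_values_url(field: str) -> str:
    """/specimen/getDistinctValues/ with the field name as last path segment."""
    # quote() keeps dots, so dotted paths like sourceSystem.code stay readable.
    return f"{BASE_URL}/getDistinctValues/{quote(field)}"


brahms_count = count_url({"sourceSystem.code": "BRAHMS"})
collection_types = distinct_values_url("collectionType")
```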
DwC-A download
Download services offer bulk retrieval of specimen occurrence data. Instead of JSON format, download services return zip files containing the data. The zip files are formatted according to the Darwin Core Archive Standard for the exchange of biodiversity data (also see below). While collection download services offer pre-compiled datasets, dynamic download services produce Darwin Core Archives for the results of any query for taxon or specimen data types.
Specimen collection DwC-A download
The endpoint for specimen collection downloads is /specimen/dwca/getDataSet/ with the name of a specific dataset. The names of predefined datasets can be retrieved with the endpoint /specimen/dwca/getDataSetNames/.
A dataset, for instance tunicata, can then be downloaded as follows:
https://api.biodiversitydata.nl/v2/specimen/dwca/getDataSet/tunicata
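In Python, the archive can be streamed to disk with the standard library only. This is a sketch: dataset_url and download_dataset are hypothetical helper names, while both endpoint paths are taken from this section.

```python
import shutil
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"
# Endpoint listing the names of the predefined datasets.
DATASET_NAMES_URL = f"{BASE_URL}/dwca/getDataSetNames/"


def dataset_url(name: str) -> str:
    """URL of a predefined DwC-A dataset such as 'tunicata'."""
    return f"{BASE_URL}/dwca/getDataSet/{quote(name)}"


def download_dataset(name: str, target_path: str) -> None:
    """Stream the zip archive to disk in chunks rather than loading it into memory."""
    with urlopen(dataset_url(name)) as response, open(target_path, "wb") as out:
        shutil.copyfileobj(response, out)


# download_dataset("tunicata", "tunicata.zip") would fetch the archive.
```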
Dynamic DwC-A download
Dynamic download queries follow the same syntax as regular queries with the query endpoint. Suppose we have a simple query for specimens of the genus Crocus.
Simply adding the path dwca in front of query in the URL will return the zipped archive.
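The URL rewrite described above can be expressed as a small Python function. It assumes query URLs of the form .../specimen/query/?...; the genus field path identifications.defaultClassification.genus is an assumption modeled on the family path used elsewhere on this page, and to_dwca_url is a hypothetical helper.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

BASE_URL = "https://api.biodiversitydata.nl/v2/specimen"


def to_dwca_url(query_url: str) -> str:
    """Turn .../specimen/query/?... into .../specimen/dwca/query/?...

    Only the path changes; the query string is carried over unchanged.
    """
    parts = urlsplit(query_url)
    head, sep, tail = parts.path.rpartition("/query")
    if not sep:
        raise ValueError(f"not a query URL: {query_url}")
    return urlunsplit(parts._replace(path=f"{head}/dwca/query{tail}"))


# Hypothetical genus query; the field path is an assumption.
query_url = f"{BASE_URL}/query/?" + urlencode(
    {"identifications.defaultClassification.genus": "Crocus"}
)
dwca_url = to_dwca_url(query_url)
```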
Darwin Core archives
The following files are contained in the zip archives generated by the download services:
- A core data file in CSV format named Occurrence.txt. This file contains a tabular representation of the data, with the first row defining the column names.
- A descriptor file named meta.xml, which maps the columns in the core data file to their respective TDWG terms. Each column in the data is thus mapped to a specific concept defined by the TDWG consortium.
- A metadata file named eml.xml, formatted according to the Ecological Metadata Language (EML) specification. Metadata in this file includes a description of the dataset and details about the source institution.
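A Python sketch of reading such an archive with the standard library: read_core_columns maps column positions to TDWG term URIs from meta.xml, and read_occurrences parses Occurrence.txt using its header row. Both helper names are hypothetical, the archive built at the end is made-up test data, and comma separation is assumed from the description above (the fieldsTerminatedBy attribute in meta.xml declares the delimiter actually used).

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, List


def _local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}field' -> 'field'."""
    return tag.rsplit("}", 1)[-1]


def read_core_columns(archive: zipfile.ZipFile) -> Dict[int, str]:
    """Map column index -> TDWG term URI from the <core> element of meta.xml."""
    root = ET.fromstring(archive.read("meta.xml"))
    core = next(el for el in root.iter() if _local(el.tag) == "core")
    return {
        int(el.get("index")): el.get("term")
        for el in core.iter()
        if _local(el.tag) == "field"
    }


def read_occurrences(archive: zipfile.ZipFile) -> List[Dict[str, str]]:
    """Read Occurrence.txt as rows keyed by the header row's column names.

    Assumes comma-separated values, as described above.
    """
    with archive.open("Occurrence.txt") as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        return list(csv.DictReader(text))


# Tiny made-up archive to exercise the two readers.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("meta.xml",
                '<archive xmlns="http://rs.tdwg.org/dwc/text/"><core>'
                '<field index="0" term="http://rs.tdwg.org/dwc/terms/catalogNumber"/>'
                '<field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>'
                '</core></archive>')
    zf.writestr("Occurrence.txt", "catalogNumber,scientificName\nX.1,Crocus vernus\n")

with zipfile.ZipFile(buffer) as zf:
    columns = read_core_columns(zf)
    rows = read_occurrences(zf)
```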
Metadata
Specimen metadata services provide miscellaneous information about specimen records. This includes detailed information about a specimen's fields and paths. A description of all specimen metadata services can be found in the API endpoint reference.
Identifiers
Specimen records have several identifiers. The field unitID
is the identifier from the specific source system. Since uniqueness across
source systems is not ensured, the field id, consisting of
{unitID}@{sourceSystem.code}, serves as an identifier that is unique across all source systems. Further, the field unitGUID
represents a permanent uniform web location (PURL, see also
PURL services).
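Composing and splitting the id can be sketched in Python as follows. The helper names and the unitID value are hypothetical, and splitting on the last @ assumes that source system codes never contain an @ themselves.

```python
from typing import Tuple


def make_id(unit_id: str, source_system_code: str) -> str:
    """Compose the id as {unitID}@{sourceSystem.code}."""
    return f"{unit_id}@{source_system_code}"


def split_id(record_id: str) -> Tuple[str, str]:
    """Split an id back into (unitID, sourceSystem.code).

    rpartition splits on the last '@', so an '@' inside a unitID survives.
    """
    unit_id, sep, code = record_id.rpartition("@")
    if not sep:
        raise ValueError(f"missing '@' in id: {record_id!r}")
    return unit_id, code


record_id = make_id("RMNH.MAM.12345", "CRS")  # hypothetical unitID
```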
Collection Types
All specimens are categorised into different subcollections (e.g. mammals, aves, petrology or paleobotany, wildlife sounds, ...). The following query retrieves the names of all available collections and their specimen counts.
https://api.biodiversitydata.nl/v2/specimen/getDistinctValues/collectionType
Gathering Events
The gatheringEvent field of a specimen holds all relevant information about the process of obtaining the specimen. This includes the collector, date, exact location, and information about the estimated specimen age (biostratigraphy/lithostratigraphy). A use case is, for instance, the retrieval of all specimens that were collected between 1750 and 1800 by P. Miller:
{
"conditions": [
{
"field": "gatheringEvent.dateTimeBegin",
"operator": "BETWEEN",
"value": [
"1750",
"1800"
]
},
{
"field": "gatheringEvent.gatheringPersons.fullName",
"operator": "EQUALS",
"value": "Miller, P"
}
],
"sortFields": [
{
"path": "gatheringEvent.dateTimeBegin",
"sortOrder": "ASC"
}
]
}
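For programmatic use, the same QuerySpec can be assembled from small Python helpers and serialized with json.dumps. The helper names condition and query_spec are hypothetical; the resulting dict reproduces the JSON above.

```python
import json
from typing import List, Optional, Tuple


def condition(field: str, operator: str, value) -> dict:
    """One entry of a QuerySpec "conditions" list."""
    return {"field": field, "operator": operator, "value": value}


def query_spec(conditions: List[dict],
               sort_fields: Optional[List[Tuple[str, str]]] = None) -> dict:
    """Assemble a QuerySpec; sort_fields holds (path, sortOrder) pairs."""
    spec = {"conditions": conditions}
    if sort_fields:
        spec["sortFields"] = [{"path": p, "sortOrder": o} for p, o in sort_fields]
    return spec


miller_spec = query_spec(
    [
        condition("gatheringEvent.dateTimeBegin", "BETWEEN", ["1750", "1800"]),
        condition("gatheringEvent.gatheringPersons.fullName", "EQUALS", "Miller, P"),
    ],
    sort_fields=[("gatheringEvent.dateTimeBegin", "ASC")],
)
print(json.dumps(miller_spec, indent=2))
```

Python's None serializes to JSON null, which is how conditions such as the NOT_EQUALS null checks in the next example can be expressed.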
As a second example, we query for all specimens that are classified
within the family Passifloraceae and that have latitude/longitude coordinates
(fields gatheringEvent.siteCoordinates.latitudeDecimal and
gatheringEvent.siteCoordinates.longitudeDecimal):
{
"conditions": [
{
"field": "identifications.defaultClassification.family",
"operator": "EQUALS",
"value": "Passifloraceae"
},
{
"field": "gatheringEvent.siteCoordinates.longitudeDecimal",
"operator": "NOT_EQUALS",
"value": null
},
{
"field": "gatheringEvent.siteCoordinates.latitudeDecimal",
"operator": "NOT_EQUALS",
"value": null
}
]
}
Identifications
A crucial part of the information about a biological specimen is its identification, i.e. its assignment to an existing taxonomic classification. The identifications field of a specimen can contain one or more species identifications. Multiple identifications are possible if, for instance, a specimen has been re-identified because of a new identification key or genetic analysis. Likewise, concretions containing multiple fossil species will have multiple identifications. To indicate that one identification is more reliable than the others, one identification of a specimen can have the identifications.preferred flag set to true.
Identifications usually store the taxonomic rank, the scientific name (identifications.scientificName) and higher-level classifications (identifications.defaultClassification) of the specimen. The identifications block also stores the person who identified the specimen, the identification date, references to scientific publications, type status and vernacular (common) taxon names.
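Selecting the preferred identification from a retrieved record could look like the Python sketch below. It assumes each entry of identifications is an object carrying an optional preferred flag and a scientificName.fullScientificName path, as named on this page; the record is made up, preferred_identification is a hypothetical helper, and falling back to the first identification is a choice of this sketch.

```python
from typing import Optional


def preferred_identification(specimen: dict) -> Optional[dict]:
    """Return the identification flagged as preferred.

    Falls back to the first identification when none is flagged; this
    fallback is a choice made for this sketch, not documented API behavior.
    """
    identifications = specimen.get("identifications") or []
    for identification in identifications:
        if identification.get("preferred") is True:
            return identification
    return identifications[0] if identifications else None


# Made-up record, shaped after the field paths described above.
specimen = {
    "identifications": [
        {"preferred": False, "scientificName": {"fullScientificName": "Phyllostoma rotundum"}},
        {"preferred": True, "scientificName": {"fullScientificName": "Desmodus rotundus"}},
    ]
}
best = preferred_identification(specimen)
name = best["scientificName"]["fullScientificName"]
```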
Name resolution
The queryWithNameResolution service:
https://api.biodiversitydata.nl/v2/specimen/metadata/queryWithNameResolution
allows for searching specimens by names other than the assigned taxonomic classification(s). This service accepts queries with a nameResolutionRequest clause, which triggers a sub-query for synonyms and/or vernacular names in the Catalogue of Life and the Dutch Species Register. From the resulting records, scientific names are extracted, which are subsequently used for the main specimen search.
Consider the following example. The common vampire bat was previously classified as Phyllostoma rotundum, but its currently accepted scientific name is Desmodus rotundus. If we search for the previous name:
{
"conditions": [
{
"field": "identifications.scientificName.fullScientificName",
"operator": "MATCHES",
"value": "Phyllostoma rotundum"
}
],
"fields": [
"identifications.scientificName.fullScientificName"
],
"size": 1000
}
we won’t find any specimens of the vampire bat. If, however, we use name resolution (https://api.biodiversitydata.nl/v2/specimen/metadata/queryWithNameResolution) and apply the following nameResolutionRequest clause, the search will include synonyms as well:
{
"conditions": [ ],
"nameResolutionRequest": {
"searchString": "Phyllostoma rotundum",
"nameTypes": [
"ACCEPTED_NAME",
"SYNONYM"
],
"searchType": "STARTS_WITH",
"useCoL": false,
"fuzzyMatching": false,
"from": 0,
"size": 100
}
}
Now, we will find multiple vampire bat specimens in the result.
Note that by setting the parameter useCoL to true, the service tries to look up the name specified by searchString using an external service of the Catalogue of Life. Currently, this only works for the name types ACCEPTED_NAME and SYNONYM. Trying to resolve a name of type VERNACULAR_NAME will cause an error when useCoL is set to true. However, when useCoL is false, the service uses the internal taxon index to look up the specified name, which supports all three name types, including VERNACULAR_NAME.
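The request body and the useCoL restriction can be mirrored client-side in a Python sketch. name_resolution_spec is a hypothetical helper whose defaults copy the example request above; rejecting VERNACULAR_NAME locally when useCoL is true simply anticipates the documented service error.

```python
# Name types supported by the Catalogue of Life lookup (useCoL = true).
COL_NAME_TYPES = {"ACCEPTED_NAME", "SYNONYM"}


def name_resolution_spec(search_string: str, name_types: list,
                         use_col: bool = False, search_type: str = "STARTS_WITH",
                         fuzzy_matching: bool = False, size: int = 100) -> dict:
    """Build a query body with a nameResolutionRequest clause.

    Rejects unsupported name types up front when use_col is True,
    instead of letting the service return an error.
    """
    unsupported = set(name_types) - COL_NAME_TYPES
    if use_col and unsupported:
        raise ValueError(f"useCoL=true does not support {sorted(unsupported)}")
    return {
        "conditions": [],
        "nameResolutionRequest": {
            "searchString": search_string,
            "nameTypes": list(name_types),
            "searchType": search_type,
            "useCoL": use_col,
            "fuzzyMatching": fuzzy_matching,
            "from": 0,
            "size": size,
        },
    }


spec = name_resolution_spec("Phyllostoma rotundum", ["ACCEPTED_NAME", "SYNONYM"])

try:
    name_resolution_spec("vampire bat", ["VERNACULAR_NAME"], use_col=True)
    vernacular_with_col_rejected = False
except ValueError:
    vernacular_with_col_rejected = True
```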
Multimedia
A specimen record can link to one or more multimedia items. Multimedia information is stored in the multimedia data type. Additionally, links to the associated multimedia records are stored within specimens, in the fields associatedMultiMediaUris.accessUri and associatedMultiMediaUris.format.
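A Python sketch for pulling the multimedia links out of a retrieved specimen record. It assumes associatedMultiMediaUris is a list of objects with accessUri and format keys; multimedia_links is a hypothetical helper, and the record and its example.org URIs are placeholders.

```python
from typing import List, Optional, Tuple


def multimedia_links(specimen: dict) -> List[Tuple[str, Optional[str]]]:
    """Collect (accessUri, format) pairs from associatedMultiMediaUris.

    Assumes the field holds a list of objects; entries without an
    accessUri are skipped.
    """
    return [
        (item["accessUri"], item.get("format"))
        for item in specimen.get("associatedMultiMediaUris") or []
        if item.get("accessUri")
    ]


# Made-up record with placeholder URIs.
specimen = {
    "associatedMultiMediaUris": [
        {"accessUri": "https://example.org/media/1.jpg", "format": "image/jpeg"},
        {"format": "image/jpeg"},
    ]
}
links = multimedia_links(specimen)
```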