2. Release Notes

2.1. Version 3.0.0 (Upcoming)

  • Deprecated the default_value key for datasets. This key is still supported for attributes.

  • Deprecated linkable key for groups and datasets.

  • First release as hdmf-schema-language.

  • Removed legacy description of the specs or spec key.

  • Added specification for the specification language used by each file.

  • Added dtypes that are already supported in hdmf.spec: short, uint64, bytes, and datetime.

  • Clarified that if name is defined on a group/dataset/link specification, quantity may not be greater than 1.

  • Updated datetime specification to allow a date with no time or timezone.

  • Changed the meaning of the default shape shape: null from representing a scalar to representing any shape.

  • Added special value for shape: scalar that represents a scalar.
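
The two `shape` changes above can be sketched in a few lines of Python. This is only an illustration: the helper `shape_allows` is hypothetical and not part of hdmf, `shape: null` is represented as Python `None`, and the sketch assumes that a `None` entry inside a shape list means "any length" for that dimension. Lists of alternative shapes are not handled.

```python
def shape_allows(spec_shape, actual_shape):
    """Return True if actual_shape (a tuple of ints) satisfies spec_shape.

    Hypothetical helper for illustration; not part of hdmf.
    """
    if spec_shape is None:
        # shape: null -> any shape is accepted (meaning as of 3.0.0)
        return True
    if spec_shape == "scalar":
        # shape: scalar -> only zero-dimensional data is accepted
        return len(actual_shape) == 0
    # Assumption: a list has one entry per dimension, None = any length.
    if len(spec_shape) != len(actual_shape):
        return False
    return all(want is None or want == got
               for want, got in zip(spec_shape, actual_shape))
```

Under these assumptions, `shape_allows(None, (10, 3))` and `shape_allows("scalar", ())` both hold, while `shape_allows("scalar", (1,))` does not.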

2.2. Version 2.0.2 (March, 2020)

  • Added value and default_value as optional keys of a dataset.

  • dtype changed from required to optional for datasets.

2.3. Version 2.0.1 (March, 2019)

  • Added support for specifying a title and doc for source files as part of the schema portion of a namespace specification. This was added to improve documentation of individual source files and to support sorting of types by source file with meaningful titles and text as part of autogenerated docs.

  • Updated the docs for quantity to indicate that the default value is 1 if not specified.

2.4. Version 2.0.0 (January, 2019)

2.4.1. Summary

  • Simplify reuse of data_types:
    • Added new keys: `data_type_def` and `data_type_inc` (which in combination replace the keys `data_type`, `include` and `merge`). See below for details.

    • Removed key: `include`

    • Removed key: `merge`

    • Removed key: `merge+`

    • Removed key: `data_type` (replaced by data_type_inc and data_type_def)

    • Removed `_properties` key. The primary use of the key is to define abstract specifications. However, as format specifications don’t implement functions but define a layout of objects, any spec (even if marked abstract) could still be instantiated and used in practice without limitations. Also, in the current instantiation of NWB:N this concept is only used for the `Interface` type and it is unclear why a user should not be able to use it. As such this concept was removed.

    • To improve compliance of the NWB:N inheritance mechanism with established object-oriented design concepts, the option of restricting the use of subclasses in place of parent classes was removed. A subclass is always also a valid instance of a parent class. This also improves consistency with the NWB:N principle of a minimal specification that allows users to add custom data. This change affects the `allow_subclasses` key of links and the subclasses option of the removed `include` key.

  • Improve readability and avoid collision of keys by replacing values encoded in keys with dedicated key/value pairs:
    • Explicit encoding of names and types:
      • Added `name` key

      • Removed <…> name identifier (replaced by empty `name` key)

      • Added `groups` key (previously groups were indicated by “/” as part of object’s key)

      • Added `datasets` key (previously datasets were indicated by missing “/” as part of the object’s key)

      • Added `links` key (previously this was a key on the group and dataset specification). With this change, links are now a first-class type (rather than being part of the group and dataset specs).

      • Removed link key on datasets as this functionality is now fully implemented by the links key on groups.

      • Removed / flag in keys to identify groups (replaced by `groups` and `datasets` keys)

    • Explicit encoding of quantities:
      • Added new key `quantity` (which replaces the `quantity_flag`). See below for details.

      • Removed `quantity_flag` as part of keys

      • Removed `Exclude_in` key. The key is currently not used in the NWB core spec. This feature is superseded by the ability to overwrite the `quantity` key as part of the reuse of `neurodata_types`.

    • Removed `_description` key. The key is no longer needed because name conflicts with datasets and groups are no longer possible since the name is now explicitly encoded in a dedicated key/value pair.

  • Improve human readability:
    • Added support for YAML in addition to JSON

    • Values such as names, types, and quantities are now explicitly encoded in dedicated key/value pairs rather than being encoded as regular expressions in keys.

  • Improve direct interpretation of data:
    • Removed `references` key. This key was used in previous versions of NWB to generate implicit data structures where datasets store references to parts of other metadata structures. These implicit data structures violate core NWB principles as they hinder the direct interpretation of data and cannot be interpreted (by either a human or a program) based on NWB files alone without additional information about the specification. Through simple reorganization of metadata in the file, all instances of these implicit data structures were replaced by simple links that can be interpreted directly.

  • Simplified specification of dimensions for datasets:
    • Renamed `dimensions` key to `dims`

    • Added key `shape` to allow the specification of the shape of datasets

    • Removed custom keys for defining structures as types for dimensions:
      • `unit` keys from previous structured dimensions are now `unit` attributes on the datasets (i.e., all values in a dataset have the same units)

      • The length of the structs is used to define the length of the corresponding dimension as part of the `shape` key

      • `alias` for components of dimensions are currently encoded in the dimension's name.

  • Added support for default vs. fixed name for groups and datasets:
    • Added default_name key for groups and datasets to allow the specification of default names for objects that can have user-defined names (in addition to fixed names via name). Attributes can only have a fixed name since attributes cannot have a neurodata_type and can, hence, only be identified via their fixed name.

  • Updated specification of fixed and default values for attributes to make the behavior of keys explicit:
    • Specifying attribute values:
      • Added default_value key for attributes to specify a default value for attributes

      • Removed const key for attributes which was used to control the behavior of the value key, i.e., depending on the value of const the value key would either act as a fixed or default value. By adding the default_value key this behavior now becomes explicit and the behavior of the value key no longer depends on the value of another key (i.e., the const key)

  • Improved governance and reuse of specifications:
    • The core specification documents are no longer stored as .py files as part of the original Python API but are released as separate YAML (or optionally JSON) documents in a separate repository.

    • All documentation has been ported to use reStructuredText (RST) markup that can be easily translated to PDF, HTML, text, and many other forms.

    • Documentation for source code and the specification is auto-generated from source to ensure consistency between sources and the documentation.

  • Avoid mixing of format specification and computations:
    • Removed key `autogen` (without replacement). The autogen key was used to describe how to compute certain derived datasets from the file. This feature was problematic with respect to the guiding principles of NWB for a couple of reasons. For example, the resulting datasets were often not interpretable without the provenance of the autogeneration procedure, and autogeneration itself often described the generation of derived data structures to ease follow-on computations. Describing computations as part of a format specification is problematic as it creates strong dependencies and often unnecessary restrictions for use and analysis of data stored in the format. Also, the reorganization of metadata has eliminated the need for autogen in many cases. An autogen feature is arguably the role of a data API or intermediary derived-quantity API (or specification), rather than a format specification.

  • Enhanced specification of data types via dtype:
    • Enhanced the syntax for dtype to allow the specification of flat compound data types via lists of types

    • Enhanced the syntax for dtype to allow the specification of i) object references and ii) region references

    • Removed “!” syntax (e.g., “float32!”) previously used to specify a minimum precision. All types are interpreted as minimum specs.

    • Specified list of available data types and their names

    • Added isodatetime dtype for specification of ISO 8601 datetime strings (e.g., 2018-09-28T14:43:54.123+02:00) as data type

    • Added bool dtype for specification of boolean type fields (see PR691 (PyNWB) and I658 (PyNWB)).

  • Others:
    • Removed key `__custom` (without replacement). This feature was used only in one location to provide user hints where custom data could be placed. However, since the NWB specification approach explicitly allows users to add custom data in any location, this information was not binding.
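
To illustrate the explicit split between fixed and default attribute values described in the summary above, here is a minimal sketch. It assumes that `value` acts as a fixed value and `default_value` as a fallback when nothing is supplied. The helper `resolve_attribute` and the attribute names used with it are hypothetical and not part of any NWB or hdmf API.

```python
_MISSING = object()  # sentinel: the user supplied no value


def resolve_attribute(spec, provided=_MISSING):
    """Return the value an attribute should take under the given spec.

    Hypothetical helper for illustration only.
    """
    if "value" in spec:
        # `value` is fixed: a user-supplied value must agree with it.
        if provided is not _MISSING and provided != spec["value"]:
            raise ValueError(f"attribute {spec['name']!r} is fixed to "
                             f"{spec['value']!r}")
        return spec["value"]
    if provided is not _MISSING:
        return provided
    if "default_value" in spec:
        # `default_value` applies only when nothing was supplied.
        return spec["default_value"]
    raise ValueError(f"attribute {spec['name']!r} has no value")
```

With a spec such as `{"name": "unit", "default_value": "volts"}`, calling `resolve_attribute(spec)` falls back to the default, whereas `resolve_attribute(spec, "amperes")` keeps the supplied value. Neither result depends on a second key such as the removed const.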

2.4.2. Currently unsupported features:

  • `_required` : The current API does not yet support specification and verification of constraints previously expressed via _required.

  • Relationships are currently available only through implicit concepts, i.e., by sharing dimension names and through implicit references as part of datasets. The goal is to provide explicit mechanisms for describing these as well as more advanced relationships.

  • `dimensions_specification`: This will be implemented in a later version, likely through the use of relationships.

2.4.3. YAML support

To improve human readability of the specification language, Version 1.2a now allows specifications to be defined in YAML as well as JSON (Version 1.1c allowed only JSON).

2.4.4. `quantity`

Version 1.1c of the specification language used a `quantity_flag` as part of the name key of groups and datasets to encode the quantity:

  • ! - Required (this is the default)

  • ? - Optional

  • ^ - Recommended

  • + - One or more instances of variable-named identifier required

  • * - Zero or more instances of variable-named identifier allowed

Version 1.2a replaces the `quantity_flag` with a new key `quantity` with the following values:

| value | required | number of instances |
|---|---|---|
| `zero_or_more` or `*` | optional | unlimited |
| `one_or_more` or `+` | required | unlimited but at least 1 |
| `zero_or_one` or `?` | optional | 0 or 1 |
| `1`, `2`, `3`, … | required | Fixed number of instances as indicated by the value |
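
The mapping of `quantity` values above can be expressed as a small lookup. The following is a sketch only: `parse_quantity` is a hypothetical helper (not an hdmf function), and it returns a `(required, min_count, max_count)` tuple in which a `max_count` of `None` stands for "unlimited". The default of `1` follows the 2.0.1 note that quantity defaults to 1 when not specified.

```python
def parse_quantity(value=1):
    """Map a `quantity` value to (required, min_count, max_count).

    max_count of None means unlimited.
    Hypothetical helper for illustration; not part of hdmf.
    """
    if value in ("zero_or_more", "*"):
        return (False, 0, None)
    if value in ("one_or_more", "+"):
        return (True, 1, None)
    if value in ("zero_or_one", "?"):
        return (False, 0, 1)
    # Otherwise expect a positive integer: a fixed number of instances.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(f"unsupported quantity: {value!r}")
    return (True, value, value)
```

For example, `parse_quantity("+")` yields `(True, 1, None)` and `parse_quantity(3)` yields `(True, 3, 3)`.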

2.4.5. `merge` and `include`

To simplify the concepts of `include` and `merge`, version 1.2a introduced a new key `neurodata_type_def` which describes the creation of a new neurodata_type. The combination of `neurodata_type_def` and `neurodata_type_inc` simplifies the concepts of merge (i.e., inheritance/extension) and inclusion and allows us to express the same concepts in an easier-to-use fashion. Accordingly, the keys `include`, `merge` and `merge+` have been removed in version 1.2a. Here is a summary of the basic cases:

| neurodata_type_inc | neurodata_type_def | Description |
|---|---|---|
| not set | not set | define standard dataset or group without a type |
| not set | set | create a new neurodata_type from scratch |
| set | not set | include (reuse) neurodata_type without creating a new one (include) |
| set | set | merge/extend neurodata_type and create a new type (merge) |
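
The four cases can also be read as a simple decision on which of the two keys is present. The sketch below assumes a spec is available as a Python dict. The function `classify_type_usage` and the type names `BaseSeries` and `MySeries` are hypothetical and used only for illustration.

```python
def classify_type_usage(spec):
    """Classify a group/dataset spec by its neurodata_type_* keys.

    Hypothetical helper for illustration only.
    """
    has_inc = spec.get("neurodata_type_inc") is not None
    has_def = spec.get("neurodata_type_def") is not None
    if not has_inc and not has_def:
        return "untyped"   # standard dataset or group without a type
    if has_def and not has_inc:
        return "new type"  # new neurodata_type created from scratch
    if has_inc and not has_def:
        return "include"   # reuse an existing type as-is
    return "merge"         # extend an existing type into a new type


# Hypothetical example: extend BaseSeries into a new type MySeries.
extended = {"neurodata_type_def": "MySeries",
            "neurodata_type_inc": "BaseSeries"}
```

Here `classify_type_usage(extended)` falls into the last case, which corresponds to the former `merge` behaviour.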

2.4.6. `structured_dimensions`

The definition of structured dimensions has been removed in version 1.2a. The concept of structs as dimensions is problematic for several reasons: 1) it implies support for defining general tables with mixed units and data types, which are currently not supported, 2) they easily allow for colliding specifications where mixed units are assigned to the same value, 3) they are hard to use and unsupported by HDF5. So far, however, structured dimensions have been used only to encode information about “columns” of a dataset (e.g., to indicate that a dimension stores x,y,z values). This information was translated to the `dims` and `shape` keys and `unit` attributes. The more general concept of structured dimensions will be implemented in future versions of the specification language and format, likely via support for modeling of relationships or support for table data structures (stay tuned).
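
As a rough sketch of the translation described above, a dataset that previously used a structured dimension for x,y,z values might be described with `dims`, `shape`, and a `unit` attribute as follows. The spec is written as a Python dict for convenience. The dataset name, dimension names, unit value, doc strings, and the helper `dims_match_shape` are all assumptions made for illustration and are not taken from the NWB core specification.

```python
# Hypothetical dataset spec: N points, each with x, y, z components.
position_spec = {
    "name": "positions",
    "dtype": "float32",
    "doc": "Example dataset of 3D points.",
    "dims": ["num_points", "x|y|z"],  # component aliases encoded in the name
    "shape": [None, 3],               # None = any length; 3 = former struct length
    "attributes": [
        # one unit for all values in the dataset
        {"name": "unit", "dtype": "text", "value": "meters",
         "doc": "Unit of all values."},
    ],
}


def dims_match_shape(spec):
    """Check that `dims` and `shape` describe the same number of dimensions.

    Hypothetical helper; only list-valued dims and shape are compared.
    """
    dims, shape = spec.get("dims"), spec.get("shape")
    if not isinstance(dims, list) or not isinstance(shape, list):
        return True  # nothing to compare
    return len(dims) == len(shape)
```

The fixed length `3` in `shape` carries the information that was previously implied by the number of fields in the struct.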

2.4.7. `autogen`

The `autogen` key has been removed without replacement.

Reason: The autogen specification was originally used to specify that the attribute or dataset contents (values) can be derived from the contents of the HDF5 file and, hence, generated and validated automatically. As such, autogen crossed a broad range of different functionalities, including:

  1. Specification of the structure of format datasets/attributes

  2. Description of data constraints (e.g., the shape of the generated dataset directly depends on the structure of the input data consumed by autogen),

  3. Specification of the content (i.e., value) of datasets and attributes,

  4. Description of computations to create derived data, and

  5. Validation of the structure and content of datasets/attributes.

This mixing of functionality in turn led to several concerns:

  • autogen exhibited a fairly complex syntax, which made it hard to interpret and use

  • autogen is specifically used to create derived data from information that is already in the NWB file. Attributes/datasets generated via autogen: i) are redundant, ii) often require bookkeeping to ensure data consistency, iii) generate dependencies across data and types, iv) have limited utility as the information can be derived through other means, and v) interpretation of data values may require the provenance of autogen.

  • Description of computations as part of a format specification was seen as problematic.

  • There was potential for collisions between autogen and the specification of the dataset/attribute itself.

Usage in NWB: autogen was used in NWB V.1.0.6 to generate 17 datasets/attributes, primarily to: i) store the path of links in separate datasets/attributes or ii) generate lists of datasets/groups of a given type/property. The datasets were reviewed at a hackathon, determined to be non-essential, and as such removed from the format as well.

2.5. Version 1.1c (Oct. 7, 2016)

  • Original version of the specification language generated as part of the NWB pilot project