2. Release Notes
2.1. Version 3.0.0 (Upcoming)
- Deprecated the `default_value` key for datasets. This key is still supported for attributes.
- Deprecated the `linkable` key for groups and datasets.
- First release as hdmf-schema-language.
- Removed the legacy description of the `specs` or `spec` key.
- Added specification for the specification language used by each file.
- Added dtypes that are already supported in `hdmf.spec`: short, uint64, bytes, and datetime.
- Clarified that if `name` is defined on a group/dataset/link specification, `quantity` may not be greater than 1.
- Updated the `datetime` specification to allow a date with no time or timezone.
- Changed the meaning of the default shape `shape: null` from representing a scalar to representing any shape.
- Added the special value `shape: scalar` that represents a scalar.
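The `shape` and `name`/`quantity` rules above can be sketched in Python. This is a minimal illustration, not the hdmf implementation: the function names are hypothetical, specs are assumed to be plain dictionaries as loaded from YAML, and the symbols `*` and `+` are assumed to be the `quantity` values that permit more than one instance.

```python
# Quantity symbols assumed to permit more than one instance (an assumption,
# not taken from the 3.0.0 notes themselves).
MULTI_INSTANCE_QUANTITIES = {"*", "+"}


def describe_shape(shape):
    """Describe what a `shape` value means under the 3.0.0 rules."""
    if shape is None:  # YAML `shape: null`
        return "any shape"
    if shape == "scalar":
        return "scalar"
    return "fixed shape " + repr(shape)


def name_quantity_ok(spec):
    """Return True if a spec with a fixed `name` has a quantity of at most 1."""
    if "name" not in spec:
        return True  # the rule only applies to named specs
    quantity = spec.get("quantity", 1)  # 1 is the documented default
    if isinstance(quantity, int):
        return quantity <= 1
    return quantity not in MULTI_INSTANCE_QUANTITIES


print(describe_shape(None))
print(name_quantity_ok({"name": "data", "quantity": 2}))
```

Before 3.0.0, the first call would have described a scalar; the second call reports a spec that the clarified rule rejects.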
2.2. Version 2.0.2 (March, 2020)
- Added `value` and `default_value` as optional keys of a dataset.
- Changed `dtype` from required to optional for datasets.
2.3. Version 2.0.1 (March, 2019)
- Added support for specifying a `title` and `doc` for `source` files as part of the `schema` portion of a `namespace` specification. This was added to improve documentation of individual source files and to support sorting of types by source file with meaningful titles and text as part of autogenerated docs.
- Updated the docs for `quantity` to indicate that the default value is `1` if not specified.
2.4. Version 2.0.0 (January, 2019)
2.4.1. Summary
- Simplify reuse of data_types:
  - Added new keys `data_type_def` and `data_type_inc` (which in combination replace the keys `data_type`, `include`, and `merge`). See below for details.
  - Removed key: `include`
  - Removed key: `merge`
  - Removed key: `merge+`
  - Removed key: `data_type` (replaced by `data_type_inc` and `data_type_def`)
  - Removed `_properties` key. The primary use of the key was to define `abstract` specifications. However, as format specifications don’t implement functions but define a layout of objects, any spec (even if marked abstract) could still be instantiated and used in practice without limitations. Also, in the current instantiation of NWB:N this concept is only used for the `Interface` type and it is unclear why a user should not be able to use it. As such, this concept was removed.
  - To improve compliance of the NWB:N inheritance mechanism with established object-oriented design concepts, the option of restricting the use of subclasses in place of parent classes was removed. A subclass is always also a valid instance of a parent class. This also improves consistency with the NWB:N principle of a minimal specification that allows users to add custom data. This change affects the `allow_subclasses` key of links and the subclasses option of the removed `include` key.
- Improve readability and avoid collision of keys by replacing values encoded in keys with dedicated key/value pairs:
  - Explicit encoding of names and types:
    - Added `name` key
    - Removed <…> name identifier (replaced by empty `name` key)
    - Added `groups` key (previously groups were indicated by “/” as part of the object’s key)
    - Added `datasets` key (previously datasets were indicated by a missing “/” as part of the object’s key)
    - Added `links` key (previously this was a key on the group and dataset specification). With this change, links are a first-class type (rather than being part of the group and dataset specs).
    - Removed `link` key on datasets as this functionality is now fully implemented by the `links` key on groups.
    - Removed / flag in keys to identify groups (replaced by `groups` and `datasets` keys)
  - Explicit encoding of quantities:
    - Added new key `quantity` (which replaces the `quantity_flag`). See below for details.
    - Removed `quantity_flag` as part of keys
    - Removed `exclude_in` key. The key is currently not used in the NWB core spec. This feature is superseded by the ability to overwrite the `quantity` key as part of the reuse of `neurodata_types`.
  - Removed `_description` key. The key is no longer needed because name conflicts with datasets and groups are no longer possible since the name is now explicitly encoded in a dedicated key/value pair.
- Improve human readability:
  - Added support for YAML in addition to JSON
  - Values such as names, types, and quantities are now explicitly encoded in dedicated key/value pairs rather than being encoded as regular expressions in keys.
- Improve direct interpretation of data:
  - Removed `references` key. This key was used in previous versions of NWB to generate implicit data structures where datasets store references to parts of other metadata structures. These implicit data structures violate core NWB principles as they hinder the direct interpretation of data and cannot be interpreted (neither by human nor program) based on NWB files alone without having additional information about the specification as well. Through simple reorganization of metadata in the file, all instances of these implicit data structures were replaced by simple links that can be interpreted directly.
- Simplified specification of dimensions for datasets:
  - Renamed `dimensions` key to `dims`
  - Added key `shape` to allow the specification of the shape of datasets
  - Removed custom keys for defining structures as types for dimensions:
    - `unit` keys from previous structured dimensions are now `unit` attributes on the datasets (i.e., all values in a dataset have the same units)
    - The length of the structs is used to define the length of the corresponding dimension as part of the `shape` key
    - `alias` values for components of dimensions are currently encoded in the dimension's name.
- Added support for default vs. fixed name for groups and datasets:
  - Added `default_name` key for groups and datasets to allow the specification of default names for objects that can have user-defined names (in addition to fixed names via `name`). Attributes can only have a fixed name since attributes cannot have a neurodata_type and can, hence, only be identified via their fixed name.
- Updated specification of fixed and default values for attributes to make the behavior of keys explicit:
  - Specifying attribute values:
    - Added `default_value` key for attributes to specify a default value for attributes
    - Removed `const` key for attributes, which was used to control the behavior of the `value` key, i.e., depending on the value of `const`, the `value` key would act as either a fixed or a default value. By adding the `default_value` key this behavior now becomes explicit and the behavior of the `value` key no longer depends on the value of another key (i.e., the `const` key)
- Improved governance and reuse of specifications:
  - The core specification documents are no longer stored as .py files as part of the original Python API but are released as separate YAML (or optionally JSON) documents in a separate repository
  - All documentation has been ported to use reStructuredText (RST) markup that can be easily translated to PDF, HTML, text, and many other forms.
  - Documentation for source code and the specification is auto-generated from source to ensure consistency between sources and the documentation
- Avoid mixing of format specification and computations:
  - Removed key `autogen` (without replacement). The autogen key was used to describe how to compute certain derived datasets from the file. This feature was problematic with respect to the guiding principles of NWB for a couple of reasons. For example, the resulting datasets were often not interpretable without the provenance of the autogeneration procedure, and autogeneration itself often described the generation of derived data structures to ease follow-on computations. Describing computations as part of a format specification is problematic as it creates strong dependencies and often unnecessary restrictions for use and analysis of data stored in the format. Also, the reorganization of metadata has eliminated the need for autogen in many cases. Autogen features are arguably the role of a data API or intermediary derived-quantity API (or specification), rather than a format specification.
- Enhanced specification of data types via `dtype`:
  - Enhanced the syntax for `dtype` to allow the specification of flat compound data types via lists of types
  - Enhanced the syntax for `dtype` to allow the specification of i) object references and ii) region references
  - Removed “!” syntax (e.g., “float32!”) previously used to specify a minimum precision. All types are interpreted as minimum specs.
  - Specified list of available data types and their names
  - Added `isodatetime` dtype for specification of ISO 8601 datetime strings (e.g., `2018-09-28T14:43:54.123+02:00`) as data type
  - Added `bool` dtype for specification of boolean type fields (see PR691 (PyNWB) and I658 (PyNWB)).
- Others:
  - Removed key `__custom` (without replacement). This feature was used only in one location to provide user hints where custom data could be placed. However, since the NWB specification approach explicitly allows users to add custom data in any location, this information was not binding.
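The now-explicit split between a fixed `value` and a `default_value` for attributes, described in the summary above, can be sketched as follows. This is a hedged illustration rather than the API's actual behavior: `resolve_attribute_value` is a hypothetical helper, and the attribute spec is assumed to be a plain dictionary.

```python
def resolve_attribute_value(spec, user_value=None):
    """Pick the value stored for an attribute.

    `value` is treated as fixed: the user may not override it.
    `default_value` is only used when the user supplies nothing.
    """
    if "value" in spec:
        if user_value is not None and user_value != spec["value"]:
            raise ValueError("attribute has a fixed value and cannot be overridden")
        return spec["value"]
    if user_value is not None:
        return user_value
    return spec.get("default_value")


fixed = {"name": "unit", "value": "volts"}
defaulted = {"name": "unit", "default_value": "volts"}

print(resolve_attribute_value(fixed))              # fixed value is always used
print(resolve_attribute_value(defaulted, "amps"))  # user value replaces the default
```

With the removed `const` key, the same `value` entry would have needed a second key to decide which of these two behaviors applied.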
2.4.2. Currently unsupported features:
- `_required`: The current API does not yet support specification and verification of constraints previously expressed via `_required`.
- Relationships are currently available only through implicit concepts, i.e., by sharing dimension names and through implicit references as part of datasets. The goal is to provide explicit mechanisms for describing these as well as more advanced relationships.
- `dimensions_specification`: This will be implemented in a later version, likely through the use of relationships.
2.4.3. YAML support
To improve human readability of the specification language, Version 1.2a now allows specifications to be defined in YAML as well as JSON (Version 1.1c allowed only JSON).
2.4.4. `quantity`
Version 1.1c of the specification language used a `quantity_flag` as part of the name key of groups and datasets to specify the quantity:
- `!` - Required (this is the default)
- `?` - Optional
- `^` - Recommended
- `+` - One or more instances of variable-named identifier required
- `*` - Zero or more instances of variable-named identifier allowed
Version 1.2a replaces the `quantity_flag` with a new key `quantity` with the following values:
value | required | number of instances |
---|---|---|
`*` | optional | unlimited |
`+` | required | unlimited but at least 1 |
`?` | optional | 0 or 1 |
integer (e.g., `1`, `2`, `3`) | required | Fixed number of instances as indicated by the value |
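The rows of the table above can be read as a small lookup. The sketch below assumes that the `quantity` values are the symbols `*`, `+`, and `?` or a positive integer; `interpret_quantity` is a hypothetical helper, not part of any API.

```python
def interpret_quantity(quantity=1):
    """Return (required, min_count, max_count) for a `quantity` value.

    `max_count` is None when the number of instances is unlimited.
    """
    if quantity == "*":
        return (False, 0, None)  # optional, unlimited
    if quantity == "+":
        return (True, 1, None)   # required, unlimited but at least 1
    if quantity == "?":
        return (False, 0, 1)     # optional, 0 or 1
    is_count = isinstance(quantity, int) and not isinstance(quantity, bool)
    if is_count and quantity >= 1:
        return (True, quantity, quantity)  # required, fixed number of instances
    raise ValueError("unsupported quantity: " + repr(quantity))


print(interpret_quantity("+"))
print(interpret_quantity(3))
```

Returning a tuple keeps "required" separate from the instance bounds, mirroring the two descriptive columns of the table.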
2.4.5. `merge` and `include`
To simplify the concepts of `include` and `merge`, version 1.2a introduced a new key `neurodata_type_def` which describes the creation of a new neurodata_type. The combination of `neurodata_type_def` and `neurodata_type_inc` simplifies the concepts of merge (i.e., inheritance/extension) and inclusion and allows us to express the same concepts in an easier-to-use fashion. Accordingly, the keys `include`, `merge`, and `merge+` have been removed in version 1.2a. Here is a summary of the basic cases:
neurodata_type_inc | neurodata_type_def | Description |
---|---|---|
not set | not set | define standard dataset or group without a type |
not set | set | create a new neurodata_type from scratch |
set | not set | include (reuse) neurodata_type without creating a new one (include) |
set | set | merge/extend neurodata_type and create a new type (merge) |
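The four cases in the table above can be expressed as a short classification routine. This is a minimal sketch assuming specs are plain dictionaries; `classify_type_usage` and the example type names are hypothetical and used only for illustration.

```python
def classify_type_usage(spec):
    """Classify a group/dataset spec by which type keys it sets."""
    has_inc = "neurodata_type_inc" in spec
    has_def = "neurodata_type_def" in spec
    if has_inc and has_def:
        return "merge/extend an existing type and create a new type"
    if has_inc:
        return "include (reuse) an existing type"
    if has_def:
        return "create a new type from scratch"
    return "standard dataset or group without a type"


# Hypothetical spec that extends one type to define another.
extended = {"neurodata_type_def": "MySeries", "neurodata_type_inc": "BaseSeries"}
print(classify_type_usage(extended))
```

Because only the presence of the two keys matters, the old `include`, `merge`, and `merge+` keys are not needed to tell these cases apart.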
2.4.6. `structured_dimensions`
The definition of structured dimensions has been removed in version 1.2a. The concept of structs as dimensions is problematic for several reasons: 1) it implies support for defining general tables with mixed units and data types, which are currently not supported, 2) it easily allows for colliding specifications where mixed units are assigned to the same value, and 3) structs are hard to use and unsupported by HDF5. So far, however, structured dimensions have been used only to encode information about “columns” of a dataset (e.g., to indicate that a dimension stores x,y,z values). This information was translated to the `dims` and `shape` keys and `unit` attributes.
The more general concept of structured dimensions will be implemented in future versions of the specification language and format, likely via support for modeling of relationships or support for table data structures (stay tuned).
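The translation described above (aliases into the dimension name, struct length into `shape`, a single shared `unit`) might look roughly like the sketch below. Everything concrete here is an assumption made for illustration: the legacy struct is modeled as a list of dictionaries with `alias` and `unit` entries, the aliases are joined with `|`, and `flatten_struct_dimension` is a hypothetical helper.

```python
def flatten_struct_dimension(components):
    """Collapse a legacy structured dimension into a name, length, and unit.

    Raises ValueError for mixed units, since a dataset carries one `unit`.
    """
    units = {component["unit"] for component in components}
    if len(units) > 1:
        raise ValueError("mixed units cannot share one dataset: " + repr(sorted(units)))
    return {
        "dim": "|".join(component["alias"] for component in components),
        "length": len(components),  # becomes the entry in `shape`
        "unit": units.pop() if units else None,
    }


# Hypothetical x,y,z column description.
xyz = [{"alias": axis, "unit": "meter"} for axis in ("x", "y", "z")]
print(flatten_struct_dimension(xyz))
```

The explicit error for mixed units reflects the second concern listed above, where colliding units could be assigned to the same value.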
2.4.7. `autogen`
The `autogen` key has been removed without replacement.
Reason: The autogen specification was originally used to specify that the attribute or dataset contents (values) can be derived from the contents of the HDF5 file and, hence, generated and validated automatically. As such, autogen crossed a broad range of different functionalities, including:
- Specification of the structure of format datasets/attributes,
- Description of data constraints (e.g., the shape of the generated dataset directly depends on the structure of the input data consumed by autogen),
- Specification of the content (i.e., value) of datasets and attributes,
- Description of computations to create derived data, and
- Validation of the structure and content of datasets/attributes.
This mixing of functionality in turn led to several concerns:
- autogen exhibited a fairly complex syntax, which made it hard to interpret and use.
- autogen is specifically used to create derived data from information that is already in the NWB file. Attributes/datasets generated via autogen: i) are redundant, ii) often require bookkeeping to ensure data consistency, iii) generate dependencies across data and types, iv) have limited utility as the information can be derived through other means, and v) interpretation of data values may require the provenance of autogen.
- Description of computations as part of a format specification was seen as problematic.
- There was potential for collisions between autogen and the specification of the dataset/attribute itself.
Usage in NWB: autogen was used in NWB V.1.0.6 to generate 17 datasets/attributes primarily to: i) store the path of links in separate datasets/attributes or ii) generate lists of datasets/groups of a given type/property. The datasets were reviewed at a hackathon and determined to be non-essential and as such removed from the format as well.
2.5. Version 1.1c (Oct. 7, 2016)
Original version of the specification language generated as part of the NWB pilot project