NeXus Proposal

NeXus
A Proposal for a Common Data Format
for
Neutron and X-Ray Scattering Instruments

Mark Könnecke
Labor für Neutronenstreuung
Paul Scherrer Institut
CH-5232 Villigen-PSI
Switzerland

Przemek Klosowski
Nicholas C. Maliszewskyj
U. of Maryland, NIST
NIST, Bldg. 235
Gaithersburg MD 20899, USA

Jonathan Tischler
Oak Ridge National Laboratory
Bldg 3025, MS6030
Oak Ridge, TN 37831-6030, USA

Ray Osborn
Materials Science Division
Argonne National Laboratory
Argonne, IL 60439-4845, USA

Freddy Akeroyd
ISIS-Facility
Rutherford Appleton Laboratory
Chilton, Didcot OX11 0QX
United Kingdom

1 Introduction
2 Requirements for a common data format
3 NeXus overview
4 The physical file format
5 File structure
    5.1 Entry vGroups
        5.1.1 The Instrument vGroup
        5.1.2 The Sample vGroup
    5.2 The Data vGroup
6 Rules for the storage of individual data items
7 The NeXus Glossary
8 The NeXus API
9 Conclusion and outlook
10 References

1. Introduction

Traditionally, every large neutron or x-ray scattering facility has had its own home-grown data format. This is fine for people working there with the adapted set of programs provided. Users, however, rarely work at only one facility; they perform complementary experiments at other sites. Furthermore, funding for software development related to neutron scattering is scarce, so scientists are forced to rely increasingly on third-party software or general-purpose tools. Experimental data also need to be exchanged with scientists working in related fields. In all of these cases users have to write and use data conversion programs. These are usually not very complex; the problem is the sheer number of utilities required. A common data exchange format would clearly simplify this situation and encourage software reuse and exchange. Further problems arise from the increasingly inhomogeneous computing environment and from the fact that modern position-sensitive detectors have increased the amount of data collected to the point where efficient data handling has become a prime requirement.

To address this problem, an international group of scientists developed a proposal for a new data exchange format at a series of conferences (SoftNess 94, 95 and 96), starting from three independently developed proposals by Jonathan Tischler¹, Przemek Klosowski² and Mark Könnecke³. The proposal consists of a physical file format, a structure for the content of the file and guidelines for storing individual data items. Together, these have been named NeXus.

2. Requirements for a common data format

A common data format has to address the following issues:

It must be portable across all of the computing platforms in common usage.
An implementation should be readily available in Fortran 77 and ANSI-C on a variety of computing platforms.
The format must be flexible enough to account for the plethora of different instrument types.
It should be possible to add data to a file later on and to extend the standard.
The large amounts of data gathered at some sites call for an efficient format.
It should provide a means of structuring data.
The standard should support automatic data analysis tools.

Furthermore, all information necessary for evaluating an experiment should be in one file rather than split across several files and notebooks.

3. NeXus overview

The NeXus proposal consists of the following levels:

A physical file format. This will be HDF.
A file organisation.
Guidelines for the storage of individual data items in the file.
A glossary of variable names.
An application programming interface (API) to NeXus.

4. The physical file format

After reviewing several existing scientific data file formats (CDF, netCDF, CIF, ISO-STEP/EXPRESS, HDF), the Hierarchical Data Format⁴ (HDF), developed by the National Center for Supercomputing Applications (NCSA), USA, was chosen because it provided the best means of structuring data. HDF is a binary format that is portable across different computer and operating system platforms. HDF is accessed through a library of API functions that take care of platform-specific details such as byte ordering and ensure file consistency and conformance to the HDF standard. Implementations of the HDF library are readily available for Fortran 77, ANSI-C and C++ on all major computing platforms, both in binary form and as source code. The HDF library transparently supports several methods of lossless data compression. HDF is actively supported and developed by NCSA, and many major commercial data analysis packages have interfaces to HDF.

HDF is self-describing: an unknown HDF file can be opened and information about the data items in it can be collected without any prior knowledge of the file structure. New data items can easily be added to an HDF file, and existing data items can be modified. HDF supports several data models: tables, images, scientific data sets (SDS), annotations and vGroups. NeXus uses only the SDS and vGroup data models of the HDF standard.

An SDS is a scientific data set: an n-dimensional array of numbers in one of the supported number types, which cover all common number types. An SDS can have metadata associated with it, called attributes. Attributes can be used to define units, the meaning of the axes of the n-dimensional array and the like. Attributes are name-value pairs; permissible values are not restricted to text but can also be numbers or arrays.
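To make the combination of an array and its name-value attributes concrete, here is a minimal in-memory sketch in C. The types struct sds and struct attribute and the helpers sds_element_count and sds_find_attr are hypothetical illustrations, not part of the HDF library, which provides its own interface for reading and writing attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of a scientific data set (SDS): an
   n-dimensional array description plus a list of name/value attributes.
   This is NOT the HDF library's data structure. */

enum attr_type { ATTR_TEXT, ATTR_NUMBER };

struct attribute {
    const char *name;
    enum attr_type type;        /* tells which member of value is valid */
    union {
        const char *text;
        double number;
    } value;
};

#define MAX_RANK 8

struct sds {
    const char *name;
    int rank;                   /* number of dimensions */
    int dims[MAX_RANK];         /* length of each dimension */
    const struct attribute *attrs;
    size_t n_attrs;
};

/* Total number of array elements: the product of all dimension lengths. */
long sds_element_count(const struct sds *s)
{
    long n = 1;
    assert(s->rank >= 0 && s->rank <= MAX_RANK);
    for (int i = 0; i < s->rank; i++)
        n *= s->dims[i];
    return n;
}

/* Looks up an attribute by its exact name; returns NULL if absent. */
const struct attribute *sds_find_attr(const struct sds *s, const char *name)
{
    for (size_t i = 0; i < s->n_attrs; i++)
        if (strcmp(s->attrs[i].name, name) == 0)
            return &s->attrs[i];
    return NULL;
}
```

Because the lookup compares names exactly, a reader of such a model has to agree with the writer on the spelling of attribute names, case included.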

Attributes need not be associated with an SDS: global attributes apply to the file as a whole.

vGroups are the means of structuring data in an HDF file. A vGroup is a container for other HDF data items: it can contain other vGroups, SDS's or any of the other HDF data types. By nesting vGroups, a data hierarchy can be constructed that can be navigated much like a UNIX, DOS or VMS directory structure. NeXus makes use of this feature. To use vGroups, one first creates both the vGroup and the data item, and then links the data item into the vGroup. Please note that a data item or vGroup can be linked into more than one vGroup without the data item itself being replicated.
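The linking behaviour can be pictured with a small C model in which a vGroup stores pointers to its children, so that linking one item into two vGroups yields two references to a single object. The struct item type and the functions vgroup_link and vgroup_find are assumptions made for illustration; they do not reproduce how HDF stores vGroups on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of vGroups: a vGroup holds references (links) to its
   children. Linking an item into a second vGroup stores another reference
   to the same object, so the data are never copied. */

#define MAX_CHILDREN 16

struct item {
    const char *name;
    const char *nxclass;               /* class name for vGroups, NULL for data */
    struct item *children[MAX_CHILDREN];
    size_t n_children;
    double value;                      /* stand-in for the payload of a data item */
};

/* Links an existing item into a vGroup; returns 0 on success, -1 if full. */
int vgroup_link(struct item *group, struct item *child)
{
    assert(group->nxclass != NULL);    /* only vGroups can contain items */
    if (group->n_children == MAX_CHILDREN)
        return -1;
    group->children[group->n_children++] = child;
    return 0;
}

/* Finds a direct child by name, like looking up an entry in a directory. */
struct item *vgroup_find(const struct item *group, const char *name)
{
    for (size_t i = 0; i < group->n_children; i++)
        if (strcmp(group->children[i]->name, name) == 0)
            return group->children[i];
    return NULL;
}
```

Linking a wavelength SDS into both a monochromator vGroup and an NXbeam vGroup, for example, leaves one copy of the value that is reachable from both places; a change made through one path is visible through the other.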

vGroups are identified by their name, but each vGroup may also carry a text string defining its class. NeXus uses this feature as follows: users are free to choose meaningful vGroup names according to their instrument's requirements, but they must also specify a class name for each vGroup. These class names are part of the NeXus standard and define the expected content of the vGroup. All NeXus class names begin with NX. Users who define their own class names must not use the NX prefix, in order to avoid confusion with NeXus.
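The rule for class names can be expressed as a short check, sketched here in C. The list nexus_classes contains only the class names that appear in this proposal and is an illustrative assumption, not the full set defined by NeXus; class_name_ok is likewise a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Class names mentioned in this proposal. Illustrative only: the real
   NeXus standard defines its own, longer list. */
static const char *const nexus_classes[] = {
    "NXentry", "NXinstrument", "NXsample", "NXdata", "NXsource",
    "NXcrystal", "NXbeam", "NXpsd", "NXdetector",
};

bool is_known_nexus_class(const char *cls)
{
    for (size_t i = 0; i < sizeof nexus_classes / sizeof nexus_classes[0]; i++)
        if (strcmp(cls, nexus_classes[i]) == 0)
            return true;
    return false;
}

/* A class name is acceptable if it is a NeXus class, or if it is a
   non-empty user-defined class that avoids the reserved "NX" prefix. */
bool class_name_ok(const char *cls)
{
    assert(cls != NULL);
    if (strncmp(cls, "NX", 2) == 0)
        return is_known_nexus_class(cls);
    return cls[0] != '\0';
}
```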

5. File structure

At the top level of a NeXus-conforming HDF file there are several global attributes and one or more entry vGroups. The following global attributes need to be defined:

file_name: the original name of the file.
file_time: creation time of the file.
NeXus_version: version of NeXus used.
owner: name of the file owner.
owner_address: postal address of the owner.
owner_telephone_number: telephone number of the owner.
owner_fax_number: fax number of the owner.
owner_email: e-mail address of the owner.
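A file writer or checker could verify the presence of these global attributes with a routine like the following C sketch. The function first_missing_global is hypothetical, and it assumes the names of the attributes present have already been collected from the file, for example by scanning the global attributes through the API; only the names are checked, not their values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Global attributes that section 5 requires at the top level of a file. */
static const char *const required_globals[] = {
    "file_name", "file_time", "NeXus_version", "owner",
    "owner_address", "owner_telephone_number", "owner_fax_number",
    "owner_email",
};

/* Returns the first required global attribute missing from `present`
   (an array of n attribute names), or NULL if all of them are there. */
const char *first_missing_global(const char *const present[], size_t n)
{
    size_t n_req = sizeof required_globals / sizeof required_globals[0];
    assert(present != NULL || n == 0);
    for (size_t r = 0; r < n_req; r++) {
        size_t i = 0;
        while (i < n && strcmp(present[i], required_globals[r]) != 0)
            i++;
        if (i == n)                    /* searched all names, no match */
            return required_globals[r];
    }
    return NULL;
}
```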

Then there are the entry vGroups. Each entry vGroup contains all the data for one scan. More than one entry vGroup is permitted, which allows a whole series of related scans to be stored in one file. An entry vGroup may have any descriptive name (for example NaCl-277K) but MUST be of class NXentry.

5.1 Entry vGroups

Each entry vGroup is subdivided into at least three further vGroups, of classes NXinstrument, NXsample and NXdata, along with some general data items. These general data items include "title", "start_time" and "end_time".
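The minimal layout of an entry can be checked mechanically. The C sketch below assumes the contents of an entry are available as a list of (name, class) pairs, with a NULL class marking an SDS; the struct child type and entry_is_valid are hypothetical. Because vGroup names are free, the check looks only at class names for vGroups and at names for the three general data items.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One child of an entry as a directory listing would report it:
   vGroups carry a class name, SDS's have nxclass == NULL. */
struct child {
    const char *name;
    const char *nxclass;
};

static bool has_class(const struct child *c, size_t n, const char *cls)
{
    for (size_t i = 0; i < n; i++)
        if (c[i].nxclass && strcmp(c[i].nxclass, cls) == 0)
            return true;
    return false;
}

static bool has_sds(const struct child *c, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (!c[i].nxclass && strcmp(c[i].name, name) == 0)
            return true;
    return false;
}

/* Checks the minimal layout of section 5.1: the entry is of class NXentry,
   holds at least one NXinstrument, NXsample and NXdata vGroup, and the
   SDS's title, start_time and end_time. */
bool entry_is_valid(const char *entry_class, const struct child *c, size_t n)
{
    static const char *const groups[] = {"NXinstrument", "NXsample", "NXdata"};
    static const char *const fields[] = {"title", "start_time", "end_time"};
    assert(entry_class != NULL);
    if (strcmp(entry_class, "NXentry") != 0)
        return false;
    for (size_t i = 0; i < 3; i++) {
        if (!has_class(c, n, groups[i]))   /* required vGroup class */
            return false;
        if (!has_sds(c, n, fields[i]))     /* required general data item */
            return false;
    }
    return true;
}
```

Applied to the entry first_scan of figure 1, which holds title, start_time, end_time and vGroups of classes NXinstrument, NXsample and NXdata, the check succeeds; dropping the NXdata vGroup makes it fail.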

5.1.1 The Instrument vGroup

The instrument vGroup (of class NXinstrument) holds all the instrument-specific information. To subdivide it further, the instrument is split into its individual components, from the source to the detectors. Each component is a building block and is represented by a vGroup at the instrument level. For example, a monochromator could be represented by a vGroup with the name "Ge-Monochromator" and class NXcrystal. Each of these component vGroups holds all the data associated with its component, including data varied during the experiment and counts registered by any detectors it contains. For some instruments, notably time-of-flight instruments, the distances between components matter. Thus each building block vGroup holds an SDS named "distance". This distance is measured with respect to the sample position (the sample is at 0.0). Negative values indicate a component upstream of the sample (towards the source).
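The sign convention makes simple geometric quantities easy to derive. The C sketch below assumes that each distance is a path length along the beam, given in the same unit for all components; the helper names are hypothetical. For a time-of-flight instrument, the total flight path from source to detector is then the detector distance minus the (negative) source distance.

```c
#include <assert.h>
#include <stdbool.h>

/* Sign convention of section 5.1.1: each component's "distance" is measured
   from the sample, which sits at 0.0; negative values lie upstream. */

bool is_upstream(double distance)
{
    return distance < 0.0;
}

/* Path length between two components, assuming both distances are
   measured along the beam path through the sample. */
double separation(double d_a, double d_b)
{
    return d_a > d_b ? d_a - d_b : d_b - d_a;
}

/* Total flight path of a time-of-flight instrument: source to sample
   (the negated source distance) plus sample to detector. */
double flight_path(double d_source, double d_detector)
{
    assert(d_source <= 0.0 && d_detector >= 0.0);
    return d_detector - d_source;
}
```

With a source at -10.0 m and a detector at 2.5 m, for example, flight_path returns 12.5 m.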

At this level there is also a special vGroup of class NXbeam. This convenience vGroup should hold links to all the data items that characterise the beam at the sample, e.g. wavelength, polarisation or flux. Such a vGroup may have the name "incident_beam".

Monitors are treated as detectors, each with its own NXdetector vGroup within the NXinstrument section. Scanned monitor values are represented by their own NXdata vGroups at the top level of each entry.

5.1.2 The Sample vGroup

This vGroup will include all details about the sample, e.g. sample_name, lattice_constants, temperature etc.

5.2 The Data vGroup

The data vGroup is a convenience feature that supports automatic plotting programs. The idea is that each detector is represented not only in the NXinstrument vGroup but also by a data vGroup at entry level. This data vGroup holds links to all the information necessary to create a default plot of the measured data. Automated data plotting is not yet in common use in the neutron scattering community, but it is an important requirement in the synchrotron radiation community, where some instruments measure standard curves in very short time intervals. Such experiments are usually analyzed visually. At a later stage, tools are expected to exist that prepare a default plot given a valid NeXus file and an entry.

As an example, figure 1 outlines this file structure for a powder diffractometer. In this example VG stands for vGroup, SDS for an SDS, ATT for an attribute and LL for a link. The vGroup level is indicated by indentation. The parameters of VG are name and class.

ATT: file_name
ATT: file_time
ATT: NeXus_version
ATT: owner_name
ATT: owner_affiliation
ATT: owner_address
ATT: owner_telephone_number
ATT: owner_fax_number
ATT: owner_email
VG: first_scan, NXentry
   SDS: title
   SDS: start_time
   SDS: end_time
   VG: DMC, NXinstrument
         VG: SinQ, NXsource
             SDS: beam_current
             SDS: time
                  ATT: axis=1
         VG: Ge-Monochromator, NXcrystal
             SDS: Theta
                  ATT: Units=degrees
             SDS: 2Theta
                  ATT: Units=degrees
             SDS: horizontal_bender
                  ATT: Units=mm
             SDS: lambda
                  ATT: Units=Angstroem
             SDS: reflection_used
             SDS: type
         VG: incident_beam, NXbeam
             LL: lambda
         VG: BF3-banana, NXpsd
             SDS: 2Theta
                  ATT: Units=degrees
                  ATT: axis=1
             SDS: Counts
   VG: NaCl, NXsample
         SDS: name
         SDS: temperature
              ATT: Units=K
   VG: Detector, NXdata
       LL: SDS Counts
       LL: SDS 2Theta (from BF3-banana)          
VG: second_scan, NXentry
....

Figure 1: Example of a NeXus file structure for a powder diffractometer

6. Rules for the storage of individual data items

The first question to answer is: "What to store?" Ideally, all information necessary for common data evaluation should be present in a NeXus file, including all the data that are "well known at the site". All values should be stored as their ideal physical values, with experimental offsets stored as attributes. This implies, for instance, that motor positions are stored as angles, translations in mm or whatever is appropriate, not as encoder readings.

The recommendation for storing data is to use SI units. The authors understand that SI units are not in common use throughout the neutron and x-ray scattering community. NeXus therefore permits the storage of data in any applicable unit. NeXus does, however, require the units used to be specified as an attribute of the corresponding scientific data set. Unit names must adhere to the standard set by the UDUNITS⁵ units conversion utility provided by NCAR. UDUNITS is also the recommended tool for performing any necessary unit conversions.

With multidimensional data, the meaning of each dimension has to be given. The standard HDF way of doing this is via dimension attributes. However, HDF has a quirk: dimension attributes share a single name space for the whole file. Consequently, if there are two area detectors, perhaps on a time-of-flight instrument, a dimension attribute phi cannot be used for both of them. NeXus therefore uses a different scheme. Along with the data, the same vGroup holds other scientific data sets that describe the axes. So that they can be found, these additional scientific data sets must carry an attribute such as "axis = 1", where the number identifies the dimension described. As an example, consider an area time-of-flight detector with three axes: phi, chi and time. Such a detector would be represented by four SDS's in its vGroup: a three-dimensional one named "Counts", containing the counts; one named "chi", containing the chi positions of the individual detectors, with the attribute "axis = 1"; one named "phi", containing the phi positions of the individual detectors, with the attribute "axis = 2"; and one named "time", with the attribute "axis = 3".

Another advantage of this scheme is that it allows multiple axis definitions for any given dimension. For instance, a dimension of the area time-of-flight detector discussed above can be described both by an axis named phi, giving angles, and by another axis giving positions in reciprocal space, which may itself be multidimensional.

In order to allow automatic plotting, the default axis must carry an additional attribute "primary=1".
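The lookup implied by the axis and primary attributes can be sketched in C as follows. Each SDS in a detector vGroup is reduced here to its name and the values of its axis and primary attributes (0 when the attribute is absent); the struct axis_sds type and find_axis are hypothetical. Falling back to the first matching axis when none is flagged primary is a choice of this sketch, not a rule stated by NeXus.

```c
#include <assert.h>
#include <stddef.h>

/* One SDS in a detector vGroup, reduced to the two attributes used by the
   NeXus axis scheme. A value of 0 stands for "attribute not present". */
struct axis_sds {
    const char *name;
    int axis;      /* dimension of the data that this SDS describes */
    int primary;   /* 1 marks the default axis for automatic plotting */
};

/* Returns the SDS describing dimension `dim` (counted from 1), preferring
   one flagged primary=1; otherwise the first match, or NULL if none. */
const struct axis_sds *find_axis(const struct axis_sds *sds, size_t n, int dim)
{
    const struct axis_sds *first = NULL;
    assert(dim >= 1);
    for (size_t i = 0; i < n; i++) {
        if (sds[i].axis != dim)
            continue;
        if (sds[i].primary == 1)
            return &sds[i];
        if (first == NULL)
            first = &sds[i];
    }
    return first;
}
```

For the area time-of-flight detector above, extended with a second, primary axis for dimension 1 named q (a hypothetical reciprocal-space axis), find_axis returns q for dimension 1, phi for dimension 2 and time for dimension 3.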

Please note that this scheme requires each detector to reside in its own vGroup within the NXinstrument section and to have its own NXdata vGroup, in order to avoid confusion.

7. The NeXus Glossary

Clearly, common names for common scattering-related variables would help in developing general data analysis software. The NeXus glossary aims to standardize the names used for variables. The names in this glossary are built according to the following guidelines:

All names are in lower case, except for cases where uppercase is in common usage, for example: FWHM.
Names consist of full words connected by underscores.
Sequential names are built by appending a number to the name, for example: field1, field2, ..., fieldn.
NeXus class names are prefixed by NX. This convention should avoid confusion with class names provided by third party software developers.
The NeXus hierarchy should be used to simplify variable names. For instance, a temperature variable in the sample vGroup should be named temperature, not sample_temperature.
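The lexical guidelines of the glossary can be checked with a few lines of C. In this sketch, glossary_name_ok accepts lower-case letters, digits and single underscores between words, and admits upper case only for whole names listed in allowed_upper, which here contains just FWHM; both the list and the helper names are assumptions, and sequential_name builds names such as field3. The hierarchy guideline (temperature rather than sample_temperature) concerns meaning and cannot be checked this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Upper-case names in common use. Illustrative assumption: a real
   checker would take this list from the glossary itself. */
static const char *const allowed_upper[] = {"FWHM"};

bool glossary_name_ok(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof allowed_upper / sizeof allowed_upper[0]; i++)
        if (strcmp(name, allowed_upper[i]) == 0)
            return true;
    size_t len = strlen(name);
    if (len == 0 || name[0] == '_' || name[len - 1] == '_')
        return false;                  /* underscores only between words */
    for (size_t i = 0; i < len; i++) {
        char ch = name[i];
        bool ok = (ch >= 'a' && ch <= 'z') || (ch >= '0' && ch <= '9') || ch == '_';
        if (!ok || (ch == '_' && name[i + 1] == '_'))
            return false;              /* bad character or empty word */
    }
    return true;
}

/* Builds a sequential name such as "field3" into buf; returns false if
   buf is too small to hold the result. */
bool sequential_name(char *buf, size_t size, const char *base, int number)
{
    int n = snprintf(buf, size, "%s%d", base, number);
    return n >= 0 && (size_t)n < size;
}
```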

More information about the glossary and names already defined can be found on the NeXus WWW-site.

8. The NeXus API

To facilitate the implementation of NeXus, an application programming interface (API) has been defined. This API shields a NeXus user from some of the complexities of the HDF library interface. It also ensures adherence to the NeXus standard as far as possible. The usage paradigm implemented by the NeXus API is very much like manipulating a directory hierarchy. The NeXus API functions can be divided into six groups. The first group deals with file manipulation:

NXopen opens a NeXus file.
NXclose closes a NeXus file.

The second group deals with vGroups:

NXmakegroup creates a vGroup and links it into the current level.
NXopengroup opens an existing vGroup.
NXclosegroup closes a vGroup and returns to the parent level in the vGroup hierarchy.
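The directory-like paradigm of the vGroup functions can be modelled as a stack of open group names. The C sketch below, loosely modelled on NXopengroup and NXclosegroup, tracks only the current position: cursor_open_group descends into a named vGroup and cursor_close_group returns to the parent level. These helpers are hypothetical, do not touch a file, and do not check that a group exists before opening it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the directory-like navigation of the NeXus API:
   opening a vGroup pushes its name, closing it pops back to the parent.
   Only the cursor bookkeeping is modelled, not the file itself. */

#define MAX_DEPTH 16

struct cursor {
    const char *path[MAX_DEPTH];
    int depth;                         /* 0 means the top level of the file */
};

int cursor_open_group(struct cursor *c, const char *name)
{
    if (c->depth == MAX_DEPTH)
        return -1;
    c->path[c->depth++] = name;
    return 0;
}

/* Leaves the current vGroup and returns to its parent level. */
int cursor_close_group(struct cursor *c)
{
    if (c->depth == 0)
        return -1;                     /* already at the top level */
    c->depth--;
    return 0;
}

/* Writes the current position as a path such as "/first_scan/DMC". */
void cursor_path(const struct cursor *c, char *buf, size_t size)
{
    size_t used = 0;
    assert(size > 0);
    buf[0] = '\0';
    for (int i = 0; i < c->depth && used < size; i++) {
        int n = snprintf(buf + used, size - used, "/%s", c->path[i]);
        if (n < 0)
            break;
        used += (size_t)n;
    }
    if (c->depth == 0 && size > 1)
        strcpy(buf, "/");
}
```

Opening first_scan and then DMC from figure 1 gives the path /first_scan/DMC; one close returns to /first_scan.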

A similar set of functions exists for dealing with scientific data sets:

NXmakedata creates a new scientific data set and links it into the current level.
NXopendata opens an existing SDS, after which its data can be manipulated.
NXclosedata closes an SDS. After this call, further manipulation of its data is no longer permitted.

Of course there are functions for writing and reading data:

NXgetdata reads data from an open SDS.
NXgetslab reads a subset of a dataset from an open SDS.
NXgetattr reads an SDS attribute.
NXputdata writes data to an open SDS.
NXputslab writes a subset of a dataset to an open SDS.
NXputattr writes an attribute to an SDS.
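Reading a slab amounts to copying a rectangular block out of a larger array. The C sketch below does this for a two-dimensional array of floats, assuming row-major (C) storage order and describing the slab by a start index and a size for each dimension; the function get_slab_2d is a hypothetical illustration of the idea, not the NeXus implementation of NXgetslab.

```c
#include <assert.h>
#include <stddef.h>

/* Copies a rectangular subset ("slab") out of a 2-D array stored in
   row-major order (last index varies fastest). start[] gives the first
   index and size[] the extent in each dimension. Returns 0 on success,
   -1 if the slab does not fit inside the array. */
int get_slab_2d(const float *data, const int dims[2],
                const int start[2], const int size[2], float *out)
{
    assert(data != NULL && out != NULL);
    for (int d = 0; d < 2; d++)
        if (start[d] < 0 || size[d] < 0 || start[d] + size[d] > dims[d])
            return -1;
    for (int i = 0; i < size[0]; i++)
        for (int j = 0; j < size[1]; j++)
            out[(size_t)i * size[1] + j] =
                data[(size_t)(start[0] + i) * dims[1] + (start[1] + j)];
    return 0;
}
```

For a 3 x 4 array holding the values 0 to 11, a slab with start {1, 1} and size {2, 2} yields 5, 6, 9 and 10. Writing a slab is the same index arithmetic with source and destination exchanged.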

Then there are functions to query the contents of the NeXus file:

NXgetinfo queries the number type and dimensions of an open SDS.
NXgetnextentry allows for a directory search. It returns the name and class of the next item in the current vGroup level.
NXgetnextattr scans the attributes of an SDS or the global attributes. Each call returns the name of the next attribute in the list.
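A directory search in the style of NXgetnextentry is naturally written as a loop that runs until the current level is exhausted. The C sketch below replaces the file with an in-memory listing; struct listing, next_entry and find_nxdata are hypothetical, and reporting data items with the class string "SDS" is an assumption of the sketch. Collecting the NXdata vGroups of an entry in this way is the first step a default plotting tool, as described in section 5.2, would take.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a directory search: each call to next_entry
   yields the name and class of the next item at the current vGroup level
   and returns 0 once the level is exhausted. */
struct listing {
    const char *const *names;
    const char *const *classes;
    size_t n, pos;
};

int next_entry(struct listing *l, const char **name, const char **cls)
{
    if (l->pos == l->n)
        return 0;
    *name = l->names[l->pos];
    *cls = l->classes[l->pos];
    l->pos++;
    return 1;
}

/* Stores the names of up to max_out NXdata vGroups found at the current
   level in out; returns the number stored. */
size_t find_nxdata(struct listing *l, const char **out, size_t max_out)
{
    const char *name, *cls;
    size_t found = 0;
    assert(l != NULL);
    while (next_entry(l, &name, &cls))
        if (strcmp(cls, "NXdata") == 0 && found < max_out)
            out[found++] = name;
    return found;
}
```

For the entry of figure 1 with an additional NXdata vGroup for a scanned monitor (named Monitor here for illustration), the loop returns Detector and Monitor.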

Last but not least, there are a few functions that help in linking data sets:

NXgetgroupID gets the identification of the current vGroup.
NXgetdataID gets the identification of the current data set.
NXlink links the item specified as its parameter into the current vGroup level.

All these functions have been implemented in ANSI-C. An interface for calling this API from Fortran 77 is provided.

9. Conclusion and outlook

General adoption of the NeXus specification for data storage will create a sound basis for data exchange in the neutron and x-ray scattering community. NeXus will also be a good basis for the development of generic data analysis tools. Most major neutron scattering sources are ready to adopt this new standard.

Some software written for NeXus already exists. The new SICS instrument control software, written at PSI by one of the authors, uses NeXus for data storage. In addition to the NeXus API, one of the authors has developed a set of NeXus utility functions and an additional API that uses a dictionary and a data definition language to create data structures in a NeXus file.

The data analysis program Open Genie⁶, developed at ISIS, supports NeXus in its latest release. Nick Maliszewskyj⁷ from NIST has contributed a Tcl/Tk⁸ interface to NeXus. Work is in progress at PSI to develop a small-angle scattering data analysis suite that uses NeXus as its preferred file format.

Most common general-purpose HDF browsers and tools work with NeXus files without any problems.

More information about NeXus is available from

http://www.neutron.anl.gov/NeXus/

The authors invite comments and suggestions which can be sent to the NeXus mailing list or to the authors directly.

10. References

[1] Jonathan Tischler, ORNL & UNI-CAT: Proposed Data Standard for the APS (v4), June 6, 1994.
[2] Przemek Klosowski, NIST: Nexus, a portable, self-describing data format for neutron and X-ray scattering data, version 1.24, October 7, 1997.
[3] Mark Könnecke, Rutherford Appleton Laboratory: Proposal for a European Neutron Scattering Data Exchange Format (v4.2), June 17, 1994.
[4] HDF Reference Manual, NCSA, February 1994. Available electronically from ftp.ncsa.uiuc.edu:/Documentation/HDF3.3
[5] UDUNITS: A library for manipulating units of physical quantities. Available electronically from http://www.unidata.ucar.edu/packages/udunits/.
[6] F. A. Akeroyd, R. L. Ashworth, S. D. Johnston, J. M. Martin, C. M. Moreton-Smith, D. S. Sivia: Open Genie, 1997.
[7] Nick Maliszewskyj: Tcl interface to NeXus. Personal communication.
[8] John K. Ousterhout: Tcl and the Tk Toolkit, Addison-Wesley, 1994.
