CHAPTER 10 HISTORY OF SAD
PREFACE
This standard is the result of a joint investigation
carried out by the Anglo-Australian Observatory and the
Mount Stromlo and Siding Springs Observatories. The
members of the investigating committee were:
A. Bosma (MSSSO)
R. Ekers (University of Groningen/MSSSO)
B. Newell (MSSSO)
J. Straede (AAO)
P. Wallace (AAO)
D. Warne (MSSSO)
10.1 INTRODUCTION
Data formats fall into three categories: (i) those recorded
at the telescope, (ii) those used to transport data from
institution to institution, and (iii) those used by data
reduction programs.
Most discussions of standardized data formats center on the
transportability of the data (case ii). Here we aim for two
additional goals. First, the data interchange format is to be
suitable for recording at the telescope, eliminating the need to
copy tapes. Second, a common format for data reduction must be
defined so that software can be transferred between institutions.
Two formats have, in fact, been developed: an interchange
format and an access format.
The interchange format is designed for a sequential medium
such as magnetic tape. It can be kept simple enough to be
recorded under the constraints operating at most telescopes. At
the same time, it can be extended in a standard manner to meet
the demands of interchanging data that has passed through
reduction programs, which add descriptive information.
During data reduction, flexibility and speed of access are
the prime requirements. These programs use the access format.
In this standard, it is assumed that the data storage medium
allows random access. In situations where this is not available,
data reduction programs can operate on the interchange format
directly.
Although the interchange and access formats are necessarily
different, they have enough common ground to ensure ease of
translation. This common ground is provided by building each out
of the same basic unit of data, called an image, and using the
same format for astronomical descriptions in both cases.
The FITS (Flexible Image Transport System) format developed
by Wells (KPNO) and Greisen (NRAO) was released at about the same
time as the original version of this standard was proposed. This
standard was then altered to make it more compatible with FITS
through the introduction of the keyword subdivision (see Section
10.2.3). This change has made some features of the original
standard redundant, particularly the comments appended to the
data. These redundant features have been left in the standard.
10.2 IMAGES
10.2.1 Outline
The basic unit of astronomical data is an image. An image
is defined as that set of data which it is appropriate to collect
under one astronomical description. For example, a single
spectrum, whether one- or two-dimensional, qualifies as an
image. A digitized representation of a photographic plate as
produced by a microdensitometer is also an image. A complete set
of Aperture Synthesis Radio-Telescope maps may be collected into
the one image.
An image is made up of a three-dimensional data array plus
header information. The header may be supplemented by a trailer
in the interchange format to cope with the limitations of
sequential recording.
The format allows for an area around the edges of the data
cube to be excluded from the active data. This area has two main
uses. It can be used for descriptive information such as scan
line identifiers or a wavelength scale. Alternatively, where a
reduction process destroys edge information, the bounds of the
active data can be tightened to compensate.
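The following Python sketch illustrates the idea of active-data bounds. It assumes, purely for illustration, that the cube is held as nested lists indexed `cube[z][y][x]` and that the bounds are inclusive `(lo, hi)` index pairs per axis; the actual bound fields belong to the control subdivision (Appendix A) and are not reproduced here. The helper names `active_data` and `tighten` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: cube[z][y][x], bounds as inclusive (lo, hi) per axis.
Cube = List[List[List[float]]]
Bounds = Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int]]


def active_data(cube: Cube, bounds: Bounds) -> Cube:
    """Return only the active part of the cube; the edge area stays in storage."""
    (x0, x1), (y0, y1), (z0, z1) = bounds
    return [[row[x0:x1 + 1] for row in plane[y0:y1 + 1]]
            for plane in cube[z0:z1 + 1]]


def tighten(bounds: Bounds, margin: int) -> Bounds:
    """Shrink the x and y bounds by `margin`, e.g. after a step that spoils edges."""
    (x0, x1), (y0, y1), z_bounds = bounds
    return ((x0 + margin, x1 - margin), (y0 + margin, y1 - margin), z_bounds)


# A 5 x 5 single-plane image whose column 0 holds scan line identifiers.
cube = [[[10 * y + x for x in range(5)] for y in range(5)]]
bounds = ((1, 4), (0, 4), (0, 0))
print(active_data(cube, bounds)[0][0])   # [1, 2, 3, 4]
print(tighten(bounds, 1))                # ((2, 3), (1, 3), (0, 0))
```

The stored array is never copied or shrunk; only the bounds change, so the edge information remains available if it is needed later.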
Two classes of comments are available in the format. One is
a fixed length field in the header (and trailer in the case of
the interchange format). The other is an extensible set of
comments which are normally present only in data which has
undergone reduction. Both comment fields have been made largely
redundant by the introduction of the keyword subdivision. As
they may be removed in a future revision of the format, their use
is not recommended.
10.2.2 Image Header
The image header is made up of subdivisions. One
subdivision, the control subdivision, holds all the information
necessary to physically access the data. This is the only
mandatory part of the header. The remaining subdivisions, the
astronomical subdivisions, contain whatever astronomical
description is required.
In the interchange format, the entire control subdivision is
in a standard character code (such as ASCII or EBCDIC). For
reasons of efficiency, however, most fields are converted to
their binary equivalents in the access format. Astronomical
subdivisions use identical codes in both formats and, as far as
is reasonable, contain only character-coded data.
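To illustrate the conversion between the two formats, the sketch below turns character-coded integer fields into 4-byte binary integers and back. The field names and widths in `FIELDS` are invented for the example (the real layout is given in Appendix A), and little-endian byte order, as used for integers on the PDP-11, is an assumption; an access format implementation would use whatever suits its host machine.

```python
import struct

# Hypothetical control fields: (name, width in characters), integer (I) format.
FIELDS = [("NX", 6), ("NY", 6), ("NZ", 6)]


def to_access_binary(chars: str, fields=FIELDS, byte_order: str = "<") -> bytes:
    """Convert character-coded integer fields into packed 4-byte binary integers."""
    values, pos = [], 0
    for _name, width in fields:
        values.append(int(chars[pos:pos + width]))  # leading blanks are allowed
        pos += width
    return struct.pack(f"{byte_order}{len(values)}i", *values)


def to_interchange_chars(binary: bytes, fields=FIELDS, byte_order: str = "<") -> str:
    """Reverse conversion: right-justify each value in its character field."""
    values = struct.unpack(f"{byte_order}{len(fields)}i", binary)
    return "".join(f"{v:>{w}d}" for v, (_name, w) in zip(values, fields))
```

For example, `to_access_binary("   512   512     1")` yields 12 bytes holding 512, 512 and 1, and `to_interchange_chars` restores the original text.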
10.2.3 Keyword Subdivision
The keyword subdivision contains descriptive astronomical
parameters identified by keywords. Parameter definitions are of
the form:
KEYWORD = Value(s) / Comments <CR>
The value or list of values must be given in the appropriate
character code and must conform to the Fortran 77 list-directed
I/O conventions. If there are no comments, the "/" may be
omitted unless the value list is truncated. The keyword "END="
terminates the list.
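A simplified Python reading of these rules follows. It handles apostrophe-quoted strings, integers and reals, the optional "/" comment separator and the terminating "END=", assuming ASCII records separated by carriage returns. It does not attempt the full Fortran 77 list-directed syntax (repeat counts such as 3*0.0, null values and complex numbers are not handled), and all function names are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, float, str]


def _convert(token: str) -> Value:
    """Interpret one token as a quoted string, an integer or a real."""
    if len(token) >= 2 and token[0] == token[-1] == "'":
        return token[1:-1].replace("''", "'")          # doubled quote = one apostrophe
    try:
        return int(token)
    except ValueError:
        pass
    try:
        return float(token.upper().replace("D", "E"))  # allow 1.5D3 exponents
    except ValueError:
        return token                                   # e.g. logicals kept as text


def _split_values(text: str) -> List[str]:
    """Split on commas and blanks that are outside apostrophe-quoted strings."""
    tokens, current, quoted = [], "", False
    for ch in text:
        if ch == "'":
            quoted = not quoted
        if ch in ", " and not quoted:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def parse_keyword_line(line: str) -> Tuple[str, List[Value], str]:
    """Split 'KEYWORD = Value(s) / Comments' into keyword, values and comment."""
    key, sep, rest = line.partition("=")
    if not sep:
        raise ValueError(f"no '=' in keyword record: {line!r}")
    quoted, slash = False, -1
    for i, ch in enumerate(rest):          # find the first '/' outside quotes
        if ch == "'":
            quoted = not quoted
        elif ch == "/" and not quoted:
            slash = i
            break
    values_text = rest if slash < 0 else rest[:slash]
    comment = "" if slash < 0 else rest[slash + 1:].strip()
    return key.strip(), [_convert(t) for t in _split_values(values_text)], comment


def parse_keyword_block(text: str) -> Dict[str, Tuple[List[Value], str]]:
    """Parse <CR>-terminated keyword records, stopping at END=."""
    params: Dict[str, Tuple[List[Value], str]] = {}
    for record in text.split("\r"):
        if not record.strip():
            continue
        key, values, comment = parse_keyword_line(record)
        if key == "END":
            break
        params[key] = (values, comment)
    return params
```

A "/" inside a quoted string is treated as part of the value, so only an unquoted "/" starts the comment.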
Although the structure is not identical to the Flexible
Image Transport System (FITS) structure, the same keywords and
units should be used to enable easy translation to and from FITS.
The keyword subdivision follows the same conventions as the
special purpose astronomical subdivisions, i.e. it starts with a
tag ($KYWRD01) followed by a character count. In this case, the
character count is redundant and the END keyword always takes
precedence.
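The sketch below builds and reads a keyword subdivision. It assumes ASCII and assumes that the character count includes the twelve characters of tag, version and count, as for the special purpose subdivisions of Section 10.2.4. The reader deliberately ignores the count and stops at END=, as the text requires. Function names are hypothetical.

```python
def build_keyword_subdivision(records) -> str:
    """Prefix <CR>-terminated keyword records with the $KYWRD01 tag and an I4 count."""
    body = "".join(record + "\r" for record in records) + "END=\r"
    count = 12 + len(body)   # assumed to include the 12-character tag/version/count
    if count > 9999:
        raise ValueError("character count does not fit in I4 format")
    return f"$KYWRD01{count:4d}{body}"


def keyword_records(subdivision: str):
    """Return keyword records up to END=, ignoring the (redundant) character count."""
    if not subdivision.startswith("$KYWRD"):
        raise ValueError("not a keyword subdivision")
    records = []
    for record in subdivision[12:].split("\r"):
        if record.replace(" ", "").upper().startswith("END="):
            return records               # END always takes precedence
        if record.strip():
            records.append(record)
    raise ValueError("END= keyword not found")
```

Because the reader never trusts the count, a damaged or stale count does not change what is read.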
10.2.4 Special Purpose Subdivisions
Although most descriptive parameters can conveniently be
specified in the keyword subdivision, provision is made for
special purpose subdivisions. For example, it may be
desirable to store a histogram of data values. This could not
conveniently be achieved using keywords.
Special purpose subdivisions would generally be of fixed
format and must comply with the following conventions:
1. If a particular subdivision is included in the header, the
complete subdivision must be present, though some fields may
be undefined. An undefined character field is blank-filled
while an undefined binary field is identified by some
convention appropriate to the field (e.g., if the standard
deviation is zero, both the mean and standard deviation are
considered undefined).
2. A subdivision can be standard or private. A subdivision is
made standard by its acceptance by those responsible for
maintaining the format standard at the cooperating
observatories. Users who wish to set up a private
subdivision for their own use are free to do so.
3. Each subdivision starts with a six-character tag, followed
by a two-character version number and a four-character
subdivision length. Standard subdivision tags must start
with a "$" to avoid confusion with private subdivisions.
Private subdivision tags may not contain a "$". The
subdivision length is the number of bytes in the subdivision,
including the tag, version number and length fields. It is
expressed in character code in I4 format.
4. It is recommended practice that all fields be a multiple of
four bytes long. Where a number of consecutive fields are
normally dealt with as a unit, it is acceptable for the group
as a whole to be a multiple of four bytes.
5. Where an astronomical subdivision contains values in binary
code, it must also contain a descriptor specifying the binary
code format. This descriptor follows the conventions defined
for the first four bytes of the data format description in
the image control subdivision (e.g., PDP-11 REAL4 would be
"R4PD").
6. It is not permissible to fill an undefined field with
"garbage". Character code fields must at least be blank
filled. Filling with nulls is not acceptable. Binary fields
must contain a value which signifies the field is undefined.
7. Where a character field represents a number, leading blanks
are permitted and the absence of a sign signifies a positive
value. When the accuracy of the value does not merit the
number of decimal places available, trailing blanks are
recommended in preference to trailing zeroes.
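Conventions 3, 4 and 6 can be expressed as a few small helpers. This is a sketch under stated assumptions: the text is ASCII, the tag "$HISTG" stands for a hypothetical standard histogram subdivision, and the helper names are invented for the example.

```python
def make_prefix(tag: str, version: int, body_length: int, standard: bool) -> str:
    """Build the 12-character prefix: six-character tag, two-digit version, I4 length."""
    if len(tag) != 6:
        raise ValueError("tag must be exactly six characters")
    if standard and not tag.startswith("$"):
        raise ValueError("standard tags must start with '$'")
    if not standard and "$" in tag:
        raise ValueError("private tags may not contain '$'")
    if not 0 <= version <= 99:
        raise ValueError("version must fit in two characters")
    total = 12 + body_length          # length includes tag, version and length fields
    if total > 9999:
        raise ValueError("subdivision length does not fit in I4 format")
    return f"{tag}{version:02d}{total:4d}"


def read_prefix(subdivision: str):
    """Return (tag, version, total length in bytes, is_standard)."""
    tag = subdivision[:6]
    return tag, int(subdivision[6:8]), int(subdivision[8:12]), tag.startswith("$")


def pad_field(text: str) -> str:
    """Blank-fill (never null-fill) a character field to a multiple of four bytes."""
    return text + " " * (-len(text) % 4)
```

For instance, `make_prefix("$HISTG", 1, 88, standard=True)` gives `"$HISTG01 100"`, with the leading blank of the I4 length preserved as convention 7 allows.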
10.3 INTERCHANGE FORMAT
10.3.1 Outline
An image is recorded on the interchange medium as a header,
followed by the data and, optionally, a trailer and comments.
Access to the image is assumed to be sequential. Where a file
concept is appropriate to the medium, there may be several images
to the file.
Except for the control subdivision, interchange format image
headers have the same format as their access format counterparts.
Details of the image header and trailer control subdivisions are
given in Appendix A.
It is not possible to update values of control parameters in
a trailer. Where data has been truncated for some reason, it is
assumed that this is determined from the length of the data
actually encountered rather than from the trailer.
When translating to access format, trailer astronomical
subdivisions are appended to the corresponding header
subdivisions.
Comments following the trailer correspond to the extensible
comments and genealogy discussed in Section 10.4.4. They are
normally present only in reduced data. Comments made at the time
of observation are allowed for in the fixed-length comment fields
of the header and trailer. (See Sections 10.1 and 10.2.1 regarding
the obsolescence of these comment fields.)
10.3.2 File and Record Structure
This discussion assumes magnetic tape is the medium in use.
Appropriate modifications would need to be made for other media.
An interchange file is bounded by end-of-file marks except
at the beginning of the tape where the initial end-of-file mark
may be omitted. File and volume labels, if present, must be
separated from the body of data by end-of-file marks. It is
recommended that physical blocks be of fixed length.
To allow error recovery, headers, trailers and comments must
start on a physical block boundary. The header, trailer and
comment tags can then be used to locate the bounds of a corrupted
image. This method of recovery fails in the unlikely event that
data looks like a tag. It is recommended practice that, if a tag
is encountered when data is expected during tape reading, the
contents of the record be printed out assuming the tag is valid.
The operator then decides, by inspecting the printout, whether
the record is data.
A technique for automatic error recovery, based on tagging
data blocks, is allowed for. It is not made compulsory because
of the difficulty, under some operating systems, of manipulating
physical blocks under time-critical conditions. This method uses
the area which can optionally be set aside at the beginning of
physical blocks containing data. Such a tag must comply with the
following conventions:
(i) If four or more bytes are present, the first four bytes
must contain the characters "DATA".
(ii) If eight or more bytes are present, the second four bytes
must contain a block count in character code in I4 format.
Block 1 is the first block in the current image.
(iii) Remaining bytes are not interpreted by the error recovery
procedure.
Tape reading programs which do not support automatic error
recovery ignore this tag field.
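A possible implementation of the optional data-block tag follows, assuming ASCII and a tag area of the same size in every data block. It builds the tag as described in (i) and (ii) and uses the block counts to report blocks lost between those that were read successfully. All names are hypothetical.

```python
def make_data_tag(block_count: int, area: int) -> bytes:
    """Build the optional tag area that starts a data block."""
    if area < 4:
        return b" " * area                      # too short for a tag; blank-filled
    if block_count > 9999:
        raise ValueError("block count does not fit in I4 format")
    tag = "DATA" + (f"{block_count:4d}" if area >= 8 else "")
    return tag.ljust(area).encode("ascii")      # remaining bytes are not interpreted


def read_block_count(block: bytes, area: int):
    """Return the block count from a data block's tag, or None if unavailable."""
    if area < 8 or block[:4] != b"DATA":
        return None
    try:
        return int(block[4:8].decode("ascii"))
    except ValueError:                          # garbled count
        return None


def missing_blocks(blocks, area: int):
    """List block counts (1 = first block of the image) absent from `blocks`."""
    counts = {c for c in (read_block_count(b, area) for b in blocks) if c is not None}
    if not counts:
        return []
    return sorted(set(range(1, max(counts) + 1)) - counts)
```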
Tags on the header and trailer blocks are the tags of the
control subdivision, i.e. "$IMHDRnn" and "$IMTLRnn", where nn is
the version number of the subdivision. Comments are tagged with
the characters "COMMENTS".
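For the tag-based error recovery described above, a reader only needs to recognise the tags at the start of each physical block. This Python sketch assumes ASCII and two-digit version numbers; it does not implement the operator inspection step, and its names are hypothetical.

```python
import re

# Header and trailer tags are the control subdivision tags plus a version number.
_CONTROL_TAG = re.compile(r"\$(IMHDR|IMTLR)([0-9]{2})")


def classify_block(block: bytes):
    """Say what a physical block starts with, judging only from its first 8 bytes."""
    head = block[:8].decode("ascii", errors="replace")
    match = _CONTROL_TAG.fullmatch(head)
    if match:
        kind = "header" if match.group(1) == "IMHDR" else "trailer"
        return kind, int(match.group(2))
    if head == "COMMENTS":
        return "comments", None
    if head.startswith("DATA"):
        return "data", None        # only when the optional data tag is in use
    return "other", None           # untagged data, or a corrupted block


def resync_points(blocks):
    """Indexes of blocks where a header starts; reading can resume there."""
    return [i for i, b in enumerate(blocks) if classify_block(b)[0] == "header"]
```

A real reader would still print any block that classifies as a tag where data was expected, leaving the final decision to the operator.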
Data can start on a physical block boundary or immediately
following the header within the same block. The method used is
specified in the control subdivision. Data logical records
immediately follow each other and may lie across block
boundaries.
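Because data logical records may straddle block boundaries, a reader can strip any tag area from each block and treat the rest as one byte stream. This is a minimal sketch, assuming the data begins on a block boundary (not immediately after the header) and that the record length and record count come from the control subdivision; the function name is hypothetical.

```python
def data_records(blocks, tag_area: int, record_length: int, n_records: int):
    """Cut fixed-length logical records out of consecutive physical data blocks.

    Assumes every data block reserves `tag_area` bytes at its start. Records
    may run across block boundaries. If the data is truncated, only the
    complete records actually read are returned, since truncation is judged
    from the data encountered rather than from the trailer.
    """
    stream = b"".join(block[tag_area:] for block in blocks)
    available = min(n_records, len(stream) // record_length)
    return [stream[i * record_length:(i + 1) * record_length] for i in range(available)]
```

With two 12-byte blocks, each starting with a 4-byte "DATA" tag, and 6-byte records, the second record is assembled from the end of the first block and the start of the second.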
Physical details of the tape (number of tracks, density,
block length) and the character code used (ASCII, EBCDIC, BCD,
etc.) should be noted on a label affixed to the reel.
10.4 ACCESS FORMAT
10.4.1 Tree Structure
In the access format, images are grouped into trees which in
most operating systems will correspond to files. In the simplest
case there is one image per file. More commonly, there are
several images within a file and these are grouped under a node
which acts as an index to the images.
These simple structures will suffice in most cases.
However, to allow an astronomer to group his data in a flexible
and astronomically meaningful way, the access format allows for
an indefinitely extensible tree structure. Individual images can
be collected under a common node which may in turn be collected
under another node. While, in principle, there can be an
indefinite number of levels in the tree, in practice there are
rarely more than one or two.
As an example, consider a surface photometry project in
which there are two calibration images (zero and flat-field), and
a number of observations of each of two objects. This data might
be collected in a tree as shown in Figure 1. During the
reduction of this data, the astronomer might perform the step
"OBJECT A: 5 - ZERO", subtracting the zero calibration image from
the fifth observation of object A.
Node records and image header control subdivisions have been
made similar so that the same subroutine can access them both.
This allows programs to be written in such a way that the tree
structure need not be known in advance. Each node and image is
named so that it can be accessed by name or number.
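The tree described above can be mimicked with two small Python classes. In this sketch node and image share a `name` attribute, echoing the common layout of node records and control subdivisions, and children are fetched by name or by a number counting from 1. The classes, the root and image names, and the numbering base are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Union


@dataclass
class Image:
    name: str


@dataclass
class Node:
    name: str
    children: List[Union["Node", Image]] = field(default_factory=list)

    def child(self, key: Union[str, int]):
        """Return a child by name, or by number counting from 1."""
        if isinstance(key, int):
            if key < 1:
                raise IndexError("numbering starts at 1")
            return self.children[key - 1]
        for item in self.children:
            if item.name == key:
                return item
        raise KeyError(key)


def lookup(root: Node, path: Sequence[Union[str, int]]):
    """Walk down one key at a time; the tree depth need not be known in advance."""
    item = root
    for key in path:
        item = item.child(key)
    return item


project = Node("SURFACE PHOTOMETRY", [
    Image("ZERO"),
    Image("FLAT"),
    Node("OBJECT A", [Image(f"A{i}") for i in range(1, 7)]),
    Node("OBJECT B", [Image(f"B{i}") for i in range(1, 5)]),
])
# The two images used in the step "OBJECT A: 5 - ZERO":
print(lookup(project, ["OBJECT A", 5]).name, lookup(project, ["ZERO"]).name)  # A5 ZERO
```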
At the head of the tree, usually the first record in the
file, there is a tree/file header which contains
implementation-dependent information. Under most systems the
only information required is the record size and a pointer to the
next vacant record. Space for a name is provided though this is
normally redundant as it will be contained in a directory
maintained by the operating system. A fixed length comment is
also allowed for.
10.4.2 Record Structure
Efficiency requirements dictate that the record structure of
the access format be optimized for a particular computer and
operating system. The structure must, however, meet the
following requirements:
(i) It is capable of random access;
(ii) Records are fixed length;
(iii) Headers, extensible comments, and data all start on record
boundaries; and
(iv) Within the headers, extensible comments, and data, logical
records immediately succeed each other and may overlap
record boundaries.
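With fixed-length records, random access reduces to simple arithmetic. Assuming, for illustration only, that the data array is stored with x varying fastest, that its records are contiguous from `first_record`, and that indices start at 0, the record number and byte offset of an element follow directly. If the record length is not a multiple of the element size, an element can straddle two records, which requirement (iv) permits.

```python
def locate(x: int, y: int, z: int, nx: int, ny: int,
           element_size: int, record_size: int, first_record: int):
    """Return (record number, byte offset in that record) for element (x, y, z)."""
    byte_offset = ((z * ny + y) * nx + x) * element_size
    return first_record + byte_offset // record_size, byte_offset % record_size


# A 100 x 50 x 1 array of 4-byte reals in 512-byte records, starting at record 3.
print(locate(10, 2, 0, nx=100, ny=50, element_size=4, record_size=512, first_record=3))
# (4, 328)
```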
10.4.3 Header Chains
During data reduction, astronomical subdivisions are
frequently added to the header, possibly overflowing the original
space set aside. This is provided for by chaining whereby the
first four bytes of the random access record are set aside for an
integer record pointer to the next record in the header. Record
pointers number the first record in the file as zero, and a
zero pointer marks the last record in the chain. When a
subdivision that causes a header overflow is written, the header
writing subroutines automatically find the next available record
from the tree/file header.
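The chaining scheme can be simulated in memory. The sketch below assumes a 64-byte record, a 4-byte little-endian pointer, and a tree/file header (record 0) whose first four bytes hold the next vacant record; these are illustrative choices, as the real record size and layout are implementation dependent. For brevity it writes a whole header at once, whereas the header-writing subroutines described above extend the chain as each subdivision is added. Class and method names are hypothetical.

```python
import struct

RECORD_SIZE = 64            # illustrative; real record sizes are system dependent
PTR = struct.Struct("<i")   # 4-byte integer chain pointer (byte order assumed)


class AccessFile:
    """In-memory stand-in for a random-access file of fixed-length records."""

    def __init__(self) -> None:
        # Record 0 is the tree/file header; its first 4 bytes hold the next vacant record.
        self.records = [bytearray(RECORD_SIZE)]
        PTR.pack_into(self.records[0], 0, 1)

    @property
    def next_vacant(self) -> int:
        return PTR.unpack_from(self.records[0], 0)[0]

    def allocate(self) -> int:
        """Take the next vacant record and advance the pointer in the file header."""
        number = self.next_vacant
        PTR.pack_into(self.records[0], 0, number + 1)
        while len(self.records) <= number:
            self.records.append(bytearray(RECORD_SIZE))
        return number

    def write_chain(self, payload: bytes) -> int:
        """Spread a header over chained records; return the first record number."""
        room = RECORD_SIZE - PTR.size           # chain pointer uses the first 4 bytes
        first = current = self.allocate()
        for start in range(0, max(len(payload), 1), room):
            chunk = payload[start:start + room]
            is_last = start + room >= len(payload)
            pointer = 0 if is_last else self.allocate()   # zero ends the chain
            record = self.records[current]
            PTR.pack_into(record, 0, pointer)
            record[PTR.size:PTR.size + len(chunk)] = chunk
            current = pointer
        return first

    def read_chain(self, first: int) -> bytes:
        """Follow the pointers; they are not part of the logical record returned."""
        data, current = bytearray(), first
        while True:
            record = self.records[current]
            data += record[PTR.size:]
            current = PTR.unpack_from(record, 0)[0]
            if current == 0:
                return bytes(data)
```

The bytes returned by `read_chain` include padding at the end of the last record; a real reader would use the subdivision lengths to know where the header ends.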
Chain pointers are not considered as part of the header
logical record.
10.4.4 Extensible Comments and Genealogy
Comments and genealogy are a record of the data reduction
process. Comments are entered by the person operating the
reduction program while the genealogy is a reduction history
automatically recorded by the program itself. Genealogy entries
are enclosed between "$" signs. Logical records in the comments
are terminated with a carriage return character and the last
carriage return is followed by an end-of-text character. Line
feeds are ignored.
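These rules translate into a small encoder and decoder. This sketch assumes ASCII, uses 0x03 as the end-of-text character, and assumes that each genealogy entry occupies a whole logical record; the function names and the program name in the example are invented.

```python
ETX = "\x03"   # end-of-text character
CR = "\r"


def encode_comments(lines):
    """Terminate every logical record with CR and follow the last CR with ETX."""
    return "".join(line + CR for line in lines) + ETX


def decode_comments(text: str):
    """Split the stream into (comments, genealogy); genealogy is enclosed in '$'."""
    body = text.split(ETX, 1)[0].replace("\n", "")   # line feeds are ignored
    records = body.split(CR)
    if records and records[-1] == "":
        records.pop()                                 # nothing follows the final CR
    comments, genealogy = [], []
    for record in records:
        stripped = record.strip()
        if len(stripped) >= 2 and stripped.startswith("$") and stripped.endswith("$"):
            genealogy.append(stripped[1:-1])
        else:
            comments.append(record)
    return comments, genealogy


text = encode_comments(["Flat field looks uneven", "$FLATDIV V1.2 FLAT$"])
print(decode_comments(text))
```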
Extensible comments and genealogy, like headers, must be
capable of indefinite extension and so use the same chaining
technique.
10.5 CONCLUSIONS
This format can be kept simple yet, at the same time, can be
expanded to satisfy quite complex requirements. It contains only
one mandatory item - the control subdivision of the image header.
The concept of astronomical subdivisions provides sufficient
flexibility to deal with new observing techniques as they arise.
Simplicity is maintained for the computer at the recording
instrument since it will only have to deal with a header
appropriate to that instrument.
A subset of the format came into operation at Mt Stromlo
Observatory during the third quarter of 1979. Documentation of
this implementation is available.