CHAPTER 10     HISTORY OF SAD













                                         PREFACE





                     This standard is the result of a joint investigation

                carried  out  by the Anglo-Australian Observatory and the

                Mount Stromlo  and  Siding  Spring  Observatories.   The

                members of the investigating committee were:



                          A.  Bosma (MSSSO)

                          R.  Ekers (University of Groningen/MSSSO)

                          B.  Newell (MSSSO)

                          J.  Straede (AAO)

                          P.  Wallace (AAO)

                          D.  Warne (MSSSO)















            10.1 INTRODUCTION



                 Data formats fall into three categories:  (i) those recorded

            at  the  telescope,  (ii)  those  used  to  transport  data  from

            institution  to  institution,  and  (iii)  those  used  by   data

            reduction programs.

                 Most discussions on standardized data formats center on  the

            transportability  of  the  data  (case  ii).  Here we aim for two

            additional goals.  First, the data interchange format  is  to  be

            suitable for recording at the telescope in order to eliminate the

            need to copy tapes.  In addition, a common data  format  must  be

            defined so that software can be transferred.

                 Two formats have, in fact, been developed:   an  interchange

            format and an access format.

                 The interchange format is designed for a  sequential  medium

            such  as  magnetic  tape.   It  can  be  kept simple enough to be

            recorded under the constraints operating at most telescopes.   On

            the  other  hand, it can be extended in a standard manner to meet

            the demands of interchange  of  data  which  has  passed  through

            reduction programs which add descriptive information.

                 During data reduction, flexibility and speed of  access  are

            the prime requirements, so reduction programs use the access format.

            In this standard, it is assumed  that  the  data  storage  medium

            allows random access.  In situations where this is not available,

            data reduction programs can operate  on  the  interchange  format

            directly.

                 Although the interchange and access formats are  necessarily

            different,  they  have  enough  common  ground  to ensure ease of

            translation.  This common ground is provided by building each out

            of  the  same  basic unit of data, called an image, and using the

            same format for astronomical descriptions in both cases.

                 The FITS (Flexible Image Transport System) format  developed

            by Wells (KPNO) and Greisen (NRAO) was released at about the same

            time as the original version of this standard was proposed.  This

            standard was then altered to make it more compatible with FITS by

            introducing the keyword subdivision (see Section 10.2.3).  This

            change has made some features of the original standard redundant,

            particularly with regard to comments appended to the data.  These

            redundant features have been left in the standard.







            10.2 IMAGES



            10.2.1 Outline



                 The basic unit of astronomical data is an image.   An  image

            is defined as that set of data which it is appropriate to collect

            under  one  astronomical  description.   For  example,  a  single

            spectrum,  whether  one-  or  two-  dimensional,  qualifies as an

            image.  A digitized representation of  a  photographic  plate  as

            produced by a microdensitometer is also an image.  A complete set

            of Aperture Synthesis Radio-Telescope maps may be collected  into

            the one image.

                 An image is made up of a three dimensional data  array  plus

            header  information.  The header may be supplemented by a trailer

            in the  interchange  format  to  cope  with  the  limitations  of

            sequential recording.

                 The format allows for an area around the edges of  the  data

            cube to be excluded from the active data.  This area has two main

            uses.  It can be used for descriptive information  such  as  scan

            line  identifiers  or a wavelength scale.  Alternatively, where a

            reduction process destroys edge information, the  bounds  of  the

            active data can be tightened to compensate.

                 Two classes of comments are available in the format.  One is

            a  fixed  length  field in the header (and trailer in the case of

            the interchange format).  The  other  is  an  extensible  set  of

            comments  which  are  normally  present  only  in  data which has

            undergone reduction.  Both comment fields have been made  largely

            redundant  by  the  introduction  of the keyword subdivision.  As

            they may be removed in a future revision of the format, their use

            is not recommended.







            10.2.2 Image Header



                 The  image  header  is  made  up   of   subdivisions.    One

            subdivision,  the  control subdivision, holds all the information

            necessary to physically  access  the  data.   This  is  the  only

            mandatory part of the header.  The remaining subdivisions contain

            astronomical descriptions and are included as required.

                 In the interchange format, the entire control subdivision is

            in  a  standard  character  code  (such as ASCII or EBCDIC).  For

            reasons of efficiency, however,  most  fields  are  converted  to

            their  binary  equivalents  in  the  access format.  Astronomical

            subdivisions use identical codes in both formats and, as  far  as

            is reasonable, contain only character code.







            10.2.3 Keyword Subdivision



                 The keyword subdivision  contains  descriptive  astronomical

            parameters  identified by keywords.  Parameter definitions are of

            the form:



                   KEYWORD = Value(s) / Comments <CR>



            The value or list of values must  be  given  in  the  appropriate

            character  code  and must conform to the Fortran 77 list directed

            I/O conventions.  If there  are  no  comments,  the  "/"  may  be

            omitted  unless  the  value  list  is  truncated.  Keyword "END="

            terminates the list.
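
                 As an illustration of the record form above,  the  following

            Python  sketch  splits  a  keyword  subdivision into keywords,

            value lists and comments.  It is not part of the standard:  the

            function  names  are hypothetical, the sample keywords are merely

            FITS-style examples, and only a simplified subset of the  Fortran

            77  list  directed  conventions  is handled (values separated by

            commas or blanks, character strings in apostrophes; repeat counts

            and null values are not).

```python
def split_value_and_comment(text):
    """Split at the first '/' that lies outside an apostrophe-quoted string."""
    in_quote = False
    for i, ch in enumerate(text):
        if ch == "'":
            in_quote = not in_quote
        elif ch == "/" and not in_quote:
            return text[:i], text[i + 1:]
    return text, ""          # no "/" present, hence no comment


def split_values(text):
    """Split a value list on commas or blanks, keeping quoted strings whole.

    Simplification (an assumption of this sketch): null values and
    repeat counts (r*c) of list directed input are not interpreted.
    """
    values, current, in_quote = [], "", False
    for ch in text:
        if ch == "'":
            in_quote = not in_quote
            current += ch
        elif ch in ", " and not in_quote:
            if current:
                values.append(current)
                current = ""
        else:
            current += ch
    if current:
        values.append(current)
    return values


def parse_keyword_subdivision(text):
    """Return {keyword: (values, comment)} for records up to the END keyword."""
    params = {}
    for record in text.split("\r"):      # records end with a carriage return
        record = record.strip()
        if not record:
            continue
        keyword, sep, rest = record.partition("=")
        keyword = keyword.strip()
        if keyword == "END":             # "END=" terminates the list
            break
        if not sep:
            continue                     # not of the form KEYWORD = ...
        value_text, comment = split_value_and_comment(rest)
        params[keyword] = (split_values(value_text.strip()), comment.strip())
    return params


sample = ("OBJECT = 'NGC 253' / target name\r"
          "NAXIS = 3\r"
          "CRVAL1 = 4000.0, 2.5 / start, step\r"
          "END=\r"
          "IGNORED = 1\r")
parsed = parse_keyword_subdivision(sample)
```

                 Values are left as character strings here;  a  fuller  reader

            would convert them according to the list directed rules.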

                 Although the structure is  not  identical  to  the  Flexible

            Image  Transport  System  (FITS) structure, the same keywords and

            units should be used to enable easy translation to and from FITS.

                 The keyword subdivision follows the same conventions as  the

            special purpose astronomical subdivisions, i.e.  it starts with a

            tag ($KYWRD01) followed by a character count.  In this case,  the

            character  count  is  redundant  and the END keyword always takes

            precedence.







            10.2.4 Special Purpose Subdivisions



                 Although most descriptive  parameters  can  conveniently  be

            specified  in  the  keyword  subdivision,  provision  is made for

            special  purpose  subdivisions.   For  example,   it   may   be

            desirable  to  store  a histogram of data values.  This could not

            conveniently be achieved using keywords.

                 Special purpose subdivisions would  generally  be  of  fixed

            format and must comply with the following conventions:



            1.  If a particular subdivision is included in  the  header,  the

                complete  subdivision must be present, though some fields may

                be undefined.  An undefined character field  is  blank-filled

                while  an  undefined  binary  field  is  identified  by  some

                convention appropriate to the field (e.g.,  if  the  standard

                deviation  is  zero, both the mean and standard deviation are

                considered undefined).



            2.  A subdivision can be standard or private.  A  subdivision  is

                made  standard  by  its  acceptance  by those responsible for

                maintaining  the   format   standard   at   the   cooperating

                observatories.    Users   who   wish  to  set  up  a  private

                subdivision for their own use are free to do so.



            3.  Each subdivision starts with a six character tag, followed by

                a  two  character  version  number  and  a  four   character

                subdivision length.  Standard  subdivision  tags  must  start

                with  a  "$"  to  avoid  confusion with private subdivisions.

                Private  subdivision  tags  may  not  contain  a  "$".    The

                subdivision  length is the number of bytes in the subdivision

                including tag, version  number  and  length  fields.   It  is

                expressed in formatted character code with I4 format.



            4.  It is recommended practice that all fields be a  multiple  of

                four  bytes  long.   Where a number of consecutive fields are

                normally dealt with as a unit, it is acceptable for the group

                as a whole to be a multiple of four bytes.



            5.  Where an astronomical subdivision contains values  in  binary

                code, it must also contain a descriptor specifying the binary

                code format.  This descriptor follows the conventions defined

                for  the  first  four bytes of the data format description in

                the image control subdivision (e.g., PDP-11  REAL4  would  be

                "R4PD").



            6.  It is  not  permissible  to  fill  an  undefined  field  with

                "garbage".   Character  code  fields  must  at least be blank

                filled.  Filling with nulls is not acceptable.  Binary fields

                must contain a value which signifies the field is undefined.



            7.  Where a character field represents a number,  leading  blanks

                are  permitted and the absence of a sign signifies a positive

                value.  When the accuracy of the value  does  not  merit  the

                number  of  decimal  places  available,  trailing  blanks are

                recommended in preference to trailing zeroes.
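
                 The twelve character prefix described in convention 3  might

            be  read  and written as in the following Python sketch.  This is

            an illustration only:  the  names  are  hypothetical,  ASCII  is

            assumed  as  the  character  code,  and  blank-padding  of  short

            private tags is an assumption not stated in the standard.

```python
from typing import NamedTuple

PREFIX_LENGTH = 12   # six character tag + two character version + I4 length


class SubdivisionPrefix(NamedTuple):
    tag: str         # six characters, e.g. "$KYWRD"
    version: str     # two characters, e.g. "01"
    length: int      # bytes in the subdivision, prefix included
    standard: bool   # True when the tag starts with "$"


def is_standard_tag(tag):
    """Standard tags start with "$"; private tags may not contain one."""
    if tag.startswith("$"):
        return True
    if "$" in tag:
        raise ValueError("private subdivision tags may not contain '$'")
    return False


def parse_prefix(raw):
    """Decode the first twelve bytes of a subdivision (ASCII assumed)."""
    if len(raw) < PREFIX_LENGTH:
        raise ValueError("subdivision shorter than its prefix")
    text = raw[:PREFIX_LENGTH].decode("ascii")
    tag, version, length_field = text[:6], text[6:8], text[8:12]
    standard = is_standard_tag(tag)
    length = int(length_field)       # I4 field: leading blanks are permitted
    if length < PREFIX_LENGTH:
        raise ValueError("length must include the tag, version and length")
    return SubdivisionPrefix(tag, version, length, standard)


def build_prefix(tag, version, body_length):
    """Encode a prefix for a subdivision whose body is body_length bytes."""
    if len(tag) > 6:
        raise ValueError("tag is limited to six characters")
    if not 0 <= version <= 99:
        raise ValueError("version number must fit in two characters")
    is_standard_tag(tag)             # raises for an invalid private tag
    total = PREFIX_LENGTH + body_length
    if total > 9999:
        raise ValueError("length does not fit in an I4 field")
    return f"{tag:<6}{version:02d}{total:4d}".encode("ascii")


prefix = build_prefix("$KYWRD", 1, 40)
decoded = parse_prefix(prefix + b" " * 40)
```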







            10.3 INTERCHANGE FORMAT



            10.3.1 Outline



                 An image is recorded on the interchange medium as a  header,

            followed  by  the  data  and, optionally, a trailer and comments.

            Access to the image is assumed to be sequential.   Where  a  file

            concept is appropriate to the medium, there may be several images

            to the file.

                 Except for the control subdivision, interchange format image

            headers  are the same format as their access format counterparts.

            Details of the image header and trailer control subdivisions  are

            given in Appendix A.

                 It is not possible to update values of control parameters in

            a  trailer.  Where data has been truncated for some reason, it is

            assumed that this is determined  from  the  length  of  the  data

            actually encountered rather than from the trailer.

                 When translating  to  access  format,  trailer  astronomical

            subdivisions   are   appended   to   the   corresponding   header

            subdivisions.

                 Comments following the trailer correspond to the  extensible

            comments  and  genealogy  discussed  in section 10.4.4.  They are

            normally present only in reduced data.  Comments made at the time

            of observation are allowed for in the fixed length comment fields

            of the header and trailer.  (See the notes in Sections  10.1  and

            10.2.1 regarding obsolescence of these comment fields.)





            10.3.2 File and Record Structure



                 This discussion assumes magnetic tape is the medium in  use.

            Appropriate modifications would need to be made for other media.

                 An interchange file is bounded by end-of-file  marks  except

            at  the  beginning of the tape where the initial end-of-file mark

            may be omitted.  File and volume  labels,  if  present,  must  be

            separated  from  the  body  of  data by end-of-file marks.  It is

            recommended that physical blocks be of fixed length.

                 To allow error recovery, headers, trailers and comments must

            start  on  a  physical  block  boundary.  The header, trailer and

            comment tags can then be used to locate the bounds of a corrupted

            image.   This method of recovery fails in the unlikely event that

            data looks like a tag.  It is recommended practice that, if a tag

            is  encountered  when  data  is expected during tape reading, the

            contents of the record be printed out assuming the tag is  valid.

            The  decision  as  to  whether  the record is data or not is then

            determined by operator inspection.

                 A technique for automatic error recovery, based  on  tagging

            data  blocks,  is  allowed for.  It is not made compulsory due to

            the difficulty  under  some  operating  systems  of  manipulating

            physical blocks under time critical conditions.  This method uses

            the area which can optionally be set aside at  the  beginning  of

            physical blocks containing data.  Such a tag must comply with the

            following conventions:



            (i)   If four or more bytes are present,  the  first  four  bytes

                  must contain the characters "DATA".

            (ii)  If eight or more bytes are present, the second  four  bytes

                  must  contain a block count in character code in I4 format.

                  Block 1 is the first block in the current image.

            (iii) Remaining bytes are not interpreted by the  error  recovery

                  procedure.



            Tape reading  programs  which  do  not  support  automatic  error

            recovery ignore this tag field.
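
                 A tape reading program that does support the  data  block  tag

            might  interpret  it  as in the following Python sketch.  The

            function names, the ASCII character code and  the  policy  of

            resynchronising  on  the  count  actually read are assumptions

            made for illustration;  the standard  defines  only  the  tag

            layout.

```python
def read_block_tag(reserved):
    """Interpret the area reserved at the start of a data block.

    Returns (is_data, block_count).  block_count is None when fewer than
    eight bytes are reserved, since no count is then present.
    """
    if len(reserved) < 4 or reserved[:4] != b"DATA":
        return False, None
    if len(reserved) < 8:
        return True, None
    # Second four bytes: block count in character code, I4 format.
    return True, int(reserved[4:8].decode("ascii"))


def check_sequence(blocks, reserved_size):
    """Report blocks whose tag is missing or whose count is out of order.

    Block 1 is the first block of the current image.
    """
    problems = []
    expected = 1
    for index, block in enumerate(blocks):
        is_data, count = read_block_tag(block[:reserved_size])
        if not is_data:
            problems.append((index, "no DATA tag"))
        elif count is not None and count != expected:
            problems.append((index, f"expected block {expected}, found {count}"))
            expected = count     # resynchronise on the count actually read
        expected += 1
    return problems


# Three blocks with an eight byte reserved area; block 3 has been lost.
blocks = [b"DATA   1xxxx", b"DATA   2yyyy", b"DATA   4zzzz"]
problems = check_sequence(blocks, 8)
```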

                 Tags on the header and trailer blocks are the  tags  of  the

            control subdivision, i.e.  "$IMHDRnn" and "$IMTLRnn", where nn is

            the version number of the subdivision.  Comments are tagged  with

            the characters "COMMENTS".

                 Data can start on a physical block boundary  or  immediately

            following  the  header within the same block.  The method used is

            specified in  the  control  subdivision.   Data  logical  records

            immediately   follow   each   other  and  may  lie  across  block

            boundaries.

                 Physical details of the tape  (number  of  tracks,  density,

            block  length)  and  the character code used (ASCII, EBCDIC, BCD,

            etc.) should be noted on a label affixed to the reel.







            10.4 ACCESS FORMAT



            10.4.1 Tree Structure



                 In the access format, images are grouped into trees which in

            most operating systems will correspond to files.  In the simplest

            case there is one image  per  file.   More  commonly,  there  are

            several  images  within a file and these are grouped under a node

            which acts as an index to the images.

                 These  simple  structures  will  suffice  in   most   cases.

            However,  to  allow an astronomer to group his data in a flexible

            and astronomically meaningful way, the access format  allows  for

            an indefinitely extensible tree structure.  Individual images can

            be collected under a common node which may in turn  be  collected

            under  another  node.   While,  in  principle,  there  can  be an

            indefinite number of levels in the tree, in  practice  there  are

            rarely more than one or two.

                 As an example, consider  a  surface  photometry  project  in

            which there are two calibration images (zero and flat-field), and

            a number of observations of each of two objects.  This data might

            be  collected  in  a  tree  as  shown  in  figure  1.  During the

            reduction of this data, the astronomer  might  perform  the  step

            "OBJECT A: 5 - ZERO".

                 Node records and image header control subdivisions have been

            made  similar  so  that the same subroutine can access them both.

            This allows programs to be written in such a way  that  the  tree

            structure  need  not be known in advance.  Each node and image is

            named so that it can be accessed by name or number.
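
                 Access by name or number can be pictured with  the  following

            Python  sketch.   It  models  the tree in memory only and says

            nothing about the record layout.  The class  names,  the  image

            names,  the arrangement of the example tree and the numbering of

            items from 1 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Image:
    name: str


@dataclass
class Node:
    name: str
    children: List[Union["Node", Image]] = field(default_factory=list)

    def child(self, key):
        """Return a child by number (counted from 1, an assumption) or name."""
        if isinstance(key, int):
            if not 1 <= key <= len(self.children):
                raise IndexError(key)
            return self.children[key - 1]
        for item in self.children:
            if item.name == key:
                return item
        raise KeyError(key)


def resolve(root, *path):
    """Walk from root through successive names or numbers."""
    item = root
    for key in path:
        if not isinstance(item, Node):
            raise TypeError(f"{item.name} is an image and has no children")
        item = item.child(key)
    return item


# Hypothetical layout for the surface photometry example.
tree = Node("PHOTOMETRY", [
    Image("ZERO"),
    Image("FLAT"),
    Node("OBJECT A", [Image(f"OBS{i}") for i in range(1, 7)]),
    Node("OBJECT B", [Image(f"OBS{i}") for i in range(1, 4)]),
])

fifth = resolve(tree, "OBJECT A", 5)   # the "OBJECT A: 5" operand
zero = resolve(tree, "ZERO")           # the "ZERO" operand
```

                 Because nodes and images are reached through the  same  call,

            the  caller does not need to know the depth of the tree ahead of

            time.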

                 At the head of the tree, usually the  first  record  in  the

            file,    there    is    a   tree/file   header   which   contains

            implementation-dependent information.   Under  most  systems  the

            only information required is the record size and a pointer to the

            next vacant record.  Space for a name is provided though this  is

            normally  redundant  as  it  will  be  contained  in  a directory

            maintained by the operating system.  A fixed  length  comment  is

            also allowed for.





            10.4.2 Record Structure



                 Efficiency requirements dictate that the record structure of

            the  access  format  be  optimized  for a particular computer and

            operating  system.   The  structure  must,  however,   meet   the

            following requirements:



            (i)   It is capable of random access;

            (ii)  Records are fixed length;

            (iii) Headers, extensible comments, and data all start on  record

                  boundaries; and

            (iv)  Within the headers, extensible comments, and data,  logical

                  records  immediately  succeed  each  other  and may overlap

                  record boundaries.







            10.4.3 Header Chains



                 During  data  reduction,   astronomical   subdivisions   are

            frequently added to the header, possibly overflowing the original

            space set aside.  This is provided for by  chaining  whereby  the

            first four bytes of the random access record are set aside for an

            integer record pointer to the next record  in  the  header.   All

            record  pointers  assume the first record in the file is zero.  A

            zero pointer indicates the last record  in  the  chain.   When  a

            subdivision  that causes a header overflow is written, the header

            writing subroutines automatically find the next available  record

            from the tree/file header.

                 Chain pointers are not considered  as  part  of  the  header

            logical record.
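
                 Following a header chain can be sketched in Python as  below.

            The  function name is hypothetical, and the representation of the

            pointer (unsigned, little-endian by default) is an assumption;

            the   standard  leaves  binary  representation  to  the  access

            format implementation.

```python
def read_header_chain(file_bytes, first_record, record_size, byteorder="little"):
    """Return the header logical record, chain pointers removed.

    The first four bytes of each record hold the number of the next record
    in the chain; record numbers count from zero and a zero pointer marks
    the last record.  The byte order is an assumption of this sketch.
    """
    logical = bytearray()
    visited = set()
    record = first_record
    while True:
        if record in visited:
            raise ValueError("header chain loops back on itself")
        visited.add(record)
        start = record * record_size
        chunk = file_bytes[start:start + record_size]
        if len(chunk) < record_size:
            raise ValueError("chain points past the end of the file")
        next_record = int.from_bytes(chunk[:4], byteorder)
        logical += chunk[4:]     # pointers are not part of the logical record
        if next_record == 0:
            return bytes(logical)
        record = next_record


# Four eight-byte records: record 0 stands in for the tree/file header,
# and the header chain runs from record 1 to record 3.
RECORD_SIZE = 8
records = [
    bytes(RECORD_SIZE),
    (3).to_bytes(4, "little") + b"ABCD",
    (0).to_bytes(4, "little") + b"zzzz",
    (0).to_bytes(4, "little") + b"EFGH",
]
header = read_header_chain(b"".join(records), 1, RECORD_SIZE)
```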





            10.4.4 Extensible Comments and Genealogy



                 Comments and genealogy are a record of  the  data  reduction

            process.   Comments  are  entered  by  the  person  operating the

            reduction program while the  genealogy  is  a  reduction  history

            automatically  recorded by the program itself.  Genealogy entries

            are enclosed between "$" signs.  Logical records in the  comments

            are  terminated  with  a  carriage  return character and the last

            carriage return is followed by an  end-of-text  character.   Line

            feeds are ignored.

                 Extensible comments and genealogy,  like  headers,  must  be

            capable  of  indefinite  extension  and  so use the same chaining

            technique.
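
                 Once the chain has been followed, the text might be  separated

            into  comments  and genealogy as in the following Python sketch.

            The function name and the sample text are hypothetical, and  the

            sketch  assumes  that  a  genealogy  entry  occupies  a  whole

            logical record which begins and ends with "$".

```python
CR, LF, ETX = "\r", "\n", "\x03"


def split_comments(text):
    """Return (comments, genealogy) from extensible comment text."""
    end = text.find(ETX)
    if end == -1:
        raise ValueError("end-of-text character not found")
    body = text[:end].replace(LF, "")    # line feeds are ignored
    records = body.split(CR)
    if records and records[-1] == "":
        records.pop()                    # the final CR leaves an empty tail
    comments, genealogy = [], []
    for record in records:
        # Assumption: a genealogy entry is a record enclosed in "$" signs.
        if len(record) >= 2 and record.startswith("$") and record.endswith("$"):
            genealogy.append(record[1:-1])
        else:
            comments.append(record)
    return comments, genealogy


sample = ("Sky subtracted by hand\r\n"
          "$SUBTRACT OBJECT_A:5 ZERO$\r\n"
          "Seeing poor\r\x03unused")
comments, genealogy = split_comments(sample)
```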







            10.5 CONCLUSIONS



                 This format can be kept simple yet, at the same time, can be

            expanded to satisfy quite complex requirements.  It contains only

            one mandatory item - the control subdivision of the image header.

            The  concept  of  astronomical  subdivisions  provides sufficient

            flexibility to deal with new observing techniques as they  arise.

            Simplicity  is  maintained  for  the  computer  at  the recording

            instrument since  it  will  only  have  to  deal  with  a  header

            appropriate to that instrument.

                 A subset of the format came into  operation  at  Mt  Stromlo

            Observatory  during  the  third  quarter, 1979.  Documentation of

            this implementation is available.