CIF Syntax

Syntax for the CIF Format


We have already covered the syntax employed in CIFs by example. This section presents a more formal summary of the rules, including some details we have not yet considered.

  1. A text string is a string of printable ASCII characters bounded by blanks, matching single quotes (') or double quotes ("), or (if the string extends over several physical records) by a semicolon as the first character of the first and trailing lines.
  2. A data name is a text string starting with an underscore (_) character.
  3. A data item is a text string not starting with an underscore, but preceded by a data name to identify it.
  4. A data loop is a list of data names, preceded by loop_ and followed by a list of data items.
  5. A data block is a collection of data names (looped or not) and data items preceded by a data_xxxx code record (the xxxx represents an arbitrary text string). A data name must be unique within a data block. A data block is terminated by another data_ statement or by the end of file.
  6. A data file is a collection of data blocks. The block codes must be unique within a data file.
  7. A hash character (#) introduces a comment; all further text to the end of the line may be ignored.
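
The seven rules above can be sketched as a small reader. This is a minimal illustration rather than a conforming CIF parser: the function names tokenize and parse_cif are hypothetical, a quoted string is assumed to end at the next matching quote, block and loop keywords are assumed to be matched case-insensitively, and anything following the closing semicolon of a text field on the same line is dropped.

```python
def tokenize(text):
    """Yield (kind, value) pairs; kind is 'word', 'quoted' or 'text'."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(";"):
            # Rule 1: multi-line text bounded by ';' as first character.
            body = [line[1:]]
            i += 1
            while i < len(lines) and not lines[i].startswith(";"):
                body.append(lines[i])
                i += 1
            yield ("text", "\n".join(body).strip("\n"))
            i += 1  # skip the closing ';' line
            continue
        pos = 0
        while pos < len(line):
            ch = line[pos]
            if ch in " \t":
                pos += 1
            elif ch == "#":
                break  # Rule 7: ignore the rest of the line
            elif ch in "'\"":
                # Simplification: the next matching quote closes the string.
                end = line.find(ch, pos + 1)
                end = len(line) if end == -1 else end
                yield ("quoted", line[pos + 1:end])
                pos = end + 1
            else:
                end = pos
                while end < len(line) and line[end] not in " \t":
                    end += 1
                yield ("word", line[pos:end])
                pos = end
        i += 1


def parse_cif(text):
    """Return {block_code: {data_name: value, or list of values if looped}}."""
    toks = list(tokenize(text))
    blocks, block, i = {}, None, 0

    def is_name(tok):  # Rule 2: data names start with '_'
        return tok[0] == "word" and tok[1].startswith("_")

    def is_keyword(tok):
        return tok[0] == "word" and tok[1].lower().startswith(("data_", "loop_"))

    def add(name, value):
        if block is None:
            raise ValueError("data item outside a data block")
        if name in block:  # Rule 5: names are unique within a block
            raise ValueError(f"duplicate data name {name}")
        block[name] = value

    while i < len(toks):
        tok = toks[i]
        if tok[0] == "word" and tok[1].lower().startswith("data_"):
            code = tok[1][5:]
            if code in blocks:  # Rule 6: block codes are unique within a file
                raise ValueError(f"duplicate block code {code}")
            block = blocks[code] = {}
            i += 1
        elif tok[0] == "word" and tok[1].lower() == "loop_":
            i += 1
            names, values = [], []
            while i < len(toks) and is_name(toks[i]):
                names.append(toks[i][1])
                i += 1
            while i < len(toks) and not (is_name(toks[i]) or is_keyword(toks[i])):
                values.append(toks[i][1])
                i += 1
            if not names or len(values) % len(names):
                raise ValueError("loop values do not fill whole rows")
            for k, name in enumerate(names):  # Rule 4: one column per name
                add(name, values[k::len(names)])
        elif is_name(tok):  # Rule 3: a data item follows its data name
            if i + 1 >= len(toks) or is_name(toks[i + 1]) or is_keyword(toks[i + 1]):
                raise ValueError(f"no value for {tok[1]}")
            add(tok[1], toks[i + 1][1])
            i += 2
        else:
            raise ValueError(f"unexpected token {tok[1]!r}")
    return blocks


sample = """\
data_demo   # a comment
_cell_length_a   5.431(2)
_chemical_name  'silicon dioxide'
loop_
_atom_site_label
_atom_site_fract_x
Si1 0.0
O1  0.25
_notes
;
first line
second line
;
"""
cif = parse_cif(sample)
```

Here cif maps each block code to a dictionary of data names. Looped names hold a list with one entry per row, and every value is kept as a string, since the rules above do not yet distinguish data types.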

These rules are a large subset of the syntax rules governing Self-Defining Text Archive and Retrieval (STAR) files, as described by Hall (1991). The Crystallographic Information File is a particular application of STAR, with some additional restrictions to facilitate crystallographic use. These are:

  1. Lines must not exceed 80 characters in length.
  2. Data names and block codes may not exceed 32 characters in length, and should be treated as case-insensitive. NOTE: This length limit applies only to CIFs that conform to the Core Dictionary Version 1. There is no formal restriction in Version 2 (though in practice the length is restricted to 76 characters).
  3. Data items are recognised as being of number or character type. A text string that is more than 80 characters long, and so extends over more than one line, is of type text, which may be regarded as a subset of the character type.
  4. A data item is of type number if it starts with a digit, plus, minus or period [0-9+-.].
  5. A number may be given in integer, floating-point or scientific notation. A trailing integer within parentheses is understood to be the estimated standard deviation in the final digit(s) of the number.
  6. A data item is of type text if it extends over more than one line. Semicolons as the first character of the first and last lines bound the data.
  7. A data item is of type character if it is not a number or text.
  8. Only one level of loop_ is permitted. Nested loops must be stored as lists within a text field.
  9. Numeric data with physical significance have a default unit stated in the CIF Dictionary. Some alternative units are permitted for certain data items. The indexing data name then has a units extension as specified in the CIF Dictionary.
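
The typing rules (3 to 7) and the parenthesised standard deviation of rule 5 can be illustrated with a short sketch. The names classify, parse_number and NUMBER_RE are hypothetical. classify applies rule 4 literally, looking only at the first character, while the stricter pattern in NUMBER_RE is an assumption about what counts as integer, floating-point or scientific notation; it is not taken from the CIF Dictionary. Decimal is used so that the standard deviation can be scaled to the final digit(s) without rounding error.

```python
import re
from decimal import Decimal

# Assumed pattern: optional sign, digits with optional decimal point,
# optional exponent, then an optional esd in parentheses (rule 5).
NUMBER_RE = re.compile(
    r"([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\((\d+)\))?"
)
NUMBER_START = frozenset("0123456789+-.")


def classify(value):
    """Return 'text', 'number' or 'character' for a data item string."""
    if "\n" in value:
        return "text"        # rule 6: extends over more than one line
    if value[:1] in NUMBER_START:
        return "number"      # rule 4: first character decides
    return "character"       # rule 7: everything else


def parse_number(value):
    """Return (number, esd) as Decimals; esd is None when absent."""
    match = NUMBER_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a number: {value!r}")
    number = Decimal(match.group(1))
    if match.group(2) is None:
        return number, None
    # The esd applies to the final digit(s), so give it the same
    # decimal exponent as the last digit of the number.
    esd = Decimal(match.group(2)).scaleb(number.as_tuple().exponent)
    return number, esd
```

For example, parse_number("5.431(2)") gives the value 5.431 with a standard deviation of 0.002. Because classify follows rule 4 literally, a string such as "-" is classified as a number even though parse_number rejects it; a stricter reader might classify by the full pattern instead.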