|
Overview of SnB Operation
- Initiate the SnB program:
Run SnB from the directory containing the reflection file
for the structure you wish to consider.
- Enter general information:
On the General Information page,
enter the fundamental information that is requested about the structure.
This information includes most of the parameters (e.g. cell
constants, space group) for which it is difficult to supply default
values. The type of data to be used must also be specified at this
point. To look for a complete structure, a single file of high-resolution
(1.1-1.2 Angstroms or better) intensity data with Bijvoet-related
reflections merged is required; this type of data is referred to as
Basic data. SnB can also be applied to isomorphous substructures or
anomalously scattering substructures using SIR or SAS data, respectively,
at 3-4 Angstroms resolution. SIR data consist of reflection files for a
pair of isomorphous structures. SAS data consist of a single reflection
file with Bijvoet-related reflections unmerged.
- Normalize the data:
Before direct methods can be applied to a data set, normalized structure-factor
magnitudes (|E|s) or difference magnitudes (in the case of
SIR or SAS substructure data) must be computed from
the usual structure-factor magnitudes (|F|s). The SnB
GUI provides convenient access to the DREAR suite of data reduction
and error analysis routines in order to compute these quantities.
Simply use the Create Es page
to execute DREAR and generate an SnB input reflection file.
DREAR can accept SCALEPACK and d*TREK output files as well as a simple
free-format ASCII file consisting of the fields H, K, L, |F|
& Sig(|F|) separated by one or more spaces.
If you have |E| values that were computed by some other program,
you can supply them to the main SnB program in the form of
a free-format ASCII file containing H, K, L, |F|, Sig(|F|),
|E| & Sig(|E|).
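As an illustration of the simple free-format input described above, the following minimal Python sketch reads lines of H, K, L, |F| and Sig(|F|) separated by one or more spaces. The names Reflection and parse_reflections, and the sample values, are hypothetical and are not part of SnB or DREAR. Details such as header lines or missing-value conventions are not covered.

```python
from typing import List, NamedTuple


class Reflection(NamedTuple):
    h: int
    k: int
    l: int
    f: float      # structure-factor magnitude |F|
    sig_f: float  # standard deviation Sig(|F|)


def parse_reflections(text: str) -> List[Reflection]:
    """Parse whitespace-separated 'H K L |F| Sig(|F|)' records."""
    reflections = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
        h, k, l = (int(x) for x in fields[:3])
        f, sig_f = (float(x) for x in fields[3:])
        reflections.append(Reflection(h, k, l, f, sig_f))
    return reflections


# Made-up sample records, for illustration only.
sample = """\
  1  0  0   152.30   2.10
  0  2 -1    87.45   1.75
"""
refl = parse_reflections(sample)
print(len(refl), "reflections read")
```

A check of this kind can catch malformed records before the file is handed to the Create Es page.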
- Check remaining parameters:
Once the General Information page
has been completed and the data normalized by DREAR, the program will
supply default values for all of the
remaining parameters. It is recommended that the user explore the
other pages and decide whether any parameters need to be modified.
However, it is possible to proceed directly to the Submit
Jobs page and initiate a batch job (see "Submit a Shake-and-Bake job" below)
to process trial structures (default: 1000 trials) after choosing an
appropriate name for the set of output files (see "Specify output files" below).
- Specify output files:
Choose a filename prefix (e.g.,
"structure-name") to be used for the output information
(see the Submit Jobs page). Output file
names are of the form prefix_#.SnB_output. SnB can be run in
multiprocessor mode, and the symbol
# stands for the digit(s) used to denote the processor number. If
only a single processor is being used, the processor number will be
0. A description of the available output files
is given elsewhere in this document.
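The naming pattern prefix_#.SnB_output can be sketched in a few lines of Python. The function name output_file_names is hypothetical. The single-processor number 0 comes from the description above, whereas numbering additional processors consecutively from 0 is an assumption.

```python
def output_file_names(prefix: str, n_processors: int = 1) -> list:
    """Build names of the form prefix_#.SnB_output, one per processor.

    Assumption: processors are numbered 0, 1, 2, ... (only the
    single-processor case, number 0, is stated in the text).
    """
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    return [f"{prefix}_{proc}.SnB_output" for proc in range(n_processors)]


names = output_file_names("job1")
print(names)
```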
- Save the current parameters:
At any time, you can save the information contained on the screens
for future use by clicking on the Save
As button. The screens should always be saved after all the information
necessary to run a Shake-and-Bake job has been entered. The
screen information is stored in a so-called "configuration"
file. Use of a filename such as "structure-name.config"
or "job#.config" is recommended. Save can be used later if you want to update
an existing file with a modified set of parameter values. Open can be used to restore a previously saved
set of values.
- Submit a Shake-and-Bake job:
Once you are satisfied with the parameter settings and other values
that are entered, execute the main phasing program in batch mode by
clicking the Process Jobs button
on the Submit Jobs page. The trial structures
will then be processed according to the dual-space Shake-and-Bake
phasing protocol. If you expect the processing time to be long,
you can now log out and return later to check your results.
- Check a submitted job for possible solutions using
the Rmin histogram:
Trial structures with relatively low values of Rmin (the minimal
function) are most likely to be solutions. To see a histogram of the
final Rmin values, go to the Evaluate
Trials page and choose a previously submitted job to be reviewed
(click Update List, select
the desired result files, and then click on the View Histogram button). A clear bimodal
distribution of Rmin values is a strong indication that a solution
has been found. Note, however, that solutions are
sometimes present even when the histogram appears to be unimodal.
R(true) gives some indication of what Rmin values to expect for solutions,
and R(random) indicates values expected for completely random phase
sets.
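For readers who want to inspect final Rmin values outside the GUI, the following minimal Python sketch bins a list of values and prints a text histogram. It is not the View Histogram implementation. The function name text_histogram is hypothetical, and the Rmin values are made up purely to illustrate a bimodal distribution.

```python
def text_histogram(values, n_bins=10, width=40):
    """Return (counts, lines) for a simple equal-width text histogram."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero if all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        idx = min(int((v - lo) / span * n_bins), n_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + i * span / n_bins
        bar = "#" * round(width * c / peak)
        lines.append(f"{left_edge:6.3f} {c:4d} {bar}")
    return counts, lines


# Made-up final Rmin values: two trials stand apart at low Rmin.
rmin = [0.52, 0.53, 0.51, 0.54, 0.52, 0.53, 0.38, 0.50, 0.52, 0.37]
counts, lines = text_histogram(rmin, n_bins=5)
print("\n".join(lines))
```

In this illustration the two low-Rmin trials fall in a bin that is separated from the main group by empty bins, which is the kind of bimodal pattern described above.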
- Check other figures of merit:
Confirmation that a solution has likely been obtained should be
sought by checking for consistency with other figures of merit that
are stored, along with Rmin, in an output file called the "trace"
file. The View Sorted Trials
option shows the "trace" file sorted in increasing order
according to Rmin values. A crystallographic R value based
on |E| values and the Eobs/Ecalc correlation coefficient, CC
[Fujinaga, M. & Read, R.J. (1987). J. Appl. Cryst. 20,
517-521], are available as additional figures of merit. For true solutions,
Rmin and the crystallographic R value should always be correlated.
Ideally, the crystallographic R values, like the Rmin values, should show
a bimodal distribution.
If the best trials have been subjected to Fourier refinement using
a large fraction of the data, a correlation coefficient of 0.7 or
more is a very strong indication that a solution has been obtained.
As it is implemented in SnB, CC is often not reliable for isomorphous
or anomalously scattering substructures because the weaker difference
data, on which CC depends, are themselves unreliable.
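The two additional figures of merit can be illustrated with a short Python sketch. The exact formulas used inside SnB are not given here, so the code assumes the conventional forms: a residual R = sum(| |Eobs| - |Ecalc| |) / sum(|Eobs|) and a standard linear (Pearson) correlation coefficient. The function names and the sample magnitudes are hypothetical.

```python
import math


def r_value(e_obs, e_calc):
    """Conventional residual between observed and calculated |E| values."""
    return sum(abs(o - c) for o, c in zip(e_obs, e_calc)) / sum(e_obs)


def correlation_coefficient(e_obs, e_calc):
    """Linear (Pearson) correlation between observed and calculated |E|."""
    n = len(e_obs)
    mean_o = sum(e_obs) / n
    mean_c = sum(e_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(e_obs, e_calc))
    var_o = sum((o - mean_o) ** 2 for o in e_obs)
    var_c = sum((c - mean_c) ** 2 for c in e_calc)
    return cov / math.sqrt(var_o * var_c)


# Made-up magnitudes, for illustration only.
e_obs = [2.1, 1.4, 0.9, 1.8, 0.6]
e_calc = [2.0, 1.5, 1.0, 1.6, 0.7]
r = r_value(e_obs, e_calc)
cc = correlation_coefficient(e_obs, e_calc)
print(f"R = {r:.3f}, CC = {cc:.3f}")
```

Under these assumed definitions, a good solution gives a low R together with a CC close to 1.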
- Check the false minimum indicators:
False minima, typically having a single large "uranium"
peak, do occasionally occur, especially in space group P1. Users should
be suspicious of trials having large values of the R-Ratio (>0.2)
or Peak-Ratio (>5). The R-Ratio is a function of the Rmin values
before and after the imposition of real-space constraints (peak picking).
Peak-Ratio is the density ratio for the largest and second largest
peaks. If the trial with the best figures of merit is suspect because
of either of these criteria, it is wise to look further and inspect
the best trial that lacks any indication that a false minimum exists.
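The screening rule above can be written as a small Python sketch. The thresholds (R-Ratio > 0.2, Peak-Ratio > 5) are taken from the text. The Trial record, the function names, and the trial values are hypothetical and do not reflect the actual layout of the "trace" file.

```python
from dataclasses import dataclass
from typing import List, Optional

R_RATIO_LIMIT = 0.2     # from the text: R-Ratio > 0.2 is suspicious
PEAK_RATIO_LIMIT = 5.0  # from the text: Peak-Ratio > 5 is suspicious


@dataclass
class Trial:
    number: int
    rmin: float
    r_ratio: float
    peak_ratio: float


def is_suspect(trial: Trial) -> bool:
    """True if either false-minimum indicator exceeds its limit."""
    return trial.r_ratio > R_RATIO_LIMIT or trial.peak_ratio > PEAK_RATIO_LIMIT


def best_unsuspicious(trials: List[Trial]) -> Optional[Trial]:
    """Return the lowest-Rmin trial with no false-minimum indication."""
    for trial in sorted(trials, key=lambda t: t.rmin):
        if not is_suspect(trial):
            return trial
    return None


# Made-up trials: trial 12 has the lowest Rmin but looks like a false minimum.
trials = [
    Trial(number=12, rmin=0.31, r_ratio=0.35, peak_ratio=9.2),
    Trial(number=517, rmin=0.38, r_ratio=0.05, peak_ratio=1.4),
    Trial(number=3, rmin=0.52, r_ratio=0.02, peak_ratio=1.1),
]
choice = best_unsuspicious(trials)
print("inspect trial", choice.number if choice else None)
```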
- View the Rmin trace as a function of cycle:
If complete traces have been stored for all the refinement cycles,
then Trace Rmin will show the course of Rmin
values over all cycles for the best trial. Typically, Rmin values
will drop slightly during the first few cycles, and then reach a plateau.
A sudden significant drop in Rmin value followed by stabilization
at a lower plateau is another indication that a solution has been
found.
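The pattern described above (a slight initial decrease, a plateau, then a sudden drop to a lower plateau) can be looked for numerically. The following Python sketch simply locates the largest single-cycle decrease in a list of Rmin values. The function name largest_drop and the trace values are hypothetical, and deciding what counts as a "significant" drop is left to the user.

```python
def largest_drop(trace):
    """Return (cycle_index, drop) for the largest one-cycle decrease in Rmin.

    cycle_index is the position in the list (0-based) of the cycle at
    which the lower value is first seen; (None, 0.0) if Rmin never decreases.
    """
    best_cycle, best_drop = None, 0.0
    for i in range(1, len(trace)):
        drop = trace[i - 1] - trace[i]
        if drop > best_drop:
            best_cycle, best_drop = i, drop
    return best_cycle, best_drop


# Made-up Rmin trace for one trial: plateau near 0.53, then a sudden drop.
trace = [0.56, 0.54, 0.53, 0.53, 0.52, 0.41, 0.39, 0.39]
cycle, drop = largest_drop(trace)
print(f"largest drop of {drop:.2f} at list position {cycle}")
```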
- Visualize and edit the structure:
To view and manipulate a ball-and-stick representation of the best
trial structure, go to the Evaluate
Trials page and enter the bond distance information. When View Structure is selected, the model will
be displayed. It can be edited to remove obviously incorrect peaks
before saving it as atoms in the .SnB_atom, .SnB_ins, and .SnB_pdb
files.
The visualization feature can also
be useful for substructures. When viewing substructures with a "bond
distance" of 4-5 Angstroms, one hopes NOT to see many "bonds",
and those that do show up should have relatively large distances.
In favorable cases, the visualization window may even reveal the presence
of NCS in multi-site selenomethionine derivatives, and this is a strong
indication that the peaks involved are correct.
Selecting Check Geometry will
display a complete listing of bond distances and angles.
- Look for more atoms:
In order to search for additional atoms, an edited "atom"
file can be recycled in SnB as a single trial by using it as
a model structure (see Trials & Cycles
screen). Use either an edited file that has been saved from a visualization
session ("prefix_#.SnB_atom") or a peak file ("prefix_#.SnB_peak")
that has been manually edited. Unwanted low-density peaks must be
removed from the end of the peak file since the model structure file
will be read until an end of file is encountered. For example, suppose
that the edited atom file from job1 is to be used as a starting point
for 10 more cycles of Shake-and-Bake refinement followed by
20 cycles of Fourier refinement. Note that the current version of
SnB must always do at least one Shake-and-Bake cycle.
The GUI fields which should be changed are:
- Trials to process:
  - Starting phases from: Model Structure Atoms
  - Number of trials: 1
  - Number of Shake-and-Bake cycles: 10
  - Input atom file: job1.SnB_atom
  - File name prefix for results: job2
- Twice Baking:
  - Trials for E-Fourier filtering: All or Best
  - Number of cycles: 20
  - Number of peaks: as desired
  - Minimum |E|: as desired
- Examine other trials:
If the "best" peak file does not yield a chemically sensible
structure or there are indications that false minima are present,
then other trials can be rerun by specifying individual desired trials
using the parameters on the Trials &
Cycles screen. A different descriptive name should be chosen for
the output prefix each time a phasing run is submitted. For example,
suppose that a user wishes to obtain the peak file for trial 517, which
had the second lowest value of Rmin. To reprocess trial 517, the following
fields should be changed:
- Number of trials: 1
- Start at trial: 517
- File name prefix for results: tr517
It is also possible to
perform extra Shake-and-Bake cycles with more peaks or to add
or change the twice baking (E-Fourier recycling) cycles. However,
DO NOT CHANGE ANY OTHER PARAMETERS or the desired trial will not be
reproduced. In order to ensure that other parameters do not change,
it is wise to begin by restoring the "configuration" file
for the original job. Also, run the new job on the same machine as
the original.
- What do I do next?
SnB outputs coordinate information
in a variety of ways including both fractional (peak and atom files)
and orthogonal forms (short pdb files). An "ins" file is
produced that allows the coordinates for complete structures to be
input, with minimal editing, to SHELXL for least-squares refinement.
Substructure sites can be put into PHASES, MLPHARE, CNS, or SOLVE
for heavy-atom refinement.
|