Readers: After the Atlanta ACA Meeting (1994) I posted this article
to sci.techniques.xtallography. I finally got around to
creating a hyper-text version. I hope you find it useful.
Linux and Crystallographic Computing
At the recent American Crystallographic Association meeting in
Atlanta, I met a number of crystallographers
who were interested in using linux as an operating system for
crystallographic computing. I thought I would write up a post of
my experiences of compiling and running crystallographic software
under linux.
Although I am posting this to sci.techniques.xtallography to
encourage other crystallographers to try linux, I have decided
to cross-post it to comp.os.linux.misc to facilitate communication
between crystallographers and other linux users.
I will not pretend that I am the first, or the most knowledgeable
person to use linux for crystallographic computing. I have just
found linux to be a good operating system choice. For those of you
who don't know, linux is a free UNIX-clone operating system for
386/486/Pentium CPUs. I have found that using linux allows me to
make much better use of my available CPU resources. Linux is
impressive.
To start, I have to say that before I got involved with linux, I knew
no UNIX, and I had never done anything more complicated than reformat
a hard drive on a DOS PC. I could do some FORTRAN programming and no
C programming. Having convinced you that I am anything but a computer
genius, I want to say that if I could do it, you can too!
Reasons for switching to LINUX
- Previously I had done my computing using DOS as my operating system.
I found this unsatisfactory for several reasons:
- DOS's single-user, single-tasking capabilities were at best
inconvenient. If I started a long least-squares run, my computer
was completely useless for doing other things. I thought it would
be better to run a large job in the background while doing graphics
or some other more interactive computing in the foreground.
- Linux, like other unixes (unices?), is a multi-user, multi-tasking
operating system. The most fun I've had (and what convinced me
never to touch DOS again) was solving a structure using NRCVAX SOLVER,
DIRDIF92 and SIR92 all at the same time using linux's multi-tasking
capability. Sure, the programs ran a bit slower than they would
have if they had been run sequentially, but the difference wasn't that
great. I was able to scan the trial structures quickly and
pick the best solution. In real time (as measured by the amount of
time I had to sit in front of my computer) it was a faster
way to solve the structure. It sure beat using one program, not
finding a solution, going to non-default settings, repeating
the process, and then deciding to move to another structure
solution method.
- DOS's special use of the first 640K of RAM. For normal real mode
computing the .exe file had to load within that first 640K along
with all of the device drivers and other system stuff. Many crystallographic
programs cannot be made to fit within this limit. Some
crystallographers who make precompiled PC versions of their software
compiled in memory managers to get around the 640K limit.
This was at best a partial solution, because they all used different
and incompatible memory managers. This
required me to reboot my machine every time I wanted to use a different
set of programs.
- A lot of crystallographic source code is freely available, but
not all of it has precompiled PC versions. This limited the tools
available to me to solve and refine structures.
- From speaking to some other crystallographers at the Atlanta meeting,
it seemed that many of them had VAXes or MicroVAXes which were slower
than a 486DX. Going to linux would enable them to get better performance
at a lower price without sacrificing operating system functionality
by having to use DOS.
- Linux is a good OS choice for service crystallographers in particular
because of the virtual console (VC) capability included in the linux
kernel. This enables a person sitting at one physical console (keyboard
and screen) to "hotkey" to different VCs with a couple of keystrokes.
I can be logged on as user A and start a least squares and then switch
VCs, log in as user B and start doing graphics, reading mail, editing
source code, etc. On linux there are usually 6 different VCs available.
- Linux supports X11, the UNIX graphical user interface. The
graphics-capable software that I use (NRCVAX, SIR92, PLUTON and PLATON)
all makes calls to the X11 libraries, and compiles without much
difficulty under linux.
- Many crystallographic programs that already have versions for other
UNIX systems compile without much difficulty on linux machines.
- The linux community is a very helpful bunch of people. It is easy
to post questions to the comp.os.linux.* newsgroups and get prompt,
easy (for the most part) to understand answers.
- Linux is a fully featured UNIX-type OS. There is a lot of good
software available in either source or binary form which runs under linux.
This includes networking code, a full complement of UNIX utilities and
cool things like Mosaic and other net-surfing tools.
I am sure I have left out a lot of other good reasons to go to linux,
but these are the ones which come to mind now.
Compiling Crystallographic Software on Linux
Traditionally, crystallographers use FORTRAN as their programming language
of choice. The first problem one encounters is how to compile FORTRAN
code with no native FORTRAN 77 compiler. Linux uses the f2c (FORTRAN to C)
translator available from netlib.att.com and the Free Software Foundation's
GNU gcc (C compiler). The linux distributions I have used (SLS and Slackware)
come with a shell script called f77 which simulates the way that a "real"
FORTRAN compiler would work. The f2c translator comes with its own
libraries which are used when compiling and linking programs.
From what I have read on the net, a free FORTRAN 77 compiler (GNU
g77) is going to be coming out "any day now". I would anticipate
that someone is going to port this to linux. This should make
bringing crystallographic code to linux even easier, but I have
to admit that the f2c+gcc combination isn't bad.
Using f77 (f2c + gcc combination), I have compiled the following with
almost no trouble: SHELXL-93, SHELXS-86, PBDINS, CIFTAB, PATSEE, DIFABS,
DIRDIF92, SIR92, THMA11, PLATON, PLUTON and the NRCVAX program
package. I will give a brief summary of what changes in the source
code were needed to get the program running under linux.
- SHELXL-93:
- For the unix version, I added two little C programs to
get TIME and DATE and elapsed CPU time. Otherwise,
the code compiled very cleanly.
- PBDINS & CIFTAB:
- No real changes to the programs. For CIFTAB I
had to add the unix path for my local implementation.
- SHELXS, PATSEE:
- These codes were generic sources without I/O
(I think the codes were originally SHELXS.FTN and
PATSEE.FTN). I just added simple I/O. I wrote a little
C program to get the job name from the user; this could
have been done in FORTRAN, but I wanted to practice my
C.
- DIFABS:
- I had previously modified this code to run under DOS. This
DOS version compiled and ran without a problem under
linux.
- DIRDIF92:
- This compiled easily under linux. The TIME and DATE
routines looked a little more complicated than some
of the other programs I have worked with, so I just
dummied them out. Someday, I'll go back and fix them.
- SIR92:
- To get this to work, I added my standard TIME/DATE C
program and had to add an underscore to the name of the
C function which does the graphics.
- THMA11:
- I had previously modified this code to run under DOS. This
DOS version compiled and ran without a problem under
linux. Again, I added my standard C TIME/DATE routine.
- PLATON, PLUTON:
- Ton Spek distributes a linux version of these
programs. Actually it is a DEC version with a program
called add.c compiled in to take care of the time/date,
and FORTRAN functions like IOR, IAND which are not
translated by f2c.
For the most part, these are the sorts of changes that are made anyway
when implementing a program in one's local computing environment.
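The TIME/DATE helpers and the extra underscore mentioned in the list above follow from how f2c translates FORTRAN calls into C: it appends an underscore to external names, passes every argument by address, and adds the length of each CHARACTER argument as an extra trailing parameter. The following is only an illustrative sketch of that convention, not the routines actually used for the programs above. The subroutine names DATTIM and CPUTIM, the date and time formats, and the local typedefs (stand-ins for the ones in f2c.h) are assumptions.

```c
#include <string.h>
#include <time.h>

/* Stand-ins for typedefs normally taken from f2c.h (assumed; check yours). */
typedef long int ftnlen;
typedef double doublereal;

/* Copy a C string into a blank-padded FORTRAN CHARACTER buffer (no NUL). */
static void to_fortran(char *dst, ftnlen dstlen, const char *src)
{
    size_t n = strlen(src);

    if (dstlen <= 0)
        return;
    if (n > (size_t)dstlen)
        n = (size_t)dstlen;
    memset(dst, ' ', (size_t)dstlen);
    memcpy(dst, src, n);
}

/*
 * Hypothetical FORTRAN usage:
 *       CHARACTER*10 D
 *       CHARACTER*8  T
 *       CALL DATTIM(D, T)
 * The two ftnlen parameters are the hidden CHARACTER lengths.
 */
int dattim_(char *date, char *tim, ftnlen date_len, ftnlen tim_len)
{
    char buf[32];
    time_t now = time(NULL);
    struct tm *lt = localtime(&now);

    if (lt == NULL) {               /* no clock available: return blanks */
        to_fortran(date, date_len, "");
        to_fortran(tim, tim_len, "");
        return 0;
    }
    strftime(buf, sizeof buf, "%Y-%m-%d", lt);
    to_fortran(date, date_len, buf);
    strftime(buf, sizeof buf, "%H:%M:%S", lt);
    to_fortran(tim, tim_len, buf);
    return 0;
}

/*
 * Hypothetical FORTRAN usage:
 *       DOUBLE PRECISION S
 *       CALL CPUTIM(S)
 * Returns the CPU seconds used by the process so far.
 */
int cputim_(doublereal *secs)
{
    *secs = (doublereal)clock() / CLOCKS_PER_SEC;
    return 0;
}
```

The file would be compiled with gcc and linked together with the f2c-translated FORTRAN objects. Both routines are written as subroutines rather than functions so that the sketch does not depend on how f2c handles REAL function return values.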
I have also implemented the NRCVAX program package to run under linux.
This was a little more problematic due to some read/write errors involving
direct access files. I wrote some sample code which reproduced the
error and sent it to the f2c maintainer. Although he informed me that
the FORTRAN was "buggy", he was kind enough to modify the f2c library
source code to accommodate the behaviour which the original source code
needed (see the 10 March 1994 entry in the f2c Change Log file). Not
being a computer genius, I can't say whether the code had a bug or not;
I am just grateful for the help I received in making the program work
properly.
With the NRCVAX program package I had to write some short C functions
to take care of the IAND, IOR and IEOR FORTRAN functions which are
not translated by f2c. This was accomplished by running the nior.f,
niand.f and nieor.f routines through f2c (without gcc) and editing the
resulting C source files to use the C bitwise operators: & (for IAND),
| (for IOR) and ^ (for IEOR). I used f2c because I didn't know any C
at the time I did this and it seemed the simplest way of doing things.
Were I to do it again, I would write them in a more "normal" C
myself, but it works now, so why fix something that's not broken ;-).
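For comparison, hand-written versions of such routines could look roughly like the sketch below. It assumes the f2c conventions of a trailing underscore on external names and arguments passed by address. The function names are guessed from the nior.f, niand.f and nieor.f file names, and the integer typedef is a stand-in for the one in f2c.h.

```c
/* Stand-in for the "integer" typedef in f2c.h (assumed; check yours). */
typedef long int integer;

/* FORTRAN: K = NIAND(I, J)   bitwise AND, replaces IAND */
integer niand_(integer *a, integer *b)
{
    return *a & *b;
}

/* FORTRAN: K = NIOR(I, J)    bitwise inclusive OR, replaces IOR */
integer nior_(integer *a, integer *b)
{
    return *a | *b;
}

/* FORTRAN: K = NIEOR(I, J)   bitwise exclusive OR, replaces IEOR */
integer nieor_(integer *a, integer *b)
{
    return *a ^ *b;
}
```

Because the names begin with N, FORTRAN's implicit typing already treats them as INTEGER functions, so the calling code needs no extra declarations.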
The last thing that needed adjustment to get NRCVAX working properly
was a slight modification in the graphics routine, cxdraw.c . I
contacted the author and maintainer of the fvwm window manager (which
comes with the Slackware distribution of linux). He suggested
a very simple change. Find this line in cxdraw.c:
wm_hints.initial_state = ZoomState;
and change it to:
wm_hints.initial_state = NormalState;
Everything worked well after that, and I had a functioning NRCVAX
package running on my linux machine!
The execution times are comparable to those of DOS implementations of the
same programs compiled with a 32-bit memory manager, so there is no
performance hit on going from DOS to linux. I recently realized that the
f77 script does not pass the compiler optimization switches to gcc, so I
have been comparing unoptimized gcc code with presumably optimized 32-bit
DOS code. With optimization, the linux-based code should do even better.
The only thing I have had a problem with is that the backspace key doesn't
appear to do its job with f2c/gcc compiled code. If I mistype something
in an input dialog, I can't seem to erase it properly. I don't know if
this is a problem with f2c or my ignorance of UNIX in setting up my
keyboard. I would guess the latter, but my typing isn't bad enough for me
to have needed to investigate it further.
Hardware Setup
At various linux ftp sites, there are hardware compatibility lists
available. I suggest you look there for detailed information.
I can say what my hardware is, and that is about it. I have a 486DX-33
with 16 MB RAM; ISA bus; 2 IDE hard drives ( Maxtor 130 MB and Seagate
107MB); one 3.5" floppy; Orchid Fahrenheit 1280+ graphics adapter and
a ViewSonic 15 monitor; and an SMC 16 Elite Ethernet adapter. Given
the limited amount of money available to me, it took me about 2 years
to piece this setup together. It is not what a synthetic chemist would
call a "rational synthesis". It is not the sort of hardware I would
buy now for a linux system. I would get a bigger SCSI disk, a tape
backup, a 17" monitor and a PCI or VESA bus motherboard. The ISA bus
gets sluggish when using graphics programs (especially when rotating
ORTEP drawings). Because crystallography is a graphics-intensive field,
you are much better off using a bus which has a higher throughput than
an ISA bus. There are hardware compatibility lists available at linux
ftp sites. Read them before investing, especially if $$ is tight.
Linux is available on a variety of media, from diskettes to
CD-ROMs, and by anonymous ftp from a number of sites as well
as BBSs. I have always used anonymous ftp. Being in
North America, I have used the following sites:
Site              IP Address     Directory
sunsite.unc.edu   152.2.22.81    pub/Linux
tsx-11.mit.edu    18.172.1.2     pub/linux
Lists of other ftp sites in Europe and Australia are available.
If you are going to do anonymous ftp, the hardest thing about the
whole process is having the patience (and the wherewithal) to
download the 30 or so diskettes of compressed files, assuming
1.44M 3.5" diskettes. The Slackware distribution I used was
28 diskettes (series a, ap, d, n, x, xap, xd plus a bootdisk
and a rootdisk for installation).
Installing Linux
The best place to begin is by reading the Installation HOWTO.
The steps involved in getting linux onto a PC are very clearly
explained in this guide. Briefly, it entails getting linux,
booting your PC from a linux floppy, formatting your hard drive,
making your filesystem(s), and installing linux. I have found
the Slackware distribution easy to install, and I recommend
it to anyone who is new to linux. I would also recommend using
a DOS program called 'fips', which non-destructively alters the
size of DOS partitions. I have used it several times as I wanted
more of my hard drives to be dedicated to linux rather than DOS.
Using fips allows one to avoid reformatting and reloading
DOS partitions.
There are also many "HOWTO" guides for the various aspects of
setting up a linux system, as well as a FAQ. I have found them
very useful. Reading them made what could have been an
opaque, esoteric process more understandable for me. The HOWTO
guides are available from sunsite.unc.edu .
I found that the f77 script included with the Slackware 1.1.2
distribution didn't work when compiling NRCVAX programs, but an
older f77 script from the SLS 1.03 distribution did work. I saved
(fortunately!) the older script and have recently modified it
to pass optimization flags to the C compiler. I also rebuilt
the f2c libraries to get the behaviour that I needed (see NRCVAX
section above). The libraries compile without a problem
under linux, but you need to include the following in the
makefile for libI77:
CFLAGS = -O -DNON_UNIX_STDIO -DPosix_SOURCE -DPad_UDread
It probably wouldn't hurt to get one or two UNIX books as
reference material. I use A Practical Guide to the UNIX
System, 2nd edition, by Mark G. Sobell. I have
also found a book on C useful. (I use Using C by Lee
Atkinson and Mark Atkinson. I don't find it an easy
book, but I have usually found what I am looking for.)
Acknowledgements
There are many people who helped me in my effort to turn my DOS PC
into a linux workstation. I want to thank all of the members of the
linux community who made this all possible and who helped me with
my questions. I would also like to thank the maintainer of the f2c
translator, David Gay, for accommodating my direct-access read/write
problems, and Robert Nation
(author of the fvwm window manager) for helping me get some
of the graphics routines working. I also thank Peter White, who originally
suggested using linux and who has been very helpful with getting NRCVAX
going under linux; Gianluca Cascarano, for advice and suggestions concerning
SIR92; and Frank Warmerdam (from somewhere on the net), who gave me a
C program to pass time and date info to a FORTRAN program.
Thanks also go to the programmers of the various crystallographic software
packages I use for writing such easily portable code, which made my job so
much easier than it could have been.
boyle@laue.chem.ncsu.edu
Last Updated 5 May 1995