MOO-cows Mailing List Archive


Re: crash: checkpointing while file server burps



> This isn't a bug--it's more of an un-robust response to a network problem.

> MediaMOO was in the middle of checkpointing when NSF lost contact with our
> file server 

I hope the National Science Foundation isn't too upset. :-)

> for a few minutes.  This caused a panic.  No panic dump was made.
> We are running LambdaMOO 1.8.0p5 (unmodified) on a Sun Sparcstation/ipc
> running SunOs 4.1.4.  Here's the output from the core file:

[gdb output snipped]

The partial checkpoint has a time of 11:12.  This is from /var/adm/messages:

> Jul 30 11:07:19 microworld vmunix: NFS write failed for server mc: RPC: Timed o
> Jul 30 11:07:19 microworld vmunix: NFS write error 60 on host mc fh ab09a64a a00000 c000000 51ff0100 12800000 c000000 2000000 1800000 
> Jul 30 11:07:19 microworld vmunix: NFS write failed for server mc: RPC: Timed o
> Jul 30 11:07:19 microworld vmunix: NFS write error 60 on host mc fh ab09a64a a00000 c000000 51ff0100 12800000 c000000 2000000 1800000 
> Jul 30 11:07:19 microworld vmunix: NFS write failed for server mc: RPC: Timed o-t

You probably have the NFS volume mounted with the "soft" option.  This
is usually a bad idea:

       soft   If an NFS file operation has a major  time-
              out then report an I/O error to the calling
              program.  The default is to continue retry-
              ing NFS file operations indefinitely.

       hard   If  an NFS file operation has a major time-
              out then report "server not responding"  on
              the  console  and continue retrying indefi-
              nitely.  This is the default.

Very, very few Unix programs deal nicely with errors on writes to disk
files.  In general, Unix programs expect disks to be error-free; that
isn't so hard with modern SCSI and IDE drives, which handle errors
behind your back, but it does pose a problem for networked file
systems.  Hence, you're pretty much stuck with the NFS hard option for
anything significant that involves write operations.  If you're going
to run binaries off the drive, you also need hard set; think of what
will happen when the system tries to page in part of the text segment
and fails....
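
To illustrate what dealing with write errors "nicely" might look like,
here is a minimal sketch in C.  It is not LambdaMOO's checkpoint code;
the function name save_checked and the ".tmp" suffix are made up for
the example.  The idea is to check every stdio call, including
fclose() (where an NFS client may only then report a failed write),
and to rename the new file over the old one only if everything
succeeded, so that a failure leaves the previous file untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Writes len bytes to "<path>.tmp", checking every step, and renames
 * the result over path only if all steps succeeded.
 * Returns 0 on success, -1 on any failure. */
int save_checked(const char *path, const char *data, size_t len)
{
    char tmp[1024];
    FILE *f;
    int n;

    assert(path != NULL && data != NULL);

    /* Build the temporary file name; fail if it would not fit. */
    n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    f = fopen(tmp, "w");
    if (f == NULL) {
        fprintf(stderr, "open %s: %s\n", tmp, strerror(errno));
        return -1;
    }

    /* A short write or a failed flush means the data is not all there. */
    if (fwrite(data, 1, len, f) != len || fflush(f) != 0) {
        fprintf(stderr, "write %s: %s\n", tmp, strerror(errno));
        fclose(f);
        remove(tmp);
        return -1;
    }

    /* Errors can be deferred until close, so check this too. */
    if (fclose(f) != 0) {
        fprintf(stderr, "close %s: %s\n", tmp, strerror(errno));
        remove(tmp);
        return -1;
    }

    /* Only now replace the old file with the complete new one. */
    if (rename(tmp, path) != 0) {
        fprintf(stderr, "rename %s: %s\n", tmp, strerror(errno));
        remove(tmp);
        return -1;
    }
    return 0;
}
```

A caller would treat a -1 return as "this checkpoint failed, keep
running and try again later" rather than panicking.  Even so, this
only helps with errors the program is told about; it does nothing for
the paging case above.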

Jay Carlson
nop@nop.com    nop@ccs.neu.edu    nop@kagoona.mitre.org

Flat text is just *never* what you want.   ---stephen p spackman

