MOO-cows Mailing List Archive


Re: Signal 11 Panics

>>>>> "Martian" == Martian  <> writes:

    Martian> Every once in a while, without any pattern we've seen,
    Martian> MidgardMOO will panic on signal 11.  Twice it's even
    Martian> happened DURING a checkpoint (once causing a loss of both
    Martian> DBs)

    Martian> It's a dedicated server with no one logged into the
    Martian> server itself at the time.

    Martian> First question: What is signal 11?  Second question: What
    Martian> might be causing it?  Third question: How can we stop it
    Martian> panicking the MOO?

Signal 11 is SIGSEGV, a segmentation violation: the process touched
memory it is not allowed to access.  The Linux 2.0.0 docs mention that
you can get a signal 11 when doing kernel builds, and that it usually
means a memory hardware problem, such as a flaky RAM module or some
such thing.
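
If you want to confirm what signal 11 is called on your own box, and
see how a parent process can tell that a child died from it, here is a
minimal sketch in Python.  The helper names (signal_name,
run_and_report) and the deliberately crashing child are my own
illustration, not anything from the MOO server, and it assumes a
POSIX-style system.

```python
import signal
import subprocess
import sys


def signal_name(number):
    """Return the symbolic name for a signal number on this platform."""
    return signal.Signals(number).name


def run_and_report(argv):
    """Run a command; return the name of the signal that killed it, or None."""
    result = subprocess.run(argv)
    # On POSIX, subprocess reports death-by-signal as a negative return code.
    if result.returncode < 0:
        return signal_name(-result.returncode)
    return None


if __name__ == "__main__":
    print(signal_name(11))
    # Hypothetical stand-in for a crashing server: a child process that
    # sends SIGSEGV to itself.
    crasher = [sys.executable, "-c",
               "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_and_report(crasher))
```

On Linux both lines should name SIGSEGV.  Note that this only tells
you which signal killed the process, not whether the cause was a
software bug or flaky hardware.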

Since most modern cheap PCs no longer have memory parity, there is no
good way to detect when this foolishness occurs except a signal 11.
This unfortunately does not give you the physical address of the
error.  Not that it would help nowadays: what with the complicated DRAM
controllers, you never know where a particular address is located in
the actual hardware anymore :-(

I just hope they have parity on the machine that keeps my bank account.

So if you are running Linux 2.0.0, there is a distinct possibility
that you have a bad memory module, and this is just how it shows up.

If you are running on a *REAL* computer, like an HP-9000, or a Sun
Sparc, or a Silly-G, then you might have some other problem (then
again, it might *BE* memory...).

