At Wit's End on system freezes
Gert Doering (gert@greenie.muc.de)
Tue, 24 Nov 1998 20:29:29 +0100
Hi,
On Tue, Nov 24, 1998 at 04:18:40PM +0000, Frank da Cruz wrote:
> To isolate the problem, you must separate Kermit and mgetty. My long
> experience has told me that bidirectional terminal ports on Unix rarely
> work as desired, and when problems such as this occur, they can almost
> always be attributed to the bidirectionality. Remove that and the problems
> go away.
I strongly disagree.
I *wrote* mgetty to make bidirectional port usage work robustly and
reliably, and I can say that I succeeded. My SCO OpenServer 3.0 system
has served three modems, bidirectionally, for about 5 years now, without a
single problem that wasn't caused by bad hardware. Some of our systems at
work have really busy lines (about 200 outgoing faxes and 100 incoming
data calls on a given modem per day, for a total of > 1000 faxes a day),
and I don't see any system locks there either.
[OTOH, to nail down the problem, it might be a good idea to run the ports
unidirectionally for a couple of days and see what will happen.]
What is known to cause problems like the ones observed:
- compiling Linux 2.0.x kernels with a C compiler other than gcc 2.7.2.3
(egcs for sure breaks 2.0 kernels [because kernel ASM and egcs ASM just
don't work together, I'm not blaming anyone, just stating facts]).
It's *unlikely* that this is the reason here, because a miscompiled
kernel wouldn't recover when you do anything to the modem.
- bad modem cabling - if you have noise on the cable, especially the
RTS/CTS lines, and weak signals, the serial port might generate too
many IRQs and things might freeze. This is more likely, as it
will be influenced by switching off the modem.
Actually, I *don't think* this is the cause. Why? The Rocketport
card does not *use* an IRQ [as far as I know] and thus the system
itself shouldn't be affected by RTS/CTS noise at all. "Bad things" on
the modem lines could freeze the serial ports, but not the host
computer.
- bad modems. I'm not really sure how that could lead to a system
freeze, but I've had bad experiences with Zoom modems in the past. If you
can, borrow a handful of USR Courier modems somewhere, and see whether
that changes things. From my experience, USR Couriers are absolutely
perfect for 24x7 mostly-data operations.
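[To test the "too many IRQs" theory from the cabling item above, one could
compare two snapshots of /proc/interrupts taken a few seconds apart and see
which interrupt line grew the most. What follows is only a minimal Python
sketch. It assumes the usual layout of that file (a header line naming the
CPUs, then one "label: count ... description" line per interrupt). The
function names are made up for illustration and the layout may differ on
older kernels.]

```python
def parse_interrupts(text):
    """Return {irq_label: total_count}, summed over all CPU columns.

    Assumes the first line is a header with one column name per CPU.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header looks like: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an "label: ..." line
        nums = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the description part of the line
            nums.append(int(field))
        if nums:
            counts[label.strip()] = sum(nums)
    return counts


def busiest(before, after, top=3):
    """Return the `top` interrupt labels with the largest count increase."""
    deltas = {label: after[label] - before.get(label, 0) for label in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

[Read /proc/interrupts once, wait a few seconds, read it again, and pass both
parsed results to busiest(). A serial interrupt that gains tens of thousands
of counts while the line is idle would point at noise on the cable.]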
[..]
> : out. I obtained the latest and greatest RocketPort driver. The
> : behavior stayed the same. I played with the initialization strings in
> : mgetty and the behavior stayed the same.
> :
> This is a critical area. There might very well be a setting that makes
> the problem go away, but you didn't hit upon it in your experimentation.
> For example, did you try all possible DSR-behavior selections? (&S0,
> &S1, &S2, ...)
DSR should be set to "on at all times". More interesting is DTR sensitivity.
I usually use AT&D2 or &D3, but some modems don't like AT&D3 and lock up.
[..]
> : In the middle of all this, I
> : discovered that if I turned on and off the offending modem my system
> : "unfroze" without rebooting. This saved a lot of time and frustration
> : for all concerned.
> :
> Because it resets the modem to its factory or saved state, which agrees
> with what mgetty and the port drivers need.
The port driver shouldn't care about the state the modem is in.
Mgetty cares, but all it will do if the modem is hosed is complain
into the log files (possibly filling up your disk). It will never "freeze
the system" - this isn't possible for a user-mode program.
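[Since a hosed modem can make the log files grow until the disk is full, it
may be worth watching them. This is a minimal Python sketch, not part of
mgetty. The function name and the size limit are made up for illustration,
and the log file path depends on how your mgetty was built, so you have to
supply it yourself.]

```python
import os
import shutil


def log_report(path, max_bytes):
    """Report the size of one log file and the free space on its filesystem."""
    size = os.path.getsize(path)
    usage = shutil.disk_usage(os.path.dirname(os.path.abspath(path)))
    return {
        "size": size,                    # current log size in bytes
        "oversized": size > max_bytes,   # True if the log exceeds the limit
        "free_bytes": usage.free,        # space left on that filesystem
    }
```

[Running this from cron against each port's log file and mailing yourself
when "oversized" is true would flag a misbehaving modem long before the disk
fills up.]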
[..]
> : Sometimes when the script is restarted and the modem is then accessed
> : the system freezes, as if the modem is already being used but no lock
> : file exists.
> :
> When you say "the system freezes", do you mean the entire system, or do you
> mean the process that is trying to open the modem? If you mean the whole
> system, then there is a serious problem in the system itself, since no
> user program should be able to freeze Linux.
Yes, exactly. You're voicing my question :-) -- I could easily imagine a
frozen process trying to access the serial port, but hardly a frozen
system (especially not a "frozen system that unfreezes if the modem is
switched off").
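[To find out whether it is just a process hanging on the port, check the
lock file before and after the "freeze". The Python sketch below assumes
UUCP-style lock files that hold the owner's PID as ASCII decimal text. Some
setups write the PID in binary instead, which this sketch reports as
"unreadable". The function names are made up for illustration and the lock
file path is whatever your system uses.]

```python
import os


def read_lock_pid(lock_path):
    """Return the PID stored in an ASCII lock file, or None if unusable."""
    try:
        with open(lock_path, "r", encoding="ascii", errors="replace") as fh:
            text = fh.read().strip()
    except FileNotFoundError:
        return None
    if text.isdigit() and int(text) > 0:
        return int(text)
    return None


def pid_alive(pid):
    """Probe a process with signal 0, which checks existence only."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def lock_status(lock_path):
    """Classify a lock file as 'free', 'held', 'stale' or 'unreadable'."""
    if not os.path.exists(lock_path):
        return "free"
    pid = read_lock_pid(lock_path)
    if pid is None:
        return "unreadable"
    return "held" if pid_alive(pid) else "stale"
```

[If this says "free" while the port is unusable, some process has the device
open without locking it. If you can still run the check at all, the system
itself is obviously not frozen.]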
[..]
> So again, my first recommendation would be to separate the inbound and
> outbound modems. Configure the inbound modems for answering calls, and let
> Kermit handle the outbound modems. Pay very careful attention to the modem
> signal configurations on the modems (&Sn, &Cn, &Dn, etc), which MUST agree
> in every respect with what your port drivers require.
Actually, this is a good first step to nail down the problem: it will show
whether it is caused by inbound calls, by outbound calls, or by the
combination of both.
gert
--
USENET is *not* the non-clickable part of WWW!
//www.muc.de/~gert/
Gert Doering - Munich, Germany gert@greenie.muc.de
fax: +49-89-35655025 gert.doering@physik.tu-muenchen.de