From: Joel Becker <Joel.Becker@oracle.com>
To: lkml <linux-kernel@vger.kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
Marcelo Tosatti <marcelo@conectiva.com.br>,
Wim Coekaerts <Wim.Coekaerts@oracle.com>
Subject: [RFC] hangcheck-timer module
Date: Thu, 21 Nov 2002 12:17:11 -0800
Message-ID: <20021121201711.GG770@nic1-pc.us.oracle.com>
Folks,
Attached is a module, hangcheck-timer. It is used to detect
when the system goes out to lunch for a period of time, such as when a
driver like qla2x00 udelays a bunch.
The module sets a timer. When the timer goes off, it then uses
the TSC (warning: portability needed) to determine how much real time
has passed.
On a normal system, the real elapsed time will be almost
identical to the expected timer duration. However, if a driver udelays
for 60 seconds (or the system stalls for some other reason), the module
takes notice. If the overrun exceeds the configured margin of error, the
machine is rebooted.
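To make the check concrete, here is a minimal user-space sketch in C of
the comparison described above. It is not the attached module's code:
the names hang_detected and seconds_to_ticks, the ticks_per_sec
parameter, and the exact rule "elapsed > tick + margin" are assumptions
for illustration. A real module would read the TSC itself and would have
to know its rate, rather than take both as arguments.

```c
#include <stdint.h>

/* Hypothetical helper: convert seconds to counter ticks at a given rate. */
static uint64_t seconds_to_ticks(uint64_t seconds, uint64_t ticks_per_sec)
{
    return seconds * ticks_per_sec;
}

/*
 * Hypothetical check, run when the timer fires.
 *
 * start_ticks   - counter value sampled when the timer was armed
 * now_ticks     - counter value sampled when the timer fired
 * tick_secs     - the timer period that was asked for
 * margin_secs   - how much overrun is tolerated
 * ticks_per_sec - assumed counter rate (must be known by the caller)
 *
 * Returns 1 if real elapsed time exceeds tick + margin, 0 otherwise.
 */
static int hang_detected(uint64_t start_ticks, uint64_t now_ticks,
                         uint64_t tick_secs, uint64_t margin_secs,
                         uint64_t ticks_per_sec)
{
    /* Unsigned subtraction stays correct if the counter wrapped once. */
    uint64_t elapsed = now_ticks - start_ticks;
    uint64_t limit = seconds_to_ticks(tick_secs + margin_secs, ticks_per_sec);

    return elapsed > limit;
}
```

With made-up numbers (a 1 GHz counter, a 30 second tick, a 180 second
margin), a timer that fires 30 seconds after it was armed gives 0, and
one that fires 211 seconds after it was armed gives 1, which is the case
where the module would reboot the machine.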
The module is currently used in a cluster environment. After
some time out to lunch, the rest of the cluster will have given up on a
machine. If the machine suddenly comes back and assumes it is still
"live", bad things can happen.
We can also see use for this in a debugging sense, for kernel
hangs as well as driver code. That's why I'm proposing it for general
inclusion.
Comments? Thoughts?
Joel
Building:
The module should happily build against most 2.4 kernels. The
usual module building compile line:
gcc -I /scratch/jlbec/kernel/linux-2.4.20-rc2/include \
-DMODULE -D__KERNEL__ -DLINUX -c -o hangcheck-timer.o \
hangcheck-timer.c
Running:
Load the module with insmod. There are two options.
"hangcheck_tick=<seconds>" specifies the timer timeout, and
"hangcheck_margin=<seconds>" specifies the margin of error.
Joel
--
"Friends may come and go, but enemies accumulate."
- Thomas Jones
Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127