From: Russell Miller <rmiller@duskglow.com>
To: linux-kernel@vger.kernel.org
Subject: subsystem crashes reboot system?
Date: Wed, 2 Apr 2003 11:49:36 -0600
Message-ID: <200304021149.36511.rmiller@duskglow.com>

Hi,

I have a feature request.  I'm willing to hack away at it myself, but I want to 
know whether there's any way of doing what I want, or whether there's a good 
technical reason why it would be impossible.

As I mentioned earlier, we had an ext3 subsystem crash, and a helpful person 
was nice enough to tell me that upgrading the kernel would fix it.  All well and 
good.  But this crash left the system in a semi-functional state.  The 
networking stack was up and running, the kernel was running, but the 
filesystem was not functional and because of this the kernel was in a nearly 
unusable state.  Because the system was pingable, most TCP-stack-level 
detectors would not have been able to tell that something serious was wrong.  
The machine (our main production machine that serves millions of hits a week) 
was down for three hours.

Since this was an assertion that failed, one would think that bringing the 
system down automatically in an orderly - then, if that fails, disorderly - 
fashion would be possible.  In particular, I would like it to behave 
similarly to the panic sysctl.  If a subsystem crashes, reboot the 
machine, because the system is essentially worthless in that state.  I 
realize that this behavior isn't required for everyone, so a sysctl 
(panic_on_subsys_crash maybe) would be sufficient.
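
For reference, the existing knob I have in mind is kernel.panic, which is 
exposed as /proc/sys/kernel/panic and holds the number of seconds the kernel 
waits before rebooting after a panic (0 means it does not reboot).  A minimal 
sketch of reading it from user space follows; the function name is my own, 
and panic_on_subsys_crash remains only a proposal, not an existing sysctl.

```python
from pathlib import Path

# Existing sysctl: seconds to wait before rebooting after a panic (0 = never).
PANIC_SYSCTL = Path("/proc/sys/kernel/panic")


def read_panic_timeout(path=PANIC_SYSCTL):
    """Return the configured panic timeout in seconds, or None if unreadable.

    The helper name is hypothetical; only the /proc path is a real interface.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not an integer.
        return None


if __name__ == "__main__":
    print("kernel.panic =", read_panic_timeout())
```

The proposed sysctl would presumably sit next to this one and be read the 
same way.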

Since the machine was in a semi-usable state, one might ask why we didn't just 
have an automated process in place.  Two reasons:  a subsystem crashing 
happens rarely enough that I didn't see any reason to put the effort into it 
until now, and when the system is in a state like that it is impossible to 
tell what will work and what will not.  For example, when we did the three 
finger salute, the system would not go down all the way because one of the 
user space programs made an I/O call to the crashed filesystem.
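
To make the monitoring problem concrete, here is a rough sketch of the kind 
of user-space check that ping would not give us: it does real I/O on the 
filesystem and treats a hang as a failure.  All names here are made up for 
illustration.  The probe runs in a child process because a process blocked 
on a dead filesystem may be unkillable, so the parent only waits for a 
bounded time and never blocks on the child afterwards.

```python
import subprocess
import sys
import tempfile

# Script run in the child: create, write, fsync and remove a small file.
_PROBE = """
import os, sys, tempfile
fd, name = tempfile.mkstemp(dir=sys.argv[1], prefix=".fsprobe-")
try:
    os.write(fd, b"probe")
    os.fsync(fd)
finally:
    os.close(fd)
    os.unlink(name)
"""


def filesystem_responds(directory, timeout=5.0):
    """Return True if a write+fsync in `directory` finishes within `timeout`."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _PROBE, directory],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck in uninterruptible I/O may not die,
        # so we deliberately do not wait for it again.
        proc.kill()
        return False


if __name__ == "__main__":
    print(filesystem_responds(tempfile.gettempdir()))
```

Even so, this only detects the problem.  What to do next is the hard part, 
since whatever runs the check may itself depend on the broken filesystem.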

In order of helpfulness, please tell me (only one of the following is more 
than enough):
- whether I can do this using the existing sysctl mechanism
- whether there is a patch available (or coming available) to do this
- whether there is a technical reason for me not to do this
- what would be a good place in the code to begin applying a patch.

Please CC me with any replies as I am not on the list.

Thanks.

--Russell
