From: Fabio M. Di Nitto <fdinitto@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] fencing conditions: what should trigger a fencing operation?
Date: Fri, 20 Nov 2009 08:26:57 +0100
Message-ID: <4B0644C1.2020804@redhat.com>
In-Reply-To: <20091119194952.GE23287@redhat.com>

David Teigland wrote:
> On Thu, Nov 19, 2009 at 07:10:54PM +0100, Fabio M. Di Nitto wrote:
>
> The error is detected in gfs.  For every error in every bit of code, the
> developer needs to consider what the appropriate error handling should be:
> What are the consequences (with respect to availability and data
> integrity), both locally and remotely, of the error handling they choose?
> It's case by case.
> 
> If the error could lead to data corruption, then the proper error handling
> is usually to fail fast and hard.

Of course, agreed.

> 
> If the error can result in remote nodes being blocked, then the proper
> error handling is usually self-sacrifice to avoid blocking other nodes.

OK, so this is the case we are seeing here: the cluster is half blocked,
but no self-sacrifice action is happening.

> 
> Self-sacrifice means forcibly removing the local node from the cluster so
> that others can recover for it and move on.  There are different ways of
> doing self-sacrifice:
> 
> - panic the local machine (kernel code usually uses this method)
> - killing corosync on the local machine (daemons usually do this)
> - calling reboot (I think rgmanager has used this method)

I don't really have an opinion on how it happens, as long as it works.

> 
>> panic_on_oops is not cluster specific and not all OOPS are panic == not
>> a clean solution.
> 
> So you want gfs oopses to result in a panic, and non-gfs oopses to *not*
> result in a panic? 

Well, partially, yes.

We can't make decisions for OOPSes that are not generated within our
code. The user will have to configure that via panic_on_oops or other
means. Maybe our task is to make sure users are aware of this
situation/option (I didn't check whether it is documented).
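
As a rough illustration of making users aware of the setting, a tool could
report the current state of panic_on_oops. This is only a sketch, not
something that exists in our code: the function name is made up, and it
assumes the usual Linux procfs location of the kernel.panic_on_oops sysctl.

```python
from pathlib import Path

# Usual procfs location of the kernel.panic_on_oops sysctl on Linux.
PANIC_ON_OOPS = Path("/proc/sys/kernel/panic_on_oops")


def panic_on_oops_enabled(path=PANIC_ON_OOPS):
    """Hypothetical helper: True/False for the sysctl, None if unreadable."""
    try:
        # The file holds a single integer; any non-zero value means "panic".
        return int(Path(path).read_text().strip()) != 0
    except (OSError, ValueError):
        # Missing file (non-Linux), permission problem, or unexpected content.
        return None


if __name__ == "__main__":
    state = panic_on_oops_enabled()
    if state is None:
        print("panic_on_oops: could not be determined")
    elif state:
        print("panic_on_oops: enabled, every oops will panic the node")
    else:
        print("panic_on_oops: disabled, an oops will not panic the node")
```

A startup script or a documentation check could print a warning like this
and leave the actual choice to the administrator.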

You have a point in saying that it varies from error to error, and this
is exactly where I'd like to head. Maybe it's time to review our error
paths and make better decisions about what to do, at least within our code.
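
To make the idea of such a review concrete, here is a small sketch of the
two rules of thumb quoted above (possible data corruption: fail fast and
hard; possible blocking of remote nodes: self-sacrifice). All names are
hypothetical and the real decision remains case by case; this only shows
the default ordering.

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical outcomes of an error-path review."""
    FAIL_FAST = auto()        # stop immediately to protect data integrity
    SELF_SACRIFICE = auto()   # remove the local node so others can recover
    HANDLE_LOCALLY = auto()   # ordinary local error handling is enough


def choose_action(may_corrupt_data, may_block_remote_nodes):
    """Pick a default action from the consequences of an error."""
    # Data integrity comes first: corruption risk overrides everything else.
    if may_corrupt_data:
        return Action.FAIL_FAST
    # Otherwise, do not let a local problem stall the rest of the cluster.
    if may_block_remote_nodes:
        return Action.SELF_SACRIFICE
    return Action.HANDLE_LOCALLY
```

For the half-blocked cluster discussed here,
choose_action(False, True) gives Action.SELF_SACRIFICE.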

> There's probably a combination of options that would
> produce this effect.  Most people interested in HA will want all oopses to
> result in a panic and recovery since an oops puts a node in a precarious
> position regardless of where it came from.

I agree, but I don't think we can kill the node on every OOPS by
default. We can agree that this has to be a user-configurable choice, but
we can improve our stuff to do the right thing (or do better what it does
now).

Fabio


