From mboxrd@z Thu Jan  1 00:00:00 1970
From: Lon Hohberger <lhh@redhat.com>
Date: Tue, 09 Oct 2012 13:14:39 -0400
Subject: [Cluster-devel] fence daemon problems
In-Reply-To: <24E144B8C0207547AD09C467A8259F75576A15FF@lisa.maurer-it.com>
References: <24E144B8C0207547AD09C467A8259F755768AE73@lisa.maurer-it.com>
	<24E144B8C0207547AD09C467A8259F755769CF56@lisa.maurer-it.com>
	<20121003144614.GB12614@redhat.com>
	<24E144B8C0207547AD09C467A8259F75576A155B@lisa.maurer-it.com>
	<20121003162411.GC12614@redhat.com>
	<24E144B8C0207547AD09C467A8259F75576A15CB@lisa.maurer-it.com>
	<20121003164433.GD12614@redhat.com>
	<24E144B8C0207547AD09C467A8259F75576A15FF@lisa.maurer-it.com>
Message-ID: <50745B7F.7030404@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On 10/03/2012 12:55 PM, Dietmar Maurer wrote:
>> The intention of that is to prevent an inquorate node/partition from killing a
>> quorate group of nodes that are running normally.  e.g. if a 5 node cluster is
>> partitioned into 2/3 or 1/4.  You don't want the 2 or 1 node group to fence
>> the 3 or 4 nodes that are fine.
>
> sure, I understand that.
>
>> The difficult cases, which I think you're seeing, are partitions where no group
>> has quorum, e.g. 2/2.  In this case we do nothing, and the user has to resolve
>> it by resetting some of the nodes.
>
> The problem with that is that those 'difficult' cases are very likely. For example
> a switch reboot results in that state if you do not have redundant network (yes,
> I know that this setup is simply wrong).
>
> And things get worse, because it is not possible to reboot such nodes, because
> rgmanager shutdown simply hangs. Is there any way to avoid that, so that it is at
> least possible to reboot those nodes?
>

Can you kill rgmanager and/or use 'reboot -fn'?

I thought rebooting an inquorate node worked; if it doesn't, please file a
bugzilla.
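
For reference, the partition cases quoted above (2/3, 1/4, 2/2) come down to
a simple majority check. Here is a rough sketch of that arithmetic, assuming
one vote per node and no quorum disk; the function names are made up for
illustration and are not taken from fenced or cman:

```python
def has_quorum(partition_size: int, total_nodes: int) -> bool:
    """Return True if a partition holds a strict majority of the votes.

    Assumes one vote per node and no quorum disk (illustrative only).
    """
    return partition_size > total_nodes // 2


def may_fence(partition_size: int, total_nodes: int) -> bool:
    """Only a quorate partition is allowed to fence the other side."""
    return has_quorum(partition_size, total_nodes)


if __name__ == "__main__":
    # (total nodes, sizes of the two partitions) from the quoted examples
    cases = [(5, (2, 3)), (5, (1, 4)), (4, (2, 2))]
    for total, parts in cases:
        for size in parts:
            state = "quorate, may fence" if may_fence(size, total) else "inquorate, does nothing"
            print(f"{total}-node cluster, partition of {size}: {state}")
```

In the 2/2 case neither side passes the check, so nobody fences and the
nodes have to be reset by hand, which is the situation described above.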

-- Lon