From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lon Hohberger
Date: Tue, 09 Oct 2012 13:14:39 -0400
Subject: [Cluster-devel] fence daemon problems
In-Reply-To: <24E144B8C0207547AD09C467A8259F75576A15FF@lisa.maurer-it.com>
References: <24E144B8C0207547AD09C467A8259F755768AE73@lisa.maurer-it.com>
 <24E144B8C0207547AD09C467A8259F755769CF56@lisa.maurer-it.com>
 <20121003144614.GB12614@redhat.com>
 <24E144B8C0207547AD09C467A8259F75576A155B@lisa.maurer-it.com>
 <20121003162411.GC12614@redhat.com>
 <24E144B8C0207547AD09C467A8259F75576A15CB@lisa.maurer-it.com>
 <20121003164433.GD12614@redhat.com>
 <24E144B8C0207547AD09C467A8259F75576A15FF@lisa.maurer-it.com>
Message-ID: <50745B7F.7030404@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On 10/03/2012 12:55 PM, Dietmar Maurer wrote:
>> The intention of that is to prevent an inquorate node/partition from killing a
>> quorate group of nodes that are running normally. e.g. if a 5 node cluster is
>> partitioned into 2/3 or 1/4. You don't want the 2 or 1 node group to fence
>> the 3 or 4 nodes that are fine.
>
> sure, I understand that.
>
>> The difficult cases, which I think you're seeing, are partitions where no group
>> has quorum, e.g. 2/2. In this case we do nothing, and the user has to resolve
>> it by resetting some of the nodes
>
> The problem with that is that those 'difficult' cases are very likely. For example
> a switch reboot results in that state if you do not have redundant network (yes,
> I know that this setup is simply wrong).
>
> And things get worse, because it is not possible to reboot such nodes, because
> rgmanager shutdown simply hangs. Is there any way to avoid that, so that it is at
> least possible to reboot those nodes?
>

Kill rgmanager and/or 'reboot -fn' ?

I thought inquorate reboots worked - please file a bugzilla.

-- Lon
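
For reference, the quorum rule discussed in the quoted text (the 2/3, 1/4 and 2/2 splits) can be illustrated with a small sketch. It assumes one vote per node and a strict-majority quorum; the function names `has_quorum` and `describe_split` are hypothetical and are not part of fenced, cman, or rgmanager.

```python
from typing import List, Tuple


def has_quorum(partition_size: int, total_nodes: int) -> bool:
    """Return True if the partition holds a strict majority of the votes.

    Assumption: every node carries exactly one vote.
    """
    return 2 * partition_size > total_nodes


def describe_split(sizes: List[int]) -> List[Tuple[int, bool]]:
    """For each partition size, report whether that partition is quorate.

    Only a quorate partition should be allowed to fence the others.
    """
    total = sum(sizes)
    return [(size, has_quorum(size, total)) for size in sizes]


if __name__ == "__main__":
    # 5-node cluster split 2/3: only the 3-node side may fence.
    print(describe_split([2, 3]))
    # 5-node cluster split 1/4: only the 4-node side may fence.
    print(describe_split([1, 4]))
    # 4-node cluster split 2/2: nobody has quorum, so nothing is fenced.
    print(describe_split([2, 2]))
```

In the 2/2 case neither side is quorate, which is the "difficult" situation described above: no partition fences the other, and an administrator has to reset nodes by hand.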