From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Teigland <teigland@redhat.com>
Date: Mon, 17 Dec 2018 10:46:58 -0600
Subject: [Cluster-devel] [GFS2 PATCH] gfs2: Panic when an io error
 occurs writing to the journal
In-Reply-To: <1890286629.55916662.1545058727858.JavaMail.zimbra@redhat.com>
References: <1033351102.55836224.1545054857301.JavaMail.zimbra@redhat.com>
	<90e95a6b-5893-d26e-95d4-e73680e0326b@citrix.com>
	<1bdf5580-76ca-c2b7-5d2f-8d780b15a06e@redhat.com>
	<1890286629.55916662.1545058727858.JavaMail.zimbra@redhat.com>
Message-ID: <20181217164658.GA13933@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Mon, Dec 17, 2018 at 09:58:47AM -0500, Bob Peterson wrote:
> Dave Teigland recommended. Unless I'm mistaken, Dave has said that GFS2
> should never withdraw; it should always just kernel panic (Dave, correct
> me if I'm wrong). At least this patch confines that behavior to a small
> subset of withdraws.

The basic idea is that you want to get a malfunctioning node out of the
way as quickly as possible so others can recover and carry on.  Escalating
a partial failure into a total node failure is the best way to do that in
this case.  Specialized recovery paths run from a partially failed node
won't be as reliable, and are prone to blocking all the nodes.

I think a reasonable alternative to this is to just sit in an infinite
retry loop until the i/o succeeds.
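
A minimal userspace sketch of that retry-loop alternative, for illustration
only.  It is not GFS2 code: write_until_success(), journal_write_fn,
struct flaky_dev and flaky_write() are hypothetical names, and the
"device" is a stub that fails a fixed number of times before succeeding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical write callback: returns 0 on success, nonzero on I/O error. */
typedef int (*journal_write_fn)(void *ctx, const void *buf, size_t len);

/*
 * Keep reissuing the same write until it succeeds.  There is no error
 * return: the caller simply does not make progress while the error
 * persists.  Returns the number of attempts that were needed.
 * A real implementation would presumably sleep between attempts;
 * that is left out here to keep the sketch portable.
 */
static unsigned long write_until_success(journal_write_fn write_fn, void *ctx,
                                         const void *buf, size_t len)
{
    unsigned long attempts = 0;

    assert(write_fn != NULL);

    for (;;) {
        attempts++;
        if (write_fn(ctx, buf, len) == 0)
            return attempts;
        /* I/O error: fall through and try the same write again. */
    }
}

/* Stub device for demonstration: fails 'failures_left' times, then works. */
struct flaky_dev {
    int failures_left;
    int writes_done;
};

static int flaky_write(void *ctx, const void *buf, size_t len)
{
    struct flaky_dev *dev = ctx;

    (void)buf;
    (void)len;

    if (dev->failures_left > 0) {
        dev->failures_left--;
        return -1;              /* simulated I/O error */
    }
    dev->writes_done++;
    return 0;
}
```

With a stub that fails three times, write_until_success(flaky_write, &dev,
"x", 1) returns on the fourth attempt.  If the error never clears, the call
never returns, which is the point of the approach: the write is never
reported as failed.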

Dave