From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Teigland <teigland@redhat.com>
Date: Mon, 9 Jan 2012 12:00:40 -0500
Subject: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery
	coordination
In-Reply-To: <20120109164626.GA9956@redhat.com>
References: <1325782010-8060-6-git-send-email-teigland@redhat.com>
	<1326126990.2690.43.camel@menhir>
	<20120109164626.GA9956@redhat.com>
Message-ID: <20120109170040.GB9956@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Mon, Jan 09, 2012 at 11:46:26AM -0500, David Teigland wrote:
> On Mon, Jan 09, 2012 at 04:36:30PM +0000, Steven Whitehouse wrote:
> > On Thu, 2012-01-05 at 10:46 -0600, David Teigland wrote:
> > > This new method of managing recovery is an alternative to
> > > the previous approach of using the userland gfs_controld.
> > > 
> > > - use dlm slot numbers to assign journal id's
> > > - use dlm recovery callbacks to initiate journal recovery
> > > - use a dlm lock to determine the first node to mount fs
> > > - use a dlm lock to track journals that need recovery
> > 
> > I've just been looking at this again, and a question springs to mind...
> > how does this deal with nodes which are read-only or spectator mounts?
> > In the old system we used to propagate that information to gfs_controld
> > but I've not spotted anything similar in the patch so far, so I'm
> > wondering whether it needs to know that information or not,
> 
> The dlm allocates a "slot" for all lockspace members, so spectator mounts
> (like readonly mounts) would be given a slot/jid.  In gfs_controld,
> spectator mounts are not given a jid (that came from the time when
> adding a journal required extending the device+fs.)  These days, there's
> probably no meaningful difference between spectator and readonly mounts.

There's one other part, and that's whether a readonly or spectator node
should attempt to recover the journal of a failed node.  In cluster3 this
decision was always a bit mixed up, with some logic in gfs_controld and
some in gfs2.

We should make a clear decision now and include it in this patch.
I think gfs2_recover_func() should return GAVEUP right at the start
in every case where we don't want the node doing recovery.  Which
cases would you prefer?
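
To make the shape of that early return concrete, here is a rough standalone
sketch.  It is not the patch code: the struct names, the RD_* values, the
policy flags and replay_journal_stub() are all made up for illustration, and
which mount types should give up is left as a parameter because that is the
open question above.

```c
#include <stdbool.h>

/* Hypothetical result codes standing in for the patch's GAVEUP/success values. */
enum recovery_result { RD_SUCCESS, RD_GAVEUP };

/* Hypothetical summary of how the local node mounted the fs. */
struct mount_state {
	bool spectator;
	bool readonly;
};

/* Hypothetical policy knobs: which mount types decline to do recovery. */
struct recovery_policy {
	bool spectator_gives_up;
	bool readonly_gives_up;
};

/* Placeholder for the real journal replay; always reports success here. */
static enum recovery_result replay_journal_stub(unsigned int jid)
{
	(void)jid;
	return RD_SUCCESS;
}

/*
 * Sketch of the proposed structure: check the "don't recover" cases
 * first and give up before touching the journal; otherwise replay it.
 */
static enum recovery_result recover_journal_sketch(const struct mount_state *ms,
						   const struct recovery_policy *pol,
						   unsigned int jid)
{
	if (ms->spectator && pol->spectator_gives_up)
		return RD_GAVEUP;
	if (ms->readonly && pol->readonly_gives_up)
		return RD_GAVEUP;

	return replay_journal_stub(jid);
}
```

With the checks at the top, the decision lives in one place in gfs2 rather
than being split between gfs2 and userland, and a node that gives up leaves
the journal for some other node to recover.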