From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Teigland
Date: Mon, 9 Jan 2012 12:00:40 -0500
Subject: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination
In-Reply-To: <20120109164626.GA9956@redhat.com>
References: <1325782010-8060-6-git-send-email-teigland@redhat.com> <1326126990.2690.43.camel@menhir> <20120109164626.GA9956@redhat.com>
Message-ID: <20120109170040.GB9956@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Mon, Jan 09, 2012 at 11:46:26AM -0500, David Teigland wrote:
> On Mon, Jan 09, 2012 at 04:36:30PM +0000, Steven Whitehouse wrote:
> > On Thu, 2012-01-05 at 10:46 -0600, David Teigland wrote:
> > > This new method of managing recovery is an alternative to
> > > the previous approach of using the userland gfs_controld.
> > >
> > > - use dlm slot numbers to assign journal id's
> > > - use dlm recovery callbacks to initiate journal recovery
> > > - use a dlm lock to determine the first node to mount fs
> > > - use a dlm lock to track journals that need recovery
> >
> > I've just been looking at this again, and a question springs to mind...
> > how does this deal with nodes which are read-only or spectator mounts?
> > In the old system we used to propagate that information to gfs_controld
> > but I've not spotted anything similar in the patch so far, so I'm
> > wondering whether it needs to know that information or not,
>
> The dlm allocates a "slot" for all lockspace members, so spectator mounts
> (like readonly mounts) would be given a slot/jid. In gfs_controld,
> spectator mounts are not given a jid (that came from the time when
> adding a journal required extending the device+fs.) These days, there's
> probably no meaningful difference between spectator and readonly mounts.

There's one other part, and that's whether a readonly or spectator node
should attempt to recover the journal of a failed node.
In cluster3 this decision was always a bit mixed up, with some logic in gfs_controld and some in gfs2. We should make a clear decision now and include it in this patch. I think gfs2_recover_func() should return GAVEUP right at the start for any of the cases where you don't want it doing recovery. What cases would you prefer?
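
To make the "return GAVEUP right at the start" idea concrete, here is a
rough userspace sketch of the shape I have in mind. It is not the patch
code: struct mount_state, enum recover_result, should_skip_recovery() and
recover_journal() are all hypothetical names invented for illustration, and
skipping recovery for both spectator and readonly mounts is only an
assumption, since which cases to skip is exactly the open question.

```c
#include <stdbool.h>

/* Hypothetical result codes standing in for the gfs2 recovery results. */
enum recover_result { RECOVER_SUCCESS, RECOVER_GAVEUP };

/* Hypothetical mount state; this is not the real gfs2 superblock struct. */
struct mount_state {
    bool spectator;
    bool read_only;
};

/*
 * Single place that holds the policy decision. Which cases belong here is
 * an assumption; the thread leaves that choice open.
 */
static bool should_skip_recovery(const struct mount_state *m)
{
    return m->spectator || m->read_only;
}

/*
 * Stand-in for the recovery function: give up before touching the journal
 * when the policy says this node must not recover it.
 */
static enum recover_result recover_journal(const struct mount_state *m,
                                           int *replayed)
{
    if (should_skip_recovery(m))
        return RECOVER_GAVEUP;

    (*replayed)++; /* placeholder for the actual journal replay */
    return RECOVER_SUCCESS;
}
```

Keeping the check in one helper means the policy lives in gfs2 alone, and
changing which mount types skip recovery is a one-line edit.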