From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Teigland
Date: Mon, 7 Jun 2010 12:34:14 -0500
Subject: [Cluster-devel] GFS2: Wait for journal id on mount if not specified on mount command line
In-Reply-To: <1275925149.3158.203.camel@localhost.localdomain>
References: <1275925149.3158.203.camel@localhost.localdomain>
Message-ID: <20100607173414.GA23174@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Mon, Jun 07, 2010 at 04:39:09PM +0100, Steven Whitehouse wrote:
>
> This patch implements a wait for the journal id in the case that it has
> not been specified on the command line. This is to allow the future
> removal of the mount.gfs2 helper. The journal id would instead be
> directly communicated by gfs_controld to the file system. Here is a
> comparison of the two systems:
>
> Current:
> 1. mount calls mount.gfs2
> 2. mount.gfs2 connects to gfs_controld to retrieve the journal id
> 3. mount.gfs2 adds the journal id to the mount command line and calls
>    the mount system call
> 4. gfs_controld receives the status of the mount request via a uevent
>
> Proposed:
> 1. mount calls the mount system call (no mount.gfs2 helper)
> 2. gfs_controld receives a uevent for a gfs2 fs which it doesn't know
>    about already
> 3. gfs_controld assigns a journal id to it via sysfs
> 4. the mount system call then completes as normal (sending a uevent
>    according to status)

Proposed is the way it originally worked. I switched to using Current
back in 2005... unfortunately I don't remember all the specific reasons,
but I'm pretty sure it was the error/edge cases that were better handled
without sitting in the kernel early in the process. (Especially when you
combine simultaneous mounting / mount failures / node failures /
recovery.)

A couple obvious questions from the start...

- What if gfs_controld isn't running?
- Won't processes start to access the fs and block during this
  intermediate time between mount(2) and getting a journal id? All of
  those processes now need errors returned if gfs_controld returns an
  error instead of a journal id.

Another way to compare them:

Current:
- get all the userspace/clustering-related/error-laden overhead sorted out
- then, at the very end, pull the kernel fs into the picture
- collect the result of mount(2) in userspace, which is almost always
  "success"

Proposed:
- pull the kernel fs into the picture
- transition to userspace to sort out all the clustering-related /
  error-laden overhead
- get back to the kernel with the result
- collect the result of mount(2) in userspace

The further you get before you encounter errors, the harder they are to
handle. You want most errors to happen earlier, with fewer entities
involved, so backing out is easier to do.

IIRC, nfs recently moved to using a mount helper after *not* using one
for many years. It would be interesting to ask them about their
motivations.

> The advantage of the proposed system is that it is completely backward
> compatible with the current system both at the kernel and at the
> userland levels. The "first" parameter can also be set the same way,
> with the restriction that it must be set before the journal id is
> assigned.

That's not an "advantage" of new versus old, which is the missing bit of
information here. I'm not against changing it per se, but it seems we'd
want some substantial advantage before going to all the effort of
changing such a delicate area that has worked quite well for the past 5
years.

There's room for real, major improvements in this whole area, but you're
barking up the wrong tree.

gfs_controld has always been far too complex. But it's *not* a result of
the current mount helper scheme.
It is a direct result of gfs_controld being required to do jobs that gfs
(in kernel) should probably handle itself: allocating journal id's,
coordinating who does journal recovery, coordinating first mounter
recovery, sorting out valid combinations of mount options from different
nodes, keeping track of recovered journals vs journals that haven't been
recovered, coordinating when all journals have been successfully
recovered so that normal fs access can be continued.

If you want to do something that's meaningful and beneficial in this
area, you need to look at moving *those* things from gfs_controld into
gfs. Ocfs2 is a good example here, it handles almost all of that stuff in
the kernel, and leaves only what's really necessary for ocfs2_controld.

In fact, this could be a perfect area for gfs2/ocfs2 unification: adopt a
single fs_controld, single mount/unmount scheme, single node
failure/recovery notification scheme, single journal id/allocation
scheme.

Dave