From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Teigland
Date: Thu, 20 Sep 2007 10:54:08 -0500
Subject: [Cluster-devel] [PATCH] gfs2: fix lock cancelling
In-Reply-To: <20070920154823.GB19232@fieldses.org>
References: <20070920145529.GA17195@fieldses.org> <20070920153153.GB22130@redhat.com> <20070920154823.GB19232@fieldses.org>
Message-ID: <20070920155408.GC22130@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Thu, Sep 20, 2007 at 11:48:23AM -0400, J. Bruce Fields wrote:
> On Thu, Sep 20, 2007 at 10:31:54AM -0500, David Teigland wrote:
> > If found on the recv_list, it means the op has been sent up to the lock
> > manager in userspace and is still floating around up there. If we remove
> > the op from the recv_list, it means, as you say, that the lock manager
> > could get an error back later when it does dev_write() to complete the op.
> > (dev_write() just prints an error message currently, doesn't return an
> > error to userspace.)
> >
> > This assumes, of course, that seeing an error, the lock manager could do
> > something sensible to bring itself back in sync with the application... as
> > we've discussed before, that's a hard problem that we may never solve :-)
>
> It's a hard problem, but it'll need to be solved some day. And it can't
> be solved as long as the kernel isn't even giving userspace the
> information it would need to solve the problem.
>
> For now, could you just generate an unlock request in the case where you
> get an error on the write? That's certainly not perfect, but it's no
> worse than the current behavior.

Oh certainly, I have no problem with making our best attempt. Under
certain conditions it may work well enough to be fine. And in the future
we may find ways for the lock manager to do better.

Dave
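A minimal sketch of the fallback Bruce suggests, assuming a simplified
model of the kernel side: ops handed to the userspace lock manager sit on
a recv_list, cancelling removes the op, and a later completion (the role
dev_write() plays) that no longer finds its op sends an unlock instead of
only logging an error. All names here (struct op, recv_list, take_op,
cancel_op, complete_op, send_unlock) are hypothetical and are not the
actual gfs2/lock_dlm code; send_unlock() just counts and prints in place
of queueing a real request.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model only: not the real gfs2/lock_dlm structures. */
struct op {
    unsigned int id;     /* identifies the request sent to userspace */
    struct op *next;
};

static struct op *recv_list;  /* ops currently held by the lock manager */
static int unlocks_sent;      /* compensating unlocks issued (for the demo) */

static void add_op(struct op *op)
{
    op->next = recv_list;
    recv_list = op;
}

/* Unlink an op by id; returns it, or NULL if it is no longer listed. */
static struct op *take_op(unsigned int id)
{
    struct op **pp;

    for (pp = &recv_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            struct op *op = *pp;
            *pp = op->next;
            return op;
        }
    }
    return NULL;
}

/* Hypothetical stand-in for queueing an unlock of the granted range. */
static void send_unlock(unsigned long start, unsigned long end)
{
    unlocks_sent++;
    printf("op gone: sending unlock for %lu-%lu\n", start, end);
}

/* Cancel path: the waiter gave up, so drop its op from recv_list. */
static void cancel_op(unsigned int id)
{
    free(take_op(id));
}

/* Completion path: userspace reports that op `id` granted [start,end].
 * If the op was cancelled meanwhile, undo the grant with an unlock
 * rather than only printing an error. Returns -1 in that case. */
static int complete_op(unsigned int id, unsigned long start, unsigned long end)
{
    struct op *op = take_op(id);

    if (!op) {
        send_unlock(start, end);
        return -1;
    }
    free(op);
    return 0;
}
```

As Bruce says, this is not perfect: the lock manager may already have
told other nodes about the grant, so the unlock only narrows the window
in which the application and lock manager disagree.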