From mboxrd@z Thu Jan  1 00:00:00 1970
From: Benjamin Marzinski <bmarzins@redhat.com>
Date: Fri, 7 Mar 2008 09:36:23 -0600
Subject: [Cluster-devel] [PATCH] RHEL fix for bz428751
In-Reply-To: <1204879953.22038.519.camel@quoit>
References: <20080305064945.GD3639@ether.msp.redhat.com>
	<1204715141.3408.19.camel@localhost.localdomain>
	<20080306230525.GA22822@ether.msp.redhat.com>
	<1204879953.22038.519.camel@quoit>
Message-ID: <20080307153623.GB22822@ether.msp.redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Fri, Mar 07, 2008 at 08:52:33AM +0000, Steven Whitehouse wrote:
> Hi,
> 
> On Thu, 2008-03-06 at 17:05 -0600, Benjamin Marzinski wrote:
> > On Wed, Mar 05, 2008 at 11:05:41AM +0000, Steven Whitehouse wrote:
> > > This doesn't look like it solves the problem... don't we need to move
> > > the ->go_inval call into gfs2_glock_drop_th() ? After all we know at that
> > > point that we'll be dropping the lock, so there is no reason not to
> > > invalidate there.
> > 
> > 
> > Moving the invalidation into gfs2_glock_drop_th() causes problems.
> > If the page is already locked when gfs2_glock_drop_th() is called, you
> > deadlock trying to lock the pages when you invalidate the lock.
> > 
> > Looking through the code, it seems like the only time this should happen
> > is during a gfs2_readpage() call, like this:
> > 
> >  #0 [f1d6ec2c] schedule at c06072d9
> >  #1 [f1d6ec94] io_schedule at c0607974
> >  #2 [f1d6eca0] sync_page at c0455074
> >  #3 [f1d6eca4] __wait_on_bit_lock at c0607a89
> >  #4 [f1d6ecb8] __lock_page at c0454fbf
> >  #5 [f1d6ece4] truncate_inode_pages_range at c045be78
> >  #6 [f1d6ed50] truncate_inode_pages at c045bed6
> >  #7 [f1d6ed5c] inode_go_inval at f8e4fba2
> >  #8 [f1d6ed64] gfs2_glock_drop_th at f8e4ed97
> >  #9 [f1d6ed80] run_queue at f8e4ef40
> > #10 [f1d6ed9c] gfs2_glock_nq at f8e4f432
> > #11 [f1d6edb8] gfs2_glock_nq_atime at f8e5060b
> > #12 [f1d6edfc] gfs2_readpage at f8e56c95
> > 
> > As best I can tell, it looks like it would be O.K. to unlock the page
> > before calling glops->go_inval() in this case, assuming that you knew
> > that you were the process holding the lock on the page, knew which page
> > was actually locked, and had a way to tell gfs2_readpage() not to
> > bother unlocking the page once you were finished.
> > 
> > Unfortunately, coming up with a good way to pass that information back
> > and forth isn't straightforward. As soon as I come up with a decent
> > answer, I'll post the modified fix.
> > 
> > -Ben
> > 
> 
> Nothing ought to be holding the page lock while we are unlocking the
> glock, otherwise problems will occur whichever version of the code you
> use. Are you missing the fix for #432057 perhaps?
> 

Yes, I am.  Thanks.  That should make things work much better.
-Ben

> Steve.
>