From mboxrd@z Thu Jan 1 00:00:00 1970
From: Benjamin Marzinski
Date: Fri, 7 Mar 2008 09:36:23 -0600
Subject: [Cluster-devel] [PATCH] RHEL fix for bz428751
In-Reply-To: <1204879953.22038.519.camel@quoit>
References: <20080305064945.GD3639@ether.msp.redhat.com> <1204715141.3408.19.camel@localhost.localdomain> <20080306230525.GA22822@ether.msp.redhat.com> <1204879953.22038.519.camel@quoit>
Message-ID: <20080307153623.GB22822@ether.msp.redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Fri, Mar 07, 2008 at 08:52:33AM +0000, Steven Whitehouse wrote:
> Hi,
>
> On Thu, 2008-03-06 at 17:05 -0600, Benjamin Marzinski wrote:
> > On Wed, Mar 05, 2008 at 11:05:41AM +0000, Steven Whitehouse wrote:
> > > This doesn't look like it solves the problem... don't we need to move
> > > the ->go_inval call into gfs2_glock_drop_th() ? After all we know at that
> > > point that we'll be dropping the lock, so there is no reason not to
> > > invalidate there.
> >
> >
> > Moving the invalidation into the gfs2_glock_drop_th() causes problems.
> > If the page is already locked when gfs2_glock_drop_th() is called, you
> > deadlock trying to lock the pages when you invalidate the lock.
> >
> > Looking through the code, it seems like the only time this should happen
> > is during a gfs2_readpage() call, like this.
> >
> > #0 [f1d6ec2c] schedule at c06072d9
> > #1 [f1d6ec94] io_schedule at c0607974
> > #2 [f1d6eca0] sync_page at c0455074
> > #3 [f1d6eca4] __wait_on_bit_lock at c0607a89
> > #4 [f1d6ecb8] __lock_page at c0454fbf
> > #5 [f1d6ece4] truncate_inode_pages_range at c045be78
> > #6 [f1d6ed50] truncate_inode_pages at c045bed6
> > #7 [f1d6ed5c] inode_go_inval at f8e4fba2
> > #8 [f1d6ed64] gfs2_glock_drop_th at f8e4ed97
> > #9 [f1d6ed80] run_queue at f8e4ef40
> > #10 [f1d6ed9c] gfs2_glock_nq at f8e4f432
> > #11 [f1d6edb8] gfs2_glock_nq_atime at f8e5060b
> > #12 [f1d6edfc] gfs2_readpage at f8e56c95
> >
> > From the best I can tell, it looks like it would be O.K. to unlock the page
> > before calling glops->go_inval() in this case, assuming that you knew
> > that you were the process that is holding the lock to the page and which page
> > was actually locked, and you had a way to tell gfs2_readpage not to
> > bother unlocking the page once you were finished.
> >
> > Unfortunately, coming up with a good way to pass that information back
> > and forth isn't straightforward. As soon as I come up with a decent
> > answer, I'll post the modified fix.
> >
> > -Ben
> >
>
> Nothing ought to be holding the page lock while we are unlocking the
> glock, otherwise problems will occur whichever version of the code you
> use. Are you missing the fix for #432057 perhaps?
>

Yes I am. Thanks. That should make things work much better.

-Ben

> Steve.
>