From mboxrd@z Thu Jan 1 00:00:00 1970
From: Benjamin Marzinski
Date: Thu, 6 Mar 2008 17:05:25 -0600
Subject: [Cluster-devel] [PATCH] RHEL fix for bz428751
In-Reply-To: <1204715141.3408.19.camel@localhost.localdomain>
References: <20080305064945.GD3639@ether.msp.redhat.com> <1204715141.3408.19.camel@localhost.localdomain>
Message-ID: <20080306230525.GA22822@ether.msp.redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Wed, Mar 05, 2008 at 11:05:41AM +0000, Steven Whitehouse wrote:
> This doesn't look like it solves the problem... don't we need to move
> the ->go_inval call into gfs2_glock_drop_th() ? After all we know at that
> point that we'll be dropping the lock, so there is no reason not to
> invalidate there.

Moving the invalidation into gfs2_glock_drop_th() causes problems. If the
page is already locked when gfs2_glock_drop_th() is called, you deadlock
trying to lock the pages during the invalidation. Looking through the code,
it seems like the only time this should happen is during a gfs2_readpage()
call, like this:

 #0 [f1d6ec2c] schedule at c06072d9
 #1 [f1d6ec94] io_schedule at c0607974
 #2 [f1d6eca0] sync_page at c0455074
 #3 [f1d6eca4] __wait_on_bit_lock at c0607a89
 #4 [f1d6ecb8] __lock_page at c0454fbf
 #5 [f1d6ece4] truncate_inode_pages_range at c045be78
 #6 [f1d6ed50] truncate_inode_pages at c045bed6
 #7 [f1d6ed5c] inode_go_inval at f8e4fba2
 #8 [f1d6ed64] gfs2_glock_drop_th at f8e4ed97
 #9 [f1d6ed80] run_queue at f8e4ef40
#10 [f1d6ed9c] gfs2_glock_nq at f8e4f432
#11 [f1d6edb8] gfs2_glock_nq_atime at f8e5060b
#12 [f1d6edfc] gfs2_readpage at f8e56c95

As far as I can tell, it would be O.K. to unlock the page before calling
glops->go_inval() in this case, provided that you knew you were the process
holding the page lock, you knew which page was actually locked, and you had
a way to tell gfs2_readpage() not to bother unlocking the page once you were
finished. Unfortunately, coming up with a good way to pass that information
back and forth isn't straightforward. As soon as I come up with a decent
answer, I'll post the modified fix.

-Ben