From: Bob Peterson
Date: Wed, 19 Mar 2008 11:38:05 -0500
Subject: [Cluster-devel] RHEL4 Test Patch: bz 345401
Message-ID: <1205944685.2741.31.camel@technetium.msp.redhat.com>
To: cluster-devel.redhat.com

Hi,

This is a RHEL5.x gfs2 patch for bug #345401. I just thought I'd toss it
out here to see if this fix makes sense and to get comments from people.
My explanation will be longer than the patch itself. It seems like there
were a couple of things going on here, so I'll detail each part of the
fix. I'll work from the bottom up, because the code change at the bottom
of the patch may be the most important one.

In the failing scenario, millions (literally) of files are being opened,
written, truncated and closed. In many cases, the block numbers used for
dinodes are being reused, resulting in glocks and gfs2_inode structures
that are nearly indistinguishable from previous incarnations. Some of
these structures may still be in memory, and therein lies the problem.

(1) First change, at the bottom of the patch:

In inode_go_lock, when a glock was locked, it could call
gfs2_truncatei_resume on new gfs2_inodes in cases where the flags were
not yet set. There is a GL_SKIP gh_flag to indicate whether the
gfs2_inode is new, so I added code to skip the truncate-resume path in
that case. A reused gfs2_inode / glock should therefore no longer enter
the gfs2_truncatei_resume code path.

(2) Second change, in the middle of the patch:

In function scan_glock, the code was putting glocks on the reclaim list
if there were no holders. However, someone may still be waiting for the
glock (i.e. on the gl_waiters1 or gl_waiters3 lists, or with the
GLF_WAITERS2 flag set). There might also be items on the active items
(ail) list. So I changed it so that it does not put glocks on the
reclaim list if they have any waiters or active items. I'm not sure
whether the "waiters" part of this fix is necessary, because pending
waiters should be granted the glock immediately and become "holders"
themselves. I don't know if scan_glock can get in there while this is
happening.

(3) Third change, at the top of the patch:

In function search_bucket, the code looks up a glock by hash key. If a
dinode block was reused, the new glock can have the same hash key as the
old one. My theory is that the glock was being accessed while it was on
the reclaim list, and then the reclaim daemon was nuking it part-way
through some operations (during times when the glock was released). The
reclaim daemon just blasts all glocks on the reclaim list. The code
change makes search_bucket not find glocks by hash key if the glock in
question is on the reclaim list, because you can't guarantee the thing
won't go away at an inconvenient time. Previous instrumentation
indicated this was happening.
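To make change (1) a bit more concrete, here is a rough sketch (for
illustration only; this caller, the lock state, and the surrounding
error handling are made up and are not part of the patch) of how a
holder with GL_SKIP in its gh_flags now bypasses the truncate-resume
path when the glock is acquired:

/* Hypothetical caller, for illustration only.  A brand-new inode's
 * glock is acquired with GL_SKIP set in gh_flags, so with this patch
 * inode_go_lock() returns 0 before it can call
 * gfs2_truncatei_resume() against stale state left over from a
 * previous incarnation of the same dinode block.
 */
struct gfs2_holder gh;
int error;

gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
error = gfs2_glock_nq(&gh);	/* inode_go_lock() is called here */
if (!error)
	gfs2_glock_dq_uninit(&gh);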
Regards,

Bob Peterson
--
diff -pur a/fs/gfs2/glock.c b/fs/gfs2/glock.c
--- a/fs/gfs2/glock.c	2008-02-22 10:42:48.000000000 -0600
+++ b/fs/gfs2/glock.c	2008-03-19 10:40:23.000000000 -0500
@@ -251,6 +251,8 @@ static struct gfs2_glock *search_bucket(
 			continue;
 		if (gl->gl_sbd != sdp)
 			continue;
+		if (!list_empty(&gl->gl_reclaim))
+			continue;
 
 		atomic_inc(&gl->gl_ref);
 
@@ -1688,6 +1690,12 @@ static void scan_glock(struct gfs2_glock
 	if (gl->gl_ops == &gfs2_inode_glops && gl->gl_object)
 		return;
 
+	if (!list_empty(&gl->gl_waiters1) ||
+	    !list_empty(&gl->gl_waiters3) ||
+	    !list_empty(&gl->gl_ail_list) ||
+	    test_bit(GLF_WAITERS2, &gl->gl_flags))
+		return;
+
 	if (gfs2_glmutex_trylock(gl)) {
 		if (list_empty(&gl->gl_holders) &&
 		    gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl))
diff -pur a/fs/gfs2/glops.c b/fs/gfs2/glops.c
--- a/fs/gfs2/glops.c	2008-02-22 10:42:48.000000000 -0600
+++ b/fs/gfs2/glops.c	2008-03-19 10:41:00.000000000 -0500
@@ -306,7 +306,7 @@ static int inode_go_lock(struct gfs2_hol
 	struct gfs2_inode *ip = gl->gl_object;
 	int error = 0;
 
-	if (!ip)
+	if (!ip || (gh->gh_flags & GL_SKIP))
 		return 0;
 
 	if (test_bit(GIF_INVALID, &ip->i_flags)) {
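P.S. In case it helps reviewers picture the race behind change (3):
below is a simplified sketch, written from memory rather than copied
from the source, of what the reclaim daemon does to everything on the
reclaim list. This is why search_bucket must not hand out a glock that
is queued there.

/* Simplified sketch of the reclaim daemon's per-glock work, from
 * memory (the real function is gfs2_reclaim_glock; names and details
 * here are approximate).  Once a glock is on sd_reclaim_list it can
 * be demoted and have its reference dropped at any time, so a thread
 * that found the same glock through search_bucket could have it
 * nuked out from under it between operations.
 */
static void reclaim_one(struct gfs2_sbd *sdp)
{
	struct gfs2_glock *gl;

	spin_lock(&sdp->sd_reclaim_lock);
	if (list_empty(&sdp->sd_reclaim_list)) {
		spin_unlock(&sdp->sd_reclaim_lock);
		return;
	}
	gl = list_entry(sdp->sd_reclaim_list.next,
			struct gfs2_glock, gl_reclaim);
	list_del_init(&gl->gl_reclaim);
	spin_unlock(&sdp->sd_reclaim_lock);

	if (gfs2_glmutex_trylock(gl)) {
		if (list_empty(&gl->gl_holders) &&
		    gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl))
			handle_callback(gl, LM_ST_UNLOCKED);
		gfs2_glmutex_unlock(gl);
	}

	gfs2_glock_put(gl);	/* this may free the glock */
}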