From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bob Peterson Date: Wed, 13 Nov 2019 15:30:19 -0600 Subject: [Cluster-devel] [PATCH 21/32] gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS In-Reply-To: <20191113213030.237431-1-rpeterso@redhat.com> References: <20191113213030.237431-1-rpeterso@redhat.com> Message-ID: <20191113213030.237431-22-rpeterso@redhat.com> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit This patch closes a timing window in which two processes compete and overlap in the execution of do_xmote for the same glock: Process A Process B ------------------------------------ ----------------------------- 1. Grabs gl_lockref and calls do_xmote 2. Grabs gl_lockref but is blocked 3. Sets GLF_INVALIDATE_IN_PROGRESS 4. Unlocks gl_lockref 5. Calls do_xmote 6. Call glops->go_sync 7. test_and_clear_bit GLF_DIRTY 8. Call gfs2_log_flush Call glops->go_sync 9. (slow IO, so it blocks a long time) test_and_clear_bit GLF_DIRTY It's not dirty (step 7) returns 10. Tests GLF_INVALIDATE_IN_PROGRESS 11. Calls go_inval (rgrp_go_inval) 12. gfs2_rgrp_relse does brelse 13. truncate_inode_pages_range 14. Calls lm_lock UN In step 14 we've just told dlm to give the glock to another node when, in fact, process A has not finished the IO and synced all buffer_heads to disk and make sure their revokes are done. This patch fixes the problem by changing the GLF_INVALIDATE_IN_PROGRESS to use test_and_set_bit, and if the bit is already set, process B just ignores it and trusts that process A will do the do_xmote in the proper order. Signed-off-by: Bob Peterson --- fs/gfs2/glock.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index f18b8f2b8ce5..ab72797e3ba1 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -584,7 +584,19 @@ __acquires(&gl->gl_lockref.lock) GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target); if ((target == LM_ST_UNLOCKED || target == LM_ST_DEFERRED) && glops->go_inval) { - set_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags); + /* + * If another process is already doing the invalidate we should + * not interfere. If we call go_sync and it finds the glock is + * not dirty, we might call go_inval prematurely before the + * other go_sync has finished with its revokes. If we then call + * lm_lock prematurely, we've really screwed up: we cannot tell + * dlm to give the glock away until we're synced and + * invalidated. Best thing is to return and trust the other + * process will finish do_xmote tasks in their proper order. + */ + if (test_and_set_bit(GLF_INVALIDATE_IN_PROGRESS, + &gl->gl_flags)) + return; do_error(gl, 0); /* Fail queued try locks */ } gl->gl_req = target; -- 2.23.0