From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.linuxfoundation.org ([140.211.169.12]:49083 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751101AbcBLVBp (ORCPT ); Fri, 12 Feb 2016 16:01:45 -0500 Subject: Patch "ocfs2/dlm: ignore cleaning the migration mle that is inuse" has been added to the 4.4-stable tree To: xuejiufei@huawei.com, akpm@linux-foundation.org, gregkh@linuxfoundation.org, jlbec@evilplan.org, joseph.qi@huawei.com, junxiao.bi@oracle.com, mfasheh@suse.de, torvalds@linux-foundation.org Cc: , From: Date: Fri, 12 Feb 2016 13:01:44 -0800 Message-ID: <145531090410921@kroah.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ANSI_X3.4-1968 Content-Transfer-Encoding: 8bit Sender: stable-owner@vger.kernel.org List-ID: This is a note to let you know that I've just added the patch titled ocfs2/dlm: ignore cleaning the migration mle that is inuse to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: ocfs2-dlm-ignore-cleaning-the-migration-mle-that-is-inuse.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let know about it. >>From bef5502de074b6f6fa647b94b73155d675694420 Mon Sep 17 00:00:00 2001 From: xuejiufei Date: Thu, 14 Jan 2016 15:17:38 -0800 Subject: ocfs2/dlm: ignore cleaning the migration mle that is inuse From: xuejiufei commit bef5502de074b6f6fa647b94b73155d675694420 upstream. We have found that migration source will trigger a BUG that the refcount of mle is already zero before put when the target is down during migration. The situation is as follows: dlm_migrate_lockres dlm_add_migration_mle dlm_mark_lockres_migrating dlm_get_mle_inuse <<<<<< Now the refcount of the mle is 2. dlm_send_one_lockres and wait for the target to become the new master. <<<<<< o2hb detect the target down and clean the migration mle. Now the refcount is 1. dlm_migrate_lockres woken, and put the mle twice when found the target goes down which trigger the BUG with the following message: "ERROR: bad mle: ". Signed-off-by: Jiufei Xue Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/ocfs2/dlm/dlmmaster.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) --- a/fs/ocfs2/dlm/dlmmaster.c +++ b/fs/ocfs2/dlm/dlmmaster.c @@ -2519,6 +2519,11 @@ static int dlm_migrate_lockres(struct dl spin_lock(&dlm->master_lock); ret = dlm_add_migration_mle(dlm, res, mle, &oldmle, name, namelen, target, dlm->node_num); + /* get an extra reference on the mle. + * otherwise the assert_master from the new + * master will destroy this. + */ + dlm_get_mle_inuse(mle); spin_unlock(&dlm->master_lock); spin_unlock(&dlm->spinlock); @@ -2554,6 +2559,7 @@ fail: if (mle_added) { dlm_mle_detach_hb_events(dlm, mle); dlm_put_mle(mle); + dlm_put_mle_inuse(mle); } else if (mle) { kmem_cache_free(dlm_mle_cache, mle); mle = NULL; @@ -2571,17 +2577,6 @@ fail: * ensure that all assert_master work is flushed. */ flush_workqueue(dlm->dlm_worker); - /* get an extra reference on the mle. - * otherwise the assert_master from the new - * master will destroy this. - * also, make sure that all callers of dlm_get_mle - * take both dlm->spinlock and dlm->master_lock */ - spin_lock(&dlm->spinlock); - spin_lock(&dlm->master_lock); - dlm_get_mle_inuse(mle); - spin_unlock(&dlm->master_lock); - spin_unlock(&dlm->spinlock); - /* notify new node and send all lock state */ /* call send_one_lockres with migration flag. * this serves as notice to the target node that a @@ -3312,6 +3307,15 @@ top: mle->new_master != dead_node) continue; + if (mle->new_master == dead_node && mle->inuse) { + mlog(ML_NOTICE, "%s: target %u died during " + "migration from %u, the MLE is " + "still keep used, ignore it!\n", + dlm->name, dead_node, + mle->master); + continue; + } + /* If we have reached this point, this mle needs to be * removed from the list and freed. */ dlm_clean_migration_mle(dlm, mle); Patches currently in stable-queue which might be from xuejiufei@huawei.com are queue-4.4/ocfs2-dlm-ignore-cleaning-the-migration-mle-that-is-inuse.patch queue-4.4/ocfs2-dlm-clear-refmap-bit-of-recovery-lock-while-doing-local-recovery-cleanup.patch