From mboxrd@z Thu Jan 1 00:00:00 1970 From: piaojun Date: Tue, 19 Feb 2019 16:35:54 +0800 Subject: [Ocfs2-devel] [PATCH] ocfs2: wait for recovering done after direct unlock request In-Reply-To: <63ADC13FD55D6546B7DECE290D39E37301278659A8@H3CMLB14-EX.srv.huawei-3com.com> References: <1550124866-20367-1-git-send-email-ge.changwei@h3c.com> <5C6659FA.2000100@huawei.com> <63ADC13FD55D6546B7DECE290D39E3730127854170@H3CMLB14-EX.srv.huawei-3com.com> <5C6679F3.3020306@huawei.com> <63ADC13FD55D6546B7DECE290D39E3730127854352@H3CMLB14-EX.srv.huawei-3com.com> <63ADC13FD55D6546B7DECE290D39E37301278653BC@H3CMLB14-EX.srv.huawei-3com.com> <5C6B58C0.1040506@huawei.com> <63ADC13FD55D6546B7DECE290D39E37301278659A8@H3CMLB14-EX.srv.huawei-3com.com> Message-ID: <5C6BBFEA.2080205@huawei.com> List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: ocfs2-devel@oss.oracle.com Hi Changwei, On 2019/2/19 10:27, Changwei Ge wrote: > Hi Jun, > > On 2019/2/19 9:16, piaojun wrote: >> Hi Changwei, >> >> Thanks for your patient explaination, and I get your point. But I have >> some suggestion about your patch below. > > Thanks your advise to this patch. > >> >> On 2019/2/18 17:46, Changwei Ge wrote: >>> Hi Jun, >>> >>> Ping... >>> >>> Do you have any further question? >>> If any question uncleared, please let me know. >>> Otherwise, could you please give a feedback? >>> >>> Thanks, >>> Changwei >>> >>> On 2019/2/15 16:48, Changwei Ge wrote: >>>> Hi Jun, >>>> >>>> On 2019/2/15 16:36, piaojun wrote: >>>>> Hi Changwei, >>>>> >>>>> Thanks for your explaination, and I still have a question below. >>>> >>>> Thank you for looking into this. >>>> >>>>> >>>>> On 2019/2/15 14:29, Changwei Ge wrote: >>>>>> Hi Jun, >>>>>> >>>>>> On 2019/2/15 14:20, piaojun wrote: >>>>>>> Hi Changwei, >>>>>>> >>>>>>> The DLM process is a little bit complex, so I suggest pasting the code >>>>>>> path. And I wonder if my code is right? >>>>>>> >>>>>>> Thanks, >>>>>>> Jun >>>>>>> >>>>>>> On 2019/2/14 14:14, Changwei Ge wrote: >>>>>>>> There is scenario causing ocfs2 umount hang when multiple hosts are >>>>>>>> rebooting at the same time. >>>>>>>> >>>>>>>> NODE1 NODE2 NODE3 >>>>>>>> send unlock requset to NODE2 >>>>>>>> dies >>>>>>>> become recovery master >>>>>>>> recover NODE2 >>>>>>>> find NODE2 dead >>>>>>>> mark resource RECOVERING >>>>>>>> directly remove lock from grant list >>>>>>> dlm_do_local_recovery_cleanup >>>>>>> dlm_move_lockres_to_recovery_list >>>>>>> res->state |= DLM_LOCK_RES_RECOVERING; >>>>>>> list_add_tail(&res->recovering, &dlm->reco.resources); >>>>>>> >>>>>>>> calculate usage but RECOVERING marked >>>>>>>> **miss the window of purging >>>>>>> dlmunlock >>>>>>> dlmunlock_remote >>>>>>> dlmunlock_common // unlock successfully directly >>>>>>> >>>>>>> dlm_lockres_calc_usage >>>>>>> __dlm_lockres_calc_usage >>>>>>> __dlm_lockres_unused >>>>>>> if (res->state & (DLM_LOCK_RES_RECOVERING| // won't purge lock as DLM_LOCK_RES_RECOVERING is set >>>>>> >>>>>> True. >>>>>> >>>>>>> >>>>>>>> clear RECOVERING >>>>>>> dlm_finish_local_lockres_recovery >>>>>>> res->state &= ~DLM_LOCK_RES_RECOVERING; >>>>>>> >>>>>>> Could you help explaining where getting stuck? >>>>>> >>>>>> Sure, >>>>>> As dlm missed the window to purge lock resource, it can't be unhashed. >>>>>> >>>>>> During umount: >>>>>> dlm_unregister_domain >>>>>> dlm_migrate_all_locks -> there is always a lock resource hashed, so can't return from dlm_migrate_all_locks() thus hang during umount. >>>>> >>>>> In dlm_migrate_all_locks, lockres will be move to purge_list and purged again: >>>>> dlm_migrate_all_locks >>>>> __dlm_lockres_calc_usage >>>>> list_add_tail(&res->purge, &dlm->purge_list); >>>>> >>>>> Do you mean this process does not work? >>>> >>>> Yes, only for Migrating lock resources, it has a chance to call __dlm_lockres_calc_usage() >>>> But in our situation, the problematic lock resource is obviously no master. So no chance for it to set stack variable *dropped* in dlm_migrate_all_locks() >>>> >>>> Thanks, >>>> Changwei >>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> Changwei >>>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>> To reproduce this iusse, crash a host and then umount ocfs2 >>>>>>>> from another node. >>>>>>>> >>>>>>>> To sovle this, just let unlock progress wait for recovery done. >>>>>>>> >>>>>>>> Signed-off-by: Changwei Ge >>>>>>>> --- >>>>>>>> fs/ocfs2/dlm/dlmunlock.c | 23 +++++++++++++++++++---- >>>>>>>> 1 file changed, 19 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/fs/ocfs2/dlm/dlmunlock.c b/fs/ocfs2/dlm/dlmunlock.c >>>>>>>> index 63d701c..c8e9b70 100644 >>>>>>>> --- a/fs/ocfs2/dlm/dlmunlock.c >>>>>>>> +++ b/fs/ocfs2/dlm/dlmunlock.c >>>>>>>> @@ -105,7 +105,8 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm, >>>>>>>> enum dlm_status status; >>>>>>>> int actions = 0; >>>>>>>> int in_use; >>>>>>>> - u8 owner; >>>>>>>> + u8 owner; >>>>>>>> + int recovery_wait = 0; >>>>>>>> >>>>>>>> mlog(0, "master_node = %d, valblk = %d\n", master_node, >>>>>>>> flags & LKM_VALBLK); >>>>>>>> @@ -208,9 +209,12 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm, >>>>>>>> } >>>>>>>> if (flags & LKM_CANCEL) >>>>>>>> lock->cancel_pending = 0; >>>>>>>> - else >>>>>>>> - lock->unlock_pending = 0; >>>>>>>> - >>>>>>>> + else { >>>>>>>> + if (!lock->unlock_pending) >>>>>>>> + recovery_wait = 1; >> >> Could we just check if res->state is in DLM_LOCK_RES_RECOVERING or >> status is DLM_RECOVERING? And there is no need to define an extra >> variable. > > As the lock resource master had died, DLM_RECOVERING can't be returned. > > I prefer adding a stack variable like *recovery_wait* to tell if involved lock resource has a chance to be recovered. > In this way, we won't make code subtle comparing with normal code path. As you know, the case we discuss here is not > that possible to happen when each node in cluster works well. And we can also save some CPU cycles this way. I mean we could check if the res->state is in DLM_LOCK_RES_RECOVERING. As when lock->unlock_pending is cleared in dlm_move_lockres_to_recovery_list, res->state must be set DLM_LOCK_RES_RECOVERING. And I think of another solution which seems a little easier and save some CPU work. We could call __dlm_lockres_calc_usage to put unused lockres into purge list in dlm_finish_local_lockres_recovery. And this will not stuck the unlocking process. > > Thanks, > Changwei > >> >>>>>>>> + else >>>>>>>> + lock->unlock_pending = 0; >>>>>>>> + } >>>>>>>> } >>>>>>>> >>>>>>>> /* get an extra ref on lock. if we are just switching >>>>>>>> @@ -244,6 +248,17 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm, >>>>>>>> spin_unlock(&res->spinlock); >>>>>>>> wake_up(&res->wq); >>>>>>>> >>>>>>>> + if (recovery_wait) { >>>>>>>> + spin_lock(&res->spinlock); >>>>>>>> + /* Unlock request will directly succeed after owner dies, >>>>>>>> + * and the lock is already removed from grant list. We have to >>>>>>>> + * wait for RECOVERING done or we miss the chance to purge it >>>>>>>> + * since the removement is much faster than RECOVERING proc. >>>>>>>> + */ >>>>>>>> + __dlm_wait_on_lockres_flags(res, DLM_LOCK_RES_RECOVERING); >>>>>>>> + spin_unlock(&res->spinlock); >>>>>>>> + } >>>>>>>> + >>>>>>>> /* let the caller's final dlm_lock_put handle the actual kfree */ >>>>>>>> if (actions & DLM_UNLOCK_FREE_LOCK) { >>>>>>>> /* this should always be coupled with list removal */ >>>>>>>> >>>>>>> >>>>>> . >>>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Ocfs2-devel mailing list >>>> Ocfs2-devel at oss.oracle.com >>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel >>>> >>> . >>> >> > . >