From mboxrd@z Thu Jan 1 00:00:00 1970 From: Guoqing Jiang Subject: Re: [PATCH 4/8] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang Date: Wed, 3 Aug 2016 10:39:51 +0800 Message-ID: <57A15977.205@suse.com> References: <1469686612-16126-1-git-send-email-gqjiang@suse.com> <1469686612-16126-4-git-send-email-gqjiang@suse.com> <20160801222042.GB18810@kernel.org> <57A01272.4010209@suse.com> <20160802223631.GC98613@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20160802223631.GC98613@kernel.org> Sender: linux-raid-owner@vger.kernel.org To: Shaohua Li Cc: linux-raid@vger.kernel.org List-Id: linux-raid.ids On 08/03/2016 06:36 AM, Shaohua Li wrote: > On Mon, Aug 01, 2016 at 11:24:34PM -0400, Guoqing Jiang wrote: >> >> On 08/01/2016 06:20 PM, Shaohua Li wrote: >>> On Thu, Jul 28, 2016 at 02:16:48AM -0400, Guoqing Jiang wrote: >>>> When some node leaves cluster, then it's bitmap need to be >>>> synced by another node, so "md*_recover" thread is triggered >>>> for the purpose. However, with below steps. we can find tasks >>>> hang happened either in B or C. >>>> >>>> 1. Node A create a resyncing cluster raid1, assemble it in >>>> other two nodes (B and C). >>>> 2. stop array in B and C. >>>> 3. stop array in A. >>>> >>>> linux44:~ # ps aux|grep md|grep D >>>> root 5938 0.0 0.1 19852 1964 pts/0 D+ 14:52 0:00 mdadm -S md0 >>>> root 5939 0.0 0.0 0 0 ? D 14:52 0:00 [md0_recover] >>>> >>>> linux44:~ # cat /proc/5939/stack >>>> [] dlm_lock_sync+0x71/0x90 [md_cluster] >>>> [] recover_bitmaps+0x125/0x220 [md_cluster] >>>> [] md_thread+0x16d/0x180 [md_mod] >>>> [] kthread+0xb4/0xc0 >>>> [] ret_from_fork+0x58/0x90 >>>> >>>> linux44:~ # cat /proc/5938/stack >>>> [] kthread_stop+0x6e/0x120 >>>> [] md_unregister_thread+0x40/0x80 [md_mod] >>>> [] leave+0x70/0x120 [md_cluster] >>>> [] md_cluster_stop+0x14/0x30 [md_mod] >>>> [] bitmap_free+0x14b/0x150 [md_mod] >>>> [] do_md_stop+0x35b/0x5a0 [md_mod] >>>> [] md_ioctl+0x873/0x1590 [md_mod] >>>> [] blkdev_ioctl+0x214/0x7d0 >>>> [] block_ioctl+0x3d/0x40 >>>> [] do_vfs_ioctl+0x2d4/0x4b0 >>>> [] SyS_ioctl+0x88/0xa0 >>>> [] system_call_fastpath+0x16/0x1b >>>> >>>> The problem is caused by recover_bitmaps can't reliably abort >>>> when the thread is unregistered. So dlm_lock_sync_interruptible >>>> is introduced to detect the thread's situation to fix the problem. >>>> >>>> Reviewed-by: NeilBrown >>>> Signed-off-by: Guoqing Jiang >>>> --- >>>> drivers/md/md-cluster.c | 38 +++++++++++++++++++++++++++++++++++++- >>>> 1 file changed, 37 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c >>>> index ea2699e..f3d584e 100644 >>>> --- a/drivers/md/md-cluster.c >>>> +++ b/drivers/md/md-cluster.c >>>> @@ -10,6 +10,8 @@ >>>> #include >>>> +#include >>>> +#include >>>> #include >>>> #include >>>> #include >>>> @@ -141,6 +143,40 @@ static int dlm_unlock_sync(struct dlm_lock_resource *res) >>>> return dlm_lock_sync(res, DLM_LOCK_NL); >>>> } >>>> +/* An variation of dlm_lock_sync, which make lock request could >>>> + * be interrupted */ >>>> +static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode, >>>> + struct mddev *mddev) >>>> +{ >>>> + int ret = 0; >>>> + >>>> + ret = dlm_lock(res->ls, mode, &res->lksb, >>>> + res->flags, res->name, strlen(res->name), >>>> + 0, sync_ast, res, res->bast); >>>> + if (ret) >>>> + return ret; >>>> + >>>> + wait_event(res->completion.wait, >>>> + res->completion.done || kthread_should_stop()); >>> can you convert it to a waitq? Directly using the .wait/.done of completion is >>> really intrusive. >> Maybe, but we still need completion for dlm_lock_resource otherwise there >> are different types of dlm_lock_resource, we also need to keep align with >> sync_ast as dlm_lock_sync did. > Yes, we need a waitq and variable like completion.done to indicate the event is > done, and convert the completion API to waitq API in other places like > sync_ast. The point is not using the opaque data structure of 'struct > completion'. Diving into implementation details of a unrelated data structure > (completion here) is really intrusive. I don't want to argue about the intrusive, but we have to create something which could be achieved by the existed thing. OTOH, convert completion to waitqueue should be done in a new patch, and it is a big change though achieveable, we need to do full test for it. Thanks, Guoqing