linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.com>
To: Artur Paszkiewicz <artur.paszkiewicz@intel.com>,
	Shaohua Li <shli@kernel.org>
Cc: Linux Raid <linux-raid@vger.kernel.org>
Subject: Re: [PATCH] md: create new workqueue for object destruction
Date: Fri, 20 Oct 2017 09:28:49 +1100
Message-ID: <87wp3qg4bi.fsf@notabene.neil.brown.name>
In-Reply-To: <9759b574-2d3f-de45-0840-c84d9cc10528@intel.com>

On Thu, Oct 19 2017, Artur Paszkiewicz wrote:

> On 10/19/2017 12:36 AM, NeilBrown wrote:
>> On Wed, Oct 18 2017, Artur Paszkiewicz wrote:
>> 
>>> On 10/18/2017 09:29 AM, NeilBrown wrote:
>>>> On Tue, Oct 17 2017, Shaohua Li wrote:
>>>>
>>>>> On Tue, Oct 17, 2017 at 04:04:52PM +1100, Neil Brown wrote:
>>>>>>
>>>>>> lockdep currently complains about a potential deadlock
>>>>>> with sysfs access taking reconfig_mutex, and that
>>>>>> waiting for a work queue to complete.
>>>>>>
>>>>>> The cause is inappropriate overloading of work-items
>>>>>> on work-queues.
>>>>>>
>>>>>> We currently have two work-queues: md_wq and md_misc_wq.
>>>>>> They service 5 different tasks:
>>>>>>
>>>>>>   mddev->flush_work                       md_wq
>>>>>>   mddev->event_work (for dm-raid)         md_misc_wq
>>>>>>   mddev->del_work (mddev_delayed_delete)  md_misc_wq
>>>>>>   mddev->del_work (md_start_sync)         md_misc_wq
>>>>>>   rdev->del_work                          md_misc_wq
>>>>>>
>>>>>> We need to call flush_workqueue() for md_start_sync and ->event_work
>>>>>> while holding reconfig_mutex, but mustn't hold it when
>>>>>> flushing mddev_delayed_delete or rdev->del_work.
>>>>>>
>>>>>> md_wq is a bit special as it has WQ_MEM_RECLAIM so it is
>>>>>> best to leave that alone.
>>>>>>
>>>>>> So create a new workqueue, md_del_wq, and a new work_struct,
>>>>>> mddev->sync_work, so we can keep two classes of work separate.
>>>>>>
>>>>>> md_del_wq and ->del_work are used only for destroying rdev
>>>>>> and mddev.
>>>>>> md_misc_wq is used for event_work and sync_work.
>>>>>>
>>>>>> Also document the purpose of each flush_workqueue() call.
>>>>>>
>>>>>> This removes the lockdep warning.
>>>>>
>>>>> I had exactly the same patch queued internally,
>>>>
>>>> Cool :-)
>>>>
>>>>>                                                   but the mdadm test suite still
>>>>> shows a lockdep warning. I haven't had time to check further.
>>>>>
>>>>
>>>> The only other lockdep warning I've seen since was some ext4 thing, though I
>>>> haven't tried the full test suite.  I might have a look tomorrow.
>>>
>>> I'm also seeing a lockdep warning with or without this patch,
>>> reproducible with:
>>>
>> 
>> Thanks!
>> Looks like using one workqueue for mddev->del_work and rdev->del_work
>> causes problems.
>> Can you try with this addition please?
>
> It helped for that case but now there is another warning triggered by:
>
> export IMSM_NO_PLATFORM=1 # for platforms without IMSM
> mdadm -C /dev/md/imsm0 -eimsm -n4 /dev/sd[a-d] -R
> mdadm -C /dev/md/vol0 -l5 -n4 /dev/sd[a-d] -R --assume-clean
> mdadm -If sda
> mdadm -a /dev/md127 /dev/sda
> mdadm -Ss

I tried that ... and mdmon gets a SIGSEGV.
imsm_set_disk() calls get_imsm_disk() and gets a NULL back.
It then passes the NULL to mark_failure() and that dereferences it.
(Even worse things happen if "CREATE names=yes" appears in mdadm.conf;
I should fix that.)

I added
  if (!disk)
     return;

and ran it in a loop, and got the lockdep warning.
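
For anyone following along, the guard pattern is roughly the following. This is only a minimal sketch with hypothetical stand-in types and names (struct disk, struct container, lookup_disk(), mark_failed(), set_disk_failed()); it is not the actual mdadm code, where the functions involved are get_imsm_disk(), mark_failure() and imsm_set_disk().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real mdadm structures are much larger. */
struct disk {
	int failed;
};

struct container {
	struct disk *disks;
	size_t ndisks;
};

/* Returns NULL when idx does not name a known disk, similar to the
 * NULL that get_imsm_disk() was seen to return. */
static struct disk *lookup_disk(struct container *c, size_t idx)
{
	if (idx >= c->ndisks)
		return NULL;
	return &c->disks[idx];
}

/* Dereferences its argument unconditionally, so a NULL crashes here. */
static void mark_failed(struct disk *d)
{
	d->failed = 1;
}

/* Caller with the guard added: bail out instead of passing NULL on.
 * Returns 0 if the disk was marked, -1 if the lookup found nothing. */
static int set_disk_failed(struct container *c, size_t idx)
{
	struct disk *d = lookup_disk(c, idx);

	if (!d)
		return -1;
	mark_failed(d);
	return 0;
}
```

The guard only avoids the SIGSEGV so the reproducer can keep running; it is separate from the md.c change further down.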

This patch gets rid of it for me.  I need to think about it some more
before I commit to it though.

Thanks,
NeilBrown

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e1e7e8dc6878..874e4101721f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5373,8 +5373,11 @@ static int md_alloc(dev_t dev, char *name)
 
 static struct kobject *md_probe(dev_t dev, int *part, void *data)
 {
-	if (create_on_open)
+	if (create_on_open) {
+		/* Wait until bdev->bd_disk is definitely gone (mddev->del_work) */
+		flush_workqueue(md_del_wq);
 		md_alloc(dev, NULL);
+	}
 	return NULL;
 }
 
@@ -7383,8 +7386,6 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 		 * bd_disk.
 		 */
 		mddev_put(mddev);
-		/* Wait until bdev->bd_disk is definitely gone (mddev->del_work) */
-		flush_workqueue(md_del_wq);
 		/* Then retry the open from the top */
 		return -ERESTARTSYS;
 	}