public inbox for linux-raid@vger.kernel.org
From: "heming.zhao@suse.com" <heming.zhao@suse.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-raid@vger.kernel.org, song@kernel.org,
	guoqing.jiang@cloud.ionos.com, lidong.zhong@suse.com,
	xni@redhat.com, neilb@suse.de, colyli@suse.com
Subject: Re: [PATCH] md: don't create mddev in md_open
Date: Thu, 1 Apr 2021 00:42:04 +0800
Message-ID: <7bef7b86-ad8b-b503-59dc-0c9c69974237@suse.com>
In-Reply-To: <20210331065512.GA987842@infradead.org>

On 3/31/21 2:55 PM, Christoph Hellwig wrote:
>> -static struct mddev *mddev_find(dev_t unit)
>> +static struct mddev *mddev_find(dev_t unit, bool create)
> 
> This just makes the mess that is mddev_find even worse.  Please take
> a look at the patches at the beginning of the
> 
>    "move bd_mutex to the gendisk"
> 
> series to try to clean this up properly.
> 

Hello Christoph,

Because your patch series is related to this md issue, I am using this mail thread to discuss it.
If you or others think the To & Cc lists should be extended, please do so.

If I understand the patch series correctly, the purpose of [PATCH 01/15]
is to remove the "return -ERESTARTSYS" path.

Currently, all of the race-handling code in md_open is the part below:

```md_open
     if (mddev->gendisk != bdev->bd_disk) {
         /* we are racing with mddev_put which is discarding this
          * bd_disk.
          */
         mddev_put(mddev);
         /* Wait until bdev->bd_disk is definitely gone */
         if (work_pending(&mddev->del_work))
             flush_workqueue(md_misc_wq);
         /* Then retry the open from the top */
         return -ERESTARTSYS;
     }
```

mddev is removed from the md internal list in mddev_put; this function is
what kicks off the job that discards the mddev.
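
To make that step concrete, here is a minimal user-space sketch of "drop the last
reference, unlink from the list, hand the rest to deferred work". Everything in it
(struct fake_mddev, fake_add, fake_find, fake_put) is a hypothetical stand-in, not
the kernel code; it ignores locking and the extra conditions the real mddev_put checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mddev; not the kernel type. */
struct fake_mddev {
    int unit;                 /* device number used as the lookup key */
    int active;               /* reference count */
    bool del_work_pending;    /* models "del_work has been queued" */
    struct fake_mddev *next;  /* link in the internal list */
};

static struct fake_mddev *all_devs;  /* models the md internal list */

static void fake_add(struct fake_mddev *m)
{
    m->next = all_devs;
    all_devs = m;
}

/* Search only: returns the entry with an extra reference, or NULL. */
static struct fake_mddev *fake_find(int unit)
{
    for (struct fake_mddev *m = all_devs; m; m = m->next) {
        if (m->unit == unit) {
            m->active++;
            return m;
        }
    }
    return NULL;
}

/* Drop a reference; on the last one, unlink and mark the delete job queued. */
static void fake_put(struct fake_mddev *m)
{
    if (--m->active > 0)
        return;

    struct fake_mddev **pp = &all_devs;
    while (*pp && *pp != m)
        pp = &(*pp)->next;
    if (*pp)
        *pp = m->next;        /* gone from the list: later lookups miss it */
    m->next = NULL;
    m->del_work_pending = true;  /* the actual teardown happens later */
}
```

The point of the sketch is the ordering: after the last fake_put the entry can no
longer be found, yet the teardown has only been scheduled, not finished. That gap
is the window the freeing-path cases are about.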

Let's focus only on the "mddev->gendisk != bdev->bd_disk" case. There are 2 paths:
1> in the creating path
This path is impossible to trigger: the userspace md device (/dev/mdX) is only valid
after md_alloc completes successfully, and at that time mddev->gendisk must equal
bdev->bd_disk.

2> in the freeing path (this is the case Neil's patch really cared about)
2.1>
md_open runs before mddev is removed from the md internal list.
Neil wanted to wait for the queued work to finish the cleanup job, then return -ERESTARTSYS.
On the next attempt, md_open should find that the mddev is NULL and return -ENODEV
(but in the real world mddev_find allocates a new one; this is a bug, and it is
not what Neil intended).
Your [PATCH 01/15] breaks this rule: it will mistakenly call mddev_get and block the cleanup job.
In my opinion, the solution may be to simply return -EBUSY (instead of -ENODEV) to
fail the open path. (I will show the code later.)

2.2>
Neil's patch has a bug (as I said in 2.1), which shows up in the following case:
md_open is called after mddev_put has removed the mddev but before md_free() has finished.
At this point the mddev no longer exists in the md internal list, but bdev->bd_disk still holds
the mddev pointer. This scenario must not return -ERESTARTSYS, because that makes __blkdev_get
call md_open indefinitely and triggers a soft lockup.
This case can be fixed by calling mddev_find without creating an mddev. That corresponds to
your new [PATCH 04/15], where mddev_find only does the search.
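
The 2.2 window can be modelled in user space to show why the retry never ends when
the lookup creates on a miss. This is only a sketch under assumptions: sketch_state,
sketch_open and sketch_blkdev_get are hypothetical names, the retry cap exists only
so the model terminates, and SKETCH_ERESTARTSYS mirrors the kernel-internal value
because userspace errno.h does not define it.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_ERESTARTSYS 512  /* kernel-internal code, redefined for this model */

/* Model of one md unit during the 2.2 window. */
struct sketch_state {
    bool on_list;       /* is an mddev for this unit on the internal list? */
    bool matches_disk;  /* does that mddev's gendisk equal bdev->bd_disk? */
};

/* Models md_open; 'create' selects create-on-miss versus search-only lookup. */
static int sketch_open(struct sketch_state *s, bool create)
{
    if (!s->on_list) {
        if (!create)
            return -ENODEV;          /* search-only lookup: clean failure */
        /* create-on-miss: a fresh mddev appears, unrelated to the stale disk */
        s->on_list = true;
        s->matches_disk = false;
    }
    if (!s->matches_disk) {
        s->on_list = false;          /* models the put that discards it again */
        return -SKETCH_ERESTARTSYS;  /* "retry the open from the top" */
    }
    return 0;
}

/* Models a caller that reopens while it sees -ERESTARTSYS (capped here). */
static int sketch_blkdev_get(struct sketch_state *s, bool create,
                             int max_tries, int *tries)
{
    int err = -SKETCH_ERESTARTSYS;

    *tries = 0;
    while (err == -SKETCH_ERESTARTSYS && *tries < max_tries) {
        err = sketch_open(s, create);
        (*tries)++;
    }
    return err;
}
```

With create set to true the model uses every allowed attempt and still ends on
-SKETCH_ERESTARTSYS; with create set to false the first attempt fails with -ENODEV.
In the kernel there is no cap, which is where the soft lockup comes from.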

Finally, the code (based on your [PATCH 01/15]) might look like:
```
static int md_open(struct block_device *bdev, fmode_t mode)
{
     /* ...  */
     struct mddev *mddev = mddev_find(bdev->bd_dev); //hm: the new, only do searching job
     int err;

     if (!mddev) //hm: this will cover freeing path 2.2
         return -ENODEV;

     if (mddev->gendisk != bdev->bd_disk) { //hm: for freeing path 2.1
         /* we are racing with mddev_put which is discarding this
          * bd_disk.
          */
         mddev_put(mddev);
         /* Wait until bdev->bd_disk is definitely gone */
         if (work_pending(&mddev->del_work))
             flush_workqueue(md_misc_wq);
         return -EBUSY; //hm: fail this path. userspace can try later and get -ENODEV.
     }

     /* hm: below same as [PATCH 01/15]*/
     err = mutex_lock_interruptible(&mddev->open_mutex);
     if (err)
         return err;

     if (test_bit(MD_CLOSING, &mddev->flags)) {
         mutex_unlock(&mddev->open_mutex);
         return -ENODEV;
     }

     mddev_get(mddev);
     atomic_inc(&mddev->openers);
     mutex_unlock(&mddev->open_mutex);

     bdev_check_media_change(bdev);
     return 0;
}
```
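
For the userspace side of "try later and get -ENODEV", a caller could wrap open(2)
roughly as below. This is a sketch only: open_retry_busy is a hypothetical helper,
and the retry count and delay are arbitrary choices by the caller, not something
the md code dictates.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict -std modes */
#endif
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: retry open(2) only while it fails with EBUSY.
 * Returns a file descriptor >= 0 on success, or -errno of the last failure.
 */
static int open_retry_busy(const char *path, int flags,
                           int max_tries, long delay_ms)
{
    struct timespec delay = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L,
    };
    int last = -EBUSY;

    for (int i = 0; i < max_tries; i++) {
        int fd = open(path, flags);

        if (fd >= 0)
            return fd;
        last = -errno;
        if (last != -EBUSY)
            break;                  /* e.g. -ENODEV: the device is really gone */
        nanosleep(&delay, NULL);    /* give the pending cleanup time to finish */
    }
    return last;
}
```

A caller would use it as open_retry_busy("/dev/mdX", O_RDONLY, 5, 100) and treat
any negative result other than -EBUSY as final.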

Thanks,
heming

