public inbox for linux-raid@vger.kernel.org
 help / color / mirror / Atom feed
From: "heming.zhao@suse.com" <heming.zhao@suse.com>
To: Song Liu <song@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>,
	linux-raid <linux-raid@vger.kernel.org>,
	Guoqing Jiang <guoqing.jiang@cloud.ionos.com>,
	lidong.zhong@suse.com, Xiao Ni <xni@redhat.com>,
	NeilBrown <neilb@suse.de>, Coly Li <colyli@suse.com>
Subject: Re: [PATCH v2] md: don't create mddev in md_open
Date: Sat, 3 Apr 2021 10:45:27 +0800	[thread overview]
Message-ID: <98619c7a-6684-1727-e64e-70269fe2cd89@suse.com> (raw)
In-Reply-To: <CAPhsuW6RKA66Nj6ncW3+dmx6tjEzP4yZnifQiyHPoy1NSdM_7w@mail.gmail.com>

On 4/3/21 12:01 AM, Song Liu wrote:
> On Fri, Apr 2, 2021 at 1:58 AM heming.zhao@suse.com
> <heming.zhao@suse.com> wrote:
>>
>> On 4/2/21 1:58 PM, Song Liu wrote:
>>> On Thu, Apr 1, 2021 at 6:03 AM heming.zhao@suse.com
>>> <heming.zhao@suse.com> wrote:
>>>>
>>>> On 4/1/21 2:17 PM, Christoph Hellwig wrote:
>>>>> On Thu, Apr 01, 2021 at 09:34:56AM +0800, Zhao Heming wrote:
>>>>>> commit d3374825ce57 ("md: make devices disappear when they are no longer
>>>>>> needed.") introduced protection between mddev creating & removing. The
>>>>>> md_open shouldn't create mddev when all_mddevs list doesn't contain
>>>>>> mddev. With currently code logic, there will be very easy to trigger
>>>>>> soft lockup in non-preempt env.
>>>>>
>>>>> As mention below, please don't make this even more of a mess than it
>>>>> needs to.  We can just pick the two patches doing this from the series
>>>>> I sent:
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> I already got your meaning on previously email.
>>>> I sent v2 patch for Song's review comment. My patch is bugfix, it may need
>>>> to back port into branch maintenance.
>>>>
>>>> Your attachment patch files is partly my patch.
>>>> The left part is in md_open (response [PATCH 01/15] md: remove the code to flush
>>>> an old instance in md_open)
>>>> I still think you directly use bdev->bd_disk->private_data as mddev pointer is not safe.
>>>>
>>>
>>> Hi Christoph and Heming,
>>>
>>> Trying to understand the whole picture here. Please let me know if I
>>> misunderstood anything.
>>>
>>> IIUC, the primary goal of Christoph's set is to get rid of the
>>> ERESTARTSYS hack from md,
>>> and eventually move bd_mutext. 02/15 to 07/15 of this set cleans up
>>> code in md.c, and they
>>> are not very important for the rest of the set (is this correct?).
>>>
>>> Heming, you mentioned "the solution may simply return -EBUSY (instead
>>> of -ENODEV) to
>>> fail the open path". Could you please show the code? Maybe that would
>>> be enough to unblock
>>> the second half of Christoph's set (08/15 to 15/15)?
>>>
>>
>> I already sent the whole picture of md_open (email data: 2021-4-1,
>> subject: Re: [PATCH] md: don't create mddev in md_open).
>> My mail may fail to be delivered (at least, I got the "can't be delivered"
>> info on my last mail for linux-raid & guoqing's address). I put the related
>> email on attachment, anyone can read it again.
>>
>> For the convenience of discussion, I also pasted **patched** md_open below.
>> (I added more comments than enclosed email)
>>
>> ```
>> static int md_open(struct block_device *bdev, fmode_t mode)
>> {
>>      /* section 1 */
>>      struct mddev *mddev = mddev_find(bdev->bd_dev); //hm: the new style, only do searching job
>>      int err;
>>
>>      if (!mddev) //hm: this will cover freeing path 2.2 (refer enclosed file)
>>          return -ENODEV;
>>
>>      if (mddev->gendisk != bdev->bd_disk) { //hm: for freeing path 2.1 (refer enclosed file)
>>          /* we are racing with mddev_put which is discarding this
>>           * bd_disk.
>>           */
>>          mddev_put(mddev);
>>          /* Wait until bdev->bd_disk is definitely gone */
>>          if (work_pending(&mddev->del_work))
>>              flush_workqueue(md_misc_wq);
>>          return -EBUSY; //hm: fail this path. it also makes __blkdev_get return fail, userspace can try later.
>>                         //
>>                         // the legacy code flow:
>>                         //   return -ERESTARTSYS here, later __blkdev_get reentry.
>>                         //   it will trigger first 'if' in this functioin, then
>>                         //   return -ENODEV.
>>      }
>>
>>      /* section 2 */
>>      /* hm: below same as Christoph's [PATCH 01/15] */
>>      err = mutex_lock_interruptible(&mddev->open_mutex);
>>      if (err)
>>          return err;
>>
>>      if (test_bit(MD_CLOSING, &mddev->flags)) {
>>          mutex_unlock(&mddev->open_mutex);
>>          return -ENODEV;
>>      }
>>
>>      mddev_get(mddev);
>>      atomic_inc(&mddev->openers);
>>      mutex_unlock(&mddev->open_mutex);
>>
>>      bdev_check_media_change(bdev);
>>      return 0;
>> }
>> ```
>>
>> I wrote again:
>>> Christoph's patch [01/15] totally dropped <section 1>, and use bdev->bd_disk->private_data
>>> to get mddev pointer, it's not safe.
>>
>> And with above **patched** md_open, Christoph's patches 02-07 can be work happily.
>> He only needs to adjust/modify 01 patch.
>> The md layer behavior will change from return -ENODEV to -EBUSY in some racing scenario.
> 
> Thanks for the explanation.
> 
> Could you please send official patch of modified 01/15 and your fix
> (either in a set or merge into
> one patch)? This patch(set) would go to stable. Then, we can apply
> Christoph's 02 - 15 on top of
> it.
> 
> Thanks,
> Song
> 

Ok, I will resend a patch, which will replace my patch & Chistoph's patch 01.
Then Christoph could re-send 02-07 as v2 patch.
My patch need to work with Christoph's [PATCH 04/15] (md: split mddev_find) to fix soft lockup.
I will mention the relationship in my patch.

Thanks,
Heming


      reply	other threads:[~2021-04-03  2:45 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01  1:34 [PATCH v2] md: don't create mddev in md_open Zhao Heming
2021-04-01  6:17 ` Christoph Hellwig
2021-04-01 13:01   ` heming.zhao
2021-04-02  5:58     ` Song Liu
2021-04-02 10:48       ` Christoph Hellwig
     [not found]       ` <b2288ab4-1da0-612d-8988-637cc33db99a@suse.com>
2021-04-02 16:01         ` Song Liu
2021-04-03  2:45           ` heming.zhao [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98619c7a-6684-1727-e64e-70269fe2cd89@suse.com \
    --to=heming.zhao@suse.com \
    --cc=colyli@suse.com \
    --cc=guoqing.jiang@cloud.ionos.com \
    --cc=hch@infradead.org \
    --cc=lidong.zhong@suse.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=song@kernel.org \
    --cc=xni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox