From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
To: Song Liu <song@kernel.org>
Cc: linux-raid@vger.kernel.org, Yu Kuai <yukuai3@huawei.com>,
AceLan Kao <acelan@gmail.com>
Subject: Re: [PATCH v2] md: do not _put wrong device in md_seq_next
Date: Wed, 13 Sep 2023 10:26:21 +0200 [thread overview]
Message-ID: <20230913102621.00001039@linux.intel.com> (raw)
In-Reply-To: <CAPhsuW5BsMP0vQo1nF44qTEtVoo5PaikfVT3jTDui5r94om2_Q@mail.gmail.com>
On Tue, 12 Sep 2023 15:49:12 -0700
Song Liu <song@kernel.org> wrote:
> On Tue, Sep 12, 2023 at 6:02 AM Mariusz Tkaczyk
> <mariusz.tkaczyk@linux.intel.com> wrote:
> >
> > During working on changes proposed by Kuai [1], I determined that
> > mddev->active is continusly decremented for array marked by MD_CLOSING.
> > It brought me to md_seq_next() changed by [2]. I determined the regression
> > here, if mddev_get() fails we updated mddev pointer and as a result we
> > _put failed device.
> >
> > I isolated the change in md_seq_next() and tested it- issue is no longer
> > reproducible but I don't see the root cause in this scenario. The bug
> > is obvious so I proceed with fixing. I will submit MD_CLOSING patches
> > separatelly.
> >
> > Put the device which has been _get with previous md_seq_next() call.
> > Add guard for inproper mddev_put usage(). It shouldn't be called if
> > there are less than 1 for mddev->active.
> >
> > I didn't convert atomic_t to refcount_t because refcount warns when 0 is
> > achieved which is likely to happen for mddev->active.
> >
> > [1]
> > https://lore.kernel.org/linux-raid/028a21df-4397-80aa-c2a5-7c754560f595@gmail.com/T/#m6a534677d9654a4236623661c10646d45419ee1b
> > [2] https://bugzilla.kernel.org/show_bug.cgi?id=217798
> >
> > Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk
> > is freed") Cc: Yu Kuai <yukuai3@huawei.com>
> > Cc: AceLan Kao <acelan@gmail.com>
> > Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> > ---
> > drivers/md/md.c | 20 +++++++++++---------
> > 1 file changed, 11 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 0fe7ab6e8ab9..bb654ff62765 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -618,6 +618,12 @@ static void mddev_delayed_delete(struct work_struct
> > *ws);
> >
> > void mddev_put(struct mddev *mddev)
> > {
> > + /* Guard for ambiguous put. */
> > + if (unlikely(atomic_read(&mddev->active) < 1)) {
> > + pr_warn("%s: active refcount is lower than 1\n",
> > mdname(mddev));
> > + return;
> > + }
> > +
>
> Could you please explain why we need this guard? We should probably fix
> the caller that causes this.
We have an issue, so I added warning to catch similar mistakes. Please
note that for refcount_t such warning is implemented.
If you don't see it useful I can remove it.
>
> > if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
> > return;
> > if (!mddev->raid_disks && list_empty(&mddev->disks) &&
> > @@ -8227,19 +8233,16 @@ static void *md_seq_next(struct seq_file *seq, void
> > *v, loff_t *pos) {
> > struct list_head *tmp;
> > struct mddev *next_mddev, *mddev = v;
> > - struct mddev *to_put = NULL;
>
> IIUC, all we need is the following:
>
> diff --git i/drivers/md/md.c w/drivers/md/md.c
> index 73758b754127..a104a025084d 100644
> --- i/drivers/md/md.c
> +++ w/drivers/md/md.c
> @@ -8265,7 +8265,7 @@ static void *md_seq_next(struct seq_file *seq,
> void *v, loff_t *pos)
> spin_unlock(&all_mddevs_lock);
>
> if (to_put)
> - mddev_put(mddev);
> + mddev_put(to_put);
> return next_mddev;
>
> }
>
> Is this sufficient? If so, how about we ship this first and refactor
> the code later
> in a separate patch?
That is correct it should be sufficient. If you don't want it in one patch I
will drop refactor because Kuai re-implemented it already.
Anyway, changes I made are simple and tested, I don't see it risk in it.
Your proposal will make merge conflicts less likely to appear. I that matters I
can simplify the patch.
Please let me know what you think, then I will send v3.
Thanks,
Mariusz
next prev parent reply other threads:[~2023-09-13 8:26 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-12 13:01 [PATCH v2] md: do not _put wrong device in md_seq_next Mariusz Tkaczyk
2023-09-12 13:25 ` Yu Kuai
2023-09-12 13:40 ` Mariusz Tkaczyk
2023-09-12 20:24 ` Song Liu
2023-09-12 22:49 ` Song Liu
2023-09-13 8:26 ` Mariusz Tkaczyk [this message]
2023-09-13 16:22 ` Song Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230913102621.00001039@linux.intel.com \
--to=mariusz.tkaczyk@linux.intel.com \
--cc=acelan@gmail.com \
--cc=linux-raid@vger.kernel.org \
--cc=song@kernel.org \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.