From: Yu Kuai <yukuai1@huaweicloud.com>
To: Xueshi Hu <xueshi.hu@smartx.com>, song@kernel.org
Cc: linux-raid@vger.kernel.org, yukuai1@huaweicloud.com,
"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
Date: Thu, 20 Jul 2023 09:36:39 +0800
Message-ID: <a3a45aa9-a54c-51ee-8a80-b663a418dc29@huaweicloud.com>
In-Reply-To: <20230719070954.3084379-2-xueshi.hu@smartx.com>
Hi,

On 2023/07/19 15:09, Xueshi Hu wrote:
> When an IO error happens, reschedule_retry() will increase
> r1conf::nr_queued, which makes freeze_array() unblocked. However, before
> all r1bio in the memory pool are released, the memory pool should not be
> modified. Introduce freeze_array_totally() to solve the problem. Compared
> to freeze_array(), it's more strict because any in-flight io needs to
> complete including queued io.
>
> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> ---
> drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
> 1 file changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index dd25832eb045..5605c9680818 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, int extra)
> /* Stop sync I/O and normal I/O and wait for everything to
> * go quiet.
> * This is called in two situations:
> - * 1) management command handlers (reshape, remove disk, quiesce).
> + * 1) management command handlers (remove disk, quiesce).
> * 2) one normal I/O request failed.
>
> * After array_frozen is set to 1, new sync IO will be blocked at
> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
> wake_up(&conf->wait_barrier);
> }
>
> +/* conf->resync_lock should be held */
> +static int get_pending(struct r1conf *conf)
> +{
> + int idx, ret;
> +
> + ret = atomic_read(&conf->nr_sync_pending);
> + for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> + ret += atomic_read(&conf->nr_pending[idx]);
> +
> + return ret;
> +}
> +
> +static void freeze_array_totally(struct r1conf *conf)
> +{
> + /*
> + * freeze_array_totally() is almost the same with freeze_array() except
> + * it requires there's no queued io. Raid1's reshape will destroy the
> + * old mempool and change r1conf::raid_disks, which are necessary when
> + * freeing the queued io.
> + */
> + spin_lock_irq(&conf->resync_lock);
> + conf->array_frozen = 1;
> + raid1_log(conf->mddev, "freeze totally");
> + wait_event_lock_irq_cmd(
> + conf->wait_barrier,
> + get_pending(conf) == 0,
> + conf->resync_lock,
> + md_wakeup_thread(conf->mddev->thread));
> + spin_unlock_irq(&conf->resync_lock);
> +}
> +
> static void alloc_behind_master_bio(struct r1bio *r1_bio,
> struct bio *bio)
> {
> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
> return -ENOMEM;
> }
>
> - freeze_array(conf, 0);
> + freeze_array_totally(conf);
I think this is wrong: raid1_reshape() is called with 'reconfig_mutex'
grabbed, and this will deadlock because failed IO needs this lock to be
handled by the daemon thread (see details in [1]).

Be aware: never hold 'reconfig_mutex' while waiting for IO.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
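To spell the cycle out, it looks roughly like this (a schematic sketch of the scenario described above, not actual kernel code; the function names follow drivers/md/raid1.c):

```
raid1_reshape()                      daemon thread (raid1d)
  [caller holds reconfig_mutex]
  freeze_array_totally(conf)
    waits for get_pending() == 0     handling the failed r1bio
                                     requires reconfig_mutex
  -> never wakes up: the queued IO that keeps nr_pending
     non-zero can only be retired by the daemon thread,
     which in turn is blocked on the mutex that the
     raid1_reshape() caller already holds.
```

This is why waiting for all in-flight IO (including queued IO) under 'reconfig_mutex' cannot be safe, regardless of how the wait itself is implemented.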
>
> /* ok, everything is stopped */
> oldpool = conf->r1bio_pool;
>