linux-raid.vger.kernel.org archive mirror
From: Yu Kuai <yukuai1@huaweicloud.com>
To: Xueshi Hu <xueshi.hu@smartx.com>, song@kernel.org
Cc: linux-raid@vger.kernel.org, yukuai1@huaweicloud.com,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape
Date: Thu, 20 Jul 2023 09:36:39 +0800	[thread overview]
Message-ID: <a3a45aa9-a54c-51ee-8a80-b663a418dc29@huaweicloud.com> (raw)
In-Reply-To: <20230719070954.3084379-2-xueshi.hu@smartx.com>

Hi,

On 2023/07/19 15:09, Xueshi Hu wrote:
> When an IO error happens, reschedule_retry() will increase
> r1conf::nr_queued, which unblocks freeze_array(). However, the memory
> pool must not be modified before every r1bio allocated from it has
> been released. Introduce freeze_array_totally() to solve the problem.
> Compared to freeze_array(), it is stricter: all in-flight IO,
> including queued IO, must complete.
> 
> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> ---
>   drivers/md/raid1.c | 35 +++++++++++++++++++++++++++++++++--
>   1 file changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index dd25832eb045..5605c9680818 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1072,7 +1072,7 @@ static void freeze_array(struct r1conf *conf, int extra)
>   	/* Stop sync I/O and normal I/O and wait for everything to
>   	 * go quiet.
>   	 * This is called in two situations:
> -	 * 1) management command handlers (reshape, remove disk, quiesce).
> +	 * 1) management command handlers (remove disk, quiesce).
>   	 * 2) one normal I/O request failed.
>   
>   	 * After array_frozen is set to 1, new sync IO will be blocked at
> @@ -1111,6 +1111,37 @@ static void unfreeze_array(struct r1conf *conf)
>   	wake_up(&conf->wait_barrier);
>   }
>   
> +/* conf->resync_lock should be held */
> +static int get_pending(struct r1conf *conf)
> +{
> +	int idx, ret;
> +
> +	ret = atomic_read(&conf->nr_sync_pending);
> +	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++)
> +		ret += atomic_read(&conf->nr_pending[idx]);
> +
> +	return ret;
> +}
> +
> +static void freeze_array_totally(struct r1conf *conf)
> +{
> +	/*
> +	 * freeze_array_totally() is almost the same with freeze_array() except
> +	 * it requires there's no queued io. Raid1's reshape will destroy the
> +	 * old mempool and change r1conf::raid_disks, which are necessary when
> +	 * freeing the queued io.
> +	 */
> +	spin_lock_irq(&conf->resync_lock);
> +	conf->array_frozen = 1;
> +	raid1_log(conf->mddev, "freeze totally");
> +	wait_event_lock_irq_cmd(
> +			conf->wait_barrier,
> +			get_pending(conf) == 0,
> +			conf->resync_lock,
> +			md_wakeup_thread(conf->mddev->thread));
> +	spin_unlock_irq(&conf->resync_lock);
> +}
> +
>   static void alloc_behind_master_bio(struct r1bio *r1_bio,
>   					   struct bio *bio)
>   {
> @@ -3296,7 +3327,7 @@ static int raid1_reshape(struct mddev *mddev)
>   		return -ENOMEM;
>   	}
>   
> -	freeze_array(conf, 0);
> +	freeze_array_totally(conf);

I think this is wrong: raid1_reshape() is called with 'reconfig_mutex'
held, and this will deadlock, because failed IO needs that same lock
before the daemon thread can handle it (see details in [1]).

Be aware: never wait for IO while holding 'reconfig_mutex'.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=c4fe7edfc73f750574ef0ec3eee8c2de95324463
>   
>   	/* ok, everything is stopped */
>   	oldpool = conf->r1bio_pool;
> 



Thread overview: 24+ messages
2023-07-19  7:09 [PATCH v3 0/3] don't change mempool if in-flight r1bio exists Xueshi Hu
2023-07-19  7:09 ` [PATCH v3 1/3] md/raid1: freeze array more strictly when reshape Xueshi Hu
2023-07-20  1:36   ` Yu Kuai [this message]
2023-07-20  1:37     ` Yu Kuai
2023-07-31 14:02       ` Xueshi Hu
2023-08-01  1:24         ` Yu Kuai
2023-07-19  7:09 ` [PATCH v3 2/3] md/raid1: don't allow_barrier() before r1bio got freed Xueshi Hu
2023-07-20  1:47   ` Yu Kuai
2023-07-19  7:09 ` [PATCH v3 3/3] md/raid1: check array size before reshape Xueshi Hu
2023-07-19  7:38   ` Paul Menzel
2023-07-19 11:51     ` Xueshi Hu
2023-07-20  1:28       ` Yu Kuai
2023-07-28 14:42         ` Xueshi Hu
2023-07-29  0:58           ` Yu Kuai
2023-07-29  3:29             ` Xueshi Hu
2023-07-29  3:36               ` Yu Kuai
2023-07-29  3:51                 ` Yu Kuai
2023-07-29  6:16                   ` Xueshi Hu
2023-07-29  7:37                     ` Yu Kuai
2023-07-29 12:23                       ` Xueshi Hu
2023-07-31  1:03                         ` Yu Kuai
2023-07-31  3:48                           ` Xueshi Hu
2023-07-31  6:22                             ` Yu Kuai
2023-07-31 14:12                               ` Xueshi Hu
