linux-raid.vger.kernel.org archive mirror
From: Christoph Hellwig <hch@infradead.org>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	Song Liu <song@kernel.org>, Christoph Hellwig <hch@infradead.org>,
	Guoqing Jiang <guoqing.jiang@linux.dev>, Xiao Ni <xni@redhat.com>,
	Stephen Bates <sbates@raithlin.com>,
	Martin Oliveira <Martin.Oliveira@eideticom.com>,
	David Sloan <David.Sloan@eideticom.com>
Subject: Re: [PATCH v1 14/15] md: Ensure resync is reported after it starts
Date: Sat, 21 May 2022 04:51:49 -0700
Message-ID: <YojSVXNF1ITIhlUl@infradead.org>
In-Reply-To: <20220519191311.17119-15-logang@deltatee.com>

On Thu, May 19, 2022 at 01:13:10PM -0600, Logan Gunthorpe wrote:
> The 07layouts test in mdadm fails on some systems. The failure
> presents itself as the backup file not being removed before the next
> layout is grown into:
> 
>   mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup:
>       File exists
> 
> This is because the background mdadm process, which is responsible for
> cleaning up this backup file, gets into an infinite loop waiting for
> the reshape to start. mdadm checks the mdstat file to see whether a
> reshape is in progress and, if it is not, waits for an event on the
> file or times out after 5 seconds. On faster machines, the reshape may
> complete before the 5-second timeout expires, and thus the background
> mdadm process loops waiting for a reshape to start that has already
> occurred.
> 
> mdadm reads the mdstat file to start, but mdstat does not report that the
> reshape has begun, even though it has. So the mdstat_wait() call in
> mdadm, which polls on the mdstat file, does not return until it times
> out.
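The wait described above can be sketched with poll(2). This is not mdadm's mdstat_wait(); the helper names below are hypothetical, and the sketch assumes (an assumption, not taken from the patch) that md signals a change to an already-read /proc/mdstat as an exceptional condition (POLLPRI) and that a reshape in progress shows up as the word "reshape" in the file's text. The round limit exists only so the sketch terminates; the loop described in the patch has none.

```c
#include <poll.h>
#include <string.h>

/* Hypothetical check: does the mdstat text mention a reshape? */
int mdstat_text_shows_reshape(const char *text)
{
	return strstr(text, "reshape") != NULL;
}

/*
 * Hypothetical helper: wait up to timeout_ms for an exceptional
 * condition (POLLPRI) on fd.
 * Returns 1 if something was signalled, 0 on timeout, -1 on error.
 */
int wait_for_mdstat_event(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0 || (ret > 0 && (pfd.revents & POLLNVAL)))
		return -1;
	return ret > 0;
}

/*
 * Hypothetical loop: re-check the status text, then wait for an event
 * or a timeout, for at most max_rounds rounds. If the reshape starts
 * and finishes without the text ever mentioning it, every round ends
 * in a timeout and the function returns 0.
 */
int wait_for_reshape_start(int fd, const char *(*read_status)(void),
			   int timeout_ms, int max_rounds)
{
	for (int i = 0; i < max_rounds; i++) {
		if (mdstat_text_shows_reshape(read_status()))
			return 1;
		if (wait_for_mdstat_event(fd, timeout_ms) < 0)
			return -1;
	}
	return 0;
}
```

If read_status() never returns text mentioning the reshape, each round costs a full timeout, which is the behaviour the commit message describes.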
> 
> mdstat fails to report that the reshape has started because of an issue
> in status_resync(): recovery_active is subtracted from curr_resync, which
> results in a value of zero for the first chunk of reshaped data, so the
> resulting read reports no reshape in progress.
> 
> To fix this, if "resync - recovery_active" is zero, force the value to
> 4 so the code reports a resync in progress.
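A minimal model of the arithmetic described above, covering only the two branches whose conditions are fully visible in the hunk below: the clamp to max_sectors and the subtraction of recovery_active. The function names, the sector_t typedef and the sample sector counts are hypothetical; the earlier branch of status_resync() that handles small or finished curr_resync values is left out, and a result of zero is treated as "nothing reported", as the commit message describes. This is an illustration, not the md.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's sector_t. */
typedef uint64_t sector_t;

/*
 * Pre-patch behaviour of the in-progress branches. Assumes the caller
 * has already excluded the cases handled by the first branch of
 * status_resync(), and that curr_resync >= recovery_active.
 */
sector_t reported_resync_old(sector_t curr_resync, sector_t recovery_active,
			     sector_t max_sectors)
{
	if (curr_resync > max_sectors)
		return max_sectors;
	return curr_resync - recovery_active;
}

/* Same computation with the patch applied: a zero result becomes 4. */
sector_t reported_resync_new(sector_t curr_resync, sector_t recovery_active,
			     sector_t max_sectors)
{
	sector_t resync = reported_resync_old(curr_resync, recovery_active,
					      max_sectors);

	/* Only the subtraction branch is changed by the patch. */
	if (curr_resync <= max_sectors && resync == 0)
		resync = 4;
	return resync;
}

/* Simplification from the commit message: zero means "no resync shown". */
bool reports_progress(sector_t resync)
{
	return resync != 0;
}
```

For example, with curr_resync and recovery_active both 1024 (the first chunk still in flight), reported_resync_old() returns 0 and reports_progress() is false, while reported_resync_new() returns 4.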
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> ---
>  drivers/md/md.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 8273ac5eef06..dbac63c8e35c 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8022,10 +8022,18 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
>  		if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
>  			/* Still cleaning up */
>  			resync = max_sectors;
> -	} else if (resync > max_sectors)
> +	} else if (resync > max_sectors) {
>  		resync = max_sectors;
> -	else
> +	} else {
>  		resync -= atomic_read(&mddev->recovery_active);
> +		if (!resync) {
> +			/*
> +			 * Resync has started, but if it's zero, ensure
> +			 * it is still reported, by forcing it to be 4
> +			 */
> +			resync = 4;

Where does this magic 4 come from?


Thread overview: 41+ messages
2022-05-19 19:12 [PATCH v1 00/15] Bug fixes for mdadm tests Logan Gunthorpe
2022-05-19 19:12 ` [PATCH v1 01/15] md/raid5-log: Drop extern decorators for function prototypes Logan Gunthorpe
2022-05-21 11:36   ` Christoph Hellwig
2022-05-19 19:12 ` [PATCH v1 02/15] md/raid5-cache: Refactor r5c_is_writeback() to take a struct r5conf Logan Gunthorpe
2022-05-21 11:37   ` Christoph Hellwig
2022-05-19 19:12 ` [PATCH v1 03/15] md/raid5-cache: Refactor r5l_start() " Logan Gunthorpe
2022-05-21 11:37   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 04/15] md/raid5-cache: Refactor r5l_flush_stripe_to_raid() " Logan Gunthorpe
2022-05-21 11:38   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 05/15] md/raid5-cache: Refactor r5l_wake_reclaim() " Logan Gunthorpe
2022-05-21 11:38   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 06/15] md/raid5-cache: Refactor remaining functions to take a r5conf Logan Gunthorpe
2022-05-21 11:40   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 07/15] md/raid5-ppl: Drop unused argument from ppl_handle_flush_request() Logan Gunthorpe
2022-05-21 11:41   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 08/15] md/raid5-cache: Pass the log through to r5c_finish_cache_stripe() Logan Gunthorpe
2022-05-21 11:42   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 09/15] md/raid5-cache: Don't pass conf to r5c_calculate_new_cp() Logan Gunthorpe
2022-05-21 11:42   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 10/15] md/raid5-cache: Take struct r5l_log in r5c_log_required_to_flush_cache() Logan Gunthorpe
2022-05-21 11:43   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 11/15] md/raid5: Ensure array is suspended for calls to log_exit() Logan Gunthorpe
2022-05-21 11:44   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 12/15] md/raid5-cache: Add RCU protection to conf->log accesses Logan Gunthorpe
2022-05-21 11:50   ` Christoph Hellwig
2022-05-22  7:31   ` Donald Buczek
2022-05-23  6:47     ` Song Liu
2022-05-23 18:15       ` Song Liu
2022-05-24 16:14         ` Logan Gunthorpe
2022-05-24 15:59       ` Logan Gunthorpe
2022-05-24 18:13         ` Song Liu
2022-05-22  7:32   ` Donald Buczek
2022-05-24 15:55     ` Logan Gunthorpe
2022-05-19 19:13 ` [PATCH v1 13/15] md/raid5-cache: Annotate pslot with __rcu notation Logan Gunthorpe
2022-05-21 11:51   ` Christoph Hellwig
2022-05-19 19:13 ` [PATCH v1 14/15] md: Ensure resync is reported after it starts Logan Gunthorpe
2022-05-21 11:51   ` Christoph Hellwig [this message]
2022-05-24 15:45     ` Logan Gunthorpe
2022-05-19 19:13 ` [PATCH v1 15/15] md: Notify sysfs sync_completed in md_reap_sync_thread() Logan Gunthorpe
2022-05-21 11:52   ` Christoph Hellwig
2022-05-23  6:28 ` [PATCH v1 00/15] Bug fixes for mdadm tests Song Liu
