linux-raid.vger.kernel.org archive mirror
From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	Song Liu <song@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>,
	Donald Buczek <buczek@molgen.mpg.de>,
	Guoqing Jiang <guoqing.jiang@linux.dev>, Xiao Ni <xni@redhat.com>,
	Stephen Bates <sbates@raithlin.com>,
	Martin Oliveira <Martin.Oliveira@eideticom.com>,
	David Sloan <David.Sloan@eideticom.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Christoph Hellwig <hch@lst.de>
Subject: [PATCH v3 09/11] md: Ensure resync is reported after it starts
Date: Thu,  2 Jun 2022 12:18:15 -0600
Message-ID: <20220602181818.50729-10-logang@deltatee.com>
In-Reply-To: <20220602181818.50729-1-logang@deltatee.com>

The 07layouts test in mdadm fails on some systems. The failure
presents itself as the backup file not being removed before the next
layout is grown into:

  mdadm: /dev/md0: cannot create backup file /tmp/md-test-backup:
      File exists

This is because the background mdadm process, which is responsible for
cleaning up this backup file, gets into an infinite loop waiting for
the reshape to start. mdadm checks the mdstat file to see whether a
reshape is in progress and, if it is not, waits for an event on the
file or times out after 5 seconds. On faster machines, the reshape may
complete before the 5 second timeout expires, and thus the background
mdadm process loops waiting for a reshape to start that has already
finished.

mdadm reads the mdstat file first, but mdstat does not report that the
reshape has begun, even though it has. So the mdstat_wait() call (in
mdadm), which polls on the mdstat file, never returns before timing
out.

The reason mdstat does not report that the reshape has started is an
issue in status_resync(). recovery_active is subtracted from
curr_resync, which results in a value of zero for the first chunk of
reshaped data, so the resulting read reports no reshape in progress.

To fix this, if "resync - recovery_active" yields one of the overloaded
special values (anything below MD_RESYNC_ACTIVE), force the value to be
MD_RESYNC_ACTIVE so the code reports a resync in progress.
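
For illustration only, the arithmetic can be modelled in user space
roughly as below. This is a sketch, not the kernel code: the function
name reported_resync() is hypothetical, the numeric enum values are
assumed to match the enum added in patch 08/11, and only the branch
taken once curr_resync has reached MD_RESYNC_ACTIVE is modelled.

```c
#include <assert.h>

typedef unsigned long long sector_t;

/* Assumed values; the names follow the enum from patch 08/11. */
enum {
	MD_RESYNC_NONE = 0,
	MD_RESYNC_YIELDED = 1,
	MD_RESYNC_DELAYED = 2,
	MD_RESYNC_ACTIVE = 3,
};

/*
 * Hypothetical helper: returns the position that the status code would
 * report for a resync that has already started (curr_resync is at or
 * above MD_RESYNC_ACTIVE).
 */
static sector_t reported_resync(sector_t curr_resync, sector_t max_sectors,
				sector_t recovery_active)
{
	sector_t resync = curr_resync;

	if (resync > max_sectors)
		return max_sectors;

	/* Assumption of this sketch: in-flight sectors never exceed the position. */
	assert(recovery_active <= resync);
	resync -= recovery_active;

	/*
	 * Without this clamp, the first chunk gives resync == 0, which is
	 * MD_RESYNC_NONE, and the status would claim no resync is running.
	 */
	if (resync < MD_RESYNC_ACTIVE)
		resync = MD_RESYNC_ACTIVE;

	return resync;
}
```

With the first chunk entirely in flight, for example
reported_resync(128, 1000, 128), the subtraction gives 0 and the clamp
turns that into MD_RESYNC_ACTIVE rather than MD_RESYNC_NONE.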

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0893029865eb..2be429874d18 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8022,10 +8022,20 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
 		if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
 			/* Still cleaning up */
 			resync = max_sectors;
-	} else if (resync > max_sectors)
+	} else if (resync > max_sectors) {
 		resync = max_sectors;
-	else
+	} else {
 		resync -= atomic_read(&mddev->recovery_active);
+		if (resync < MD_RESYNC_ACTIVE) {
+			/*
+			 * Resync has started, but the subtraction has
+			 * yielded one of the special values. Force it
+			 * to active to ensure the status reports an
+			 * active resync.
+			 */
+			resync = MD_RESYNC_ACTIVE;
+		}
+	}
 
 	if (resync == MD_RESYNC_NONE) {
 		if (test_bit(MD_RESYNCING_REMOTE, &mddev->recovery)) {
-- 
2.30.2

