linux-raid.vger.kernel.org archive mirror
From: Bart Van Assche <Bart.VanAssche@wdc.com>
To: "ming.lei@redhat.com" <ming.lei@redhat.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"jthumshirn@suse.de" <jthumshirn@suse.de>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
	"hch@lst.de" <hch@lst.de>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"oleksandr@natalenko.name" <oleksandr@natalenko.name>,
	"hare@suse.com" <hare@suse.com>,
	"shli@kernel.org" <shli@kernel.org>
Subject: Re: [PATCH v4 1/7] md: Make md resync and reshape threads freezable
Date: Wed, 27 Sep 2017 03:12:47 +0000
Message-ID: <1506481915.2822.9.camel@wdc.com>
In-Reply-To: <20170926145919.GC31449@ming.t460p>

On Tue, 2017-09-26 at 22:59 +0800, Ming Lei wrote:
> On Tue, Sep 26, 2017 at 02:42:07PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-09-26 at 19:17 +0800, Ming Lei wrote:
> > > Just test this patch a bit and the following failure of freezing task
> > > is triggered during suspend: [ ... ]
> > 
> > What kernel version did you start from and which patches were applied on top of
> > that kernel? Only patch 1/7 or all seven patches? What storage configuration did
> 
> It is v4.14-rc1+, and top commit is 8d93c7a43157, with all your 7 patches
> applied.
> 
> > you use in your test and what command(s) did you use to trigger suspend?
> 
> Follows my pm test script:
> 
> 	#!/bin/sh
> 	
> 	echo check > /sys/block/md127/md/sync_action
> 	
> 	mkfs.ext4 -F /dev/md127
> 	
> 	mount /dev/md0 /mnt/data
> 	
> 	dd if=/dev/zero of=/mnt/data/d1.img bs=4k count=128k&
> 	
> 	echo 9 > /proc/sys/kernel/printk
> 	echo devices > /sys/power/pm_test
> 	echo mem > /sys/power/state
> 	
> 	wait
> 	umount /mnt/data
> 
> Storage setting:
> 
> 	sudo mdadm --create /dev/md/test /dev/sda /dev/sdb --level=1 --raid-devices=2
> 	both /dev/sda and /dev/sdb are virtio-scsi.

Thanks for the detailed reply. I have been able to reproduce the freeze failure
you reported. The output of SysRq-t showed me that the md reboot notifier was
waiting for the frozen md sync thread and that this caused the freeze failure. So
I have started testing the patch below instead of the patch at the start of this
e-mail thread:
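
To make the ordering problem concrete, here is a toy single-threaded model in C. It is a sketch under simplifying assumptions, not kernel code. It assumes that the waiter (the md reboot notifier in the SysRq-t output) must return before the freeze can complete, and that a frozen thread makes no progress until the system is thawed. All names (enum sync_policy, struct freeze_model, freeze_succeeds()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is a userspace model, not md.c code. */
enum sync_policy {
	FREEZE_IN_PLACE,	/* sync thread parks in the freezer (patch v4 1/7) */
	STOP_ON_FREEZE,		/* sync thread exits once freezing (patch below) */
};

struct freeze_model {
	bool sync_frozen;	/* sync thread is parked and makes no progress */
	bool sync_exited;	/* sync thread has returned */
	bool waiter_done;	/* the task waiting for the sync thread returned */
};

/*
 * Each loop iteration stands for one freezer pass. Returns true if all
 * tasks are quiescent within max_passes passes, false if the freeze fails.
 */
static bool freeze_succeeds(enum sync_policy policy, int max_passes)
{
	struct freeze_model m = { false, false, false };

	assert(max_passes > 0);
	for (int pass = 0; pass < max_passes; pass++) {
		/* The sync thread notices the freeze and reacts per its policy. */
		if (!m.sync_frozen && !m.sync_exited) {
			if (policy == FREEZE_IN_PLACE)
				m.sync_frozen = true;
			else
				m.sync_exited = true;
		}
		/*
		 * The waiter only returns once the sync thread has exited.
		 * A frozen sync thread never exits before thaw.
		 */
		if (m.sync_exited)
			m.waiter_done = true;
		if (m.waiter_done)
			return true;
	}
	return false;
}
```

With FREEZE_IN_PLACE, freeze_succeeds() returns false however many passes are allowed, which matches the failure Ming reported. With STOP_ON_FREEZE, it returns true on the first pass.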


Subject: [PATCH] md: Stop resync and reshape upon system freeze

Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that the submission of
new I/O requests stops before a hibernation image is created,
interrupt the md resync and reshape actions if the system is
being frozen. Note: the resync and reshape will restart after
the system is resumed and a message similar to the following
will appear in the system log:

md: md0: data-check interrupted.

---
 drivers/md/md.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)
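
The behaviour described in the commit message above can be sketched as a userspace model of the sync loop. The loop stops issuing work as soon as a freeze begins and then logs a line in the same format as the pr_info() call in md_do_sync(). The struct, field names, sector counts and the "freeze at step N" trigger are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names exist in md.c. */
struct sync_model {
	unsigned long long pos;		/* next sector to process */
	unsigned long long max_sectors;	/* size of the array in sectors */
	int freeze_at_step;		/* step at which a freeze begins; -1 = never */
};

/* Processes chunk-sized steps; returns true if a freeze interrupted the run. */
static bool run_sync(struct sync_model *m, const char *mdname,
		     const char *desc, unsigned long long chunk)
{
	bool interrupted = false;

	assert(chunk > 0);
	for (int step = 0; m->pos < m->max_sectors; step++) {
		/* Checked before issuing more I/O, like md_sync_interrupted(). */
		if (step == m->freeze_at_step) {
			interrupted = true;
			break;
		}
		m->pos += chunk;
		if (m->pos > m->max_sectors)
			m->pos = m->max_sectors;
	}
	/* Same format string as the pr_info() call in md_do_sync(). */
	printf("md: %s: %s %s.\n", mdname, desc,
	       interrupted ? "interrupted" : "done");
	return interrupted;
}
```

Consider a model with max_sectors = 1024, chunk = 128 and freeze_at_step = 3. It prints "md: md0: data-check interrupted." and stops at sector 384. Running it again with freeze_at_step = -1 completes and prints "done". The model simply continues from where it stopped. It does not capture how md decides where a restarted check or resync begins.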

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 08fcaebc61bd..1e9d50f7345e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,7 @@
 #include <linux/raid/md_u.h>
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
+#include <linux/freezer.h>
 
 #include <trace/events/block.h>
 #include "md.h"
@@ -8103,6 +8104,12 @@ void md_allow_write(struct mddev *mddev)
 }
 EXPORT_SYMBOL_GPL(md_allow_write);
 
+static bool md_sync_interrupted(struct mddev *mddev)
+{
+	return test_bit(MD_RECOVERY_INTR, &mddev->recovery) ||
+		freezing(current);
+}
+
 #define SYNC_MARKS	10
 #define	SYNC_MARK_STEP	(3*HZ)
 #define UPDATE_FREQUENCY (5*60*HZ)
@@ -8133,6 +8140,8 @@ void md_do_sync(struct md_thread *thread)
 		return;
 	}
 
+	set_freezable();
+
 	if (mddev_is_clustered(mddev)) {
 		ret = md_cluster_ops->resync_start(mddev);
 		if (ret)
@@ -8184,7 +8193,7 @@ void md_do_sync(struct md_thread *thread)
 		mddev->curr_resync = 2;
 
 	try_again:
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			goto skip;
 		for_each_mddev(mddev2, tmp) {
 			if (mddev2 == mddev)
@@ -8208,7 +8217,7 @@ void md_do_sync(struct md_thread *thread)
 				 * be caught by 'softlockup'
 				 */
 				prepare_to_wait(&resync_wait, &wq, TASK_INTERRUPTIBLE);
-				if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
+				if (!md_sync_interrupted(mddev) &&
 				    mddev2->curr_resync >= mddev->curr_resync) {
 					if (mddev2_minor != mddev2->md_minor) {
 						mddev2_minor = mddev2->md_minor;
@@ -8335,8 +8344,7 @@ void md_do_sync(struct md_thread *thread)
 			sysfs_notify(&mddev->kobj, NULL, "sync_completed");
 		}
 
-		while (j >= mddev->resync_max &&
-		       !test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
+		while (j >= mddev->resync_max && !md_sync_interrupted(mddev)) {
 			/* As this condition is controlled by user-space,
 			 * we can block indefinitely, so use '_interruptible'
 			 * to avoid triggering warnings.
@@ -8348,7 +8356,7 @@ void md_do_sync(struct md_thread *thread)
 							     &mddev->recovery));
 		}
 
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			break;
 
 		sectors = mddev->pers->sync_request(mddev, j, &skipped);
@@ -8362,7 +8370,7 @@ void md_do_sync(struct md_thread *thread)
 			atomic_add(sectors, &mddev->recovery_active);
 		}
 
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			break;
 
 		j += sectors;
@@ -8394,7 +8402,7 @@ void md_do_sync(struct md_thread *thread)
 			last_mark = next;
 		}
 
-		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+		if (md_sync_interrupted(mddev))
 			break;
 
 		/*
@@ -8427,8 +8435,7 @@ void md_do_sync(struct md_thread *thread)
 		}
 	}
 	pr_info("md: %s: %s %s.\n",mdname(mddev), desc,
-		test_bit(MD_RECOVERY_INTR, &mddev->recovery)
-		? "interrupted" : "done");
+		md_sync_interrupted(mddev) ? "interrupted" : "done");
 	/*
 	 * this also signals 'finished resyncing' to md_stop
 	 */
@@ -8436,8 +8443,7 @@ void md_do_sync(struct md_thread *thread)
 	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
 
 	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
-	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
-	    mddev->curr_resync > 3) {
+	    !md_sync_interrupted(mddev) && mddev->curr_resync > 3) {
 		mddev->curr_resync_completed = mddev->curr_resync;
 		sysfs_notify(&mddev->kobj, NULL, "sync_completed");
 	}
@@ -8446,7 +8452,7 @@ void md_do_sync(struct md_thread *thread)
 	if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) &&
 	    mddev->curr_resync > 3) {
 		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
-			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
+			if (md_sync_interrupted(mddev)) {
 				if (mddev->curr_resync >= mddev->recovery_cp) {
 					pr_debug("md: checkpointing %s of %s.\n",
 						 desc, mdname(mddev));
@@ -8461,7 +8467,7 @@ void md_do_sync(struct md_thread *thread)
 			} else
 				mddev->recovery_cp = MaxSector;
 		} else {
-			if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+			if (!md_sync_interrupted(mddev))
 				mddev->curr_resync = MaxSector;
 			rcu_read_lock();
 			rdev_for_each_rcu(rdev, mddev)
@@ -8483,7 +8489,7 @@ void md_do_sync(struct md_thread *thread)
 		      BIT(MD_SB_CHANGE_PENDING) | BIT(MD_SB_CHANGE_DEVS));
 
 	spin_lock(&mddev->lock);
-	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
+	if (!md_sync_interrupted(mddev)) {
 		/* We completed so min/max setting can be forgotten if used. */
 		if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
 			mddev->resync_min = 0;
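
The core of the patch is the md_sync_interrupted() helper. After set_freezable() has been called, every check that used to test only MD_RECOVERY_INTR also treats a pending freeze as an interrupt request. The truth table below is a userspace sketch. The bit number, struct model_mddev and the helper names are assumptions made for the model. In the kernel, freezing() also returns false for threads that still carry PF_NOFREEZE, which kernel threads do by default. That appears to be why the patch calls set_freezable() before relying on freezing(current). The model's plain `freezing` flag skips that detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit number; the real MD_RECOVERY_INTR value is defined in md.h. */
#define MODEL_RECOVERY_INTR 2

/* Userspace stand-in for the part of struct mddev that matters here. */
struct model_mddev {
	unsigned long recovery;	/* bitmask of MD_RECOVERY_* style flags */
};

/* Non-atomic stand-in for the kernel's test_bit(). */
static bool model_test_bit(int nr, const unsigned long *addr)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(*addr)));
	return (*addr >> nr) & 1UL;
}

/* Before the patch: only an explicit MD_RECOVERY_INTR request stops the sync. */
static bool old_interrupted(const struct model_mddev *mddev, bool freezing)
{
	(void)freezing;		/* the freezer state is ignored */
	return model_test_bit(MODEL_RECOVERY_INTR, &mddev->recovery);
}

/* After the patch: mirrors md_sync_interrupted(); a pending freeze also counts. */
static bool new_interrupted(const struct model_mddev *mddev, bool freezing)
{
	return model_test_bit(MODEL_RECOVERY_INTR, &mddev->recovery) || freezing;
}
```

md_do_sync() also uses the same predicate after the main loop. As a result, a freeze is handled the same way as an MD_RECOVERY_INTR interruption there too, for example when it decides whether to checkpoint a resync or to set recovery_cp to MaxSector.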

Thread overview: 15+ messages
     [not found] <20170925202924.16603-1-bart.vanassche@wdc.com>
2017-09-25 20:29 ` [PATCH v4 1/7] md: Make md resync and reshape threads freezable Bart Van Assche
2017-09-25 23:04   ` Ming Lei
2017-09-25 23:09     ` Bart Van Assche
2017-09-26  4:01       ` Ming Lei
2017-09-26  8:13         ` Ming Lei
2017-09-26 14:40           ` Bart Van Assche
2017-09-26 15:02             ` Ming Lei
2017-09-26  6:06   ` Hannes Reinecke
2017-09-26 11:17   ` Ming Lei
2017-09-26 14:42     ` Bart Van Assche
2017-09-26 14:59       ` Ming Lei
2017-09-27  3:12         ` Bart Van Assche [this message]
2017-09-27 11:00           ` Ming Lei
2017-10-02 15:39             ` Bart Van Assche
2017-10-02 13:26   ` Christoph Hellwig
