linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [md PATCH 2/3] md: remove md_super_wait() call after bitmap_flush()
  2016-11-08 23:21 [md PATCH 0/3] Three unrelated md patches NeilBrown
  2016-11-08 23:21 ` [md PATCH 1/3] md/raid1: fix: IO can block resync indefinitely NeilBrown
@ 2016-11-08 23:21 ` NeilBrown
  2016-11-09 20:51   ` Shaohua Li
  2016-11-08 23:21 ` [md PATCH 3/3] md: define mddev flags, recovery flags and r1bio state bits using enums NeilBrown
  2 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2016-11-08 23:21 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

bitmap_flush() finishes with bitmap_update_sb(), and that finishes
with write_page(..., 1), so write_page() will wait for all writes
to complete.  So there is no point calling md_super_wait()
immediately afterwards.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/md.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f389d8abe137..1f1c7f007b68 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5472,7 +5472,6 @@ static void __md_stop_writes(struct mddev *mddev)
 	del_timer_sync(&mddev->safemode_timer);
 
 	bitmap_flush(mddev);
-	md_super_wait(mddev);
 
 	if (mddev->ro == 0 &&
 	    ((!mddev->in_sync && !mddev_is_clustered(mddev)) ||



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [md PATCH 0/3] Three unrelated md patches.
@ 2016-11-08 23:21 NeilBrown
  2016-11-08 23:21 ` [md PATCH 1/3] md/raid1: fix: IO can block resync indefinitely NeilBrown
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: NeilBrown @ 2016-11-08 23:21 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

The first is a minor bug fix, the other to are just code improvements.
I have some patches to add "failfast" functionality, and I want to get
the small clean-ups out of the way first.

Thanks,
NeilBrown


---

NeilBrown (3):
      md/raid1: fix: IO can block resync indefinitely
      md: remove md_super_wait() call after bitmap_flush()
      md: define mddev flags, recovery flags and r1bio state bits using enums


 drivers/md/md.c    |    1 -
 drivers/md/md.h    |   76 +++++++++++++++++++++++++---------------------------
 drivers/md/raid1.c |    2 +
 drivers/md/raid1.h |   18 +++++++-----
 4 files changed, 47 insertions(+), 50 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [md PATCH 1/3] md/raid1: fix: IO can block resync indefinitely
  2016-11-08 23:21 [md PATCH 0/3] Three unrelated md patches NeilBrown
@ 2016-11-08 23:21 ` NeilBrown
  2016-11-09 20:50   ` Shaohua Li
  2016-11-08 23:21 ` [md PATCH 2/3] md: remove md_super_wait() call after bitmap_flush() NeilBrown
  2016-11-08 23:21 ` [md PATCH 3/3] md: define mddev flags, recovery flags and r1bio state bits using enums NeilBrown
  2 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2016-11-08 23:21 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

While performing a resync/recovery, raid1 divides the
array space into three regions:
 - before the resync
 - at or shortly after the resync point
 - much further ahead of the resync point.

Write requests to the first or third do not need to wait.  Write
requests to the middle region do need to wait if resync requests are
pending.

If there are any active write requests in the middle region, resync
will wait for them.

Due to an accounting error, there is a small range of addresses,
between conf->next_resync and conf->start_next_window, where write
requests will *not* be blocked, but *will* be counted in the middle
region.  This can effectively block resync indefinitely if filesystem
writes happen repeatedly to this region.

As ->next_window_requests is incremented when the sector is before
  conf->start_next_window + NEXT_NORMALIO_DISTANCE
the same boundary should be used for determining when write requests
should wait.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid1.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index aac2a05cf8d1..9ac61cd85e5c 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -834,7 +834,7 @@ static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
 	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
 		if ((conf->mddev->curr_resync_completed
 		     >= bio_end_sector(bio)) ||
-		    (conf->next_resync + NEXT_NORMALIO_DISTANCE
+		    (conf->start_next_window + NEXT_NORMALIO_DISTANCE
 		     <= bio->bi_iter.bi_sector))
 			wait = false;
 		else



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [md PATCH 3/3] md: define mddev flags, recovery flags and r1bio state bits using enums
  2016-11-08 23:21 [md PATCH 0/3] Three unrelated md patches NeilBrown
  2016-11-08 23:21 ` [md PATCH 1/3] md/raid1: fix: IO can block resync indefinitely NeilBrown
  2016-11-08 23:21 ` [md PATCH 2/3] md: remove md_super_wait() call after bitmap_flush() NeilBrown
@ 2016-11-08 23:21 ` NeilBrown
  2016-11-09 20:52   ` Shaohua Li
  2 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2016-11-08 23:21 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

This is less error prone than using individual #defines.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/md.h    |   76 +++++++++++++++++++++++++---------------------------
 drivers/md/raid1.h |   18 +++++++-----
 2 files changed, 46 insertions(+), 48 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 21bd94fad96a..af6b33c30d2d 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -192,6 +192,25 @@ extern int rdev_clear_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
 				int is_new);
 struct md_cluster_info;
 
+enum mddev_flags {
+	MD_CHANGE_DEVS,		/* Some device status has changed */
+	MD_CHANGE_CLEAN,	/* transition to or from 'clean' */
+	MD_CHANGE_PENDING,	/* switch from 'clean' to 'active' in progress */
+	MD_ARRAY_FIRST_USE,	/* First use of array, needs initialization */
+	MD_CLOSING,		/* If set, we are closing the array, do not open
+				 * it then */
+	MD_JOURNAL_CLEAN,	/* A raid with journal is already clean */
+	MD_HAS_JOURNAL,		/* The raid array has journal feature set */
+	MD_RELOAD_SB,		/* Reload the superblock because another node
+				 * updated it.
+				 */
+	MD_CLUSTER_RESYNC_LOCKED, /* cluster raid only, which means node
+				   * already took resync lock, need to
+				   * release the lock */
+};
+#define MD_UPDATE_SB_FLAGS (BIT(MD_CHANGE_DEVS) | \
+			    BIT(MD_CHANGE_CLEAN) | \
+			    BIT(MD_CHANGE_PENDING))	/* If these are set, md_update_sb needed */
 struct mddev {
 	void				*private;
 	struct md_personality		*pers;
@@ -199,21 +218,6 @@ struct mddev {
 	int				md_minor;
 	struct list_head		disks;
 	unsigned long			flags;
-#define MD_CHANGE_DEVS	0	/* Some device status has changed */
-#define MD_CHANGE_CLEAN 1	/* transition to or from 'clean' */
-#define MD_CHANGE_PENDING 2	/* switch from 'clean' to 'active' in progress */
-#define MD_UPDATE_SB_FLAGS (1 | 2 | 4)	/* If these are set, md_update_sb needed */
-#define MD_ARRAY_FIRST_USE 3    /* First use of array, needs initialization */
-#define MD_CLOSING	4	/* If set, we are closing the array, do not open
-				 * it then */
-#define MD_JOURNAL_CLEAN 5	/* A raid with journal is already clean */
-#define MD_HAS_JOURNAL	6	/* The raid array has journal feature set */
-#define MD_RELOAD_SB	7	/* Reload the superblock because another node
-				 * updated it.
-				 */
-#define MD_CLUSTER_RESYNC_LOCKED 8 /* cluster raid only, which means node
-				    * already took resync lock, need to
-				    * release the lock */
 
 	int				suspended;
 	atomic_t			active_io;
@@ -307,31 +311,6 @@ struct mddev {
 	int				parallel_resync;
 
 	int				ok_start_degraded;
-	/* recovery/resync flags
-	 * NEEDED:   we might need to start a resync/recover
-	 * RUNNING:  a thread is running, or about to be started
-	 * SYNC:     actually doing a resync, not a recovery
-	 * RECOVER:  doing recovery, or need to try it.
-	 * INTR:     resync needs to be aborted for some reason
-	 * DONE:     thread is done and is waiting to be reaped
-	 * REQUEST:  user-space has requested a sync (used with SYNC)
-	 * CHECK:    user-space request for check-only, no repair
-	 * RESHAPE:  A reshape is happening
-	 * ERROR:    sync-action interrupted because io-error
-	 *
-	 * If neither SYNC or RESHAPE are set, then it is a recovery.
-	 */
-#define	MD_RECOVERY_RUNNING	0
-#define	MD_RECOVERY_SYNC	1
-#define	MD_RECOVERY_RECOVER	2
-#define	MD_RECOVERY_INTR	3
-#define	MD_RECOVERY_DONE	4
-#define	MD_RECOVERY_NEEDED	5
-#define	MD_RECOVERY_REQUESTED	6
-#define	MD_RECOVERY_CHECK	7
-#define MD_RECOVERY_RESHAPE	8
-#define	MD_RECOVERY_FROZEN	9
-#define	MD_RECOVERY_ERROR	10
 
 	unsigned long			recovery;
 	/* If a RAID personality determines that recovery (of a particular
@@ -445,6 +424,23 @@ struct mddev {
 	unsigned int			good_device_nr;	/* good device num within cluster raid */
 };
 
+enum recovery_flags {
+	/*
+	 * If neither SYNC or RESHAPE are set, then it is a recovery.
+	 */
+	MD_RECOVERY_RUNNING,	/* a thread is running, or about to be started */
+	MD_RECOVERY_SYNC,	/* actually doing a resync, not a recovery */
+	MD_RECOVERY_RECOVER,	/* doing recovery, or need to try it. */
+	MD_RECOVERY_INTR,	/* resync needs to be aborted for some reason */
+	MD_RECOVERY_DONE,	/* thread is done and is waiting to be reaped */
+	MD_RECOVERY_NEEDED,	/* we might need to start a resync/recover */
+	MD_RECOVERY_REQUESTED,	/* user-space has requested a sync (used with SYNC) */
+	MD_RECOVERY_CHECK,	/* user-space request for check-only, no repair */
+	MD_RECOVERY_RESHAPE,	/* A reshape is happening */
+	MD_RECOVERY_FROZEN,	/* User request to abort, and not restart, any action */
+	MD_RECOVERY_ERROR,	/* sync-action interrupted because io-error */
+};
+
 static inline int __must_check mddev_lock(struct mddev *mddev)
 {
 	return mutex_lock_interruptible(&mddev->reconfig_mutex);
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index 61c39b390cd8..5ec19449779d 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -161,14 +161,15 @@ struct r1bio {
 };
 
 /* bits for r1bio.state */
-#define	R1BIO_Uptodate	0
-#define	R1BIO_IsSync	1
-#define	R1BIO_Degraded	2
-#define	R1BIO_BehindIO	3
+enum r1bio_state {
+	R1BIO_Uptodate,
+	R1BIO_IsSync,
+	R1BIO_Degraded,
+	R1BIO_BehindIO,
 /* Set ReadError on bios that experience a readerror so that
  * raid1d knows what to do with them.
  */
-#define R1BIO_ReadError 4
+	R1BIO_ReadError,
 /* For write-behind requests, we call bi_end_io when
  * the last non-write-behind device completes, providing
  * any write was successful.  Otherwise we call when
@@ -176,10 +177,11 @@ struct r1bio {
  * with failure when last write completes (and all failed).
  * Record that bi_end_io was called with this flag...
  */
-#define	R1BIO_Returned 6
+	R1BIO_Returned,
 /* If a write for this request means we can clear some
  * known-bad-block records, we set this flag
  */
-#define	R1BIO_MadeGood 7
-#define	R1BIO_WriteError 8
+	R1BIO_MadeGood,
+	R1BIO_WriteError,
+};
 #endif



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [md PATCH 1/3] md/raid1: fix: IO can block resync indefinitely
  2016-11-08 23:21 ` [md PATCH 1/3] md/raid1: fix: IO can block resync indefinitely NeilBrown
@ 2016-11-09 20:50   ` Shaohua Li
  0 siblings, 0 replies; 9+ messages in thread
From: Shaohua Li @ 2016-11-09 20:50 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Nov 09, 2016 at 10:21:32AM +1100, Neil Brown wrote:
> While performing a resync/recovery, raid1 divides the
> array space into three regions:
>  - before the resync
>  - at or shortly after the resync point
>  - much further ahead of the resync point.
> 
> Write requests to the first or third do not need to wait.  Write
> requests to the middle region do need to wait if resync requests are
> pending.
> 
> If there are any active write requests in the middle region, resync
> will wait for them.
> 
> Due to an accounting error, there is a small range of addresses,
> between conf->next_resync and conf->start_next_window, where write
> requests will *not* be blocked, but *will* be counted in the middle
> region.  This can effectively block resync indefinitely if filesystem
> writes happen repeatedly to this region.

Good catch, thanks Neil! 
> As ->next_window_requests is incremented when the sector is before

I changed 'before' to 'after' when applying this patch

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [md PATCH 2/3] md: remove md_super_wait() call after bitmap_flush()
  2016-11-08 23:21 ` [md PATCH 2/3] md: remove md_super_wait() call after bitmap_flush() NeilBrown
@ 2016-11-09 20:51   ` Shaohua Li
  2016-11-10  0:57     ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: Shaohua Li @ 2016-11-09 20:51 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Nov 09, 2016 at 10:21:32AM +1100, Neil Brown wrote:
> bitmap_flush() finishes with bitmap_update_sb(), and that finishes
> with write_page(..., 1), so write_page() will wait for all writes
> to complete.  So there is no point calling md_super_wait()
> immediately afterwards.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/md/md.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index f389d8abe137..1f1c7f007b68 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5472,7 +5472,6 @@ static void __md_stop_writes(struct mddev *mddev)
>  	del_timer_sync(&mddev->safemode_timer);
>  
>  	bitmap_flush(mddev);
> -	md_super_wait(mddev);

bitmap_flush() could be null if there is no bitmap, is this safe?

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [md PATCH 3/3] md: define mddev flags, recovery flags and r1bio state bits using enums
  2016-11-08 23:21 ` [md PATCH 3/3] md: define mddev flags, recovery flags and r1bio state bits using enums NeilBrown
@ 2016-11-09 20:52   ` Shaohua Li
  0 siblings, 0 replies; 9+ messages in thread
From: Shaohua Li @ 2016-11-09 20:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Nov 09, 2016 at 10:21:33AM +1100, Neil Brown wrote:
> This is less error prone than using individual #defines.

applied, thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [md PATCH 2/3] md: remove md_super_wait() call after bitmap_flush()
  2016-11-09 20:51   ` Shaohua Li
@ 2016-11-10  0:57     ` NeilBrown
  2016-11-10  1:13       ` Shaohua Li
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2016-11-10  0:57 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]

On Thu, Nov 10 2016, Shaohua Li wrote:

> On Wed, Nov 09, 2016 at 10:21:32AM +1100, Neil Brown wrote:
>> bitmap_flush() finishes with bitmap_update_sb(), and that finishes
>> with write_page(..., 1), so write_page() will wait for all writes
>> to complete.  So there is no point calling md_super_wait()
>> immediately afterwards.
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  drivers/md/md.c |    1 -
>>  1 file changed, 1 deletion(-)
>> 
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index f389d8abe137..1f1c7f007b68 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -5472,7 +5472,6 @@ static void __md_stop_writes(struct mddev *mddev)
>>  	del_timer_sync(&mddev->safemode_timer);
>>  
>>  	bitmap_flush(mddev);
>> -	md_super_wait(mddev);
>
> bitmap_flush() could be null if there is no bitmap, is this safe?

Good question.
If there is no bitmap, then all metadata updates (both superblock
and bad-block-list) are synchronous in md_update_sb(), which is always
called under ->reconfig_mutex and so which cannot race with this code.

So yes, it is safe.  That md_super_wait() was only ever intended to wait
for things that bitmap_flush() might have flushed, so it should have
been inside that function.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [md PATCH 2/3] md: remove md_super_wait() call after bitmap_flush()
  2016-11-10  0:57     ` NeilBrown
@ 2016-11-10  1:13       ` Shaohua Li
  0 siblings, 0 replies; 9+ messages in thread
From: Shaohua Li @ 2016-11-10  1:13 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Thu, Nov 10, 2016 at 11:57:35AM +1100, Neil Brown wrote:
> On Thu, Nov 10 2016, Shaohua Li wrote:
> 
> > On Wed, Nov 09, 2016 at 10:21:32AM +1100, Neil Brown wrote:
> >> bitmap_flush() finishes with bitmap_update_sb(), and that finishes
> >> with write_page(..., 1), so write_page() will wait for all writes
> >> to complete.  So there is no point calling md_super_wait()
> >> immediately afterwards.
> >> 
> >> Signed-off-by: NeilBrown <neilb@suse.com>
> >> ---
> >>  drivers/md/md.c |    1 -
> >>  1 file changed, 1 deletion(-)
> >> 
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index f389d8abe137..1f1c7f007b68 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -5472,7 +5472,6 @@ static void __md_stop_writes(struct mddev *mddev)
> >>  	del_timer_sync(&mddev->safemode_timer);
> >>  
> >>  	bitmap_flush(mddev);
> >> -	md_super_wait(mddev);
> >
> > bitmap_flush() could be null if there is no bitmap, is this safe?
> 
> Good question.
> If there is no bitmap, then all metadata updates (both superblock
> and bad-block-list) are synchronous in md_update_sb(), which is always
> called under ->reconfig_mutex and so which cannot race with this code.
> 
> So yes, it is safe.  That md_super_wait() was only ever intended to wait
> for things that bitmap_flush() might have flushed, so it should have
> been inside that function.

Ah, yes, md_super_wait follows all md_super_write. It should be safe.
Applied, thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-11-10  1:13 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-08 23:21 [md PATCH 0/3] Three unrelated md patches NeilBrown
2016-11-08 23:21 ` [md PATCH 1/3] md/raid1: fix: IO can block resync indefinitely NeilBrown
2016-11-09 20:50   ` Shaohua Li
2016-11-08 23:21 ` [md PATCH 2/3] md: remove md_super_wait() call after bitmap_flush() NeilBrown
2016-11-09 20:51   ` Shaohua Li
2016-11-10  0:57     ` NeilBrown
2016-11-10  1:13       ` Shaohua Li
2016-11-08 23:21 ` [md PATCH 3/3] md: define mddev flags, recovery flags and r1bio state bits using enums NeilBrown
2016-11-09 20:52   ` Shaohua Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).