* [PATCH md] 6 Little md patches for 2.6.27 for review
  2008-08-05  6:17 UTC
From: NeilBrown
To: linux-raid; +Cc: NeilBrown

Hi,
 The following 6 patches are in my for-next tree and will probably be
 sent to Linus shortly.  Any review always appreciated.

Thanks.

git://neil.brown.name/md for-next

NeilBrown (6):
      Make writes to md/safe_mode_delay immediately effective.
      Restore force switch of md array to readonly at reboot time.
      Fail safely when trying to grow an array with a write-intent bitmap.
      Don't let a blocked_rdev interfere with read request in raid5/6
      Allow faulty devices to be removed from a readonly array.
      Allow raid10 resync to happening in larger chunks.

 drivers/md/md.c     |   29 +++++++++++++++++++++++++++--
 drivers/md/raid10.c |    9 +++++----
 drivers/md/raid5.c  |   32 ++++++++++++++++++++++++--------
 3 files changed, 56 insertions(+), 14 deletions(-)

* [PATCH md] Make writes to md/safe_mode_delay immediately effective.
  2008-08-05  6:17 UTC
From: NeilBrown
To: linux-raid; +Cc: NeilBrown

If we reduce the 'safe_mode_delay', it could still wait for the old
delay to completely expire before doing anything about safe_mode.
Thus the effect of the change is delayed.

To make the effect more immediate, run the timeout function
immediately if the delay was reduced.  This may cause it to run
slightly earlier than required, but that is the safer option.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/md.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c7aae66..48afe4f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2393,6 +2393,8 @@ static void analyze_sbs(mddev_t * mddev)
 }
 
+static void md_safemode_timeout(unsigned long data);
+
 static ssize_t
 safe_delay_show(mddev_t *mddev, char *page)
 {
@@ -2432,9 +2434,12 @@ safe_delay_store(mddev_t *mddev, const char *cbuf, size_t len)
 	if (msec == 0)
 		mddev->safemode_delay = 0;
 	else {
+		unsigned long old_delay = mddev->safemode_delay;
 		mddev->safemode_delay = (msec*HZ)/1000;
 		if (mddev->safemode_delay == 0)
 			mddev->safemode_delay = 1;
+		if (mddev->safemode_delay < old_delay)
+			md_safemode_timeout((unsigned long)mddev);
 	}
 	return len;
 }
-- 
1.5.6.3

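For context, the timeout handler that this patch now calls directly has
roughly the following shape.  This is a simplified sketch from memory, not
the verbatim 2.6.27 md.c code, but it shows why running it early is
harmless: it only marks the array as a candidate for the safe-mode (clean)
transition and wakes the md thread, which does the real work.

/* Simplified sketch of the safe-mode timer callback; not the verbatim
 * 2.6.27 code.  'data' is the mddev pointer, cast as done in the patch.
 */
static void md_safemode_timeout(unsigned long data)
{
	mddev_t *mddev = (mddev_t *) data;

	/* Only enter safe mode once no writes are in flight. */
	if (!atomic_read(&mddev->writes_pending))
		mddev->safemode = 1;
	/* The per-array thread notices safemode and marks the array clean. */
	md_wakeup_thread(mddev->thread);
}

Calling it a shade early when the delay shrinks can at worst mark the array
clean sooner than strictly necessary, which is exactly the "safer option"
the changelog refers to.
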
* [PATCH md] Restore force switch of md array to readonly at reboot time.
  2008-08-05  6:17 UTC
From: NeilBrown
To: linux-raid; +Cc: NeilBrown

A recent patch allowed do_md_stop to know whether it was being called
via an ioctl or not, and thus whether to allow for an extra open file
descriptor when checking if the array is in use.

This broke the switch to readonly performed by the shutdown notifier,
which needs to work even when the array is still (apparently) active
(as md doesn't get told when the filesystem becomes readonly).

So restore this feature by pretending that there can be lots of file
descriptors open, but that we still want do_md_stop to switch the array
to readonly.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/md.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 48afe4f..8d11cd1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6237,7 +6237,11 @@ static int md_notify_reboot(struct notifier_block *this,
 
 		for_each_mddev(mddev, tmp)
 			if (mddev_trylock(mddev)) {
-				do_md_stop (mddev, 1, 0);
+				/* Force a switch to readonly even array
+				 * appears to still be in use.  Hence
+				 * the '100'.
+				 */
+				do_md_stop (mddev, 1, 100);
 				mddev_unlock(mddev);
 			}
 		/*
-- 
1.5.6.3

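The magic '100' makes more sense next to the in-use test inside do_md_stop
that the "recent patch" introduced.  From memory the check is roughly the
sketch below; the field name and message are recalled, not verified against
2.6.27.  Passing 100 as the allowed open count means the busy test can
essentially never fire, while mode 1 still requests the read-only switch.

/* Rough sketch of the relevant test in do_md_stop(); details from
 * memory, not copied verbatim from 2.6.27.
 */
static int do_md_stop(mddev_t *mddev, int mode, int is_open)
{
	if (atomic_read(&mddev->openers) > is_open) {
		/* More openers than the caller accounted for: refuse. */
		printk(KERN_WARNING "md: %s still in use.\n", mdname(mddev));
		return -EBUSY;
	}
	/* ... mode 1: switch the array to read-only; mode 0: full stop ... */
	return 0;
}
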
* [PATCH md] Fail safely when trying to grow an array with a write-intent bitmap.
  2008-08-05  6:17 UTC
From: NeilBrown
To: linux-raid; +Cc: NeilBrown

We cannot currently change the size of a write-intent bitmap.
So if we change the size of an array which has such a bitmap, it
tries to set bits beyond the end of the bitmap.

For now, simply reject any request to change the size of an array
which has a bitmap.  mdadm can remove the bitmap and add a new one
after the array has changed size.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/md.c    |    5 +++++
 drivers/md/raid5.c |    3 +++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8d11cd1..6eb9545 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4639,6 +4639,11 @@ static int update_size(mddev_t *mddev, sector_t num_sectors)
 	 */
 	if (mddev->sync_thread)
 		return -EBUSY;
+	if (mddev->bitmap)
+		/* Sorry, cannot grow a bitmap yet, just remove it,
+		 * grow, and re-add.
+		 */
+		return -EBUSY;
 	rdev_for_each(rdev, tmp, mddev) {
 		sector_t avail;
 		avail = rdev->size * 2;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 40e9396..17e0953 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4446,6 +4446,9 @@ static int raid5_check_reshape(mddev_t *mddev)
 		return -EINVAL; /* Cannot shrink array or change level yet */
 	if (mddev->delta_disks == 0)
 		return 0; /* nothing to do */
+	if (mddev->bitmap)
+		/* Cannot grow a bitmap yet */
+		return -EBUSY;
 
 	/* Can only proceed if there are plenty of stripe_heads.
 	 * We need a minimum of one full stripe,, and for sensible progress
-- 
1.5.6.3

* [PATCH md] Don't let a blocked_rdev interfere with read request in raid5/6
  2008-08-05  6:17 UTC
From: NeilBrown
To: linux-raid; +Cc: NeilBrown

When we have externally managed metadata, we need to mark a failed
device as 'Blocked' and not allow any writes until that device has
been marked as faulty in the metadata and the Blocked flag has been
removed.

However it is perfectly OK to allow read requests when there is a
Blocked device, and with a readonly array, there may not be any
metadata handler watching for blocked devices.

So in raid5/raid6, only allow a Blocked device to interfere with
write requests or resync.  Read requests go through untouched.

raid1 and raid10 already differentiate between read and write
properly.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/raid5.c |   29 +++++++++++++++++++++++++++--------
 1 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 17e0953..224de02 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2568,10 +2568,10 @@ static bool handle_stripe5(struct stripe_head *sh)
 		if (dev->written)
 			s.written++;
 		rdev = rcu_dereference(conf->disks[i].rdev);
-		if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
+		if (blocked_rdev == NULL &&
+		    rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
 			blocked_rdev = rdev;
 			atomic_inc(&rdev->nr_pending);
-			break;
 		}
 		if (!rdev || !test_bit(In_sync, &rdev->flags)) {
 			/* The ReadError flag will just be confusing now */
@@ -2588,8 +2588,14 @@ static bool handle_stripe5(struct stripe_head *sh)
 	rcu_read_unlock();
 
 	if (unlikely(blocked_rdev)) {
-		set_bit(STRIPE_HANDLE, &sh->state);
-		goto unlock;
+		if (s.syncing || s.expanding || s.expanded ||
+		    s.to_write || s.written) {
+			set_bit(STRIPE_HANDLE, &sh->state);
+			goto unlock;
+		}
+		/* There is nothing for the blocked_rdev to block */
+		rdev_dec_pending(blocked_rdev, conf->mddev);
+		blocked_rdev = NULL;
 	}
 
 	if (s.to_fill && !test_bit(STRIPE_BIOFILL_RUN, &sh->state)) {
@@ -2832,10 +2838,10 @@ static bool handle_stripe6(struct stripe_head *sh, struct page *tmp_page)
 		if (dev->written)
 			s.written++;
 		rdev = rcu_dereference(conf->disks[i].rdev);
-		if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
+		if (blocked_rdev == NULL &&
+		    rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
 			blocked_rdev = rdev;
 			atomic_inc(&rdev->nr_pending);
-			break;
 		}
 		if (!rdev || !test_bit(In_sync, &rdev->flags)) {
 			/* The ReadError flag will just be confusing now */
@@ -2853,9 +2859,16 @@ static bool handle_stripe6(struct stripe_head *sh, struct page *tmp_page)
 	rcu_read_unlock();
 
 	if (unlikely(blocked_rdev)) {
-		set_bit(STRIPE_HANDLE, &sh->state);
-		goto unlock;
+		if (s.syncing || s.expanding || s.expanded ||
+		    s.to_write || s.written) {
+			set_bit(STRIPE_HANDLE, &sh->state);
+			goto unlock;
+		}
+		/* There is nothing for the blocked_rdev to block */
+		rdev_dec_pending(blocked_rdev, conf->mddev);
+		blocked_rdev = NULL;
 	}
+
 	pr_debug("locked=%d uptodate=%d to_read=%d"
 		" to_write=%d failed=%d failed_num=%d,%d\n",
 		s.locked, s.uptodate, s.to_read, s.to_write, s.failed,
-- 
1.5.6.3

* [PATCH md] Allow faulty devices to be removed from a readonly array.
  2008-08-05  6:17 UTC
From: NeilBrown
To: linux-raid; +Cc: NeilBrown

Removing faulty devices from an array is a two-stage process.
First the device is moved from being a part of the active array to
being similar to a spare device.  Then it can be removed by a request
from user space.

The first step is currently not performed for read-only arrays, so
the second step can never succeed.

So allow readonly arrays to remove failed devices (which aren't
blocked).

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/md.c |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 6eb9545..25b893e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6003,7 +6003,7 @@ static int remove_and_add_spares(mddev_t *mddev)
 		}
 	}
 
-	if (mddev->degraded) {
+	if (mddev->degraded && ! mddev->ro) {
 		rdev_for_each(rdev, rtmp, mddev) {
 			if (rdev->raid_disk >= 0 &&
 			    !test_bit(In_sync, &rdev->flags) &&
@@ -6077,6 +6077,8 @@ void md_check_recovery(mddev_t *mddev)
 		flush_signals(current);
 	}
 
+	if (mddev->ro && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
+		return;
 	if ( ! (
 		(mddev->flags && !mddev->external) ||
 		test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
@@ -6090,6 +6092,15 @@ void md_check_recovery(mddev_t *mddev)
 	if (mddev_trylock(mddev)) {
 		int spares = 0;
 
+		if (mddev->ro) {
+			/* Only thing we do on a ro array is remove
+			 * failed devices.
+			 */
+			remove_and_add_spares(mddev);
+			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+			goto unlock;
+		}
+
 		if (!mddev->external) {
 			int did_change = 0;
 			spin_lock_irq(&mddev->write_lock);
-- 
1.5.6.3

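The "request from user space" that completes the second stage is normally
issued by mdadm (something like "mdadm /dev/md0 --remove /dev/sdb1"), but at
bottom it is just an ioctl on the array device.  A minimal, self-contained
sketch is below; the device paths are placeholders and error handling is
kept to the bare minimum.

/* Minimal user-space sketch of the second stage: ask md to drop a
 * (now spare-like) failed component.  Paths are illustrative only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/raid/md_u.h>	/* HOT_REMOVE_DISK */

int main(void)
{
	struct stat st;
	int fd = open("/dev/md0", O_RDONLY);

	if (fd < 0 || stat("/dev/sdb1", &st) < 0) {
		perror("open/stat");
		return 1;
	}
	/* The ioctl argument is the component device's dev_t. */
	if (ioctl(fd, HOT_REMOVE_DISK, (unsigned long)st.st_rdev) < 0) {
		perror("HOT_REMOVE_DISK");
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}

Before this patch, such a request kept failing on a read-only array because
md_check_recovery never performed the first stage there; with the patch,
the failed device is detached even when mddev->ro is set, so the removal
can succeed.
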
* [PATCH md] Allow raid10 resync to happening in larger chunks.
  2008-08-05  6:17 UTC
From: NeilBrown
To: linux-raid; +Cc: NeilBrown, Keld Jørn Simonsen

The raid10 resync/recovery code currently limits the amount of
in-flight resync IO to 2Meg.  This was copied from raid1 where
it seems quite adequate.  However for raid10, some layouts require
a bit of seeking to perform a resync, and allowing a larger buffer
size means that the seeking can be significantly reduced.

There is probably no real need to limit the amount of in-flight
IO at all.  Any shortage of memory will naturally reduce the
amount of buffer space available down to a set minimum, and any
concurrent normal IO will quickly cause resync IO to back off.

The only problem would be that normal IO has to wait for all resync IO
to finish, so a very large amount of resync IO could cause unpleasant
latency when normal IO starts up.

So: increase RESYNC_DEPTH to allow 32Meg of buffer (if memory is
available) which seems to be a good amount.  Also reduce the amount
of memory reserved as there is no need to keep 2Meg just for resync if
memory is tight.

Thanks to Keld for the suggestion.

Cc: Keld Jørn Simonsen <keld@dkuug.dk>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/raid10.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d41bebb..e34cd0e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -76,11 +76,13 @@ static void r10bio_pool_free(void *r10_bio, void *data)
 	kfree(r10_bio);
 }
 
+/* Maximum size of each resync request */
 #define RESYNC_BLOCK_SIZE (64*1024)
-//#define RESYNC_BLOCK_SIZE PAGE_SIZE
-#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
-#define RESYNC_WINDOW (2048*1024)
+/* amount of memory to reserve for resync requests */
+#define RESYNC_WINDOW (1024*1024)
+/* maximum number of concurrent requests, memory permitting */
+#define RESYNC_DEPTH (32*1024*1024/RESYNC_BLOCK_SIZE)
 
 /*
  * When performing a resync, we need to read and compare, so
@@ -690,7 +692,6 @@ static int flush_pending_writes(conf_t *conf)
  * there is no normal IO happeing.  It must arrange to call
  * lower_barrier when the particular background IO completes.
  */
-#define RESYNC_DEPTH 32
 
 static void raise_barrier(conf_t *conf, int force)
 {
-- 
1.5.6.3

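To make the sizing concrete, the arithmetic implied by the new definitions
works out as follows (assuming 4 KiB pages; the figures are derived directly
from the #defines above, nothing more):

/* Worked out from the definitions in the patch above:
 *
 *   RESYNC_BLOCK_SIZE = 64*1024              ->  64 KiB per resync request
 *   RESYNC_PAGES      = 64 KiB / 4 KiB       ->  16 pages per request
 *   RESYNC_WINDOW     = 1024*1024            ->   1 MiB reserved up front
 *   RESYNC_DEPTH      = 32*1024*1024/65536   -> 512 requests in flight, max
 *
 * The old code reserved a 2 MiB window and capped the depth at 32
 * requests (32 * 64 KiB = 2 MiB in flight).  The new limits halve the
 * up-front reservation and raise the in-flight ceiling 16-fold, to
 * 32 MiB, when memory is freely available.
 */
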
* Re: [PATCH md] Allow raid10 resync to happening in larger chunks.
  2008-08-06  9:01 UTC
From: Keld Jørn Simonsen
To: NeilBrown; +Cc: linux-raid

Neil made this patch based on my patch to speed up raid10 resync.

It is a bit different, although it messes with exactly the two constants
that I also changed.  One difference is that Neil initially only allocates
1 MiB for buffers while my patch allocates 32 MiB.  For the patch to work
as intended it is essential that something like 32 MiB be available for
buffers.  I do not see how that is done in Neil's case, but then I do not
know the code so well.  So how does it work, Neil?

Has your patch been tested, Neil?

Anyway, if this is a difference between 32 MiB being available or not, I
think it is important that it be available at the start of the process
and available for the whole duration of the process.  Is it a concern
whether 32 MiB of buffers will be available?  My take is that if you are
running raid, then you probably always have quite some memory.

best regards
keld

On Tue, Aug 05, 2008 at 04:17:34PM +1000, NeilBrown wrote:
> The raid10 resync/recovery code currently limits the amount of
> in-flight resync IO to 2Meg.  This was copied from raid1 where
> it seems quite adequate.  However for raid10, some layouts require
> a bit of seeking to perform a resync, and allowing a larger buffer
> size means that the seeking can be significantly reduced.
> 
> There is probably no real need to limit the amount of in-flight
> IO at all.  Any shortage of memory will naturally reduce the
> amount of buffer space available down to a set minimum, and any
> concurrent normal IO will quickly cause resync IO to back off.
> 
> The only problem would be that normal IO has to wait for all resync IO
> to finish, so a very large amount of resync IO could cause unpleasant
> latency when normal IO starts up.
> 
> So: increase RESYNC_DEPTH to allow 32Meg of buffer (if memory is
> available) which seems to be a good amount.  Also reduce the amount
> of memory reserved as there is no need to keep 2Meg just for resync if
> memory is tight.
> 
> Thanks to Keld for the suggestion.
> 
> Cc: Keld Jørn Simonsen <keld@dkuug.dk>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  drivers/md/raid10.c |    9 +++++----
>  1 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index d41bebb..e34cd0e 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -76,11 +76,13 @@ static void r10bio_pool_free(void *r10_bio, void *data)
>  	kfree(r10_bio);
>  }
>  
> +/* Maximum size of each resync request */
>  #define RESYNC_BLOCK_SIZE (64*1024)
> -//#define RESYNC_BLOCK_SIZE PAGE_SIZE
> -#define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
>  #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
> -#define RESYNC_WINDOW (2048*1024)
> +/* amount of memory to reserve for resync requests */
> +#define RESYNC_WINDOW (1024*1024)
> +/* maximum number of concurrent requests, memory permitting */
> +#define RESYNC_DEPTH (32*1024*1024/RESYNC_BLOCK_SIZE)
>  
>  /*
>   * When performing a resync, we need to read and compare, so
> @@ -690,7 +692,6 @@ static int flush_pending_writes(conf_t *conf)
>   * there is no normal IO happeing.  It must arrange to call
>   * lower_barrier when the particular background IO completes.
>   */
> -#define RESYNC_DEPTH 32
> 
>  static void raise_barrier(conf_t *conf, int force)
>  {
> -- 
> 1.5.6.3

[parent not found: <200808062256.m76MupNY089428@sia.dkuug.dk>]
* Re: [PATCH md] Allow raid10 resync to happening in larger chunks.
  2008-08-07  0:45 UTC
From: 'Keld Jørn Simonsen'
To: Guy Watkins; +Cc: 'NeilBrown', linux-raid

On Wed, Aug 06, 2008 at 06:56:40PM -0400, Guy Watkins wrote:
> } -----Original Message-----
> } From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> } owner@vger.kernel.org] On Behalf Of Keld Jørn Simonsen
> } Sent: Wednesday, August 06, 2008 5:02 AM
> } To: NeilBrown
> } Cc: linux-raid@vger.kernel.org
> } Subject: Re: [PATCH md] Allow raid10 resync to happening in larger chunks.
> }
> } Neil made this patch based on my patch to speed up raid10 resync.
> }
> } It is a bit different, although it messes with exactly the two constants
> } that I also changed.  One difference is that Neil initially only allocates
> } 1 MiB for buffers while my patch allocates 32 MiB.  For the patch to work
> } as intended it is essential that something like 32 MiB be available for
> } buffers.  I do not see how that is done in Neil's case, but then I do not
> } know the code so well.  So how does it work, Neil?
> }
> } Has your patch been tested, Neil?
> }
> } Anyway, if this is a difference between 32 MiB being available or not, I
> } think it is important that it be available at the start of the process
> } and available for the whole duration of the process.  Is it a concern
> } whether 32 MiB of buffers will be available?  My take is that if you are
> } running raid, then you probably always have quite some memory.
>
> Bad assumption!  My Linux firewall/router has 64MB total and works really
> well.  I have 2 disks in a RAID1.

Well, well, you do not use the raid10 driver, then.

> Maybe the amount of memory could be based on a percentage of total RAM?

Anyway, if it works then Neil's patch is probably better than mine, as I
think it will also run if 32 MiB is not available.

Best regards
keld

* Re: [PATCH md] Allow raid10 resync to happening in larger chunks.
  2008-08-11  0:37 UTC
From: Neil Brown
To: Keld Jørn Simonsen; +Cc: linux-raid

On Wednesday August 6, keld@dkuug.dk wrote:
> Neil made this patch based on my patch to speed up raid10 resync.

Yes, thanks for your suggestions.
(and sorry if I seem to have been a bit silent - just busy).

> It is a bit different, although it messes with exactly the two constants
> that I also changed.  One difference is that Neil initially only allocates
> 1 MiB for buffers while my patch allocates 32 MiB.  For the patch to work
> as intended it is essential that something like 32 MiB be available for
> buffers.  I do not see how that is done in Neil's case, but then I do not
> know the code so well.  So how does it work, Neil?

The space controlled by RESYNC_WINDOW is the minimum amount of memory
to preallocate.  This amount is reserved for the duration of the resync
process, and the resync process will not start unless it is available.

The buffers for reading and writing data for the resync are allocated
using a "mempool".  You tell the mempool a minimum size when you create
it, and it will reserve this much memory and make sure it is always
available.  If you ask the mempool for more memory, it will get it if
it can, or will block until some of the reserved memory is returned.

So the amount preallocated in a mempool should really be the minimum
amount of memory that will allow reasonable forward progress to be
made.  Preallocating a large amount because it makes things go fast is
the wrong approach.  Using as much as is easily available is a good
approach.

Arguably RESYNC_WINDOW should be quite a bit smaller - maybe just twice
RESYNC_BLOCK_SIZE - and RESYNC_DEPTH could be quite a bit larger.

The important limit on RESYNC_DEPTH relates to latency.  If a normal IO
access arrives at the device, it has to wait for all pending resync IO
to complete before it can start.  If RESYNC_DEPTH is too high (and
there is enough available memory to support it), then it could be
multiple seconds before the regular IO request gets a look-in.

I should probably reduce the RESYNC_WINDOW for raid1 too, and increase
the RESYNC_DEPTH, but it doesn't seem so important there.

> Has your patch been tested, Neil?

Yes, though on my hardware I don't see the same improvement that Jon
does.  More testing is always welcome!

> Anyway, if this is a difference between 32 MiB being available or not, I
> think it is important that it be available at the start of the process
> and available for the whole duration of the process.  Is it a concern
> whether 32 MiB of buffers will be available?  My take is that if you are
> running raid, then you probably always have quite some memory.

Assumptions like that are dangerous.  It could be that you have lots of
memory, but have lots of users of it too.  It could be that you don't
have much.  Certainly we don't need to try to fit inside 64K any more,
but a good principle is to make do with as little as possible, and make
use of as much as is available.

The raid10 resync process is constantly allocating new memory and
freeing old memory.  That means that the VM has the opportunity to use
memory for other demands as they arrive, and provide any spare memory
to resync.  This should make everyone happy.

Thanks,
NeilBrown

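For readers who have not used the mempool interface Neil describes, the
pattern the resync buffers rely on is roughly the following.  This is a
generic, simplified sketch of the kernel API, not the actual r10bio pool
code; the element type and reserve size are invented for illustration.

#include <linux/mempool.h>
#include <linux/slab.h>

/* Illustrative element type -- not the raid10 structures. */
struct resync_buf {
	struct page *pages[16];
};

static void *buf_alloc(gfp_t gfp, void *pool_data)
{
	return kzalloc(sizeof(struct resync_buf), gfp);
}

static void buf_free(void *element, void *pool_data)
{
	kfree(element);
}

static mempool_t *pool;

static int pool_init(void)
{
	/* Reserve enough elements for minimal forward progress; this
	 * reservation is the role RESYNC_WINDOW plays.
	 */
	pool = mempool_create(4, buf_alloc, buf_free, NULL);
	return pool ? 0 : -ENOMEM;
}

static struct resync_buf *get_buf(void)
{
	/* Beyond the reserve, this allocates from the system when memory
	 * is easily available, otherwise it sleeps until a previously
	 * issued buffer comes back via mempool_free() -- which is why
	 * resync naturally backs off under memory pressure.
	 */
	return mempool_alloc(pool, GFP_NOIO);
}

In this picture, the mempool_create() reserve corresponds to RESYNC_WINDOW
(guaranteed forward progress), while the number of buffers outstanding at
any one time is what RESYNC_DEPTH caps.
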
* Re: [PATCH md] Fail safely when trying to grow an array with a write-intent bitmap.
  2008-08-05 18:12 UTC
From: Dan Williams
To: NeilBrown; +Cc: linux-raid

On Mon, Aug 4, 2008 at 11:17 PM, NeilBrown <neilb@suse.de> wrote:
> +		/* Sorry, cannot grow a bitmap yet, just remove it,
> +		 * grow, and re-add.
> +		 */

Perhaps this bit should be in a pr_info().

--
Dan

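For illustration only, a pr_info() along the lines Dan suggests might look
something like the snippet below; the exact wording, and whether such a
message was ever merged, is not shown by this thread, so treat it purely as
a sketch.

	if (mddev->bitmap) {
		/* Explain the refusal in the kernel log instead of only
		 * returning -EBUSY from update_size().
		 */
		pr_info("md: %s: cannot change size while a write-intent "
			"bitmap is present; remove the bitmap, resize, "
			"then re-add it\n", mdname(mddev));
		return -EBUSY;
	}
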
Thread overview: 11+ messages
2008-08-05 6:17 [PATCH md] 6 Little md patches for 2.6.27 for review NeilBrown
2008-08-05 6:17 ` [PATCH md] Make writes to md/safe_mode_delay immediately effective NeilBrown
2008-08-05 6:17 ` [PATCH md] Restore force switch of md array to readonly at reboot time NeilBrown
2008-08-05 6:17 ` [PATCH md] Fail safely when trying to grow an array with a write-intent bitmap NeilBrown
2008-08-05 6:17 ` [PATCH md] Don't let a blocked_rdev interfere with read request in raid5/6 NeilBrown
2008-08-05 6:17 ` [PATCH md] Allow faulty devices to be removed from a readonly array NeilBrown
2008-08-05 6:17 ` [PATCH md] Allow raid10 resync to happening in larger chunks NeilBrown
2008-08-06 9:01 ` Keld Jørn Simonsen
[not found] ` <200808062256.m76MupNY089428@sia.dkuug.dk>
2008-08-07 0:45 ` 'Keld Jørn Simonsen'
2008-08-11 0:37 ` Neil Brown
2008-08-05 18:12 ` [PATCH md] Fail safely when trying to grow an array with a write-intent bitmap Dan Williams