Linux RAID subsystem development

* [PATCH 19/22] FIX: Enable metadata updates for raid0
From: Krzysztof Wojcik @ 2011-06-02 14:50 UTC (permalink / raw)
  To: neilb
  Cc: linux-raid, wojciech.neubauer, adam.kwolek, dan.j.williams,
	ed.ciechanowski
In-Reply-To: <20110602144212.27355.3706.stgit@gklab-128-111.igk.intel.com>

From: Adam Kwolek <adam.kwolek@intel.com>

When raid0 is taken over to degraded raid4, metadata updates have to be
applied via mdmon (raid4 has to be monitored).
This is not possible because the update_tail pointer is not initialized
in the supertype structure.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---
 Grow.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/Grow.c b/Grow.c
index 11b2214..25be587 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1847,6 +1847,9 @@ static int reshape_array(char *container, int fd, char *devname,
 			if (!mdmon_running(st->container_dev))
 				start_mdmon(st->container_dev);
 			ping_monitor(container);
+			if (mdmon_running(st->container_dev) &&
+			    st->update_tail == NULL)
+				st->update_tail = &st->updates;
 		}
 	}
 	/* ->reshape_super might have chosen some spares from the
@@ -2264,6 +2267,8 @@ started:
 					": %s: could not set level "
 					"to %s\n", devname, c);
 		}
+		if (info->new_level == 0)
+			st->update_tail = NULL;
 	}
 out:
 	if (forked)


^ permalink raw reply related

* [PATCH 20/22] Do not use backup file for external metadata
From: Krzysztof Wojcik @ 2011-06-02 14:50 UTC (permalink / raw)
  To: neilb
  Cc: linux-raid, wojciech.neubauer, adam.kwolek, dan.j.williams,
	ed.ciechanowski
In-Reply-To: <20110602144212.27355.3706.stgit@gklab-128-111.igk.intel.com>

From: Adam Kwolek <adam.kwolek@intel.com>

When the external metadata handler provides the manage_reshape()
and recover_backup() functions in its superswitch, a backup file is not
required and can be omitted. Metadata-specific mechanisms are used for
backup purposes instead.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
---
 Grow.c |   40 ++++++++++++++++++++++------------------
 1 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/Grow.c b/Grow.c
index 25be587..8e67be2 100644
--- a/Grow.c
+++ b/Grow.c
@@ -2038,25 +2038,29 @@ started:
 	if (d < 0) {
 		goto release;
 	}
-	if (backup_file == NULL) {
-		if (reshape.after.data_disks <= reshape.before.data_disks) {
-			fprintf(stderr,
-				Name ": %s: Cannot grow - need backup-file\n", 
-				devname);
-			goto release;
-		} else if (sra->array.spare_disks == 0) {
-			fprintf(stderr, Name ": %s: Cannot grow - need a spare or "
-				"backup-file to backup critical section\n",
-				devname);
-			goto release;
-		}
-	} else {
-		if (!reshape_open_backup_file(backup_file, fd, devname,
-					      (signed)blocks,
-					      fdlist+d, offsets+d, restart)) {
-			goto release;
+	if ((st->ss->manage_reshape == NULL) ||
+	    (st->ss->recover_backup == NULL)) {
+		if (backup_file == NULL) {
+			if (reshape.after.data_disks <=
+			    reshape.before.data_disks) {
+				fprintf(stderr, Name ": %s: Cannot grow - "
+					"need backup-file\n", devname);
+				goto release;
+			} else if (sra->array.spare_disks == 0) {
+				fprintf(stderr, Name ": %s: Cannot grow - "
+					"need a spare or backup-file to backup "
+					"critical section\n", devname);
+				goto release;
+			}
+		} else {
+			if (!reshape_open_backup_file(backup_file, fd, devname,
+						      (signed)blocks,
+						      fdlist+d, offsets+d,
+						      restart)) {
+				goto release;
+			}
+			d++;
 		}
-		d++;
 	}
 
 	/* lastly, check that the internal stripe cache is


^ permalink raw reply related

* [PATCH 21/22] imsm: Remove user warning before reshape start
From: Krzysztof Wojcik @ 2011-06-02 14:51 UTC (permalink / raw)
  To: neilb
  Cc: linux-raid, wojciech.neubauer, adam.kwolek, dan.j.williams,
	ed.ciechanowski
In-Reply-To: <20110602144212.27355.3706.stgit@gklab-128-111.igk.intel.com>

From: Adam Kwolek <adam.kwolek@intel.com>

imsm arrays now support imsm native check-pointing, so the
user warning is no longer required.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---
 super-intel.c |   31 -------------------------------
 1 files changed, 0 insertions(+), 31 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index b6369c6..f615bb1 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -8411,30 +8411,6 @@ int imsm_takeover(struct supertype *st, struct geo_params *geo)
 	return 0;
 }
 
-static int warn_user_about_risk(void)
-{
-	int rv = 0;
-
-	fprintf(stderr,
-		"\nThis is an experimental feature. Data on the RAID volume(s) "
-		"can be lost!!!\n\n"
-		"To continue command execution please make sure that\n"
-		"the grow process will not be interrupted. Use safe power\n"
-		"supply to avoid unexpected system reboot. Make sure that\n"
-		"reshaped container is not assembled automatically during\n"
-		"system boot.\n"
-		"If reshape is interrupted, assemble array manually\n"
-		"using e.g. '-Ac' option and up to date mdadm.conf file.\n"
-		"Assembly in scan mode is not possible in such case.\n"
-		"Growing container with boot array is not possible.\n"
-		"If boot array reshape is interrupted, whole file system\n"
-		"can be lost.\n\n");
-	rv = ask("Do you want to continue? ");
-	fprintf(stderr, "\n");
-
-	return rv;
-}
-
 static int imsm_reshape_super(struct supertype *st, long long size, int level,
 			      int layout, int chunksize, int raid_disks,
 			      int delta_disks, char *backup, char *dev,
@@ -8468,13 +8444,6 @@ static int imsm_reshape_super(struct supertype *st, long long size, int level,
 		dprintf("imsm: info: Container operation\n");
 		int old_raid_disks = 0;
 
-		/* this warning will be removed when imsm checkpointing
-		 * will be implemented, and restoring from check-point
-		 * operation will be transparent for reboot process
-		 */
-		if (warn_user_about_risk() == 0)
-			return ret_val;
-
 		if (imsm_reshape_is_allowed_on_container(
 			    st, &geo, &old_raid_disks)) {
 			struct imsm_update_reshape *u = NULL;


^ permalink raw reply related

* [PATCH 22/22] imsm: Unit Tests - remove backup-file during grow command
From: Krzysztof Wojcik @ 2011-06-02 14:51 UTC (permalink / raw)
  To: neilb
  Cc: linux-raid, wojciech.neubauer, adam.kwolek, dan.j.williams,
	ed.ciechanowski
In-Reply-To: <20110602144212.27355.3706.stgit@gklab-128-111.igk.intel.com>

From: Adam Kwolek <adam.kwolek@intel.com>

Update the reshape/migration unit tests so that they do not use a backup file.
imsm native check-pointing has to be used (internally) instead.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
---
 tests/imsm-grow-template |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/tests/imsm-grow-template b/tests/imsm-grow-template
index 191f056..8022e3a 100644
--- a/tests/imsm-grow-template
+++ b/tests/imsm-grow-template
@@ -14,10 +14,9 @@ function grow_member() {
 	local offset=$6
 	local chunk=$7
 	local array_size=$((comps * size))
-	local backup_imsm=/tmp/backup_imsm
 
 	rm -f $backup_imsm
-	( set -ex; mdadm --grow $member --chunk=$chunk --level=$level --backup-file=$backup_imsm )
+	( set -ex; mdadm --grow $member --chunk=$chunk --level=$level )
 	local status=$?
 	if [ $negative_test -ne 0 ]; then
 		if [ $status -eq 0 ]; then
@@ -83,7 +82,7 @@ if [ $migration_test -ne 0 ]; then
 	fi
 else
 	rm -f $backup_imsm
-	( set -x; mdadm --grow $container --raid-disks=$num_disks --backup-file=$backup_imsm )
+	( set -x; mdadm --grow $container --raid-disks=$num_disks )
 	grow_status=$?
 	if [ $negative_test -ne 0 ]; then
 		if [ $grow_status -eq 0 ]; then


^ permalink raw reply related

* Re: 2.6.39: raid1 check blocks jbd on other md more than 120 seconds
From: Frank van Maarseveen @ 2011-06-03  7:38 UTC (permalink / raw)
  To: Mathias Burén; +Cc: linux-raid
In-Reply-To: <BANLkTimzMOo9uGjt-sDWPVBBdzSQaqVTWw@mail.gmail.com>

On Thu, Jun 02, 2011 at 11:46:38AM +0200, Mathias Burén wrote:
> On 2 June 2011 11:36, Frank van Maarseveen <frankvm@frankvm.com> wrote:
> > The system runs FC14 with an (almost) stock 2.6.39 kernel, configured to
> > panic if it seems to hang. That's exactly what started to happen without
> > anything being logged in the normal way except over netconsole.
> >
> > /proc/mdstat:
> > Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4]
> > md3 : active raid1 sda3[0] sdb3[1]
> >      1885338488 blocks super 1.2 [2/2] [UU]
> >
> > md1 : active raid1 sda1[0] sdb1[1]
> >      33555384 blocks super 1.2 [2/2] [UU]
> >
> > kernel messages:
> >        (/etc/cron.weekly/99-raid-check kicks in)
> > Jun  2 04:04:00 janus md: data-check of RAID array md3
> > Jun  2 04:04:00 janus md: delaying data-check of md1 until md3 has finished (they share one or more physical units)
> > Jun  2 04:04:00 janus md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> > Jun  2 04:04:00 janus md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> > Jun  2 04:04:00 janus md: using 128k window, over a total of 1885338488 blocks.
> > Jun  2 04:55:54 janus INFO: task jbd2/md1-8:1188 blocked for more than 120 seconds.
> [...]
> 
> Same behavior if you lower this?
> 
> Jun  2 04:04:00 janus md: using maximum available idle IO bandwidth
> (but not more than 200000 KB/sec) for data-check.

Practical bandwidth usually ranges from slightly more than 100MB/s at the
start of the disk to approximately 60MB/s at the end. I tried setting
sync_speed_max to 70000kB/s. The problem seems to correlate with the
max. practical bandwidth, because at the end of the data-check there were
a couple of hung task messages again, this time referring to postfix and
other daemons. Timeline:

Jun  2 11:52:30 janus kernel: md: data-check of RAID array md3
Jun  2 11:52:30 janus kernel: md: using maximum available idle IO bandwidth (but not more than 70000 KB/sec) for data-check.
Jun  2 18:48:44 hung task
Jun  2 18:48:44 hung task
Jun  2 18:50:44 hung task
Jun  2 18:50:45 hung task
Jun  2 19:28:45 hung task
Jun  2 19:28:45 hung task
Jun  2 19:34:45 hung task
Jun  2 19:34:45 hung task
Jun  2 19:34:45 hung task
Jun  2 19:34:45 hung task
Jun  2 19:53:29 janus kernel: md: md3: data-check done.

Kernel has been booted with hung_task_panic=0.
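
For reference, the sync_speed_max cap mentioned above is a per-array sysfs
attribute. The following is only a minimal sketch of how it could be set,
assuming the usual /sys/block/<array>/md/ layout; the helper names
md_sync_max_path and set_sync_speed_max are made up for illustration and
are not part of mdadm or the kernel.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of mdadm.

# Print the sysfs attribute that caps resync/check speed for one array.
# $1 = array name without /dev/, e.g. md3
md_sync_max_path() {
	printf '/sys/block/%s/md/sync_speed_max\n' "$1"
}

# Write a new cap (in kB/s) for one array. Needs root.
# $1 = array name, $2 = limit in kB/s
set_sync_speed_max() {
	echo "$2" > "$(md_sync_max_path "$1")"
}

# Example (as root): set_sync_speed_max md3 70000
```

The write takes effect for a check that is already running, so the cap can
be adjusted without restarting the data-check.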

-- 
Frank
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: 2.6.39: raid1 check blocks jbd on other md more than 120 seconds
From: Thomas Harold @ 2011-06-03 12:08 UTC (permalink / raw)
  To: Frank van Maarseveen, linux-raid
In-Reply-To: <20110602093644.GA8620@janus>

On 6/2/2011 5:36 AM, Frank van Maarseveen wrote:
> The system runs FC14 with an (almost) stock 2.6.39 kernel, configured to
> panic if it seems to hang. That's exactly what started to happen without
> anything being logged in the normal way except over netconsole.
>
> /proc/mdstat:
> Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4]
> md3 : active raid1 sda3[0] sdb3[1]
>        1885338488 blocks super 1.2 [2/2] [UU]
>
> md1 : active raid1 sda1[0] sdb1[1]
>        33555384 blocks super 1.2 [2/2] [UU]
>
> kernel messages:
> 	(/etc/cron.weekly/99-raid-check kicks in)
> Jun  2 04:04:00 janus md: data-check of RAID array md3
> Jun  2 04:04:00 janus md: delaying data-check of md1 until md3 has finished (they share one or more physical units)
> Jun  2 04:04:00 janus md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Jun  2 04:04:00 janus md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> Jun  2 04:04:00 janus md: using 128k window, over a total of 1885338488 blocks.
> Jun  2 04:55:54 janus INFO: task jbd2/md1-8:1188 blocked for more than 120 seconds.
> Jun  2 04:55:54 "echo 0>  /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun  2 04:55:54 janus jbd2/md1-8     D

That's a bug that you'll see in CentOS/RHEL in cases where there are 
multiple arrays to be checked, that use the same set of disks.  I first 
saw it in CentOS 5.5 (or maybe 5.6).

https://bugzilla.redhat.com/show_bug.cgi?id=573106

It's an annoying message, but the weekly raid sync runs fine.

^ permalink raw reply

* Re: 2.6.39: raid1 check blocks jbd on other md more than 120 seconds
From: Frank van Maarseveen @ 2011-06-03 12:36 UTC (permalink / raw)
  To: Thomas Harold; +Cc: linux-raid
In-Reply-To: <4DE8CEA1.9090906@nybeta.com>

On Fri, Jun 03, 2011 at 08:08:01AM -0400, Thomas Harold wrote:
> On 6/2/2011 5:36 AM, Frank van Maarseveen wrote:
> >The system runs FC14 with an (almost) stock 2.6.39 kernel, configured to
> >panic if it seems to hang. That's exactly what started to happen without
> >anything being logged in the normal way except over netconsole.
> >
> >/proc/mdstat:
> >Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4]
> >md3 : active raid1 sda3[0] sdb3[1]
> >       1885338488 blocks super 1.2 [2/2] [UU]
> >
> >md1 : active raid1 sda1[0] sdb1[1]
> >       33555384 blocks super 1.2 [2/2] [UU]
> >
> >kernel messages:
> >	(/etc/cron.weekly/99-raid-check kicks in)
> >Jun  2 04:04:00 janus md: data-check of RAID array md3
> >Jun  2 04:04:00 janus md: delaying data-check of md1 until md3 has finished (they share one or more physical units)
> >Jun  2 04:04:00 janus md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> >Jun  2 04:04:00 janus md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> >Jun  2 04:04:00 janus md: using 128k window, over a total of 1885338488 blocks.
> >Jun  2 04:55:54 janus INFO: task jbd2/md1-8:1188 blocked for more than 120 seconds.
> >Jun  2 04:55:54 "echo 0>  /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >Jun  2 04:55:54 janus jbd2/md1-8     D
> 
> That's a bug that you'll see in CentOS/RHEL in cases where there are
> multiple arrays to be checked, that use the same set of disks.  I
> first saw it in CentOS 5.5 (or maybe 5.6).
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=573106
> 
> It's an annoying message, but the weekly raid sync runs fine.

According to the bugzilla report it was the resync itself which got stuck,
unlike what I am seeing where any random program may get stuck. Depending
on kernel configuration it may trigger a kernel panic. Last time:

Jun  2 18:48:44 janus kernel: INFO: task master:2705 blocked for more than 120 seconds.
Jun  2 18:48:44 janus kernel: INFO: task pickup:19276 blocked for more than 120 seconds.
Jun  2 18:50:44 janus kernel: INFO: task jbd2/md1-8:1187 blocked for more than 120 seconds.
Jun  2 18:50:45 janus kernel: INFO: task python:1890 blocked for more than 120 seconds.
Jun  2 19:28:45 janus kernel: INFO: task master:2705 blocked for more than 120 seconds.
Jun  2 19:28:45 janus kernel: INFO: task pickup:20589 blocked for more than 120 seconds.
Jun  2 19:34:45 janus kernel: INFO: task jbd2/md1-8:1187 blocked for more than 120 seconds.
Jun  2 19:34:45 janus kernel: INFO: task master:2705 blocked for more than 120 seconds.
Jun  2 19:34:45 janus kernel: INFO: task qmgr:2718 blocked for more than 120 seconds.
Jun  2 19:34:45 janus kernel: INFO: task pickup:20589 blocked for more than 120 seconds.

-- 
Frank

^ permalink raw reply

* [PATCH 4/8] md/raid: use printk_ratelimited instead of printk_ratelimit
From: Christian Dietrich @ 2011-06-04 15:36 UTC (permalink / raw)
  To: Neil Brown, linux-raid, linux-kernel, trivial
In-Reply-To: <cover.1307199715.git.christian.dietrich@informatik.uni-erlangen.de>

As the comment on printk_ratelimit() says, it should not be used.

Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
---
 drivers/md/raid1.c  |   22 ++++++++++++----------
 drivers/md/raid10.c |   22 ++++++++++++----------
 drivers/md/raid5.c  |   39 +++++++++++++++++++--------------------
 3 files changed, 43 insertions(+), 40 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 5d09609..30af10e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -35,6 +35,7 @@
 #include <linux/delay.h>
 #include <linux/blkdev.h>
 #include <linux/seq_file.h>
+#include <linux/ratelimit.h>
 #include "md.h"
 #include "raid1.h"
 #include "bitmap.h"
@@ -287,10 +288,11 @@ static void raid1_end_read_request(struct bio *bio, int error)
 		 * oops, read error:
 		 */
 		char b[BDEVNAME_SIZE];
-		if (printk_ratelimit())
-			printk(KERN_ERR "md/raid1:%s: %s: rescheduling sector %llu\n",
-			       mdname(conf->mddev),
-			       bdevname(conf->mirrors[mirror].rdev->bdev,b), (unsigned long long)r1_bio->sector);
+		printk_ratelimited(KERN_ERR "md/raid1:%s: %s: "
+				   "rescheduling sector %llu\n",
+				   mdname(conf->mddev),
+				   bdevname(conf->mirrors[mirror].rdev->bdev, b),
+				   (unsigned long long)r1_bio->sector);
 		reschedule_retry(r1_bio);
 	}
 
@@ -1574,12 +1576,12 @@ static void raid1d(mddev_t *mddev)
 						      GFP_NOIO, mddev);
 				r1_bio->bios[r1_bio->read_disk] = bio;
 				rdev = conf->mirrors[disk].rdev;
-				if (printk_ratelimit())
-					printk(KERN_ERR "md/raid1:%s: redirecting sector %llu to"
-					       " other mirror: %s\n",
-					       mdname(mddev),
-					       (unsigned long long)r1_bio->sector,
-					       bdevname(rdev->bdev,b));
+				printk_ratelimited(KERN_ERR
+						   "md/raid1:%s: redirecting sector %llu to"
+						   " other mirror: %s\n",
+						   mdname(mddev),
+						   (unsigned long long)r1_bio->sector,
+						   bdevname(rdev->bdev, b));
 				bio->bi_sector = r1_bio->sector + rdev->data_offset;
 				bio->bi_bdev = rdev->bdev;
 				bio->bi_end_io = raid1_end_read_request;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6e84668..e80475a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -22,6 +22,7 @@
 #include <linux/delay.h>
 #include <linux/blkdev.h>
 #include <linux/seq_file.h>
+#include <linux/ratelimit.h>
 #include "md.h"
 #include "raid10.h"
 #include "raid0.h"
@@ -277,10 +278,11 @@ static void raid10_end_read_request(struct bio *bio, int error)
 		 * oops, read error - keep the refcount on the rdev
 		 */
 		char b[BDEVNAME_SIZE];
-		if (printk_ratelimit())
-			printk(KERN_ERR "md/raid10:%s: %s: rescheduling sector %llu\n",
-			       mdname(conf->mddev),
-			       bdevname(conf->mirrors[dev].rdev->bdev,b), (unsigned long long)r10_bio->sector);
+		printk_ratelimited(KERN_ERR
+				   "md/raid10:%s: %s: rescheduling sector %llu\n",
+				   mdname(conf->mddev),
+				   bdevname(conf->mirrors[dev].rdev->bdev, b),
+				   (unsigned long long)r10_bio->sector);
 		reschedule_retry(r10_bio);
 	}
 }
@@ -1667,12 +1669,12 @@ static void raid10d(mddev_t *mddev)
 				bio_put(bio);
 				slot = r10_bio->read_slot;
 				rdev = conf->mirrors[mirror].rdev;
-				if (printk_ratelimit())
-					printk(KERN_ERR "md/raid10:%s: %s: redirecting sector %llu to"
-					       " another mirror\n",
-					       mdname(mddev),
-					       bdevname(rdev->bdev,b),
-					       (unsigned long long)r10_bio->sector);
+				printk_ratelimited(KERN_ERR
+						   "md/raid10:%s: %s: redirecting "
+						   "sector %llu to another mirror\n",
+						   mdname(mddev),
+						   bdevname(rdev->bdev, b),
+						   (unsigned long long)r10_bio->sector);
 				bio = bio_clone_mddev(r10_bio->master_bio,
 						      GFP_NOIO, mddev);
 				r10_bio->devs[slot].bio = bio;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 346e69b..8927c26 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -51,6 +51,7 @@
 #include <linux/seq_file.h>
 #include <linux/cpu.h>
 #include <linux/slab.h>
+#include <linux/ratelimit.h>
 #include "md.h"
 #include "raid5.h"
 #include "raid0.h"
@@ -96,8 +97,6 @@
 #define __inline__
 #endif
 
-#define printk_rl(args...) ((void) (printk_ratelimit() && printk(args)))
-
 /*
  * We maintain a biased count of active stripes in the bottom 16 bits of
  * bi_phys_segments, and a count of processed stripes in the upper 16 bits
@@ -1587,12 +1586,12 @@ static void raid5_end_read_request(struct bio * bi, int error)
 		set_bit(R5_UPTODATE, &sh->dev[i].flags);
 		if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
 			rdev = conf->disks[i].rdev;
-			printk_rl(KERN_INFO "md/raid:%s: read error corrected"
-				  " (%lu sectors at %llu on %s)\n",
-				  mdname(conf->mddev), STRIPE_SECTORS,
-				  (unsigned long long)(sh->sector
-						       + rdev->data_offset),
-				  bdevname(rdev->bdev, b));
+			printk_ratelimited(KERN_INFO "md/raid:%s: read error corrected"
+					   " (%lu sectors at %llu on %s)\n",
+					   mdname(conf->mddev), STRIPE_SECTORS,
+					   (unsigned long long)(sh->sector
+								+ rdev->data_offset),
+					   bdevname(rdev->bdev, b));
 			clear_bit(R5_ReadError, &sh->dev[i].flags);
 			clear_bit(R5_ReWrite, &sh->dev[i].flags);
 		}
@@ -1606,21 +1605,21 @@ static void raid5_end_read_request(struct bio * bi, int error)
 		clear_bit(R5_UPTODATE, &sh->dev[i].flags);
 		atomic_inc(&rdev->read_errors);
 		if (conf->mddev->degraded >= conf->max_degraded)
-			printk_rl(KERN_WARNING
-				  "md/raid:%s: read error not correctable "
-				  "(sector %llu on %s).\n",
-				  mdname(conf->mddev),
-				  (unsigned long long)(sh->sector
-						       + rdev->data_offset),
+			printk_ratelimited(KERN_WARNING
+					   "md/raid:%s: read error not correctable "
+					   "(sector %llu on %s).\n",
+					   mdname(conf->mddev),
+					   (unsigned long long)(sh->sector
+								+ rdev->data_offset),
 				  bdn);
 		else if (test_bit(R5_ReWrite, &sh->dev[i].flags))
 			/* Oh, no!!! */
-			printk_rl(KERN_WARNING
-				  "md/raid:%s: read error NOT corrected!! "
-				  "(sector %llu on %s).\n",
-				  mdname(conf->mddev),
-				  (unsigned long long)(sh->sector
-						       + rdev->data_offset),
+			printk_ratelimited(KERN_WARNING
+					   "md/raid:%s: read error NOT corrected!! "
+					   "(sector %llu on %s).\n",
+					   mdname(conf->mddev),
+					   (unsigned long long)(sh->sector
+								+ rdev->data_offset),
 				  bdn);
 		else if (atomic_read(&rdev->read_errors)
 			 > conf->max_nr_stripes)
-- 
1.7.1

^ permalink raw reply related

* Fwd: Maximizing failed disk replacement on a RAID5 array
From: Durval Menezes @ 2011-06-05 14:22 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <BANLkTimBYFhjQ-sC9DhTMO+PG-Ox+A9S2Q@mail.gmail.com>

Hello folks,

A few days ago, the smartd daemon running on my Lucid system at home
(kernel 2.6.32-32-generic, mdadm 2.6.7.1) started warning me about
a few (less than 50 so far) offline uncorrectable and other errors on
one of the 1.5TB HDs in my three-disk RAID5 array. This failing HD is still
online (ie, hasn't been kicked off the array), at least for now.

I have another disk ready for replacement, and I'm trying to determine
the safest (not necessarily the simplest) way of proceeding.

I understand that, if I do it the "standard" way (ie, power down the
system, remove the failing disk, add the replacement disk, then boot
up and use "mdadm --add" to add the new disk to the array) I run the
risk of running into unreadable sectors on one of the other two disks,
and then my RAID5 is kaput.

What I would really like to do is to be able to add the new HD to the
array WITHOUT removing the failing HD, somehow sync it with the rest,
and THEN remove the failing HD: that way, an eventual failed read from
one of the two other HDs could possibly be satisfied from the failing
HD (unless EXACTLY that same sector is also unreadable on it, which I
find unlikely), and so avoid losing the whole array in the above case.

So far, the only way I've been able to figure out to do that would be to
convert the array from RAID5 to RAID6, add the new disk, wait for the
array to sync, remove the failing disk, and then convert the array
back from RAID6 to RAID5 (and I'm not really sure that this is a good
idea, or even doable).
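
The plan described above could be written down roughly as follows. This is
a sketch under assumptions, not a tested procedure: the function name
plan_raid6_swap and all device and file names are placeholders, the
function only prints the commands, and whether the mdadm and kernel
versions mentioned in this message support these level changes is not
established here.

```shell
#!/bin/sh
# Sketch only: prints the commands for the RAID5 -> RAID6 -> RAID5 plan,
# it does not run them. plan_raid6_swap is a hypothetical helper name.
# $1 = array, $2 = failing member, $3 = new disk, $4 = backup file
plan_raid6_swap() {
	# Add the new disk, then reshape three-disk RAID5 to four-disk RAID6.
	printf 'mdadm %s --add %s\n' "$1" "$3"
	printf 'mdadm --grow %s --level=6 --raid-devices=4 --backup-file=%s\n' "$1" "$4"
	# Only after the reshape has finished: drop the failing member.
	printf 'mdadm %s --fail %s --remove %s\n' "$1" "$2" "$2"
	# Reshape back to a three-disk RAID5.
	printf 'mdadm --grow %s --level=5 --raid-devices=3 --backup-file=%s\n' "$1" "$4"
}

# Example: plan_raid6_swap /dev/md0 /dev/sdb /dev/sdd /root/md0.bak
```

Each reshape has to complete (see /proc/mdstat) before the next step, and
the backup file should live on a device that is not part of the array.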

So, folks, what do you say? Is there a better way? Any gotchas in the
RAID5->RAID6->RAID5 approach?

Thanks,
--
   Durval Menezes.

^ permalink raw reply

* RAID5: failing an active component during spare rebuild - arrays hangs
From: Alexander Lyakas @ 2011-06-05 19:41 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <BANLkTikkeoCsr3-UBSPEDrYwh4jGSn=MaA@mail.gmail.com>

Hello everybody,
I am testing a scenario, in which I create a RAID5 with three devices:
/dev/sd{a,b,c}. Since I don't supply --force to mdadm during creation,
it treats the array as degraded and starts rebuilding the sdc as a
spare. This is as documented.

Then I do --fail on /dev/sda. I understand that at this point my data
is gone, but I think I should still be able to tear down the array.
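
The two steps above can be sketched as a small script. This is an
illustration only: the names run, repro and the RUN variable are
assumptions of the sketch, and by default it just prints the commands,
because running them destroys the data on the listed devices.

```shell
#!/bin/sh
# Sketch only: "run", "repro" and RUN are names made up for this example.

# Print the command unless RUN=1 is set, in which case execute it.
run() {
	if [ "${RUN:-0}" = 1 ]; then
		"$@"
	else
		echo "$@"
	fi
}

# $1 = array device, remaining arguments = member devices
repro() {
	md=$1; shift
	# Without --force, mdadm creates the RAID5 degraded and rebuilds
	# the last device as a spare.
	run mdadm --create "$md" --level=5 --raid-devices=$# "$@"
	# Fail the first member while that rebuild is still in progress.
	run mdadm "$md" --fail "$1"
}

# Example (dry run): repro /dev/md1123 /dev/sda /dev/sdb /dev/sdc
```

Since the hang is only seen sometimes, the sequence may need to be
repeated, with the array torn down in between, before it shows up.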

Sometimes I see that /dev/sda is kicked from the array as faulty, and
/dev/sdc is also removed and marked as a spare. Then I am able to tear
down the array.

But sometimes, it looks like the system hits some kind of a deadlock.
mdadm --detail produces:

    Update Time : Sun Jun  5 21:54:34 2011
          State : active, FAILED
 Active Devices : 1
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ubuntu:zvp_1123
           UUID : 48a15fb6:b6410bb9:a2ca173e:0092032c
         Events : 67

    Number   Major   Minor   RaidDevice State
       0       8        0        0      faulty spare rebuilding   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       3       8       32        2      spare rebuilding   /dev/sdc

So the faulty device and the spare are not kicked out of the array. At
this point I am unable to do anything with the array:

root@ubuntu:~# sudo mdadm --stop /dev/md1123
mdadm: failed to stop array /dev/md1123: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?
root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sda
mdadm: hot remove failed for /dev/sda: Device or resource busy
root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdb
mdadm: hot remove failed for /dev/sdb: Device or resource busy
root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdc
mdadm: hot remove failed for /dev/sdc: Device or resource busy

This is happening on ubuntu-natty, with mdadm - v3.1.4 - 31st August 2010.
Looking at some code in mdadm/Detail.c, it looks like /dev/sda has
been marked only as MD_DISK_FAULTY, but has not yet been kicked out of
the array. The "spare" and "rebuilding" prints also result from that.

The same thing also happens (sometimes) when I manually initiate a resync
(by writing 'repair' to 'sync_action'), and later manually fail one
of the devices. Then I also saw messages like this in the syslog:
Jun  5 21:42:00 ubuntu kernel: [ 2280.350454] INFO: task md1123_resync:7993 blocked for more than 120 seconds.
Jun  5 21:42:00 ubuntu kernel: [ 2280.350552] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  5 21:42:00 ubuntu kernel: [ 2280.350644] md1123_resync   D 0000000000000000     0  7993      2 0x00000004
Jun  5 21:42:00 ubuntu kernel: [ 2280.350647]  ffff8800b56b1cd0 0000000000000046 ffff8800b56b1fd8 ffff8800b56b0000
Jun  5 21:42:00 ubuntu kernel: [ 2280.350649]  0000000000013d00 ffff880036c09a98 ffff8800b56b1fd8 0000000000013d00
Jun  5 21:42:00 ubuntu kernel: [ 2280.350652]  ffff8800b7f1adc0 ffff880036c096e0 ffff8800b56b1cb0 ffff880036c56610
Jun  5 21:42:00 ubuntu kernel: [ 2280.350654] Call Trace:
Jun  5 21:42:00 ubuntu kernel: [ 2280.350657]  [<ffffffff81492885>] md_do_sync+0xb45/0xc90
Jun  5 21:42:00 ubuntu kernel: [ 2280.350660]  [<ffffffff81087940>] ? autoremove_wake_function+0x0/0x40
Jun  5 21:42:00 ubuntu kernel: [ 2280.350663]  [<ffffffff8107861b>] ? recalc_sigpending+0x1b/0x50
Jun  5 21:42:00 ubuntu kernel: [ 2280.350665]  [<ffffffff8148c516>] md_thread+0x116/0x150
Jun  5 21:42:00 ubuntu kernel: [ 2280.350667]  [<ffffffff8148c400>] ? md_thread+0x0/0x150
Jun  5 21:42:00 ubuntu kernel: [ 2280.350669]  [<ffffffff810871f6>] kthread+0x96/0xa0
Jun  5 21:42:00 ubuntu kernel: [ 2280.350672]  [<ffffffff8100cde4>] kernel_thread_helper+0x4/0x10
Jun  5 21:42:00 ubuntu kernel: [ 2280.350674]  [<ffffffff81087160>] ? kthread+0x0/0xa0
Jun  5 21:42:00 ubuntu kernel: [ 2280.350676]  [<ffffffff8100cde0>] ? kernel_thread_helper+0x0/0x10

This is pretty easy for me to reproduce.

Basically, I would like to know what the user is expected to do when
more than one RAID5 array component fails during rebuild/resync.

Thanks,
  Alex.

^ permalink raw reply

* sector I/O error cause disk to be "faulty" in raid5
From: hank peng @ 2011-06-06 13:28 UTC (permalink / raw)
  To: linux-raid

Hi, everybody:
In the current raid5 implementation, if a r/w error occurs at some
specific sectors on a disk, the whole disk is labeled as "faulty".
In most cases, however, this indicates a failure of those sectors,
not of the whole disk. Should we change this behaviour to be more
reasonable?

-- 
The simplest is not all best but the best is surely the simplest!

^ permalink raw reply

* Re: Maximizing failed disk replacement on a RAID5 array
From: Drew @ 2011-06-06 15:02 UTC (permalink / raw)
  To: Durval Menezes; +Cc: linux-raid
In-Reply-To: <BANLkTimOxCF7diZfeFgtqCBWKZVF2pxyLg@mail.gmail.com>

> I understand that, if I do it the "standard" way (ie, power down the
> system, remove the failing disk, add the replacement disk, then boot
> up and use "mdadm --add" to add the new disk to the array) I run the
> risk of running into unreadable sectors on one of the other two disks,
> and then my RAID5 is kaput.
>
> What I would really like to do is to be able to add the new HD to the
> array WITHOUT removing the failing HD, somehow sync it with the rest,
> and THEN remove the failing HD: that way, an eventual failed read from
> one of the two other HDs could possibly be satisfied from the failing
> HD (unless EXACTLY that same sector is also unreadable on it, which I
> find unlikely), and so avoid losing the whole array in the above case.

A reshape from RAID5 -> RAID6 -> RAID5 will hammer your disks so if
either of the other two are ready to die, this will most likely tip
them over the edge.

A far simpler way would be to take the array offline, dd (or
dd_rescue) the old drive's contents onto the new disk, pull the old
disk, and restart the array with the new drive in its place. With
luck you won't need a resync *and* you're not hammering the other two
drives in the process.


-- 
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie

"This started out as a hobby and spun horribly out of control."
-Unknown

^ permalink raw reply

* Re: Maximizing failed disk replacement on a RAID5 array
From: Brad Campbell @ 2011-06-06 15:20 UTC (permalink / raw)
  To: Drew; +Cc: Durval Menezes, linux-raid
In-Reply-To: <BANLkTi=px08AWxfJJq+zNepCZM8aAsKECA@mail.gmail.com>

On 06/06/11 23:02, Drew wrote:
>> I understand that, if I do it the "standard" way (ie, power down the
>> system, remove the failing disk, add the replacement disk, then boot
>> up and use "mdadm --add" to add the new disk to the array) I run the
>> risk of running into unreadable sectors on one of the other two disks,
>> and then my RAID5 is kaput.
>>
>> What I would really like to do is to be able to add the new HD to the
>> array WITHOUT removing the failing HD, somehow sync it with the rest,
>> and THEN remove the failing HD: that way, an eventual failed read from
>> one of the two other HDs could possibly be satisfied from the failing
>> HD (unless EXACTLY that same sector is also unreadable on it, which I
>> find unlikely), and so avoid losing the whole array in the above case.
> A reshape from RAID5 ->  RAID6 ->  RAID5 will hammer your disks so if
> either of the other two are ready to die, this will most likely tip
> them over the edge.
>
> A far simpler way would be to take the array offline, dd (or
> dd_rescue) the old drive's contents onto the new disk, pull the old
> disk, and restart the array with the new drive in its place. With
> luck you won't need a resync *and* you're not hammering the other two
> drives in the process.
<afterthought>
Bear with me, I've had a few scotches and this might not be as coherent as it might be, but I think 
I spot a very, very fatal flaw in your plan.
</afterthought>

I thought this initially also, except it blows up in the scenario where the dud sectors are data and 
not parity.

If you do it the way you suggest and choose dd_rescue in place of dd, dodgy data from the dud
sectors will be replicated as kosher sectors on the replacement disk (or zeros, or random data, or whatever).

If you execute a "repair" first, it will strike the dud sectors, see they are toast, re-calculate 
them from parity and write them back forcing a reallocation.
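
To make the parity step concrete, here is a minimal sketch of how a missing RAID5 chunk can be recomputed by XOR-ing the surviving chunks of the same stripe. It is an illustration only, not md's actual code; `xor_blocks` and the sample chunk contents are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe of a hypothetical 3-disk RAID5: two data chunks plus parity.
d0 = b"hello wo"
d1 = b"rld!!!!!"
parity = xor_blocks([d0, d1])

# Suppose the sector holding d1 is unreadable: rebuild it from the survivors.
rebuilt = xor_blocks([d0, parity])
assert rebuilt == d1
```

The same identity works whichever chunk is lost, which is why a "repair" pass can rewrite a dud sector as long as the other members of the stripe are readable.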

You can then replicate the failing disk using "dd", *not* dd_rescue. If dd fails due to a read error 
then you know that part of your data is likely to be toast on the replaced disk, and you can go 
about making provisions for a backup/restore operation using the original disk (which will likely 
succeed as the data read from the array will be re-built from parity where required).

dd_rescue is a blessing and a curse. It's _very_ good at getting you access to data that you have no 
backup of, and you have no other way of getting back. On the other hand, it will happily go and 
replicate whatever trash it happens to get back from the source disk, or skip those sectors and 
leave you with an incomplete copy that will leave no trace of it being incomplete until you find 
chunks missing (like superblocks or your formula for a zero cost petroleum replacement).
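
The difference between the two copy strategies can be sketched with a toy model. This is only an illustration under stated assumptions: `FlakyDisk`, `strict_copy` and `rescue_copy` are hypothetical names, and zero-filling the unreadable blocks is one assumed behaviour for a rescue-style copy, not a description of what dd_rescue does in every configuration.

```python
class FlakyDisk:
    """Hypothetical stand-in for a failing disk: some blocks raise on read."""

    def __init__(self, blocks, bad):
        self.blocks = blocks
        self.bad = set(bad)

    def read(self, index):
        if index in self.bad:
            raise IOError(f"read error at block {index}")
        return self.blocks[index]


def strict_copy(disk, count):
    """dd-like: the first read error propagates, so the failure is visible."""
    return [disk.read(i) for i in range(count)]


def rescue_copy(disk, count, block_size):
    """Rescue-like: zero-fill unreadable blocks and keep going."""
    out, skipped = [], []
    for i in range(count):
        try:
            out.append(disk.read(i))
        except IOError:
            out.append(bytes(block_size))  # placeholder that looks like valid data
            skipped.append(i)
    return out, skipped


disk = FlakyDisk([b"AAAA", b"BBBB", b"CCCC"], bad=[1])
clone, skipped = rescue_copy(disk, 3, 4)
# clone[1] is now four zero bytes; only the `skipped` list records the loss.
```

Once the clone is put into the array, nothing on the disk itself distinguishes the placeholder block from real data, which is the hazard described above.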

If your array works but has a badly failing drive, you are far better off buying some cheap 2TB
disks and backing it up, then restoring it onto a re-created array, than chancing the loss of
chunks of data by using a dd_rescue'd clone disk.

Now, if I'm off the wall and missing something blindingly obvious feel free to thump me with a clue 
bat (it would not be the first time).

I've lost 2 arrays recently. 8TB to a dodgy controller (thanks SIL), and 2TB to complete idiocy on 
my part, so I know the sting of lost or corrupted data.

Brad

^ permalink raw reply

* Re: Maximizing failed disk replacement on a RAID5 array
From: Drew @ 2011-06-06 15:37 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Durval Menezes, linux-raid
In-Reply-To: <4DECF025.9040006@fnarfbargle.com>

> Now, if I'm off the wall and missing something blindingly obvious feel free
> to thump me with a clue bat (it would not be the first time).
>
> I've lost 2 arrays recently. 8TB to a dodgy controller (thanks SIL), and 2TB
> to complete idiocy on my part, so I know the sting of lost or corrupted
> data.

I think you've covered the process in more detail, including pitfalls,
than I have. :-) Only catch is where would you find a cheap 2-3TB
drive right now?

I also know the sting of mixing stupidity and dd. ;-) A friend was
helping me do some complex rework with dd on one of my disks. Being
the n00b I followed his instructions exactly, and him being the expert
(and assuming I wasn't the n00b I was back then) didn't double check
my work. Net result was I backed the MBR/Partition Table up using dd,
but did so to a partition on the drive we were working on. There may
have been some alcohol involved (I was in University), the revised
data we inserted failed, and next thing you know I'm running Partition
Magic (the gnu tools circa 2005 failed to detect anything) to try and
recover the partition table. No backups obviously. ;-)

-- 
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie

^ permalink raw reply

* Re: Maximizing failed disk replacement on a RAID5 array
From: Brad Campbell @ 2011-06-06 15:54 UTC (permalink / raw)
  To: Drew; +Cc: Durval Menezes, linux-raid
In-Reply-To: <BANLkTi=jPiXLySacVZqDkeThdG80K_HTxQ@mail.gmail.com>

On 06/06/11 23:37, Drew wrote:
>> Now, if I'm off the wall and missing something blindingly obvious feel free
>> to thump me with a clue bat (it would not be the first time).
>>
>> I've lost 2 arrays recently. 8TB to a dodgy controller (thanks SIL), and 2TB
>> to complete idiocy on my part, so I know the sting of lost or corrupted
>> data.
> I think you've covered the process in more detail, including pitfalls,
> than I have. :-) Only catch is where would you find a cheap 2-3TB
> drive right now?

I bought 10 recently for about $90 each. It's all relative, but I consider ~$45 / TB cheap.

> I also know the sting of mixing stupidity and dd. ;-) A friend was
> helping me do some complex rework with dd on one of my disks. Being
> the n00b I followed his instructions exactly, and him being the expert
> (and assuming I wasn't the n00b I was back then) didn't double check
> my work. Net result was I backed the MBR/Partition Table up using dd,
> but did so to a partition on the drive we were working on. There may
> have been some alcohol involved (I was in University), the revised
> data we inserted failed, and next thing you know I'm running Partition
> Magic (the gnu tools circa 2005 failed to detect anything) to try and
> recover the partition table. No backups obviously. ;-)

Similar to my

dd if=/dev/zero of=/dev/sdb bs=1M count=100

except instead of the target disk, it was to a raid array member that was currently active. To its 
credit, ext3 and fsck managed to give me most of my data back, even if I had to spend months 
intermittently sorting/renaming inode numbers from lost+found into files and directories.

I'd like to claim Alcohol as a mitigating factor (hell, it gets people off charges in our court 
system all the time) but unfortunately I was just stupid.


^ permalink raw reply

* Re: Maximizing failed disk replacement on a RAID5 array
From: Durval Menezes @ 2011-06-06 18:06 UTC (permalink / raw)
  To: linux-raid; +Cc: Brad Campbell, Drew
In-Reply-To: <4DECF841.1060906@fnarfbargle.com>

Hello Brad, Drew,

Thanks for reminding me of the hammering a RAID level conversion would cause.
This is certainly a major reason to avoid the RAID5->RAID6->RAID5 route.

The "repair" has been running here for a few days already, with the
server online, and ought to finish in 24 more hours. So far (thanks to
the automatic rewrite relocation) the number of uncorrectable sectors
being reported by SMART has dropped from 40 to 20, so it seems the
repair is doing its job. Let's just hope the disk has enough spare
sectors to remap all the bad sectors; if it does, a simple "dd" from
the bad disk to its replacement ought to do the job (as you have
indicated).

On the other hand, as this "dd" has to be done with the array offline,
it will entail some downtime (although not as much as having to
restore the whole array from backups)... not ideal, but not too bad
either.

In case worst comes to worst, I have an up-to-date offline backup of
the contents of the whole array, so if something really bad happens, I
have something to restore from.

It would be great to have a
"duplicate-this-bad-old-disk-into-this-shiny-new-disk" functionality,
as it would enable an almost-no-downtime disk replacement with
minimum risk, but it seems we can't have everything... :-0 Maybe it's
something for the wishlist?

About mishaps with "dd", I think everyone who ever dealt with a
system (not just Linux) on the level we do has at some time gone through
something similar... the last time I remember doing this was many
years ago, before Linux existed, when a few friends and I spent a
wonderful night installing William Jolitz's then-new 386/BSD on a HD
(a process which *required* dd) and trashing its Windows partitions
(which contained the only copy of the graduation thesis of one of us,
due in a few days).

Thanks for all the help,
--
   Durval Menezes.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: RAID5: failing an active component during spare rebuild - arrays hangs
From: Alexander Lyakas @ 2011-06-06 18:19 UTC (permalink / raw)
  To: Nagilum, linux-raid
In-Reply-To: <20110605230014.14822hd7b50rcqww@cakebox.homeunix.net>

Hello,

the kernel version is:

root@ubuntu:~# uname -a
Linux ubuntu 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC
2011 x86_64 x86_64 x86_64 GNU/Linux

mdadm version is:
root@ubuntu:~# mdadm -V
mdadm - v3.1.4 - 31st August 2010

Examining the three array components:

root@ubuntu:~# mdadm -E /dev/sd{a,b,c}
/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
           Name : vc:zvp_1123
  Creation Time : Mon Jun  6 21:10:38 2011
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
     Array Size : 83879936 (40.00 GiB 42.95 GB)
  Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 8db90071:be80216e:09468262:1f5046b1

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Jun  6 21:10:46 2011
       Checksum : 2e424556 - correct
         Events : 10

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : A.A ('A' == active, '.' == missing)
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
           Name : vc:zvp_1123
  Creation Time : Mon Jun  6 21:10:38 2011
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
     Array Size : 83879936 (40.00 GiB 42.95 GB)
  Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 9f41313b:b1aa70f8:6cf0ca2f:c6ea0a64

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Jun  6 21:10:44 2011
       Checksum : 2d23c61 - correct
         Events : 8

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAA ('A' == active, '.' == missing)
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x3
     Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
           Name : vc:zvp_1123
  Creation Time : Mon Jun  6 21:10:38 2011
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
     Array Size : 83879936 (40.00 GiB 42.95 GB)
  Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
Recovery Offset : 999424 sectors
          State : active
    Device UUID : 61189a9d:ec082cea:a3ba32fb:800fe84b

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Jun  6 21:10:46 2011
       Checksum : a47a059 - correct
         Events : 10

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : A.A ('A' == active, '.' == missing)

Details about the array:

root@ubuntu:~#  mdadm -Q --detail /dev/md1123
/dev/md1123:
        Version : 1.2
  Creation Time : Mon Jun  6 21:10:38 2011
     Raid Level : raid5
     Array Size : 41939968 (40.00 GiB 42.95 GB)
  Used Dev Size : 20969984 (20.00 GiB 21.47 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Jun  6 21:10:46 2011
          State : active, FAILED
 Active Devices : 1
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : vc:zvp_1123
           UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
         Events : 10

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      faulty spare rebuilding   /dev/sdb
       3       8       32        2      spare rebuilding   /dev/sdc


Basically, the problem is that the faulty component (and the rebuilding
spare) are not kicked out of the array, and the array is stuck in
this state.
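
For anyone scripting a check for this state, here is a rough sketch that pulls the device table out of "mdadm --detail" text and lists the members flagged faulty. It assumes the column layout shown in the output above; `parse_detail_devices` is a made-up helper name, not part of mdadm.

```python
DETAIL_TABLE = """\
    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      faulty spare rebuilding   /dev/sdb
       3       8       32        2      spare rebuilding   /dev/sdc
"""

def parse_detail_devices(text):
    """Return one dict per device row: slot number, state words, device path."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Device rows start with four integers and end with the device path.
        if len(fields) < 6 or not all(f.isdigit() for f in fields[:4]):
            continue
        devices.append({
            "raid_device": int(fields[3]),
            "state": fields[4:-1],
            "path": fields[-1],
        })
    return devices

devices = parse_detail_devices(DETAIL_TABLE)
faulty = [d["path"] for d in devices if "faulty" in d["state"]]
rebuilding = [d["path"] for d in devices if "rebuilding" in d["state"]]
```

In the stuck state, the faulty member still occupies a RaidDevice slot and still carries the "rebuilding" word, which is what the table above shows for /dev/sdb.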

Thanks,
  Alex.


2011/6/6 Nagilum <nagilum@nagilum.org>:
> Make sure you provide all relevant details such as kernel version, mdadm
> version and maybe also mdadm -E /dev/sd{a,b,c}, mdadm -Q --detail /dev/md0,
> ..
>
> ----- Message from alex.bolshoy@gmail.com ---------
>    Date: Sun, 5 Jun 2011 22:41:55 +0300
>    From: Alexander Lyakas <alex.bolshoy@gmail.com>
>  Subject: RAID5: failing an active component during spare rebuild - arrays
> hangs
>      To: linux-raid@vger.kernel.org
>
>
>> Hello everybody,
>> I am testing a scenario, in which I create a RAID5 with three devices:
>> /dev/sd{a,b,c}. Since I don't supply --force to mdadm during creation,
>> it treats the array as degraded and starts rebuilding the sdc as a
>> spare. This is as documented.
>>
>> Then I do --fail on /dev/sda. I understand that at this point my data
>> is gone, but I think I should still be able to tear down the array.
>>
>> Sometimes I see that /dev/sda is kicked from the array as faulty, and
>> /dev/sdc is also removed and marked as a spare. Then I am able to tear
>> down the array.
>>
>> But sometimes, it looks like the system hits some kind of a deadlock.
>> mdadm --detail produces:
>>
>>     Update Time : Sun Jun  5 21:54:34 2011
>>           State : active, FAILED
>>  Active Devices : 1
>> Working Devices : 2
>>  Failed Devices : 1
>>   Spare Devices : 1
>>
>>          Layout : left-symmetric
>>      Chunk Size : 512K
>>
>>            Name : ubuntu:zvp_1123
>>            UUID : 48a15fb6:b6410bb9:a2ca173e:0092032c
>>          Events : 67
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8        0        0      faulty spare rebuilding   /dev/sda
>>        1       8       16        1      active sync   /dev/sdb
>>        3       8       32        2      spare rebuilding   /dev/sdc
>>
>> So the faulty device and the spare are not kicked out of the array. At
>> this point I am unable to do anything with the array:
>>
>> root@ubuntu:~# sudo mdadm --stop /dev/md1123
>> mdadm: failed to stop array /dev/md1123: Device or resource busy
>> Perhaps a running process, mounted filesystem or active volume group?
>> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sda
>> mdadm: hot remove failed for /dev/sda: Device or resource busy
>> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdb
>> mdadm: hot remove failed for /dev/sdb: Device or resource busy
>> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdc
>> mdadm: hot remove failed for /dev/sdc: Device or resource busy
>>
>> This is happening on ubuntu-natty, with mdadm - v3.1.4 - 31st August 2010.
>> Looking at some code in mdadm/Detail.c, it looks like /dev/sda has
>> been marked only as MD_DISK_FAULTY, but has not yet been kicked out of
>> the array. The "spare" and "rebuilding" prints also result from that.
>>
>> Same thing also happens (sometimes) when I manually initiate resync
>> (by writing 'repair' to 'sync_action'), and later manually failing one
>> of the devices. Then I also saw messages like this in the syslog:
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350454] INFO: task
>> md1123_resync:7993 blocked for more than 120 seconds.
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350552] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350644] md1123_resync   D
>> 0000000000000000     0  7993      2 0x00000004
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350647]  ffff8800b56b1cd0
>> 0000000000000046 ffff8800b56b1fd8 ffff8800b56b0000
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350649]  0000000000013d00
>> ffff880036c09a98 ffff8800b56b1fd8 0000000000013d00
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350652]  ffff8800b7f1adc0
>> ffff880036c096e0 ffff8800b56b1cb0 ffff880036c56610
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350654] Call Trace:
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350657]  [<ffffffff81492885>]
>> md_do_sync+0xb45/0xc90
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350660]  [<ffffffff81087940>] ?
>> autoremove_wake_function+0x0/0x40
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350663]  [<ffffffff8107861b>] ?
>> recalc_sigpending+0x1b/0x50
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350665]  [<ffffffff8148c516>]
>> md_thread+0x116/0x150
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350667]  [<ffffffff8148c400>] ?
>> md_thread+0x0/0x150
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350669]  [<ffffffff810871f6>]
>> kthread+0x96/0xa0
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350672]  [<ffffffff8100cde4>]
>> kernel_thread_helper+0x4/0x10
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350674]  [<ffffffff81087160>] ?
>> kthread+0x0/0xa0
>> Jun  5 21:42:00 ubuntu kernel: [ 2280.350676]  [<ffffffff8100cde0>] ?
>> kernel_thread_helper+0x0/0x10
>>
>> This is pretty easy for me to reproduce.
>>
>> Basically, I would like to know what the user is expected to do when
>> more than one RAID5 array component fails during rebuild/resync.
>>
>> Thanks,
>>   Alex.
>>
>
>
> ----- End message from alex.bolshoy@gmail.com -----
>
>
>
> ========================================================================
> #    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
> #   / |/ /__ ____ _(_) /_ ____ _  nagilum@nagilum.org \n +491776461165 #
> #  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
> # /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
> #           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
> ========================================================================
>
>
> ----------------------------------------------------------------
> cakebox.homeunix.net - all the machine one needs..
>

^ permalink raw reply

* md array does not detect drive removal: mdadm 3.2.1, Linux 2.6.38
From: fibreraid @ 2011-06-06 18:20 UTC (permalink / raw)
  To: linux-raid, fibre raid

Hello,

I am running Linux kernel 2.6.38 64-bit version with mdadm 3.2.1. The
server hardware has dual socket Westmere CPUs (4 cores each), 24 GB of
RAM, and 24 hard drives connected via SAS.

I create an md0 array with 23 active drives, 1 hot-spare, RAID 5, and
64K chunk. After synchronization is complete, I have:

root::~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdf1[23](S) sdi1[22] sdh1[21] sdg1[20] sde1[19]
sdd1[18] sdc1[17] sdo1[16] sdn1[15] sdq1[14] sdp1[13] sdr1[12]
sdm1[11] sdl1[10] sdk1[9] sdj1[8] sdv1[7] sdu1[6] sdt1[5] sds1[4]
sdy1[3] sdx1[2] sdb1[1] sdw1[0]
      2149005056 blocks super 1.2 level 5, 64k chunk, algorithm 2
[23/23] [UUUUUUUUUUUUUUUUUUUUUUU]

Then I remove an active drive from the system by unplugging it. udev
catches the event, and fdisk -l reports one less drive. In this case,
I remove /dev/sdv.

However, /proc/mdstat remains unchanged. It's as if md has no idea
that the drive disappeared. I would expect md at this point to have
detected the removal, and to have automatically kicked off a resync
using the included hot-spare. But this does not occur.
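
A monitoring script would typically read the "[n/m] [UU_]" status field from /proc/mdstat to decide whether an array is degraded. The following is a minimal sketch of that check; `array_health` and the sample text are hypothetical, and the sample shows a made-up 3-disk array rather than the 23-disk one above.

```python
import re

# "[3/2] [UU_]": 3 devices expected, 2 in use, third slot missing.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

def array_health(mdstat_text):
    """Return (expected, working, missing_slots) from the first status field."""
    match = STATUS.search(mdstat_text)
    if match is None:
        return None
    expected, working = int(match.group(1)), int(match.group(2))
    missing = [i for i, flag in enumerate(match.group(3)) if flag == "_"]
    return expected, working, missing

SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[2](F) sdb1[1] sda1[0]
      2096128 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
"""

health = array_health(SAMPLE)
```

Such a check only reflects what md already knows, so it would keep reporting a healthy array in the situation described here until md itself registers the failure.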

If I then run mdadm -R /dev/md0, in an attempt to "wake up" md, then
md does realize the change, and does start the resyncing.

I do not believe this is normal behavior. Can you advise?

Thank you!
-Tommy

^ permalink raw reply

* Re: md array does not detect drive removal: mdadm 3.2.1, Linux 2.6.38
From: CoolCold @ 2011-06-06 21:25 UTC (permalink / raw)
  To: fibreraid@gmail.com; +Cc: linux-raid
In-Reply-To: <BANLkTikv0avX6tYoG8g+pgUznopHCA4tyA@mail.gmail.com>

On Mon, Jun 6, 2011 at 10:20 PM, fibreraid@gmail.com
<fibreraid@gmail.com> wrote:
> Hello,
>
> I am running Linux kernel 2.6.38 64-bit version with mdadm 3.2.1. The
> server hardware has dual socket Westmere CPUs (4 cores each), 24 GB of
> RAM, and 24 hard drives connected via SAS.
>
> I create an md0 array with 23 active drives, 1 hot-spare, RAID 5, and
> 64K chunk. After synchronization is complete, I have:
>
> root::~# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdf1[23](S) sdi1[22] sdh1[21] sdg1[20] sde1[19]
> sdd1[18] sdc1[17] sdo1[16] sdn1[15] sdq1[14] sdp1[13] sdr1[12]
> sdm1[11] sdl1[10] sdk1[9] sdj1[8] sdv1[7] sdu1[6] sdt1[5] sds1[4]
> sdy1[3] sdx1[2] sdb1[1] sdw1[0]
>      2149005056 blocks super 1.2 level 5, 64k chunk, algorithm 2
> [23/23] [UUUUUUUUUUUUUUUUUUUUUUU]
>
> Then I remove an active drive from the system by unplugging it. udev
> catches the event, and fdisk -l reports one less drive. In this case,
> I remove /dev/sdv.
>
> However, /proc/mdstat remains unchanged. It's as if md has no idea
> that the drive disappeared. I would expect md at this point to have
> detected the removal, and to have automatically kicked-off a resync
> using the included hot-spare. But this does not occur.
>
> If I then run mdadm -R /dev/md0, in an attempt to "wake up" md, then
> md does realize the change, and does start the resyncing.
I guess md only realizes the drive is gone when a read/write error occurs,
which will happen pretty soon if the array is in use. Can you start some
dd reads and then remove the drive?

>
> I do not believe this is normal behavior. Can you advise?
>
> Thank you!
> -Tommy
>



-- 
Best regards,
[COOLCOLD-RIPN]

^ permalink raw reply

* Re: Maximizing failed disk replacement on a RAID5 array
From: Durval Menezes @ 2011-06-07  5:03 UTC (permalink / raw)
  To: linux-raid; +Cc: Brad Campbell, Drew
In-Reply-To: <BANLkTikPxQvyDD3d_A6d3+GLKOLgqFJbFw@mail.gmail.com>

Hello Folks,

Just finished the "repair". It completed OK, and over SMART the HD now
shows a "Reallocated_Sector_Ct" of 291 (which shows that many bad
sectors have been remapped), but it's also still reporting 4
"Current_Pending_Sector" and 4 "Offline_Uncorrectable"... which I
think means exactly the same thing, i.e., that there are 4 "active"
(from the HD perspective) sectors on the drive still detected as bad
and not remapped.
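
For tracking these counters between runs, a small sketch that extracts raw values from attribute-table text in the usual "smartctl -A" column layout might look like this. The raw values 291, 4 and 4 are the ones quoted above; every other column in the sample is invented for illustration, and `raw_values` is a hypothetical helper.

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       291
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       4
"""

def raw_values(smart_table):
    """Map attribute name -> integer raw value for rows with a numeric raw value."""
    values = {}
    for line in smart_table.splitlines():
        fields = line.split()
        # Data rows start with the numeric attribute ID; the raw value is column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values

values = raw_values(SAMPLE)
still_pending = {
    name: values[name]
    for name in ("Current_Pending_Sector", "Offline_Uncorrectable")
    if values.get(name, 0) > 0
}
```

Comparing `still_pending` before and after a repair or a long self-test gives a quick way to see whether the drive has cleared the outstanding sectors.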

I've been thinking about exactly what that means, and I think that
these 4 sectors are either A) outside the RAID partition (not very
probable as this partition occupies more than 99.99% of the disk,
leaving just a small, less than 105MB area at the beginning), or B)
some kind of metadata or unused space that hasn't been read and
rewritten by the "repair" I've just completed. I've just done a "dd
bs=1024k count=105 </dev/DISK >/dev/null" to account for
hypothesis A), and came up empty: no errors, and the drive still
shows 4 bad, unmapped sectors on SMART.

So, by elimination, it must be either case B) above, or a bug in the
linux md code (which prevents it from hitting every needed block on
the disk), or a bug in SMART (which makes it report nonexistent bad
sectors). I've just started running a "smart long test" on the disk
(which will try to read all of its sectors, reporting the first error
by LBA) to see what happens. If it shows no errors, I will know it's
a SMART bug. If it shows errors, it must be in an unused/metadata block
or a bug in linux md.

Either way, my plan is then to try a plain "dd" (no "dd_rescue", at
least not now) of this failing disk to a new one; if it goes by
without any errors, I will know it's a bug in SMART. If it hits any
errors, I will have the first error's position (from dd's point of
view) and then I will try to dump that specific sector with dd_rescue
and examine it.
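
As a worked example of turning dd's report into a position: when dd stops on a read error, its "N+M records in" line gives the number N of full blocks read, so the unreadable sector lies somewhere in the next block. The sketch below does that arithmetic; `failing_sector_range` is a hypothetical helper and the 512-byte sector size is an assumption.

```python
def failing_sector_range(full_records, block_size, sector_size=512):
    """Sectors covered by the first block dd could not read completely.

    Returns a half-open range (start, end) in units of sector_size.
    """
    start = full_records * block_size // sector_size
    end = (full_records + 1) * block_size // sector_size
    return start, end

# Example: dd bs=1024k reported 105 full records before failing.
start, end = failing_sector_range(105, 1024 * 1024)
```

With a 1 MiB block size the candidate range is 2048 sectors wide, so a smaller block size on a second pass over just that range narrows it down.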

I will keep you posted.

Cheers,
-- 
   Durval Menezes.



^ permalink raw reply

* Re: Maximizing failed disk replacement on a RAID5 array
From: Brad Campbell @ 2011-06-07  5:35 UTC (permalink / raw)
  To: Durval Menezes; +Cc: linux-raid, Drew
In-Reply-To: <BANLkTi=+2GFWRFKT0SCXRtkdn71msgDY=g@mail.gmail.com>

On 07/06/11 13:03, Durval Menezes wrote:
> Hello Folks,
>
> Just finished the "repair". It completed OK, and over SMART the HD now
> shows a "Reallocated_Sector_Ct" of 291 (which shows that many bad
> sectors have been remapped), but it's also still reporting 4
> "Current_Pending_Sector" and 4 "Offline_Uncorrectable"... which I
> think means exactly the same thing, ie, that there are 4 "active"
> (from the HD perspective) sectors on the drive still detected as bad
> and not remapped.
>
> I've been thinking about exactly what that means, and I think that
> these 4 sectors are either A) outside the RAID partition (not very
> probable as this partition occupies more than 99.99% of the disk,
> leaving just a small, less than 105MB area at the beginning), or B)
> some kind of metadata or unused space that hasn't been read and
> rewritten by the "repair" I've just completed. I've just done a "dd
> bs=1024k count=105 </dev/DISK >/dev/null" to account for the
> hypothesis A), and come out empty: no errors, and the drive still
> shows 4 bad, unmapped sectors on SMART.
>
> So, by elimination, it must be either case B) above, or a bug in the
> linux md code (which prevents it from hitting every needed block on
> the disk), or a bug in SMART (which makes it report nonexistent bad
>
Try running a SMART long test (smartctl -t long); it will tell you whether the sectors are really
bad or not.
I've had instances where the firmware still reported some previously pending sectors as
pending until I forced a test, at which point the drive came to its senses and they went away.

I believe that if you wait until the drive gets around to doing its periodic offline data collection
you'll see the same thing, but a long test is nice because it gives you an actual block number for
the first failure (if there is one).
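
For tracking these counters before and after a long test, a minimal sketch in Python follows. It assumes the usual ATA attribute table printed by "smartctl -A", with the raw value in the tenth column; the SAMPLE text is illustrative only, with values copied from the quoted message, and the function name is hypothetical.

```python
# Attributes discussed in this thread.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_smart_attributes(text, names=WATCHED):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() check.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            found[fields[1]] = int(fields[9])
    return found


# Illustrative sample only (assumed layout, values from the quoted message).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       291
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       4
"""
```

Feeding it the attribute table captured before the test and again afterwards makes it easy to see whether the pending count really dropped.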


^ permalink raw reply

* Re: md array does not detect drive removal: mdadm 3.2.1, Linux 2.6.38
From: fibreraid @ 2011-06-07  7:01 UTC (permalink / raw)
  To: CoolCold; +Cc: linux-raid
In-Reply-To: <BANLkTi=HkQNQgow-47Mr7cofNGAhO0vOiQ@mail.gmail.com>

Hello,

I did test with I/O: once I issued I/O, md correctly detected the
failure and began a rebuild. However, I think this is inadequate, and
I do not believe it is correct behavior. As I recall from prior
experience with md, it would also initiate a rebuild on drive removal
alone, even without any pending I/O.

I would appreciate some further feedback as to this behavior. Thanks!

-Tommy
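
One way to check from a script whether md has noticed a missing member is to read the "[n/m]" counter in /proc/mdstat, where n is the number of devices the array should have and m is the number currently working. The sketch below is a minimal Python example under that assumption; the function names and the three-disk SAMPLE are hypothetical and not taken from the system described in this thread.

```python
import re

# "md0 : active raid5 ..." starts an array entry.
ARRAY_RE = re.compile(r"^(md\S*)\s*:", re.MULTILINE)
# "[23/23]" style counter; member slots such as "sdb1[1]" have no slash.
COUNT_RE = re.compile(r"\[(\d+)/(\d+)\]")


def array_health(mdstat_text):
    """Map each md array name to (expected_members, working_members)."""
    health = {}
    entries = list(ARRAY_RE.finditer(mdstat_text))
    for i, entry in enumerate(entries):
        # Search only up to the start of the next array entry.
        if i + 1 < len(entries):
            end = entries[i + 1].start()
        else:
            end = len(mdstat_text)
        counts = COUNT_RE.search(mdstat_text, entry.end(), end)
        if counts:
            health[entry.group(1)] = (int(counts.group(1)), int(counts.group(2)))
    return health


def degraded_arrays(mdstat_text):
    """Return the names of arrays with fewer working members than expected."""
    return [
        name
        for name, (expected, working) in array_health(mdstat_text).items()
        if working < expected
    ]


# Hypothetical sample: a three-disk RAID 5 that has lost one member.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3](S) sdc1[2] sdb1[1] sda1[0](F)
      2096128 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>
"""
```

Calling degraded_arrays() on the contents of /proc/mdstat right after pulling the drive, and again after issuing some I/O, would show at which point md actually marks the member as failed.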


On Mon, Jun 6, 2011 at 2:25 PM, CoolCold <coolthecold@gmail.com> wrote:
> On Mon, Jun 6, 2011 at 10:20 PM, fibreraid@gmail.com
> <fibreraid@gmail.com> wrote:
>> Hello,
>>
>> I am running Linux kernel 2.6.38 64-bit version with mdadm 3.2.1. The
>> server hardware has dual socket Westmere CPUs (4 cores each), 24 GB of
>> RAM, and 24 hard drives connected via SAS.
>>
>> I create an md0 array with 23 active drives, 1 hot-spare, RAID 5, and
>> 64K chunk. After synchronization is complete, I have:
>>
>> root::~# cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md0 : active raid5 sdf1[23](S) sdi1[22] sdh1[21] sdg1[20] sde1[19]
>> sdd1[18] sdc1[17] sdo1[16] sdn1[15] sdq1[14] sdp1[13] sdr1[12]
>> sdm1[11] sdl1[10] sdk1[9] sdj1[8] sdv1[7] sdu1[6] sdt1[5] sds1[4]
>> sdy1[3] sdx1[2] sdb1[1] sdw1[0]
>>      2149005056 blocks super 1.2 level 5, 64k chunk, algorithm 2
>> [23/23] [UUUUUUUUUUUUUUUUUUUUUUU]
>>
>> Then I remove an active drive from the system by unplugging it. udev
>> catches the event, and fdisk -l reports one less drive. In this case,
>> I remove /dev/sdv.
>>
>> However, /proc/mdstat remains unchanged. It's as if md has no idea
>> that the drive disappeared. I would expect md at this point to have
>> detected the removal, and to have automatically kicked-off a resync
>> using the included hot-spare. But this does not occur.
>>
>> If I then run mdadm -R /dev/md0, in an attempt to "wake up" md, then
>> md does realize the change, and does start the resyncing.
> I guess md only realizes the drive is gone when a read/write error
> occurs, which will happen pretty soon if the array is in use. Can you
> start a dd read and then remove the drive?
>
>>
>> I do not believe this is normal behavior. Can you advise?
>>
>> Thank you!
>> -Tommy
>>
>
>
>
> --
> Best regards,
> [COOLCOLD-RIPN]
>

^ permalink raw reply

* 2.6.39[.1]: raid1 check blocks jbd on other md more than 120 seconds -- workaround
From: Frank van Maarseveen @ 2011-06-07  8:24 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <20110602093644.GA8620@janus>

I did some more testing and apparently the issue can be avoided by
choosing the deadline scheduler instead of cfq. Either configure the
kernel with

	CONFIG_DEFAULT_IOSCHED="deadline"

or select the deadline I/O scheduler per device. In my case:

	echo deadline >/sys/block/sda/queue/scheduler
	echo deadline >/sys/block/sdb/queue/scheduler
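
The per-device step can also be scripted. The following is a minimal Python sketch, assuming the sysfs scheduler file lists the available schedulers with the active one in square brackets (for example "noop deadline [cfq]"); the function names and the sys_block parameter are hypothetical and exist only so the logic can be tried against a scratch directory.

```python
from pathlib import Path


def active_scheduler(contents):
    """Return the bracketed (active) name from a sysfs scheduler string."""
    for name in contents.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def set_scheduler(disks, wanted="deadline", sys_block=Path("/sys/block")):
    """Write `wanted` to each disk's queue/scheduler unless already active.

    Returns the list of disks that were changed. Writing to the real
    /sys/block requires root.
    """
    changed = []
    for disk in disks:
        path = Path(sys_block) / disk / "queue" / "scheduler"
        if active_scheduler(path.read_text()) != wanted:
            path.write_text(wanted + "\n")
            changed.append(disk)
    return changed
```

For the two disks above, set_scheduler(["sda", "sdb"]) run as root would have the same effect as the two echo commands. Like them, it does not persist across reboots.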

-- 
Frank

^ permalink raw reply

* Re: Maximizing failed disk replacement on a RAID5 array
From: John Robinson @ 2011-06-07  8:52 UTC (permalink / raw)
  To: Durval Menezes; +Cc: linux-raid, Brad Campbell, Drew
In-Reply-To: <BANLkTikPxQvyDD3d_A6d3+GLKOLgqFJbFw@mail.gmail.com>

On 06/06/2011 19:06, Durval Menezes wrote:
[...]
> It would be great to have a
> "duplicate-this-bad-old-disk-into-this-shiny-new-disk"  functionality,
> as it would enable  an almost-no-downtime disk replacement with
> minimum  risk, but it seems we can't have everything... :-0 Maybe it's
> something for the wishlist?

It's already on the wishlist, described as a hot replace.

Cheers,

John.


^ permalink raw reply

* Re: sector I/O error cause disk to be "faulty" in raid5
From: John Robinson @ 2011-06-07  8:53 UTC (permalink / raw)
  To: hank peng; +Cc: linux-raid
In-Reply-To: <BANLkTikpcH=navhOAFA2Jw+CT+rQfpOwcg@mail.gmail.com>

On 06/06/2011 14:28, hank peng wrote:
> Hi, everybody:
> In the current raid5 implementation, if a r/w error occurs at some
> specific sectors on a disk, the disk is labeled as "faulty".
> In most cases, though, this indicates a failure of those
> sectors, not of the whole disk. Should we make some changes so that
> the behavior is more reasonable?

It's already on the wishlist as a bad block map.

Cheers,

John.


^ permalink raw reply
