* [patch] md superblock update failures
@ 2005-03-16 13:05 Lars Marowsky-Bree
From: Lars Marowsky-Bree @ 2005-03-16 13:05 UTC
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 343 bytes --]

Mark found a bug where md does not handle write failures when trying to
update the superblock.

Attached is the fix he sent to us, which also seems to apply fine to
2.6.11.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business


[-- Attachment #2: md-superblock-failures --]
[-- Type: text/plain, Size: 2050 bytes --]

From: Mark Rustad
Subject: md does not handle write failures for the superblock
Patch-mainline: 2.6.12
References: 65306

Description by Mark:

I have found that when a superblock update hits a write failure on a
raid component device, the device is not failed out of the raid. As a
result, the superblock update is retried 100 times and ultimately just
fails. It takes a different type of failing access to the failed device
to finally fail it out of the raid. This can be seen by simply pulling
out a raid device on an idle system (but with sgraidmon & mdadmd
running).

The following patch will fail the failing device out of the raid after
the attempted superblock update and then retry the update with one fewer
device.  This seems to work very well in our system.
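
As a rough illustration of the behaviour described above, here is a small
userspace model of the retry loop. It is only a sketch, not the kernel
code: struct member, write_sb(), update_pass() and update_superblocks()
are hypothetical names, the write failure is simulated with a flag, and
the retry limit of 100 is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Retry limit; the description mentions 100 update attempts. */
#define MAX_SB_RETRIES 100

/* Hypothetical stand-in for a raid member (not the kernel's mdk_rdev_t). */
struct member {
	const char *name;
	bool faulty;		/* already failed out of the array */
	bool write_fails;	/* simulated: superblock writes to it fail */
};

/* Simulated superblock write: 0 on success, nonzero on failure. */
static int write_sb(const struct member *m)
{
	return m->write_fails ? 1 : 0;
}

/*
 * One update pass: write to every non-faulty member, remember the last
 * one whose write failed, and fail it once the pass is over.
 * Returns the number of failed writes in this pass.
 */
static int update_pass(struct member *devs, size_t n)
{
	struct member *failed = NULL;
	int err = 0;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int ret;

		if (devs[i].faulty)
			continue;
		ret = write_sb(&devs[i]);
		if (ret) {
			failed = &devs[i];	/* save failed device */
			err += ret;
		}
	}
	if (failed)
		failed->faulty = true;		/* fail the failed device */
	return err;
}

/*
 * Repeat the update until a pass completes without errors.
 * Returns the number of passes used, or -1 if the limit was reached.
 */
static int update_superblocks(struct member *devs, size_t n)
{
	for (int pass = 1; pass <= MAX_SB_RETRIES; pass++) {
		if (update_pass(devs, n) == 0)
			return pass;
	}
	return -1;
}
```

In this model, with three members of which only the second fails its
writes, the first pass marks that member faulty and the second pass
completes cleanly, so the update no longer spins until the retry limit.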
 
 
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Lars Marowsky-Bree <lmb@suse.de>

Index: linux-2.6.5/drivers/md/md.c
===================================================================
--- linux-2.6.5.orig/drivers/md/md.c	2005-03-16 13:57:10.381445927 +0100
+++ linux-2.6.5/drivers/md/md.c	2005-03-16 13:57:10.714396523 +0100
@@ -1115,6 +1115,7 @@ static void export_array(mddev_t *mddev)
 {
 	struct list_head *tmp;
 	mdk_rdev_t *rdev;
+	mdk_rdev_t *frdev;
 
 	ITERATE_RDEV(mddev,rdev,tmp) {
 		if (!rdev->mddev) {
@@ -1288,6 +1289,7 @@ repeat:
 		mdname(mddev),mddev->in_sync);
 
 	err = 0;
+	frdev = 0;
 	ITERATE_RDEV(mddev,rdev,tmp) {
 		char b[BDEVNAME_SIZE];
 		dprintk(KERN_INFO "md: ");
@@ -1296,13 +1298,21 @@ repeat:
 
 		dprintk("%s ", bdevname(rdev->bdev,b));
 		if (!rdev->faulty) {
-			err += write_disk_sb(rdev);
+			int ret;
+			ret = write_disk_sb(rdev);
+			if (ret) {
+				frdev = rdev;	/* Save failed device */
+				err += ret;
+			}
 		} else
 			dprintk(")\n");
 		if (!err && mddev->level == LEVEL_MULTIPATH)
 			/* only need to write one superblock... */
 			break;
 	}
+	if (frdev)
+		md_error(mddev, frdev);	/* Fail the failed device */
+
 	if (err) {
 		if (--count) {
 			printk(KERN_ERR "md: errors occurred during superblock"
