linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH md 0 of 4] Introduction
@ 2004-11-02  3:37 NeilBrown
  2004-11-02  3:37 ` [PATCH md 1 of 4] Fix problem with md/linear for devices larger than 2 terabytes NeilBrown
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: NeilBrown @ 2004-11-02  3:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid

Following are 4 patches for md/raid against 2.6.10-rc1-mm2.

1/ Fix problem with linear arrays if component devices are > 2terabytes
2/ Fix data corruption in (experimental) RAID6 personality
3/ Fix possible oops with unplug_timer firing at the wrong time.
4/ Add new md personality "faulty".
    "Faulty" can be used to inject faults and so test failure modes
    of other raid levels and of filesystes.

NeilBrown


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH md 2 of 4] Fix raid6 problem
  2004-11-02  3:37 [PATCH md 0 of 4] Introduction NeilBrown
  2004-11-02  3:37 ` [PATCH md 1 of 4] Fix problem with md/linear for devices larger than 2 terabytes NeilBrown
@ 2004-11-02  3:37 ` NeilBrown
  2004-11-02  3:37 ` [PATCH md 3 of 4] Delete unplug timer before shutting down md array NeilBrown
  2004-11-02  3:37 ` [PATCH md 4 of 4] "Faulty" personality of md NeilBrown
  3 siblings, 0 replies; 20+ messages in thread
From: NeilBrown @ 2004-11-02  3:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


Sometimes it didn't read all (working) drives before
a parity calculation.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/raid6main.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~	2004-11-02 14:20:15.000000000 +1100
+++ ./drivers/md/raid6main.c	2004-11-02 14:20:15.000000000 +1100
@@ -734,7 +734,6 @@ static void compute_parity(struct stripe
 	case READ_MODIFY_WRITE:
 		BUG();		/* READ_MODIFY_WRITE N/A for RAID-6 */
 	case RECONSTRUCT_WRITE:
-	case UPDATE_PARITY:	/* Is this right? */
 		for (i= disks; i-- ;)
 			if ( i != pd_idx && i != qd_idx && sh->dev[i].towrite ) {
 				chosen = sh->dev[i].towrite;
@@ -770,7 +769,8 @@ static void compute_parity(struct stripe
 		i = d0_idx;
 		do {
 			ptrs[count++] = page_address(sh->dev[i].page);
-
+			if (count <= disks-2 && !test_bit(R5_UPTODATE, &sh->dev[i].flags))
+				printk("block %d/%d not uptodate on parity calc\n", i,count);
 			i = raid6_next_disk(i, disks);
 		} while ( i != d0_idx );
 //		break;
@@ -818,7 +818,7 @@ static void compute_block_1(struct strip
 			if (test_bit(R5_UPTODATE, &sh->dev[i].flags))
 				ptr[count++] = p;
 			else
-				PRINTK("compute_block() %d, stripe %llu, %d"
+				printk("compute_block() %d, stripe %llu, %d"
 				       " not present\n", dd_idx,
 				       (unsigned long long)sh->sector, i);
 
@@ -875,6 +875,9 @@ static void compute_block_2(struct strip
 		do {
 			ptrs[count++] = page_address(sh->dev[i].page);
 			i = raid6_next_disk(i, disks);
+			if (i != dd_idx1 && i != dd_idx2 &&
+			    !test_bit(R5_UPTODATE, &sh->dev[i].flags))
+				printk("compute_2 with missing block %d/%d\n", count, i);
 		} while ( i != d0_idx );
 
 		if ( failb == disks-2 ) {
@@ -1157,17 +1160,15 @@ static void handle_stripe(struct stripe_
 	 * parity, or to satisfy requests
 	 * or to load a block that is being partially written.
 	 */
-	if (to_read || non_overwrite || (syncing && (uptodate < disks))) {
+	if (to_read || non_overwrite || (to_write && failed) || (syncing && (uptodate < disks))) {
 		for (i=disks; i--;) {
 			dev = &sh->dev[i];
 			if (!test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
 			    (dev->toread ||
 			     (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)) ||
 			     syncing ||
-			     (failed >= 1 && (sh->dev[failed_num[0]].toread ||
-					 (sh->dev[failed_num[0]].towrite && !test_bit(R5_OVERWRITE, &sh->dev[failed_num[0]].flags)))) ||
-			     (failed >= 2 && (sh->dev[failed_num[1]].toread ||
-					 (sh->dev[failed_num[1]].towrite && !test_bit(R5_OVERWRITE, &sh->dev[failed_num[1]].flags))))
+			     (failed >= 1 && (sh->dev[failed_num[0]].toread || to_write)) ||
+			     (failed >= 2 && (sh->dev[failed_num[1]].toread || to_write))
 				    )
 				) {
 				/* we would like to get this block, possibly

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-02  3:37 [PATCH md 0 of 4] Introduction NeilBrown
  2004-11-02  3:37 ` [PATCH md 1 of 4] Fix problem with md/linear for devices larger than 2 terabytes NeilBrown
  2004-11-02  3:37 ` [PATCH md 2 of 4] Fix raid6 problem NeilBrown
@ 2004-11-02  3:37 ` NeilBrown
  2004-11-02  4:57   ` Andrew Morton
  2004-11-03 23:03   ` Andrew Morton
  2004-11-02  3:37 ` [PATCH md 4 of 4] "Faulty" personality of md NeilBrown
  3 siblings, 2 replies; 20+ messages in thread
From: NeilBrown @ 2004-11-02  3:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


As the unplug timer can potentially fire at any time, and
and it access data that is released by the md ->stop function,
we need to del_timer_sync before releasing that data.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/linear.c    |    1 +
 ./drivers/md/multipath.c |    1 +
 ./drivers/md/raid0.c     |    1 +
 ./drivers/md/raid1.c     |    1 +
 ./drivers/md/raid10.c    |    1 +
 ./drivers/md/raid5.c     |    1 +
 ./drivers/md/raid6main.c |    1 +
 7 files changed, 7 insertions(+)

diff ./drivers/md/linear.c~current~ ./drivers/md/linear.c
--- ./drivers/md/linear.c~current~	2004-11-02 14:20:12.000000000 +1100
+++ ./drivers/md/linear.c	2004-11-02 14:20:18.000000000 +1100
@@ -231,6 +231,7 @@ static int linear_stop (mddev_t *mddev)
 {
 	linear_conf_t *conf = mddev_to_conf(mddev);
   
+	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
 	kfree(conf->hash_table);
 	kfree(conf);
 

diff ./drivers/md/multipath.c~current~ ./drivers/md/multipath.c
--- ./drivers/md/multipath.c~current~	2004-11-02 14:20:18.000000000 +1100
+++ ./drivers/md/multipath.c	2004-11-02 14:20:18.000000000 +1100
@@ -566,6 +566,7 @@ static int multipath_stop (mddev_t *mdde
 
 	md_unregister_thread(mddev->thread);
 	mddev->thread = NULL;
+	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
 	mempool_destroy(conf->pool);
 	kfree(conf->multipaths);
 	kfree(conf);

diff ./drivers/md/raid0.c~current~ ./drivers/md/raid0.c
--- ./drivers/md/raid0.c~current~	2004-11-02 14:20:18.000000000 +1100
+++ ./drivers/md/raid0.c	2004-11-02 14:20:18.000000000 +1100
@@ -385,6 +385,7 @@ static int raid0_stop (mddev_t *mddev)
 {
 	raid0_conf_t *conf = mddev_to_conf(mddev);
 
+	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
 	kfree (conf->hash_table);
 	conf->hash_table = NULL;
 	kfree (conf->strip_zone);

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~	2004-11-02 14:20:18.000000000 +1100
+++ ./drivers/md/raid1.c	2004-11-02 14:20:18.000000000 +1100
@@ -1293,6 +1293,7 @@ static int stop(mddev_t *mddev)
 
 	md_unregister_thread(mddev->thread);
 	mddev->thread = NULL;
+	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
 	if (conf->r1bio_pool)
 		mempool_destroy(conf->r1bio_pool);
 	if (conf->mirrors)

diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~	2004-11-02 14:20:18.000000000 +1100
+++ ./drivers/md/raid10.c	2004-11-02 14:20:18.000000000 +1100
@@ -1744,6 +1744,7 @@ static int stop(mddev_t *mddev)
 
 	md_unregister_thread(mddev->thread);
 	mddev->thread = NULL;
+	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
 	if (conf->r10bio_pool)
 		mempool_destroy(conf->r10bio_pool);
 	if (conf->mirrors)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2004-11-02 14:20:18.000000000 +1100
+++ ./drivers/md/raid5.c	2004-11-02 14:20:18.000000000 +1100
@@ -1707,6 +1707,7 @@ static int stop (mddev_t *mddev)
 	mddev->thread = NULL;
 	shrink_stripes(conf);
 	free_pages((unsigned long) conf->stripe_hashtbl, HASH_PAGES_ORDER);
+	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
 	kfree(conf);
 	mddev->private = NULL;
 	return 0;

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~	2004-11-02 14:20:15.000000000 +1100
+++ ./drivers/md/raid6main.c	2004-11-02 14:20:18.000000000 +1100
@@ -1878,6 +1878,7 @@ static int stop (mddev_t *mddev)
 	mddev->thread = NULL;
 	shrink_stripes(conf);
 	free_pages((unsigned long) conf->stripe_hashtbl, HASH_PAGES_ORDER);
+	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
 	kfree(conf);
 	mddev->private = NULL;
 	return 0;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH md 4 of 4] "Faulty" personality of md
  2004-11-02  3:37 [PATCH md 0 of 4] Introduction NeilBrown
                   ` (2 preceding siblings ...)
  2004-11-02  3:37 ` [PATCH md 3 of 4] Delete unplug timer before shutting down md array NeilBrown
@ 2004-11-02  3:37 ` NeilBrown
  2004-11-02  9:31   ` Lars Marowsky-Bree
  3 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2004-11-02  3:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


The 'faulty' personality provides a layer over any block device
in which errors may be synthesised.

A variety of errors are possible including transient and persistent
read and write errors, and read errors that persist until the next
write.

There error mode can be changed on a live array.

Accessing this personality requires mdadm 2.8.0 or later.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/Kconfig        |    9 +
 ./drivers/md/Makefile       |    1 
 ./drivers/md/faulty.c       |  343 ++++++++++++++++++++++++++++++++++++++++++++
 ./drivers/md/md.c           |   13 +
 ./include/linux/raid/md_k.h |    7 
 5 files changed, 371 insertions(+), 2 deletions(-)

diff ./drivers/md/Kconfig~current~ ./drivers/md/Kconfig
--- ./drivers/md/Kconfig~current~	2004-11-02 14:20:22.000000000 +1100
+++ ./drivers/md/Kconfig	2004-11-02 14:20:22.000000000 +1100
@@ -164,6 +164,15 @@ config MD_MULTIPATH
 
 	  If unsure, say N.
 
+config MD_FAULTY
+	tristate "Faulty test module for MD"
+	depends on BLK_DEV_MD
+	help
+	  The "faulty" module allows for a block device that occasionally returns
+	  read or write errors.  It is useful for testing.
+
+	  In unsure, say N.
+
 config BLK_DEV_DM
 	tristate "Device mapper support"
 	depends on MD

diff ./drivers/md/Makefile~current~ ./drivers/md/Makefile
--- ./drivers/md/Makefile~current~	2004-11-02 14:20:22.000000000 +1100
+++ ./drivers/md/Makefile	2004-11-02 14:20:22.000000000 +1100
@@ -24,6 +24,7 @@ obj-$(CONFIG_MD_RAID10)		+= raid10.o
 obj-$(CONFIG_MD_RAID5)		+= raid5.o xor.o
 obj-$(CONFIG_MD_RAID6)		+= raid6.o xor.o
 obj-$(CONFIG_MD_MULTIPATH)	+= multipath.o
+obj-$(CONFIG_MD_FAULTY)		+= faulty.o
 obj-$(CONFIG_BLK_DEV_MD)	+= md.o
 obj-$(CONFIG_BLK_DEV_DM)	+= dm-mod.o
 obj-$(CONFIG_DM_CRYPT)		+= dm-crypt.o

diff ./drivers/md/faulty.c~current~ ./drivers/md/faulty.c
--- ./drivers/md/faulty.c~current~	2004-11-02 14:20:22.000000000 +1100
+++ ./drivers/md/faulty.c	2004-11-02 14:20:22.000000000 +1100
@@ -0,0 +1,343 @@
+/*
+ * faulty.c : Multiple Devices driver for Linux
+ *
+ * Copyright (C) 2004 Neil Brown
+ *
+ * fautly-device-simulator personality for md
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * You should have received a copy of the GNU General Public License
+ * (for example /usr/src/linux/COPYING); if not, write to the Free
+ * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+
+/* 
+ * The "faulty" personality causes some requests to fail.
+ *
+ * Possible failure modes are:
+ *   reads fail "randomly" but succeed on retry
+ *   writes fail "randomly" but succeed on retry
+ *   reads for some address fail and then persist until a write
+ *   reads for some address fail and then persist irrespective of write
+ *   writes for some address fail and persist
+ *   all writes fail
+ *
+ * Different modes can be active at a time, but only
+ * one can be set at array creation.  Others can be added later.
+ * A mode can be one-shot or recurrent with the recurrance being 
+ * once in every N requests.
+ * The bottom 5 bits of the "layout" indicate the mode.  The 
+ * remainder indicate a period, or 0 for one-shot.
+ *
+ * There is an implementation limit on the number of concurrently
+ * persisting-faulty blocks. When a new fault is requested that would
+ * exceed the limit, it is ignored.
+ * All current faults can be clear using a layout of "0".
+ *
+ * Requests are always sent to the device.  If they are to fail,
+ * we clone the bio and insert a new b_end_io into the chain.
+ */
+
+#define	WriteTransient	0
+#define	ReadTransient	1
+#define	WritePersistent	2
+#define	ReadPersistent	3
+#define	WriteAll	4 /* doesn't go to device */
+#define	ReadFixable	5
+#define	Modes	6
+
+#define	ClearErrors	31
+#define	ClearFaults	30
+
+#define AllPersist	100 /* internal use only */
+#define	NoPersist	101
+
+#define	ModeMask	0x1f
+#define	ModeShift	5
+
+#define MaxFault	50
+#include <linux/raid/md.h>
+
+
+static int faulty_fail(struct bio *bio, unsigned int bytes_done, int error)
+{
+	struct bio *b = bio->bi_private;
+
+	b->bi_size = bio->bi_size;
+	b->bi_sector = bio->bi_sector;
+
+	if (bio->bi_size == 0)
+		bio_put(bio);
+
+	clear_bit(BIO_UPTODATE, &b->bi_flags);
+	return (b->bi_end_io)(b, bytes_done, -EIO);
+}
+
+typedef struct faulty_conf {
+	int period[Modes];
+	atomic_t counters[Modes];
+	sector_t faults[MaxFault];
+	int	modes[MaxFault];
+	int nfaults;
+	mdk_rdev_t *rdev;
+} conf_t;
+
+static int check_mode(conf_t *conf, int mode)
+{
+	if (conf->period[mode] == 0 &&
+	    atomic_read(&conf->counters[mode]) <= 0)
+		return 0; /* no failure, no decrement */
+
+
+	if (atomic_dec_and_test(&conf->counters[mode])) {
+		if (conf->period[mode])
+			atomic_set(&conf->counters[mode], conf->period[mode]);
+		return 1;
+	}
+	return 0;
+}
+
+static int check_sector(conf_t *conf, sector_t start, sector_t end, int dir)
+{
+	/* If we find a ReadFixable sector, we fix it ... */
+	int i;
+	for (i=0; i<conf->nfaults; i++)
+		if (conf->faults[i] >= start &&
+		    conf->faults[i] < end) {
+			/* found it ... */
+			switch (conf->modes[i] * 2 + dir) {
+			case WritePersistent*2+WRITE: return 1;
+			case ReadPersistent*2+READ: return 1;
+			case ReadFixable*2+READ: return 1;
+			case ReadFixable*2+WRITE:
+				conf->modes[i] = NoPersist;
+				return 0;
+			case AllPersist*2+READ:
+			case AllPersist*2+WRITE: return 1;
+			default:
+				return 0;
+			}
+		}
+	return 0;
+}
+
+static void add_sector(conf_t *conf, sector_t start, int mode)
+{
+	int i;
+	int n = conf->nfaults;
+	for (i=0; i<conf->nfaults; i++)
+		if (conf->faults[i] == start) {
+			switch(mode) {
+			case NoPersist: conf->modes[i] = mode; return;
+			case WritePersistent:
+				if (conf->modes[i] == ReadPersistent ||
+				    conf->modes[i] == ReadFixable)
+					conf->modes[i] = AllPersist;
+				else
+					conf->modes[i] = WritePersistent;
+				return;
+			case ReadPersistent:
+				if (conf->modes[i] == WritePersistent)
+					conf->modes[i] = AllPersist;
+				else
+					conf->modes[i] = ReadPersistent;
+				return;
+			case ReadFixable:
+				if (conf->modes[i] == WritePersistent ||
+				    conf->modes[i] == ReadPersistent)
+					conf->modes[i] = AllPersist;
+				else
+					conf->modes[i] = ReadFixable;
+				return;
+			}
+		} else if (conf->modes[i] == NoPersist)
+			n = i;
+
+	if (n >= MaxFault)
+		return;
+	conf->faults[n] = start;
+	conf->modes[n] = mode;
+	if (conf->nfaults == n)
+		conf->nfaults = n+1;
+}
+
+static int make_request(request_queue_t *q, struct bio *bio)
+{
+	mddev_t *mddev = q->queuedata;
+	conf_t *conf = (conf_t*)mddev->private;
+	int failit = 0;
+
+	if (bio->bi_rw & 1) {
+		/* write request */
+		if (atomic_read(&conf->counters[WriteAll])) {
+			/* special case - don't decrement, don't generic_make_request,
+			 * just fail immediately
+			 */
+			bio_endio(bio, bio->bi_size, -EIO);
+			return 0;
+		}
+
+		if (check_sector(conf, bio->bi_sector, bio->bi_sector+(bio->bi_size>>9),
+				 WRITE))
+			failit = 1;
+		if (check_mode(conf, WritePersistent)) {
+			add_sector(conf, bio->bi_sector, WritePersistent);
+			failit = 1;
+		}
+		if (check_mode(conf, WriteTransient))
+			failit = 1;
+	} else {
+		/* read request */
+		if (check_sector(conf, bio->bi_sector, bio->bi_sector + (bio->bi_size>>9),
+				 READ))
+			failit = 1;
+		if (check_mode(conf, ReadTransient))
+			failit = 1;
+		if (check_mode(conf, ReadPersistent)) {
+			add_sector(conf, bio->bi_sector, ReadPersistent);
+			failit = 1;
+		}
+		if (check_mode(conf, ReadFixable)) {
+			add_sector(conf, bio->bi_sector, ReadFixable);
+			failit = 1;
+		}
+	}
+	if (failit) {
+		struct bio *b = bio_clone(bio, GFP_NOIO);
+		b->bi_bdev = conf->rdev->bdev;
+		b->bi_private = bio;
+		b->bi_end_io = faulty_fail;
+		generic_make_request(b);
+		return 0;
+	} else {
+		bio->bi_bdev = conf->rdev->bdev;
+		return 1;
+	}
+}
+
+static void status(struct seq_file *seq, mddev_t *mddev)
+{
+	conf_t *conf = (conf_t*)mddev->private;
+	int n;
+
+	if ((n=atomic_read(&conf->counters[WriteTransient])) != 0)
+		seq_printf(seq, " WriteTransient=%d(%d)",
+			   n, conf->period[WriteTransient]);
+
+	if ((n=atomic_read(&conf->counters[ReadTransient])) != 0)
+		seq_printf(seq, " ReadTransient=%d(%d)",
+			   n, conf->period[ReadTransient]);
+
+	if ((n=atomic_read(&conf->counters[WritePersistent])) != 0)
+		seq_printf(seq, " WritePersistent=%d(%d)",
+			   n, conf->period[WritePersistent]);
+
+	if ((n=atomic_read(&conf->counters[ReadPersistent])) != 0)
+		seq_printf(seq, " ReadPersistent=%d(%d)",
+			   n, conf->period[ReadPersistent]);
+
+
+	if ((n=atomic_read(&conf->counters[ReadFixable])) != 0)
+		seq_printf(seq, " ReadFixable=%d(%d)",
+			   n, conf->period[ReadFixable]);
+
+	if ((n=atomic_read(&conf->counters[WriteAll])) != 0)
+		seq_printf(seq, " WriteAll");
+
+	seq_printf(seq, " nfaults=%d", conf->nfaults);
+}
+
+
+static int reconfig(mddev_t *mddev, int layout, int chunk_size)
+{
+	int mode = layout & ModeMask;
+	int count = layout >> ModeShift;
+	conf_t *conf = mddev->private;
+
+	if (chunk_size != -1)
+		return -EINVAL;
+
+	/* new layout */
+	if (mode == ClearFaults) 
+		conf->nfaults = 0;
+	else if (mode == ClearErrors) {
+		int i;
+		for (i=0 ; i < Modes ; i++) {
+			conf->period[i] = 0;
+			atomic_set(&conf->counters[i], 0);
+		}
+	} else if (mode < Modes) {
+		conf->period[mode] = count;
+		if (!count) count++;
+		atomic_set(&conf->counters[mode], count);
+	} else
+		return -EINVAL;
+	mddev->layout = -1; /* makes sure further changes come through */
+	return 0;
+}
+
+static int run(mddev_t *mddev)
+{
+	mdk_rdev_t *rdev;
+	struct list_head *tmp;
+	int i;
+
+	conf_t *conf = kmalloc(sizeof(*conf), GFP_KERNEL);
+
+	for (i=0; i<Modes; i++) {
+		atomic_set(&conf->counters[i], 0);
+		conf->period[i] = 0;
+	}
+	conf->nfaults = 0;
+
+	ITERATE_RDEV(mddev, rdev, tmp)
+		conf->rdev = rdev;
+
+	mddev->array_size = mddev->size;
+	mddev->private = conf;
+
+	reconfig(mddev, mddev->layout, -1);
+
+	return 0;
+}
+
+static int stop(mddev_t *mddev)
+{
+	conf_t *conf = (conf_t *)mddev->private;
+
+	kfree(conf);
+	mddev->private = NULL;
+	return 0;
+}
+
+static mdk_personality_t faulty_personality =
+{
+	.name		= "faulty",
+	.owner		= THIS_MODULE,
+	.make_request	= make_request,
+	.run		= run,
+	.stop		= stop,
+	.status		= status,
+	.reconfig	= reconfig,
+};
+
+static int __init raid_init(void)
+{
+	return register_md_personality(FAULTY, &faulty_personality);
+}
+
+static void raid_exit(void)
+{
+	unregister_md_personality(FAULTY);
+}
+
+module_init(raid_init);
+module_exit(raid_exit);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("md-personality-10"); /* faulty */

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2004-11-02 14:20:22.000000000 +1100
+++ ./drivers/md/md.c	2004-11-02 14:20:22.000000000 +1100
@@ -2397,16 +2397,27 @@ static int update_array_info(mddev_t *md
 /*	    mddev->patch_version != info->patch_version || */
 	    mddev->ctime         != info->ctime         ||
 	    mddev->level         != info->level         ||
-	    mddev->layout        != info->layout        ||
+/*	    mddev->layout        != info->layout        || */
 	    !mddev->persistent	 != info->not_persistent||
 	    mddev->chunk_size    != info->chunk_size    )
 		return -EINVAL;
 	/* Check there is only one change */
 	if (mddev->size != info->size) cnt++;
 	if (mddev->raid_disks != info->raid_disks) cnt++;
+	if (mddev->layout != info->layout) cnt++;
 	if (cnt == 0) return 0;
 	if (cnt > 1) return -EINVAL;
 
+	if (mddev->layout != info->layout) {
+		/* Change layout
+		 * we don't need to do anything at the md level, the
+		 * personality will take care of it all.
+		 */
+		if (mddev->pers->reconfig == NULL)
+			return -EINVAL;
+		else
+			return mddev->pers->reconfig(mddev, info->layout, -1);
+	}
 	if (mddev->size != info->size) {
 		mdk_rdev_t * rdev;
 		struct list_head *tmp;

diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~	2004-11-02 14:20:22.000000000 +1100
+++ ./include/linux/raid/md_k.h	2004-11-02 14:20:22.000000000 +1100
@@ -25,10 +25,12 @@
 #define MULTIPATH         7UL
 #define RAID6		  8UL
 #define	RAID10		  9UL
-#define MAX_PERSONALITY   10UL
+#define FAULTY		  10UL
+#define MAX_PERSONALITY   11UL
 
 #define	LEVEL_MULTIPATH		(-4)
 #define	LEVEL_LINEAR		(-1)
+#define	LEVEL_FAULTY		(-5)
 
 #define MaxSector (~(sector_t)0)
 #define MD_THREAD_NAME_MAX 14
@@ -36,6 +38,7 @@
 static inline int pers_to_level (int pers)
 {
 	switch (pers) {
+		case FAULTY:		return LEVEL_FAULTY;
 		case MULTIPATH:		return LEVEL_MULTIPATH;
 		case HSM:		return -3;
 		case TRANSLUCENT:	return -2;
@@ -53,6 +56,7 @@ static inline int pers_to_level (int per
 static inline int level_to_pers (int level)
 {
 	switch (level) {
+		case LEVEL_FAULTY: return FAULTY;
 		case LEVEL_MULTIPATH: return MULTIPATH;
 		case -3: return HSM;
 		case -2: return TRANSLUCENT;
@@ -290,6 +294,7 @@ struct mdk_personality_s
 	int (*sync_request)(mddev_t *mddev, sector_t sector_nr, int go_faster);
 	int (*resize) (mddev_t *mddev, sector_t sectors);
 	int (*reshape) (mddev_t *mddev, int raid_disks);
+	int (*reconfig) (mddev_t *mddev, int layout, int chunk_size);
 };
 
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH md 1 of 4] Fix problem with md/linear for devices larger than 2 terabytes
  2004-11-02  3:37 [PATCH md 0 of 4] Introduction NeilBrown
@ 2004-11-02  3:37 ` NeilBrown
  2004-11-02  3:37 ` [PATCH md 2 of 4] Fix raid6 problem NeilBrown
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: NeilBrown @ 2004-11-02  3:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


Some size fields were "int" instead of "sector_t".

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/linear.c         |    8 +++++---
 ./include/linux/raid/linear.h |    4 ++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff ./drivers/md/linear.c~current~ ./drivers/md/linear.c
--- ./drivers/md/linear.c~current~	2004-11-02 14:20:12.000000000 +1100
+++ ./drivers/md/linear.c	2004-11-02 14:20:12.000000000 +1100
@@ -116,7 +116,8 @@ static int linear_run (mddev_t *mddev)
 	linear_conf_t *conf;
 	struct linear_hash *table;
 	mdk_rdev_t *rdev;
-	int size, i, nb_zone, cnt;
+	int i, nb_zone, cnt;
+	sector_t size;
 	unsigned int curr_offset;
 	struct list_head *tmp;
 
@@ -265,10 +266,11 @@ static int linear_make_request (request_
 		char b[BDEVNAME_SIZE];
 
 		printk("linear_make_request: Block %llu out of bounds on "
-			"dev %s size %ld offset %ld\n",
+			"dev %s size %llu offset %llu\n",
 			(unsigned long long)block,
 			bdevname(tmp_dev->rdev->bdev, b),
-			tmp_dev->size, tmp_dev->offset);
+			(unsigned long long)tmp_dev->size,
+		        (unsigned long long)tmp_dev->offset);
 		bio_io_error(bio, bio->bi_size);
 		return 0;
 	}

diff ./include/linux/raid/linear.h~current~ ./include/linux/raid/linear.h
--- ./include/linux/raid/linear.h~current~	2004-11-02 14:20:12.000000000 +1100
+++ ./include/linux/raid/linear.h	2004-11-02 14:20:12.000000000 +1100
@@ -5,8 +5,8 @@
 
 struct dev_info {
 	mdk_rdev_t	*rdev;
-	unsigned long	size;
-	unsigned long	offset;
+	sector_t	size;
+	sector_t	offset;
 };
 
 typedef struct dev_info dev_info_t;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-02  5:00     ` Andrew Morton
@ 2004-11-02  4:28       ` Neil Brown
  2004-11-02  5:33         ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Neil Brown @ 2004-11-02  4:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid

On Monday November 1, akpm@osdl.org wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> >
> > NeilBrown <neilb@cse.unsw.edu.au> wrote:
> > >
> > > As the unplug timer can potentially fire at any time, and
> > >  and it access data that is released by the md ->stop function,
> > >  we need to del_timer_sync before releasing that data.
> > 
> > raid really shouldn't know about unplug_timer.
> > 
> 
> I mean, if we're actually destroying the queue then blk_cleanup_queue()
> should have done the right thing here.  If we're not destroying the queue
> then an unplug timer expiry should be benign.
> 
> Bit confused as to what's happening here.

Not surprising.  It's a bit ugly.

md has the misfortune of working in a way that doesn't fit very well
with some aspects of the blkdev layer.

To create an md array, you open /dev/mdX, and use some ioctls to set
up the array.  Thus the block device must exist internally as a block
device before any details about it are available.

Similarly, to shut down an array, you open /dev/mdX and use an ioctl
to shut it down.  This it will still be open (though only once) after
it has been shut down.

(as I understand it) I can only safely use blk_cleanup_queue on the
queue after (or during) the last close.  Thus I cannot use it while
shutting down an array.
However I need to release the md personality as part of shutting down
the array, so I have to disable the unplug function (which is handled
by the personality).

I could change ->unplug_fn to point to something safe, but there is
still a race with the timer going off at just the wrong time.

Maybe an API that allowed by to change ->make_request_fn and
->unplug_fn (and possibly others) and then call some interface, and be
sure that no internal use of those functions would occur after given
interface was called.
This interface could then just do the del_timer_sync.

NeilBrown

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-02  3:37 ` [PATCH md 3 of 4] Delete unplug timer before shutting down md array NeilBrown
@ 2004-11-02  4:57   ` Andrew Morton
  2004-11-02  5:00     ` Andrew Morton
  2004-11-03 23:03   ` Andrew Morton
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2004-11-02  4:57 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

NeilBrown <neilb@cse.unsw.edu.au> wrote:
>
> As the unplug timer can potentially fire at any time, and
>  and it access data that is released by the md ->stop function,
>  we need to del_timer_sync before releasing that data.

raid really shouldn't know about unplug_timer.

Is there some missing block level API?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-02  4:57   ` Andrew Morton
@ 2004-11-02  5:00     ` Andrew Morton
  2004-11-02  4:28       ` Neil Brown
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2004-11-02  5:00 UTC (permalink / raw)
  To: neilb, linux-raid

Andrew Morton <akpm@osdl.org> wrote:
>
> NeilBrown <neilb@cse.unsw.edu.au> wrote:
> >
> > As the unplug timer can potentially fire at any time, and
> >  and it access data that is released by the md ->stop function,
> >  we need to del_timer_sync before releasing that data.
> 
> raid really shouldn't know about unplug_timer.
> 

I mean, if we're actually destroying the queue then blk_cleanup_queue()
should have done the right thing here.  If we're not destroying the queue
then an unplug timer expiry should be benign.

Bit confused as to what's happening here.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-02  4:28       ` Neil Brown
@ 2004-11-02  5:33         ` Andrew Morton
  2004-11-02  7:12           ` Luca Berra
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2004-11-02  5:33 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, Jens Axboe


(Added Jens)

Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> On Monday November 1, akpm@osdl.org wrote:
> > Andrew Morton <akpm@osdl.org> wrote:
> > >
> > > NeilBrown <neilb@cse.unsw.edu.au> wrote:
> > > >
> > > > As the unplug timer can potentially fire at any time, and
> > > >  and it access data that is released by the md ->stop function,
> > > >  we need to del_timer_sync before releasing that data.
> > > 
> > > raid really shouldn't know about unplug_timer.
> > > 
> > 
> > I mean, if we're actually destroying the queue then blk_cleanup_queue()
> > should have done the right thing here.  If we're not destroying the queue
> > then an unplug timer expiry should be benign.
> > 
> > Bit confused as to what's happening here.
> 
> Not surprising.  It's a bit ugly.
> 
> md has the misfortune of working in a way that doesn't fit very well
> with some aspects of the blkdev layer.
> 
> To create an md array, you open /dev/mdX, and use some ioctls to set
> up the array.  Thus the block device must exist internally as a block
> device before any details about it are available.
> 
> Similarly, to shut down an array, you open /dev/mdX and use an ioctl
> to shut it down.  This it will still be open (though only once) after
> it has been shut down.
> 
> (as I understand it) I can only safely use blk_cleanup_queue on the
> queue after (or during) the last close.  Thus I cannot use it while
> shutting down an array.
> However I need to release the md personality as part of shutting down
> the array, so I have to disable the unplug function (which is handled
> by the personality).
> 
> I could change ->unplug_fn to point to something safe, but there is
> still a race with the timer going off at just the wrong time.

hm, OK.  So perhaps using /dev/mdX as a control channel for creating and
destroying /dev/mdX was unfortunate design.  Ho hum.

> Maybe an API that allowed by to change ->make_request_fn and
> ->unplug_fn (and possibly others) and then call some interface, and be
> sure that no internal use of those functions would occur after given
> interface was called.
> This interface could then just do the del_timer_sync.

Something like that would be nicer, I think - a centralised point with
nicely apologetic commentary rather than mysterious open-coded poking at
block internals ;)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-02  5:33         ` Andrew Morton
@ 2004-11-02  7:12           ` Luca Berra
  2004-11-03  0:29             ` Neil Brown
  0 siblings, 1 reply; 20+ messages in thread
From: Luca Berra @ 2004-11-02  7:12 UTC (permalink / raw)
  To: linux-raid

On Mon, Nov 01, 2004 at 09:33:33PM -0800, Andrew Morton wrote:
>> md has the misfortune of working in a way that doesn't fit very well
>> with some aspects of the blkdev layer.
>> 
>> To create an md array, you open /dev/mdX, and use some ioctls to set
>> up the array.  Thus the block device must exist internally as a block
>> device before any details about it are available.
>> 
>> Similarly, to shut down an array, you open /dev/mdX and use an ioctl
>> to shut it down.  This it will still be open (though only once) after
>> it has been shut down.
>> 
>> (as I understand it) I can only safely use blk_cleanup_queue on the
>> queue after (or during) the last close.  Thus I cannot use it while
>> shutting down an array.
>> However I need to release the md personality as part of shutting down
>> the array, so I have to disable the unplug function (which is handled
>> by the personality).
>> 
>> I could change ->unplug_fn to point to something safe, but there is
>> still a race with the timer going off at just the wrong time.
>
>hm, OK.  So perhaps using /dev/mdX as a control channel for creating and
>destroying /dev/mdX was unfortunate design.  Ho hum.

would it be unfeasible to create a custom (misc) character device, say
/dev/mdadm and use that to control arrays instead of /dev/md0?

Regards,
L.

-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 4 of 4] "Faulty" personality of md
  2004-11-02  3:37 ` [PATCH md 4 of 4] "Faulty" personality of md NeilBrown
@ 2004-11-02  9:31   ` Lars Marowsky-Bree
  2004-11-02 23:30     ` Neil Brown
  0 siblings, 1 reply; 20+ messages in thread
From: Lars Marowsky-Bree @ 2004-11-02  9:31 UTC (permalink / raw)
  To: NeilBrown, Andrew Morton; +Cc: linux-raid

On 2004-11-02T14:37:45, NeilBrown <neilb@cse.unsw.edu.au> wrote:

> The 'faulty' personality provides a layer over any block device
> in which errors may be synthesised.
> 
> A variety of errors are possible including transient and persistent
> read and write errors, and read errors that persist until the next
> write.
> 
> There error mode can be changed on a live array.

Duplicates functionality already present with the device-mapper error
target, or functionality present with the (not yet merged) flakey dm
target.

I think we should strive for convergence, not for more duplication.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 4 of 4] "Faulty" personality of md
  2004-11-02  9:31   ` Lars Marowsky-Bree
@ 2004-11-02 23:30     ` Neil Brown
  0 siblings, 0 replies; 20+ messages in thread
From: Neil Brown @ 2004-11-02 23:30 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: Andrew Morton, linux-raid

On Tuesday November 2, lmb@suse.de wrote:
> On 2004-11-02T14:37:45, NeilBrown <neilb@cse.unsw.edu.au> wrote:
> 
> > The 'faulty' personality provides a layer over any block device
> > in which errors may be synthesised.
> > 
> > A variety of errors are possible including transient and persistent
> > read and write errors, and read errors that persist until the next
> > write.
> > 
> > There error mode can be changed on a live array.
> 
> Duplicates functionality already present with the device-mapper error
> target, or functionality present with the (not yet merged) flakey dm
> target.

so this means you won't need to merge your "flakey" dm target??
(I couldn't find the "error' target after a quick look).

> 
> I think we should strive for convergence, not for more duplication.

Why? (genuine question)

And if we should, why does dm support raid1 and multipath?
(Actually I would be quite happy to discard the md/multipath code).

I'm somewhat ambivalent about convergence, but if you think is it a
good thing and want to work towards it, I'm open to co-operation.

However, that doesn't involve saying "We've already done that" with (I
think) an implication that what I had submitted therefore isn't
wanted.
Rather, it involves saying "how can we unite what we have" and making
some practical suggestions.

As I said, I'm open to suggestions, but it isn't something that I
particularly want to push.

NeilBrown

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-02  7:12           ` Luca Berra
@ 2004-11-03  0:29             ` Neil Brown
  0 siblings, 0 replies; 20+ messages in thread
From: Neil Brown @ 2004-11-03  0:29 UTC (permalink / raw)
  To: Luca Berra; +Cc: linux-raid

On Tuesday November 2, bluca@comedia.it wrote:
> >hm, OK.  So perhaps using /dev/mdX as a control channel for creating and
> >destroying /dev/mdX was unfortunate design.  Ho hum.
> 
> would it be unfeasible to create a custom (misc) character device, say
> /dev/mdadm and use that to control arrays instead of /dev/md0?
> 

Sure it would.  Though I would rather so sort of filesystem interface.

But it is unreasonable to remove the current interface without a
reasonable transition period.  So the current code needs to be made to
work.

NeilBrown

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-02  3:37 ` [PATCH md 3 of 4] Delete unplug timer before shutting down md array NeilBrown
  2004-11-02  4:57   ` Andrew Morton
@ 2004-11-03 23:03   ` Andrew Morton
  2004-11-04  9:21     ` Jens Axboe
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2004-11-03 23:03 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, Jens Axboe

NeilBrown <neilb@cse.unsw.edu.au> wrote:
>
> As the unplug timer can potentially fire at any time, and
> and it access data that is released by the md ->stop function,
> we need to del_timer_sync before releasing that data.

I don't think I saw an update to this patch so I just knocked up the below
simple conversion.  If anyone can think up a nice description of what
blk_sync_queue() should do, it would be appreciated ;)



Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 25-akpm/drivers/block/ll_rw_blk.c |   12 ++++++++++--
 25-akpm/drivers/md/linear.c       |    2 +-
 25-akpm/drivers/md/multipath.c    |    2 +-
 25-akpm/drivers/md/raid0.c        |    2 +-
 25-akpm/drivers/md/raid1.c        |    2 +-
 25-akpm/drivers/md/raid10.c       |    2 +-
 25-akpm/drivers/md/raid5.c        |    2 +-
 25-akpm/drivers/md/raid6main.c    |    2 +-
 25-akpm/include/linux/blkdev.h    |    1 +
 9 files changed, 18 insertions(+), 9 deletions(-)

diff -puN drivers/block/ll_rw_blk.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup drivers/block/ll_rw_blk.c
--- 25/drivers/block/ll_rw_blk.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup	Wed Nov  3 15:01:14 2004
+++ 25-akpm/drivers/block/ll_rw_blk.c	Wed Nov  3 15:01:14 2004
@@ -1358,10 +1358,19 @@ void blk_stop_queue(request_queue_t *q)
 	blk_remove_plug(q);
 	set_bit(QUEUE_FLAG_STOPPED, &q->queue_flags);
 }
-
 EXPORT_SYMBOL(blk_stop_queue);
 
 /**
+ * blk_sync_queue - prepare to destroy a queue
+ * @q: the queue
+ */
+void blk_sync_queue(struct request_queue *q)
+{
+	del_timer_sync(&q->unplug_timer);
+}
+EXPORT_SYMBOL(blk_sync_queue);
+
+/**
  * blk_run_queue - run a single device queue
  * @q:	The queue to run
  */
@@ -1374,7 +1383,6 @@ void blk_run_queue(struct request_queue 
 	q->request_fn(q);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
-
 EXPORT_SYMBOL(blk_run_queue);
 
 /**
diff -puN drivers/md/linear.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup drivers/md/linear.c
--- 25/drivers/md/linear.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup	Wed Nov  3 15:01:14 2004
+++ 25-akpm/drivers/md/linear.c	Wed Nov  3 15:01:14 2004
@@ -231,7 +231,7 @@ static int linear_stop (mddev_t *mddev)
 {
 	linear_conf_t *conf = mddev_to_conf(mddev);
   
-	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
+	blk_sync_queue(&mddev->queue); /* the unplug fn references 'conf'*/
 	kfree(conf->hash_table);
 	kfree(conf);
 
diff -puN drivers/md/multipath.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup drivers/md/multipath.c
--- 25/drivers/md/multipath.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup	Wed Nov  3 15:01:14 2004
+++ 25-akpm/drivers/md/multipath.c	Wed Nov  3 15:01:14 2004
@@ -566,7 +566,7 @@ static int multipath_stop (mddev_t *mdde
 
 	md_unregister_thread(mddev->thread);
 	mddev->thread = NULL;
-	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
+	blk_sync_queue(&mddev->queue); /* the unplug fn references 'conf'*/
 	mempool_destroy(conf->pool);
 	kfree(conf->multipaths);
 	kfree(conf);
diff -puN drivers/md/raid0.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup drivers/md/raid0.c
--- 25/drivers/md/raid0.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup	Wed Nov  3 15:01:14 2004
+++ 25-akpm/drivers/md/raid0.c	Wed Nov  3 15:01:14 2004
@@ -385,7 +385,7 @@ static int raid0_stop (mddev_t *mddev)
 {
 	raid0_conf_t *conf = mddev_to_conf(mddev);
 
-	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
+	blk_sync_queue(&mddev->queue); /* the unplug fn references 'conf'*/
 	kfree (conf->hash_table);
 	conf->hash_table = NULL;
 	kfree (conf->strip_zone);
diff -puN drivers/md/raid10.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup drivers/md/raid10.c
--- 25/drivers/md/raid10.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup	Wed Nov  3 15:01:14 2004
+++ 25-akpm/drivers/md/raid10.c	Wed Nov  3 15:01:14 2004
@@ -1744,7 +1744,7 @@ static int stop(mddev_t *mddev)
 
 	md_unregister_thread(mddev->thread);
 	mddev->thread = NULL;
-	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
+	blk_sync_queue(&mddev->queue); /* the unplug fn references 'conf'*/
 	if (conf->r10bio_pool)
 		mempool_destroy(conf->r10bio_pool);
 	if (conf->mirrors)
diff -puN drivers/md/raid1.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup drivers/md/raid1.c
--- 25/drivers/md/raid1.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup	Wed Nov  3 15:01:14 2004
+++ 25-akpm/drivers/md/raid1.c	Wed Nov  3 15:01:14 2004
@@ -1293,7 +1293,7 @@ static int stop(mddev_t *mddev)
 
 	md_unregister_thread(mddev->thread);
 	mddev->thread = NULL;
-	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
+	blk_sync_queue(&mddev->queue); /* the unplug fn references 'conf'*/
 	if (conf->r1bio_pool)
 		mempool_destroy(conf->r1bio_pool);
 	if (conf->mirrors)
diff -puN drivers/md/raid5.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup drivers/md/raid5.c
--- 25/drivers/md/raid5.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup	Wed Nov  3 15:01:14 2004
+++ 25-akpm/drivers/md/raid5.c	Wed Nov  3 15:01:14 2004
@@ -1707,7 +1707,7 @@ static int stop (mddev_t *mddev)
 	mddev->thread = NULL;
 	shrink_stripes(conf);
 	free_pages((unsigned long) conf->stripe_hashtbl, HASH_PAGES_ORDER);
-	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
+	blk_sync_queue(&mddev->queue); /* the unplug fn references 'conf'*/
 	kfree(conf);
 	mddev->private = NULL;
 	return 0;
diff -puN drivers/md/raid6main.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup drivers/md/raid6main.c
--- 25/drivers/md/raid6main.c~md-delete-unplug-timer-before-shutting-down-md-array-cleanup	Wed Nov  3 15:01:14 2004
+++ 25-akpm/drivers/md/raid6main.c	Wed Nov  3 15:01:14 2004
@@ -1878,7 +1878,7 @@ static int stop (mddev_t *mddev)
 	mddev->thread = NULL;
 	shrink_stripes(conf);
 	free_pages((unsigned long) conf->stripe_hashtbl, HASH_PAGES_ORDER);
-	del_timer_sync(&mddev->queue->unplug_timer); /* the unplug fn references 'conf'*/
+	blk_sync_queue(&mddev->queue); /* the unplug fn references 'conf'*/
 	kfree(conf);
 	mddev->private = NULL;
 	return 0;
diff -puN include/linux/blkdev.h~md-delete-unplug-timer-before-shutting-down-md-array-cleanup include/linux/blkdev.h
--- 25/include/linux/blkdev.h~md-delete-unplug-timer-before-shutting-down-md-array-cleanup	Wed Nov  3 15:01:34 2004
+++ 25-akpm/include/linux/blkdev.h	Wed Nov  3 15:01:55 2004
@@ -522,6 +522,7 @@ extern int blk_hw_contig_segment(request
 extern int scsi_cmd_ioctl(struct file *, struct gendisk *, unsigned int, void __user *);
 extern void blk_start_queue(request_queue_t *q);
 extern void blk_stop_queue(request_queue_t *q);
+extern void blk_sync_queue(struct request_queue *q);
 extern void __blk_stop_queue(request_queue_t *q);
 extern void blk_run_queue(request_queue_t *);
 extern void blk_queue_activity_fn(request_queue_t *, activity_fn *, void *);
_


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-03 23:03   ` Andrew Morton
@ 2004-11-04  9:21     ` Jens Axboe
  2004-11-05  8:52       ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2004-11-04  9:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: NeilBrown, linux-raid

On Wed, Nov 03 2004, Andrew Morton wrote:
> NeilBrown <neilb@cse.unsw.edu.au> wrote:
> >
> > As the unplug timer can potentially fire at any time, and
> > and it access data that is released by the md ->stop function,
> > we need to del_timer_sync before releasing that data.
> 
> I don't think I saw an update to this patch so I just knocked up the below
> simple conversion.  If anyone can think up a nice description of what
> blk_sync_queue() should do, it would be appreciated ;)

I don't see much merrit in this patch, you have to prevent new requests
from setting it off again. Basically, from my understanding, Neil needs
a way to atomically kill the timer and the unplug function. Correct?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-04  9:21     ` Jens Axboe
@ 2004-11-05  8:52       ` Jens Axboe
  2004-11-07 23:38         ` Neil Brown
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2004-11-05  8:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: NeilBrown, linux-raid

On Thu, Nov 04 2004, Jens Axboe wrote:
> On Wed, Nov 03 2004, Andrew Morton wrote:
> > NeilBrown <neilb@cse.unsw.edu.au> wrote:
> > >
> > > As the unplug timer can potentially fire at any time, and
> > > and it access data that is released by the md ->stop function,
> > > we need to del_timer_sync before releasing that data.
> > 
> > I don't think I saw an update to this patch so I just knocked up the below
> > simple conversion.  If anyone can think up a nice description of what
> > blk_sync_queue() should do, it would be appreciated ;)
> 
> I don't see much merrit in this patch, you have to prevent new requests
> from setting it off again. Basically, from my understanding, Neil needs
> a way to atomically kill the timer and the unplug function. Correct?

Actually, with the online io scheduler switching we have a way to do
this already:

	blk_wait_queue_drained(q, 1);
	blk_sync_queue(q);

Then you are sure the queue is idle, switch unplug (or
->make_request_fn) and:

	blk_finish_queue_drain(q);

We need to move the ->rl waitqueue initializers to
blk_queue_make_request() then.

Would this work for you, Neil?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-05  8:52       ` Jens Axboe
@ 2004-11-07 23:38         ` Neil Brown
  2004-11-07 23:42           ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Neil Brown @ 2004-11-07 23:38 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-raid

On Friday November 5, axboe@suse.de wrote:
> On Thu, Nov 04 2004, Jens Axboe wrote:
> > On Wed, Nov 03 2004, Andrew Morton wrote:
> > > NeilBrown <neilb@cse.unsw.edu.au> wrote:
> > > >
> > > > As the unplug timer can potentially fire at any time, and
> > > > and it access data that is released by the md ->stop function,
> > > > we need to del_timer_sync before releasing that data.
> > > 
> > > I don't think I saw an update to this patch so I just knocked up the below
> > > simple conversion.  If anyone can think up a nice description of what
> > > blk_sync_queue() should do, it would be appreciated ;)
> > 
> > I don't see much merrit in this patch, you have to prevent new requests
> > from setting it off again. Basically, from my understanding, Neil needs
> > a way to atomically kill the timer and the unplug function. Correct?
> 
> Actually, with the online io scheduler switching we have a way to do
> this already:
> 
> 	blk_wait_queue_drained(q, 1);
> 	blk_sync_queue(q);
> 

I couldn't find blk_sync_queue in 2.6.10-rc1-mm2 ....

> 
> Would this work for you, Neil?

I need it to do:
	del_timer_sync(&q->unplug_timer);
	kblockd_flush();

(I now realise that the patch I started with wasn't enough, as
kblockd_flush is needed as well).

Anything that I can that eventually does that should be OK.

NeilBrown

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-07 23:38         ` Neil Brown
@ 2004-11-07 23:42           ` Andrew Morton
  2004-11-08  1:21             ` Neil Brown
  2004-11-08 11:23             ` Jens Axboe
  0 siblings, 2 replies; 20+ messages in thread
From: Andrew Morton @ 2004-11-07 23:42 UTC (permalink / raw)
  To: Neil Brown; +Cc: axboe, linux-raid

Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> On Friday November 5, axboe@suse.de wrote:
> > On Thu, Nov 04 2004, Jens Axboe wrote:
> > > On Wed, Nov 03 2004, Andrew Morton wrote:
> > > > NeilBrown <neilb@cse.unsw.edu.au> wrote:
> > > > >
> > > > > As the unplug timer can potentially fire at any time, and
> > > > > and it access data that is released by the md ->stop function,
> > > > > we need to del_timer_sync before releasing that data.
> > > > 
> > > > I don't think I saw an update to this patch so I just knocked up the below
> > > > simple conversion.  If anyone can think up a nice description of what
> > > > blk_sync_queue() should do, it would be appreciated ;)
> > > 
> > > I don't see much merrit in this patch, you have to prevent new requests
> > > from setting it off again. Basically, from my understanding, Neil needs
> > > a way to atomically kill the timer and the unplug function. Correct?
> > 
> > Actually, with the online io scheduler switching we have a way to do
> > this already:
> > 
> > 	blk_wait_queue_drained(q, 1);
> > 	blk_sync_queue(q);
> > 
> 
> I couldn't find blk_sync_queue in 2.6.10-rc1-mm2 ....

I added it in -mm3 to replace all the open-coded del_timer_sync()s.

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10-rc1/2.6.10-rc1-mm3/broken-out/md-delete-unplug-timer-before-shutting-down-md-array-cleanup.patch

> > 
> > Would this work for you, Neil?
> 
> I need it to do:
> 	del_timer_sync(&q->unplug_timer);
> 	kblockd_flush();
> 
> (I now realise that the patch I started with wasn't enough, as
> kblockd_flush is needed as well).
> 
> Anything that I can that eventually does that should be OK.
> 

We should be able to stick all the above inside block_sync_queue()

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-07 23:42           ` Andrew Morton
@ 2004-11-08  1:21             ` Neil Brown
  2004-11-08 11:23             ` Jens Axboe
  1 sibling, 0 replies; 20+ messages in thread
From: Neil Brown @ 2004-11-08  1:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: axboe, linux-raid

On Sunday November 7, akpm@osdl.org wrote:
> > I couldn't find blk_sync_queue in 2.6.10-rc1-mm2 ....
> 
> I added it in -mm3 to replace all the open-coded del_timer_sync()s.

Yeh... right... of course ... I saw that mail...

I think I missed the trust of Jens' mail.

I now see the point was that:

  .... you have to prevent new requests from setting it (the timer)
  off again.

That isn't a problem in my case. By the time I  to want to remove
the timer, I know that no-one has the device open, and so no requests
are going to arrive.
The blk_wait_queue_drained has no relevance for md as we don't use the
same sort of queue.
So I think blk_sync_queue is all that is needed.

> 
> We should be able to stick all the above inside block_sync_queue()


Maybe:

###Comments for ChangeSet
Document blk_sync_queue and complete it's functionality.

blk_sync_queue needs to call kblock_flush as the unplug function
is not actually called by the timer, but rather by a work_queue that
the timer triggers.

Also, use blk_sync_queue to abstract identical functionality from 
blk_cleanup_queue.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/block/ll_rw_blk.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff ./drivers/block/ll_rw_blk.c~current~ ./drivers/block/ll_rw_blk.c
--- ./drivers/block/ll_rw_blk.c~current~	2004-11-08 11:57:40.000000000 +1100
+++ ./drivers/block/ll_rw_blk.c	2004-11-08 12:17:53.000000000 +1100
@@ -1361,12 +1361,21 @@ void blk_stop_queue(request_queue_t *q)
 EXPORT_SYMBOL(blk_stop_queue);
 
 /**
- * blk_sync_queue - prepare to destroy a queue
+ * blk_sync_queue - cancel any pending callbacks a queue
  * @q: the queue
+ *
+ * Description:
+ *     The block layer may perform asynchronous callback activity
+ *     on a queue, such as calling the unplug function after a timeout.
+ *     A block device may call blk_sync_queue to ensure that any
+ *     such activity is cancelled, thus allowing it to release resources
+ *     the the callbacks might use.
+ *
  */
 void blk_sync_queue(struct request_queue *q)
 {
 	del_timer_sync(&q->unplug_timer);
+	kblockd_flush();
 }
 EXPORT_SYMBOL(blk_sync_queue);
 
@@ -1410,8 +1419,7 @@ void blk_cleanup_queue(request_queue_t *
 	if (q->elevator)
 		elevator_exit(q->elevator);
 
-	del_timer_sync(&q->unplug_timer);
-	kblockd_flush();
+	blk_sync_queue(q);
 
 	if (rl->rq_pool)
 		mempool_destroy(rl->rq_pool);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH md 3 of 4] Delete unplug timer before shutting down md array.
  2004-11-07 23:42           ` Andrew Morton
  2004-11-08  1:21             ` Neil Brown
@ 2004-11-08 11:23             ` Jens Axboe
  1 sibling, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2004-11-08 11:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Neil Brown, linux-raid

On Sun, Nov 07 2004, Andrew Morton wrote:
> Neil Brown <neilb@cse.unsw.edu.au> wrote:
> >
> > On Friday November 5, axboe@suse.de wrote:
> > > On Thu, Nov 04 2004, Jens Axboe wrote:
> > > > On Wed, Nov 03 2004, Andrew Morton wrote:
> > > > > NeilBrown <neilb@cse.unsw.edu.au> wrote:
> > > > > >
> > > > > > As the unplug timer can potentially fire at any time, and
> > > > > > and it access data that is released by the md ->stop function,
> > > > > > we need to del_timer_sync before releasing that data.
> > > > > 
> > > > > I don't think I saw an update to this patch so I just knocked up the below
> > > > > simple conversion.  If anyone can think up a nice description of what
> > > > > blk_sync_queue() should do, it would be appreciated ;)
> > > > 
> > > > I don't see much merrit in this patch, you have to prevent new requests
> > > > from setting it off again. Basically, from my understanding, Neil needs
> > > > a way to atomically kill the timer and the unplug function. Correct?
> > > 
> > > Actually, with the online io scheduler switching we have a way to do
> > > this already:
> > > 
> > > 	blk_wait_queue_drained(q, 1);
> > > 	blk_sync_queue(q);
> > > 
> > 
> > I couldn't find blk_sync_queue in 2.6.10-rc1-mm2 ....
> 
> I added it in -mm3 to replace all the open-coded del_timer_sync()s.
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10-rc1/2.6.10-rc1-mm3/broken-out/md-delete-unplug-timer-before-shutting-down-md-array-cleanup.patch

I seem to be missing Neils reply, this is the first I see of it.
Strange.

> > > 
> > > Would this work for you, Neil?
> > 
> > I need it to do:
> > 	del_timer_sync(&q->unplug_timer);
> > 	kblockd_flush();

blk_sync_queue() should do both. I'm unsure of whether we should wrap
blk_wait_queue_drained() and that into a seperate helper. There needs to
be an unfreeze as well, of course. Perhaps blk_freeze_queue() and
blk_unfreeze_queue()?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2004-11-08 11:23 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-02  3:37 [PATCH md 0 of 4] Introduction NeilBrown
2004-11-02  3:37 ` [PATCH md 1 of 4] Fix problem with md/linear for devices larger than 2 terabytes NeilBrown
2004-11-02  3:37 ` [PATCH md 2 of 4] Fix raid6 problem NeilBrown
2004-11-02  3:37 ` [PATCH md 3 of 4] Delete unplug timer before shutting down md array NeilBrown
2004-11-02  4:57   ` Andrew Morton
2004-11-02  5:00     ` Andrew Morton
2004-11-02  4:28       ` Neil Brown
2004-11-02  5:33         ` Andrew Morton
2004-11-02  7:12           ` Luca Berra
2004-11-03  0:29             ` Neil Brown
2004-11-03 23:03   ` Andrew Morton
2004-11-04  9:21     ` Jens Axboe
2004-11-05  8:52       ` Jens Axboe
2004-11-07 23:38         ` Neil Brown
2004-11-07 23:42           ` Andrew Morton
2004-11-08  1:21             ` Neil Brown
2004-11-08 11:23             ` Jens Axboe
2004-11-02  3:37 ` [PATCH md 4 of 4] "Faulty" personality of md NeilBrown
2004-11-02  9:31   ` Lars Marowsky-Bree
2004-11-02 23:30     ` Neil Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).