linux-raid.vger.kernel.org archive mirror
* [PATCH md 000 of 20] Introduction
@ 2005-12-12  3:10 NeilBrown
  2005-12-12  3:10 ` [PATCH md 001 of 20] Fix a use-after-free bug in raid1 NeilBrown
                   ` (19 more replies)
  0 siblings, 20 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:10 UTC
  To: Andrew Morton; +Cc: linux-raid

Following are 20 patches for md/raid in 2.6.15-rc5-mm2.

The first 2 are suitable for 2.6.15.
The first fixes a use-after-free bug in raid1 which could only cause problems
when raid1 is used with some devices in write-behind mode.
The second fixes an oversight that could result in a deadlock
if the raid5 stripe_cache_size were set too low.

3 through 8 are assorted minor cleanups or bugfixes only relevant to -mm.
9 through 20 add a lot more information about md arrays to sysfs and allow
a lot of it to be settable.  This will ultimately make it possible to completely
configure an md array through sysfs without using the ioctls at all.  This is
good because the ioctls are limited and, while I've managed to squeeze some
extra functionality out of them, they've reached their limit.

One particular goal I am working towards here is to be able to have an
md array where the metadata is completely managed by userspace.  A program would
read the superblock, fill in the correct information via sysfs, and start the
array.
On device failure or other events, userspace gets notified and takes the necessary
action, such as identifying a spare, passing it to md, and requesting a reconstruction.

This means that e.g. DDF can be supported without adding complexity to the kernel.
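
In practice, configuring an array through sysfs means a userspace program writing short strings to attribute files and reading them back. A minimal sketch in C, assuming the attributes live under /sys/block/mdX/md/ as described in Documentation/md.txt; md_sysfs_write() and md_sysfs_read() are hypothetical helper names, not an existing API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one value to a sysfs attribute file such
 * as "/sys/block/md0/md/chunk_size" (md0 is only an example name).
 * Returns 0 on success, -1 on failure.  With stdio the data is flushed
 * at fclose(), so an error returned by the kernel's store routine is
 * reported there; both calls are checked. */
static int md_sysfs_write(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int rv = 0;

	if (f == NULL)
		return -1;
	if (fputs(value, f) == EOF)
		rv = -1;
	if (fclose(f) == EOF)
		rv = -1;
	return rv;
}

/* Hypothetical helper: read an attribute back into buf (at most len-1
 * bytes, newline included).  Returns 0 on success, -1 on failure. */
static int md_sysfs_read(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, (int)len, f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}
```

Opening, writing and closing the file for each value keeps every write to a single store call, so a rejection such as -EBUSY from chunk_size on a running array surfaces as a failed write rather than being lost in a buffer.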

Thanks,
NeilBrown


 [PATCH md 001 of 20] Fix a use-after-free bug in raid1
 [PATCH md 002 of 20] Use correct size of raid5 stripe cache when measuring how full it is.
 [PATCH md 003 of 20] Define and use safe_put_page for md.
 [PATCH md 004 of 20] Helper function to match commands written to sysfs files.
 [PATCH md 005 of 20] Fix typo in comment.
 [PATCH md 006 of 20] Make a couple of names in md.c static
 [PATCH md 007 of 20] Make sure bitmap updates are visible through filesystem.
 [PATCH md 008 of 20] Fix rdev->pending counts in raid1
 [PATCH md 009 of 20] Allow chunk_size to be settable through sysfs
 [PATCH md 010 of 20] Allow md array component size to be accessed and set via sysfs
 [PATCH md 011 of 20] Expose md metadata format in sysfs.
 [PATCH md 012 of 20] Allow array level to be set textually via sysfs.
 [PATCH md 013 of 20] Count corrected read errors per drive.
 [PATCH md 014 of 20] Allow md/raid_disks to be settable.
 [PATCH md 015 of 20] Keep better track of dev/array size when assembling md arrays.
 [PATCH md 016 of 20] Expose device slot information via sysfs
 [PATCH md 017 of 20] Export rdev->data_offset via sysfs
 [PATCH md 018 of 20] Allow available size of component devices to be set via sysfs.
 [PATCH md 019 of 20] Support adding new devices to md arrays via sysfs
 [PATCH md 020 of 20] Allow sync-speed to be controlled per-device

* [PATCH md 001 of 20] Fix a use-after-free bug in raid1
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
@ 2005-12-12  3:10 ` NeilBrown
  2005-12-12  3:10 ` [PATCH md 002 of 20] Use correct size of raid5 stripe cache when measuring how full it is NeilBrown
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:10 UTC
  To: Andrew Morton; +Cc: linux-raid


Who would submit code with a FIXME like that in it !!!!
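
The bug is a free-then-use ordering: raid1_end_write_request() called bio_put() as soon as a mirror's write finished, but the write-behind branch later walked bio->bi_io_vec to release the extra page copies. The patch moves the put to the end of the handler, guarded by r1_bio->bios[mirror] having been cleared. A userspace sketch of the corrected ordering, in which fake_bio, fake_bio_alloc() and fake_end_write() are hypothetical stand-ins for the kernel structures and plain malloc/free replace reference counting:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio: a vector of separately
 * allocated data pages (the extra copies made for write-behind). */
struct fake_bio {
	int vcnt;
	void *pages[4];
};

static int pages_released;

static struct fake_bio *fake_bio_alloc(int vcnt)
{
	struct fake_bio *bio = calloc(1, sizeof(*bio));

	if (bio == NULL || vcnt > 4) {
		free(bio);
		return NULL;
	}
	bio->vcnt = vcnt;
	for (int i = 0; i < vcnt; i++)
		bio->pages[i] = malloc(64);
	return bio;
}

/* Completion handler with the corrected ordering: the page vector is
 * walked while the bio is still live, and the bio is released last.
 * Freeing it at the top of the function would make the loop below
 * read freed memory. */
static void fake_end_write(struct fake_bio *bio)
{
	int i = bio->vcnt;

	while (i--) {
		free(bio->pages[i]);   /* free(NULL) is harmless if malloc failed */
		pages_released++;
	}
	free(bio);                     /* last use of bio */
}
```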



Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid1.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~	2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/raid1.c	2005-12-12 10:45:22.000000000 +1100
@@ -323,7 +323,6 @@ static int raid1_end_write_request(struc
 		 * this branch is our 'one mirror IO has finished' event handler:
 		 */
 		r1_bio->bios[mirror] = NULL;
-		bio_put(bio);
 		if (!uptodate) {
 			md_error(r1_bio->mddev, conf->mirrors[mirror].rdev);
 			/* an I/O failed, we can't clear the bitmap */
@@ -380,7 +379,6 @@ static int raid1_end_write_request(struc
 		}
 		if (test_bit(R1BIO_BehindIO, &r1_bio->state)) {
 			/* free extra copy of the data pages */
-/* FIXME bio has been freed!!! */
 			int i = bio->bi_vcnt;
 			while (i--)
 				put_page(bio->bi_io_vec[i].bv_page);
@@ -394,6 +392,9 @@ static int raid1_end_write_request(struc
 		raid_end_bio_io(r1_bio);
 	}
 
+	if (r1_bio->bios[mirror]==NULL)
+		bio_put(bio);
+
 	rdev_dec_pending(conf->mirrors[mirror].rdev, conf->mddev);
 	return 0;
 }

* [PATCH md 003 of 20] Define and use safe_put_page for md.
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
  2005-12-12  3:10 ` [PATCH md 001 of 20] Fix a use-after-free bug in raid1 NeilBrown
  2005-12-12  3:10 ` [PATCH md 002 of 20] Use correct size of raid5 stripe cache when measuring how full it is NeilBrown
@ 2005-12-12  3:10 ` NeilBrown
  2005-12-12  3:13 ` [PATCH md 004 of 20] Helper function to match commands written to sysfs files NeilBrown
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:10 UTC
  To: Andrew Morton; +Cc: linux-raid


md sometimes calls put_page on NULL pointers (treating it like kfree,
which accepts NULL).  This is not safe, so define and use a
'safe_put_page' which checks for NULL.
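
put_page() dereferences its argument, whereas kfree(NULL) is defined to be a no-op, so error paths that release partially initialised state need the NULL test. A minimal userspace sketch of the same pattern, using a hypothetical refcounted struct obj in place of struct page:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct page. */
struct obj {
	int refcount;
};

static void put_obj(struct obj *o)
{
	assert(o->refcount > 0);   /* like put_page(), NULL is not allowed */
	o->refcount--;
}

/* Same shape as safe_put_page(): a NULL pointer is silently ignored. */
static inline void safe_put_obj(struct obj *o)
{
	if (o)
		put_obj(o);
}

/* Error-path cleanup where some slots were never filled in. */
static void release_all(struct obj **slots, int n)
{
	for (int i = 0; i < n; i++)
		safe_put_obj(slots[i]);
}
```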

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c       |    3 +--
 ./drivers/md/raid1.c        |    8 ++++----
 ./drivers/md/raid10.c       |    8 ++++----
 ./drivers/md/raid6main.c    |    3 +--
 ./include/linux/raid/md_k.h |    5 +++++
 5 files changed, 15 insertions(+), 12 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/bitmap.c	2005-12-12 10:45:26.000000000 +1100
@@ -626,8 +626,7 @@ static void bitmap_file_unmap(struct bit
 	kfree(map);
 	kfree(attr);
 
-	if (sb_page)
-		put_page(sb_page);
+	safe_put_page(sb_page);
 }
 
 static void bitmap_stop_daemon(struct bitmap *bitmap);

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~	2005-12-12 10:45:22.000000000 +1100
+++ ./drivers/md/raid1.c	2005-12-12 10:45:26.000000000 +1100
@@ -136,7 +136,7 @@ static void * r1buf_pool_alloc(gfp_t gfp
 out_free_pages:
 	for (i=0; i < RESYNC_PAGES ; i++)
 		for (j=0 ; j < pi->raid_disks; j++)
-			put_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
+			safe_put_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
 	j = -1;
 out_free_bio:
 	while ( ++j < pi->raid_disks )
@@ -156,7 +156,7 @@ static void r1buf_pool_free(void *__r1_b
 			if (j == 0 ||
 			    r1bio->bios[j]->bi_io_vec[i].bv_page !=
 			    r1bio->bios[0]->bi_io_vec[i].bv_page)
-				put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
+				safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
 		}
 	for (i=0 ; i < pi->raid_disks; i++)
 		bio_put(r1bio->bios[i]);
@@ -381,7 +381,7 @@ static int raid1_end_write_request(struc
 			/* free extra copy of the data pages */
 			int i = bio->bi_vcnt;
 			while (i--)
-				put_page(bio->bi_io_vec[i].bv_page);
+				safe_put_page(bio->bi_io_vec[i].bv_page);
 		}
 		/* clear the bitmap if all writes complete successfully */
 		bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector,
@@ -1907,7 +1907,7 @@ out_free_conf:
 		if (conf->r1bio_pool)
 			mempool_destroy(conf->r1bio_pool);
 		kfree(conf->mirrors);
-		put_page(conf->tmppage);
+		safe_put_page(conf->tmppage);
 		kfree(conf->poolinfo);
 		kfree(conf);
 		mddev->private = NULL;

diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~	2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/raid10.c	2005-12-12 10:45:26.000000000 +1100
@@ -132,10 +132,10 @@ static void * r10buf_pool_alloc(gfp_t gf
 
 out_free_pages:
 	for ( ; i > 0 ; i--)
-		put_page(bio->bi_io_vec[i-1].bv_page);
+		safe_put_page(bio->bi_io_vec[i-1].bv_page);
 	while (j--)
 		for (i = 0; i < RESYNC_PAGES ; i++)
-			put_page(r10_bio->devs[j].bio->bi_io_vec[i].bv_page);
+			safe_put_page(r10_bio->devs[j].bio->bi_io_vec[i].bv_page);
 	j = -1;
 out_free_bio:
 	while ( ++j < nalloc )
@@ -155,7 +155,7 @@ static void r10buf_pool_free(void *__r10
 		struct bio *bio = r10bio->devs[j].bio;
 		if (bio) {
 			for (i = 0; i < RESYNC_PAGES; i++) {
-				put_page(bio->bi_io_vec[i].bv_page);
+				safe_put_page(bio->bi_io_vec[i].bv_page);
 				bio->bi_io_vec[i].bv_page = NULL;
 			}
 			bio_put(bio);
@@ -2042,7 +2042,7 @@ static int run(mddev_t *mddev)
 out_free_conf:
 	if (conf->r10bio_pool)
 		mempool_destroy(conf->r10bio_pool);
-	put_page(conf->tmppage);
+	safe_put_page(conf->tmppage);
 	kfree(conf->mirrors);
 	kfree(conf);
 	mddev->private = NULL;

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~	2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/raid6main.c	2005-12-12 10:45:26.000000000 +1100
@@ -2072,8 +2072,7 @@ static int run(mddev_t *mddev)
 abort:
 	if (conf) {
 		print_raid6_conf(conf);
-		if (conf->spare_page)
-			put_page(conf->spare_page);
+		safe_put_page(conf->spare_page);
 		kfree(conf->stripe_hashtbl);
 		kfree(conf);
 	}

diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~	2005-12-12 10:45:16.000000000 +1100
+++ ./include/linux/raid/md_k.h	2005-12-12 10:45:26.000000000 +1100
@@ -324,5 +324,10 @@ do {									\
 	__wait_event_lock_irq(wq, condition, lock, cmd);		\
 } while (0)
 
+static inline void safe_put_page(struct page *p)
+{
+	if (p) put_page(p);
+}
+
 #endif
 

* [PATCH md 002 of 20] Use correct size of raid5 stripe cache when measuring how full it is.
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
  2005-12-12  3:10 ` [PATCH md 001 of 20] Fix a use-after-free bug in raid1 NeilBrown
@ 2005-12-12  3:10 ` NeilBrown
  2005-12-12  3:10 ` [PATCH md 003 of 20] Define and use safe_put_page for md NeilBrown
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:10 UTC
  To: Andrew Morton; +Cc: linux-raid


The raid5 stripe cache was recently changed from fixed size
(NR_STRIPES) to variable size (conf->max_nr_stripes).  However, there
are two places that still use the constant and, as a result, reducing
the size of the stripe cache can result in a deadlock.
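
The wake-up test in __release_stripe() is a hysteresis check: once get_active_stripe() has set inactive_blocked, waiters are only released when the number of active stripes falls below three quarters of the cache size. A minimal sketch of that test with the threshold taken from the run-time cache size, as the patch does; should_wake() is a hypothetical name:

```c
#include <assert.h>

/* Hypothetical reduction of the wake-up condition: wake waiters if
 * allocation is not blocked, or if enough stripes have been released
 * that fewer than 3/4 of the configured cache is active. */
static int should_wake(int active_stripes, int max_nr_stripes,
		       int inactive_blocked)
{
	return !inactive_blocked ||
	       active_stripes < max_nr_stripes * 3 / 4;
}
```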

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid5.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/raid5.c	2005-12-12 10:45:24.000000000 +1100
@@ -96,7 +96,7 @@ static inline void __release_stripe(raid
 			list_add_tail(&sh->lru, &conf->inactive_list);
 			atomic_dec(&conf->active_stripes);
 			if (!conf->inactive_blocked ||
-			    atomic_read(&conf->active_stripes) < (NR_STRIPES*3/4))
+			    atomic_read(&conf->active_stripes) < (conf->max_nr_stripes*3/4))
 				wake_up(&conf->wait_for_stripe);
 		}
 	}
@@ -255,7 +255,8 @@ static struct stripe_head *get_active_st
 				conf->inactive_blocked = 1;
 				wait_event_lock_irq(conf->wait_for_stripe,
 						    !list_empty(&conf->inactive_list) &&
-						    (atomic_read(&conf->active_stripes) < (NR_STRIPES *3/4)
+						    (atomic_read(&conf->active_stripes)
+						     < (conf->max_nr_stripes *3/4)
 						     || !conf->inactive_blocked),
 						    conf->device_lock,
 						    unplug_slaves(conf->mddev);
@@ -1915,7 +1916,7 @@ static int run(mddev_t *mddev)
 			goto abort;
 		}
 	}
-memory = conf->max_nr_stripes * (sizeof(struct stripe_head) +
+	memory = conf->max_nr_stripes * (sizeof(struct stripe_head) +
 		 conf->raid_disks * ((sizeof(struct bio) + PAGE_SIZE))) / 1024;
 	if (grow_stripes(conf, conf->max_nr_stripes)) {
 		printk(KERN_ERR 

* [PATCH md 004 of 20] Helper function to match commands written to sysfs files.
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (2 preceding siblings ...)
  2005-12-12  3:10 ` [PATCH md 003 of 20] Define and use safe_put_page for md NeilBrown
@ 2005-12-12  3:13 ` NeilBrown
  2005-12-12  3:14 ` [PATCH md 005 of 20] Fix typo in comment NeilBrown
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:13 UTC
  To: Andrew Morton; +Cc: linux-raid


Commands written to sysfs files may or may not be \n-terminated.
We want to accept either case, so use cmd_match() to compare them.

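
The matching rule is that the two strings must be identical, except that the written command may carry a single trailing newline (as produced by echo). A standalone copy of cmd_match(), plus a hypothetical classify() helper that mirrors the order of tests in action_store(). Note that the last test must be negated: the strcmp(...)!=0 checks being replaced rejected every word other than "repair", so the cmd_match() version needs !cmd_match(page, "repair") to keep that meaning.

```c
#include <assert.h>

/* Copy of cmd_match(): cmd matches str if they are identical, or if
 * cmd has exactly one extra trailing newline. */
static int cmd_match(const char *cmd, const char *str)
{
	while (*cmd && *str && *cmd == *str) {
		cmd++;
		str++;
	}
	if (*cmd == '\n')
		cmd++;
	if (*str || *cmd)
		return 0;
	return 1;
}

enum action { ACT_IDLE, ACT_RESYNC, ACT_CHECK, ACT_REPAIR, ACT_INVALID };

/* Hypothetical classifier following the checks in action_store(). */
static enum action classify(const char *page)
{
	if (cmd_match(page, "idle"))
		return ACT_IDLE;
	if (cmd_match(page, "resync") || cmd_match(page, "recover"))
		return ACT_RESYNC;
	if (cmd_match(page, "check"))
		return ACT_CHECK;
	if (!cmd_match(page, "repair"))   /* anything unknown is rejected */
		return ACT_INVALID;
	return ACT_REPAIR;
}
```
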
Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |   29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 10:45:30.000000000 +1100
@@ -1525,6 +1525,26 @@ repeat:
 
 }
 
+/* words written to sysfs files may, or my not, be \n terminated.
+ * We want to accept with case. For this we use cmd_match.
+ */
+static int cmd_match(const char *cmd, const char *str)
+{
+	/* See if cmd, written into a sysfs file, matches
+	 * str.  They must either be the same, or cmd can
+	 * have a trailing newline
+	 */
+	while (*cmd && *str && *cmd == *str) {
+		cmd++;
+		str++;
+	}
+	if (*cmd == '\n')
+		cmd++;
+	if (*str || *cmd)
+		return 0;
+	return 1;
+}
+
 struct rdev_sysfs_entry {
 	struct attribute attr;
 	ssize_t (*show)(mdk_rdev_t *, char *);
@@ -1799,7 +1819,7 @@ action_store(mddev_t *mddev, const char 
 	if (!mddev->pers || !mddev->pers->sync_request)
 		return -EINVAL;
 
-	if (strcmp(page, "idle")==0 || strcmp(page, "idle\n")==0) {
+	if (cmd_match(page, "idle")) {
 		if (mddev->sync_thread) {
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 			md_unregister_thread(mddev->sync_thread);
@@ -1812,13 +1832,12 @@ action_store(mddev_t *mddev, const char 
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
 	    test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
 		return -EBUSY;
-	if (strcmp(page, "resync")==0 || strcmp(page, "resync\n")==0 ||
-	    strcmp(page, "recover")==0 || strcmp(page, "recover\n")==0)
+	if (cmd_match(page, "resync") || cmd_match(page, "recover"))
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	else {
-		if (strcmp(page, "check")==0 || strcmp(page, "check\n")==0)
+		if (cmd_match(page, "check"))
 			set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
-		else if (strcmp(page, "repair")!=0 && strcmp(page, "repair\n")!=0)
+		else if (!cmd_match(page, "repair"))
 			return -EINVAL;
 		set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
 		set_bit(MD_RECOVERY_SYNC, &mddev->recovery);

* [PATCH md 005 of 20] Fix typo in comment.
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (3 preceding siblings ...)
  2005-12-12  3:13 ` [PATCH md 004 of 20] Helper function to match commands written to sysfs files NeilBrown
@ 2005-12-12  3:14 ` NeilBrown
  2005-12-12  3:14 ` [PATCH md 006 of 20] Make a couple of names in md.c static NeilBrown
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:14 UTC
  To: Andrew Morton; +Cc: linux-raid



Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid10.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~	2005-12-12 10:45:26.000000000 +1100
+++ ./drivers/md/raid10.c	2005-12-12 10:45:34.000000000 +1100
@@ -712,7 +712,7 @@ static void allow_barrier(conf_t *conf)
 static void freeze_array(conf_t *conf)
 {
 	/* stop syncio and normal IO and wait for everything to
-	 * go quite.
+	 * go quiet.
 	 * We increment barrier and nr_waiting, and then
 	 * wait until barrier+nr_pending match nr_queued+2
 	 */

* [PATCH md 006 of 20] Make a couple of names in md.c static
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (4 preceding siblings ...)
  2005-12-12  3:14 ` [PATCH md 005 of 20] Fix typo in comment NeilBrown
@ 2005-12-12  3:14 ` NeilBrown
  2005-12-12  3:14 ` [PATCH md 007 of 20] Make sure bitmap updates are visible through filesystem NeilBrown
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:14 UTC
  To: Andrew Morton; +Cc: linux-raid


Make md_event_waiters and find_rdev_nr static, because they aren't
used outside md.c.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c           |    4 ++--
 ./include/linux/raid/md_k.h |    2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 10:45:30.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 10:45:36.000000000 +1100
@@ -144,7 +144,7 @@ static int start_readonly;
  *  start array, stop array, error, add device, remove device,
  *  start build, activate spare
  */
-DECLARE_WAIT_QUEUE_HEAD(md_event_waiters);
+static DECLARE_WAIT_QUEUE_HEAD(md_event_waiters);
 static atomic_t md_event_count;
 void md_new_event(mddev_t *mddev)
 {
@@ -279,7 +279,7 @@ static inline void mddev_unlock(mddev_t 
 	md_wakeup_thread(mddev->thread);
 }
 
-mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
+static mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
 {
 	mdk_rdev_t * rdev;
 	struct list_head *tmp;

diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~	2005-12-12 10:45:26.000000000 +1100
+++ ./include/linux/raid/md_k.h	2005-12-12 10:45:36.000000000 +1100
@@ -263,8 +263,6 @@ static inline char * mdname (mddev_t * m
 	return mddev->gendisk ? mddev->gendisk->disk_name : "mdX";
 }
 
-extern mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr);
-
 /*
  * iterates through some rdev ringlist. It's safe to remove the
  * current 'rdev'. Dont touch 'tmp' though.

* [PATCH md 007 of 20] Make sure bitmap updates are visible through filesystem.
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (5 preceding siblings ...)
  2005-12-12  3:14 ` [PATCH md 006 of 20] Make a couple of names in md.c static NeilBrown
@ 2005-12-12  3:14 ` NeilBrown
  2005-12-12  3:14 ` [PATCH md 008 of 20] Fix rdev->pending counts in raid1 NeilBrown
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:14 UTC
  To: Andrew Morton; +Cc: linux-raid


When we update a page-cache page in the kernel, we need to call
flush_dcache_page, or userspace might not see the change.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c |    2 ++
 1 file changed, 2 insertions(+)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2005-12-12 10:45:26.000000000 +1100
+++ ./drivers/md/bitmap.c	2005-12-12 10:45:38.000000000 +1100
@@ -315,6 +315,8 @@ static int write_page(struct bitmap *bit
 	if (bitmap->file == NULL)
 		return write_sb_page(bitmap->mddev, bitmap->offset, page, wait);
 
+	flush_dcache_page(page); /* make sure visible to anyone reading the file */
+
 	if (wait)
 		lock_page(page);
 	else {

* [PATCH md 008 of 20] Fix rdev->pending counts in raid1
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (6 preceding siblings ...)
  2005-12-12  3:14 ` [PATCH md 007 of 20] Make sure bitmap updates are visible through filesystem NeilBrown
@ 2005-12-12  3:14 ` NeilBrown
  2005-12-12  3:14 ` [PATCH md 009 of 20] Allow chunk_size to be settable through sysfs NeilBrown
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:14 UTC
  To: Andrew Morton; +Cc: linux-raid


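When we do a user-requested check/repair, we lose count of the
outstanding requests: bios that sync_request_write() decides not to
write out never have their rdev->nr_pending reference dropped.  Drop
those references explicitly, and use rdev_dec_pending() rather than a
bare atomic_dec() in read_balance() and make_request().

Also make sure that when anything is written to md/sync_action,
the MD_RECOVERY_NEEDED flag is set and the thread is woken up
so any changes take effect.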
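
The accounting rule being restored is that every reference taken on rdev->nr_pending is dropped exactly once, either by the write completion or, for bios that are never submitted, directly in sync_request_write(). A minimal model of that rule; the counters and function names below are hypothetical and only stand in for the raid1 code:

```c
#include <assert.h>

/* One counter per mirror, standing in for rdev->nr_pending. */
#define NMIRRORS 3
static int nr_pending[NMIRRORS];

/* A resync/check request takes one reference on every mirror. */
static void start_sync_request(void)
{
	for (int i = 0; i < NMIRRORS; i++)
		nr_pending[i]++;
}

/* Stands in for the write completion, which drops the reference for
 * a bio that was actually written. */
static void write_done(int mirror)
{
	nr_pending[mirror]--;
}

/* skip[i] is nonzero for bios that are not written back: the primary
 * that was read from, mirrors whose data already matched, and every
 * mirror in check mode.  No completion will run for those, so their
 * references must be dropped here. */
static void finish_sync_request(const int skip[NMIRRORS])
{
	for (int i = 0; i < NMIRRORS; i++) {
		if (skip[i])
			nr_pending[i]--;
		else
			write_done(i);
	}
}
```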



Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c    |   11 ++++-------
 ./drivers/md/raid1.c |   10 ++++++----
 2 files changed, 10 insertions(+), 11 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 10:45:36.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 10:45:40.000000000 +1100
@@ -1826,13 +1826,10 @@ action_store(mddev_t *mddev, const char 
 			mddev->sync_thread = NULL;
 			mddev->recovery = 0;
 		}
-		return len;
-	}
-
-	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
-	    test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
+	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
+		   test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
 		return -EBUSY;
-	if (cmd_match(page, "resync") || cmd_match(page, "recover"))
+	else if (cmd_match(page, "resync") || cmd_match(page, "recover"))
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	else {
 		if (cmd_match(page, "check"))
@@ -1841,8 +1838,8 @@ action_store(mddev_t *mddev, const char 
 			return -EINVAL;
 		set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
 		set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
-		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	}
+	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	md_wakeup_thread(mddev->thread);
 	return len;
 }

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~	2005-12-12 10:45:26.000000000 +1100
+++ ./drivers/md/raid1.c	2005-12-12 10:45:40.000000000 +1100
@@ -527,7 +527,7 @@ static int read_balance(conf_t *conf, r1
 			/* cannot risk returning a device that failed
 			 * before we inc'ed nr_pending
 			 */
-			atomic_dec(&rdev->nr_pending);
+			rdev_dec_pending(rdev, conf->mddev);
 			goto retry;
 		}
 		conf->next_seq_sect = this_sector + sectors;
@@ -830,7 +830,7 @@ static int make_request(request_queue_t 
 		    !test_bit(Faulty, &rdev->flags)) {
 			atomic_inc(&rdev->nr_pending);
 			if (test_bit(Faulty, &rdev->flags)) {
-				atomic_dec(&rdev->nr_pending);
+				rdev_dec_pending(rdev, mddev);
 				r1_bio->bios[i] = NULL;
 			} else
 				r1_bio->bios[i] = bio;
@@ -1176,6 +1176,7 @@ static void sync_request_write(mddev_t *
 			if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
 			    test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) {
 				r1_bio->bios[primary]->bi_end_io = NULL;
+				rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
 				break;
 			}
 		r1_bio->read_disk = primary;
@@ -1193,9 +1194,10 @@ static void sync_request_write(mddev_t *
 						break;
 				if (j >= 0)
 					mddev->resync_mismatches += r1_bio->sectors;
-				if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
+				if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
 					sbio->bi_end_io = NULL;
-				else {
+					rdev_dec_pending(conf->mirrors[i].rdev, mddev);
+				} else {
 					/* fixup the bio for reuse */
 					sbio->bi_vcnt = vcnt;
 					sbio->bi_size = r1_bio->sectors << 9;

* [PATCH md 009 of 20] Allow chunk_size to be settable through sysfs
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (7 preceding siblings ...)
  2005-12-12  3:14 ` [PATCH md 008 of 20] Fix rdev->pending counts in raid1 NeilBrown
@ 2005-12-12  3:14 ` NeilBrown
  2005-12-12  3:15 ` [PATCH md 010 of 20] Allow md array component size to be accessed and set via sysfs NeilBrown
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:14 UTC
  To: Andrew Morton; +Cc: linux-raid


... but only before the array is started, of course.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt |    8 ++++++++
 ./drivers/md/md.c      |   26 ++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~	2005-12-12 11:30:58.000000000 +1100
+++ ./Documentation/md.txt	2005-12-12 11:31:25.000000000 +1100
@@ -166,6 +166,14 @@ All md devices contain:
      will be empty.  If an array is being resized (not currently
      possible) this will contain the larger of the old and new sizes.
 
+  chunk_size
+     This is the size in bytes for 'chunks' and is only relevant to
+     raid levels that involve striping (0,4,5,6,10). The address space
+     of the array is conceptually divided into chunks and consecutive
+     chunks are striped onto neighbouring devices.
+     The size should be atleast PAGE_SIZE (4k) and should be a power
+     of 2.  This can only be set while assembling an array
+
 As component devices are added to an md array, they appear in the 'md'
 directory as new directories named
       dev-XXX

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 11:30:58.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 11:31:25.000000000 +1100
@@ -1795,6 +1795,31 @@ raid_disks_show(mddev_t *mddev, char *pa
 static struct md_sysfs_entry md_raid_disks = __ATTR_RO(raid_disks);
 
 static ssize_t
+chunk_size_show(mddev_t *mddev, char *page)
+{
+	return sprintf(page, "%d\n", mddev->chunk_size);
+}
+
+static ssize_t
+chunk_size_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	/* can only set chunk_size if array is not yet active */
+	char *e;
+	unsigned long n = simple_strtoul(buf, &e, 10);
+
+	if (mddev->pers)
+		return -EBUSY;
+	if (!*buf || (*e && *e != '\n'))
+		return -EINVAL;
+
+	mddev->chunk_size = n;
+	return len;
+}
+static struct md_sysfs_entry md_chunk_size =
+__ATTR(chunk_size, 0644, chunk_size_show, chunk_size_store);
+
+
+static ssize_t
 action_show(mddev_t *mddev, char *page)
 {
 	char *type = "idle";
@@ -1861,6 +1886,7 @@ md_mismatches = __ATTR_RO(mismatch_cnt);
 static struct attribute *md_default_attrs[] = {
 	&md_level.attr,
 	&md_raid_disks.attr,
+	&md_chunk_size.attr,
 	NULL,
 };
 

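chunk_size_store() only checks that the value parses as a number; the documented constraints (a power of two, at least PAGE_SIZE) are not enforced at this point. A sketch of what those constraints amount to, with the page size passed in rather than assumed to be 4096; chunk_size_valid() is a hypothetical helper:

```c
#include <assert.h>

/* Hypothetical check for the documented chunk_size rules. */
static int chunk_size_valid(unsigned long n, unsigned long page_size)
{
	if (n < page_size)
		return 0;                /* also rejects 0 */
	return (n & (n - 1)) == 0;   /* power of two: exactly one bit set */
}
```
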
* [PATCH md 010 of 20] Allow md array component size to be accessed and set via sysfs
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (8 preceding siblings ...)
  2005-12-12  3:14 ` [PATCH md 009 of 20] Allow chunk_size to be settable through sysfs NeilBrown
@ 2005-12-12  3:15 ` NeilBrown
  2005-12-12  3:15 ` [PATCH md 011 of 20] Expose md metadata format in sysfs NeilBrown
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:15 UTC
  To: Andrew Morton; +Cc: linux-raid




Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt |    9 +++
 ./drivers/md/md.c      |  131 +++++++++++++++++++++++++++++++++----------------
 2 files changed, 98 insertions(+), 42 deletions(-)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~	2005-12-12 11:31:25.000000000 +1100
+++ ./Documentation/md.txt	2005-12-12 11:31:46.000000000 +1100
@@ -174,6 +174,15 @@ All md devices contain:
      The size should be atleast PAGE_SIZE (4k) and should be a power
      of 2.  This can only be set while assembling an array
 
+  component_size
+     For arrays with data redundancy (i.e. not raid0, linear, faulty,
+     multipath), all components must be the same size - or at least
+     there must be a size that they all provide space for.  This is a key
+     part of the geometry of the array.  It is measured in kilobytes
+     (1K blocks) and can be read from here.  Writing to this value may resize
+     the array if the personality supports it (raid1, raid5, raid6),
+     and if the component drives are large enough.
+
 As component devices are added to an md array, they appear in the 'md'
 directory as new directories named
       dev-XXX

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 11:31:25.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 11:32:28.000000000 +1100
@@ -1820,6 +1820,44 @@ __ATTR(chunk_size, 0644, chunk_size_show
 
 
 static ssize_t
+size_show(mddev_t *mddev, char *page)
+{
+	return sprintf(page, "%llu\n", (unsigned long long)mddev->size);
+}
+
+static int update_size(mddev_t *mddev, unsigned long size);
+
+static ssize_t
+size_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	/* If array is inactive, we can reduce the component size, but
+	 * not increase it (except from 0).
+	 * If array is active, we can try an on-line resize
+	 */
+	char *e;
+	int err = 0;
+	unsigned long long size = simple_strtoull(buf, &e, 10);
+	if (!*buf || *buf == '\n' ||
+	    (*e && *e != '\n'))
+		return -EINVAL;
+
+	if (mddev->pers) {
+		err = update_size(mddev, size);
+		md_update_sb(mddev);
+	} else {
+		if (mddev->size == 0 ||
+		    mddev->size > size)
+			mddev->size = size;
+		else
+			err = -ENOSPC;
+	}
+	return err ? err : len;
+}
+
+static struct md_sysfs_entry md_size =
+__ATTR(component_size, 0644, size_show, size_store);
+
+static ssize_t
 action_show(mddev_t *mddev, char *page)
 {
 	char *type = "idle";
@@ -1887,6 +1925,7 @@ static struct attribute *md_default_attr
 	&md_level.attr,
 	&md_raid_disks.attr,
 	&md_chunk_size.attr,
+	&md_size.attr,
 	NULL,
 };
 
@@ -3005,6 +3044,54 @@ static int set_array_info(mddev_t * mdde
 	return 0;
 }
 
+static int update_size(mddev_t *mddev, unsigned long size)
+{
+	mdk_rdev_t * rdev;
+	int rv;
+	struct list_head *tmp;
+
+	if (mddev->pers->resize == NULL)
+		return -EINVAL;
+	/* The "size" is the amount of each device that is used.
+	 * This can only make sense for arrays with redundancy.
+	 * linear and raid0 always use whatever space is available
+	 * We can only consider changing the size if no resync
+	 * or reconstruction is happening, and if the new size
+	 * is acceptable. It must fit before the sb_offset or,
+	 * if that is <data_offset, it must fit before the
+	 * size of each device.
+	 * If size is zero, we find the largest size that fits.
+	 */
+	if (mddev->sync_thread)
+		return -EBUSY;
+	ITERATE_RDEV(mddev,rdev,tmp) {
+		sector_t avail;
+		int fit = (size == 0);
+		if (rdev->sb_offset > rdev->data_offset)
+			avail = (rdev->sb_offset*2) - rdev->data_offset;
+		else
+			avail = get_capacity(rdev->bdev->bd_disk)
+				- rdev->data_offset;
+		if (fit && (size == 0 || size > avail/2))
+			size = avail/2;
+		if (avail < ((sector_t)size << 1))
+			return -ENOSPC;
+	}
+	rv = mddev->pers->resize(mddev, (sector_t)size *2);
+	if (!rv) {
+		struct block_device *bdev;
+
+		bdev = bdget_disk(mddev->gendisk, 0);
+		if (bdev) {
+			down(&bdev->bd_inode->i_sem);
+			i_size_write(bdev->bd_inode, mddev->array_size << 10);
+			up(&bdev->bd_inode->i_sem);
+			bdput(bdev);
+		}
+	}
+	return rv;
+}
+
 /*
  * update_array_info is used to change the configuration of an
  * on-line array.
@@ -3053,49 +3140,9 @@ static int update_array_info(mddev_t *md
 		else
 			return mddev->pers->reconfig(mddev, info->layout, -1);
 	}
-	if (mddev->size != info->size) {
-		mdk_rdev_t * rdev;
-		struct list_head *tmp;
-		if (mddev->pers->resize == NULL)
-			return -EINVAL;
-		/* The "size" is the amount of each device that is used.
-		 * This can only make sense for arrays with redundancy.
-		 * linear and raid0 always use whatever space is available
-		 * We can only consider changing the size if no resync
-		 * or reconstruction is happening, and if the new size
-		 * is acceptable. It must fit before the sb_offset or,
-		 * if that is <data_offset, it must fit before the
-		 * size of each device.
-		 * If size is zero, we find the largest size that fits.
-		 */
-		if (mddev->sync_thread)
-			return -EBUSY;
-		ITERATE_RDEV(mddev,rdev,tmp) {
-			sector_t avail;
-			int fit = (info->size == 0);
-			if (rdev->sb_offset > rdev->data_offset)
-				avail = (rdev->sb_offset*2) - rdev->data_offset;
-			else
-				avail = get_capacity(rdev->bdev->bd_disk)
-					- rdev->data_offset;
-			if (fit && (info->size == 0 || info->size > avail/2))
-				info->size = avail/2;
-			if (avail < ((sector_t)info->size << 1))
-				return -ENOSPC;
-		}
-		rv = mddev->pers->resize(mddev, (sector_t)info->size *2);
-		if (!rv) {
-			struct block_device *bdev;
+	if (mddev->size != info->size)
+		rv = update_size(mddev, info->size);
 
-			bdev = bdget_disk(mddev->gendisk, 0);
-			if (bdev) {
-				down(&bdev->bd_inode->i_sem);
-				i_size_write(bdev->bd_inode, mddev->array_size << 10);
-				up(&bdev->bd_inode->i_sem);
-				bdput(bdev);
-			}
-		}
-	}
 	if (mddev->raid_disks    != info->raid_disks) {
 		/* change the number of raid disks */
 		if (mddev->pers->reshape == NULL)

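The per-device check in update_size() can be tried outside the kernel. What follows is a minimal userspace sketch, not the md code: struct dev_geom, avail_sectors() and pick_component_size() are hypothetical names, and the superblock-versus-data comparison is done in sectors on both sides. The fit flag is decided once, before the loop, so a request of 0 settles on the largest size every device can hold regardless of the order in which devices are visited.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one component device. Units follow md.c of this
 * era: sb_offset is in 1K blocks; data_offset and capacity are in
 * 512-byte sectors. */
struct dev_geom {
	unsigned long long sb_offset;	/* KiB */
	unsigned long long data_offset;	/* sectors */
	unsigned long long capacity;	/* sectors */
};

/* Sectors usable for data: up to the superblock if it follows the data,
 * otherwise up to the end of the device. */
static unsigned long long avail_sectors(const struct dev_geom *d)
{
	if (d->sb_offset * 2 > d->data_offset)
		return d->sb_offset * 2 - d->data_offset;
	return d->capacity - d->data_offset;
}

/* *size is in KiB, like mddev->size. A request of 0 means "largest size
 * that fits every device"; that choice is made once, before the loop.
 * Returns 0, or -ENOSPC if some device is too small. */
int pick_component_size(const struct dev_geom *devs, size_t n,
			unsigned long long *size)
{
	int fit = (*size == 0);
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long long avail = avail_sectors(&devs[i]);

		if (fit && (*size == 0 || *size > avail / 2))
			*size = avail / 2;
		if (avail < *size * 2)
			return -ENOSPC;
	}
	return 0;
}
```

For example, two devices with 2000 and 1600 usable sectors yield a component size of 800 KiB when 0 is requested, while an explicit request of 900 KiB fails with -ENOSPC.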

* [PATCH md 011 of 20] Expose md metadata format in sysfs.
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (9 preceding siblings ...)
  2005-12-12  3:15 ` [PATCH md 010 of 20] Allow md array component size to be accessed and set via sysfs NeilBrown
@ 2005-12-12  3:15 ` NeilBrown
  2005-12-12  3:15 ` [PATCH md 012 of 20] Allow array level to be set textually via sysfs NeilBrown
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


Allow the metadata format to be set to a particular version, or to 'none'.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt |    6 ++++++
 ./drivers/md/md.c      |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~	2005-12-12 11:31:46.000000000 +1100
+++ ./Documentation/md.txt	2005-12-12 11:33:02.000000000 +1100
@@ -183,6 +183,12 @@ All md devices contain:
      the array if the personality supports it (raid1, raid5, raid6),
      and if the component drives are large enough.
 
+  metadata_version
+     This indicates the format that is being used to record metadata
+     about the array.  It can be 0.90 (traditional format), 1.0, 1.1,
+     1.2 (newer format in varying locations) or "none" indicating that
+     the kernel isn't managing metadata at all.
+
 As component devices are added to an md array, they appear in the 'md'
 directory as new directories named
       dev-XXX

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 11:32:28.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 11:33:24.000000000 +1100
@@ -1857,6 +1857,54 @@ size_store(mddev_t *mddev, const char *b
 static struct md_sysfs_entry md_size =
 __ATTR(component_size, 0644, size_show, size_store);
 
+
+/* Metadata version.
+ * This is either 'none' for arrays with externally managed metadata,
+ * or N.M for internally known formats
+ */
+static ssize_t
+metadata_show(mddev_t *mddev, char *page)
+{
+	if (mddev->persistent)
+		return sprintf(page, "%d.%d\n",
+			       mddev->major_version, mddev->minor_version);
+	else
+		return sprintf(page, "none\n");
+}
+
+static ssize_t
+metadata_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	int major, minor;
+	char *e;
+	if (!list_empty(&mddev->disks))
+		return -EBUSY;
+
+	if (cmd_match(buf, "none")) {
+		mddev->persistent = 0;
+		mddev->major_version = 0;
+		mddev->minor_version = 90;
+		return len;
+	}
+	major = simple_strtoul(buf, &e, 10);
+	if (e==buf || *e != '.')
+		return -EINVAL;
+	buf = e+1;
+	minor = simple_strtoul(buf, &e, 10);
+	if (e==buf || *e != '\n')
+		return -EINVAL;
+	if (major >= sizeof(super_types)/sizeof(super_types[0]) ||
+	    super_types[major].name == NULL)
+		return -ENOENT;
+	mddev->major_version = major;
+	mddev->minor_version = minor;
+	mddev->persistent = 1;
+	return len;
+}
+
+static struct md_sysfs_entry md_metadata =
+__ATTR(metadata_version, 0644, metadata_show, metadata_store);
+
 static ssize_t
 action_show(mddev_t *mddev, char *page)
 {
@@ -1926,6 +1974,7 @@ static struct attribute *md_default_attr
 	&md_raid_disks.attr,
 	&md_chunk_size.attr,
 	&md_size.attr,
+	&md_metadata.attr,
 	NULL,
 };
 

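The parsing rules of metadata_store() can be mirrored in userspace as a quick reference. This is a sketch under assumptions: matches_word() is a hypothetical stand-in for cmd_match(), and the hypothetical nr_formats parameter replaces the lookup in super_types[], treating every index below it as a registered format. Unlike the patch, which insists on a trailing newline after the minor number, this sketch also accepts the bare string.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy of the three mddev_t fields metadata_store() sets. */
struct md_meta {
	int persistent;
	int major_version;
	int minor_version;
};

/* Like cmd_match(): exact match, optionally followed by one newline. */
static int matches_word(const char *buf, const char *word)
{
	size_t n = strlen(word);

	return strncmp(buf, word, n) == 0 &&
	       (buf[n] == '\0' || (buf[n] == '\n' && buf[n + 1] == '\0'));
}

/* Returns 0 on success or a negative errno, in the same order of checks
 * as the patch: busy, "none", syntax, unknown major version. */
int parse_metadata_version(const char *buf, int has_disks, int nr_formats,
			   struct md_meta *m)
{
	char *e;
	unsigned long maj, mnr;

	if (has_disks)
		return -EBUSY;	/* format is fixed once devices are attached */
	if (matches_word(buf, "none")) {
		m->persistent = 0;
		m->major_version = 0;
		m->minor_version = 90;
		return 0;
	}
	maj = strtoul(buf, &e, 10);
	if (e == buf || *e != '.')
		return -EINVAL;
	buf = e + 1;
	mnr = strtoul(buf, &e, 10);
	if (e == buf || (*e != '\n' && *e != '\0'))
		return -EINVAL;
	if (maj >= (unsigned long)nr_formats)
		return -ENOENT;
	m->major_version = (int)maj;
	m->minor_version = (int)mnr;
	m->persistent = 1;
	return 0;
}
```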

* [PATCH md 012 of 20] Allow array level to be set textually via sysfs.
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (10 preceding siblings ...)
  2005-12-12  3:15 ` [PATCH md 011 of 20] Expose md metadata format in sysfs NeilBrown
@ 2005-12-12  3:15 ` NeilBrown
  2005-12-12  3:15 ` [PATCH md 013 of 20] Count corrected read errors per drive NeilBrown
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid




Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt      |    8 +++++
 ./drivers/md/faulty.c       |    1 
 ./drivers/md/linear.c       |    3 +-
 ./drivers/md/md.c           |   61 ++++++++++++++++++++++++++++++++++----------
 ./drivers/md/multipath.c    |    1 
 ./drivers/md/raid0.c        |    1 
 ./drivers/md/raid1.c        |    1 
 ./drivers/md/raid10.c       |    1 
 ./drivers/md/raid5.c        |    2 +
 ./drivers/md/raid6main.c    |    1 
 ./include/linux/raid/md_k.h |    1 
 11 files changed, 67 insertions(+), 14 deletions(-)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~	2005-12-12 11:33:02.000000000 +1100
+++ ./Documentation/md.txt	2005-12-12 11:33:35.000000000 +1100
@@ -189,6 +189,14 @@ All md devices contain:
      1.2 (newer format in varying locations) or "none" indicating that
      the kernel isn't managing metadata at all.
 
+  level
+     The raid 'level' for this array.  The name will often (but not
+     always) be the same as the name of the module that implements the
+     level.  To be auto-loaded the module must have an alias
+        md-$LEVEL  e.g. md-raid5
+     This can be written only while the array is being assembled, not
+     after it is started.
+
 As component devices are added to an md array, they appear in the 'md'
 directory as new directories named
       dev-XXX

diff ./drivers/md/faulty.c~current~ ./drivers/md/faulty.c
--- ./drivers/md/faulty.c~current~	2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/faulty.c	2005-12-12 11:33:35.000000000 +1100
@@ -342,4 +342,5 @@ module_init(raid_init);
 module_exit(raid_exit);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("md-personality-10"); /* faulty */
+MODULE_ALIAS("md-faulty");
 MODULE_ALIAS("md-level--5");

diff ./drivers/md/linear.c~current~ ./drivers/md/linear.c
--- ./drivers/md/linear.c~current~	2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/linear.c	2005-12-12 11:33:35.000000000 +1100
@@ -376,5 +376,6 @@ static void linear_exit (void)
 module_init(linear_init);
 module_exit(linear_exit);
 MODULE_LICENSE("GPL");
-MODULE_ALIAS("md-personality-1"); /* LINEAR - degrecated*/
+MODULE_ALIAS("md-personality-1"); /* LINEAR - deprecated*/
+MODULE_ALIAS("md-linear");
 MODULE_ALIAS("md-level--1");

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 11:33:24.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 11:33:56.000000000 +1100
@@ -303,12 +303,15 @@ static mdk_rdev_t * find_rdev(mddev_t * 
 	return NULL;
 }
 
-static struct mdk_personality *find_pers(int level)
+static struct mdk_personality *find_pers(int level, char *clevel)
 {
 	struct mdk_personality *pers;
-	list_for_each_entry(pers, &pers_list, list)
-		if (pers->level == level)
+	list_for_each_entry(pers, &pers_list, list) {
+		if (level != LEVEL_NONE && pers->level == level)
 			return pers;
+		if (strcmp(pers->name, clevel)==0)
+			return pers;
+	}
 	return NULL;
 }
 
@@ -715,6 +718,7 @@ static int super_90_validate(mddev_t *md
 		mddev->ctime = sb->ctime;
 		mddev->utime = sb->utime;
 		mddev->level = sb->level;
+		mddev->clevel[0] = 0;
 		mddev->layout = sb->layout;
 		mddev->raid_disks = sb->raid_disks;
 		mddev->size = sb->size;
@@ -1051,6 +1055,7 @@ static int super_1_validate(mddev_t *mdd
 		mddev->ctime = le64_to_cpu(sb->ctime) & ((1ULL << 32)-1);
 		mddev->utime = le64_to_cpu(sb->utime) & ((1ULL << 32)-1);
 		mddev->level = le32_to_cpu(sb->level);
+		mddev->clevel[0] = 0;
 		mddev->layout = le32_to_cpu(sb->layout);
 		mddev->raid_disks = le32_to_cpu(sb->raid_disks);
 		mddev->size = le64_to_cpu(sb->size)/2;
@@ -1774,15 +1779,36 @@ static ssize_t
 level_show(mddev_t *mddev, char *page)
 {
 	struct mdk_personality *p = mddev->pers;
-	if (p == NULL && mddev->raid_disks == 0)
-		return 0;
-	if (mddev->level >= 0)
-		return sprintf(page, "raid%d\n", mddev->level);
-	else
+	if (p)
 		return sprintf(page, "%s\n", p->name);
+	else if (mddev->clevel[0])
+		return sprintf(page, "%s\n", mddev->clevel);
+	else if (mddev->level != LEVEL_NONE)
+		return sprintf(page, "%d\n", mddev->level);
+	else
+		return 0;
+}
+
+static ssize_t
+level_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	int rv = len;
+	if (mddev->pers)
+		return -EBUSY;
+	if (len == 0)
+		return 0;
+	if (len >= sizeof(mddev->clevel))
+		return -ENOSPC;
+	strncpy(mddev->clevel, buf, len);
+	if (mddev->clevel[len-1] == '\n')
+		len--;
+	mddev->clevel[len] = 0;
+	mddev->level = LEVEL_NONE;
+	return rv;
 }
 
-static struct md_sysfs_entry md_level = __ATTR_RO(level);
+static struct md_sysfs_entry md_level =
+__ATTR(level, 0644, level_show, level_store);
 
 static ssize_t
 raid_disks_show(mddev_t *mddev, char *page)
@@ -2158,7 +2184,10 @@ static int do_md_run(mddev_t * mddev)
 	}
 
 #ifdef CONFIG_KMOD
-	request_module("md-level-%d", mddev->level);
+	if (mddev->level != LEVEL_NONE)
+		request_module("md-level-%d", mddev->level);
+	else if (mddev->clevel[0])
+		request_module("md-%s", mddev->clevel);
 #endif
 
 	/*
@@ -2180,15 +2209,21 @@ static int do_md_run(mddev_t * mddev)
 		return -ENOMEM;
 
 	spin_lock(&pers_lock);
-	pers = find_pers(mddev->level);
+	pers = find_pers(mddev->level, mddev->clevel);
 	if (!pers || !try_module_get(pers->owner)) {
 		spin_unlock(&pers_lock);
-		printk(KERN_WARNING "md: personality for level %d is not loaded!\n",
-		       mddev->level);
+		if (mddev->level != LEVEL_NONE)
+			printk(KERN_WARNING "md: personality for level %d is not loaded!\n",
+			       mddev->level);
+		else
+			printk(KERN_WARNING "md: personality for level %s is not loaded!\n",
+			       mddev->clevel);
 		return -EINVAL;
 	}
 	mddev->pers = pers;
 	spin_unlock(&pers_lock);
+	mddev->level = pers->level;
+	strlcpy(mddev->clevel, pers->name, sizeof(mddev->clevel));
 
 	mddev->recovery = 0;
 	mddev->resync_max_sectors = mddev->size << 1; /* may be over-ridden by personality */

diff ./drivers/md/multipath.c~current~ ./drivers/md/multipath.c
--- ./drivers/md/multipath.c~current~	2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/multipath.c	2005-12-12 11:33:35.000000000 +1100
@@ -579,4 +579,5 @@ module_init(multipath_init);
 module_exit(multipath_exit);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("md-personality-7"); /* MULTIPATH */
+MODULE_ALIAS("md-multipath");
 MODULE_ALIAS("md-level--4");

diff ./drivers/md/raid0.c~current~ ./drivers/md/raid0.c
--- ./drivers/md/raid0.c~current~	2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/raid0.c	2005-12-12 11:33:35.000000000 +1100
@@ -536,4 +536,5 @@ module_init(raid0_init);
 module_exit(raid0_exit);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("md-personality-2"); /* RAID0 */
+MODULE_ALIAS("md-raid0");
 MODULE_ALIAS("md-level-0");

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~	2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/raid1.c	2005-12-12 11:33:35.000000000 +1100
@@ -2092,4 +2092,5 @@ module_init(raid_init);
 module_exit(raid_exit);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("md-personality-3"); /* RAID1 */
+MODULE_ALIAS("md-raid1");
 MODULE_ALIAS("md-level-1");

diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~	2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/raid10.c	2005-12-12 11:33:35.000000000 +1100
@@ -2117,4 +2117,5 @@ module_init(raid_init);
 module_exit(raid_exit);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("md-personality-9"); /* RAID10 */
+MODULE_ALIAS("md-raid10");
 MODULE_ALIAS("md-level-10");

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/raid5.c	2005-12-12 11:33:35.000000000 +1100
@@ -2240,5 +2240,7 @@ module_init(raid5_init);
 module_exit(raid5_exit);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("md-personality-4"); /* RAID5 */
+MODULE_ALIAS("md-raid5");
+MODULE_ALIAS("md-raid4");
 MODULE_ALIAS("md-level-5");
 MODULE_ALIAS("md-level-4");

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~	2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/raid6main.c	2005-12-12 11:33:35.000000000 +1100
@@ -2341,4 +2341,5 @@ module_init(raid6_init);
 module_exit(raid6_exit);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("md-personality-8"); /* RAID6 */
+MODULE_ALIAS("md-raid6");
 MODULE_ALIAS("md-level-6");

diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~	2005-12-12 11:30:59.000000000 +1100
+++ ./include/linux/raid/md_k.h	2005-12-12 11:33:35.000000000 +1100
@@ -119,6 +119,7 @@ struct mddev_s
 	int				chunk_size;
 	time_t				ctime, utime;
 	int				level, layout;
+	char				clevel[16];
 	int				raid_disks;
 	int				max_disks;
 	sector_t			size; /* used size of component devices */

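Writing a name to the new 'level' file copies it into the 16-byte clevel buffer and, when the array is started, do_md_run() asks for the module md-$LEVEL. A minimal userspace sketch of those two steps follows; store_level() and level_module_alias() are hypothetical helpers, and the -EBUSY check for an already-running array is left out.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define CLEVEL_LEN 16	/* same size as mddev->clevel in this patch */

/* Model of level_store(): copy the name, drop one trailing newline, and
 * report the whole write as consumed. */
ssize_t store_level(char clevel[CLEVEL_LEN], const char *buf, size_t len)
{
	ssize_t rv = (ssize_t)len;

	if (len == 0)
		return 0;
	if (len >= CLEVEL_LEN)
		return -ENOSPC;	/* no room for the name plus a NUL */
	memcpy(clevel, buf, len);
	if (clevel[len - 1] == '\n')
		len--;
	clevel[len] = '\0';
	return rv;
}

/* Name that request_module("md-%s", ...) would look up. */
int level_module_alias(char *out, size_t outlen, const char *clevel)
{
	int n = snprintf(out, outlen, "md-%s", clevel);

	return (n < 0 || (size_t)n >= outlen) ? -ENOSPC : 0;
}
```

So writing "raid5" resolves to the alias md-raid5, which this patch adds to raid5.c.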

* [PATCH md 013 of 20] Count corrected read errors per drive.
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (11 preceding siblings ...)
  2005-12-12  3:15 ` [PATCH md 012 of 20] Allow array level to be set textually via sysfs NeilBrown
@ 2005-12-12  3:15 ` NeilBrown
  2005-12-12 10:07   ` Andrew Morton
  2005-12-12  3:15 ` [PATCH md 014 of 20] Allow md/raid_disks to be settable NeilBrown
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


Store this total in the superblock (for version-1 metadata),
and make it available to userspace via sysfs.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt      |   11 +++++++++++
 ./drivers/md/md.c           |   27 ++++++++++++++++++++++++++-
 ./drivers/md/raid1.c        |    2 ++
 ./drivers/md/raid10.c       |   11 ++++++++---
 ./drivers/md/raid5.c        |    3 +++
 ./drivers/md/raid6main.c    |    3 +++
 ./include/linux/raid/md_k.h |    4 ++++
 7 files changed, 57 insertions(+), 4 deletions(-)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~	2005-12-12 11:33:35.000000000 +1100
+++ ./Documentation/md.txt	2005-12-12 11:34:21.000000000 +1100
@@ -222,6 +222,17 @@ Each directory contains:
 			 of being recoverred to
 	This list make grow in future.
 
+      errors
+        An approximate count of read errors that have been detected on
+        this device but have not caused the device to be evicted from
+        the array (either because they were corrected or because they
+	happened while the array was read-only).  When using version-1
+	metadata, this value persists across restarts of the array.
+
+	This value can be written while assembling an array, thus
+	providing an ongoing count for arrays with metadata managed by
+	userspace.
+
 
 An active md device will also contain and entry for each active device
 in the array.  These are named

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 11:33:56.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 11:34:37.000000000 +1100
@@ -1000,6 +1000,7 @@ static int super_1_load(mdk_rdev_t *rdev
 	}
 	rdev->preferred_minor = 0xffff;
 	rdev->data_offset = le64_to_cpu(sb->data_offset);
+	atomic_set(&rdev->corrected_errors, le32_to_cpu(sb->cnt_corrected_read));
 
 	rdev->sb_size = le32_to_cpu(sb->max_dev) * 2 + 256;
 	bmask = queue_hardsect_size(rdev->bdev->bd_disk->queue)-1;
@@ -1139,6 +1140,8 @@ static void super_1_sync(mddev_t *mddev,
 	else
 		sb->resync_offset = cpu_to_le64(0);
 
+	sb->cnt_corrected_read = cpu_to_le32(atomic_read(&rdev->corrected_errors));
+
 	if (mddev->bitmap && mddev->bitmap_file == NULL) {
 		sb->bitmap_offset = cpu_to_le32((__u32)mddev->bitmap_offset);
 		sb->feature_map = cpu_to_le32(MD_FEATURE_BITMAP_OFFSET);
@@ -1592,9 +1595,30 @@ super_show(mdk_rdev_t *rdev, char *page)
 }
 static struct rdev_sysfs_entry rdev_super = __ATTR_RO(super);
 
+static ssize_t
+errors_show(mdk_rdev_t *rdev, char *page)
+{
+	return sprintf(page, "%d\n", atomic_read(&rdev->corrected_errors));
+}
+
+static ssize_t
+errors_store(mdk_rdev_t *rdev, const char *buf, size_t len)
+{
+	char *e;
+	unsigned long n = simple_strtoul(buf, &e, 10);
+	if (*buf && (*e == 0 || *e == '\n')) {
+		atomic_set(&rdev->corrected_errors, n);
+		return len;
+	}
+	return -EINVAL;
+}
+static struct rdev_sysfs_entry rdev_errors =
+__ATTR(errors, 0644, errors_show, errors_store);
+
 static struct attribute *rdev_default_attrs[] = {
 	&rdev_state.attr,
 	&rdev_super.attr,
+	&rdev_errors.attr,
 	NULL,
 };
 static ssize_t
@@ -1674,6 +1698,7 @@ static mdk_rdev_t *md_import_device(dev_
 	rdev->data_offset = 0;
 	atomic_set(&rdev->nr_pending, 0);
 	atomic_set(&rdev->read_errors, 0);
+	atomic_set(&rdev->corrected_errors, 0);
 
 	size = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
 	if (!size) {
@@ -4721,7 +4746,7 @@ static int set_ro(const char *val, struc
 	int num = simple_strtoul(val, &e, 10);
 	if (*val && (*e == '\0' || *e == '\n')) {
 		start_readonly = num;
-		return 0;;
+		return 0;
 	}
 	return -EINVAL;
 }

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~	2005-12-12 11:33:35.000000000 +1100
+++ ./drivers/md/raid1.c	2005-12-12 11:34:21.000000000 +1100
@@ -1265,6 +1265,7 @@ static void sync_request_write(mddev_t *
 					if (r1_bio->bios[d]->bi_end_io != end_sync_read)
 						continue;
 					rdev = conf->mirrors[d].rdev;
+					atomic_add(s, &rdev->corrected_errors);
 					if (sync_page_io(rdev->bdev,
 							 sect + rdev->data_offset,
 							 s<<9,
@@ -1463,6 +1464,7 @@ static void raid1d(mddev_t *mddev)
 							d = conf->raid_disks;
 						d--;
 						rdev = conf->mirrors[d].rdev;
+						atomic_add(s, &rdev->corrected_errors);
 						if (rdev &&
 						    test_bit(In_sync, &rdev->flags)) {
 							if (sync_page_io(rdev->bdev,

diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~	2005-12-12 11:33:35.000000000 +1100
+++ ./drivers/md/raid10.c	2005-12-12 11:34:21.000000000 +1100
@@ -1122,9 +1122,13 @@ static int end_sync_read(struct bio *bio
 
 	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
 		set_bit(R10BIO_Uptodate, &r10_bio->state);
-	else if (!test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery))
-		md_error(r10_bio->mddev,
-			 conf->mirrors[d].rdev);
+	else {
+		atomic_add(r10_bio->sectors,
+			   &conf->mirrors[d].rdev->corrected_errors);
+		if (!test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery))
+			md_error(r10_bio->mddev,
+				 conf->mirrors[d].rdev);
+	}
 
 	/* for reconstruct, we always reschedule after a read.
 	 * for resync, only after all reads
@@ -1430,6 +1434,7 @@ static void raid10d(mddev_t *mddev)
 						sl--;
 						d = r10_bio->devs[sl].devnum;
 						rdev = conf->mirrors[d].rdev;
+						atomic_add(s, &rdev->corrected_errors);
 						if (rdev &&
 						    test_bit(In_sync, &rdev->flags)) {
 							if (sync_page_io(rdev->bdev,

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2005-12-12 11:33:35.000000000 +1100
+++ ./drivers/md/raid5.c	2005-12-12 11:34:21.000000000 +1100
@@ -1400,6 +1400,9 @@ static void handle_stripe(struct stripe_
 			bi->bi_io_vec[0].bv_offset = 0;
 			bi->bi_size = STRIPE_SIZE;
 			bi->bi_next = NULL;
+			if (rw == WRITE &&
+			    test_bit(R5_ReWrite, &sh->dev[i].flags))
+				atomic_add(STRIPE_SECTORS, &rdev->corrected_errors);
 			generic_make_request(bi);
 		} else {
 			if (rw == 1)

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~	2005-12-12 11:33:35.000000000 +1100
+++ ./drivers/md/raid6main.c	2005-12-12 11:34:21.000000000 +1100
@@ -1562,6 +1562,9 @@ static void handle_stripe(struct stripe_
 			bi->bi_io_vec[0].bv_offset = 0;
 			bi->bi_size = STRIPE_SIZE;
 			bi->bi_next = NULL;
+			if (rw == WRITE &&
+			    test_bit(R5_ReWrite, &sh->dev[i].flags))
+				atomic_add(STRIPE_SECTORS, &rdev->corrected_errors);
 			generic_make_request(bi);
 		} else {
 			if (rw == 1)

diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~	2005-12-12 11:33:35.000000000 +1100
+++ ./include/linux/raid/md_k.h	2005-12-12 11:34:21.000000000 +1100
@@ -95,6 +95,10 @@ struct mdk_rdev_s
 	atomic_t	read_errors;	/* number of consecutive read errors that
 					 * we have tried to ignore.
 					 */
+	atomic_t	corrected_errors; /* number of corrected read errors,
+					   * for reporting to userspace and storing
+					   * in superblock.
+					   */
 };
 
 struct mddev_s

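The errors file accepts a plain decimal count, and with version-1 metadata the total is kept in the little-endian cnt_corrected_read field of the superblock. The sketch below models both halves in userspace; parse_error_count(), put_le32() and get_le32() are hypothetical helpers standing in for simple_strtoul(), cpu_to_le32() and le32_to_cpu().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Model of errors_store(): a decimal number with an optional trailing
 * newline. Returns 0 on success or -EINVAL. */
int parse_error_count(const char *buf, unsigned long *out)
{
	char *e;
	unsigned long n = strtoul(buf, &e, 10);

	if (*buf && (*e == '\0' || *e == '\n')) {
		*out = n;
		return 0;
	}
	return -EINVAL;
}

/* Version-1 superblock fields are little-endian on disk, so the counter
 * is written byte by byte regardless of host byte order. */
void put_le32(unsigned char p[4], uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

uint32_t get_le32(const unsigned char p[4])
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```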

* [PATCH md 014 of 20] Allow md/raid_disks to be settable.
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (12 preceding siblings ...)
  2005-12-12  3:15 ` [PATCH md 013 of 20] Count corrected read errors per drive NeilBrown
@ 2005-12-12  3:15 ` NeilBrown
  2005-12-12  3:15 ` [PATCH md 015 of 20] Keep better track of dev/array size when assembling md arrays NeilBrown
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


If the array is active, try to reshape it; otherwise just set the value.


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt |    3 +
 ./drivers/md/md.c      |   74 +++++++++++++++++++++++++++++++++----------------
 2 files changed, 54 insertions(+), 23 deletions(-)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~	2005-12-12 11:34:21.000000000 +1100
+++ ./Documentation/md.txt	2005-12-12 11:35:01.000000000 +1100
@@ -165,6 +165,9 @@ All md devices contain:
      in a fully functional array.  If this is not yet known, the file
      will be empty.  If an array is being resized (not currently
      possible) this will contain the larger of the old and new sizes.
+     Some raid levels (RAID1) allow this value to be set while the
+     array is active.  This will reconfigure the array.  Otherwise
+     it can only be set while assembling an array.
 
   chunk_size
      This is the size if bytes for 'chunks' and is only relevant to

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 11:34:37.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 11:35:53.000000000 +1100
@@ -1843,7 +1843,27 @@ raid_disks_show(mddev_t *mddev, char *pa
 	return sprintf(page, "%d\n", mddev->raid_disks);
 }
 
-static struct md_sysfs_entry md_raid_disks = __ATTR_RO(raid_disks);
+static int update_raid_disks(mddev_t *mddev, int raid_disks);
+
+static ssize_t
+raid_disks_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	/* reshape if the array is active, else just record the value */
+	char *e;
+	int rv = 0;
+	unsigned long n = simple_strtoul(buf, &e, 10);
+
+	if (!*buf || (*e && *e != '\n'))
+		return -EINVAL;
+
+	if (mddev->pers)
+		rv = update_raid_disks(mddev, n);
+	else
+		mddev->raid_disks = n;
+	return rv ? rv : len;
+}
+static struct md_sysfs_entry md_raid_disks =
+__ATTR(raid_disks, 0644, raid_disks_show, raid_disks_store);
 
 static ssize_t
 chunk_size_show(mddev_t *mddev, char *page)
@@ -3201,6 +3221,33 @@ static int update_size(mddev_t *mddev, u
 	return rv;
 }
 
+static int update_raid_disks(mddev_t *mddev, int raid_disks)
+{
+	int rv;
+	/* change the number of raid disks */
+	if (mddev->pers->reshape == NULL)
+		return -EINVAL;
+	if (raid_disks <= 0 ||
+	    raid_disks >= mddev->max_disks)
+		return -EINVAL;
+	if (mddev->sync_thread)
+		return -EBUSY;
+	rv = mddev->pers->reshape(mddev, raid_disks);
+	if (!rv) {
+		struct block_device *bdev;
+
+		bdev = bdget_disk(mddev->gendisk, 0);
+		if (bdev) {
+			down(&bdev->bd_inode->i_sem);
+			i_size_write(bdev->bd_inode, mddev->array_size << 10);
+			up(&bdev->bd_inode->i_sem);
+			bdput(bdev);
+		}
+	}
+	return rv;
+}
+
+
 /*
  * update_array_info is used to change the configuration of an
  * on-line array.
@@ -3252,28 +3299,9 @@ static int update_array_info(mddev_t *md
 	if (mddev->size != info->size)
 		rv = update_size(mddev, info->size);
 
-	if (mddev->raid_disks    != info->raid_disks) {
-		/* change the number of raid disks */
-		if (mddev->pers->reshape == NULL)
-			return -EINVAL;
-		if (info->raid_disks <= 0 ||
-		    info->raid_disks >= mddev->max_disks)
-			return -EINVAL;
-		if (mddev->sync_thread)
-			return -EBUSY;
-		rv = mddev->pers->reshape(mddev, info->raid_disks);
-		if (!rv) {
-			struct block_device *bdev;
-
-			bdev = bdget_disk(mddev->gendisk, 0);
-			if (bdev) {
-				down(&bdev->bd_inode->i_sem);
-				i_size_write(bdev->bd_inode, mddev->array_size << 10);
-				up(&bdev->bd_inode->i_sem);
-				bdput(bdev);
-			}
-		}
-	}
+	if (mddev->raid_disks    != info->raid_disks)
+		rv = update_raid_disks(mddev, info->raid_disks);
+
 	if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT)) {
 		if (mddev->pers->quiesce == NULL)
 			return -EINVAL;

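The control flow of raid_disks_store() and update_raid_disks() can be summarised as a small state check. This is a sketch only: struct array_model and set_raid_disks() are hypothetical, and a successful reshape is modelled as simply recording the new count. As in the patch, no range check is applied while the array is still being assembled.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the mddev_t state this path consults. */
struct array_model {
	int active;		/* mddev->pers != NULL */
	int can_reshape;	/* pers->reshape != NULL */
	int syncing;		/* mddev->sync_thread != NULL */
	int raid_disks;
	int max_disks;
};

/* Returns 0 on success or a negative errno. */
int set_raid_disks(struct array_model *a, const char *buf)
{
	char *e;
	unsigned long n = strtoul(buf, &e, 10);

	if (!*buf || (*e && *e != '\n'))
		return -EINVAL;
	if (!a->active) {
		a->raid_disks = (int)n;	/* assembling: just record it */
		return 0;
	}
	if (!a->can_reshape)
		return -EINVAL;
	if ((int)n <= 0 || (int)n >= a->max_disks)
		return -EINVAL;
	if (a->syncing)
		return -EBUSY;
	a->raid_disks = (int)n;	/* stands in for pers->reshape() */
	return 0;
}
```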

* [PATCH md 015 of 20] Keep better track of dev/array size when assembling md arrays.
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (13 preceding siblings ...)
  2005-12-12  3:15 ` [PATCH md 014 of 20] Allow md/raid_disks to be settable NeilBrown
@ 2005-12-12  3:15 ` NeilBrown
  2005-12-12  3:15 ` [PATCH md 016 of 20] Expose device slot information via sysfs NeilBrown
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


Move the check that the device size is never less than the array size
into bind_rdev_to_array, so that it is always performed (there is
currently one place where it isn't).

Also reject any superblock which claims a component size larger than
the device in question can hold.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |   41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 11:35:53.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 11:36:25.000000000 +1100
@@ -695,6 +695,10 @@ static int super_90_load(mdk_rdev_t *rde
 	}
 	rdev->size = calc_dev_size(rdev, sb->chunk_size);
 
+	if (rdev->size < sb->size && sb->level > 1)
+		/* "this cannot possibly happen" ... */
+		ret = -EINVAL;
+
  abort:
 	return ret;
 }
@@ -1039,6 +1043,9 @@ static int super_1_load(mdk_rdev_t *rdev
 	rdev->size = le64_to_cpu(sb->data_size)/2;
 	if (le32_to_cpu(sb->chunksize))
 		rdev->size &= ~((sector_t)le32_to_cpu(sb->chunksize)/2 - 1);
+
+	if (le64_to_cpu(sb->size) > rdev->size*2)
+		return -EINVAL;
 	return 0;
 }
 
@@ -1224,6 +1231,14 @@ static int bind_rdev_to_array(mdk_rdev_t
 		MD_BUG();
 		return -EINVAL;
 	}
+	/* make sure rdev->size is at least mddev->size */
+	if (rdev->size && (mddev->size == 0 || rdev->size < mddev->size)) {
+		if (mddev->pers)
+			/* Cannot change size, so fail */
+			return -ENOSPC;
+		else
+			mddev->size = rdev->size;
+	}
 	same_pdev = match_dev_unit(mddev, rdev);
 	if (same_pdev)
 		printk(KERN_WARNING
@@ -2898,12 +2913,6 @@ static int add_new_disk(mddev_t * mddev,
 		if (info->state & (1<<MD_DISK_WRITEMOSTLY))
 			set_bit(WriteMostly, &rdev->flags);
 
-		err = bind_rdev_to_array(rdev, mddev);
-		if (err) {
-			export_rdev(rdev);
-			return err;
-		}
-
 		if (!mddev->persistent) {
 			printk(KERN_INFO "md: nonpersistent superblock ...\n");
 			rdev->sb_offset = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
@@ -2911,8 +2920,11 @@ static int add_new_disk(mddev_t * mddev,
 			rdev->sb_offset = calc_dev_sboffset(rdev->bdev);
 		rdev->size = calc_dev_size(rdev, mddev->chunk_size);
 
-		if (!mddev->size || (mddev->size > rdev->size))
-			mddev->size = rdev->size;
+		err = bind_rdev_to_array(rdev, mddev);
+		if (err) {
+			export_rdev(rdev);
+			return err;
+		}
 	}
 
 	return 0;
@@ -2984,15 +2996,6 @@ static int hot_add_disk(mddev_t * mddev,
 	size = calc_dev_size(rdev, mddev->chunk_size);
 	rdev->size = size;
 
-	if (size < mddev->size) {
-		printk(KERN_WARNING 
-			"%s: disk size %llu blocks < array size %llu\n",
-			mdname(mddev), (unsigned long long)size,
-			(unsigned long long)mddev->size);
-		err = -ENOSPC;
-		goto abort_export;
-	}
-
 	if (test_bit(Faulty, &rdev->flags)) {
 		printk(KERN_WARNING 
 			"md: can not hot-add faulty %s disk to %s!\n",
@@ -3002,7 +3005,9 @@ static int hot_add_disk(mddev_t * mddev,
 	}
 	clear_bit(In_sync, &rdev->flags);
 	rdev->desc_nr = -1;
-	bind_rdev_to_array(rdev, mddev);
+	err = bind_rdev_to_array(rdev, mddev);
+	if (err)
+		goto abort_export;
 
 	/*
 	 * The rest should better be atomic, we can have disk failures

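With this change the size rule lives in one place. A minimal userspace model of it, assuming sizes in KiB with 0 meaning "not yet known" (bind_size_check() is a hypothetical name): while assembling, the array size shrinks to the smallest device seen; once the array is running, a device that is too small is refused with -ENOSPC.

```c
#include <errno.h>

/* Returns 0 if the device may be bound, or -ENOSPC. */
int bind_size_check(unsigned long long rdev_size,
		    unsigned long long *array_size, int active)
{
	if (rdev_size && (*array_size == 0 || rdev_size < *array_size)) {
		if (active)
			return -ENOSPC;	/* a running array cannot shrink here */
		*array_size = rdev_size;	/* assembling: shrink to fit */
	}
	return 0;
}
```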

* [PATCH md 016 of 20] Expose device slot information via sysfs
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (14 preceding siblings ...)
  2005-12-12  3:15 ` [PATCH md 015 of 20] Keep better track of dev/array size when assembling md arrays NeilBrown
@ 2005-12-12  3:15 ` NeilBrown
  2005-12-12  3:15 ` [PATCH md 017 of 20] Export rdev->data_offset " NeilBrown
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


This allows the role that a device has in an array to be viewed
and set.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt |    8 ++++++++
 ./drivers/md/md.c      |   35 +++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~	2005-12-12 11:35:01.000000000 +1100
+++ ./Documentation/md.txt	2005-12-12 11:36:55.000000000 +1100
@@ -236,6 +236,14 @@ Each directory contains:
 	providing an ongoing count for arrays with metadata managed by
 	userspace.
 
+      slot
+        This gives the role that the device has in the array.  It will
+	either be 'none' if the device is not active in the array
+        (i.e. is a spare or has failed) or an integer less than the
+	'raid_disks' number for the array indicating which position
+	it currently fills.  This can only be set while assembling an
+	array.  A device for which this is set is assumed to be working.
+
 
 An active md device will also contain and entry for each active device
 in the array.  These are named

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 11:36:25.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 11:37:12.000000000 +1100
@@ -1630,10 +1630,45 @@ errors_store(mdk_rdev_t *rdev, const cha
 static struct rdev_sysfs_entry rdev_errors =
 __ATTR(errors, 0644, errors_show, errors_store);
 
+static ssize_t
+slot_show(mdk_rdev_t *rdev, char *page)
+{
+	if (rdev->raid_disk < 0)
+		return sprintf(page, "none\n");
+	else
+		return sprintf(page, "%d\n", rdev->raid_disk);
+}
+
+static ssize_t
+slot_store(mdk_rdev_t *rdev, const char *buf, size_t len)
+{
+	char *e;
+	int slot = simple_strtoul(buf, &e, 10);
+	if (strncmp(buf, "none", 4)==0)
+		slot = -1;
+	else if (e==buf || (*e && *e!= '\n'))
+		return -EINVAL;
+	if (rdev->mddev->pers)
+		/* Cannot set slot in active array (yet) */
+		return -EBUSY;
+	if (slot >= rdev->mddev->raid_disks)
+		return -ENOSPC;
+	rdev->raid_disk = slot;
+	/* assume it is working */
+	rdev->flags = 0;
+	set_bit(In_sync, &rdev->flags);
+	return len;
+}
+
+
+static struct rdev_sysfs_entry rdev_slot =
+__ATTR(slot, 0644, slot_show, slot_store);
+
 static struct attribute *rdev_default_attrs[] = {
 	&rdev_state.attr,
 	&rdev_super.attr,
 	&rdev_errors.attr,
+	&rdev_slot.attr,
 	NULL,
 };
 static ssize_t

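To make the accepted input for the new slot file concrete, here is a hypothetical userspace sketch (the helper name parse_slot is invented) that approximately mirrors the checks in slot_store(), so a management program could validate a value before writing it. It assumes the kernel's simple_strtoul() does not accept a sign or leading blanks, and it does not model the -EBUSY case for an array that is already running.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace helper: validates a string the way
 * slot_store() does before it is written to dev-XXX/slot.
 * On success stores the slot (-1 for "none") and returns 0;
 * otherwise returns -EINVAL or -ENOSPC like the kernel.
 * The kernel's -EBUSY check (array already running) is not modelled.
 */
static int parse_slot(const char *buf, int raid_disks, int *slot)
{
	char *e;
	unsigned long v;

	if (strncmp(buf, "none", 4) == 0) {
		*slot = -1;		/* not active: spare or failed */
		return 0;
	}
	/* assumed: the kernel parser rejects a sign or leading blanks */
	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;
	v = strtoul(buf, &e, 10);
	/* the character after the digits must be NUL or '\n' */
	if (*e && *e != '\n')
		return -EINVAL;
	if (raid_disks <= 0 || v >= (unsigned long)raid_disks)
		return -ENOSPC;		/* slot must be < raid_disks */
	*slot = (int)v;
	return 0;
}
```

For example, with raid_disks set to 4, "none\n" and "0" through "3" are accepted, while "4" is refused with -ENOSPC.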

* [PATCH md 017 of 20] Export rdev->data_offset via sysfs
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (15 preceding siblings ...)
  2005-12-12  3:15 ` [PATCH md 016 of 20] Expose device slot information via sysfs NeilBrown
@ 2005-12-12  3:15 ` NeilBrown
  2005-12-12  3:15 ` [PATCH md 018 of 20] Allow available size of component devices to be set " NeilBrown
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid




Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt |    6 ++++++
 ./drivers/md/md.c      |   23 +++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~	2005-12-12 11:36:55.000000000 +1100
+++ ./Documentation/md.txt	2005-12-12 11:37:32.000000000 +1100
@@ -244,6 +244,12 @@ Each directory contains:
 	it currently fills.  This can only be set while assembling an
 	array.  A device for which this is set is assumed to be working.
 
+      offset
+        This gives the location in the device (in sectors from the
+        start) where data from the array will be stored.  Any part of
+        the device before this offset is not touched, unless it is
+        used for storing metadata (Formats 1.1 and 1.2).
+
 
 An active md device will also contain an entry for each active device
 in the array.  These are named

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 11:37:12.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 11:37:48.000000000 +1100
@@ -1664,11 +1664,34 @@ slot_store(mdk_rdev_t *rdev, const char 
 static struct rdev_sysfs_entry rdev_slot =
 __ATTR(slot, 0644, slot_show, slot_store);
 
+static ssize_t
+offset_show(mdk_rdev_t *rdev, char *page)
+{
+	return sprintf(page, "%llu\n", (unsigned long long)rdev->data_offset);
+}
+
+static ssize_t
+offset_store(mdk_rdev_t *rdev, const char *buf, size_t len)
+{
+	char *e;
+	unsigned long long offset = simple_strtoull(buf, &e, 10);
+	if (e==buf || (*e && *e != '\n'))
+		return -EINVAL;
+	if (rdev->mddev->pers)
+		return -EBUSY;
+	rdev->data_offset = offset;
+	return len;
+}
+
+static struct rdev_sysfs_entry rdev_offset =
+__ATTR(offset, 0644, offset_show, offset_store);
+
 static struct attribute *rdev_default_attrs[] = {
 	&rdev_state.attr,
 	&rdev_super.attr,
 	&rdev_errors.attr,
 	&rdev_slot.attr,
+	&rdev_offset.attr,
 	NULL,
 };
 static ssize_t

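Since offset is expressed in sectors, userspace tools will often want the corresponding byte position as well. A minimal sketch, assuming 512-byte sectors and using a made-up helper name (offset_to_bytes), that accepts the same input offset_store() appears to accept and performs the conversion:

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical userspace helper: accepts the same input as
 * offset_store() (decimal digits, then NUL or '\n') and converts the
 * sector count into a byte position, assuming 512-byte sectors.
 */
static int offset_to_bytes(const char *buf, unsigned long long *bytes)
{
	char *e;
	unsigned long long sectors;

	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;
	sectors = strtoull(buf, &e, 10);
	if (*e && *e != '\n')
		return -EINVAL;
	if (sectors > ULLONG_MAX / 512)
		return -ERANGE;		/* byte position would overflow */
	*bytes = sectors * 512;		/* first byte used for array data */
	return 0;
}
```

With an offset of 2048 sectors, for instance, the first 1 MiB of the device is left untouched unless it holds version 1.1 or 1.2 metadata.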

* [PATCH md 018 of 20] Allow available size of component devices to be set via sysfs.
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (16 preceding siblings ...)
  2005-12-12  3:15 ` [PATCH md 017 of 20] Export rdev->data_offset " NeilBrown
@ 2005-12-12  3:15 ` NeilBrown
  2005-12-12  3:15 ` [PATCH md 019 of 20] Support adding new devices to md arrays " NeilBrown
  2005-12-12  3:15 ` [PATCH md 020 of 20] Allow sync-speed to be controlled per-device NeilBrown
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid



Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt |    7 +++++++
 ./drivers/md/md.c      |   25 +++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~	2005-12-12 11:37:32.000000000 +1100
+++ ./Documentation/md.txt	2005-12-12 11:38:02.000000000 +1100
@@ -250,6 +250,13 @@ Each directory contains:
         the device before this offset is not touched, unless it is
         used for storing metadata (Formats 1.1 and 1.2).
 
+      size
+        The amount of the device, after the offset, that can be used
+        for storage of data.  This will normally be the same as the
+        component_size.  This can be written while assembling an
+        array.  If a value less than the current component_size is
+        written, component_size will be reduced to this value.
+
 
 An active md device will also contain an entry for each active device
 in the array.  These are named

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 11:37:48.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 11:38:30.000000000 +1100
@@ -1686,12 +1686,37 @@ offset_store(mdk_rdev_t *rdev, const cha
 static struct rdev_sysfs_entry rdev_offset =
 __ATTR(offset, 0644, offset_show, offset_store);
 
+static ssize_t
+rdev_size_show(mdk_rdev_t *rdev, char *page)
+{
+	return sprintf(page, "%llu\n", (unsigned long long)rdev->size);
+}
+
+static ssize_t
+rdev_size_store(mdk_rdev_t *rdev, const char *buf, size_t len)
+{
+	char *e;
+	unsigned long long size = simple_strtoull(buf, &e, 10);
+	if (e==buf || (*e && *e != '\n'))
+		return -EINVAL;
+	if (rdev->mddev->pers)
+		return -EBUSY;
+	rdev->size = size;
+	if (size < rdev->mddev->size || rdev->mddev->size == 0)
+		rdev->mddev->size = size;
+	return len;
+}
+
+static struct rdev_sysfs_entry rdev_size =
+__ATTR(size, 0644, rdev_size_show, rdev_size_store);
+
 static struct attribute *rdev_default_attrs[] = {
 	&rdev_state.attr,
 	&rdev_super.attr,
 	&rdev_errors.attr,
 	&rdev_slot.attr,
 	&rdev_offset.attr,
+	&rdev_size.attr,
 	NULL,
 };
 static ssize_t

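The side effect that writing size has on the array's component_size fits in a few lines. The struct and function names below are hypothetical stand-ins for mdk_rdev_t, mddev_t and rdev_size_store(); only the update rule is taken from the patch, and units are left uninterpreted.

```c
/*
 * Hypothetical model of rdev_size_store()'s effect on the array:
 * the per-device size is recorded, and the array's component_size
 * is lowered to it if smaller, or set to it if still unset (0).
 */
struct toy_array {
	unsigned long long component_size;	/* 0 = not yet known */
};

struct toy_dev {
	struct toy_array *array;
	unsigned long long size;
};

static void set_dev_size(struct toy_dev *dev, unsigned long long size)
{
	dev->size = size;
	if (size < dev->array->component_size ||
	    dev->array->component_size == 0)
		dev->array->component_size = size;
}
```

Under this rule, writing a larger size never grows component_size; it only shrinks it, or initialises it while it is still zero.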

* [PATCH md 019 of 20] Support adding new devices to md arrays via sysfs
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (17 preceding siblings ...)
  2005-12-12  3:15 ` [PATCH md 018 of 20] Allow available size of component devices to be set " NeilBrown
@ 2005-12-12  3:15 ` NeilBrown
  2005-12-12  3:15 ` [PATCH md 020 of 20] Allow sync-speed to be controlled per-device NeilBrown
  19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


Writing major:minor to md/new_dev will bind that device to the
array. 

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt |    8 ++++++
 ./drivers/md/md.c      |   60 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~	2005-12-12 11:57:28.000000000 +1100
+++ ./Documentation/md.txt	2005-12-12 11:57:31.000000000 +1100
@@ -200,6 +200,14 @@ All md devices contain:
      This can be written only while the array is being assembled, not
      after it is started.
 
+   new_dev
+     This file can be written but not read.  The value written should
+     be a block device number as major:minor, e.g. 8:0.
+     This will cause that device to be attached to the array, if it is
+     available.  It will then appear at md/dev-XXX (depending on the
+     name of the device) and further configuration is then possible.
+
+
 As component devices are added to an md array, they appear in the 'md'
 directory as new directories named
       dev-XXX

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 11:57:28.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 11:57:49.000000000 +1100
@@ -1987,6 +1987,65 @@ chunk_size_store(mddev_t *mddev, const c
 static struct md_sysfs_entry md_chunk_size =
 __ATTR(chunk_size, 0644, chunk_size_show, chunk_size_store);
 
+static ssize_t
+null_show(mddev_t *mddev, char *page)
+{
+	return -EINVAL;
+}
+
+static ssize_t
+new_dev_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	/* buf must be %d:%d\n? giving major and minor numbers */
+	/* The new device is added to the array.
+	 * If the array has a persistent superblock, we read the
+	 * superblock to initialise info and check validity.
+	 * Otherwise, only checking done is that in bind_rdev_to_array,
+	 * which mainly checks size.
+	 */
+	char *e;
+	int major = simple_strtoul(buf, &e, 10);
+	int minor;
+	dev_t dev;
+	mdk_rdev_t *rdev;
+	int err;
+
+	if (!*buf || *e != ':' || !e[1] || e[1] == '\n')
+		return -EINVAL;
+	minor = simple_strtoul(e+1, &e, 10);
+	if (*e && *e != '\n')
+		return -EINVAL;
+	dev = MKDEV(major, minor);
+	if (major != MAJOR(dev) ||
+	    minor != MINOR(dev))
+		return -EOVERFLOW;
+
+
+	if (mddev->persistent) {
+		rdev = md_import_device(dev, mddev->major_version,
+					mddev->minor_version);
+		if (!IS_ERR(rdev) && !list_empty(&mddev->disks)) {
+			mdk_rdev_t *rdev0 = list_entry(mddev->disks.next,
+						       mdk_rdev_t, same_set);
+			err = super_types[mddev->major_version]
+				.load_super(rdev, rdev0, mddev->minor_version);
+			if (err < 0)
+				goto out;
+		}
+	} else
+		rdev = md_import_device(dev, -1, -1);
+
+	if (IS_ERR(rdev))
+		return PTR_ERR(rdev);
+	err = bind_rdev_to_array(rdev, mddev);
+ out:
+	if (err)
+		export_rdev(rdev);
+	return err ? err : len;
+}
+
+static struct md_sysfs_entry md_new_device =
+__ATTR(new_dev, 0200, null_show, new_dev_store);
 
 static ssize_t
 size_show(mddev_t *mddev, char *page)
@@ -2144,6 +2203,7 @@ static struct attribute *md_default_attr
 	&md_chunk_size.attr,
 	&md_size.attr,
 	&md_metadata.attr,
+	&md_new_device.attr,
 	NULL,
 };
 

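new_dev_store() parses "major:minor" and rejects numbers that do not survive the MKDEV()/MAJOR()/MINOR() round trip. The sketch below is a hypothetical userspace equivalent, not the kernel's code: it assumes the 2.6 kernel's internal 32-bit dev_t layout (12-bit major, 20-bit minor) and expresses the -EOVERFLOW test as explicit range checks rather than a round trip.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: kernel-internal dev_t with MINORBITS == 20. */
#define TOY_MINORBITS 20
#define TOY_MINORMASK ((1U << TOY_MINORBITS) - 1)

/* Hypothetical helper: parse "major:minor[\n]" into a packed dev_t. */
static int parse_new_dev(const char *buf, uint32_t *dev)
{
	char *e;
	unsigned long major, minor;

	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;
	major = strtoul(buf, &e, 10);
	/* need ':' followed by at least one digit */
	if (*e != ':' || !isdigit((unsigned char)e[1]))
		return -EINVAL;
	minor = strtoul(e + 1, &e, 10);
	if (*e && *e != '\n')
		return -EINVAL;
	/* values that would be truncated when packed are rejected */
	if (major > (UINT32_MAX >> TOY_MINORBITS) || minor > TOY_MINORMASK)
		return -EOVERFLOW;
	*dev = ((uint32_t)major << TOY_MINORBITS) | (uint32_t)minor;
	return 0;
}
```

So "8:0\n" (typically the first SCSI disk) is accepted, while "4096:0" cannot be represented and fails with -EOVERFLOW.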

* [PATCH md 020 of 20] Allow sync-speed to be controlled per-device
  2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
                   ` (18 preceding siblings ...)
  2005-12-12  3:15 ` [PATCH md 019 of 20] Support adding new devices to md arrays " NeilBrown
@ 2005-12-12  3:15 ` NeilBrown
  2005-12-12 11:30   ` Jeff Breidenbach
  19 siblings, 1 reply; 23+ messages in thread
From: NeilBrown @ 2005-12-12  3:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


Also export current (average) speed and status in sysfs.


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt      |   22 ++++++++
 ./drivers/md/md.c           |  110 ++++++++++++++++++++++++++++++++++++++++++--
 ./include/linux/raid/md_k.h |    4 +
 3 files changed, 131 insertions(+), 5 deletions(-)

diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~	2005-12-12 12:12:50.000000000 +1100
+++ ./Documentation/md.txt	2005-12-12 12:12:53.000000000 +1100
@@ -207,6 +207,28 @@ All md devices contain:
      available.  It will then appear at md/dev-XXX (depending on the
      name of the device) and further configuration is then possible.
 
+   sync_speed_min
+   sync_speed_max
+     These are similar to /proc/sys/dev/raid/speed_limit_{min,max}
+     but they only apply to the particular array.
+     If no value has been written to these, or if the word 'system'
+     is written, then the system-wide value is used.  If a value in
+     kibibytes per second is written, then it is used.
+     When the files are read, they show the currently active value
+     followed by "(local)" or "(system)" depending on whether it is
+     a locally set or system-wide value.
+
+   sync_completed
+     This shows the number of sectors completed so far by the current
+     sync_action, followed by the total number of sectors that may
+     need to be processed.  The two numbers are separated by a '/',
+     so together they show the fraction of the process that is
+     complete.
+
+   sync_speed
+     This shows the current actual speed, in K/sec, of the current
+     sync_action.  It is averaged over the last 30 seconds.
+
 
 As component devices are added to an md array, they appear in the 'md'
 directory as new directories named

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-12-12 12:12:50.000000000 +1100
+++ ./drivers/md/md.c	2005-12-12 12:12:57.000000000 +1100
@@ -81,10 +81,22 @@ static DEFINE_SPINLOCK(pers_lock);
  * idle IO detection.
  *
  * you can change it via /proc/sys/dev/raid/speed_limit_min and _max.
+ * or /sys/block/mdX/md/sync_speed_{min,max}
  */
 
 static int sysctl_speed_limit_min = 1000;
 static int sysctl_speed_limit_max = 200000;
+static inline int speed_min(mddev_t *mddev)
+{
+	return mddev->sync_speed_min ?
+		mddev->sync_speed_min : sysctl_speed_limit_min;
+}
+
+static inline int speed_max(mddev_t *mddev)
+{
+	return mddev->sync_speed_max ?
+		mddev->sync_speed_max : sysctl_speed_limit_max;
+}
 
 static struct ctl_table_header *raid_table_header;
 
@@ -2197,6 +2209,90 @@ md_scan_mode = __ATTR(sync_action, S_IRU
 static struct md_sysfs_entry
 md_mismatches = __ATTR_RO(mismatch_cnt);
 
+static ssize_t
+sync_min_show(mddev_t *mddev, char *page)
+{
+	return sprintf(page, "%d (%s)\n", speed_min(mddev),
+		       mddev->sync_speed_min ? "local": "system");
+}
+
+static ssize_t
+sync_min_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	int min;
+	char *e;
+	if (strncmp(buf, "system", 6)==0) {
+		mddev->sync_speed_min = 0;
+		return len;
+	}
+	min = simple_strtoul(buf, &e, 10);
+	if (buf == e || (*e && *e != '\n') || min <= 0)
+		return -EINVAL;
+	mddev->sync_speed_min = min;
+	return len;
+}
+
+static struct md_sysfs_entry md_sync_min =
+__ATTR(sync_speed_min, S_IRUGO|S_IWUSR, sync_min_show, sync_min_store);
+
+static ssize_t
+sync_max_show(mddev_t *mddev, char *page)
+{
+	return sprintf(page, "%d (%s)\n", speed_max(mddev),
+		       mddev->sync_speed_max ? "local": "system");
+}
+
+static ssize_t
+sync_max_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	int max;
+	char *e;
+	if (strncmp(buf, "system", 6)==0) {
+		mddev->sync_speed_max = 0;
+		return len;
+	}
+	max = simple_strtoul(buf, &e, 10);
+	if (buf == e || (*e && *e != '\n') || max <= 0)
+		return -EINVAL;
+	mddev->sync_speed_max = max;
+	return len;
+}
+
+static struct md_sysfs_entry md_sync_max =
+__ATTR(sync_speed_max, S_IRUGO|S_IWUSR, sync_max_show, sync_max_store);
+
+
+static ssize_t
+sync_speed_show(mddev_t *mddev, char *page)
+{
+	unsigned long resync, dt, db;
+	resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active));
+	dt = ((jiffies - mddev->resync_mark) / HZ);
+	if (!dt) dt++;
+	db = resync - (mddev->resync_mark_cnt);
+	return sprintf(page, "%lu\n", db/dt/2); /* K/sec */
+}
+
+static struct md_sysfs_entry
+md_sync_speed = __ATTR_RO(sync_speed);
+
+static ssize_t
+sync_completed_show(mddev_t *mddev, char *page)
+{
+	unsigned long max_blocks, resync;
+
+	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
+		max_blocks = mddev->resync_max_sectors;
+	else
+		max_blocks = mddev->size << 1;
+
+	resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active));
+	return sprintf(page, "%lu / %lu\n", resync, max_blocks);
+}
+
+static struct md_sysfs_entry
+md_sync_completed = __ATTR_RO(sync_completed);
+
 static struct attribute *md_default_attrs[] = {
 	&md_level.attr,
 	&md_raid_disks.attr,
@@ -2210,6 +2306,10 @@ static struct attribute *md_default_attr
 static struct attribute *md_redundancy_attrs[] = {
 	&md_scan_mode.attr,
 	&md_mismatches.attr,
+	&md_sync_min.attr,
+	&md_sync_max.attr,
+	&md_sync_speed.attr,
+	&md_sync_completed.attr,
 	NULL,
 };
 static struct attribute_group md_redundancy_group = {
@@ -4425,10 +4525,10 @@ static void md_do_sync(mddev_t *mddev)
 
 	printk(KERN_INFO "md: syncing RAID array %s\n", mdname(mddev));
 	printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed:"
-		" %d KB/sec/disc.\n", sysctl_speed_limit_min);
+		" %d KB/sec/disc.\n", speed_min(mddev));
 	printk(KERN_INFO "md: using maximum available idle IO bandwidth "
 	       "(but not more than %d KB/sec) for reconstruction.\n",
-	       sysctl_speed_limit_max);
+	       speed_max(mddev));
 
 	is_mddev_idle(mddev); /* this also initializes IO event counters */
 	/* we don't use the checkpoint if there's a bitmap */
@@ -4469,7 +4569,7 @@ static void md_do_sync(mddev_t *mddev)
 
 		skipped = 0;
 		sectors = mddev->pers->sync_request(mddev, j, &skipped,
-					    currspeed < sysctl_speed_limit_min);
+					    currspeed < speed_min(mddev));
 		if (sectors == 0) {
 			set_bit(MD_RECOVERY_ERR, &mddev->recovery);
 			goto out;
@@ -4534,8 +4634,8 @@ static void md_do_sync(mddev_t *mddev)
 		currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2
 			/((jiffies-mddev->resync_mark)/HZ +1) +1;
 
-		if (currspeed > sysctl_speed_limit_min) {
-			if ((currspeed > sysctl_speed_limit_max) ||
+		if (currspeed > speed_min(mddev)) {
+			if ((currspeed > speed_max(mddev)) ||
 					!is_mddev_idle(mddev)) {
 				msleep(500);
 				goto repeat;

diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~	2005-12-12 12:12:50.000000000 +1100
+++ ./include/linux/raid/md_k.h	2005-12-12 12:12:53.000000000 +1100
@@ -143,6 +143,10 @@ struct mddev_s
 	sector_t			resync_mismatches; /* count of sectors where
 							    * parity/replica mismatch found
 							    */
+	/* if zero, use the system-wide default */
+	int				sync_speed_min;
+	int				sync_speed_max;
+
 	int				ok_start_degraded;
 	/* recovery/resync flags 
 	 * NEEDED:   we might need to start a resync/recover


* Re: [PATCH md 013 of 20] Count corrected read errors per drive.
  2005-12-12  3:15 ` [PATCH md 013 of 20] Count corrected read errors per drive NeilBrown
@ 2005-12-12 10:07   ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2005-12-12 10:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

NeilBrown <neilb@suse.de> wrote:
>
> +      errors
>  +        An approximate count of read errors that have been detected on
>  +        this device but have not caused the device to be evicted from
>  +        the array (either because they were corrected or because they
>  +	happened while the array was read-only).  When using version-1
>  +	metadata, this value persists across restarts of the array.

Looks funny.  Mixture of tabs and spaces I guess.

<fixes it>

* Re: [PATCH md 020 of 20] Allow sync-speed to be controlled per-device
  2005-12-12  3:15 ` [PATCH md 020 of 20] Allow sync-speed to be controlled per-device NeilBrown
@ 2005-12-12 11:30   ` Jeff Breidenbach
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Breidenbach @ 2005-12-12 11:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Typo.

of if -> or if

On Sun, 11 Dec 2005 7:16 pm, NeilBrown wrote:
>
> +   sync_speed_min
> +   sync_speed_max
> +     This are similar to /proc/sys/dev/raid/speed_limit_{min,max}
> +     however they only apply to the particular array.
> +     If no value has been written to these, of if the word 'system'
> +     is written, then the system-wide value is used.  If a value,
> +     in kibibytes-per-second is written, then it is used.
--Jeff

Thread overview: 23+ messages
2005-12-12  3:10 [PATCH md 000 of 20] Introduction NeilBrown
2005-12-12  3:10 ` [PATCH md 001 of 20] Fix a use-after-free bug in raid1 NeilBrown
2005-12-12  3:10 ` [PATCH md 002 of 20] Use correct size of raid5 stripe cache when measuring how full it is NeilBrown
2005-12-12  3:10 ` [PATCH md 003 of 20] Define and use safe_put_page for md NeilBrown
2005-12-12  3:13 ` [PATCH md 004 of 20] Helper function to match commands written to sysfs files NeilBrown
2005-12-12  3:14 ` [PATCH md 005 of 20] Fix typo in comment NeilBrown
2005-12-12  3:14 ` [PATCH md 006 of 20] Make a couple of names in md.c static NeilBrown
2005-12-12  3:14 ` [PATCH md 007 of 20] Make sure bitmap updates are visible through filesystem NeilBrown
2005-12-12  3:14 ` [PATCH md 008 of 20] Fix rdev->pending counts in raid1 NeilBrown
2005-12-12  3:14 ` [PATCH md 009 of 20] Allow chunk_size to be settable through sysfs NeilBrown
2005-12-12  3:15 ` [PATCH md 010 of 20] Allow md array component size to be accessed and set via sysfs NeilBrown
2005-12-12  3:15 ` [PATCH md 011 of 20] Expose md metadata format in sysfs NeilBrown
2005-12-12  3:15 ` [PATCH md 012 of 20] Allow array level to be set textually via sysfs NeilBrown
2005-12-12  3:15 ` [PATCH md 013 of 20] Count corrected read errors per drive NeilBrown
2005-12-12 10:07   ` Andrew Morton
2005-12-12  3:15 ` [PATCH md 014 of 20] Allow md/raid_disks to be settable NeilBrown
2005-12-12  3:15 ` [PATCH md 015 of 20] Keep better track of dev/array size when assembling md arrays NeilBrown
2005-12-12  3:15 ` [PATCH md 016 of 20] Expose device slot information via sysfs NeilBrown
2005-12-12  3:15 ` [PATCH md 017 of 20] Export rdev->data_offset " NeilBrown
2005-12-12  3:15 ` [PATCH md 018 of 20] Allow available size of component devices to be set " NeilBrown
2005-12-12  3:15 ` [PATCH md 019 of 20] Support adding new devices to md arrays " NeilBrown
2005-12-12  3:15 ` [PATCH md 020 of 20] Allow sync-speed to be controlled per-device NeilBrown
2005-12-12 11:30   ` Jeff Breidenbach
