* [PATCH md 000 of 20] Introduction
From: NeilBrown @ 2005-12-12 3:10 UTC
To: Andrew Morton; +Cc: linux-raid
Following are 20 patches for md/raid in 2.6.15-rc5-mm2.
The first 2 are suitable for 2.6.15.
The first fixes a use-after-free bug in raid1 which could only cause problems
when using raid1 with some devices in write-behind mode.
The second fixes an oversight that could result in a deadlock
if the raid5 stripe_cache_size were set too low.
3 through 8 are assorted minor cleanups or bugfixes only relevant to -mm.
9 through 20 add a lot more information about md arrays to sysfs and allow
much of it to be set. This will ultimately make it possible to completely
configure an md array through sysfs without using the ioctls at all. This is
good because the ioctls are limited, and while I've managed to squeeze some
extra functionality out of them, they've reached their limit.
One particular goal I am working towards here is to be able to have an
md array whose metadata is completely managed by userspace. A program would
read the superblock, fill in the correct information via sysfs, and start the
array.
On device failure or other events, userspace gets notified and takes the necessary
action, such as identifying a spare, passing it to md, and requesting a reconstruction.
This means that, e.g., DDF (the SNIA Common RAID Disk Data Format) can be supported
without adding complexity to the kernel.
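To make the sysfs-only goal concrete: once every parameter is an attribute, configuring an array reduces to writing short strings into files. The sketch below is a minimal userspace helper pair for that. The helper names md_attr_write and md_attr_read are hypothetical, and the attribute directory (for example /sys/block/md0/md) is an assumption about where the attributes live, not something this message specifies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write VALUE plus a trailing '\n' to DIR/ATTR in a
 * single write(), the way "echo VALUE > DIR/ATTR" would.
 * Returns 0 on success or a negative errno value. */
int md_attr_write(const char *dir, const char *attr, const char *value)
{
    char path[4096], buf[256];
    int n, fd, err = 0;

    if (snprintf(path, sizeof(path), "%s/%s", dir, attr) >= (int)sizeof(path))
        return -ENAMETOOLONG;
    n = snprintf(buf, sizeof(buf), "%s\n", value);
    if (n < 0 || n >= (int)sizeof(buf))
        return -EINVAL;

    fd = open(path, O_WRONLY);          /* attribute files already exist */
    if (fd < 0)
        return -errno;
    errno = 0;
    if (write(fd, buf, (size_t)n) != n) /* short write counts as failure */
        err = errno ? -errno : -EIO;
    close(fd);
    return err;
}

/* Hypothetical helper: read DIR/ATTR into BUF (SIZE bytes), dropping one
 * trailing '\n'. Returns 0 on success or a negative errno value. */
int md_attr_read(const char *dir, const char *attr, char *buf, size_t size)
{
    char path[4096];
    ssize_t n;
    int fd, err;

    if (size == 0)
        return -EINVAL;
    if (snprintf(path, sizeof(path), "%s/%s", dir, attr) >= (int)sizeof(path))
        return -ENAMETOOLONG;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;
    n = read(fd, buf, size - 1);
    err = (n < 0) ? -errno : 0;         /* capture errno before close() */
    close(fd);
    if (err)
        return err;
    buf[n] = '\0';
    if (n > 0 && buf[n - 1] == '\n')
        buf[n - 1] = '\0';
    return 0;
}
```

For example, md_attr_write("/sys/block/md0/md", "chunk_size", "65536") would, under the path assumption above, request a 64 KiB chunk size on an array that is still being assembled.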
Thanks,
NeilBrown
[PATCH md 001 of 20] Fix a use-after-free bug in raid1
[PATCH md 002 of 20] Use correct size of raid5 stripe cache when measuring how full it is.
[PATCH md 003 of 20] Define and use safe_put_page for md.
[PATCH md 004 of 20] Helper function to match commands written to sysfs files.
[PATCH md 005 of 20] Fix typo in comment.
[PATCH md 006 of 20] Make a couple of names in md.c static
[PATCH md 007 of 20] Make sure bitmap updates are visible through filesystem.
[PATCH md 008 of 20] Fix rdev->pending counts in raid1
[PATCH md 009 of 20] Allow chunk_size to be settable through sysfs
[PATCH md 010 of 20] Allow md array component size to be accessed and set via sysfs
[PATCH md 011 of 20] Expose md metadata format in sysfs.
[PATCH md 012 of 20] Allow array level to be set textually via sysfs.
[PATCH md 013 of 20] Count corrected read errors per drive.
[PATCH md 014 of 20] Allow md/raid_disks to be settable.
[PATCH md 015 of 20] Keep better track of dev/array size when assembling md arrays.
[PATCH md 016 of 20] Expose device slot information via sysfs
[PATCH md 017 of 20] Export rdev->data_offset via sysfs
[PATCH md 018 of 20] Allow available size of component devices to be set via sysfs.
[PATCH md 019 of 20] Support adding new devices to md arrays via sysfs
[PATCH md 020 of 20] Allow sync-speed to be controlled per-device
* [PATCH md 001 of 20] Fix a use-after-free bug in raid1
From: NeilBrown @ 2005-12-12 3:10 UTC
To: Andrew Morton; +Cc: linux-raid
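Who would submit code with a FIXME like that in it!!!!
raid1_end_write_request() dropped its reference with bio_put() before the
write-behind branch read bio->bi_vcnt and bio->bi_io_vec, so the bio could
already have been freed. Defer the bio_put() until after its last use.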
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid1.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/raid1.c 2005-12-12 10:45:22.000000000 +1100
@@ -323,7 +323,6 @@ static int raid1_end_write_request(struc
* this branch is our 'one mirror IO has finished' event handler:
*/
r1_bio->bios[mirror] = NULL;
- bio_put(bio);
if (!uptodate) {
md_error(r1_bio->mddev, conf->mirrors[mirror].rdev);
/* an I/O failed, we can't clear the bitmap */
@@ -380,7 +379,6 @@ static int raid1_end_write_request(struc
}
if (test_bit(R1BIO_BehindIO, &r1_bio->state)) {
/* free extra copy of the data pages */
-/* FIXME bio has been freed!!! */
int i = bio->bi_vcnt;
while (i--)
put_page(bio->bi_io_vec[i].bv_page);
@@ -394,6 +392,9 @@ static int raid1_end_write_request(struc
raid_end_bio_io(r1_bio);
}
+ if (r1_bio->bios[mirror]==NULL)
+ bio_put(bio);
+
rdev_dec_pending(conf->mirrors[mirror].rdev, conf->mddev);
return 0;
}
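The bug is purely an ordering problem: a reference was dropped and the object was then read. The userspace sketch below models only that ordering, with a hypothetical refcounted struct buf standing in for struct bio; it is not kernel code. Swapping the buf_put() back to the top of end_write() and building with a heap checker such as AddressSanitizer would report the loop's reads as a use-after-free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio: a reference count plus a small vector
 * of separately allocated "pages". */
struct buf {
    int refs;
    int vcnt;
    void *pages[4];
};

int pages_freed;   /* number of pages released, so the effect is observable */

/* Drop one reference; the final reference frees the descriptor. */
void buf_put(struct buf *b)
{
    if (--b->refs == 0)
        free(b);
}

/* Completion handler in the fixed order: everything that reads B
 * (here, the write-behind page cleanup) runs before the last buf_put(). */
void end_write(struct buf *b, int behind)
{
    if (behind) {
        int i = b->vcnt;
        while (i--) {
            free(b->pages[i]);
            pages_freed++;
        }
    }
    buf_put(b);   /* last use of B; the buggy order called this first */
}
```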
* [PATCH md 003 of 20] Define and use safe_put_page for md.
From: NeilBrown @ 2005-12-12 3:10 UTC
To: Andrew Morton; +Cc: linux-raid
md sometimes calls put_page() on NULL pointers (treating it like kfree(),
which accepts NULL). This is not safe, so define and use a safe_put_page()
which checks for NULL.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/bitmap.c | 3 +--
./drivers/md/raid1.c | 8 ++++----
./drivers/md/raid10.c | 8 ++++----
./drivers/md/raid6main.c | 3 +--
./include/linux/raid/md_k.h | 5 +++++
5 files changed, 15 insertions(+), 12 deletions(-)
diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~ 2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/bitmap.c 2005-12-12 10:45:26.000000000 +1100
@@ -626,8 +626,7 @@ static void bitmap_file_unmap(struct bit
kfree(map);
kfree(attr);
- if (sb_page)
- put_page(sb_page);
+ safe_put_page(sb_page);
}
static void bitmap_stop_daemon(struct bitmap *bitmap);
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-12-12 10:45:22.000000000 +1100
+++ ./drivers/md/raid1.c 2005-12-12 10:45:26.000000000 +1100
@@ -136,7 +136,7 @@ static void * r1buf_pool_alloc(gfp_t gfp
out_free_pages:
for (i=0; i < RESYNC_PAGES ; i++)
for (j=0 ; j < pi->raid_disks; j++)
- put_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
+ safe_put_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
j = -1;
out_free_bio:
while ( ++j < pi->raid_disks )
@@ -156,7 +156,7 @@ static void r1buf_pool_free(void *__r1_b
if (j == 0 ||
r1bio->bios[j]->bi_io_vec[i].bv_page !=
r1bio->bios[0]->bi_io_vec[i].bv_page)
- put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
+ safe_put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
}
for (i=0 ; i < pi->raid_disks; i++)
bio_put(r1bio->bios[i]);
@@ -381,7 +381,7 @@ static int raid1_end_write_request(struc
/* free extra copy of the data pages */
int i = bio->bi_vcnt;
while (i--)
- put_page(bio->bi_io_vec[i].bv_page);
+ safe_put_page(bio->bi_io_vec[i].bv_page);
}
/* clear the bitmap if all writes complete successfully */
bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector,
@@ -1907,7 +1907,7 @@ out_free_conf:
if (conf->r1bio_pool)
mempool_destroy(conf->r1bio_pool);
kfree(conf->mirrors);
- put_page(conf->tmppage);
+ safe_put_page(conf->tmppage);
kfree(conf->poolinfo);
kfree(conf);
mddev->private = NULL;
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/raid10.c 2005-12-12 10:45:26.000000000 +1100
@@ -132,10 +132,10 @@ static void * r10buf_pool_alloc(gfp_t gf
out_free_pages:
for ( ; i > 0 ; i--)
- put_page(bio->bi_io_vec[i-1].bv_page);
+ safe_put_page(bio->bi_io_vec[i-1].bv_page);
while (j--)
for (i = 0; i < RESYNC_PAGES ; i++)
- put_page(r10_bio->devs[j].bio->bi_io_vec[i].bv_page);
+ safe_put_page(r10_bio->devs[j].bio->bi_io_vec[i].bv_page);
j = -1;
out_free_bio:
while ( ++j < nalloc )
@@ -155,7 +155,7 @@ static void r10buf_pool_free(void *__r10
struct bio *bio = r10bio->devs[j].bio;
if (bio) {
for (i = 0; i < RESYNC_PAGES; i++) {
- put_page(bio->bi_io_vec[i].bv_page);
+ safe_put_page(bio->bi_io_vec[i].bv_page);
bio->bi_io_vec[i].bv_page = NULL;
}
bio_put(bio);
@@ -2042,7 +2042,7 @@ static int run(mddev_t *mddev)
out_free_conf:
if (conf->r10bio_pool)
mempool_destroy(conf->r10bio_pool);
- put_page(conf->tmppage);
+ safe_put_page(conf->tmppage);
kfree(conf->mirrors);
kfree(conf);
mddev->private = NULL;
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-12-12 10:45:26.000000000 +1100
@@ -2072,8 +2072,7 @@ static int run(mddev_t *mddev)
abort:
if (conf) {
print_raid6_conf(conf);
- if (conf->spare_page)
- put_page(conf->spare_page);
+ safe_put_page(conf->spare_page);
kfree(conf->stripe_hashtbl);
kfree(conf);
}
diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~ 2005-12-12 10:45:16.000000000 +1100
+++ ./include/linux/raid/md_k.h 2005-12-12 10:45:26.000000000 +1100
@@ -324,5 +324,10 @@ do { \
__wait_event_lock_irq(wq, condition, lock, cmd); \
} while (0)
+static inline void safe_put_page(struct page *p)
+{
+ if (p) put_page(p);
+}
+
#endif
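kfree(NULL) is a no-op, but put_page() has no such check. A NULL-tolerant put lets an error-unwind loop, like the out_free_pages loops in r1buf_pool_alloc() and r10buf_pool_alloc(), visit slots that were never filled without tracking how far allocation got. The userspace sketch below models that pattern; struct page, page_alloc() and page_put() here are hypothetical stand-ins, and only safe_put_page() mirrors the helper added to md_k.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct page with a reference count. */
struct page { int count; };

int pages_live;   /* pages currently allocated, to check for leaks */

struct page *page_alloc(void)
{
    struct page *p = malloc(sizeof *p);
    if (p) {
        p->count = 1;
        pages_live++;
    }
    return p;
}

void page_put(struct page *p)        /* like put_page(): P must not be NULL */
{
    if (--p->count == 0) {
        free(p);
        pages_live--;
    }
}

void safe_put_page(struct page *p)   /* same shape as the md_k.h helper */
{
    if (p)
        page_put(p);
}

/* Fill PAGES[0..N-1]; FAIL_AT simulates an allocation failure at that
 * index (-1 for none). On failure every slot is released, including the
 * ones still NULL, which is only safe because safe_put_page() skips NULL. */
int alloc_pages_or_unwind(struct page **pages, int n, int fail_at)
{
    int i;

    for (i = 0; i < n; i++)
        pages[i] = NULL;
    for (i = 0; i < n; i++) {
        pages[i] = (i == fail_at) ? NULL : page_alloc();
        if (!pages[i])
            goto out_free_pages;
    }
    return 0;

out_free_pages:
    for (i = 0; i < n; i++)
        safe_put_page(pages[i]);
    return -1;
}
```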
* [PATCH md 002 of 20] Use correct size of raid5 stripe cache when measuring how full it is.
From: NeilBrown @ 2005-12-12 3:10 UTC
To: Andrew Morton; +Cc: linux-raid
The raid5 stripe cache was recently changed from a fixed size
(NR_STRIPES) to a variable size (conf->max_nr_stripes). However, two places
that measure how full the cache is still use the constant, and as a result,
reducing the size of the stripe cache can result in a deadlock.
Also fix the indentation of one line in run().
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/raid5.c 2005-12-12 10:45:24.000000000 +1100
@@ -96,7 +96,7 @@ static inline void __release_stripe(raid
list_add_tail(&sh->lru, &conf->inactive_list);
atomic_dec(&conf->active_stripes);
if (!conf->inactive_blocked ||
- atomic_read(&conf->active_stripes) < (NR_STRIPES*3/4))
+ atomic_read(&conf->active_stripes) < (conf->max_nr_stripes*3/4))
wake_up(&conf->wait_for_stripe);
}
}
@@ -255,7 +255,8 @@ static struct stripe_head *get_active_st
conf->inactive_blocked = 1;
wait_event_lock_irq(conf->wait_for_stripe,
!list_empty(&conf->inactive_list) &&
- (atomic_read(&conf->active_stripes) < (NR_STRIPES *3/4)
+ (atomic_read(&conf->active_stripes)
+ < (conf->max_nr_stripes *3/4)
|| !conf->inactive_blocked),
conf->device_lock,
unplug_slaves(conf->mddev);
@@ -1915,7 +1916,7 @@ static int run(mddev_t *mddev)
goto abort;
}
}
-memory = conf->max_nr_stripes * (sizeof(struct stripe_head) +
+ memory = conf->max_nr_stripes * (sizeof(struct stripe_head) +
conf->raid_disks * ((sizeof(struct bio) + PAGE_SIZE))) / 1024;
if (grow_stripes(conf, conf->max_nr_stripes)) {
printk(KERN_ERR
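Both changed conditions implement the same hysteresis: once a caller has found no inactive stripe and set inactive_blocked, waiters are only released after the number of active stripes falls below three quarters of the cache. The sketch below lifts the two predicates out as pure functions so the threshold can be checked in isolation. The function names are made up for illustration, and locking, wait queues and unplug_slaves() are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the wake-up test in __release_stripe() after this patch:
 * the threshold tracks the configured cache size. */
bool release_should_wake(int active, int max_nr_stripes, bool inactive_blocked)
{
    return !inactive_blocked || active < max_nr_stripes * 3 / 4;
}

/* Model of the condition get_active_stripe() waits for once the cache
 * has run dry and inactive_blocked has been set. */
bool waiter_may_proceed(bool inactive_list_empty, int active,
                        int max_nr_stripes, bool inactive_blocked)
{
    return !inactive_list_empty &&
           (active < max_nr_stripes * 3 / 4 || !inactive_blocked);
}
```

With max_nr_stripes set to 64 the threshold is 48 active stripes, whereas the old code compared against NR_STRIPES*3/4 whatever size had been configured.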
* [PATCH md 004 of 20] Helper function to match commands written to sysfs files.
From: NeilBrown @ 2005-12-12 3:13 UTC
To: Andrew Morton; +Cc: linux-raid
Commands written to sysfs files may or may not be \n-terminated.
We want to accept either case. For this we use cmd_match().
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 10:45:16.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 10:45:30.000000000 +1100
@@ -1525,6 +1525,26 @@ repeat:
}
+/* words written to sysfs files may or may not be \n terminated.
+ * We want to accept either case. For this we use cmd_match.
+ */
+static int cmd_match(const char *cmd, const char *str)
+{
+ /* See if cmd, written into a sysfs file, matches
+ * str. They must either be the same, or cmd can
+ * have a trailing newline
+ */
+ while (*cmd && *str && *cmd == *str) {
+ cmd++;
+ str++;
+ }
+ if (*cmd == '\n')
+ cmd++;
+ if (*str || *cmd)
+ return 0;
+ return 1;
+}
+
struct rdev_sysfs_entry {
struct attribute attr;
ssize_t (*show)(mdk_rdev_t *, char *);
@@ -1799,7 +1819,7 @@ action_store(mddev_t *mddev, const char
if (!mddev->pers || !mddev->pers->sync_request)
return -EINVAL;
- if (strcmp(page, "idle")==0 || strcmp(page, "idle\n")==0) {
+ if (cmd_match(page, "idle")) {
if (mddev->sync_thread) {
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
md_unregister_thread(mddev->sync_thread);
@@ -1812,13 +1832,12 @@ action_store(mddev_t *mddev, const char
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
return -EBUSY;
- if (strcmp(page, "resync")==0 || strcmp(page, "resync\n")==0 ||
- strcmp(page, "recover")==0 || strcmp(page, "recover\n")==0)
+ if (cmd_match(page, "resync") || cmd_match(page, "recover"))
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
else {
- if (strcmp(page, "check")==0 || strcmp(page, "check\n")==0)
+ if (cmd_match(page, "check"))
set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
- else if (strcmp(page, "repair")!=0 && strcmp(page, "repair\n")!=0)
+ else if (!cmd_match(page, "repair"))
return -EINVAL;
set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
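cmd_match() has no kernel dependencies, so it can be lifted out unchanged and exercised from userspace. The copy below is the function from this patch with the comments condensed. The checks rely on the usual shell behaviour where "echo idle > sync_action" writes "idle\n" while "echo -n idle" writes just "idle"; both should match, but a second newline or any other trailing text should not.

```c
#include <assert.h>

/* Copy of cmd_match() from the patch above: CMD matches STR if they are
 * identical, or if CMD is STR followed by exactly one '\n'. */
int cmd_match(const char *cmd, const char *str)
{
    while (*cmd && *str && *cmd == *str) {
        cmd++;
        str++;
    }
    if (*cmd == '\n')      /* tolerate a single trailing newline */
        cmd++;
    if (*str || *cmd)      /* leftovers on either side: no match */
        return 0;
    return 1;
}
```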
* [PATCH md 005 of 20] Fix typo in comment.
From: NeilBrown @ 2005-12-12 3:14 UTC
To: Andrew Morton; +Cc: linux-raid
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid10.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-12-12 10:45:26.000000000 +1100
+++ ./drivers/md/raid10.c 2005-12-12 10:45:34.000000000 +1100
@@ -712,7 +712,7 @@ static void allow_barrier(conf_t *conf)
static void freeze_array(conf_t *conf)
{
/* stop syncio and normal IO and wait for everything to
- * go quite.
+ * go quiet.
* We increment barrier and nr_waiting, and then
* wait until barrier+nr_pending match nr_queued+2
*/
* [PATCH md 006 of 20] Make a couple of names in md.c static
From: NeilBrown @ 2005-12-12 3:14 UTC
To: Andrew Morton; +Cc: linux-raid
... because they aren't used outside md.c.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 4 ++--
./include/linux/raid/md_k.h | 2 --
2 files changed, 2 insertions(+), 4 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 10:45:30.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 10:45:36.000000000 +1100
@@ -144,7 +144,7 @@ static int start_readonly;
* start array, stop array, error, add device, remove device,
* start build, activate spare
*/
-DECLARE_WAIT_QUEUE_HEAD(md_event_waiters);
+static DECLARE_WAIT_QUEUE_HEAD(md_event_waiters);
static atomic_t md_event_count;
void md_new_event(mddev_t *mddev)
{
@@ -279,7 +279,7 @@ static inline void mddev_unlock(mddev_t
md_wakeup_thread(mddev->thread);
}
-mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
+static mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
{
mdk_rdev_t * rdev;
struct list_head *tmp;
diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~ 2005-12-12 10:45:26.000000000 +1100
+++ ./include/linux/raid/md_k.h 2005-12-12 10:45:36.000000000 +1100
@@ -263,8 +263,6 @@ static inline char * mdname (mddev_t * m
return mddev->gendisk ? mddev->gendisk->disk_name : "mdX";
}
-extern mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr);
-
/*
* iterates through some rdev ringlist. It's safe to remove the
* current 'rdev'. Dont touch 'tmp' though.
* [PATCH md 007 of 20] Make sure bitmap updates are visible through filesystem.
From: NeilBrown @ 2005-12-12 3:14 UTC
To: Andrew Morton; +Cc: linux-raid
When we update a page_cache page in the kernel, we need to call
flush_dcache_page() or userspace might not see the change.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/bitmap.c | 2 ++
1 file changed, 2 insertions(+)
diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~ 2005-12-12 10:45:26.000000000 +1100
+++ ./drivers/md/bitmap.c 2005-12-12 10:45:38.000000000 +1100
@@ -315,6 +315,8 @@ static int write_page(struct bitmap *bit
if (bitmap->file == NULL)
return write_sb_page(bitmap->mddev, bitmap->offset, page, wait);
+ flush_dcache_page(page); /* make sure visible to anyone reading the file */
+
if (wait)
lock_page(page);
else {
* [PATCH md 008 of 20] Fix rdev->pending counts in raid1
From: NeilBrown @ 2005-12-12 3:14 UTC
To: Andrew Morton; +Cc: linux-raid
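When we do a user-requested check/repair, we lose count
of the outstanding requests: bios that are not written back never
have rdev->nr_pending decremented. Call rdev_dec_pending() for them,
and use it instead of a bare atomic_dec() in read_balance() and
make_request().
Also make sure that when anything is written to md/sync_action,
the RECOVERY_NEEDED flag is set and the thread is woken up
so any changes take effect.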
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 11 ++++-------
./drivers/md/raid1.c | 10 ++++++----
2 files changed, 10 insertions(+), 11 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 10:45:36.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 10:45:40.000000000 +1100
@@ -1826,13 +1826,10 @@ action_store(mddev_t *mddev, const char
mddev->sync_thread = NULL;
mddev->recovery = 0;
}
- return len;
- }
-
- if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
- test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
+ } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
+ test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
return -EBUSY;
- if (cmd_match(page, "resync") || cmd_match(page, "recover"))
+ else if (cmd_match(page, "resync") || cmd_match(page, "recover"))
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
else {
if (cmd_match(page, "check"))
@@ -1841,8 +1838,8 @@ action_store(mddev_t *mddev, const char
return -EINVAL;
set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
}
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_wakeup_thread(mddev->thread);
return len;
}
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-12-12 10:45:26.000000000 +1100
+++ ./drivers/md/raid1.c 2005-12-12 10:45:40.000000000 +1100
@@ -527,7 +527,7 @@ static int read_balance(conf_t *conf, r1
/* cannot risk returning a device that failed
* before we inc'ed nr_pending
*/
- atomic_dec(&rdev->nr_pending);
+ rdev_dec_pending(rdev, conf->mddev);
goto retry;
}
conf->next_seq_sect = this_sector + sectors;
@@ -830,7 +830,7 @@ static int make_request(request_queue_t
!test_bit(Faulty, &rdev->flags)) {
atomic_inc(&rdev->nr_pending);
if (test_bit(Faulty, &rdev->flags)) {
- atomic_dec(&rdev->nr_pending);
+ rdev_dec_pending(rdev, mddev);
r1_bio->bios[i] = NULL;
} else
r1_bio->bios[i] = bio;
@@ -1176,6 +1176,7 @@ static void sync_request_write(mddev_t *
if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) {
r1_bio->bios[primary]->bi_end_io = NULL;
+ rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
break;
}
r1_bio->read_disk = primary;
@@ -1193,9 +1194,10 @@ static void sync_request_write(mddev_t *
break;
if (j >= 0)
mddev->resync_mismatches += r1_bio->sectors;
- if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
+ if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
sbio->bi_end_io = NULL;
- else {
+ rdev_dec_pending(conf->mirrors[i].rdev, mddev);
+ } else {
/* fixup the bio for reuse */
sbio->bi_vcnt = vcnt;
sbio->bi_size = r1_bio->sectors << 9;
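After this patch, action_store() falls through to a single set_bit(MD_RECOVERY_NEEDED) and md_wakeup_thread() for every accepted command, including "idle". The userspace model below traces that control flow with plain bit flags; the REC_* names and the sync_thread stand-in are hypothetical, and thread teardown and wake-up are reduced to comments. For the check/repair branch it rejects anything that is neither "check" nor "repair", matching the behaviour of the original strcmp() code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bits standing in for the MD_RECOVERY_* bits. */
enum {
    REC_RUNNING   = 1 << 0,
    REC_NEEDED    = 1 << 1,
    REC_CHECK     = 1 << 2,
    REC_REQUESTED = 1 << 3,
    REC_SYNC      = 1 << 4,
};

/* Same matching rule as cmd_match() in md.c. */
static int cmd_match(const char *cmd, const char *str)
{
    while (*cmd && *str && *cmd == *str) {
        cmd++;
        str++;
    }
    if (*cmd == '\n')
        cmd++;
    return !*str && !*cmd;
}

/* Simplified model of action_store() after this patch. *SYNC_THREAD stands
 * in for mddev->sync_thread. Returns 0 if accepted, else a negative errno. */
int action_store_model(unsigned *recovery, int *sync_thread, const char *page)
{
    if (cmd_match(page, "idle")) {
        if (*sync_thread) {        /* interrupt and reap the sync thread */
            *sync_thread = 0;
            *recovery = 0;
        }
    } else if (*recovery & (REC_RUNNING | REC_NEEDED)) {
        return -EBUSY;
    } else if (cmd_match(page, "resync") || cmd_match(page, "recover")) {
        /* REC_NEEDED is set below */
    } else {
        if (cmd_match(page, "check"))
            *recovery |= REC_CHECK;
        else if (!cmd_match(page, "repair"))
            return -EINVAL;
        *recovery |= REC_REQUESTED | REC_SYNC;
    }
    *recovery |= REC_NEEDED;       /* now set for every accepted command */
    /* the kernel then calls md_wakeup_thread(mddev->thread) */
    return 0;
}
```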
* [PATCH md 009 of 20] Allow chunk_size to be settable through sysfs
From: NeilBrown @ 2005-12-12 3:14 UTC
To: Andrew Morton; +Cc: linux-raid
... only before the array is started, of course.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 8 ++++++++
./drivers/md/md.c | 26 ++++++++++++++++++++++++++
2 files changed, 34 insertions(+)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2005-12-12 11:30:58.000000000 +1100
+++ ./Documentation/md.txt 2005-12-12 11:31:25.000000000 +1100
@@ -166,6 +166,14 @@ All md devices contain:
will be empty. If an array is being resized (not currently
possible) this will contain the larger of the old and new sizes.
+ chunk_size
+ This is the size in bytes for 'chunks' and is only relevant to
+ raid levels that involve striping (0,4,5,6,10). The address space
+ of the array is conceptually divided into chunks and consecutive
+ chunks are striped onto neighbouring devices.
+ The size should be at least PAGE_SIZE (4k) and should be a power
+ of 2. This can only be set while assembling an array.
+
As component devices are added to an md array, they appear in the 'md'
directory as new directories named
dev-XXX
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 11:30:58.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 11:31:25.000000000 +1100
@@ -1795,6 +1795,31 @@ raid_disks_show(mddev_t *mddev, char *pa
static struct md_sysfs_entry md_raid_disks = __ATTR_RO(raid_disks);
static ssize_t
+chunk_size_show(mddev_t *mddev, char *page)
+{
+ return sprintf(page, "%d\n", mddev->chunk_size);
+}
+
+static ssize_t
+chunk_size_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ /* can only set chunk_size if array is not yet active */
+ char *e;
+ unsigned long n = simple_strtoul(buf, &e, 10);
+
+ if (mddev->pers)
+ return -EBUSY;
+ if (!*buf || (*e && *e != '\n'))
+ return -EINVAL;
+
+ mddev->chunk_size = n;
+ return len;
+}
+static struct md_sysfs_entry md_chunk_size =
+__ATTR(chunk_size, 0644, chunk_size_show, chunk_size_store);
+
+
+static ssize_t
action_show(mddev_t *mddev, char *page)
{
char *type = "idle";
@@ -1861,6 +1886,7 @@ md_mismatches = __ATTR_RO(mismatch_cnt);
static struct attribute *md_default_attrs[] = {
&md_level.attr,
&md_raid_disks.attr,
+ &md_chunk_size.attr,
NULL,
};
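The chunk_size_store() above only rejects trailing characters and an array that is already running; the documented constraints (at least PAGE_SIZE, a power of two) are not checked at write time in this patch. A stricter parser could look like the sketch below. It is a hypothetical variant rather than what the patch does, and SKETCH_PAGE_SIZE hard-codes a 4 KiB page as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* Hypothetical stricter parser for a chunk_size write: a decimal number,
 * optionally followed by one '\n', that is at least SKETCH_PAGE_SIZE and
 * a power of two. Returns 0 and stores the value in *OUT, or -EINVAL. */
int parse_chunk_size(const char *buf, unsigned long *out)
{
    char *e;
    unsigned long n = strtoul(buf, &e, 10);

    if (e == buf)                 /* no digits at all, e.g. "" or "\n" */
        return -EINVAL;
    if (*e == '\n')
        e++;
    if (*e)                       /* trailing garbage */
        return -EINVAL;
    if (n < SKETCH_PAGE_SIZE || (n & (n - 1)) != 0)
        return -EINVAL;           /* too small, or not a power of two */
    *out = n;
    return 0;
}
```

One difference from the posted code: a bare "\n" is rejected here, whereas chunk_size_store() only checks that the buffer is non-empty, so it would accept "\n" as 0.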
* [PATCH md 010 of 20] Allow md array component size to be accessed and set via sysfs
From: NeilBrown @ 2005-12-12 3:15 UTC
To: Andrew Morton; +Cc: linux-raid
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 9 +++
./drivers/md/md.c | 131 +++++++++++++++++++++++++++++++++----------------
2 files changed, 98 insertions(+), 42 deletions(-)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2005-12-12 11:31:25.000000000 +1100
+++ ./Documentation/md.txt 2005-12-12 11:31:46.000000000 +1100
@@ -174,6 +174,15 @@ All md devices contain:
 The size should be at least PAGE_SIZE (4k) and should be a power
 of 2. This can only be set while assembling an array.
+ component_size
+ For arrays with data redundancy (i.e. not raid0, linear, faulty,
+ multipath), all components must be the same size - or at least
+ there must be a size that they all provide space for. This is a key
+ part of the geometry of the array. It is measured in 1K blocks
+ and can be read from here. Writing to this value may resize
+ the array if the personality supports it (raid1, raid5, raid6),
+ and if the component drives are large enough.
+
As component devices are added to an md array, they appear in the 'md'
directory as new directories named
dev-XXX
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 11:31:25.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 11:32:28.000000000 +1100
@@ -1820,6 +1820,44 @@ __ATTR(chunk_size, 0644, chunk_size_show
static ssize_t
+size_show(mddev_t *mddev, char *page)
+{
+ return sprintf(page, "%llu\n", (unsigned long long)mddev->size);
+}
+
+static int update_size(mddev_t *mddev, unsigned long size);
+
+static ssize_t
+size_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ /* If array is inactive, we can reduce the component size, but
+ * not increase it (except from 0).
+ * If array is active, we can try an on-line resize
+ */
+ char *e;
+ int err = 0;
+ unsigned long long size = simple_strtoull(buf, &e, 10);
+ if (!*buf || *buf == '\n' ||
+ (*e && *e != '\n'))
+ return -EINVAL;
+
+ if (mddev->pers) {
+ err = update_size(mddev, size);
+ md_update_sb(mddev);
+ } else {
+ if (mddev->size == 0 ||
+ mddev->size > size)
+ mddev->size = size;
+ else
+ err = -ENOSPC;
+ }
+ return err ? err : len;
+}
+
+static struct md_sysfs_entry md_size =
+__ATTR(component_size, 0644, size_show, size_store);
+
+static ssize_t
action_show(mddev_t *mddev, char *page)
{
char *type = "idle";
@@ -1887,6 +1925,7 @@ static struct attribute *md_default_attr
&md_level.attr,
&md_raid_disks.attr,
&md_chunk_size.attr,
+ &md_size.attr,
NULL,
};
@@ -3005,6 +3044,54 @@ static int set_array_info(mddev_t * mdde
return 0;
}
+static int update_size(mddev_t *mddev, unsigned long size)
+{
+ mdk_rdev_t * rdev;
+ int rv;
+ struct list_head *tmp;
+
+ if (mddev->pers->resize == NULL)
+ return -EINVAL;
+ /* The "size" is the amount of each device that is used.
+ * This can only make sense for arrays with redundancy.
+ * linear and raid0 always use whatever space is available
+ * We can only consider changing the size if no resync
+ * or reconstruction is happening, and if the new size
+ * is acceptable. It must fit before the sb_offset or,
+ * if that is <data_offset, it must fit before the
+ * size of each device.
+ * If size is zero, we find the largest size that fits.
+ */
+ if (mddev->sync_thread)
+ return -EBUSY;
+ ITERATE_RDEV(mddev,rdev,tmp) {
+ sector_t avail;
+ int fit = (size == 0);
+ if (rdev->sb_offset > rdev->data_offset)
+ avail = (rdev->sb_offset*2) - rdev->data_offset;
+ else
+ avail = get_capacity(rdev->bdev->bd_disk)
+ - rdev->data_offset;
+ if (fit && (size == 0 || size > avail/2))
+ size = avail/2;
+ if (avail < ((sector_t)size << 1))
+ return -ENOSPC;
+ }
+ rv = mddev->pers->resize(mddev, (sector_t)size *2);
+ if (!rv) {
+ struct block_device *bdev;
+
+ bdev = bdget_disk(mddev->gendisk, 0);
+ if (bdev) {
+ down(&bdev->bd_inode->i_sem);
+ i_size_write(bdev->bd_inode, mddev->array_size << 10);
+ up(&bdev->bd_inode->i_sem);
+ bdput(bdev);
+ }
+ }
+ return rv;
+}
+
/*
* update_array_info is used to change the configuration of an
* on-line array.
@@ -3053,49 +3140,9 @@ static int update_array_info(mddev_t *md
else
return mddev->pers->reconfig(mddev, info->layout, -1);
}
- if (mddev->size != info->size) {
- mdk_rdev_t * rdev;
- struct list_head *tmp;
- if (mddev->pers->resize == NULL)
- return -EINVAL;
- /* The "size" is the amount of each device that is used.
- * This can only make sense for arrays with redundancy.
- * linear and raid0 always use whatever space is available
- * We can only consider changing the size if no resync
- * or reconstruction is happening, and if the new size
- * is acceptable. It must fit before the sb_offset or,
- * if that is <data_offset, it must fit before the
- * size of each device.
- * If size is zero, we find the largest size that fits.
- */
- if (mddev->sync_thread)
- return -EBUSY;
- ITERATE_RDEV(mddev,rdev,tmp) {
- sector_t avail;
- int fit = (info->size == 0);
- if (rdev->sb_offset > rdev->data_offset)
- avail = (rdev->sb_offset*2) - rdev->data_offset;
- else
- avail = get_capacity(rdev->bdev->bd_disk)
- - rdev->data_offset;
- if (fit && (info->size == 0 || info->size > avail/2))
- info->size = avail/2;
- if (avail < ((sector_t)info->size << 1))
- return -ENOSPC;
- }
- rv = mddev->pers->resize(mddev, (sector_t)info->size *2);
- if (!rv) {
- struct block_device *bdev;
+ if (mddev->size != info->size)
+ rv = update_size(mddev, info->size);
- bdev = bdget_disk(mddev->gendisk, 0);
- if (bdev) {
- down(&bdev->bd_inode->i_sem);
- i_size_write(bdev->bd_inode, mddev->array_size << 10);
- up(&bdev->bd_inode->i_sem);
- bdput(bdev);
- }
- }
- }
if (mddev->raid_disks != info->raid_disks) {
/* change the number of raid disks */
if (mddev->pers->reshape == NULL)
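update_size() works in mixed units: the requested size is in 1K blocks (hence size << 1 and size*2 when comparing with device space or calling the personality), data_offset and the device capacity are in 512-byte sectors, and sb_offset is doubled before use, which suggests it is also in 1K blocks. The sketch below models just the fitting loop under those unit assumptions, with a made-up struct dev_geom in place of mdk_rdev_t. One intentional difference: the posted code declares `fit` inside the loop, so once the first device has set a non-zero size, `fit` is false for every later device and a smaller later device yields -ENOSPC. The sketch computes `fit` once before the loop, which is what the comment "If size is zero, we find the largest size that fits" appears to intend.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-device geometry, in the units implied by update_size():
 * sb_offset in 1K blocks, data_offset and capacity in 512-byte sectors. */
struct dev_geom {
    unsigned long long sb_offset_kib;
    unsigned long long data_offset_sect;
    unsigned long long capacity_sect;
};

/* Model of update_size()'s fitting loop. If SIZE_KIB is 0, shrink it to the
 * largest size every device can hold; otherwise check that every device can
 * hold SIZE_KIB. Returns the chosen size in 1K blocks, or -ENOSPC. */
long long fit_component_size(const struct dev_geom *d, int ndev,
                             unsigned long long size_kib)
{
    int fit = (size_kib == 0);   /* evaluated once, unlike the posted code */

    for (int i = 0; i < ndev; i++) {
        unsigned long long avail;   /* usable sectors on this device */

        /* comparison reproduced exactly as written in update_size() */
        if (d[i].sb_offset_kib > d[i].data_offset_sect)
            avail = d[i].sb_offset_kib * 2 - d[i].data_offset_sect;
        else
            avail = d[i].capacity_sect - d[i].data_offset_sect;
        if (fit && (size_kib == 0 || size_kib > avail / 2))
            size_kib = avail / 2;
        if (avail < size_kib * 2)
            return -ENOSPC;
    }
    return (long long)size_kib;
}
```

For instance, with two devices offering 2000 and 1600 usable sectors, a request of 0 settles on 800 (1K blocks), while a request of 900 fails with -ENOSPC.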
* [PATCH md 011 of 20] Expose md metadata format in sysfs.
From: NeilBrown @ 2005-12-12 3:15 UTC
To: Andrew Morton; +Cc: linux-raid
Allow it to be set to a particular version, or 'none'.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 6 ++++++
./drivers/md/md.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2005-12-12 11:31:46.000000000 +1100
+++ ./Documentation/md.txt 2005-12-12 11:33:02.000000000 +1100
@@ -183,6 +183,12 @@ All md devices contain:
the array if the personality supports it (raid1, raid5, raid6),
and if the component drives are large enough.
+ metadata_version
+ This indicates the format that is being used to record metadata
+ about the array. It can be 0.90 (traditional format), 1.0, 1.1,
+ 1.2 (newer format in varying locations) or "none" indicating that
+ the kernel isn't managing metadata at all.
+
As component devices are added to an md array, they appear in the 'md'
directory as new directories named
dev-XXX
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 11:32:28.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 11:33:24.000000000 +1100
@@ -1857,6 +1857,54 @@ size_store(mddev_t *mddev, const char *b
static struct md_sysfs_entry md_size =
__ATTR(component_size, 0644, size_show, size_store);
+
+/* Metadata version.
+ * This is either 'none' for arrays with externally managed metadata,
+ * or N.M for internally known formats
+ */
+static ssize_t
+metadata_show(mddev_t *mddev, char *page)
+{
+ if (mddev->persistent)
+ return sprintf(page, "%d.%d\n",
+ mddev->major_version, mddev->minor_version);
+ else
+ return sprintf(page, "none\n");
+}
+
+static ssize_t
+metadata_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ int major, minor;
+ char *e;
+ if (!list_empty(&mddev->disks))
+ return -EBUSY;
+
+ if (cmd_match(buf, "none")) {
+ mddev->persistent = 0;
+ mddev->major_version = 0;
+ mddev->minor_version = 90;
+ return len;
+ }
+ major = simple_strtoul(buf, &e, 10);
+ if (e==buf || *e != '.')
+ return -EINVAL;
+ buf = e+1;
+ minor = simple_strtoul(buf, &e, 10);
+ if (e==buf || *e != '\n')
+ return -EINVAL;
+ if (major >= sizeof(super_types)/sizeof(super_types[0]) ||
+ super_types[major].name == NULL)
+ return -ENOENT;
+ mddev->major_version = major;
+ mddev->minor_version = minor;
+ mddev->persistent = 1;
+ return len;
+}
+
+static struct md_sysfs_entry md_metadata =
+__ATTR(metadata_version, 0644, metadata_show, metadata_store);
+
static ssize_t
action_show(mddev_t *mddev, char *page)
{
@@ -1926,6 +1974,7 @@ static struct attribute *md_default_attr
&md_raid_disks.attr,
&md_chunk_size.attr,
&md_size.attr,
+ &md_metadata.attr,
NULL,
};
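The rules metadata_store() applies to a written value can be shown as a small userspace sketch. It is illustrative only: parse_metadata_version() and matches_word() are hypothetical names, matches_word() imitates the exact-match-with-optional-newline behaviour of cmd_match(), and the kernel's extra checks (-EBUSY when disks are attached, -ENOENT for an unknown major version) are omitted.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if buf is exactly `word`, optionally
 * followed by a single newline (as "echo none > ..." produces). */
static int matches_word(const char *buf, const char *word)
{
    size_t n = strlen(word);
    return strncmp(buf, word, n) == 0 &&
           (buf[n] == '\0' || (buf[n] == '\n' && buf[n + 1] == '\0'));
}

/* Hypothetical parser mirroring metadata_store(): "none" selects
 * externally managed metadata, otherwise "MAJOR.MINOR\n" is required.
 * Returns 0 on success or -EINVAL. */
static int parse_metadata_version(const char *buf, int *persistent,
                                  int *major, int *minor)
{
    char *e;

    if (matches_word(buf, "none")) {
        *persistent = 0;
        *major = 0;            /* same defaults the kernel sets for "none" */
        *minor = 90;
        return 0;
    }
    *major = (int)strtoul(buf, &e, 10);
    if (e == buf || *e != '.')
        return -EINVAL;
    buf = e + 1;
    *minor = (int)strtoul(buf, &e, 10);
    if (e == buf || *e != '\n') /* like the patch, a trailing newline is required */
        return -EINVAL;
    *persistent = 1;
    return 0;
}
```

Note that, as in the patch, "1.2" written without a trailing newline is rejected; `echo` supplies the newline, but `echo -n` does not.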
* [PATCH md 012 of 20] Allow array level to be set textually via sysfs.
2005-12-12 3:10 [PATCH md 000 of 20] Introduction NeilBrown
` (10 preceding siblings ...)
2005-12-12 3:15 ` [PATCH md 011 of 20] Expose md metadata format in sysfs NeilBrown
@ 2005-12-12 3:15 ` NeilBrown
2005-12-12 3:15 ` [PATCH md 013 of 20] Count corrected read errors per drive NeilBrown
` (7 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12 3:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 8 +++++
./drivers/md/faulty.c | 1
./drivers/md/linear.c | 3 +-
./drivers/md/md.c | 61 ++++++++++++++++++++++++++++++++++----------
./drivers/md/multipath.c | 1
./drivers/md/raid0.c | 1
./drivers/md/raid1.c | 1
./drivers/md/raid10.c | 1
./drivers/md/raid5.c | 2 +
./drivers/md/raid6main.c | 1
./include/linux/raid/md_k.h | 1
11 files changed, 67 insertions(+), 14 deletions(-)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2005-12-12 11:33:02.000000000 +1100
+++ ./Documentation/md.txt 2005-12-12 11:33:35.000000000 +1100
@@ -189,6 +189,14 @@ All md devices contain:
1.2 (newer format in varying locations) or "none" indicating that
the kernel isn't managing metadata at all.
+ level
+ The raid 'level' for this array. The name will often (but not
+ always) be the same as the name of the module that implements the
+ level. To be auto-loaded, the module must have an alias
+ md-$LEVEL, e.g. md-raid5.
+ This can be written only while the array is being assembled, not
+ after it is started.
+
As component devices are added to an md array, they appear in the 'md'
directory as new directories named
dev-XXX
diff ./drivers/md/faulty.c~current~ ./drivers/md/faulty.c
--- ./drivers/md/faulty.c~current~ 2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/faulty.c 2005-12-12 11:33:35.000000000 +1100
@@ -342,4 +342,5 @@ module_init(raid_init);
module_exit(raid_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-10"); /* faulty */
+MODULE_ALIAS("md-faulty");
MODULE_ALIAS("md-level--5");
diff ./drivers/md/linear.c~current~ ./drivers/md/linear.c
--- ./drivers/md/linear.c~current~ 2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/linear.c 2005-12-12 11:33:35.000000000 +1100
@@ -376,5 +376,6 @@ static void linear_exit (void)
module_init(linear_init);
module_exit(linear_exit);
MODULE_LICENSE("GPL");
-MODULE_ALIAS("md-personality-1"); /* LINEAR - degrecated*/
+MODULE_ALIAS("md-personality-1"); /* LINEAR - deprecated*/
+MODULE_ALIAS("md-linear");
MODULE_ALIAS("md-level--1");
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 11:33:24.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 11:33:56.000000000 +1100
@@ -303,12 +303,15 @@ static mdk_rdev_t * find_rdev(mddev_t *
return NULL;
}
-static struct mdk_personality *find_pers(int level)
+static struct mdk_personality *find_pers(int level, char *clevel)
{
struct mdk_personality *pers;
- list_for_each_entry(pers, &pers_list, list)
- if (pers->level == level)
+ list_for_each_entry(pers, &pers_list, list) {
+ if (level != LEVEL_NONE && pers->level == level)
return pers;
+ if (strcmp(pers->name, clevel)==0)
+ return pers;
+ }
return NULL;
}
@@ -715,6 +718,7 @@ static int super_90_validate(mddev_t *md
mddev->ctime = sb->ctime;
mddev->utime = sb->utime;
mddev->level = sb->level;
+ mddev->clevel[0] = 0;
mddev->layout = sb->layout;
mddev->raid_disks = sb->raid_disks;
mddev->size = sb->size;
@@ -1051,6 +1055,7 @@ static int super_1_validate(mddev_t *mdd
mddev->ctime = le64_to_cpu(sb->ctime) & ((1ULL << 32)-1);
mddev->utime = le64_to_cpu(sb->utime) & ((1ULL << 32)-1);
mddev->level = le32_to_cpu(sb->level);
+ mddev->clevel[0] = 0;
mddev->layout = le32_to_cpu(sb->layout);
mddev->raid_disks = le32_to_cpu(sb->raid_disks);
mddev->size = le64_to_cpu(sb->size)/2;
@@ -1774,15 +1779,36 @@ static ssize_t
level_show(mddev_t *mddev, char *page)
{
struct mdk_personality *p = mddev->pers;
- if (p == NULL && mddev->raid_disks == 0)
- return 0;
- if (mddev->level >= 0)
- return sprintf(page, "raid%d\n", mddev->level);
- else
+ if (p)
return sprintf(page, "%s\n", p->name);
+ else if (mddev->clevel[0])
+ return sprintf(page, "%s\n", mddev->clevel);
+ else if (mddev->level != LEVEL_NONE)
+ return sprintf(page, "%d\n", mddev->level);
+ else
+ return 0;
+}
+
+static ssize_t
+level_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ int rv = len;
+ if (mddev->pers)
+ return -EBUSY;
+ if (len == 0)
+ return 0;
+ if (len >= sizeof(mddev->clevel))
+ return -ENOSPC;
+ strncpy(mddev->clevel, buf, len);
+ if (mddev->clevel[len-1] == '\n')
+ len--;
+ mddev->clevel[len] = 0;
+ mddev->level = LEVEL_NONE;
+ return rv;
}
-static struct md_sysfs_entry md_level = __ATTR_RO(level);
+static struct md_sysfs_entry md_level =
+__ATTR(level, 0644, level_show, level_store);
static ssize_t
raid_disks_show(mddev_t *mddev, char *page)
@@ -2158,7 +2184,10 @@ static int do_md_run(mddev_t * mddev)
}
#ifdef CONFIG_KMOD
- request_module("md-level-%d", mddev->level);
+ if (mddev->level != LEVEL_NONE)
+ request_module("md-level-%d", mddev->level);
+ else if (mddev->clevel[0])
+ request_module("md-%s", mddev->clevel);
#endif
/*
@@ -2180,15 +2209,21 @@ static int do_md_run(mddev_t * mddev)
return -ENOMEM;
spin_lock(&pers_lock);
- pers = find_pers(mddev->level);
+ pers = find_pers(mddev->level, mddev->clevel);
if (!pers || !try_module_get(pers->owner)) {
spin_unlock(&pers_lock);
- printk(KERN_WARNING "md: personality for level %d is not loaded!\n",
- mddev->level);
+ if (mddev->level != LEVEL_NONE)
+ printk(KERN_WARNING "md: personality for level %d is not loaded!\n",
+ mddev->level);
+ else
+ printk(KERN_WARNING "md: personality for level %s is not loaded!\n",
+ mddev->clevel);
return -EINVAL;
}
mddev->pers = pers;
spin_unlock(&pers_lock);
+ mddev->level = pers->level;
+ strlcpy(mddev->clevel, pers->name, sizeof(mddev->clevel));
mddev->recovery = 0;
mddev->resync_max_sectors = mddev->size << 1; /* may be over-ridden by personality */
diff ./drivers/md/multipath.c~current~ ./drivers/md/multipath.c
--- ./drivers/md/multipath.c~current~ 2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/multipath.c 2005-12-12 11:33:35.000000000 +1100
@@ -579,4 +579,5 @@ module_init(multipath_init);
module_exit(multipath_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-7"); /* MULTIPATH */
+MODULE_ALIAS("md-multipath");
MODULE_ALIAS("md-level--4");
diff ./drivers/md/raid0.c~current~ ./drivers/md/raid0.c
--- ./drivers/md/raid0.c~current~ 2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/raid0.c 2005-12-12 11:33:35.000000000 +1100
@@ -536,4 +536,5 @@ module_init(raid0_init);
module_exit(raid0_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-2"); /* RAID0 */
+MODULE_ALIAS("md-raid0");
MODULE_ALIAS("md-level-0");
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/raid1.c 2005-12-12 11:33:35.000000000 +1100
@@ -2092,4 +2092,5 @@ module_init(raid_init);
module_exit(raid_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-3"); /* RAID1 */
+MODULE_ALIAS("md-raid1");
MODULE_ALIAS("md-level-1");
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/raid10.c 2005-12-12 11:33:35.000000000 +1100
@@ -2117,4 +2117,5 @@ module_init(raid_init);
module_exit(raid_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-9"); /* RAID10 */
+MODULE_ALIAS("md-raid10");
MODULE_ALIAS("md-level-10");
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/raid5.c 2005-12-12 11:33:35.000000000 +1100
@@ -2240,5 +2240,7 @@ module_init(raid5_init);
module_exit(raid5_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-4"); /* RAID5 */
+MODULE_ALIAS("md-raid5");
+MODULE_ALIAS("md-raid4");
MODULE_ALIAS("md-level-5");
MODULE_ALIAS("md-level-4");
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-12-12 11:30:59.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-12-12 11:33:35.000000000 +1100
@@ -2341,4 +2341,5 @@ module_init(raid6_init);
module_exit(raid6_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-8"); /* RAID6 */
+MODULE_ALIAS("md-raid6");
MODULE_ALIAS("md-level-6");
diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~ 2005-12-12 11:30:59.000000000 +1100
+++ ./include/linux/raid/md_k.h 2005-12-12 11:33:35.000000000 +1100
@@ -119,6 +119,7 @@ struct mddev_s
int chunk_size;
time_t ctime, utime;
int level, layout;
+ char clevel[16];
int raid_disks;
int max_disks;
sector_t size; /* used size of component devices */
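The textual level path has two steps: level_store() copies the name into the 16-byte clevel[] buffer (dropping one trailing newline), and do_md_run() later requests the module "md-<name>", which is why each personality gains an "md-raidN"-style alias. The following userspace sketch illustrates both steps; store_level() and level_alias() are hypothetical names, and store_level() returns 0 rather than the byte count a sysfs store method returns.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define CLEVEL_LEN 16  /* matches "char clevel[16]" added to struct mddev_s */

/* Hypothetical copy of level_store()'s buffer handling: reject names
 * that do not fit and strip a single trailing newline. */
static int store_level(char clevel[CLEVEL_LEN], const char *buf, size_t len)
{
    if (len == 0)
        return 0;
    if (len >= CLEVEL_LEN)
        return -ENOSPC;
    memcpy(clevel, buf, len);
    if (clevel[len - 1] == '\n')
        len--;
    clevel[len] = '\0';
    return 0;
}

/* Hypothetical: build the alias passed to request_module("md-%s"),
 * e.g. "raid5" -> "md-raid5". Fails if the output would be truncated. */
static int level_alias(const char *clevel, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "md-%s", clevel);
    if (n < 0 || (size_t)n >= outlen)
        return -ENAMETOOLONG;
    return 0;
}
```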
* [PATCH md 013 of 20] Count corrected read errors per drive.
2005-12-12 3:10 [PATCH md 000 of 20] Introduction NeilBrown
` (11 preceding siblings ...)
2005-12-12 3:15 ` [PATCH md 012 of 20] Allow array level to be set textually via sysfs NeilBrown
@ 2005-12-12 3:15 ` NeilBrown
2005-12-12 10:07 ` Andrew Morton
2005-12-12 3:15 ` [PATCH md 014 of 20] Allow md/raid_disks to be settable NeilBrown
` (6 subsequent siblings)
19 siblings, 1 reply; 23+ messages in thread
From: NeilBrown @ 2005-12-12 3:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Store this total in the superblock (as appropriate)
and make it available to userspace via sysfs.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 11 +++++++++++
./drivers/md/md.c | 27 ++++++++++++++++++++++++++-
./drivers/md/raid1.c | 2 ++
./drivers/md/raid10.c | 11 ++++++++---
./drivers/md/raid5.c | 3 +++
./drivers/md/raid6main.c | 3 +++
./include/linux/raid/md_k.h | 4 ++++
7 files changed, 57 insertions(+), 4 deletions(-)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2005-12-12 11:33:35.000000000 +1100
+++ ./Documentation/md.txt 2005-12-12 11:34:21.000000000 +1100
@@ -222,6 +222,17 @@ Each directory contains:
of being recoverred to
This list make grow in future.
+ errors
+ An approximate count of read errors that have been detected on
+ this device but have not caused the device to be evicted from
+ the array (either because they were corrected or because they
+ happened while the array was read-only). When using version-1
+ metadata, this value persists across restarts of the array.
+
+ This value can be written while assembling an array, thus
+ providing an ongoing count for arrays with metadata managed by
+ userspace.
+
An active md device will also contain and entry for each active device
in the array. These are named
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 11:33:56.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 11:34:37.000000000 +1100
@@ -1000,6 +1000,7 @@ static int super_1_load(mdk_rdev_t *rdev
}
rdev->preferred_minor = 0xffff;
rdev->data_offset = le64_to_cpu(sb->data_offset);
+ atomic_set(&rdev->corrected_errors, le32_to_cpu(sb->cnt_corrected_read));
rdev->sb_size = le32_to_cpu(sb->max_dev) * 2 + 256;
bmask = queue_hardsect_size(rdev->bdev->bd_disk->queue)-1;
@@ -1139,6 +1140,8 @@ static void super_1_sync(mddev_t *mddev,
else
sb->resync_offset = cpu_to_le64(0);
+ sb->cnt_corrected_read = cpu_to_le32(atomic_read(&rdev->corrected_errors));
+
if (mddev->bitmap && mddev->bitmap_file == NULL) {
sb->bitmap_offset = cpu_to_le32((__u32)mddev->bitmap_offset);
sb->feature_map = cpu_to_le32(MD_FEATURE_BITMAP_OFFSET);
@@ -1592,9 +1595,30 @@ super_show(mdk_rdev_t *rdev, char *page)
}
static struct rdev_sysfs_entry rdev_super = __ATTR_RO(super);
+static ssize_t
+errors_show(mdk_rdev_t *rdev, char *page)
+{
+ return sprintf(page, "%d\n", atomic_read(&rdev->corrected_errors));
+}
+
+static ssize_t
+errors_store(mdk_rdev_t *rdev, const char *buf, size_t len)
+{
+ char *e;
+ unsigned long n = simple_strtoul(buf, &e, 10);
+ if (*buf && (*e == 0 || *e == '\n')) {
+ atomic_set(&rdev->corrected_errors, n);
+ return len;
+ }
+ return -EINVAL;
+}
+static struct rdev_sysfs_entry rdev_errors =
+__ATTR(errors, 0644, errors_show, errors_store);
+
static struct attribute *rdev_default_attrs[] = {
&rdev_state.attr,
&rdev_super.attr,
+ &rdev_errors.attr,
NULL,
};
static ssize_t
@@ -1674,6 +1698,7 @@ static mdk_rdev_t *md_import_device(dev_
rdev->data_offset = 0;
atomic_set(&rdev->nr_pending, 0);
atomic_set(&rdev->read_errors, 0);
+ atomic_set(&rdev->corrected_errors, 0);
size = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
if (!size) {
@@ -4721,7 +4746,7 @@ static int set_ro(const char *val, struc
int num = simple_strtoul(val, &e, 10);
if (*val && (*e == '\0' || *e == '\n')) {
start_readonly = num;
- return 0;;
+ return 0;
}
return -EINVAL;
}
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-12-12 11:33:35.000000000 +1100
+++ ./drivers/md/raid1.c 2005-12-12 11:34:21.000000000 +1100
@@ -1265,6 +1265,7 @@ static void sync_request_write(mddev_t *
if (r1_bio->bios[d]->bi_end_io != end_sync_read)
continue;
rdev = conf->mirrors[d].rdev;
+ atomic_add(s, &rdev->corrected_errors);
if (sync_page_io(rdev->bdev,
sect + rdev->data_offset,
s<<9,
@@ -1463,6 +1464,7 @@ static void raid1d(mddev_t *mddev)
d = conf->raid_disks;
d--;
rdev = conf->mirrors[d].rdev;
if (rdev &&
test_bit(In_sync, &rdev->flags)) {
+ atomic_add(s, &rdev->corrected_errors);
if (sync_page_io(rdev->bdev,
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-12-12 11:33:35.000000000 +1100
+++ ./drivers/md/raid10.c 2005-12-12 11:34:21.000000000 +1100
@@ -1122,9 +1122,13 @@ static int end_sync_read(struct bio *bio
if (test_bit(BIO_UPTODATE, &bio->bi_flags))
set_bit(R10BIO_Uptodate, &r10_bio->state);
- else if (!test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery))
- md_error(r10_bio->mddev,
- conf->mirrors[d].rdev);
+ else {
+ atomic_add(r10_bio->sectors,
+ &conf->mirrors[d].rdev->corrected_errors);
+ if (!test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery))
+ md_error(r10_bio->mddev,
+ conf->mirrors[d].rdev);
+ }
/* for reconstruct, we always reschedule after a read.
* for resync, only after all reads
@@ -1430,6 +1434,7 @@ static void raid10d(mddev_t *mddev)
sl--;
d = r10_bio->devs[sl].devnum;
rdev = conf->mirrors[d].rdev;
if (rdev &&
test_bit(In_sync, &rdev->flags)) {
+ atomic_add(s, &rdev->corrected_errors);
if (sync_page_io(rdev->bdev,
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-12-12 11:33:35.000000000 +1100
+++ ./drivers/md/raid5.c 2005-12-12 11:34:21.000000000 +1100
@@ -1400,6 +1400,9 @@ static void handle_stripe(struct stripe_
bi->bi_io_vec[0].bv_offset = 0;
bi->bi_size = STRIPE_SIZE;
bi->bi_next = NULL;
+ if (rw == WRITE &&
+ test_bit(R5_ReWrite, &sh->dev[i].flags))
+ atomic_add(STRIPE_SECTORS, &rdev->corrected_errors);
generic_make_request(bi);
} else {
if (rw == 1)
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-12-12 11:33:35.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-12-12 11:34:21.000000000 +1100
@@ -1562,6 +1562,9 @@ static void handle_stripe(struct stripe_
bi->bi_io_vec[0].bv_offset = 0;
bi->bi_size = STRIPE_SIZE;
bi->bi_next = NULL;
+ if (rw == WRITE &&
+ test_bit(R5_ReWrite, &sh->dev[i].flags))
+ atomic_add(STRIPE_SECTORS, &rdev->corrected_errors);
generic_make_request(bi);
} else {
if (rw == 1)
diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~ 2005-12-12 11:33:35.000000000 +1100
+++ ./include/linux/raid/md_k.h 2005-12-12 11:34:21.000000000 +1100
@@ -95,6 +95,10 @@ struct mdk_rdev_s
atomic_t read_errors; /* number of consecutive read errors that
* we have tried to ignore.
*/
+ atomic_t corrected_errors; /* number of corrected read errors,
+ * for reporting to userspace and storing
+ * in superblock.
+ */
};
struct mddev_s
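The counter is bumped concurrently from several I/O paths (the hunks above add sector counts to it), and version-1 metadata persists it in a 32-bit field that super_1_load() reads with le32_to_cpu(), so it must be stored little-endian whatever the host byte order. This userspace sketch uses C11 atomics as a stand-in for atomic_t and explicit byte shifts as a stand-in for cpu_to_le32()/le32_to_cpu(); struct dev_errors, note_corrected(), put_le32() and get_le32() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-device state holding the corrected-error counter. */
struct dev_errors {
    atomic_uint corrected_errors;
};

/* Called after a bad region has been rewritten from redundant data. */
static void note_corrected(struct dev_errors *d, unsigned int sectors)
{
    atomic_fetch_add(&d->corrected_errors, sectors);
}

/* Encode a value as 4 little-endian bytes, independent of host order. */
static void put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v);
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* Decode 4 little-endian bytes back into a host-order value. */
static uint32_t get_le32(const uint8_t in[4])
{
    return (uint32_t)in[0] | (uint32_t)in[1] << 8 |
           (uint32_t)in[2] << 16 | (uint32_t)in[3] << 24;
}
```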
* [PATCH md 014 of 20] Allow md/raid_disks to be settable.
2005-12-12 3:10 [PATCH md 000 of 20] Introduction NeilBrown
` (12 preceding siblings ...)
2005-12-12 3:15 ` [PATCH md 013 of 20] Count corrected read errors per drive NeilBrown
@ 2005-12-12 3:15 ` NeilBrown
2005-12-12 3:15 ` [PATCH md 015 of 20] Keep better track of dev/array size when assembling md arrays NeilBrown
` (5 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12 3:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
If the array is active, try to reshape it; otherwise just set the value.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 3 +
./drivers/md/md.c | 74 +++++++++++++++++++++++++++++++++----------------
2 files changed, 54 insertions(+), 23 deletions(-)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2005-12-12 11:34:21.000000000 +1100
+++ ./Documentation/md.txt 2005-12-12 11:35:01.000000000 +1100
@@ -165,6 +165,9 @@ All md devices contain:
in a fully functional array. If this is not yet known, the file
will be empty. If an array is being resized (not currently
possible) this will contain the larger of the old and new sizes.
+ Some raid levels (RAID1) allow this value to be set while the
+ array is active. This will reconfigure the array. Otherwise
+ it can only be set while assembling an array.
chunk_size
This is the size if bytes for 'chunks' and is only relevant to
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 11:34:37.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 11:35:53.000000000 +1100
@@ -1843,7 +1843,27 @@ raid_disks_show(mddev_t *mddev, char *pa
return sprintf(page, "%d\n", mddev->raid_disks);
}
-static struct md_sysfs_entry md_raid_disks = __ATTR_RO(raid_disks);
+static int update_raid_disks(mddev_t *mddev, int raid_disks);
+
+static ssize_t
+raid_disks_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ /* if the array is active, try to reshape it; otherwise just record the value */
+ char *e;
+ int rv = 0;
+ unsigned long n = simple_strtoul(buf, &e, 10);
+
+ if (!*buf || (*e && *e != '\n'))
+ return -EINVAL;
+
+ if (mddev->pers)
+ rv = update_raid_disks(mddev, n);
+ else
+ mddev->raid_disks = n;
+ return rv ? rv : len;
+}
+static struct md_sysfs_entry md_raid_disks =
+__ATTR(raid_disks, 0644, raid_disks_show, raid_disks_store);
static ssize_t
chunk_size_show(mddev_t *mddev, char *page)
@@ -3201,6 +3221,33 @@ static int update_size(mddev_t *mddev, u
return rv;
}
+static int update_raid_disks(mddev_t *mddev, int raid_disks)
+{
+ int rv;
+ /* change the number of raid disks */
+ if (mddev->pers->reshape == NULL)
+ return -EINVAL;
+ if (raid_disks <= 0 ||
+ raid_disks >= mddev->max_disks)
+ return -EINVAL;
+ if (mddev->sync_thread)
+ return -EBUSY;
+ rv = mddev->pers->reshape(mddev, raid_disks);
+ if (!rv) {
+ struct block_device *bdev;
+
+ bdev = bdget_disk(mddev->gendisk, 0);
+ if (bdev) {
+ down(&bdev->bd_inode->i_sem);
+ i_size_write(bdev->bd_inode, mddev->array_size << 10);
+ up(&bdev->bd_inode->i_sem);
+ bdput(bdev);
+ }
+ }
+ return rv;
+}
+
+
/*
* update_array_info is used to change the configuration of an
* on-line array.
@@ -3252,28 +3299,9 @@ static int update_array_info(mddev_t *md
if (mddev->size != info->size)
rv = update_size(mddev, info->size);
- if (mddev->raid_disks != info->raid_disks) {
- /* change the number of raid disks */
- if (mddev->pers->reshape == NULL)
- return -EINVAL;
- if (info->raid_disks <= 0 ||
- info->raid_disks >= mddev->max_disks)
- return -EINVAL;
- if (mddev->sync_thread)
- return -EBUSY;
- rv = mddev->pers->reshape(mddev, info->raid_disks);
- if (!rv) {
- struct block_device *bdev;
-
- bdev = bdget_disk(mddev->gendisk, 0);
- if (bdev) {
- down(&bdev->bd_inode->i_sem);
- i_size_write(bdev->bd_inode, mddev->array_size << 10);
- up(&bdev->bd_inode->i_sem);
- bdput(bdev);
- }
- }
- }
+ if (mddev->raid_disks != info->raid_disks)
+ rv = update_raid_disks(mddev, info->raid_disks);
+
if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT)) {
if (mddev->pers->quiesce == NULL)
return -EINVAL;
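The control flow of raid_disks_store() (parse a decimal with an optional trailing newline, then reshape if the array is running or simply record the value while assembling) can be modelled in userspace. This is a sketch under simplifying assumptions: struct array_state, reshape_disks() and raid_disks_write() are hypothetical, `active` stands in for `mddev->pers != NULL`, and reshape_disks() keeps only the range check from update_raid_disks(), not the ->reshape and sync-thread checks.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified array state for illustration. */
struct array_state {
    int active;      /* stands in for mddev->pers != NULL */
    int raid_disks;
    int max_disks;
};

/* Range check borrowed from update_raid_disks(). */
static int reshape_disks(struct array_state *a, unsigned long n)
{
    if (n == 0 || n >= (unsigned long)a->max_disks)
        return -EINVAL;
    a->raid_disks = (int)n;
    return 0;
}

/* Returns len on success or a negative errno, like a sysfs store method. */
static long raid_disks_write(struct array_state *a, const char *buf, size_t len)
{
    char *e;
    unsigned long n = strtoul(buf, &e, 10);
    int rv = 0;

    if (!*buf || (*e && *e != '\n'))
        return -EINVAL;
    if (a->active)
        rv = reshape_disks(a, n);   /* live array: try to reshape */
    else
        a->raid_disks = (int)n;     /* assembling: just record it */
    return rv ? rv : (long)len;
}
```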
* [PATCH md 015 of 20] Keep better track of dev/array size when assembling md arrays.
2005-12-12 3:10 [PATCH md 000 of 20] Introduction NeilBrown
` (13 preceding siblings ...)
2005-12-12 3:15 ` [PATCH md 014 of 20] Allow md/raid_disks to be settable NeilBrown
@ 2005-12-12 3:15 ` NeilBrown
2005-12-12 3:15 ` [PATCH md 016 of 20] Expose device slot information via sysfs NeilBrown
` (4 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12 3:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Move the check that the device size is never less than the array size
into bind_rdev_to_array, so that it always happens (there is currently
one place where it doesn't).
Also reject any superblock which claims an array size larger than the
device in question can hold.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 41 +++++++++++++++++++++++------------------
1 file changed, 23 insertions(+), 18 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 11:35:53.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 11:36:25.000000000 +1100
@@ -695,6 +695,10 @@ static int super_90_load(mdk_rdev_t *rde
}
rdev->size = calc_dev_size(rdev, sb->chunk_size);
+ if (rdev->size < sb->size && sb->level > 1)
+ /* "this cannot possibly happen" ... */
+ ret = -EINVAL;
+
abort:
return ret;
}
@@ -1039,6 +1043,9 @@ static int super_1_load(mdk_rdev_t *rdev
rdev->size = le64_to_cpu(sb->data_size)/2;
if (le32_to_cpu(sb->chunksize))
rdev->size &= ~((sector_t)le32_to_cpu(sb->chunksize)/2 - 1);
+
+ if (le64_to_cpu(sb->size) > rdev->size*2)
+ return -EINVAL;
return 0;
}
@@ -1224,6 +1231,14 @@ static int bind_rdev_to_array(mdk_rdev_t
MD_BUG();
return -EINVAL;
}
+ /* make sure rdev->size is at least mddev->size */
+ if (rdev->size && (mddev->size == 0 || rdev->size < mddev->size)) {
+ if (mddev->pers)
+ /* Cannot change size, so fail */
+ return -ENOSPC;
+ else
+ mddev->size = rdev->size;
+ }
same_pdev = match_dev_unit(mddev, rdev);
if (same_pdev)
printk(KERN_WARNING
@@ -2898,12 +2913,6 @@ static int add_new_disk(mddev_t * mddev,
if (info->state & (1<<MD_DISK_WRITEMOSTLY))
set_bit(WriteMostly, &rdev->flags);
- err = bind_rdev_to_array(rdev, mddev);
- if (err) {
- export_rdev(rdev);
- return err;
- }
-
if (!mddev->persistent) {
printk(KERN_INFO "md: nonpersistent superblock ...\n");
rdev->sb_offset = rdev->bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
@@ -2911,8 +2920,11 @@ static int add_new_disk(mddev_t * mddev,
rdev->sb_offset = calc_dev_sboffset(rdev->bdev);
rdev->size = calc_dev_size(rdev, mddev->chunk_size);
- if (!mddev->size || (mddev->size > rdev->size))
- mddev->size = rdev->size;
+ err = bind_rdev_to_array(rdev, mddev);
+ if (err) {
+ export_rdev(rdev);
+ return err;
+ }
}
return 0;
@@ -2984,15 +2996,6 @@ static int hot_add_disk(mddev_t * mddev,
size = calc_dev_size(rdev, mddev->chunk_size);
rdev->size = size;
- if (size < mddev->size) {
- printk(KERN_WARNING
- "%s: disk size %llu blocks < array size %llu\n",
- mdname(mddev), (unsigned long long)size,
- (unsigned long long)mddev->size);
- err = -ENOSPC;
- goto abort_export;
- }
-
if (test_bit(Faulty, &rdev->flags)) {
printk(KERN_WARNING
"md: can not hot-add faulty %s disk to %s!\n",
@@ -3002,7 +3005,9 @@ static int hot_add_disk(mddev_t * mddev,
}
clear_bit(In_sync, &rdev->flags);
rdev->desc_nr = -1;
- bind_rdev_to_array(rdev, mddev);
+ err = bind_rdev_to_array(rdev, mddev);
+ if (err)
+ goto abort_export;
/*
* The rest should better be atomic, we can have disk failures
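The size rule now centralised in bind_rdev_to_array(), together with the new superblock sanity check, can be modelled as follows. This is a sketch: check_bind_size() and sb1_size_fits() are hypothetical names, and kb_t is an assumed typedef for sizes in 1K blocks (the unit of rdev->size and mddev->size). The factor of two reflects that the version-1 superblock records size in 512-byte sectors, as the `le64_to_cpu(sb->size)/2` conversion in super_1_validate() shows.

```c
#include <errno.h>

typedef unsigned long long kb_t;  /* sizes in 1K blocks */

/* Mirrors the bind-time rule: a smaller device shrinks the array size
 * while assembling, but is refused once the array is running. A device
 * size of 0 means "unknown" and is accepted unchanged. */
static int check_bind_size(kb_t dev_size, kb_t *array_size, int running)
{
    if (dev_size && (*array_size == 0 || dev_size < *array_size)) {
        if (running)
            return -ENOSPC;      /* cannot change size of a live array */
        *array_size = dev_size;  /* still assembling: shrink to fit */
    }
    return 0;
}

/* Superblock check: recorded size (sectors) must fit the device (KB). */
static int sb1_size_fits(unsigned long long sb_sectors, kb_t dev_kb)
{
    return sb_sectors <= dev_kb * 2;
}
```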
* [PATCH md 016 of 20] Expose device slot information via sysfs
2005-12-12 3:10 [PATCH md 000 of 20] Introduction NeilBrown
` (14 preceding siblings ...)
2005-12-12 3:15 ` [PATCH md 015 of 20] Keep better track of dev/array size when assembling md arrays NeilBrown
@ 2005-12-12 3:15 ` NeilBrown
2005-12-12 3:15 ` [PATCH md 017 of 20] Export rdev->data_offset " NeilBrown
` (3 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12 3:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Allow the role that a device has in an array to be viewed
and set.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 8 ++++++++
./drivers/md/md.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2005-12-12 11:35:01.000000000 +1100
+++ ./Documentation/md.txt 2005-12-12 11:36:55.000000000 +1100
@@ -236,6 +236,14 @@ Each directory contains:
providing an ongoing count for arrays with metadata managed by
userspace.
+ slot
+ This gives the role that the device has in the array. It will
+ either be 'none' if the device is not active in the array
+ (i.e. is a spare or has failed) or an integer less than the
+ 'raid_disks' number for the array indicating which position
+ it currently fills. This can only be set while assembling an
+ array. A device for which this is set is assumed to be working.
+
An active md device will also contain and entry for each active device
in the array. These are named
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 11:36:25.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 11:37:12.000000000 +1100
@@ -1630,10 +1630,45 @@ errors_store(mdk_rdev_t *rdev, const cha
static struct rdev_sysfs_entry rdev_errors =
__ATTR(errors, 0644, errors_show, errors_store);
+static ssize_t
+slot_show(mdk_rdev_t *rdev, char *page)
+{
+ if (rdev->raid_disk < 0)
+ return sprintf(page, "none\n");
+ else
+ return sprintf(page, "%d\n", rdev->raid_disk);
+}
+
+static ssize_t
+slot_store(mdk_rdev_t *rdev, const char *buf, size_t len)
+{
+ char *e;
+ int slot = simple_strtoul(buf, &e, 10);
+ if (strncmp(buf, "none", 4)==0)
+ slot = -1;
+ else if (e==buf || (*e && *e!= '\n'))
+ return -EINVAL;
+ if (rdev->mddev->pers)
+ /* Cannot set slot in active array (yet) */
+ return -EBUSY;
+ if (slot >= rdev->mddev->raid_disks)
+ return -ENOSPC;
+ rdev->raid_disk = slot;
+ /* assume it is working */
+ rdev->flags = 0;
+ set_bit(In_sync, &rdev->flags);
+ return len;
+}
+
+
+static struct rdev_sysfs_entry rdev_slot =
+__ATTR(slot, 0644, slot_show, slot_store);
+
static struct attribute *rdev_default_attrs[] = {
&rdev_state.attr,
&rdev_super.attr,
&rdev_errors.attr,
+ &rdev_slot.attr,
NULL,
};
static ssize_t
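The values accepted by the new "slot" file can be summarised with a userspace parser. This is a hypothetical sketch (parse_slot() is not part of the patch) that leaves out the -EBUSY case for a running array. Unlike a naive strtol() translation, it explicitly rejects negative numbers; the kernel's simple_strtoul() does not accept a leading '-' either.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: "none" -> -1 (spare or failed); otherwise a decimal
 * index, optionally newline-terminated, that must be < raid_disks. */
static int parse_slot(const char *buf, int raid_disks, int *slot)
{
    char *e;
    long v;

    if (strncmp(buf, "none", 4) == 0) {
        *slot = -1;
        return 0;
    }
    v = strtol(buf, &e, 10);
    if (e == buf || v < 0 || (*e && *e != '\n'))
        return -EINVAL;
    if (v >= raid_disks)
        return -ENOSPC;     /* same error slot_store() returns */
    *slot = (int)v;
    return 0;
}
```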
* [PATCH md 017 of 20] Export rdev->data_offset via sysfs
2005-12-12 3:10 [PATCH md 000 of 20] Introduction NeilBrown
` (15 preceding siblings ...)
2005-12-12 3:15 ` [PATCH md 016 of 20] Expose device slot information via sysfs NeilBrown
@ 2005-12-12 3:15 ` NeilBrown
2005-12-12 3:15 ` [PATCH md 018 of 20] Allow available size of component devices to be set " NeilBrown
` (2 subsequent siblings)
19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12 3:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 6 ++++++
./drivers/md/md.c | 23 +++++++++++++++++++++++
2 files changed, 29 insertions(+)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2005-12-12 11:36:55.000000000 +1100
+++ ./Documentation/md.txt 2005-12-12 11:37:32.000000000 +1100
@@ -244,6 +244,12 @@ Each directory contains:
it currently fills. This can only be set while assembling an
array. A device for which this is set is assumed to be working.
+ offset
+ This gives the location in the device (in sectors from the
+ start) where data from the array will be stored. Any part of
+ the device before this offset is not touched, unless it is
+ used for storing metadata (Formats 1.1 and 1.2).
+
An active md device will also contain and entry for each active device
in the array. These are named
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 11:37:12.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 11:37:48.000000000 +1100
@@ -1664,11 +1664,34 @@ slot_store(mdk_rdev_t *rdev, const char
static struct rdev_sysfs_entry rdev_slot =
__ATTR(slot, 0644, slot_show, slot_store);
+static ssize_t
+offset_show(mdk_rdev_t *rdev, char *page)
+{
+ return sprintf(page, "%llu\n", (unsigned long long)rdev->data_offset);
+}
+
+static ssize_t
+offset_store(mdk_rdev_t *rdev, const char *buf, size_t len)
+{
+ char *e;
+ unsigned long long offset = simple_strtoull(buf, &e, 10);
+ if (e==buf || (*e && *e != '\n'))
+ return -EINVAL;
+ if (rdev->mddev->pers)
+ return -EBUSY;
+ rdev->data_offset = offset;
+ return len;
+}
+
+static struct rdev_sysfs_entry rdev_offset =
+__ATTR(offset, 0644, offset_show, offset_store);
+
static struct attribute *rdev_default_attrs[] = {
&rdev_state.attr,
&rdev_super.attr,
&rdev_errors.attr,
&rdev_slot.attr,
+ &rdev_offset.attr,
NULL,
};
static ssize_t
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH md 018 of 20] Allow available size of component devices to be set via sysfs.
2005-12-12 3:10 [PATCH md 000 of 20] Introduction NeilBrown
` (16 preceding siblings ...)
2005-12-12 3:15 ` [PATCH md 017 of 20] Export rdev->data_offset " NeilBrown
@ 2005-12-12 3:15 ` NeilBrown
2005-12-12 3:15 ` [PATCH md 019 of 20] Support adding new devices to md arrays " NeilBrown
2005-12-12 3:15 ` [PATCH md 020 of 20] Allow sync-speed to be controlled per-device NeilBrown
19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12 3:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 7 +++++++
./drivers/md/md.c | 25 +++++++++++++++++++++++++
2 files changed, 32 insertions(+)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2005-12-12 11:37:32.000000000 +1100
+++ ./Documentation/md.txt 2005-12-12 11:38:02.000000000 +1100
@@ -250,6 +250,13 @@ Each directory contains:
the device before this offset us not touched, unless it is
used for storing metadata (Formats 1.1 and 1.2).
+ size
+ The amount of the device, after the offset, that can be used
+ for storage of data. This will normally be the same as the
+ component_size. This can be written while assembling an
+ array. If a value less than the current component_size is
+ written, component_size will be reduced to this value.
+
An active md device will also contain and entry for each active device
in the array. These are named
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 11:37:48.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 11:38:30.000000000 +1100
@@ -1686,12 +1686,37 @@ offset_store(mdk_rdev_t *rdev, const cha
static struct rdev_sysfs_entry rdev_offset =
__ATTR(offset, 0644, offset_show, offset_store);
+static ssize_t
+rdev_size_show(mdk_rdev_t *rdev, char *page)
+{
+ return sprintf(page, "%llu\n", (unsigned long long)rdev->size);
+}
+
+static ssize_t
+rdev_size_store(mdk_rdev_t *rdev, const char *buf, size_t len)
+{
+ char *e;
+ unsigned long long size = simple_strtoull(buf, &e, 10);
+ if (e==buf || (*e && *e != '\n'))
+ return -EINVAL;
+ if (rdev->mddev->pers)
+ return -EBUSY;
+ rdev->size = size;
+ if (size < rdev->mddev->size || rdev->mddev->size == 0)
+ rdev->mddev->size = size;
+ return len;
+}
+
+static struct rdev_sysfs_entry rdev_size =
+__ATTR(size, 0644, rdev_size_show, rdev_size_store);
+
static struct attribute *rdev_default_attrs[] = {
&rdev_state.attr,
&rdev_super.attr,
&rdev_errors.attr,
&rdev_slot.attr,
&rdev_offset.attr,
+ &rdev_size.attr,
NULL,
};
static ssize_t
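To make the effect of the new size file concrete, here is a small userspace model of the checks in rdev_size_store() above. It is a sketch under stated assumptions, not kernel code: struct model_array, struct model_dev and model_size_store() are hypothetical names, the running flag stands in for mddev->pers being non-NULL, strtoull() replaces simple_strtoull(), and the function returns 0 instead of len on success. offset_store() from the previous patch performs the same parse and -EBUSY checks but does not touch component_size.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for mddev_t / mdk_rdev_t; not kernel types. */
struct model_array {
	int running;                       /* models mddev->pers != NULL */
	unsigned long long component_size; /* models mddev->size; 0 = unset */
};

struct model_dev {
	struct model_array *array;         /* models rdev->mddev */
	unsigned long long size;           /* models rdev->size */
};

/* Same checks, in the same order, as rdev_size_store() in the patch. */
static int model_size_store(struct model_dev *dev, const char *buf)
{
	char *e;
	unsigned long long size = strtoull(buf, &e, 10);

	/* Reject empty input and trailing junk; tolerate a trailing '\n'. */
	if (e == buf || (*e && *e != '\n'))
		return -EINVAL;
	/* Only allowed while the array is being assembled. */
	if (dev->array->running)
		return -EBUSY;
	dev->size = size;
	/* Shrink component_size if this device is smaller, or set it if unset. */
	if (size < dev->array->component_size || dev->array->component_size == 0)
		dev->array->component_size = size;
	return 0;
}
```

With this model, writing 1000 and then 800 to the size files of two devices leaves component_size at 800, and a later write of 1200 does not raise it again, matching the md.txt text: a smaller value reduces component_size, a larger one leaves it alone.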
* [PATCH md 019 of 20] Support adding new devices to md arrays via sysfs
2005-12-12 3:10 [PATCH md 000 of 20] Introduction NeilBrown
` (17 preceding siblings ...)
2005-12-12 3:15 ` [PATCH md 018 of 20] Allow available size of component devices to be set " NeilBrown
@ 2005-12-12 3:15 ` NeilBrown
2005-12-12 3:15 ` [PATCH md 020 of 20] Allow sync-speed to be controlled per-device NeilBrown
19 siblings, 0 replies; 23+ messages in thread
From: NeilBrown @ 2005-12-12 3:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Writing major:minor to md/new_dev will bind that device to the
array.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 8 ++++++
./drivers/md/md.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 68 insertions(+)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2005-12-12 11:57:28.000000000 +1100
+++ ./Documentation/md.txt 2005-12-12 11:57:31.000000000 +1100
@@ -200,6 +200,14 @@ All md devices contain:
This can be written only while the array is being assembled, not
after it is started.
+ new_dev
+ This file can be written but not read. The value written should
+ be a block device number as major:minor. e.g. 8:0
+ This will cause that device to be attached to the array, if it is
+ available. It will then appear at md/dev-XXX (depending on the
+ name of the device) and further configuration is then possible.
+
+
As component devices are added to an md array, they appear in the 'md'
directory as new directories named
dev-XXX
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 11:57:28.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 11:57:49.000000000 +1100
@@ -1987,6 +1987,65 @@ chunk_size_store(mddev_t *mddev, const c
static struct md_sysfs_entry md_chunk_size =
__ATTR(chunk_size, 0644, chunk_size_show, chunk_size_store);
+static ssize_t
+null_show(mddev_t *mddev, char *page)
+{
+ return -EINVAL;
+}
+
+static ssize_t
+new_dev_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ /* buf must be %d:%d\n? giving major and minor numbers */
+ /* The new device is added to the array.
+ * If the array has a persistent superblock, we read the
+ * superblock to initialise info and check validity.
+ * Otherwise, the only checking done is in bind_rdev_to_array,
+ * which mainly checks size.
+ */
+ char *e;
+ int major = simple_strtoul(buf, &e, 10);
+ int minor;
+ dev_t dev;
+ mdk_rdev_t *rdev;
+ int err;
+
+ if (!*buf || *e != ':' || !e[1] || e[1] == '\n')
+ return -EINVAL;
+ minor = simple_strtoul(e+1, &e, 10);
+ if (*e && *e != '\n')
+ return -EINVAL;
+ dev = MKDEV(major, minor);
+ if (major != MAJOR(dev) ||
+ minor != MINOR(dev))
+ return -EOVERFLOW;
+
+
+ if (mddev->persistent) {
+ rdev = md_import_device(dev, mddev->major_version,
+ mddev->minor_version);
+ if (!IS_ERR(rdev) && !list_empty(&mddev->disks)) {
+ mdk_rdev_t *rdev0 = list_entry(mddev->disks.next,
+ mdk_rdev_t, same_set);
+ err = super_types[mddev->major_version]
+ .load_super(rdev, rdev0, mddev->minor_version);
+ if (err < 0)
+ goto out;
+ }
+ } else
+ rdev = md_import_device(dev, -1, -1);
+
+ if (IS_ERR(rdev))
+ return PTR_ERR(rdev);
+ err = bind_rdev_to_array(rdev, mddev);
+ out:
+ if (err)
+ export_rdev(rdev);
+ return err ? err : len;
+}
+
+static struct md_sysfs_entry md_new_device =
+__ATTR(new_dev, 0200, null_show, new_dev_store);
static ssize_t
size_show(mddev_t *mddev, char *page)
@@ -2144,6 +2203,7 @@ static struct attribute *md_default_attr
&md_chunk_size.attr,
&md_size.attr,
&md_metadata.attr,
+ &md_new_device.attr,
NULL,
};
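The parsing in new_dev_store() accepts major:minor with an optional trailing newline and rejects numbers that do not survive a round trip through MKDEV(). A userspace sketch of the same checks follows. It assumes the in-kernel dev_t layout of this era (a 32-bit value with a 20-bit minor field, i.e. MINORBITS == 20); parse_new_dev(), MODEL_MINORBITS and MODEL_MINORMASK are hypothetical names, and the sketch stops before md_import_device() and bind_rdev_to_array(), which have no userspace equivalent.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed in-kernel dev_t layout: 32 bits, low 20 bits are the minor. */
#define MODEL_MINORBITS 20
#define MODEL_MINORMASK ((1UL << MODEL_MINORBITS) - 1)

/* Hypothetical userspace mirror of the checks in new_dev_store().
 * Returns 0 and stores the packed device number in *devp, or a
 * negative errno value. */
static int parse_new_dev(const char *buf, uint32_t *devp)
{
	char *e;
	unsigned long maj, mnr;
	uint32_t dev;

	maj = strtoul(buf, &e, 10);
	/* Need a ':' right after the major, and something after the ':'. */
	if (!*buf || *e != ':' || !e[1] || e[1] == '\n')
		return -EINVAL;
	mnr = strtoul(e + 1, &e, 10);
	/* Reject trailing junk; a newline (as echo adds) is tolerated. */
	if (*e && *e != '\n')
		return -EINVAL;

	/* Pack like MKDEV(), then unpack like MAJOR()/MINOR(); any bits
	 * lost in packing show up as a mismatch. */
	dev = (uint32_t)((maj << MODEL_MINORBITS) | mnr);
	if (maj != (dev >> MODEL_MINORBITS) || mnr != (dev & MODEL_MINORMASK))
		return -EOVERFLOW;
	*devp = dev;
	return 0;
}
```

So writing 8:0 passes, 8: fails with -EINVAL, and 4096:0 fails with -EOVERFLOW because a 12-bit major field cannot hold 4096.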
* [PATCH md 020 of 20] Allow sync-speed to be controlled per-device
2005-12-12 3:10 [PATCH md 000 of 20] Introduction NeilBrown
` (18 preceding siblings ...)
2005-12-12 3:15 ` [PATCH md 019 of 20] Support adding new devices to md arrays " NeilBrown
@ 2005-12-12 3:15 ` NeilBrown
2005-12-12 11:30 ` Jeff Breidenbach
19 siblings, 1 reply; 23+ messages in thread
From: NeilBrown @ 2005-12-12 3:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Also export current (average) speed and status in sysfs.
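The two read-only status files added here can be consumed from userspace. The sketch below is a hypothetical helper pair, not part of the patch: sync_percent() parses the "done / total" text that sync_completed_show() prints (both values in sectors) and returns the percentage complete, and sync_kib_per_sec() repeats the arithmetic of sync_speed_show(), dividing sectors done since the last mark by elapsed whole seconds (at least 1) and halving to convert 512-byte sectors to KiB.

```c
#include <stdio.h>

/* Hypothetical helper: parse "done / total" as printed by
 * sync_completed_show() and return percent complete, or -1.0 if the
 * text does not match or total is zero. */
static double sync_percent(const char *text)
{
	unsigned long done, total;

	if (sscanf(text, "%lu / %lu", &done, &total) != 2 || total == 0)
		return -1.0;
	return 100.0 * (double)done / (double)total;
}

/* Same arithmetic as sync_speed_show(): sectors per second, halved
 * to give KiB per second; a zero interval is treated as one second. */
static unsigned long sync_kib_per_sec(unsigned long sectors_done,
				      unsigned long seconds)
{
	if (seconds == 0)
		seconds = 1;
	return sectors_done / seconds / 2;
}
```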
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 22 ++++++++
./drivers/md/md.c | 110 ++++++++++++++++++++++++++++++++++++++++++--
./include/linux/raid/md_k.h | 4 +
3 files changed, 131 insertions(+), 5 deletions(-)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2005-12-12 12:12:50.000000000 +1100
+++ ./Documentation/md.txt 2005-12-12 12:12:53.000000000 +1100
@@ -207,6 +207,28 @@ All md devices contain:
available. It will then appear at md/dev-XXX (depending on the
name of the device) and further configuration is then possible.
+ sync_speed_min
+ sync_speed_max
+ These are similar to /proc/sys/dev/raid/speed_limit_{min,max},
+ but they apply only to this particular array.
+ If no value has been written to these, or if the word 'system'
+ is written, then the system-wide value is used. If a value
+ in kibibytes per second is written, then it is used.
+ When the files are read, they show the currently active value
+ followed by "(local)" or "(system)" depending on whether it is
+ a locally set or system-wide value.
+
+ sync_completed
+ This shows how many sectors of the current sync_action have
+ been completed, followed by the total number of sectors that
+ may need to be processed. The two numbers are separated by a
+ '/', so together they show the fraction of the process that
+ is complete.
+
+ sync_speed
+ This shows the current actual speed, in K/sec, of the current
+ sync_action. It is averaged over the last 30 seconds.
+
As component devices are added to an md array, they appear in the 'md'
directory as new directories named
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-12 12:12:50.000000000 +1100
+++ ./drivers/md/md.c 2005-12-12 12:12:57.000000000 +1100
@@ -81,10 +81,22 @@ static DEFINE_SPINLOCK(pers_lock);
* idle IO detection.
*
* you can change it via /proc/sys/dev/raid/speed_limit_min and _max.
+ * or /sys/block/mdX/md/sync_speed_{min,max}
*/
static int sysctl_speed_limit_min = 1000;
static int sysctl_speed_limit_max = 200000;
+static inline int speed_min(mddev_t *mddev)
+{
+ return mddev->sync_speed_min ?
+ mddev->sync_speed_min : sysctl_speed_limit_min;
+}
+
+static inline int speed_max(mddev_t *mddev)
+{
+ return mddev->sync_speed_max ?
+ mddev->sync_speed_max : sysctl_speed_limit_max;
+}
static struct ctl_table_header *raid_table_header;
@@ -2197,6 +2209,90 @@ md_scan_mode = __ATTR(sync_action, S_IRU
static struct md_sysfs_entry
md_mismatches = __ATTR_RO(mismatch_cnt);
+static ssize_t
+sync_min_show(mddev_t *mddev, char *page)
+{
+ return sprintf(page, "%d (%s)\n", speed_min(mddev),
+ mddev->sync_speed_min ? "local": "system");
+}
+
+static ssize_t
+sync_min_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ int min;
+ char *e;
+ if (strncmp(buf, "system", 6)==0) {
+ mddev->sync_speed_min = 0;
+ return len;
+ }
+ min = simple_strtoul(buf, &e, 10);
+ if (buf == e || (*e && *e != '\n') || min <= 0)
+ return -EINVAL;
+ mddev->sync_speed_min = min;
+ return len;
+}
+
+static struct md_sysfs_entry md_sync_min =
+__ATTR(sync_speed_min, S_IRUGO|S_IWUSR, sync_min_show, sync_min_store);
+
+static ssize_t
+sync_max_show(mddev_t *mddev, char *page)
+{
+ return sprintf(page, "%d (%s)\n", speed_max(mddev),
+ mddev->sync_speed_max ? "local": "system");
+}
+
+static ssize_t
+sync_max_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ int max;
+ char *e;
+ if (strncmp(buf, "system", 6)==0) {
+ mddev->sync_speed_max = 0;
+ return len;
+ }
+ max = simple_strtoul(buf, &e, 10);
+ if (buf == e || (*e && *e != '\n') || max <= 0)
+ return -EINVAL;
+ mddev->sync_speed_max = max;
+ return len;
+}
+
+static struct md_sysfs_entry md_sync_max =
+__ATTR(sync_speed_max, S_IRUGO|S_IWUSR, sync_max_show, sync_max_store);
+
+
+static ssize_t
+sync_speed_show(mddev_t *mddev, char *page)
+{
+ unsigned long resync, dt, db;
+ resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active));
+ dt = ((jiffies - mddev->resync_mark) / HZ);
+ if (!dt) dt++;
+ db = resync - (mddev->resync_mark_cnt);
+ return sprintf(page, "%lu\n", db/dt/2); /* K/sec */
+}
+
+static struct md_sysfs_entry
+md_sync_speed = __ATTR_RO(sync_speed);
+
+static ssize_t
+sync_completed_show(mddev_t *mddev, char *page)
+{
+ unsigned long max_blocks, resync;
+
+ if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
+ max_blocks = mddev->resync_max_sectors;
+ else
+ max_blocks = mddev->size << 1;
+
+ resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active));
+ return sprintf(page, "%lu / %lu\n", resync, max_blocks);
+}
+
+static struct md_sysfs_entry
+md_sync_completed = __ATTR_RO(sync_completed);
+
static struct attribute *md_default_attrs[] = {
&md_level.attr,
&md_raid_disks.attr,
@@ -2210,6 +2306,10 @@ static struct attribute *md_default_attr
static struct attribute *md_redundancy_attrs[] = {
&md_scan_mode.attr,
&md_mismatches.attr,
+ &md_sync_min.attr,
+ &md_sync_max.attr,
+ &md_sync_speed.attr,
+ &md_sync_completed.attr,
NULL,
};
static struct attribute_group md_redundancy_group = {
@@ -4425,10 +4525,10 @@ static void md_do_sync(mddev_t *mddev)
printk(KERN_INFO "md: syncing RAID array %s\n", mdname(mddev));
printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed:"
- " %d KB/sec/disc.\n", sysctl_speed_limit_min);
+ " %d KB/sec/disc.\n", speed_min(mddev));
printk(KERN_INFO "md: using maximum available idle IO bandwidth "
"(but not more than %d KB/sec) for reconstruction.\n",
- sysctl_speed_limit_max);
+ speed_max(mddev));
is_mddev_idle(mddev); /* this also initializes IO event counters */
/* we don't use the checkpoint if there's a bitmap */
@@ -4469,7 +4569,7 @@ static void md_do_sync(mddev_t *mddev)
skipped = 0;
sectors = mddev->pers->sync_request(mddev, j, &skipped,
- currspeed < sysctl_speed_limit_min);
+ currspeed < speed_min(mddev));
if (sectors == 0) {
set_bit(MD_RECOVERY_ERR, &mddev->recovery);
goto out;
@@ -4534,8 +4634,8 @@ static void md_do_sync(mddev_t *mddev)
currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2
/((jiffies-mddev->resync_mark)/HZ +1) +1;
- if (currspeed > sysctl_speed_limit_min) {
- if ((currspeed > sysctl_speed_limit_max) ||
+ if (currspeed > speed_min(mddev)) {
+ if ((currspeed > speed_max(mddev)) ||
!is_mddev_idle(mddev)) {
msleep(500);
goto repeat;
diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~ 2005-12-12 12:12:50.000000000 +1100
+++ ./include/linux/raid/md_k.h 2005-12-12 12:12:53.000000000 +1100
@@ -143,6 +143,10 @@ struct mddev_s
sector_t resync_mismatches; /* count of sectors where
* parity/replica mismatch found
*/
+ /* if zero, use the system-wide default */
+ int sync_speed_min;
+ int sync_speed_max;
+
int ok_start_degraded;
/* recovery/resync flags
* NEEDED: we might need to start a resync/recover
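The per-array limits fall back to the sysctl values whenever the per-array field is zero. The following userspace model sketches that rule together with the show and store formats for the minimum limit; sync_speed_max behaves symmetrically. The names model_mddev, model_speed_min(), model_min_show() and model_min_store() are hypothetical, system_speed_min stands in for sysctl_speed_limit_min (initialised to the patch's default of 1000), strtol() replaces simple_strtoul(), and the INT_MAX bound is an extra check the patch does not make.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for sysctl_speed_limit_min, using the patch's default. */
static int system_speed_min = 1000;

/* Hypothetical stand-in for the new mddev field. */
struct model_mddev {
	int sync_speed_min; /* 0 = use the system-wide value */
};

/* Same selection rule as speed_min() in the patch. */
static int model_speed_min(const struct model_mddev *m)
{
	return m->sync_speed_min ? m->sync_speed_min : system_speed_min;
}

/* Same output format as sync_min_show(): "<value> (local|system)\n". */
static int model_min_show(const struct model_mddev *m, char *page, size_t n)
{
	return snprintf(page, n, "%d (%s)\n", model_speed_min(m),
			m->sync_speed_min ? "local" : "system");
}

/* Accepts "system" or a positive decimal, like sync_min_store(). */
static int model_min_store(struct model_mddev *m, const char *buf)
{
	char *e;
	long min;

	if (strncmp(buf, "system", 6) == 0) {
		m->sync_speed_min = 0; /* revert to the system-wide value */
		return 0;
	}
	min = strtol(buf, &e, 10);
	/* The INT_MAX bound is an addition; the patch does not check it. */
	if (buf == e || (*e && *e != '\n') || min <= 0 || min > INT_MAX)
		return -EINVAL;
	m->sync_speed_min = (int)min;
	return 0;
}
```

Because md_do_sync() now calls speed_min(mddev) and speed_max(mddev) rather than reading the sysctls directly, a value written to one array's sync_speed_min changes only that array's throttling.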
* Re: [PATCH md 013 of 20] Count corrected read errors per drive.
2005-12-12 3:15 ` [PATCH md 013 of 20] Count corrected read errors per drive NeilBrown
@ 2005-12-12 10:07 ` Andrew Morton
0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2005-12-12 10:07 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
NeilBrown <neilb@suse.de> wrote:
>
> + errors
> + An approximate count of read errors that have been detected on
> + this device but have not caused the device to be evicted from
> + the array (either because they were corrected or because they
> + happened while the array was read-only). When using version-1
> + metadata, this value persists across restarts of the array.
Looks funny. Mixture of tabs and spaces I guess.
<fixes it>
* Re: [PATCH md 020 of 20] Allow sync-speed to be controlled per-device
2005-12-12 3:15 ` [PATCH md 020 of 20] Allow sync-speed to be controlled per-device NeilBrown
@ 2005-12-12 11:30 ` Jeff Breidenbach
0 siblings, 0 replies; 23+ messages in thread
From: Jeff Breidenbach @ 2005-12-12 11:30 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
Typo.
of if -> or if
On Sun, 11 Dec 2005 7:16 pm, NeilBrown wrote:
>
> Also export current (average) speed and status in sysfs.
>
>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> ### Diffstat output
> ./Documentation/md.txt | 22 ++++++++
> ./drivers/md/md.c | 110 ++++++++++++++++++++++++++++++++++++++++++--
> ./include/linux/raid/md_k.h | 4 +
> 3 files changed, 131 insertions(+), 5 deletions(-)
>
> diff ./Documentation/md.txt~current~ ./Documentation/md.txt
> --- ./Documentation/md.txt~current~ 2005-12-12 12:12:50.000000000 +1100
> +++ ./Documentation/md.txt 2005-12-12 12:12:53.000000000 +1100
> @@ -207,6 +207,28 @@ All md devices contain:
> available. It will then appear at md/dev-XXX (depending on the
> name of the device) and further configuration is then possible.
>
> + sync_speed_min
> + sync_speed_max
> + This are similar to /proc/sys/dev/raid/speed_limit_{min,max}
> + however they only apply to the particular array.
> + If no value has been written to these, of if the word 'system'
> + is written, then the system-wide value is used. If a value,
> + in kibibytes-per-second is written, then it is used.
> + When the files are read, they show the currently active value
> + followed by "(local)" or "(system)" depending on whether it is
> + a locally set or system-wide value.
--Jeff
end of thread [2005-12-12 11:30 UTC]
Thread overview: 23+ messages
2005-12-12 3:10 [PATCH md 000 of 20] Introduction NeilBrown
2005-12-12 3:10 ` [PATCH md 001 of 20] Fix a use-after-free bug in raid1 NeilBrown
2005-12-12 3:10 ` [PATCH md 002 of 20] Use correct size of raid5 stripe cache when measuring how full it is NeilBrown
2005-12-12 3:10 ` [PATCH md 003 of 20] Define and use safe_put_page for md NeilBrown
2005-12-12 3:13 ` [PATCH md 004 of 20] Helper function to match commands written to sysfs files NeilBrown
2005-12-12 3:14 ` [PATCH md 005 of 20] Fix typo in comment NeilBrown
2005-12-12 3:14 ` [PATCH md 006 of 20] Make a couple of names in md.c static NeilBrown
2005-12-12 3:14 ` [PATCH md 007 of 20] Make sure bitmap updates are visible through filesystem NeilBrown
2005-12-12 3:14 ` [PATCH md 008 of 20] Fix rdev->pending counts in raid1 NeilBrown
2005-12-12 3:14 ` [PATCH md 009 of 20] Allow chunk_size to be settable through sysfs NeilBrown
2005-12-12 3:15 ` [PATCH md 010 of 20] Allow md array component size to be accessed and set via sysfs NeilBrown
2005-12-12 3:15 ` [PATCH md 011 of 20] Expose md metadata format in sysfs NeilBrown
2005-12-12 3:15 ` [PATCH md 012 of 20] Allow array level to be set textually via sysfs NeilBrown
2005-12-12 3:15 ` [PATCH md 013 of 20] Count corrected read errors per drive NeilBrown
2005-12-12 10:07 ` Andrew Morton
2005-12-12 3:15 ` [PATCH md 014 of 20] Allow md/raid_disks to be settable NeilBrown
2005-12-12 3:15 ` [PATCH md 015 of 20] Keep better track of dev/array size when assembling md arrays NeilBrown
2005-12-12 3:15 ` [PATCH md 016 of 20] Expose device slot information via sysfs NeilBrown
2005-12-12 3:15 ` [PATCH md 017 of 20] Export rdev->data_offset " NeilBrown
2005-12-12 3:15 ` [PATCH md 018 of 20] Allow available size of component devices to be set " NeilBrown
2005-12-12 3:15 ` [PATCH md 019 of 20] Support adding new devices to md arrays " NeilBrown
2005-12-12 3:15 ` [PATCH md 020 of 20] Allow sync-speed to be controlled per-device NeilBrown
2005-12-12 11:30 ` Jeff Breidenbach