* [PATCH md 0 of 6] Introduction
@ 2005-02-07 2:44 NeilBrown
2005-02-07 2:44 ` [PATCH md 6 of 6] Fix handling of overlapping requests in raid5 NeilBrown
` (5 more replies)
0 siblings, 6 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07 2:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Following are 6 patches for md in 2.6.11-rc2-mm2
I believe they are all suitable for inclusion in 2.6.11
- Fix silly bugs in version-1 superblock code (which currently is
not widely used)
- Fix possible Oops in raid_set_faulty (which is hard to trigger in
normal usage)
- Make md work better with devfs when it is compiled in.
- Fix a possible endless-resync-loop
- Removing the for loop in "copy_data" (raid5 and raid6) which caused
some copinging to be done multiple times (should be harmless, but is
a waste)
- Fix handling for overlapping requests to raid5 and raid6. -mm currently
has a patch which is even more broken than the original code. Adding
this patch fixes it up.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH md 2 of 6] Prevent oops when drive set faulty in inactive md array.
2005-02-07 2:44 [PATCH md 0 of 6] Introduction NeilBrown
` (2 preceding siblings ...)
2005-02-07 2:44 ` [PATCH md 4 of 6] Fix endless loop when syncing an array that doesn't need any resync NeilBrown
@ 2005-02-07 2:44 ` NeilBrown
2005-02-07 2:44 ` [PATCH md 5 of 6] Remove extra loop from copy_data NeilBrown
2005-02-07 2:44 ` [PATCH md 3 of 6] Make md work a bit better with devfs NeilBrown
5 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07 2:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
hot_add_disk and hot_remove_disk check mddev->pers before proceeding.
set_disk_faulty should too.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./drivers/md/md.c | 3 +++
1 files changed, 3 insertions(+)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-02-07 13:31:08.000000000 +1100
+++ ./drivers/md/md.c 2005-02-07 13:32:00.000000000 +1100
@@ -2489,6 +2489,9 @@ static int set_disk_faulty(mddev_t *mdde
{
mdk_rdev_t *rdev;
+ if (mddev->pers == NULL)
+ return -ENODEV;
+
rdev = find_rdev(mddev, dev);
if (!rdev)
return -ENODEV;
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH md 3 of 6] Make md work a bit better with devfs
2005-02-07 2:44 [PATCH md 0 of 6] Introduction NeilBrown
` (4 preceding siblings ...)
2005-02-07 2:44 ` [PATCH md 5 of 6] Remove extra loop from copy_data NeilBrown
@ 2005-02-07 2:44 ` NeilBrown
5 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07 2:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
- set ->devfs_name
- create initial devfs names slightly differently so
as not to conflict
- re-read partition table when an array is assembled at boot
time - not sure why this is needed, but it is.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./drivers/md/md.c | 9 ++++++---
./init/do_mounts_md.c | 10 ++++++++++
2 files changed, 16 insertions(+), 3 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-02-07 13:32:00.000000000 +1100
+++ ./drivers/md/md.c 2005-02-07 13:32:22.000000000 +1100
@@ -1518,10 +1518,13 @@ static struct kobject *md_probe(dev_t de
}
disk->major = MAJOR(dev);
disk->first_minor = unit << shift;
- if (partitioned)
+ if (partitioned) {
sprintf(disk->disk_name, "md_d%d", unit);
- else
+ sprintf(disk->devfs_name, "md/d%d", unit);
+ } else {
sprintf(disk->disk_name, "md%d", unit);
+ sprintf(disk->devfs_name, "md/%d", unit);
+ }
disk->fops = &md_fops;
disk->private_data = mddev;
disk->queue = mddev->queue;
@@ -3773,7 +3776,7 @@ int __init md_init(void)
for (minor=0; minor < MAX_MD_DEVS; ++minor)
devfs_mk_bdev(MKDEV(mdp_major, minor<<MdpMinorShift),
S_IFBLK|S_IRUSR|S_IWUSR,
- "md/d%d", minor);
+ "md/mdp%d", minor);
register_reboot_notifier(&md_notifier);
diff ./init/do_mounts_md.c~current~ ./init/do_mounts_md.c
--- ./init/do_mounts_md.c~current~ 2005-02-07 13:31:05.000000000 +1100
+++ ./init/do_mounts_md.c 2005-02-07 13:32:22.000000000 +1100
@@ -232,6 +232,16 @@ static void __init md_setup_drive(void)
err = sys_ioctl(fd, RUN_ARRAY, 0);
if (err)
printk(KERN_WARNING "md: starting md%d failed\n", minor);
+ else {
+ /* reread the partition table.
+ * I (neilb) and not sure why this is needed, but I cannot
+ * boot a kernel with devfs compiled in from partitioned md
+ * array without it
+ */
+ sys_close(fd);
+ fd = sys_open(name, 0, 0);
+ sys_ioctl(fd, BLKRRPART, 0);
+ }
sys_close(fd);
}
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH md 5 of 6] Remove extra loop from copy_data
2005-02-07 2:44 [PATCH md 0 of 6] Introduction NeilBrown
` (3 preceding siblings ...)
2005-02-07 2:44 ` [PATCH md 2 of 6] Prevent oops when drive set faulty in inactive md array NeilBrown
@ 2005-02-07 2:44 ` NeilBrown
2005-02-07 2:44 ` [PATCH md 3 of 6] Make md work a bit better with devfs NeilBrown
5 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07 2:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
copy_data currently loops over bio's in a list, but the
caller also does the same looping, sometimes with extra work.
So remove the loop from copy_data.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./drivers/md/raid5.c | 72 ++++++++++++++++++++++-------------------------
./drivers/md/raid6main.c | 65 ++++++++++++++++++++----------------------
2 files changed, 65 insertions(+), 72 deletions(-)
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-02-07 13:31:05.000000000 +1100
+++ ./drivers/md/raid5.c 2005-02-07 13:34:09.000000000 +1100
@@ -613,11 +613,10 @@ static sector_t compute_blocknr(struct s
/*
- * Copy data between a page in the stripe cache, and one or more bion
- * The page could align with the middle of the bio, or there could be
- * several bion, each with several bio_vecs, which cover part of the page
- * Multiple bion are linked together on bi_next. There may be extras
- * at the end of this list. We ignore them.
+ * Copy data between a page in the stripe cache, and a bio.
+ * There are no alignment or size guarantees between the page or the
+ * bio except that there is some overlap.
+ * All iovecs in the bio must be considered.
*/
static void copy_data(int frombio, struct bio *bio,
struct page *page,
@@ -626,41 +625,38 @@ static void copy_data(int frombio, struc
char *pa = page_address(page);
struct bio_vec *bvl;
int i;
+ int page_offset;
- for (;bio && bio->bi_sector < sector+STRIPE_SECTORS;
- bio = r5_next_bio(bio, sector) ) {
- int page_offset;
- if (bio->bi_sector >= sector)
- page_offset = (signed)(bio->bi_sector - sector) * 512;
- else
- page_offset = (signed)(sector - bio->bi_sector) * -512;
- bio_for_each_segment(bvl, bio, i) {
- int len = bio_iovec_idx(bio,i)->bv_len;
- int clen;
- int b_offset = 0;
-
- if (page_offset < 0) {
- b_offset = -page_offset;
- page_offset += b_offset;
- len -= b_offset;
- }
-
- if (len > 0 && page_offset + len > STRIPE_SIZE)
- clen = STRIPE_SIZE - page_offset;
- else clen = len;
+ if (bio->bi_sector >= sector)
+ page_offset = (signed)(bio->bi_sector - sector) * 512;
+ else
+ page_offset = (signed)(sector - bio->bi_sector) * -512;
+ bio_for_each_segment(bvl, bio, i) {
+ int len = bio_iovec_idx(bio,i)->bv_len;
+ int clen;
+ int b_offset = 0;
+
+ if (page_offset < 0) {
+ b_offset = -page_offset;
+ page_offset += b_offset;
+ len -= b_offset;
+ }
+
+ if (len > 0 && page_offset + len > STRIPE_SIZE)
+ clen = STRIPE_SIZE - page_offset;
+ else clen = len;
- if (clen > 0) {
- char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
- if (frombio)
- memcpy(pa+page_offset, ba+b_offset, clen);
- else
- memcpy(ba+b_offset, pa+page_offset, clen);
- __bio_kunmap_atomic(ba, KM_USER0);
- }
- if (clen < len) /* hit end of page */
- break;
- page_offset += len;
- }
+ if (clen > 0) {
+ char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
+ if (frombio)
+ memcpy(pa+page_offset, ba+b_offset, clen);
+ else
+ memcpy(ba+b_offset, pa+page_offset, clen);
+ __bio_kunmap_atomic(ba, KM_USER0);
+ }
+ if (clen < len) /* hit end of page */
+ break;
+ page_offset += len;
}
}
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-02-07 13:31:05.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-02-07 13:34:09.000000000 +1100
@@ -670,41 +670,38 @@ static void copy_data(int frombio, struc
char *pa = page_address(page);
struct bio_vec *bvl;
int i;
+ int page_offset;
- for (;bio && bio->bi_sector < sector+STRIPE_SECTORS;
- bio = r5_next_bio(bio, sector) ) {
- int page_offset;
- if (bio->bi_sector >= sector)
- page_offset = (signed)(bio->bi_sector - sector) * 512;
- else
- page_offset = (signed)(sector - bio->bi_sector) * -512;
- bio_for_each_segment(bvl, bio, i) {
- int len = bio_iovec_idx(bio,i)->bv_len;
- int clen;
- int b_offset = 0;
-
- if (page_offset < 0) {
- b_offset = -page_offset;
- page_offset += b_offset;
- len -= b_offset;
- }
-
- if (len > 0 && page_offset + len > STRIPE_SIZE)
- clen = STRIPE_SIZE - page_offset;
- else clen = len;
-
- if (clen > 0) {
- char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
- if (frombio)
- memcpy(pa+page_offset, ba+b_offset, clen);
- else
- memcpy(ba+b_offset, pa+page_offset, clen);
- __bio_kunmap_atomic(ba, KM_USER0);
- }
- if (clen < len) /* hit end of page */
- break;
- page_offset += len;
- }
+ if (bio->bi_sector >= sector)
+ page_offset = (signed)(bio->bi_sector - sector) * 512;
+ else
+ page_offset = (signed)(sector - bio->bi_sector) * -512;
+ bio_for_each_segment(bvl, bio, i) {
+ int len = bio_iovec_idx(bio,i)->bv_len;
+ int clen;
+ int b_offset = 0;
+
+ if (page_offset < 0) {
+ b_offset = -page_offset;
+ page_offset += b_offset;
+ len -= b_offset;
+ }
+
+ if (len > 0 && page_offset + len > STRIPE_SIZE)
+ clen = STRIPE_SIZE - page_offset;
+ else clen = len;
+
+ if (clen > 0) {
+ char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
+ if (frombio)
+ memcpy(pa+page_offset, ba+b_offset, clen);
+ else
+ memcpy(ba+b_offset, pa+page_offset, clen);
+ __bio_kunmap_atomic(ba, KM_USER0);
+ }
+ if (clen < len) /* hit end of page */
+ break;
+ page_offset += len;
}
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH md 6 of 6] Fix handling of overlapping requests in raid5
2005-02-07 2:44 [PATCH md 0 of 6] Introduction NeilBrown
@ 2005-02-07 2:44 ` NeilBrown
2005-02-08 2:24 ` Andrew Morton
2005-02-07 2:44 ` [PATCH md 1 of 6] Fix problems with verion-1 superblock code NeilBrown
` (4 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2005-02-07 2:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
If we detect an overlap, we set a flag and wait for a wakeup.
When requests are handled, if the flag was set, we perform the
wakeup.
Note that the code currently in -mm is badly broken. With this
patch applied, it passes tests the use O_DIRECT to cause lots
of overlapping requests.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./drivers/md/raid5.c | 52 ++++++++++++++++++++++++++++++++---------
./drivers/md/raid6main.c | 54 +++++++++++++++++++++++++++++++++++--------
./include/linux/raid/raid5.h | 2 +
3 files changed, 87 insertions(+), 21 deletions(-)
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-02-07 13:34:09.000000000 +1100
+++ ./drivers/md/raid5.c 2005-02-07 13:34:23.000000000 +1100
@@ -49,7 +49,7 @@
* This macro is used to determine the 'next' bio in the list, given the sector
* of the current stripe+device
*/
-#define r5_next_bio(bio, sect) ( ( bio->bi_sector + (bio->bi_size>>9) < sect + STRIPE_SECTORS) ? bio->bi_next : NULL)
+#define r5_next_bio(bio, sect) ( ( (bio)->bi_sector + ((bio)->bi_size>>9) < sect + STRIPE_SECTORS) ? (bio)->bi_next : NULL)
/*
* The following can be used to debug the driver
*/
@@ -722,6 +722,10 @@ static void compute_parity(struct stripe
ptr[count++] = page_address(sh->dev[i].page);
chosen = sh->dev[i].towrite;
sh->dev[i].towrite = NULL;
+
+ if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+ wake_up(&conf->wait_for_overlap);
+
if (sh->dev[i].written) BUG();
sh->dev[i].written = chosen;
check_xor();
@@ -734,6 +738,10 @@ static void compute_parity(struct stripe
if (i!=pd_idx && sh->dev[i].towrite) {
chosen = sh->dev[i].towrite;
sh->dev[i].towrite = NULL;
+
+ if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+ wake_up(&conf->wait_for_overlap);
+
if (sh->dev[i].written) BUG();
sh->dev[i].written = chosen;
}
@@ -790,7 +798,7 @@ static void compute_parity(struct stripe
* toread/towrite point to the first in a chain.
* The bi_next chain must be in order.
*/
-static int add_stripe_bio (struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
+static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
{
struct bio **bip;
raid5_conf_t *conf = sh->raid_conf;
@@ -808,9 +816,12 @@ static int add_stripe_bio (struct stripe
bip = &sh->dev[dd_idx].toread;
while (*bip && (*bip)->bi_sector < bi->bi_sector) {
if ((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector)
- return 0; /* cannot add just now due to overlap */
+ goto overlap;
bip = & (*bip)->bi_next;
}
+ if (*bip && (*bip)->bi_sector < bi->bi_sector + ((bi->bi_size)>>9))
+ goto overlap;
+
if (*bip && bi->bi_next && (*bip) != bi->bi_next)
BUG();
if (*bip)
@@ -825,7 +836,7 @@ static int add_stripe_bio (struct stripe
(unsigned long long)sh->sector, dd_idx);
if (forwrite) {
- /* check if page is coverred */
+ /* check if page is covered */
sector_t sector = sh->dev[dd_idx].sector;
for (bi=sh->dev[dd_idx].towrite;
sector < sh->dev[dd_idx].sector + STRIPE_SECTORS &&
@@ -838,6 +849,12 @@ static int add_stripe_bio (struct stripe
set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
}
return 1;
+
+ overlap:
+ set_bit(R5_Overlap, &sh->dev[dd_idx].flags);
+ spin_unlock_irq(&conf->device_lock);
+ spin_unlock(&sh->lock);
+ return 0;
}
@@ -898,6 +915,8 @@ static void handle_stripe(struct stripe_
spin_lock_irq(&conf->device_lock);
rbi = dev->toread;
dev->toread = NULL;
+ if (test_and_clear_bit(R5_Overlap, &dev->flags))
+ wake_up(&conf->wait_for_overlap);
spin_unlock_irq(&conf->device_lock);
while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) {
copy_data(0, rbi, dev->page, dev->sector);
@@ -945,6 +964,9 @@ static void handle_stripe(struct stripe_
sh->dev[i].towrite = NULL;
if (bi) to_write--;
+ if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+ wake_up(&conf->wait_for_overlap);
+
while (bi && bi->bi_sector < sh->dev[i].sector + STRIPE_SECTORS){
struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
clear_bit(BIO_UPTODATE, &bi->bi_flags);
@@ -973,6 +995,8 @@ static void handle_stripe(struct stripe_
if (!test_bit(R5_Insync, &sh->dev[i].flags)) {
bi = sh->dev[i].toread;
sh->dev[i].toread = NULL;
+ if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+ wake_up(&conf->wait_for_overlap);
if (bi) to_read--;
while (bi && bi->bi_sector < sh->dev[i].sector + STRIPE_SECTORS){
struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
@@ -1400,6 +1424,7 @@ static int make_request (request_queue_t
if ( bio_data_dir(bi) == WRITE )
md_write_start(mddev);
for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
+ DEFINE_WAIT(w);
new_sector = raid5_compute_sector(logical_sector,
raid_disks, data_disks, &dd_idx, &pd_idx, conf);
@@ -1408,25 +1433,29 @@ static int make_request (request_queue_t
(unsigned long long)new_sector,
(unsigned long long)logical_sector);
+ prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
+
+ retry:
sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
if (sh) {
-
- while (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
- /* add failed due to overlap. Flush everything
+ if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
+ /* Add failed due to overlap. Flush everything
* and wait a while
- * FIXME - overlapping requests should be handled better
*/
raid5_unplug_device(mddev->queue);
- set_current_state(TASK_UNINTERRUPTIBLE);
- schedule_timeout(1);
+ release_stripe(sh);
+ schedule();
+ goto retry;
}
-
+ finish_wait(&conf->wait_for_overlap, &w);
raid5_plug_device(conf);
handle_stripe(sh);
release_stripe(sh);
+
} else {
/* cannot get stripe for read-ahead, just give-up */
clear_bit(BIO_UPTODATE, &bi->bi_flags);
+ finish_wait(&conf->wait_for_overlap, &w);
break;
}
@@ -1574,6 +1603,7 @@ static int run (mddev_t *mddev)
spin_lock_init(&conf->device_lock);
init_waitqueue_head(&conf->wait_for_stripe);
+ init_waitqueue_head(&conf->wait_for_overlap);
INIT_LIST_HEAD(&conf->handle_list);
INIT_LIST_HEAD(&conf->delayed_list);
INIT_LIST_HEAD(&conf->inactive_list);
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-02-07 13:34:09.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-02-07 13:34:23.000000000 +1100
@@ -54,7 +54,7 @@
* This macro is used to determine the 'next' bio in the list, given the sector
* of the current stripe+device
*/
-#define r5_next_bio(bio, sect) ( ( bio->bi_sector + (bio->bi_size>>9) < sect + STRIPE_SECTORS) ? bio->bi_next : NULL)
+#define r5_next_bio(bio, sect) ( ( (bio)->bi_sector + ((bio)->bi_size>>9) < sect + STRIPE_SECTORS) ? (bio)->bi_next : NULL)
/*
* The following can be used to debug the driver
*/
@@ -690,7 +690,7 @@ static void copy_data(int frombio, struc
if (len > 0 && page_offset + len > STRIPE_SIZE)
clen = STRIPE_SIZE - page_offset;
else clen = len;
-
+
if (clen > 0) {
char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
if (frombio)
@@ -735,6 +735,10 @@ static void compute_parity(struct stripe
if ( i != pd_idx && i != qd_idx && sh->dev[i].towrite ) {
chosen = sh->dev[i].towrite;
sh->dev[i].towrite = NULL;
+
+ if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+ wake_up(&conf->wait_for_overlap);
+
if (sh->dev[i].written) BUG();
sh->dev[i].written = chosen;
}
@@ -897,7 +901,7 @@ static void compute_block_2(struct strip
* toread/towrite point to the first in a chain.
* The bi_next chain must be in order.
*/
-static void add_stripe_bio (struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
+static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
{
struct bio **bip;
raid6_conf_t *conf = sh->raid_conf;
@@ -914,10 +918,13 @@ static void add_stripe_bio (struct strip
else
bip = &sh->dev[dd_idx].toread;
while (*bip && (*bip)->bi_sector < bi->bi_sector) {
- BUG_ON((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector);
- bip = & (*bip)->bi_next;
+ if ((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector)
+ goto overlap;
+ bip = &(*bip)->bi_next;
}
-/* FIXME do I need to worry about overlapping bion */
+ if (*bip && (*bip)->bi_sector < bi->bi_sector + ((bi->bi_size)>>9))
+ goto overlap;
+
if (*bip && bi->bi_next && (*bip) != bi->bi_next)
BUG();
if (*bip)
@@ -932,7 +939,7 @@ static void add_stripe_bio (struct strip
(unsigned long long)sh->sector, dd_idx);
if (forwrite) {
- /* check if page is coverred */
+ /* check if page is covered */
sector_t sector = sh->dev[dd_idx].sector;
for (bi=sh->dev[dd_idx].towrite;
sector < sh->dev[dd_idx].sector + STRIPE_SECTORS &&
@@ -944,6 +951,13 @@ static void add_stripe_bio (struct strip
if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS)
set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
}
+ return 1;
+
+ overlap:
+ set_bit(R5_Overlap, &sh->dev[dd_idx].flags);
+ spin_unlock_irq(&conf->device_lock);
+ spin_unlock(&sh->lock);
+ return 0;
}
@@ -1007,6 +1021,8 @@ static void handle_stripe(struct stripe_
spin_lock_irq(&conf->device_lock);
rbi = dev->toread;
dev->toread = NULL;
+ if (test_and_clear_bit(R5_Overlap, &dev->flags))
+ wake_up(&conf->wait_for_overlap);
spin_unlock_irq(&conf->device_lock);
while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) {
copy_data(0, rbi, dev->page, dev->sector);
@@ -1056,6 +1072,9 @@ static void handle_stripe(struct stripe_
sh->dev[i].towrite = NULL;
if (bi) to_write--;
+ if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+ wake_up(&conf->wait_for_overlap);
+
while (bi && bi->bi_sector < sh->dev[i].sector + STRIPE_SECTORS){
struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
clear_bit(BIO_UPTODATE, &bi->bi_flags);
@@ -1084,6 +1103,8 @@ static void handle_stripe(struct stripe_
if (!test_bit(R5_Insync, &sh->dev[i].flags)) {
bi = sh->dev[i].toread;
sh->dev[i].toread = NULL;
+ if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+ wake_up(&conf->wait_for_overlap);
if (bi) to_read--;
while (bi && bi->bi_sector < sh->dev[i].sector + STRIPE_SECTORS){
struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
@@ -1563,6 +1584,7 @@ static int make_request (request_queue_t
if ( bio_data_dir(bi) == WRITE )
md_write_start(mddev);
for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
+ DEFINE_WAIT(w);
new_sector = raid6_compute_sector(logical_sector,
raid_disks, data_disks, &dd_idx, &pd_idx, conf);
@@ -1571,17 +1593,28 @@ static int make_request (request_queue_t
(unsigned long long)new_sector,
(unsigned long long)logical_sector);
+ prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
+
+ retry:
sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
if (sh) {
-
- add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK));
-
+ if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
+ /* Add failed due to overlap. Flush everything
+ * and wait a while
+ */
+ raid6_unplug_device(mddev->queue);
+ release_stripe(sh);
+ schedule();
+ goto retry;
+ }
+ finish_wait(&conf->wait_for_overlap, &w);
raid6_plug_device(conf);
handle_stripe(sh);
release_stripe(sh);
} else {
/* cannot get stripe for read-ahead, just give-up */
clear_bit(BIO_UPTODATE, &bi->bi_flags);
+ finish_wait(&conf->wait_for_overlap, &w);
break;
}
@@ -1729,6 +1762,7 @@ static int run (mddev_t *mddev)
spin_lock_init(&conf->device_lock);
init_waitqueue_head(&conf->wait_for_stripe);
+ init_waitqueue_head(&conf->wait_for_overlap);
INIT_LIST_HEAD(&conf->handle_list);
INIT_LIST_HEAD(&conf->delayed_list);
INIT_LIST_HEAD(&conf->inactive_list);
diff ./include/linux/raid/raid5.h~current~ ./include/linux/raid/raid5.h
--- ./include/linux/raid/raid5.h~current~ 2005-02-07 13:31:05.000000000 +1100
+++ ./include/linux/raid/raid5.h 2005-02-07 13:34:23.000000000 +1100
@@ -152,6 +152,7 @@ struct stripe_head {
#define R5_Wantread 4 /* want to schedule a read */
#define R5_Wantwrite 5
#define R5_Syncio 6 /* this io need to be accounted as resync io */
+#define R5_Overlap 7 /* There is a pending overlapping request on this block */
/*
* Write method
@@ -219,6 +220,7 @@ struct raid5_private_data {
atomic_t active_stripes;
struct list_head inactive_list;
wait_queue_head_t wait_for_stripe;
+ wait_queue_head_t wait_for_overlap;
int inactive_blocked; /* release of inactive stripes blocked,
* waiting for 25% to be free
*/
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH md 1 of 6] Fix problems with verion-1 superblock code
2005-02-07 2:44 [PATCH md 0 of 6] Introduction NeilBrown
2005-02-07 2:44 ` [PATCH md 6 of 6] Fix handling of overlapping requests in raid5 NeilBrown
@ 2005-02-07 2:44 ` NeilBrown
2005-02-07 2:44 ` [PATCH md 4 of 6] Fix endless loop when syncing an array that doesn't need any resync NeilBrown
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07 2:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
- off-by-one error
- missing recalc of checksum
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./drivers/md/md.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-02-07 13:31:05.000000000 +1100
+++ ./drivers/md/md.c 2005-02-07 13:31:08.000000000 +1100
@@ -980,8 +980,8 @@ static void super_1_sync(mddev_t *mddev,
max_dev = 0;
ITERATE_RDEV(mddev,rdev2,tmp)
- if (rdev2->desc_nr > max_dev)
- max_dev = rdev2->desc_nr;
+ if (rdev2->desc_nr+1 > max_dev)
+ max_dev = rdev2->desc_nr+1;
sb->max_dev = cpu_to_le32(max_dev);
for (i=0; i<max_dev;i++)
@@ -998,6 +998,7 @@ static void super_1_sync(mddev_t *mddev,
}
sb->recovery_offset = cpu_to_le64(0); /* not supported yet */
+ sb->sb_csum = calc_sb_1_csum(sb);
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH md 4 of 6] Fix endless loop when syncing an array that doesn't need any resync.
2005-02-07 2:44 [PATCH md 0 of 6] Introduction NeilBrown
2005-02-07 2:44 ` [PATCH md 6 of 6] Fix handling of overlapping requests in raid5 NeilBrown
2005-02-07 2:44 ` [PATCH md 1 of 6] Fix problems with verion-1 superblock code NeilBrown
@ 2005-02-07 2:44 ` NeilBrown
2005-02-07 2:44 ` [PATCH md 2 of 6] Prevent oops when drive set faulty in inactive md array NeilBrown
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07 2:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
If the resync checkpoint for an array is at the end of the array,
It doesn't get set to MAX_SECTOR, so resyncing will be retried.
By updating curr_resync early, this problem is fixed.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./drivers/md/md.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-02-07 13:32:22.000000000 +1100
+++ ./drivers/md/md.c 2005-02-07 13:33:59.000000000 +1100
@@ -3475,10 +3475,12 @@ static void md_do_sync(mddev_t *mddev)
init_waitqueue_head(&mddev->recovery_wait);
last_check = 0;
- if (j)
+ if (j>2) {
printk(KERN_INFO
"md: resuming recovery of %s from checkpoint.\n",
mdname(mddev));
+ mddev->curr_resync = j;
+ }
while (j < max_sectors) {
int sectors;
@@ -3565,7 +3567,7 @@ static void md_do_sync(mddev_t *mddev)
if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) &&
mddev->curr_resync > 2 &&
- mddev->curr_resync > mddev->recovery_cp) {
+ mddev->curr_resync >= mddev->recovery_cp) {
if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
printk(KERN_INFO
"md: checkpointing recovery of %s.\n",
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH md 6 of 6] Fix handling of overlapping requests in raid5
2005-02-07 2:44 ` [PATCH md 6 of 6] Fix handling of overlapping requests in raid5 NeilBrown
@ 2005-02-08 2:24 ` Andrew Morton
2005-02-08 2:39 ` Neil Brown
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2005-02-08 2:24 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
NeilBrown <neilb@cse.unsw.edu.au> wrote:
>
> + retry:
> sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
> if (sh) {
> -
> - while (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
> - /* add failed due to overlap. Flush everything
> + if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
> + /* Add failed due to overlap. Flush everything
> * and wait a while
> - * FIXME - overlapping requests should be handled better
> */
> raid5_unplug_device(mddev->queue);
> - set_current_state(TASK_UNINTERRUPTIBLE);
> - schedule_timeout(1);
> + release_stripe(sh);
> + schedule();
> + goto retry;
Worrisome. If the calling process has SCHED_RR or SCHED_FIFO policy, this
could cause a lockup, perhaps.
Some sort of real synchronisation scheme would be nicer. Or at the least,
what was wrong with the schedule_timeout(1)?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH md 6 of 6] Fix handling of overlapping requests in raid5
2005-02-08 2:24 ` Andrew Morton
@ 2005-02-08 2:39 ` Neil Brown
2005-02-08 3:07 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2005-02-08 2:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
On Monday February 7, akpm@osdl.org wrote:
> NeilBrown <neilb@cse.unsw.edu.au> wrote:
> >
> > + retry:
> > sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
> > if (sh) {
> > -
> > - while (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
> > - /* add failed due to overlap. Flush everything
> > + if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
> > + /* Add failed due to overlap. Flush everything
> > * and wait a while
> > - * FIXME - overlapping requests should be handled better
> > */
> > raid5_unplug_device(mddev->queue);
> > - set_current_state(TASK_UNINTERRUPTIBLE);
> > - schedule_timeout(1);
> > + release_stripe(sh);
> > + schedule();
> > + goto retry;
>
> Worrisome. If the calling process has SCHED_RR or SCHED_FIFO policy, this
> could cause a lockup, perhaps.
Is that just that I should have the "prepare_to_wait" after the retry:
label rather than before, or is there something more subtle.
>
> Some sort of real synchronisation scheme would be nicer. Or at the least,
> what was wrong with the schedule_timeout(1)?
schedule_timeout shouldn't be necessary as whenever the conflicting
bio gets dealt with, a wakeup will happen on the conf->wait_for_overlap wait_queue.
(schedule_timeout(1) by itself slowed things down too much. The
proactive waking was needed, and with it in place, the timeout becomes
irrelevant).
NeilBrown
--
md: prepare_to_wait needs to be in the loop which waits for overlapping requests to complete
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./drivers/md/raid5.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-02-07 13:34:23.000000000 +1100
+++ ./drivers/md/raid5.c 2005-02-08 13:34:38.000000000 +1100
@@ -1433,9 +1433,9 @@ static int make_request (request_queue_t
(unsigned long long)new_sector,
(unsigned long long)logical_sector);
+ retry:
prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
- retry:
sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
if (sh) {
if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH md 6 of 6] Fix handling of overlapping requests in raid5
2005-02-08 2:39 ` Neil Brown
@ 2005-02-08 3:07 ` Andrew Morton
0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2005-02-08 3:07 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> On Monday February 7, akpm@osdl.org wrote:
> > NeilBrown <neilb@cse.unsw.edu.au> wrote:
> > >
> > > + retry:
> > > sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
> > > if (sh) {
> > > -
> > > - while (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
> > > - /* add failed due to overlap. Flush everything
> > > + if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
> > > + /* Add failed due to overlap. Flush everything
> > > * and wait a while
> > > - * FIXME - overlapping requests should be handled better
> > > */
> > > raid5_unplug_device(mddev->queue);
> > > - set_current_state(TASK_UNINTERRUPTIBLE);
> > > - schedule_timeout(1);
> > > + release_stripe(sh);
> > > + schedule();
> > > + goto retry;
> >
> > Worrisome. If the calling process has SCHED_RR or SCHED_FIFO policy, this
> > could cause a lockup, perhaps.
>
> Is that just that I should have the "prepare_to_wait" after the retry:
> label rather than before, or is there something more subtle.
umm, yes. We always exit from schedule() in state TASK_RUNNING, so we need
to run prepare_to_wait() each time around. See __wait_on_bit() for a
canonical example.
> ### Diffstat output
> ./drivers/md/raid5.c | 2 +-
> 1 files changed, 1 insertion(+), 1 deletion(-)
>
> diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
> --- ./drivers/md/raid5.c~current~ 2005-02-07 13:34:23.000000000 +1100
> +++ ./drivers/md/raid5.c 2005-02-08 13:34:38.000000000 +1100
> @@ -1433,9 +1433,9 @@ static int make_request (request_queue_t
> (unsigned long long)new_sector,
> (unsigned long long)logical_sector);
>
> + retry:
> prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
>
> - retry:
> sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
> if (sh) {
> if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
You missed one.
--- 25/drivers/md/raid5.c~md-fix-handling-of-overlapping-requests-in-raid5-fix 2005-02-07 19:04:18.000000000 -0800
+++ 25-akpm/drivers/md/raid5.c 2005-02-07 19:04:48.000000000 -0800
@@ -1433,9 +1433,8 @@ static int make_request (request_queue_t
(unsigned long long)new_sector,
(unsigned long long)logical_sector);
- prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
-
retry:
+ prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
if (sh) {
if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
diff -puN drivers/md/raid6main.c~md-fix-handling-of-overlapping-requests-in-raid5-fix drivers/md/raid6main.c
--- 25/drivers/md/raid6main.c~md-fix-handling-of-overlapping-requests-in-raid5-fix 2005-02-07 19:04:18.000000000 -0800
+++ 25-akpm/drivers/md/raid6main.c 2005-02-07 19:04:54.000000000 -0800
@@ -1593,9 +1593,8 @@ static int make_request (request_queue_t
(unsigned long long)new_sector,
(unsigned long long)logical_sector);
- prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
-
retry:
+ prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
if (sh) {
if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
_
(That piece of the code looks pretty fugly in an 80-col xterm btw..)
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2005-02-08 3:07 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-02-07 2:44 [PATCH md 0 of 6] Introduction NeilBrown
2005-02-07 2:44 ` [PATCH md 6 of 6] Fix handling of overlapping requests in raid5 NeilBrown
2005-02-08 2:24 ` Andrew Morton
2005-02-08 2:39 ` Neil Brown
2005-02-08 3:07 ` Andrew Morton
2005-02-07 2:44 ` [PATCH md 1 of 6] Fix problems with verion-1 superblock code NeilBrown
2005-02-07 2:44 ` [PATCH md 4 of 6] Fix endless loop when syncing an array that doesn't need any resync NeilBrown
2005-02-07 2:44 ` [PATCH md 2 of 6] Prevent oops when drive set faulty in inactive md array NeilBrown
2005-02-07 2:44 ` [PATCH md 5 of 6] Remove extra loop from copy_data NeilBrown
2005-02-07 2:44 ` [PATCH md 3 of 6] Make md work a bit better with devfs NeilBrown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).