[PATCH md 0 of 6] Introduction

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH md 0 of 6] Introduction
@ 2005-02-07  2:44 NeilBrown
  2005-02-07  2:44 ` [PATCH md 6 of 6] Fix handling of overlapping requests in raid5 NeilBrown
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07  2:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


Following are 6 patches for md in 2.6.11-rc2-mm2

I believe they are all suitable for inclusion in 2.6.11

- Fix silly bugs in version-1 superblock code (which currently is 
   not widely used)
- Fix possible Oops in raid_set_faulty (which is hard to trigger in 
   normal usage)
- Make md work better with devfs when it is compiled in.
- Fix a possible endless-resync-loop
- Removing the for loop in "copy_data" (raid5 and raid6) which caused 
  some copinging to be done multiple times (should be harmless, but is 
  a waste)
- Fix handling for overlapping requests to raid5 and raid6.  -mm currently
  has a patch which is even more broken than the original code.  Adding
  this patch fixes it up.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH md 2 of 6] Prevent oops when drive set faulty in inactive md array.
  2005-02-07  2:44 [PATCH md 0 of 6] Introduction NeilBrown
                   ` (2 preceding siblings ...)
  2005-02-07  2:44 ` [PATCH md 4 of 6] Fix endless loop when syncing an array that doesn't need any resync NeilBrown
@ 2005-02-07  2:44 ` NeilBrown
  2005-02-07  2:44 ` [PATCH md 5 of 6] Remove extra loop from copy_data NeilBrown
  2005-02-07  2:44 ` [PATCH md 3 of 6] Make md work a bit better with devfs NeilBrown
  5 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07  2:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


hot_add_disk and hot_remove_disk check mddev->pers before proceeding.
set_disk_faulty should too.


Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/md.c |    3 +++
 1 files changed, 3 insertions(+)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-02-07 13:31:08.000000000 +1100
+++ ./drivers/md/md.c	2005-02-07 13:32:00.000000000 +1100
@@ -2489,6 +2489,9 @@ static int set_disk_faulty(mddev_t *mdde
 {
 	mdk_rdev_t *rdev;
 
+	if (mddev->pers == NULL)
+		return -ENODEV;
+
 	rdev = find_rdev(mddev, dev);
 	if (!rdev)
 		return -ENODEV;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH md 3 of 6] Make md work a bit better with devfs
  2005-02-07  2:44 [PATCH md 0 of 6] Introduction NeilBrown
                   ` (4 preceding siblings ...)
  2005-02-07  2:44 ` [PATCH md 5 of 6] Remove extra loop from copy_data NeilBrown
@ 2005-02-07  2:44 ` NeilBrown
  5 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07  2:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


- set ->devfs_name
- create initial devfs names slightly differently so
  as not to conflict
- re-read partition table when an array is assembled at boot
  time - not sure why this is needed, but it is.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/md.c     |    9 ++++++---
 ./init/do_mounts_md.c |   10 ++++++++++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-02-07 13:32:00.000000000 +1100
+++ ./drivers/md/md.c	2005-02-07 13:32:22.000000000 +1100
@@ -1518,10 +1518,13 @@ static struct kobject *md_probe(dev_t de
 	}
 	disk->major = MAJOR(dev);
 	disk->first_minor = unit << shift;
-	if (partitioned)
+	if (partitioned) {
 		sprintf(disk->disk_name, "md_d%d", unit);
-	else
+		sprintf(disk->devfs_name, "md/d%d", unit);
+	} else {
 		sprintf(disk->disk_name, "md%d", unit);
+		sprintf(disk->devfs_name, "md/%d", unit);
+	}
 	disk->fops = &md_fops;
 	disk->private_data = mddev;
 	disk->queue = mddev->queue;
@@ -3773,7 +3776,7 @@ int __init md_init(void)
 	for (minor=0; minor < MAX_MD_DEVS; ++minor)
 		devfs_mk_bdev(MKDEV(mdp_major, minor<<MdpMinorShift),
 			      S_IFBLK|S_IRUSR|S_IWUSR,
-			      "md/d%d", minor);
+			      "md/mdp%d", minor);
 
 
 	register_reboot_notifier(&md_notifier);

diff ./init/do_mounts_md.c~current~ ./init/do_mounts_md.c
--- ./init/do_mounts_md.c~current~	2005-02-07 13:31:05.000000000 +1100
+++ ./init/do_mounts_md.c	2005-02-07 13:32:22.000000000 +1100
@@ -232,6 +232,16 @@ static void __init md_setup_drive(void)
 			err = sys_ioctl(fd, RUN_ARRAY, 0);
 		if (err)
 			printk(KERN_WARNING "md: starting md%d failed\n", minor);
+		else {
+			/* reread the partition table.
+			 * I (neilb) and not sure why this is needed, but I cannot
+			 * boot a kernel with devfs compiled in from partitioned md 
+			 * array without it
+			 */
+			sys_close(fd);
+			fd = sys_open(name, 0, 0);
+			sys_ioctl(fd, BLKRRPART, 0);
+		}
 		sys_close(fd);
 	}
 }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH md 5 of 6] Remove extra loop from copy_data
  2005-02-07  2:44 [PATCH md 0 of 6] Introduction NeilBrown
                   ` (3 preceding siblings ...)
  2005-02-07  2:44 ` [PATCH md 2 of 6] Prevent oops when drive set faulty in inactive md array NeilBrown
@ 2005-02-07  2:44 ` NeilBrown
  2005-02-07  2:44 ` [PATCH md 3 of 6] Make md work a bit better with devfs NeilBrown
  5 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07  2:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


copy_data currently loops over bio's in a list, but the 
caller also does the same looping, sometimes with extra work.
So remove the loop from copy_data.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/raid5.c     |   72 ++++++++++++++++++++++-------------------------
 ./drivers/md/raid6main.c |   65 ++++++++++++++++++++----------------------
 2 files changed, 65 insertions(+), 72 deletions(-)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2005-02-07 13:31:05.000000000 +1100
+++ ./drivers/md/raid5.c	2005-02-07 13:34:09.000000000 +1100
@@ -613,11 +613,10 @@ static sector_t compute_blocknr(struct s
 
 
 /*
- * Copy data between a page in the stripe cache, and one or more bion
- * The page could align with the middle of the bio, or there could be 
- * several bion, each with several bio_vecs, which cover part of the page
- * Multiple bion are linked together on bi_next.  There may be extras
- * at the end of this list.  We ignore them.
+ * Copy data between a page in the stripe cache, and a bio.
+ * There are no alignment or size guarantees between the page or the
+ * bio except that there is some overlap.
+ * All iovecs in the bio must be considered. 
  */
 static void copy_data(int frombio, struct bio *bio,
 		     struct page *page,
@@ -626,41 +625,38 @@ static void copy_data(int frombio, struc
 	char *pa = page_address(page);
 	struct bio_vec *bvl;
 	int i;
+	int page_offset;
 
-	for (;bio && bio->bi_sector < sector+STRIPE_SECTORS;
-	      bio = r5_next_bio(bio, sector) ) {
-		int page_offset;
-		if (bio->bi_sector >= sector)
-			page_offset = (signed)(bio->bi_sector - sector) * 512;
-		else 
-			page_offset = (signed)(sector - bio->bi_sector) * -512;
-		bio_for_each_segment(bvl, bio, i) {
-			int len = bio_iovec_idx(bio,i)->bv_len;
-			int clen;
-			int b_offset = 0;			
-
-			if (page_offset < 0) {
-				b_offset = -page_offset;
-				page_offset += b_offset;
-				len -= b_offset;
-			}
-
-			if (len > 0 && page_offset + len > STRIPE_SIZE)
-				clen = STRIPE_SIZE - page_offset;	
-			else clen = len;
+	if (bio->bi_sector >= sector)
+		page_offset = (signed)(bio->bi_sector - sector) * 512;
+	else 
+		page_offset = (signed)(sector - bio->bi_sector) * -512;
+	bio_for_each_segment(bvl, bio, i) {
+		int len = bio_iovec_idx(bio,i)->bv_len;
+		int clen;
+		int b_offset = 0;			
+
+		if (page_offset < 0) {
+			b_offset = -page_offset;
+			page_offset += b_offset;
+			len -= b_offset;
+		}
+
+		if (len > 0 && page_offset + len > STRIPE_SIZE)
+			clen = STRIPE_SIZE - page_offset;	
+		else clen = len;
 			
-			if (clen > 0) {
-				char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
-				if (frombio)
-					memcpy(pa+page_offset, ba+b_offset, clen);
-				else
-					memcpy(ba+b_offset, pa+page_offset, clen);
-				__bio_kunmap_atomic(ba, KM_USER0);
-			}	
-			if (clen < len) /* hit end of page */
-				break;
-			page_offset +=  len;
-		}
+		if (clen > 0) {
+			char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
+			if (frombio)
+				memcpy(pa+page_offset, ba+b_offset, clen);
+			else
+				memcpy(ba+b_offset, pa+page_offset, clen);
+			__bio_kunmap_atomic(ba, KM_USER0);
+		}	
+		if (clen < len) /* hit end of page */
+			break;
+		page_offset +=  len;
 	}
 }
 

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~	2005-02-07 13:31:05.000000000 +1100
+++ ./drivers/md/raid6main.c	2005-02-07 13:34:09.000000000 +1100
@@ -670,41 +670,38 @@ static void copy_data(int frombio, struc
 	char *pa = page_address(page);
 	struct bio_vec *bvl;
 	int i;
+	int page_offset;
 
-	for (;bio && bio->bi_sector < sector+STRIPE_SECTORS;
-	      bio = r5_next_bio(bio, sector) ) {
-		int page_offset;
-		if (bio->bi_sector >= sector)
-			page_offset = (signed)(bio->bi_sector - sector) * 512;
-		else
-			page_offset = (signed)(sector - bio->bi_sector) * -512;
-		bio_for_each_segment(bvl, bio, i) {
-			int len = bio_iovec_idx(bio,i)->bv_len;
-			int clen;
-			int b_offset = 0;
-
-			if (page_offset < 0) {
-				b_offset = -page_offset;
-				page_offset += b_offset;
-				len -= b_offset;
-			}
-
-			if (len > 0 && page_offset + len > STRIPE_SIZE)
-				clen = STRIPE_SIZE - page_offset;
-			else clen = len;
-
-			if (clen > 0) {
-				char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
-				if (frombio)
-					memcpy(pa+page_offset, ba+b_offset, clen);
-				else
-					memcpy(ba+b_offset, pa+page_offset, clen);
-				__bio_kunmap_atomic(ba, KM_USER0);
-			}
-			if (clen < len) /* hit end of page */
-				break;
-			page_offset +=  len;
-		}
+	if (bio->bi_sector >= sector)
+		page_offset = (signed)(bio->bi_sector - sector) * 512;
+	else 
+		page_offset = (signed)(sector - bio->bi_sector) * -512;
+	bio_for_each_segment(bvl, bio, i) {
+		int len = bio_iovec_idx(bio,i)->bv_len;
+		int clen;
+		int b_offset = 0;			
+
+		if (page_offset < 0) {
+			b_offset = -page_offset;
+			page_offset += b_offset;
+			len -= b_offset;
+		}
+
+		if (len > 0 && page_offset + len > STRIPE_SIZE)
+			clen = STRIPE_SIZE - page_offset;	
+		else clen = len;
+			
+		if (clen > 0) {
+			char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
+			if (frombio)
+				memcpy(pa+page_offset, ba+b_offset, clen);
+			else
+				memcpy(ba+b_offset, pa+page_offset, clen);
+			__bio_kunmap_atomic(ba, KM_USER0);
+		}	
+		if (clen < len) /* hit end of page */
+			break;
+		page_offset +=  len;
 	}
 }
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH md 6 of 6] Fix handling of overlapping requests in raid5
  2005-02-07  2:44 [PATCH md 0 of 6] Introduction NeilBrown
@ 2005-02-07  2:44 ` NeilBrown
  2005-02-08  2:24   ` Andrew Morton
  2005-02-07  2:44 ` [PATCH md 1 of 6] Fix problems with verion-1 superblock code NeilBrown
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2005-02-07  2:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


If we detect an overlap, we set a flag and wait for a wakeup.
When requests are handled, if the flag was set, we perform the
wakeup.

Note that the code currently in -mm is badly broken.  With this
patch applied, it passes tests the use O_DIRECT to cause lots
of overlapping requests.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/raid5.c         |   52 ++++++++++++++++++++++++++++++++---------
 ./drivers/md/raid6main.c     |   54 +++++++++++++++++++++++++++++++++++--------
 ./include/linux/raid/raid5.h |    2 +
 3 files changed, 87 insertions(+), 21 deletions(-)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2005-02-07 13:34:09.000000000 +1100
+++ ./drivers/md/raid5.c	2005-02-07 13:34:23.000000000 +1100
@@ -49,7 +49,7 @@
  * This macro is used to determine the 'next' bio in the list, given the sector
  * of the current stripe+device
  */
-#define r5_next_bio(bio, sect) ( ( bio->bi_sector + (bio->bi_size>>9) < sect + STRIPE_SECTORS) ? bio->bi_next : NULL)
+#define r5_next_bio(bio, sect) ( ( (bio)->bi_sector + ((bio)->bi_size>>9) < sect + STRIPE_SECTORS) ? (bio)->bi_next : NULL)
 /*
  * The following can be used to debug the driver
  */
@@ -722,6 +722,10 @@ static void compute_parity(struct stripe
 				ptr[count++] = page_address(sh->dev[i].page);
 				chosen = sh->dev[i].towrite;
 				sh->dev[i].towrite = NULL;
+
+				if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+					wake_up(&conf->wait_for_overlap);
+
 				if (sh->dev[i].written) BUG();
 				sh->dev[i].written = chosen;
 				check_xor();
@@ -734,6 +738,10 @@ static void compute_parity(struct stripe
 			if (i!=pd_idx && sh->dev[i].towrite) {
 				chosen = sh->dev[i].towrite;
 				sh->dev[i].towrite = NULL;
+
+				if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+					wake_up(&conf->wait_for_overlap);
+
 				if (sh->dev[i].written) BUG();
 				sh->dev[i].written = chosen;
 			}
@@ -790,7 +798,7 @@ static void compute_parity(struct stripe
  * toread/towrite point to the first in a chain. 
  * The bi_next chain must be in order.
  */
-static int add_stripe_bio (struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
+static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
 {
 	struct bio **bip;
 	raid5_conf_t *conf = sh->raid_conf;
@@ -808,9 +816,12 @@ static int add_stripe_bio (struct stripe
 		bip = &sh->dev[dd_idx].toread;
 	while (*bip && (*bip)->bi_sector < bi->bi_sector) {
 		if ((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector)
-			return 0; /* cannot add just now due to overlap */
+			goto overlap;
 		bip = & (*bip)->bi_next;
 	}
+	if (*bip && (*bip)->bi_sector < bi->bi_sector + ((bi->bi_size)>>9)) 
+		goto overlap;
+
 	if (*bip && bi->bi_next && (*bip) != bi->bi_next)
 		BUG();
 	if (*bip)
@@ -825,7 +836,7 @@ static int add_stripe_bio (struct stripe
 		(unsigned long long)sh->sector, dd_idx);
 
 	if (forwrite) {
-		/* check if page is coverred */
+		/* check if page is covered */
 		sector_t sector = sh->dev[dd_idx].sector;
 		for (bi=sh->dev[dd_idx].towrite;
 		     sector < sh->dev[dd_idx].sector + STRIPE_SECTORS &&
@@ -838,6 +849,12 @@ static int add_stripe_bio (struct stripe
 			set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
 	}
 	return 1;
+
+ overlap:
+	set_bit(R5_Overlap, &sh->dev[dd_idx].flags);
+	spin_unlock_irq(&conf->device_lock);
+	spin_unlock(&sh->lock);
+	return 0;
 }
 
 
@@ -898,6 +915,8 @@ static void handle_stripe(struct stripe_
 			spin_lock_irq(&conf->device_lock);
 			rbi = dev->toread;
 			dev->toread = NULL;
+			if (test_and_clear_bit(R5_Overlap, &dev->flags))
+				wake_up(&conf->wait_for_overlap);
 			spin_unlock_irq(&conf->device_lock);
 			while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) {
 				copy_data(0, rbi, dev->page, dev->sector);
@@ -945,6 +964,9 @@ static void handle_stripe(struct stripe_
 			sh->dev[i].towrite = NULL;
 			if (bi) to_write--;
 
+			if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+				wake_up(&conf->wait_for_overlap);
+
 			while (bi && bi->bi_sector < sh->dev[i].sector + STRIPE_SECTORS){
 				struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
 				clear_bit(BIO_UPTODATE, &bi->bi_flags);
@@ -973,6 +995,8 @@ static void handle_stripe(struct stripe_
 			if (!test_bit(R5_Insync, &sh->dev[i].flags)) {
 				bi = sh->dev[i].toread;
 				sh->dev[i].toread = NULL;
+				if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+					wake_up(&conf->wait_for_overlap);
 				if (bi) to_read--;
 				while (bi && bi->bi_sector < sh->dev[i].sector + STRIPE_SECTORS){
 					struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
@@ -1400,6 +1424,7 @@ static int make_request (request_queue_t
 	if ( bio_data_dir(bi) == WRITE )
 		md_write_start(mddev);
 	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
+		DEFINE_WAIT(w);
 		
 		new_sector = raid5_compute_sector(logical_sector,
 						  raid_disks, data_disks, &dd_idx, &pd_idx, conf);
@@ -1408,25 +1433,29 @@ static int make_request (request_queue_t
 			(unsigned long long)new_sector, 
 			(unsigned long long)logical_sector);
 
+		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
+
+	retry:
 		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
 		if (sh) {
-
-			while (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
-				/* add failed due to overlap.  Flush everything
+			if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
+				/* Add failed due to overlap.  Flush everything
 				 * and wait a while
-				 * FIXME - overlapping requests should be handled better
 				 */
 				raid5_unplug_device(mddev->queue);
-				set_current_state(TASK_UNINTERRUPTIBLE);
-				schedule_timeout(1);
+				release_stripe(sh);
+				schedule();
+				goto retry;
 			}
-
+			finish_wait(&conf->wait_for_overlap, &w);
 			raid5_plug_device(conf);
 			handle_stripe(sh);
 			release_stripe(sh);
+			
 		} else {
 			/* cannot get stripe for read-ahead, just give-up */
 			clear_bit(BIO_UPTODATE, &bi->bi_flags);
+			finish_wait(&conf->wait_for_overlap, &w);
 			break;
 		}
 			
@@ -1574,6 +1603,7 @@ static int run (mddev_t *mddev)
 
 	spin_lock_init(&conf->device_lock);
 	init_waitqueue_head(&conf->wait_for_stripe);
+	init_waitqueue_head(&conf->wait_for_overlap);
 	INIT_LIST_HEAD(&conf->handle_list);
 	INIT_LIST_HEAD(&conf->delayed_list);
 	INIT_LIST_HEAD(&conf->inactive_list);

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~	2005-02-07 13:34:09.000000000 +1100
+++ ./drivers/md/raid6main.c	2005-02-07 13:34:23.000000000 +1100
@@ -54,7 +54,7 @@
  * This macro is used to determine the 'next' bio in the list, given the sector
  * of the current stripe+device
  */
-#define r5_next_bio(bio, sect) ( ( bio->bi_sector + (bio->bi_size>>9) < sect + STRIPE_SECTORS) ? bio->bi_next : NULL)
+#define r5_next_bio(bio, sect) ( ( (bio)->bi_sector + ((bio)->bi_size>>9) < sect + STRIPE_SECTORS) ? (bio)->bi_next : NULL)
 /*
  * The following can be used to debug the driver
  */
@@ -690,7 +690,7 @@ static void copy_data(int frombio, struc
 		if (len > 0 && page_offset + len > STRIPE_SIZE)
 			clen = STRIPE_SIZE - page_offset;	
 		else clen = len;
-			
+		
 		if (clen > 0) {
 			char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
 			if (frombio)
@@ -735,6 +735,10 @@ static void compute_parity(struct stripe
 			if ( i != pd_idx && i != qd_idx && sh->dev[i].towrite ) {
 				chosen = sh->dev[i].towrite;
 				sh->dev[i].towrite = NULL;
+
+				if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+					wake_up(&conf->wait_for_overlap);
+
 				if (sh->dev[i].written) BUG();
 				sh->dev[i].written = chosen;
 			}
@@ -897,7 +901,7 @@ static void compute_block_2(struct strip
  * toread/towrite point to the first in a chain.
  * The bi_next chain must be in order.
  */
-static void add_stripe_bio (struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
+static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, int forwrite)
 {
 	struct bio **bip;
 	raid6_conf_t *conf = sh->raid_conf;
@@ -914,10 +918,13 @@ static void add_stripe_bio (struct strip
 	else
 		bip = &sh->dev[dd_idx].toread;
 	while (*bip && (*bip)->bi_sector < bi->bi_sector) {
-		BUG_ON((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector);
-		bip = & (*bip)->bi_next;
+		if ((*bip)->bi_sector + ((*bip)->bi_size >> 9) > bi->bi_sector)
+			goto overlap;
+		bip = &(*bip)->bi_next;
 	}
-/* FIXME do I need to worry about overlapping bion */
+	if (*bip && (*bip)->bi_sector < bi->bi_sector + ((bi->bi_size)>>9)) 
+		goto overlap;
+
 	if (*bip && bi->bi_next && (*bip) != bi->bi_next)
 		BUG();
 	if (*bip)
@@ -932,7 +939,7 @@ static void add_stripe_bio (struct strip
 		(unsigned long long)sh->sector, dd_idx);
 
 	if (forwrite) {
-		/* check if page is coverred */
+		/* check if page is covered */
 		sector_t sector = sh->dev[dd_idx].sector;
 		for (bi=sh->dev[dd_idx].towrite;
 		     sector < sh->dev[dd_idx].sector + STRIPE_SECTORS &&
@@ -944,6 +951,13 @@ static void add_stripe_bio (struct strip
 		if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS)
 			set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
 	}
+	return 1;
+
+ overlap:
+	set_bit(R5_Overlap, &sh->dev[dd_idx].flags);
+	spin_unlock_irq(&conf->device_lock);
+	spin_unlock(&sh->lock);
+	return 0;
 }
 
 
@@ -1007,6 +1021,8 @@ static void handle_stripe(struct stripe_
 			spin_lock_irq(&conf->device_lock);
 			rbi = dev->toread;
 			dev->toread = NULL;
+			if (test_and_clear_bit(R5_Overlap, &dev->flags))
+				wake_up(&conf->wait_for_overlap);
 			spin_unlock_irq(&conf->device_lock);
 			while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) {
 				copy_data(0, rbi, dev->page, dev->sector);
@@ -1056,6 +1072,9 @@ static void handle_stripe(struct stripe_
 			sh->dev[i].towrite = NULL;
 			if (bi) to_write--;
 
+			if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+				wake_up(&conf->wait_for_overlap);
+
 			while (bi && bi->bi_sector < sh->dev[i].sector + STRIPE_SECTORS){
 				struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
 				clear_bit(BIO_UPTODATE, &bi->bi_flags);
@@ -1084,6 +1103,8 @@ static void handle_stripe(struct stripe_
 			if (!test_bit(R5_Insync, &sh->dev[i].flags)) {
 				bi = sh->dev[i].toread;
 				sh->dev[i].toread = NULL;
+				if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+					wake_up(&conf->wait_for_overlap);
 				if (bi) to_read--;
 				while (bi && bi->bi_sector < sh->dev[i].sector + STRIPE_SECTORS){
 					struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
@@ -1563,6 +1584,7 @@ static int make_request (request_queue_t
 	if ( bio_data_dir(bi) == WRITE )
 		md_write_start(mddev);
 	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
+		DEFINE_WAIT(w);
 
 		new_sector = raid6_compute_sector(logical_sector,
 						  raid_disks, data_disks, &dd_idx, &pd_idx, conf);
@@ -1571,17 +1593,28 @@ static int make_request (request_queue_t
 		       (unsigned long long)new_sector,
 		       (unsigned long long)logical_sector);
 
+		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
+
+	retry:
 		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
 		if (sh) {
-
-			add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK));
-
+			if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
+				/* Add failed due to overlap.  Flush everything
+				 * and wait a while
+				 */
+				raid6_unplug_device(mddev->queue);
+				release_stripe(sh);
+				schedule();
+				goto retry;
+			}
+			finish_wait(&conf->wait_for_overlap, &w);
 			raid6_plug_device(conf);
 			handle_stripe(sh);
 			release_stripe(sh);
 		} else {
 			/* cannot get stripe for read-ahead, just give-up */
 			clear_bit(BIO_UPTODATE, &bi->bi_flags);
+			finish_wait(&conf->wait_for_overlap, &w);
 			break;
 		}
 
@@ -1729,6 +1762,7 @@ static int run (mddev_t *mddev)
 
 	spin_lock_init(&conf->device_lock);
 	init_waitqueue_head(&conf->wait_for_stripe);
+	init_waitqueue_head(&conf->wait_for_overlap);
 	INIT_LIST_HEAD(&conf->handle_list);
 	INIT_LIST_HEAD(&conf->delayed_list);
 	INIT_LIST_HEAD(&conf->inactive_list);

diff ./include/linux/raid/raid5.h~current~ ./include/linux/raid/raid5.h
--- ./include/linux/raid/raid5.h~current~	2005-02-07 13:31:05.000000000 +1100
+++ ./include/linux/raid/raid5.h	2005-02-07 13:34:23.000000000 +1100
@@ -152,6 +152,7 @@ struct stripe_head {
 #define	R5_Wantread	4	/* want to schedule a read */
 #define	R5_Wantwrite	5
 #define	R5_Syncio	6	/* this io need to be accounted as resync io */
+#define	R5_Overlap	7	/* There is a pending overlapping request on this block */
 
 /*
  * Write method
@@ -219,6 +220,7 @@ struct raid5_private_data {
 	atomic_t		active_stripes;
 	struct list_head	inactive_list;
 	wait_queue_head_t	wait_for_stripe;
+	wait_queue_head_t	wait_for_overlap;
 	int			inactive_blocked;	/* release of inactive stripes blocked,
 							 * waiting for 25% to be free
 							 */        

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH md 1 of 6] Fix problems with verion-1 superblock code
  2005-02-07  2:44 [PATCH md 0 of 6] Introduction NeilBrown
  2005-02-07  2:44 ` [PATCH md 6 of 6] Fix handling of overlapping requests in raid5 NeilBrown
@ 2005-02-07  2:44 ` NeilBrown
  2005-02-07  2:44 ` [PATCH md 4 of 6] Fix endless loop when syncing an array that doesn't need any resync NeilBrown
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07  2:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


- off-by-one error
- missing recalc of checksum

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/md.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-02-07 13:31:05.000000000 +1100
+++ ./drivers/md/md.c	2005-02-07 13:31:08.000000000 +1100
@@ -980,8 +980,8 @@ static void super_1_sync(mddev_t *mddev,
 
 	max_dev = 0;
 	ITERATE_RDEV(mddev,rdev2,tmp)
-		if (rdev2->desc_nr > max_dev)
-			max_dev = rdev2->desc_nr;
+		if (rdev2->desc_nr+1 > max_dev)
+			max_dev = rdev2->desc_nr+1;
 	
 	sb->max_dev = cpu_to_le32(max_dev);
 	for (i=0; i<max_dev;i++)
@@ -998,6 +998,7 @@ static void super_1_sync(mddev_t *mddev,
 	}
 
 	sb->recovery_offset = cpu_to_le64(0); /* not supported yet */
+	sb->sb_csum = calc_sb_1_csum(sb);
 }
 
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH md 4 of 6] Fix endless loop when syncing an array that doesn't need any resync.
  2005-02-07  2:44 [PATCH md 0 of 6] Introduction NeilBrown
  2005-02-07  2:44 ` [PATCH md 6 of 6] Fix handling of overlapping requests in raid5 NeilBrown
  2005-02-07  2:44 ` [PATCH md 1 of 6] Fix problems with verion-1 superblock code NeilBrown
@ 2005-02-07  2:44 ` NeilBrown
  2005-02-07  2:44 ` [PATCH md 2 of 6] Prevent oops when drive set faulty in inactive md array NeilBrown
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2005-02-07  2:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid


If the resync checkpoint for an array is at the end of the array,
It doesn't get set to MAX_SECTOR, so resyncing will be retried.
By updating curr_resync early, this problem is fixed.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/md.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-02-07 13:32:22.000000000 +1100
+++ ./drivers/md/md.c	2005-02-07 13:33:59.000000000 +1100
@@ -3475,10 +3475,12 @@ static void md_do_sync(mddev_t *mddev)
 	init_waitqueue_head(&mddev->recovery_wait);
 	last_check = 0;
 
-	if (j)
+	if (j>2) {
 		printk(KERN_INFO 
 			"md: resuming recovery of %s from checkpoint.\n",
 			mdname(mddev));
+		mddev->curr_resync = j;
+	}
 
 	while (j < max_sectors) {
 		int sectors;
@@ -3565,7 +3567,7 @@ static void md_do_sync(mddev_t *mddev)
 
 	if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) &&
 	    mddev->curr_resync > 2 &&
-	    mddev->curr_resync > mddev->recovery_cp) {
+	    mddev->curr_resync >= mddev->recovery_cp) {
 		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
 			printk(KERN_INFO 
 				"md: checkpointing recovery of %s.\n",

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH md 6 of 6] Fix handling of overlapping requests in raid5
  2005-02-07  2:44 ` [PATCH md 6 of 6] Fix handling of overlapping requests in raid5 NeilBrown
@ 2005-02-08  2:24   ` Andrew Morton
  2005-02-08  2:39     ` Neil Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2005-02-08  2:24 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

NeilBrown <neilb@cse.unsw.edu.au> wrote:
>
> +	retry:
>   		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
>   		if (sh) {
>  -
>  -			while (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
>  -				/* add failed due to overlap.  Flush everything
>  +			if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
>  +				/* Add failed due to overlap.  Flush everything
>   				 * and wait a while
>  -				 * FIXME - overlapping requests should be handled better
>   				 */
>   				raid5_unplug_device(mddev->queue);
>  -				set_current_state(TASK_UNINTERRUPTIBLE);
>  -				schedule_timeout(1);
>  +				release_stripe(sh);
>  +				schedule();
>  +				goto retry;

Worrisome.  If the calling process has SCHED_RR or SCHED_FIFO policy, this
could cause a lockup, perhaps.

Some sort of real synchronisation scheme would be nicer.  Or at the least,
what was wrong with the schedule_timeout(1)?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH md 6 of 6] Fix handling of overlapping requests in raid5
  2005-02-08  2:24   ` Andrew Morton
@ 2005-02-08  2:39     ` Neil Brown
  2005-02-08  3:07       ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2005-02-08  2:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid

On Monday February 7, akpm@osdl.org wrote:
> NeilBrown <neilb@cse.unsw.edu.au> wrote:
> >
> > +	retry:
> >   		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
> >   		if (sh) {
> >  -
> >  -			while (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
> >  -				/* add failed due to overlap.  Flush everything
> >  +			if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
> >  +				/* Add failed due to overlap.  Flush everything
> >   				 * and wait a while
> >  -				 * FIXME - overlapping requests should be handled better
> >   				 */
> >   				raid5_unplug_device(mddev->queue);
> >  -				set_current_state(TASK_UNINTERRUPTIBLE);
> >  -				schedule_timeout(1);
> >  +				release_stripe(sh);
> >  +				schedule();
> >  +				goto retry;
> 
> Worrisome.  If the calling process has SCHED_RR or SCHED_FIFO policy, this
> could cause a lockup, perhaps.

Is that just that I should have the "prepare_to_wait" after the retry:
label rather than before, or is there something more subtle.

> 
> Some sort of real synchronisation scheme would be nicer.  Or at the least,
> what was wrong with the schedule_timeout(1)?

schedule_timeout shouldn't be necessary as whenever the conflicting
bio gets dealt with, a wakeup will happen on the conf->wait_for_overlap wait_queue.

(schedule_timeout(1) by itself slowed things down too much. The
proactive waking was needed, and with it in place, the timeout becomes
irrelevant).

NeilBrown


--
md: prepare_to_wait needs to be in the loop which waits for overlapping requests to complete

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/raid5.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2005-02-07 13:34:23.000000000 +1100
+++ ./drivers/md/raid5.c	2005-02-08 13:34:38.000000000 +1100
@@ -1433,9 +1433,9 @@ static int make_request (request_queue_t
 			(unsigned long long)new_sector, 
 			(unsigned long long)logical_sector);
 
+	retry:
 		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
 
-	retry:
 		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
 		if (sh) {
 			if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH md 6 of 6] Fix handling of overlapping requests in raid5
  2005-02-08  2:39     ` Neil Brown
@ 2005-02-08  3:07       ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2005-02-08  3:07 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> On Monday February 7, akpm@osdl.org wrote:
> > NeilBrown <neilb@cse.unsw.edu.au> wrote:
> > >
> > > +	retry:
> > >   		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
> > >   		if (sh) {
> > >  -
> > >  -			while (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
> > >  -				/* add failed due to overlap.  Flush everything
> > >  +			if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
> > >  +				/* Add failed due to overlap.  Flush everything
> > >   				 * and wait a while
> > >  -				 * FIXME - overlapping requests should be handled better
> > >   				 */
> > >   				raid5_unplug_device(mddev->queue);
> > >  -				set_current_state(TASK_UNINTERRUPTIBLE);
> > >  -				schedule_timeout(1);
> > >  +				release_stripe(sh);
> > >  +				schedule();
> > >  +				goto retry;
> > 
> > Worrisome.  If the calling process has SCHED_RR or SCHED_FIFO policy, this
> > could cause a lockup, perhaps.
> 
> Is that just that I should have the "prepare_to_wait" after the retry:
> label rather than before, or is there something more subtle.

umm, yes.  We always exit from schedule() in state TASK_RUNNING, so we need
to run prepare_to_wait() each time around.  See __wait_on_bit() for a
canonical example.

> ### Diffstat output
>  ./drivers/md/raid5.c |    2 +-
>  1 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
> --- ./drivers/md/raid5.c~current~	2005-02-07 13:34:23.000000000 +1100
> +++ ./drivers/md/raid5.c	2005-02-08 13:34:38.000000000 +1100
> @@ -1433,9 +1433,9 @@ static int make_request (request_queue_t
>  			(unsigned long long)new_sector, 
>  			(unsigned long long)logical_sector);
>  
> +	retry:
>  		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
>  
> -	retry:
>  		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
>  		if (sh) {
>  			if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {

You missed one.

--- 25/drivers/md/raid5.c~md-fix-handling-of-overlapping-requests-in-raid5-fix	2005-02-07 19:04:18.000000000 -0800
+++ 25-akpm/drivers/md/raid5.c	2005-02-07 19:04:48.000000000 -0800
@@ -1433,9 +1433,8 @@ static int make_request (request_queue_t
 			(unsigned long long)new_sector, 
 			(unsigned long long)logical_sector);
 
-		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
-
 	retry:
+		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
 		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
 		if (sh) {
 			if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
diff -puN drivers/md/raid6main.c~md-fix-handling-of-overlapping-requests-in-raid5-fix drivers/md/raid6main.c
--- 25/drivers/md/raid6main.c~md-fix-handling-of-overlapping-requests-in-raid5-fix	2005-02-07 19:04:18.000000000 -0800
+++ 25-akpm/drivers/md/raid6main.c	2005-02-07 19:04:54.000000000 -0800
@@ -1593,9 +1593,8 @@ static int make_request (request_queue_t
 		       (unsigned long long)new_sector,
 		       (unsigned long long)logical_sector);
 
-		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
-
 	retry:
+		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
 		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
 		if (sh) {
 			if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {

_

(That piece of the code looks pretty fugly in an 80-col xterm btw..)


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2005-02-08  3:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-02-07  2:44 [PATCH md 0 of 6] Introduction NeilBrown
2005-02-07  2:44 ` [PATCH md 6 of 6] Fix handling of overlapping requests in raid5 NeilBrown
2005-02-08  2:24   ` Andrew Morton
2005-02-08  2:39     ` Neil Brown
2005-02-08  3:07       ` Andrew Morton
2005-02-07  2:44 ` [PATCH md 1 of 6] Fix problems with verion-1 superblock code NeilBrown
2005-02-07  2:44 ` [PATCH md 4 of 6] Fix endless loop when syncing an array that doesn't need any resync NeilBrown
2005-02-07  2:44 ` [PATCH md 2 of 6] Prevent oops when drive set faulty in inactive md array NeilBrown
2005-02-07  2:44 ` [PATCH md 5 of 6] Remove extra loop from copy_data NeilBrown
2005-02-07  2:44 ` [PATCH md 3 of 6] Make md work a bit better with devfs NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).