* [PATCH 001 of 4] md: Fix innocuous bug in raid6 stripe_to_pdidx
2006-11-14 0:22 [PATCH 000 of 4] md: Various fixes for new cache-bypassing-reads in raid5/6 NeilBrown
@ 2006-11-14 0:22 ` NeilBrown
2006-11-14 0:22 ` [PATCH 002 of 4] md: Fix newly introduced read-corruption with raid6 NeilBrown
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: NeilBrown @ 2006-11-14 0:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
stripe_to_pdidx finds the index of the parity disk for a given
stripe.
It assumes raid5 in that it uses "disks-1" to determine the number
of data disks.
This is incorrect for raid6, but fortunately the two usages cancel each
other out. The only way that 'data_disks' affects the calculation of
pd_idx in raid5_compute_sector is when it is divided into the
sector number. But as that sector number was calculated by multiplying
in the same wrong value of 'data_disks', the division produces the
right value.
So it is innocuous but needs to be fixed.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c 2006-11-14 10:05:00.000000000 +1100
+++ ./drivers/md/raid5.c 2006-11-14 10:33:41.000000000 +1100
@@ -1355,8 +1355,10 @@ static int stripe_to_pdidx(sector_t stri
int pd_idx, dd_idx;
int chunk_offset = sector_div(stripe, sectors_per_chunk);
- raid5_compute_sector(stripe*(disks-1)*sectors_per_chunk
- + chunk_offset, disks, disks-1, &dd_idx, &pd_idx, conf);
+ raid5_compute_sector(stripe * (disks - conf->max_degraded)
+ *sectors_per_chunk + chunk_offset,
+ disks, disks - conf->max_degraded,
+ &dd_idx, &pd_idx, conf);
return pd_idx;
}
* [PATCH 002 of 4] md: Fix newly introduced read-corruption with raid6.
From: NeilBrown @ 2006-11-14 0:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
chunk_aligned_read and retry_aligned_read assume that
data_disks == raid_disks - 1
which is not true for raid6.
So when an aligned read request bypasses the cache, we can get the wrong data.
Also change the calculation of data_disks in compute_blocknr to make it
more obviously correct.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c 2006-11-14 10:33:41.000000000 +1100
+++ ./drivers/md/raid5.c 2006-11-14 10:34:17.000000000 +1100
@@ -823,7 +823,8 @@ static sector_t raid5_compute_sector(sec
static sector_t compute_blocknr(struct stripe_head *sh, int i)
{
raid5_conf_t *conf = sh->raid_conf;
- int raid_disks = sh->disks, data_disks = raid_disks - 1;
+ int raid_disks = sh->disks;
+ int data_disks = raid_disks - conf->max_degraded;
sector_t new_sector = sh->sector, check;
int sectors_per_chunk = conf->chunk_size >> 9;
sector_t stripe;
@@ -859,7 +860,6 @@ static sector_t compute_blocknr(struct s
}
break;
case 6:
- data_disks = raid_disks - 2;
if (i == raid6_next_disk(sh->pd_idx, raid_disks))
return 0; /* It is the Q disk */
switch (conf->algorithm) {
@@ -2725,7 +2725,7 @@ static int chunk_aligned_read(request_qu
mddev_t *mddev = q->queuedata;
raid5_conf_t *conf = mddev_to_conf(mddev);
const unsigned int raid_disks = conf->raid_disks;
- const unsigned int data_disks = raid_disks - 1;
+ const unsigned int data_disks = raid_disks - conf->max_degraded;
unsigned int dd_idx, pd_idx;
struct bio* align_bi;
mdk_rdev_t *rdev;
@@ -3145,7 +3145,7 @@ static int retry_aligned_read(raid5_con
logical_sector = raid_bio->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
sector = raid5_compute_sector( logical_sector,
conf->raid_disks,
- conf->raid_disks-1,
+ conf->raid_disks - conf->max_degraded,
&dd_idx,
&pd_idx,
conf);
* [PATCH 003 of 4] md: Misc fixes for aligned-read handling.
From: NeilBrown @ 2006-11-14 0:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
1/ When aligned requests fail (read error) they need to be retried
via the normal method (stripe cache). As we cannot be sure that
we can process a single read in one go (we may not be able to
allocate all the stripes needed), we store a bio-being-retried
and a list of bios-that-still-need-to-be-retried.
When we find a bio that needs to be retried, we should add it to
the list, not to the single-bio slot.
2/ The cloned bio is being used-after-free (to test BIO_UPTODATE).
3/ We forgot to add rdev->data_offset when submitting
a bio for aligned-read
4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
so we need to invalidate the segment counts.
5/ We were never incrementing 'scnt' when resubmitting failed
aligned requests.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c 2006-11-14 10:34:17.000000000 +1100
+++ ./drivers/md/raid5.c 2006-11-14 10:34:33.000000000 +1100
@@ -2658,8 +2658,8 @@ static void add_bio_to_retry(struct bio
spin_lock_irqsave(&conf->device_lock, flags);
- bi->bi_next = conf->retry_read_aligned;
- conf->retry_read_aligned = bi;
+ bi->bi_next = conf->retry_read_aligned_list;
+ conf->retry_read_aligned_list = bi;
spin_unlock_irqrestore(&conf->device_lock, flags);
md_wakeup_thread(conf->mddev->thread);
@@ -2698,6 +2698,7 @@ static int raid5_align_endio(struct bio
struct bio* raid_bi = bi->bi_private;
mddev_t *mddev;
raid5_conf_t *conf;
+ int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
if (bi->bi_size)
return 1;
@@ -2706,7 +2707,7 @@ static int raid5_align_endio(struct bio
mddev = raid_bi->bi_bdev->bd_disk->queue->queuedata;
conf = mddev_to_conf(mddev);
- if (!error && test_bit(BIO_UPTODATE, &bi->bi_flags)) {
+ if (!error && uptodate) {
bio_endio(raid_bi, bytes, 0);
if (atomic_dec_and_test(&conf->active_aligned_reads))
wake_up(&conf->wait_for_stripe);
@@ -2759,9 +2760,11 @@ static int chunk_aligned_read(request_qu
rcu_read_lock();
rdev = rcu_dereference(conf->disks[dd_idx].rdev);
if (rdev && test_bit(In_sync, &rdev->flags)) {
- align_bi->bi_bdev = rdev->bdev;
atomic_inc(&rdev->nr_pending);
rcu_read_unlock();
+ align_bi->bi_bdev = rdev->bdev;
+ align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);
+ align_bi->bi_sector += rdev->data_offset;
spin_lock_irq(&conf->device_lock);
wait_event_lock_irq(conf->wait_for_stripe,
@@ -3151,7 +3154,8 @@ static int retry_aligned_read(raid5_con
conf);
last_sector = raid_bio->bi_sector + (raid_bio->bi_size>>9);
- for (; logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
+ for (; logical_sector < last_sector;
+ logical_sector += STRIPE_SECTORS, scnt++) {
if (scnt < raid_bio->bi_hw_segments)
/* already done this stripe */
* [PATCH 004 of 4] md: Fix a couple more bugs in raid5/6 aligned reads
From: NeilBrown @ 2006-11-14 0:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
1/ We don't drop our reference to the rdev when the read completes.
This means we need to record the rdev so it is still
available in the end_io routine. Fortunately
bi_next in the original bio is unused at this point so
we can stuff it in there.
2/ We leak the cloned bio if the target rdev is not usable.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 7 +++++++
1 file changed, 7 insertions(+)
diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c 2006-11-14 11:00:51.000000000 +1100
+++ ./drivers/md/raid5.c 2006-11-14 11:06:44.000000000 +1100
@@ -2699,6 +2699,7 @@ static int raid5_align_endio(struct bio
mddev_t *mddev;
raid5_conf_t *conf;
int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
+ mdk_rdev_t *rdev;
if (bi->bi_size)
return 1;
@@ -2706,6 +2707,10 @@ static int raid5_align_endio(struct bio
mddev = raid_bi->bi_bdev->bd_disk->queue->queuedata;
conf = mddev_to_conf(mddev);
+ rdev = (void*)raid_bi->bi_next;
+ raid_bi->bi_next = NULL;
+
+ rdev_dec_pending(rdev, conf->mddev);
if (!error && uptodate) {
bio_endio(raid_bi, bytes, 0);
@@ -2762,6 +2767,7 @@ static int chunk_aligned_read(request_qu
if (rdev && test_bit(In_sync, &rdev->flags)) {
atomic_inc(&rdev->nr_pending);
rcu_read_unlock();
+ raid_bio->bi_next = (void*)rdev;
align_bi->bi_bdev = rdev->bdev;
align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);
align_bi->bi_sector += rdev->data_offset;
@@ -2777,6 +2783,7 @@ static int chunk_aligned_read(request_qu
return 1;
} else {
rcu_read_unlock();
+ bio_put(align_bi);
return 0;
}
}