linux-raid.vger.kernel.org archive mirror
* [PATCH 000 of 2] md: Introduction - bugfixes for md/raid{1,10}
@ 2007-06-12  1:09 NeilBrown
  2007-06-12  1:09 ` [PATCH 001 of 2] md: Fix two raid10 bugs NeilBrown
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: NeilBrown @ 2007-06-12  1:09 UTC
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Mike Accetta, Neil Brown, stable


Following are a couple of bugfixes for raid10 and raid1.  They only
affect fairly uncommon configurations (more than 2 mirrors) and can
cause data corruption.  They are suitable for 2.6.22 and 2.6.21-stable.

Thanks,
NeilBrown


 [PATCH 001 of 2] md: Fix two raid10 bugs.
 [PATCH 002 of 2] md: Fix bug in error handling during raid1 repair.


* [PATCH 001 of 2] md: Fix two raid10 bugs.
  2007-06-12  1:09 [PATCH 000 of 2] md: Introduction - bugfixes for md/raid{1,10} NeilBrown
@ 2007-06-12  1:09 ` NeilBrown
  2007-06-12  1:09 ` [PATCH 002 of 2] md: Fix bug in error handling during raid1 repair NeilBrown
  2007-06-12 17:04 ` [PATCH 000 of 2] md: Introduction - bugfixes for md/raid{1,10} Bill Davidsen
  2 siblings, 0 replies; 4+ messages in thread
From: NeilBrown @ 2007-06-12  1:09 UTC
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown, stable


1/ When resyncing a degraded raid10 which has more than 2 copies of each block,
  garbage can get synced on top of good data.

2/ We round the wrong way in part of the device size calculation, which
  can cause confusion.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: stable@kernel.org

### Diffstat output
 ./drivers/md/raid10.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
--- .prev/drivers/md/raid10.c	2007-06-12 10:19:04.000000000 +1000
+++ ./drivers/md/raid10.c	2007-06-12 10:20:31.000000000 +1000
@@ -1866,6 +1866,7 @@ static sector_t sync_request(mddev_t *md
 			int d = r10_bio->devs[i].devnum;
 			bio = r10_bio->devs[i].bio;
 			bio->bi_end_io = NULL;
+			clear_bit(BIO_UPTODATE, &bio->bi_flags);
 			if (conf->mirrors[d].rdev == NULL ||
 			    test_bit(Faulty, &conf->mirrors[d].rdev->flags))
 				continue;
@@ -2036,6 +2037,11 @@ static int run(mddev_t *mddev)
 	/* 'size' is now the number of chunks in the array */
 	/* calculate "used chunks per device" in 'stride' */
 	stride = size * conf->copies;
+
+	/* We need to round up when dividing by raid_disks to
+	 * get the stride size.
+	 */
+	stride += conf->raid_disks - 1;
 	sector_div(stride, conf->raid_disks);
 	mddev->size = stride  << (conf->chunk_shift-1);
 
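One way to read the first hunk is as a reset-before-skip pattern: the flag is cleared before the loop can `continue` past a missing or faulty device, so a skipped entry cannot keep a status left over from earlier use. The toy userspace model below illustrates that reading only; it is not kernel code, the diff does not show where the flag is later consulted, and `struct slot`, `prepare_slots` and `count_stale` are hypothetical names, with the boolean `uptodate` standing in for `BIO_UPTODATE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-device bio that is reused between requests. */
struct slot {
    bool present;   /* device exists and is not faulty */
    bool uptodate;  /* stands in for BIO_UPTODATE; may be left over from earlier use */
};

/*
 * Prepare all slots for a new request and return how many will be read.
 * With reset_first set, the flag is cleared before the skip test,
 * mirroring the position of the added clear_bit() call.
 */
size_t prepare_slots(struct slot *s, size_t n, bool reset_first)
{
    size_t scheduled = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (reset_first)
            s[i].uptodate = false;
        if (!s[i].present)
            continue;            /* skipped: without the reset the old flag survives */
        s[i].uptodate = false;   /* a real read would set this again on success */
        scheduled++;
    }
    return scheduled;
}

/* Count skipped slots that still claim to hold valid data. */
size_t count_stale(const struct slot *s, size_t n)
{
    size_t stale = 0;

    for (size_t i = 0; i < n; i++)
        if (!s[i].present && s[i].uptodate)
            stale++;
    return stale;
}
```

In this model, a skipped slot that enters with `uptodate` set is still counted by `count_stale` when `reset_first` is false, and is not counted when it is true.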

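The second hunk uses the usual integer idiom for rounding a quotient up: add the divisor minus one before dividing. A minimal userspace sketch follows; `stride_chunks` and its parameters are hypothetical names, and plain `/` stands in for the kernel's `sector_div()`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Chunks used on each device: (size * copies) / raid_disks.
 * With round_up set, raid_disks - 1 is added first so that any
 * remainder bumps the result up by one, as in the patched code.
 */
uint64_t stride_chunks(uint64_t size, unsigned copies,
                       unsigned raid_disks, int round_up)
{
    uint64_t stride;

    assert(raid_disks != 0);
    stride = size * copies;
    if (round_up)
        stride += raid_disks - 1;
    return stride / raid_disks;
}
```

With 10 chunks, 3 copies and 4 devices (numbers chosen only for illustration), the truncating form gives 7 and the rounded-up form gives 8. When the product divides evenly, as with 8 chunks, both forms give 6.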

* [PATCH 002 of 2] md: Fix bug in error handling during raid1 repair.
  2007-06-12  1:09 [PATCH 000 of 2] md: Introduction - bugfixes for md/raid{1,10} NeilBrown
  2007-06-12  1:09 ` [PATCH 001 of 2] md: Fix two raid10 bugs NeilBrown
@ 2007-06-12  1:09 ` NeilBrown
  2007-06-12 17:04 ` [PATCH 000 of 2] md: Introduction - bugfixes for md/raid{1,10} Bill Davidsen
  2 siblings, 0 replies; 4+ messages in thread
From: NeilBrown @ 2007-06-12  1:09 UTC
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Mike Accetta, Neil Brown, stable


From: Mike Accetta <maccetta@laurelnetworks.com>

If raid1/repair (which reads all blocks and fixes any differences
it finds) hits a read error, it doesn't reset the bio for writing
before writing correct data back, so the read error isn't fixed,
and the device probably gets a zero-length write which it might
complain about.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: stable@kernel.org

### Diffstat output
 ./drivers/md/raid1.c |   21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c	2007-06-12 10:48:57.000000000 +1000
+++ ./drivers/md/raid1.c	2007-06-12 10:49:05.000000000 +1000
@@ -1240,17 +1240,24 @@ static void sync_request_write(mddev_t *
 			}
 		r1_bio->read_disk = primary;
 		for (i=0; i<mddev->raid_disks; i++)
-			if (r1_bio->bios[i]->bi_end_io == end_sync_read &&
-			    test_bit(BIO_UPTODATE, &r1_bio->bios[i]->bi_flags)) {
+			if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
 				int j;
 				int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
 				struct bio *pbio = r1_bio->bios[primary];
 				struct bio *sbio = r1_bio->bios[i];
-				for (j = vcnt; j-- ; )
-					if (memcmp(page_address(pbio->bi_io_vec[j].bv_page),
-						   page_address(sbio->bi_io_vec[j].bv_page),
-						   PAGE_SIZE))
-						break;
+
+				if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+					for (j = vcnt; j-- ; ) {
+						struct page *p, *s;
+						p = pbio->bi_io_vec[j].bv_page;
+						s = sbio->bi_io_vec[j].bv_page;
+						if (memcmp(page_address(p),
+							   page_address(s),
+							   PAGE_SIZE))
+							break;
+					}
+				} else
+					j = 0;
 				if (j >= 0)
 					mddev->resync_mismatches += r1_bio->sectors;
 				if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
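
The rewritten loop relies on a countdown idiom: `for (j = vcnt; j-- ; )` leaves `j` at -1 when every page matched and at the index of the differing page when it breaks out early. The new `else` branch sets `j = 0`, so a failed read takes the same `j >= 0` path as a mismatch. A minimal userspace sketch, assuming a hypothetical `mismatch_index` helper and a small `PAGE_SZ` in place of the kernel's `PAGE_SIZE`:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SZ 16  /* small illustrative page size, not the kernel's PAGE_SIZE */

/*
 * Compare npages pages of p and s, scanning from the last page down.
 * Returns -1 if all pages are identical, otherwise the index (>= 0) of
 * the differing page found first in that scan.  If the secondary read
 * failed (s_uptodate is false), returns 0 without comparing, so the
 * caller treats the block as a mismatch.
 */
int mismatch_index(const unsigned char *p, const unsigned char *s,
                   int npages, bool s_uptodate)
{
    int j;

    assert(npages >= 0);
    if (s_uptodate) {
        for (j = npages; j--; )
            if (memcmp(p + (size_t)j * PAGE_SZ,
                       s + (size_t)j * PAGE_SZ, PAGE_SZ))
                break;
    } else {
        j = 0;
    }
    return j;
}
```

Before the patch, a secondary bio without `BIO_UPTODATE` failed the outer `if` and skipped this whole block, which matches the description above of the bio not being reset for writing.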


* Re: [PATCH 000 of 2] md: Introduction - bugfixes for md/raid{1,10}
  2007-06-12  1:09 [PATCH 000 of 2] md: Introduction - bugfixes for md/raid{1,10} NeilBrown
  2007-06-12  1:09 ` [PATCH 001 of 2] md: Fix two raid10 bugs NeilBrown
  2007-06-12  1:09 ` [PATCH 002 of 2] md: Fix bug in error handling during raid1 repair NeilBrown
@ 2007-06-12 17:04 ` Bill Davidsen
  2 siblings, 0 replies; 4+ messages in thread
From: Bill Davidsen @ 2007-06-12 17:04 UTC
  To: NeilBrown; +Cc: Andrew Morton, linux-raid, linux-kernel, Mike Accetta, stable

NeilBrown wrote:
> Following are a couple of bugfixes for raid10 and raid1.  They only
> affect fairly uncommon configurations (more than 2 mirrors) and can
> cause data corruption.  They are suitable for 2.6.22 and 2.6.21-stable.
>
> Thanks,
> NeilBrown
>
>
>  [PATCH 001 of 2] md: Fix two raid10 bugs.
>  [PATCH 002 of 2] md: Fix bug in error handling during raid1 repair.

I don't know about uncommon, given that I have six machines in this
building with three-way RAID-1 for the boot partition, to be sure I can
get off the ground enough to get the other partitions up.

And since you added "write-mostly" for remote mirrors I do have a few 
systems doing >2 mirrors as well. This set of patches definitely will be 
in my kernel by this afternoon.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


