* RAID1 repair GPF crash w/3.10-rc7
@ 2013-07-02 1:20 Joe Lawrence
2013-07-03 21:49 ` Joe Lawrence
0 siblings, 1 reply; 10+ messages in thread
From: Joe Lawrence @ 2013-07-02 1:20 UTC (permalink / raw)
To: linux-raid; +Cc: Kent Overstreet, NeilBrown
Hi Kent & Neil,
I've hit a crash in MD during RAID1 repair while running 3.10-rc7:
CPU: 1 PID: 987 Comm: md124_raid1 Tainted: GF O 3.10.0-rc7.sra+ #2
Hardware name: Stratus ftServer 2700/G7LAY, BIOS BIOS Version 6.2:52 04/09/2013
task: ffff880164b818a0 ti: ffff880161b1c000 task.ti: ffff880161b1c000
RIP: 0010:[<ffffffff812fd6dd>] [<ffffffff812fd6dd>] memcpy+0xd/0x110
RSP: 0018:ffff880161b1dcd0 EFLAGS: 00010246
RAX: ffff880079e69000 RBX: ffff880161b1c000 RCX: 0000000000000200
RDX: 0000000000000000 RSI: dadfe2db46463b6b RDI: ffff880079e69000
RBP: ffff880161b1dd28 R08: 00000000000000ff R09: 0000000000000009
R10: 0000000000001000 R11: 000000000000000a R12: 0000000000000000
R13: ffff880154105040 R14: 000000006b6b6b6b R15: ffff8801541048b0
FS: 0000000000000000(0000) GS:ffff88017b220000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe775086738 CR3: 00000001764b8000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffffffff811c833a ffff880100001000 ffff880154104ec0 ffff880161b1dfd8
ffff880154104830 ffff8801606e9598 ffff880154104830 ffff88015dd9b6d8
ffff880174df8000 0000000000000001 ffff8801508b3738 ffff880161b1de38
Call Trace:
[<ffffffff811c833a>] ? bio_copy_data+0x12a/0x1c0
[<ffffffffa004535e>] raid1d+0xa8e/0xe90 [raid1]
[<ffffffff810737cb>] ? prepare_to_wait+0x5b/0x90
[<ffffffff814c435d>] md_thread+0x11d/0x170
[<ffffffff81073600>] ? wake_up_bit+0x40/0x40
[<ffffffff814c4240>] ? md_rdev_init+0x110/0x110
[<ffffffff81073080>] kthread+0xc0/0xd0
[<ffffffff81072fc0>] ? flush_kthread_worker+0x80/0x80
[<ffffffff81655d9c>] ret_from_fork+0x7c/0xb0
[<ffffffff81072fc0>] ? flush_kthread_worker+0x80/0x80
Code: 2b 43 50 88 43 4e 48 83 c4 08 5b 5d c3 90 e8 eb fb ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
RIP [<ffffffff812fd6dd>] memcpy+0xd/0x110
RSP <ffff880161b1dcd0>
Crash analysis showed the struct bio *src bi_idx was set to bi_vcnt when
the GPF occurred. Further instrumentation of repeat incidents revealed
that bi_idx was already equal to bi_vcnt on bio_copy_data() entry.
bio_copy_data() can handle copying from the middle of the bi_io_vec[],
but it doesn't cope well when the initial bio_iovec() call points past
the end of bi_io_vec[].
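[Editor's note: a simplified sketch of the 3.10-era definitions involved
(paraphrased from include/linux/bio.h and fs/bio.c, not verbatim) shows
why a fully advanced source bio is fatal here:

#define bio_iovec_idx(bio, idx)	(&((bio)->bi_io_vec[(idx)]))
#define bio_iovec(bio)		bio_iovec_idx((bio), (bio)->bi_idx)

void bio_copy_data(struct bio *dst, struct bio *src)
{
	/* With src->bi_idx == src->bi_vcnt, src_bv points one element
	 * past the end of src->bi_io_vec[]. */
	struct bio_vec *src_bv = bio_iovec(src);
	struct bio_vec *dst_bv = bio_iovec(dst);
	...
	src_p = kmap_atomic(src_bv->bv_page);	/* garbage bv_page */
	dst_p = kmap_atomic(dst_bv->bv_page);
	memcpy(dst_p + dst_offset, src_p + src_offset, bytes);	/* GPF */
	...
}

The 0x6b bytes visible in RSI and R14 in the oops above look like slab
poison, consistent with bio_iovec() having read a stale bio_vec from
just past the last valid array element.]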
Commit d3b45c2 "raid1: use bio_copy_data()" introduced a slight change
of behavior in process_checks(), which had been copying the entire
bi_io_vec[], regardless of the source bio bi_idx value. Reverting this
commit appeased the MD RAID1 repair (no crashes and no mismatch_cnt
after re-checking), but did not answer the question of why
bio_copy_data() was called with a source bio bi_idx == bi_vcnt.
A few trips through git bisect found process_checks() only started
passing such bios in commit f79ea416 "block: Refactor
blk_update_request()".
This leads me to believe that MD didn't expect the source bio bi_idx to
have moved before it called bio_copy_data(). In the commit "block:
Refactor blk_update_request()", bio_advance() increments bi_idx unless
bi_rw has a flag in BIO_NO_ADVANCE_ITER_MASK set, but I'm not sure if
that's a clue or not...
FWIW, bi_rw is set to 0.
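[Editor's note: for reference, 3.10's bio_advance() looks roughly like
this (simplified from fs/bio.c; the WARN/bounds check is omitted).
With bi_rw == 0 the early return is not taken, so a fully completed bio
really does end up with bi_idx == bi_vcnt, and any partially consumed
vec gets its bv_len/bv_offset adjusted:

void bio_advance(struct bio *bio, unsigned bytes)
{
	if (bio_integrity(bio))
		bio_integrity_advance(bio, bytes);

	bio->bi_sector += bytes >> 9;
	bio->bi_size -= bytes;

	if (bio->bi_rw & BIO_NO_ADVANCE_ITER_MASK)
		return;

	while (bytes) {
		if (bytes >= bio_iovec(bio)->bv_len) {
			bytes -= bio_iovec(bio)->bv_len;
			bio->bi_idx++;		/* walks toward bi_vcnt */
		} else {
			bio_iovec(bio)->bv_len -= bytes;
			bio_iovec(bio)->bv_offset += bytes;
			bytes = 0;
		}
	}
}]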
Any ideas guys? This is very repeatable (just create a new MD RAID1,
force data differences to both sides and kick off a repair).
Regards,
-- Joe
* Re: RAID1 repair GPF crash w/3.10-rc7
2013-07-02 1:20 RAID1 repair GPF crash w/3.10-rc7 Joe Lawrence
@ 2013-07-03 21:49 ` Joe Lawrence
2013-07-04 0:53 ` NeilBrown
0 siblings, 1 reply; 10+ messages in thread
From: Joe Lawrence @ 2013-07-03 21:49 UTC (permalink / raw)
To: Joe Lawrence; +Cc: linux-raid, NeilBrown
On Mon, 1 Jul 2013, Joe Lawrence wrote:
> Hi Kent & Neil,
>
> I've hit a crash in MD during RAID1 repair while running 3.10-rc7:
>
> [ ... snip ... ]
Hi Neil,
Looking through the MD source, I'm trying to understand part of the
RAID1 repair path. I came up with a few questions:
1 - During user initiated RAID1 repair, is the loop at the bottom of
sync_request(), under the bio_full label, responsible for submitting all
of the initial read bios?
2 - Does process_checks() later find the first uptodate read bio and
copy its data into the other r1_bio->bios[] for write repair to the
other disks?
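[Editor's note: the loop referenced in question 1 reads roughly as
follows in 3.10's drivers/md/raid1.c sync_request(), abbreviated here;
for a requested check/repair it submits a read to every readable disk:

 bio_full:
	r1_bio->sectors = nr_sectors;

	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
		atomic_set(&r1_bio->remaining, read_targets);
		for (i = 0; i < conf->raid_disks * 2 && read_targets; i++) {
			bio = r1_bio->bios[i];
			if (bio->bi_end_io == end_sync_read) {
				read_targets--;
				md_sync_acct(bio->bi_bdev, nr_sectors);
				generic_make_request(bio);
			}
		}
	} else {
		...
	}]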
If both are true, then perhaps the following applies to this crash...
The commit message for f79ea416 "block: Refactor blk_update_request()"
includes:
Note that req_bio_endio() now always calls bio_advance() - which
means it always loops over the biovec, not just on partial
completions. Don't expect it to affect performance, but worth
noting.
Now that process_checks() has been further modified for immutable bio
prep (commit d3b45c2 "raid1: use bio_copy_data()"), it calls
bio_copy_data() to fill in the write repair bios... which starts
indexing the bi_io_vec[] from wherever bi_idx happens to be.
If this is indeed the case, I'm having trouble coming up with a good
solution:
- Immutable bios means drivers don't touch bi_idx. So MD shouldn't
"re-wind" the source bi_idx before calling bio_copy_data().
- bio_copy_data() could copy the entire source bi_io_vec[], as MD had
done in the past, but is that safe? (i.e., can we map bio vectors
once they have been iterated over?)
Thanks,
-- Joe
* Re: RAID1 repair GPF crash w/3.10-rc7
2013-07-03 21:49 ` Joe Lawrence
@ 2013-07-04 0:53 ` NeilBrown
2013-07-08 20:06 ` Joe Lawrence
0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2013-07-04 0:53 UTC (permalink / raw)
To: Joe Lawrence; +Cc: linux-raid
On Wed, 3 Jul 2013 17:49:51 -0400 (EDT) Joe Lawrence
<joe.lawrence@stratus.com> wrote:
> On Mon, 1 Jul 2013, Joe Lawrence wrote:
>
> > Hi Kent & Neil,
> >
> > I've hit a crash in MD during RAID1 repair while running 3.10-rc7:
> >
> > [ ... snip ... ]
>
> Hi Neil,
>
> Looking through the MD source, I'm trying to understand part of the
> RAID1 repair path. I came up with a few questions:
>
> 1 - During user initiated RAID1 repair, is the loop at the bottom of
> sync_request(), under the bio_full label, responsible for submitting all
> of the initial read bios?
Yes.
>
> 2 - Does process_checks() later find the first uptodate read bio and
> copy its data into the other r1_bio->bios[] for write repair to the
> other disks?
Yes.
>
> If both are true, then perhaps the following applies to this crash...
>
> The commit message for f79ea416 "block: Refactor blk_update_request()"
> includes:
>
> Note that req_bio_endio() now always calls bio_advance() - which
> means it always loops over the biovec, not just on partial
> completions. Don't expect it to affect performance, but worth
> noting.
>
> Now that process_checks() has been further modified for immutable bio
> prep (commit d3b45c2 "raid1: use bio_copy_data()"), it calls
> bio_copy_data() to fill in the write repair bios... which starts
> indexing the bi_io_vec[] from wherever bi_idx happens to be.
>
> If this is indeed the case, I'm having trouble coming up with a good
> solution:
>
> - Immutable bios means drivers don't touch bi_idx. So MD shouldn't
> "re-wind" the source bi_idx before calling bio_copy_data().
But MD "owns" this bio. It knows exactly how it was created, and can do
whatever it likes after it has been returned.
I would propose "bio_rewind" that exactly undoes any "bio_advance".
Something like
void bio_rewind(struct bio *bio)
{
	int bytes = 0;

	if (bio->bi_idx < bio->bi_vcnt && bio_iovec(bio)->bv_offset > 0) {
		bytes = bio_iovec(bio)->bv_offset;
		bio_iovec(bio)->bv_offset -= bytes;
		bio_iovec(bio)->bv_len += bytes;
	}
	while (bio->bi_idx) {
		bio->bi_idx -= 1;
		bio_iovec(bio)->bv_len += bio_iovec(bio)->bv_offset;
		bio_iovec(bio)->bv_offset = 0;
		bytes += bio_iovec(bio)->bv_len;
	}
	bio->bi_size += bytes;
	bio->bi_sector -= bytes >> 9;
}
Then call that on pbio at the same place we call bio_reset on sbio.
You could probably also call bio_rewind on sbio, and remove lots of that
code for setting the bio up again.
(or don't bother with bio_rewind, and use the same big lump of code on both
pbio and sbio).
Could you confirm that one of those works?
Thanks.
NeilBrown
* Re: RAID1 repair GPF crash w/3.10-rc7
2013-07-04 0:53 ` NeilBrown
@ 2013-07-08 20:06 ` Joe Lawrence
2013-07-08 20:25 ` [PATCH 1/2] block: add bio_rewind() to reset bio_vec Joe Lawrence
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Joe Lawrence @ 2013-07-08 20:06 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid, kmo
[+cc Kent's new mail?]
On Thu, 4 Jul 2013 10:53:37 +1000
NeilBrown <neilb@suse.de> wrote:
>
> I would propose "bio_rewind" that exactly undoes any "bio_advance".
>
> [ ... snip ...]
>
> Then call that on pbio at the same place we call bio_reset on sbio.
>
> You could probably also call bio_rewind on sbio, and remove lots of that
> code for setting the bio up again.
This appears to work for my test case (no crashes or post repair
mismatch_cnt):
mdadm --fail /dev/$MD /dev/sda3
mdadm --remove /dev/$MD /dev/sda3
dd if=/dev/urandom of=/dev/$MD bs=1M count=500
mdadm --stop /dev/$MD
mdadm --create /dev/$MD --level=1 --assume-clean --raid-devices=2 \
--bitmap=internal /dev/sda3 /dev/sdi3
echo repair > /sys/block/$MD/md/sync_action
I'll reply to this email with the patches that implement your
suggested changes. Feel free to combine or redo them, posting them here
was the easiest way to provide my signed-off should you need it.
Although it works, having to rewind the bio io_vec index (and
clearing the bi_next pointers) before calling bio_copy_data feels a
bit clunky. What bio_copy_data is really doing is
"bio_copy_remaining_data in a bio chain," whereas MD wants
"bio_copy_completed_data from a single bio".
I took a look at Kent's tree and a lot of the block layer handling was
simplified through the bvec_iter. I don't know if that code is
destined for 3.11. If so, it would probably be easier to retest and
base any MD changes off of his. And for 3.10 stable, the minimal
fix would be for MD to just manipulate the bi_idx itself (or revert
calling bio_copy_data altogether).
Regards,
-- Joe
* [PATCH 1/2] block: add bio_rewind() to reset bio_vec
2013-07-08 20:06 ` Joe Lawrence
@ 2013-07-08 20:25 ` Joe Lawrence
2013-07-14 23:40 ` NeilBrown
2013-07-08 20:25 ` [PATCH 2/2] md: raid1: use bio_rewind() before bio_copy_data() Joe Lawrence
2013-07-17 6:12 ` RAID1 repair GPF crash w/3.10-rc7 NeilBrown
2 siblings, 1 reply; 10+ messages in thread
From: Joe Lawrence @ 2013-07-08 20:25 UTC (permalink / raw)
To: linux-raid; +Cc: NeilBrown, kmo, Joe Lawrence
Provide a mechanism for drivers to "rewind" a bio, essentially undoing
all bio_advance() calls on the bio. After the rewind, the bio idx will
be 0, size and sector reset, and each bio_vec len and offset restored to
their initial values.
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
---
fs/bio.c | 27 +++++++++++++++++++++++++++
include/linux/bio.h | 1 +
2 files changed, 28 insertions(+)
diff --git a/fs/bio.c b/fs/bio.c
index 94bbc04..04309df 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -833,6 +833,33 @@ void bio_advance(struct bio *bio, unsigned bytes)
 EXPORT_SYMBOL(bio_advance);
 
 /**
+ * bio_rewind - reset a bio to its start
+ * @bio: bio to rewind
+ *
+ * This resets bi_sector, bi_size and bi_idx; completely undoing all
+ * bio_advances on the @bio.
+ */
+void bio_rewind(struct bio *bio)
+{
+	int bytes = 0;
+
+	if (bio->bi_idx < bio->bi_vcnt && bio_iovec(bio)->bv_offset > 0) {
+		bytes = bio_iovec(bio)->bv_offset;
+		bio_iovec(bio)->bv_offset -= bytes;
+		bio_iovec(bio)->bv_len += bytes;
+	}
+	while (bio->bi_idx) {
+		bio->bi_idx -= 1;
+		bio_iovec(bio)->bv_len += bio_iovec(bio)->bv_offset;
+		bio_iovec(bio)->bv_offset = 0;
+		bytes += bio_iovec(bio)->bv_len;
+	}
+	bio->bi_size += bytes;
+	bio->bi_sector -= bytes >> 9;
+}
+EXPORT_SYMBOL(bio_rewind);
+
+/**
  * bio_alloc_pages - allocates a single page for each bvec in a bio
  * @bio: bio to allocate pages for
  * @gfp_mask: flags for allocation
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ef24466..e2082fb 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -258,6 +258,7 @@ extern int bio_phys_segments(struct request_queue *, struct bio *);
 
 extern int submit_bio_wait(int rw, struct bio *bio);
 extern void bio_advance(struct bio *, unsigned);
+extern void bio_rewind(struct bio *);
 
 extern void bio_init(struct bio *);
 extern void bio_reset(struct bio *);
--
1.8.1.4
* [PATCH 2/2] md: raid1: use bio_rewind() before bio_copy_data()
2013-07-08 20:06 ` Joe Lawrence
2013-07-08 20:25 ` [PATCH 1/2] block: add bio_rewind() to reset bio_vec Joe Lawrence
@ 2013-07-08 20:25 ` Joe Lawrence
2013-07-09 4:33 ` NeilBrown
2013-07-17 6:12 ` RAID1 repair GPF crash w/3.10-rc7 NeilBrown
2 siblings, 1 reply; 10+ messages in thread
From: Joe Lawrence @ 2013-07-08 20:25 UTC (permalink / raw)
To: linux-raid; +Cc: NeilBrown, kmo, Joe Lawrence
bio_copy_data() cannot operate on a fully completed source (or
destination) bio. MD RAID1 is recycling a completed read bio for RAID
repair purposes and needs to fix up the source and destination bios for
bio_copy_data(): in particular, the io-vector index needs to be rewound
and the bios unchained.
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
---
drivers/md/raid1.c | 28 +++++-----------------------
1 file changed, 5 insertions(+), 23 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6e17f81..bb1e80f 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1861,7 +1861,6 @@ static int process_checks(struct r1bio *r1_bio)
 		int j;
 		struct bio *pbio = r1_bio->bios[primary];
 		struct bio *sbio = r1_bio->bios[i];
-		int size;
 
 		if (sbio->bi_end_io != end_sync_read)
 			continue;
@@ -1887,28 +1886,11 @@ static int process_checks(struct r1bio *r1_bio)
 			rdev_dec_pending(conf->mirrors[i].rdev, mddev);
 			continue;
 		}
-		/* fixup the bio for reuse */
-		bio_reset(sbio);
-		sbio->bi_vcnt = vcnt;
-		sbio->bi_size = r1_bio->sectors << 9;
-		sbio->bi_sector = r1_bio->sector +
-			conf->mirrors[i].rdev->data_offset;
-		sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
-		sbio->bi_end_io = end_sync_read;
-		sbio->bi_private = r1_bio;
-
-		size = sbio->bi_size;
-		for (j = 0; j < vcnt ; j++) {
-			struct bio_vec *bi;
-			bi = &sbio->bi_io_vec[j];
-			bi->bv_offset = 0;
-			if (size > PAGE_SIZE)
-				bi->bv_len = PAGE_SIZE;
-			else
-				bi->bv_len = size;
-			size -= PAGE_SIZE;
-		}
-
+		/* copy pbio io_vec data into sbio, but first reset io_vec
+		 * index and unchain both */
+		bio_rewind(pbio);
+		bio_rewind(sbio);
+		pbio->bi_next = sbio->bi_next = NULL;
 		bio_copy_data(sbio, pbio);
 	}
 	return 0;
--
1.8.1.4
* Re: [PATCH 2/2] md: raid1: use bio_rewind() before bio_copy_data()
2013-07-08 20:25 ` [PATCH 2/2] md: raid1: use bio_rewind() before bio_copy_data() Joe Lawrence
@ 2013-07-09 4:33 ` NeilBrown
0 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2013-07-09 4:33 UTC (permalink / raw)
To: Joe Lawrence; +Cc: linux-raid, kmo
On Mon, 8 Jul 2013 16:25:14 -0400 Joe Lawrence <joe.lawrence@stratus.com>
wrote:
> bio_copy_data() cannot execute on a fully completed source (or
> destination) bio. MD RAID1 is recycling a completed read bio for RAID
> repair purposes and needs to fix up the source and destination bios for
> bio_copy_data(): in particular, the io-vector index needs to be rewound
> and the bios unchained.
>
> Suggested-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Thanks Joe.
Jens: are you happy with this? If so is it OK for me to forward to Linus
and -stable for 3.10, or would you rather?
The follow-on patch which uses this to avoid a crash is
http://marc.info/?l=linux-raid&m=137331514131503&w=2
(This is sitting near the top of "git://neil.brown.name/md for-next" if you
want to cherry-pick it).
Thanks,
NeilBrown
> [ ... snip ... ]
* Re: [PATCH 1/2] block: add bio_rewind() to reset bio_vec
2013-07-08 20:25 ` [PATCH 1/2] block: add bio_rewind() to reset bio_vec Joe Lawrence
@ 2013-07-14 23:40 ` NeilBrown
0 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2013-07-14 23:40 UTC (permalink / raw)
To: Jens Axboe; +Cc: Joe Lawrence, linux-raid, kmo
Hi Jens,
(I tried sending this a week ago but managed to not include your email - no
wonder I didn't get a reply!)
Are you OK with adding this to fs/bio.c?
The follow-on patch which uses it to fix a crash in md in 3.10 is at
http://marc.info/?l=linux-raid&m=137331514131503&w=2
If so: will you submit or should I? Both patches are in my "for-next" and
so should be in linux-next.
If not, I'll just do something local in raid1 to reset the bio before
calling bio_copy_data().
Thanks,
NeilBrown
On Mon, 8 Jul 2013 16:25:13 -0400 Joe Lawrence <joe.lawrence@stratus.com>
wrote:
> Provide a mechanism for drivers to "rewind" a bio, essentially undoing
> all bio_advance() calls on the bio. After the rewind, the bio idx will
> be 0, size and sector reset, and each bio_vec len and offset restored to
> their initial values.
>
> Suggested-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
> [ ... snip ... ]
* Re: RAID1 repair GPF crash w/3.10-rc7
2013-07-08 20:06 ` Joe Lawrence
2013-07-08 20:25 ` [PATCH 1/2] block: add bio_rewind() to reset bio_vec Joe Lawrence
2013-07-08 20:25 ` [PATCH 2/2] md: raid1: use bio_rewind() before bio_copy_data() Joe Lawrence
@ 2013-07-17 6:12 ` NeilBrown
2013-07-17 21:01 ` Joe Lawrence
2 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2013-07-17 6:12 UTC (permalink / raw)
To: Joe Lawrence; +Cc: linux-raid, kmo
On Mon, 8 Jul 2013 16:06:56 -0400 Joe Lawrence <joe.lawrence@stratus.com>
wrote:
> [+cc Kent's new mail?]
>
> On Thu, 4 Jul 2013 10:53:37 +1000
> NeilBrown <neilb@suse.de> wrote:
> >
> > I would propose "bio_rewind" that exactly undoes any "bio_advance".
> >
> > [ ... snip ...]
> >
> > Then call that on pbio at the same place we call bio_reset on sbio.
> >
> > You could probably also call bio_rewind on sbio, and remove lots of that
> > code for setting the bio up again.
>
> This appears to work for my test case (no crashes or post repair
> mismatch_cnt):
>
> mdadm --fail /dev/$MD /dev/sda3
> mdadm --remove /dev/$MD /dev/sda3
> dd if=/dev/urandom of=/dev/$MD bs=1M count=500
> mdadm --stop /dev/$MD
>
> mdadm --create /dev/$MD --level=1 --assume-clean --raid-devices=2 \
> --bitmap=internal /dev/sda3 /dev/sdi3
> echo repair > /sys/block/$MD/md/sync_action
>
> I'll reply to this email with the patches that implement your
> suggested changes. Feel free to combine or redo them, posting them here
> was the easiest way to provide my signed-off should you need it.
>
> Although it works, having to rewind the bio io_vec index (and
> clearing the bi_next pointers) before calling bio_copy_data feels a
> bit clunky. What bio_copy_data is really doing is
> "bio_copy_remaining_data in a bio chain," whereas MD wants
> "bio_copy_completed_data from a single bio".
>
> I took a look at Kent's tree and a lot of the block layer handling was
> simplified through the bvec_iter. I don't know if that code is
> destined for 3.11. If so, it would probably be easier to retest and
> base any MD changes off of his. And for 3.10 stable, the minimal
> fix would be for MD to just manipulate the bi_idx itself (or revert
> calling bio_copy_data altogether.)
>
Hi,
just letting you know I've decided to fix this differently for 3.11 and
3.10.y.
I noticed there was another problem in that code, in that the len argument
passed to memcmp was bv_len which might have been modified.
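[Editor's note: the comparison Neil refers to reads roughly as follows
in 3.10's process_checks(); sbio->bi_io_vec[j].bv_len may already have
been shrunk by bio_advance() on the completion path, so matching pages
could be compared over fewer bytes than intended:

	if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
		for (j = vcnt; j-- ; ) {
			struct page *p, *s;
			p = pbio->bi_io_vec[j].bv_page;
			s = sbio->bi_io_vec[j].bv_page;
			if (memcmp(page_address(p),
				   page_address(s),
				   sbio->bi_io_vec[j].bv_len))
				break;
		}
	}]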
So I moved the "fix up the bio" code to the top of the function and applied
it to all relevant bios.
I'd still like to go with bio_rewind or similar, but it's probably best for
that to wait for the other bio stuff to settle. Maybe it won't be needed.
NeilBrown
commit 9a3856f56b467425e13a19d95524524e76eab040
Author: NeilBrown <neilb@suse.de>
Date: Wed Jul 17 15:19:29 2013 +1000
md/raid1: fix bio handling problems in process_checks()
The recent change to use bio_copy_data() in raid1 when repairing
an array is faulty.
The underlying device may have changed the bio in various ways using
bio_advance and these need to be undone not just for the 'sbio' which
is being copied to, but also the 'pbio' (primary) which is being
copied from.
So perform the reset on all bios that were read from and do it early.
This also ensures that the sbio->bi_io_vec[j].bv_len passed to
memcmp is correct.
Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index ec73458..d60412c 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1849,6 +1849,36 @@ static int process_checks(struct r1bio *r1_bio)
 	int i;
 	int vcnt;
 
+	/* Fix variable parts of all bios */
+	vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
+	for (i = 0; i < conf->raid_disks * 2; i++) {
+		int j;
+		int size;
+		struct bio *b = r1_bio->bios[i];
+		if (b->bi_end_io != end_sync_read)
+			continue;
+		/* fixup the bio for reuse */
+		bio_reset(b);
+		b->bi_vcnt = vcnt;
+		b->bi_size = r1_bio->sectors << 9;
+		b->bi_sector = r1_bio->sector +
+			conf->mirrors[i].rdev->data_offset;
+		b->bi_bdev = conf->mirrors[i].rdev->bdev;
+		b->bi_end_io = end_sync_read;
+		b->bi_private = r1_bio;
+
+		size = b->bi_size;
+		for (j = 0; j < vcnt ; j++) {
+			struct bio_vec *bi;
+			bi = &b->bi_io_vec[j];
+			bi->bv_offset = 0;
+			if (size > PAGE_SIZE)
+				bi->bv_len = PAGE_SIZE;
+			else
+				bi->bv_len = size;
+			size -= PAGE_SIZE;
+		}
+	}
 	for (primary = 0; primary < conf->raid_disks * 2; primary++)
 		if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
 		    test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) {
@@ -1857,12 +1887,10 @@ static int process_checks(struct r1bio *r1_bio)
 			break;
 		}
 	r1_bio->read_disk = primary;
-	vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		int j;
 		struct bio *pbio = r1_bio->bios[primary];
 		struct bio *sbio = r1_bio->bios[i];
-		int size;
 
 		if (sbio->bi_end_io != end_sync_read)
 			continue;
@@ -1888,27 +1916,6 @@ static int process_checks(struct r1bio *r1_bio)
 			rdev_dec_pending(conf->mirrors[i].rdev, mddev);
 			continue;
 		}
-		/* fixup the bio for reuse */
-		bio_reset(sbio);
-		sbio->bi_vcnt = vcnt;
-		sbio->bi_size = r1_bio->sectors << 9;
-		sbio->bi_sector = r1_bio->sector +
-			conf->mirrors[i].rdev->data_offset;
-		sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
-		sbio->bi_end_io = end_sync_read;
-		sbio->bi_private = r1_bio;
-
-		size = sbio->bi_size;
-		for (j = 0; j < vcnt ; j++) {
-			struct bio_vec *bi;
-			bi = &sbio->bi_io_vec[j];
-			bi->bv_offset = 0;
-			if (size > PAGE_SIZE)
-				bi->bv_len = PAGE_SIZE;
-			else
-				bi->bv_len = size;
-			size -= PAGE_SIZE;
-		}
 
 		bio_copy_data(sbio, pbio);
 	}
* Re: RAID1 repair GPF crash w/3.10-rc7
2013-07-17 6:12 ` RAID1 repair GPF crash w/3.10-rc7 NeilBrown
@ 2013-07-17 21:01 ` Joe Lawrence
0 siblings, 0 replies; 10+ messages in thread
From: Joe Lawrence @ 2013-07-17 21:01 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid, kmo
On Wed, 17 Jul 2013 16:12:30 +1000
NeilBrown <neilb@suse.de> wrote:
> Hi,
> just letting you know I've decided to fix this differently for 3.11 and
> 3.10.y.
>
> I noticed there was another problem in that code, in that the len argument
> passed to memcmp was bv_len which might have been modified.
>
> So I moved the "fix up the bio" code to the top of the function and applied
> it to all relevant bios.
>
> I'd still like to go with bio_rewind or similar, but it's probably best for
> that to wait for the other bio stuff to settle. Maybe it won't be needed.
>
> NeilBrown
>
>
> commit 9a3856f56b467425e13a19d95524524e76eab040
> Author: NeilBrown <neilb@suse.de>
> Date: Wed Jul 17 15:19:29 2013 +1000
>
> md/raid1: fix bio handling problems in process_checks()
>
> The recent change to use bio_copy_data() in raid1 when repairing
> an array is faulty.
>
> The underlying device may have changed the bio in various ways using
> bio_advance and these need to be undone not just for the 'sbio' which
> is being copied to, but also the 'pbio' (primary) which is being
> copied from.
>
> So perform the reset on all bios that were read from and do it early.
>
> This also ensures that the sbio->bi_io_vec[j].bv_len passed to
> memcmp is correct.
>
> Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
> Signed-off-by: NeilBrown <neilb@suse.de>
>
> [ ... snip ... ]
Hi Neil, this version looks good so far in my testing. Thanks for fixing.
-- Joe