[PATCH] raid5: Avoid doing more read on dev of a stripe at the same time

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time
@ 2012-09-15  2:20 Jianpeng Ma
  2012-09-20  2:51 ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: Jianpeng Ma @ 2012-09-15  2:20 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

In func 'ops_run_bio' if you read the dev which the last reading
of this dev didn't return,it will destrory the  req/rreq'source of rdev.
It may call hung-task.
For example, for badsector or other reasons, read-operation only used
stripe instead of chunk_aligned_read.
First:stripe 0;second: stripe 8;third:stripe 16.At the block-layer,three
bios merged.
Because media error of sector from 0 to 7, the request retried.
At this time, raid5d readed stripe0 again.But it will set 'bio->next =
NULL'.So the stripe 8 and 16 didn't return.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
---
 drivers/md/raid5.c |   13 ++++++++++---
 drivers/md/raid5.h |    1 +
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index adda94d..e52ba1b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -547,9 +547,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 				rw = WRITE_FUA;
 			else
 				rw = WRITE;
-		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
-			rw = READ;
-		else if (test_and_clear_bit(R5_WantReplace,
+		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags)) {
+			/*The last read did not return,so skip this read*/
+			if (test_and_set_bit(R5_Reading, &sh->dev[i].flags)) {
+				clear_bit(R5_LOCKED, &sh->dev[i].flags);
+				continue;
+			} else
+				rw = READ;
+		} else if (test_and_clear_bit(R5_WantReplace,
 					    &sh->dev[i].flags)) {
 			rw = WRITE;
 			replace_only = 1;
@@ -700,6 +705,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 			pr_debug("skip op %ld on disc %d for sector %llu\n",
 				bi->bi_rw, i, (unsigned long long)sh->sector);
 			clear_bit(R5_LOCKED, &sh->dev[i].flags);
+			clear_bit(R5_Reading, &sh->dev[i].flags);
 			set_bit(STRIPE_HANDLE, &sh->state);
 		}
 	}
@@ -1818,6 +1824,7 @@ static void raid5_end_read_request(struct bio * bi, int error)
 	}
 	rdev_dec_pending(rdev, conf->mddev);
 	clear_bit(R5_LOCKED, &sh->dev[i].flags);
+	clear_bit(R5_Reading, &sh->dev[i].flags);
 	set_bit(STRIPE_HANDLE, &sh->state);
 	release_stripe(sh);
 }
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index a9fc249..a596df8 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -298,6 +298,7 @@ enum r5dev_flags {
 	R5_WantReplace, /* We need to update the replacement, we have read
 			 * data in, and now is a good time to write it out.
 			 */
+	R5_Reading,	/* this dev on reading on lld*/
 };
 
 /*
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time
  2012-09-15  2:20 [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time Jianpeng Ma
@ 2012-09-20  2:51 ` NeilBrown
  2012-09-20  3:04   ` Jianpeng Ma
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2012-09-20  2:51 UTC (permalink / raw)
  To: Jianpeng Ma; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3705 bytes --]

On Sat, 15 Sep 2012 10:20:35 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:

> In func 'ops_run_bio' if you read the dev which the last reading
> of this dev didn't return,it will destrory the  req/rreq'source of rdev.
> It may call hung-task.
> For example, for badsector or other reasons, read-operation only used
> stripe instead of chunk_aligned_read.
> First:stripe 0;second: stripe 8;third:stripe 16.At the block-layer,three
> bios merged.
> Because media error of sector from 0 to 7, the request retried.
> At this time, raid5d readed stripe0 again.But it will set 'bio->next =
> NULL'.So the stripe 8 and 16 didn't return.
> 
> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>

Hi,
 I'm really trying, but I cannot understand what you are saying.

I think the situation that you are describing involves a 24 sector request.
This is attached to 3 stripe_heads - 8 sectors each - at address 0, 8, 16.

So 'toread' on the first device of each stripe points to this bio, and
bi_next is NULL.

The "req" bio for each device is filled out to read one page and these three
'req' bios are submitted.  The block layer merges these into a single request.

This request reports an error because there is a read error somewhere in the
first  8 sectors.

So one, or maybe all, of the 'req' bios return with an error?
Then RAID5 marks that device as being bad and reads from the other devices to
recover the data.

Where exactly does something go wrong?  which bio gets bi_next set to NULL
when it shouldn't?

If you could explain with lots of detail it would help a lot.

Thanks,
NeilBrown


> ---
>  drivers/md/raid5.c |   13 ++++++++++---
>  drivers/md/raid5.h |    1 +
>  2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index adda94d..e52ba1b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -547,9 +547,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
>  				rw = WRITE_FUA;
>  			else
>  				rw = WRITE;
> -		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
> -			rw = READ;
> -		else if (test_and_clear_bit(R5_WantReplace,
> +		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags)) {
> +			/*The last read did not return,so skip this read*/
> +			if (test_and_set_bit(R5_Reading, &sh->dev[i].flags)) {
> +				clear_bit(R5_LOCKED, &sh->dev[i].flags);
> +				continue;
> +			} else
> +				rw = READ;
> +		} else if (test_and_clear_bit(R5_WantReplace,
>  					    &sh->dev[i].flags)) {
>  			rw = WRITE;
>  			replace_only = 1;
> @@ -700,6 +705,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
>  			pr_debug("skip op %ld on disc %d for sector %llu\n",
>  				bi->bi_rw, i, (unsigned long long)sh->sector);
>  			clear_bit(R5_LOCKED, &sh->dev[i].flags);
> +			clear_bit(R5_Reading, &sh->dev[i].flags);
>  			set_bit(STRIPE_HANDLE, &sh->state);
>  		}
>  	}
> @@ -1818,6 +1824,7 @@ static void raid5_end_read_request(struct bio * bi, int error)
>  	}
>  	rdev_dec_pending(rdev, conf->mddev);
>  	clear_bit(R5_LOCKED, &sh->dev[i].flags);
> +	clear_bit(R5_Reading, &sh->dev[i].flags);
>  	set_bit(STRIPE_HANDLE, &sh->state);
>  	release_stripe(sh);
>  }
> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> index a9fc249..a596df8 100644
> --- a/drivers/md/raid5.h
> +++ b/drivers/md/raid5.h
> @@ -298,6 +298,7 @@ enum r5dev_flags {
>  	R5_WantReplace, /* We need to update the replacement, we have read
>  			 * data in, and now is a good time to write it out.
>  			 */
> +	R5_Reading,	/* this dev on reading on lld*/
>  };
>  
>  /*


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time
  2012-09-20  2:51 ` NeilBrown
@ 2012-09-20  3:04   ` Jianpeng Ma
  2012-09-20  3:24     ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: Jianpeng Ma @ 2012-09-20  3:04 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On 2012-09-20 10:51 NeilBrown <neilb@suse.de> Wrote:
>On Sat, 15 Sep 2012 10:20:35 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
>
>> In func 'ops_run_bio' if you read the dev which the last reading
>> of this dev didn't return,it will destrory the  req/rreq'source of rdev.
>> It may call hung-task.
>> For example, for badsector or other reasons, read-operation only used
>> stripe instead of chunk_aligned_read.
>> First:stripe 0;second: stripe 8;third:stripe 16.At the block-layer,three
>> bios merged.
>> Because media error of sector from 0 to 7, the request retried.
>> At this time, raid5d readed stripe0 again.But it will set 'bio->next =
>> NULL'.So the stripe 8 and 16 didn't return.
>> 
>> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
>
>Hi,
> I'm really trying, but I cannot understand what you are saying.
>
Sorry for my bad english.
>I think the situation that you are describing involves a 24 sector request.
>This is attached to 3 stripe_heads - 8 sectors each - at address 0, 8, 16.
>
>So 'toread' on the first device of each stripe points to this bio, and
>bi_next is NULL.
>
>The "req" bio for each device is filled out to read one page and these three
>'req' bios are submitted.  The block layer merges these into a single request.
>
>This request reports an error because there is a read error somewhere in the
>first  8 sectors.
>
Yes,
>So one, or maybe all, of the 'req' bios return with an error?
From my test, when req did not return and at the same time, the bio(stripe 0) send.
So this operation will set bi_next is NULL.
Why am i mention the badsector? Because my disks which had badsector can reproduced easily.
For disks which didn't have badsector,i can't reproduced this bug.

>Then RAID5 marks that device as being bad and reads from the other devices to
>recover the data.
>
>Where exactly does something go wrong?  which bio gets bi_next set to NULL
>when it shouldn't?
>
>If you could explain with lots of detail it would help a lot.
>
The question is to ensure the first-read returned and the next to send on same dev of stripe.
>Thanks,
>NeilBrown
>
>
>> ---
>>  drivers/md/raid5.c |   13 ++++++++++---
>>  drivers/md/raid5.h |    1 +
>>  2 files changed, 11 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index adda94d..e52ba1b 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -547,9 +547,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
>>  				rw = WRITE_FUA;
>>  			else
>>  				rw = WRITE;
>> -		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
>> -			rw = READ;
>> -		else if (test_and_clear_bit(R5_WantReplace,
>> +		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags)) {
>> +			/*The last read did not return,so skip this read*/
>> +			if (test_and_set_bit(R5_Reading, &sh->dev[i].flags)) {
>> +				clear_bit(R5_LOCKED, &sh->dev[i].flags);
>> +				continue;
>> +			} else
>> +				rw = READ;
>> +		} else if (test_and_clear_bit(R5_WantReplace,
>>  					    &sh->dev[i].flags)) {
>>  			rw = WRITE;
>>  			replace_only = 1;
>> @@ -700,6 +705,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
>>  			pr_debug("skip op %ld on disc %d for sector %llu\n",
>>  				bi->bi_rw, i, (unsigned long long)sh->sector);
>>  			clear_bit(R5_LOCKED, &sh->dev[i].flags);
>> +			clear_bit(R5_Reading, &sh->dev[i].flags);
>>  			set_bit(STRIPE_HANDLE, &sh->state);
>>  		}
>>  	}
>> @@ -1818,6 +1824,7 @@ static void raid5_end_read_request(struct bio * bi, int error)
>>  	}
>>  	rdev_dec_pending(rdev, conf->mddev);
>>  	clear_bit(R5_LOCKED, &sh->dev[i].flags);
>> +	clear_bit(R5_Reading, &sh->dev[i].flags);
>>  	set_bit(STRIPE_HANDLE, &sh->state);
>>  	release_stripe(sh);
>>  }
>> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
>> index a9fc249..a596df8 100644
>> --- a/drivers/md/raid5.h
>> +++ b/drivers/md/raid5.h
>> @@ -298,6 +298,7 @@ enum r5dev_flags {
>>  	R5_WantReplace, /* We need to update the replacement, we have read
>>  			 * data in, and now is a good time to write it out.
>>  			 */
>> +	R5_Reading,	/* this dev on reading on lld*/
>>  };
>>  
>>  /*
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time
  2012-09-20  3:04   ` Jianpeng Ma
@ 2012-09-20  3:24     ` NeilBrown
  2012-09-20  6:04       ` Jianpeng Ma
  2012-09-21  2:24       ` Jianpeng Ma
  0 siblings, 2 replies; 10+ messages in thread
From: NeilBrown @ 2012-09-20  3:24 UTC (permalink / raw)
  To: Jianpeng Ma; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 5075 bytes --]

On Thu, 20 Sep 2012 11:04:46 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:

> On 2012-09-20 10:51 NeilBrown <neilb@suse.de> Wrote:
> >On Sat, 15 Sep 2012 10:20:35 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
> >
> >> In func 'ops_run_bio' if you read the dev which the last reading
> >> of this dev didn't return,it will destrory the  req/rreq'source of rdev.
> >> It may call hung-task.
> >> For example, for badsector or other reasons, read-operation only used
> >> stripe instead of chunk_aligned_read.
> >> First:stripe 0;second: stripe 8;third:stripe 16.At the block-layer,three
> >> bios merged.
> >> Because media error of sector from 0 to 7, the request retried.
> >> At this time, raid5d readed stripe0 again.But it will set 'bio->next =
> >> NULL'.So the stripe 8 and 16 didn't return.
> >> 
> >> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
> >
> >Hi,
> > I'm really trying, but I cannot understand what you are saying.
> >
> Sorry for my bad english.
> >I think the situation that you are describing involves a 24 sector request.
> >This is attached to 3 stripe_heads - 8 sectors each - at address 0, 8, 16.
> >
> >So 'toread' on the first device of each stripe points to this bio, and
> >bi_next is NULL.
> >
> >The "req" bio for each device is filled out to read one page and these three
> >'req' bios are submitted.  The block layer merges these into a single request.
> >
> >This request reports an error because there is a read error somewhere in the
> >first  8 sectors.
> >
> Yes,
> >So one, or maybe all, of the 'req' bios return with an error?
> From my test, when req did not return and at the same time, the bio(stripe 0) send.
> So this operation will set bi_next is NULL.

Are you saying that we send another bio before the first one has returned?
That shouldn't be possible as sh->count will prevent it from happening.
While there is an outstanding request, sh->count will be >0, and until
sh->count is 0, we won't try to send any more requests.

So I still don't understand.  Please try to provide as much detail as
possible.  If it is easier, write in your own language and use
translate.google.com to convert to english. ??

Thanks,
NeilBrown



> Why am i mention the badsector? Because my disks which had badsector can reproduced easily.
> For disks which didn't have badsector,i can't reproduced this bug.
> 
> >Then RAID5 marks that device as being bad and reads from the other devices to
> >recover the data.
> >
> >Where exactly does something go wrong?  which bio gets bi_next set to NULL
> >when it shouldn't?
> >
> >If you could explain with lots of detail it would help a lot.
> >
> The question is to ensure the first-read returned and the next to send on same dev of stripe.
> >Thanks,
> >NeilBrown
> >
> >
> >> ---
> >>  drivers/md/raid5.c |   13 ++++++++++---
> >>  drivers/md/raid5.h |    1 +
> >>  2 files changed, 11 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> index adda94d..e52ba1b 100644
> >> --- a/drivers/md/raid5.c
> >> +++ b/drivers/md/raid5.c
> >> @@ -547,9 +547,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
> >>  				rw = WRITE_FUA;
> >>  			else
> >>  				rw = WRITE;
> >> -		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
> >> -			rw = READ;
> >> -		else if (test_and_clear_bit(R5_WantReplace,
> >> +		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags)) {
> >> +			/*The last read did not return,so skip this read*/
> >> +			if (test_and_set_bit(R5_Reading, &sh->dev[i].flags)) {
> >> +				clear_bit(R5_LOCKED, &sh->dev[i].flags);
> >> +				continue;
> >> +			} else
> >> +				rw = READ;
> >> +		} else if (test_and_clear_bit(R5_WantReplace,
> >>  					    &sh->dev[i].flags)) {
> >>  			rw = WRITE;
> >>  			replace_only = 1;
> >> @@ -700,6 +705,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
> >>  			pr_debug("skip op %ld on disc %d for sector %llu\n",
> >>  				bi->bi_rw, i, (unsigned long long)sh->sector);
> >>  			clear_bit(R5_LOCKED, &sh->dev[i].flags);
> >> +			clear_bit(R5_Reading, &sh->dev[i].flags);
> >>  			set_bit(STRIPE_HANDLE, &sh->state);
> >>  		}
> >>  	}
> >> @@ -1818,6 +1824,7 @@ static void raid5_end_read_request(struct bio * bi, int error)
> >>  	}
> >>  	rdev_dec_pending(rdev, conf->mddev);
> >>  	clear_bit(R5_LOCKED, &sh->dev[i].flags);
> >> +	clear_bit(R5_Reading, &sh->dev[i].flags);
> >>  	set_bit(STRIPE_HANDLE, &sh->state);
> >>  	release_stripe(sh);
> >>  }
> >> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> >> index a9fc249..a596df8 100644
> >> --- a/drivers/md/raid5.h
> >> +++ b/drivers/md/raid5.h
> >> @@ -298,6 +298,7 @@ enum r5dev_flags {
> >>  	R5_WantReplace, /* We need to update the replacement, we have read
> >>  			 * data in, and now is a good time to write it out.
> >>  			 */
> >> +	R5_Reading,	/* this dev on reading on lld*/
> >>  };
> >>  
> >>  /*
> >
> >

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time
  2012-09-20  3:24     ` NeilBrown
@ 2012-09-20  6:04       ` Jianpeng Ma
  2012-09-21  2:24       ` Jianpeng Ma
  1 sibling, 0 replies; 10+ messages in thread
From: Jianpeng Ma @ 2012-09-20  6:04 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On 2012-09-20 11:24 NeilBrown <neilb@suse.de> Wrote:
>On Thu, 20 Sep 2012 11:04:46 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
>
>> On 2012-09-20 10:51 NeilBrown <neilb@suse.de> Wrote:
>> >On Sat, 15 Sep 2012 10:20:35 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
>> >
>> >> In func 'ops_run_bio' if you read the dev which the last reading
>> >> of this dev didn't return,it will destrory the  req/rreq'source of rdev.
>> >> It may call hung-task.
>> >> For example, for badsector or other reasons, read-operation only used
>> >> stripe instead of chunk_aligned_read.
>> >> First:stripe 0;second: stripe 8;third:stripe 16.At the block-layer,three
>> >> bios merged.
>> >> Because media error of sector from 0 to 7, the request retried.
>> >> At this time, raid5d readed stripe0 again.But it will set 'bio->next =
>> >> NULL'.So the stripe 8 and 16 didn't return.
>> >> 
>> >> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
>> >
>> >Hi,
>> > I'm really trying, but I cannot understand what you are saying.
>> >
>> Sorry for my bad english.
>> >I think the situation that you are describing involves a 24 sector request.
>> >This is attached to 3 stripe_heads - 8 sectors each - at address 0, 8, 16.
>> >
>> >So 'toread' on the first device of each stripe points to this bio, and
>> >bi_next is NULL.
>> >
>> >The "req" bio for each device is filled out to read one page and these three
>> >'req' bios are submitted.  The block layer merges these into a single request.
>> >
>> >This request reports an error because there is a read error somewhere in the
>> >first  8 sectors.
>> >
>> Yes,
>> >So one, or maybe all, of the 'req' bios return with an error?
>> From my test, when req did not return and at the same time, the bio(stripe 0) send.
>> So this operation will set bi_next is NULL.
>
>Are you saying that we send another bio before the first one has returned?
>That shouldn't be possible as sh->count will prevent it from happening.
>While there is an outstanding request, sh->count will be >0, and until
>sh->count is 0, we won't try to send any more requests.
>

>So I still don't understand.  Please try to provide as much detail as
>possible.  If it is easier, write in your own language and use
>translate.google.com to convert to english. ??
>
Because my disks which had badsector wrote and can't reproduce this bug now.
1: create raid5 
	mdadm -CR /dev/md0 -l5 -c4 -n4 missing /dev/sd[cde] 
	sdb and sdc  had badsectors.
2:do read operation 
  while true;do
  dd if=/dev/md0 of=/dev/null bs=3M count=1 oflag=direct
  done
3:dd will be hang.

Suppose read stripeN from missed disk, it only read disk sdc/sdd/sde and to computer.
1: first send read operation about sdc/sdd/sde for computer
2: then read sdc through chunk_aligned_read and met error and used stripe to read.
If this read-operation can add stripe and send to sdc,there is chance to occur the problem like explained.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time
  2012-09-20  3:24     ` NeilBrown
  2012-09-20  6:04       ` Jianpeng Ma
@ 2012-09-21  2:24       ` Jianpeng Ma
  2012-09-25  7:29         ` NeilBrown
  1 sibling, 1 reply; 10+ messages in thread
From: Jianpeng Ma @ 2012-09-21  2:24 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On 2012-09-20 11:24 NeilBrown <neilb@suse.de> Wrote:
>On Thu, 20 Sep 2012 11:04:46 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
>
>> On 2012-09-20 10:51 NeilBrown <neilb@suse.de> Wrote:
>> >On Sat, 15 Sep 2012 10:20:35 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
>> >
>> >> In func 'ops_run_bio' if you read the dev which the last reading
>> >> of this dev didn't return,it will destrory the  req/rreq'source of rdev.
>> >> It may call hung-task.
>> >> For example, for badsector or other reasons, read-operation only used
>> >> stripe instead of chunk_aligned_read.
>> >> First:stripe 0;second: stripe 8;third:stripe 16.At the block-layer,three
>> >> bios merged.
>> >> Because media error of sector from 0 to 7, the request retried.
>> >> At this time, raid5d readed stripe0 again.But it will set 'bio->next =
>> >> NULL'.So the stripe 8 and 16 didn't return.
>> >> 
>> >> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
>> >
>> >Hi,
>> > I'm really trying, but I cannot understand what you are saying.
>> >
>> Sorry for my bad english.
>> >I think the situation that you are describing involves a 24 sector request.
>> >This is attached to 3 stripe_heads - 8 sectors each - at address 0, 8, 16.
>> >
>> >So 'toread' on the first device of each stripe points to this bio, and
>> >bi_next is NULL.
>> >
>> >The "req" bio for each device is filled out to read one page and these three
>> >'req' bios are submitted.  The block layer merges these into a single request.
>> >
>> >This request reports an error because there is a read error somewhere in the
>> >first  8 sectors.
>> >
>> Yes,
>> >So one, or maybe all, of the 'req' bios return with an error?
>> From my test, when req did not return and at the same time, the bio(stripe 0) send.
>> So this operation will set bi_next is NULL.
>
>Are you saying that we send another bio before the first one has returned?
>That shouldn't be possible as sh->count will prevent it from happening.
>While there is an outstanding request, sh->count will be >0, and until
>sh->count is 0, we won't try to send any more requests.
>
>So I still don't understand.  Please try to provide as much detail as
>possible.  If it is easier, write in your own language and use
>translate.google.com to convert to english. ??
>
>Thanks,
>NeilBrown

Hi,
 i wrote a shell-script can reproduct this bug.
Note: mdadm -V
mdadm - v3.3-pre - Unreleased


#!/bin/bash

declare -i count
declare -i sector
count=0
sector=2048
while true
do
	hdparm --make-bad-sector $sector --yes-i-know-what-i-am-doing /dev/sdc > /dev/null
	hdparm --make-bad-sector $sector --yes-i-know-what-i-am-doing /dev/sdd > /dev/null
	hdparm --make-bad-sector $sector --yes-i-know-what-i-am-doing /dev/sde > /dev/null
	let count++
	let sector+=$count*8
	if (($count == 40));then
		break
	fi
done

while true
do
	mdadm -S /dev/md0
	mdadm -CR /dev/md0 -l5 -c4 -n4 missing /dev/sd[cde]
	dd if=/dev/md0 of=/dev/null bs=10M count=1 iflag=direct
	sleep 1
done


Thanks

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time
  2012-09-21  2:24       ` Jianpeng Ma
@ 2012-09-25  7:29         ` NeilBrown
  2012-09-26  2:43           ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2012-09-25  7:29 UTC (permalink / raw)
  To: Jianpeng Ma; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3378 bytes --]

On Fri, 21 Sep 2012 10:24:45 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:

> On 2012-09-20 11:24 NeilBrown <neilb@suse.de> Wrote:
> >On Thu, 20 Sep 2012 11:04:46 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
> >
> >> On 2012-09-20 10:51 NeilBrown <neilb@suse.de> Wrote:
> >> >On Sat, 15 Sep 2012 10:20:35 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
> >> >
> >> >> In func 'ops_run_bio' if you read the dev which the last reading
> >> >> of this dev didn't return,it will destrory the  req/rreq'source of rdev.
> >> >> It may call hung-task.
> >> >> For example, for badsector or other reasons, read-operation only used
> >> >> stripe instead of chunk_aligned_read.
> >> >> First:stripe 0;second: stripe 8;third:stripe 16.At the block-layer,three
> >> >> bios merged.
> >> >> Because media error of sector from 0 to 7, the request retried.
> >> >> At this time, raid5d readed stripe0 again.But it will set 'bio->next =
> >> >> NULL'.So the stripe 8 and 16 didn't return.
> >> >> 
> >> >> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
> >> >
> >> >Hi,
> >> > I'm really trying, but I cannot understand what you are saying.
> >> >
> >> Sorry for my bad english.
> >> >I think the situation that you are describing involves a 24 sector request.
> >> >This is attached to 3 stripe_heads - 8 sectors each - at address 0, 8, 16.
> >> >
> >> >So 'toread' on the first device of each stripe points to this bio, and
> >> >bi_next is NULL.
> >> >
> >> >The "req" bio for each device is filled out to read one page and these three
> >> >'req' bios are submitted.  The block layer merges these into a single request.
> >> >
> >> >This request reports an error because there is a read error somewhere in the
> >> >first  8 sectors.
> >> >
> >> Yes,
> >> >So one, or maybe all, of the 'req' bios return with an error?
> >> From my test, when req did not return and at the same time, the bio(stripe 0) send.
> >> So this operation will set bi_next is NULL.
> >
> >Are you saying that we send another bio before the first one has returned?
> >That shouldn't be possible as sh->count will prevent it from happening.
> >While there is an outstanding request, sh->count will be >0, and until
> >sh->count is 0, we won't try to send any more requests.
> >
> >So I still don't understand.  Please try to provide as much detail as
> >possible.  If it is easier, write in your own language and use
> >translate.google.com to convert to english. ??
> >
> >Thanks,
> >NeilBrown
> 
> Hi,
>  i wrote a shell-script can reproduct this bug.
> Note: mdadm -V
> mdadm - v3.3-pre - Unreleased
> 
> 
> #!/bin/bash
> 
> declare -i count
> declare -i sector
> count=0
> sector=2048
> while true
> do
> 	hdparm --make-bad-sector $sector --yes-i-know-what-i-am-doing /dev/sdc > /dev/null
> 	hdparm --make-bad-sector $sector --yes-i-know-what-i-am-doing /dev/sdd > /dev/null
> 	hdparm --make-bad-sector $sector --yes-i-know-what-i-am-doing /dev/sde > /dev/null
> 	let count++
> 	let sector+=$count*8
> 	if (($count == 40));then
> 		break
> 	fi
> done
> 
> while true
> do
> 	mdadm -S /dev/md0
> 	mdadm -CR /dev/md0 -l5 -c4 -n4 missing /dev/sd[cde]
> 	dd if=/dev/md0 of=/dev/null bs=10M count=1 iflag=direct
> 	sleep 1
> done
> 
> 
> Thanks

Thanks a lot!
I'll try this out and see what I find.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time
  2012-09-25  7:29         ` NeilBrown
@ 2012-09-26  2:43           ` NeilBrown
  2012-09-26  4:09             ` Jianpeng Ma
  2012-09-26  9:14             ` Jianpeng Ma
  0 siblings, 2 replies; 10+ messages in thread
From: NeilBrown @ 2012-09-26  2:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jianpeng Ma, linux-raid

[-- Attachment #1: Type: text/plain, Size: 705 bytes --]

On Tue, 25 Sep 2012 17:29:12 +1000 NeilBrown <neilb@suse.de> wrote:

> On Fri, 21 Sep 2012 10:24:45 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:

> > Hi,
> >  i wrote a shell-script can reproduct this bug.
> > Note: mdadm -V
> > mdadm - v3.3-pre - Unreleased
> > 
...

> Thanks a lot!
> I'll try this out and see what I find.

I've been running this script (with the mentioned version of mdadm) and
nothing has gone wrong.
The 'dd' keeps reporting read errors of course, as does the kernel, but not
crash or freeze or anything.

What exactly happens when you run it?  Do you get a kernel stack trace at all?
Can you post logs showing where it goes wrong?

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time
  2012-09-26  2:43           ` NeilBrown
@ 2012-09-26  4:09             ` Jianpeng Ma
  2012-09-26  9:14             ` Jianpeng Ma
  1 sibling, 0 replies; 10+ messages in thread
From: Jianpeng Ma @ 2012-09-26  4:09 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On 2012-09-26 10:43 NeilBrown <neilb@suse.de> Wrote:
>On Tue, 25 Sep 2012 17:29:12 +1000 NeilBrown <neilb@suse.de> wrote:
>
>> On Fri, 21 Sep 2012 10:24:45 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
>
>> > Hi,
>> >  i wrote a shell-script can reproduct this bug.
>> > Note: mdadm -V
>> > mdadm - v3.3-pre - Unreleased
>> > 
>...
>
>> Thanks a lot!
>> I'll try this out and see what I find.
>
>I've been running this script (with the mentioned version of mdadm) and
>nothing has gone wrong.
>The 'dd' keeps reporting read errors of course, as does the kernel, but not
>crash or freeze or anything.
What's the version of mdadm? Use the feature of badblock log to test.
Are you sure?
>
>What exactly happens when you run it?  Do you get a kernel stack trace at all?
>Can you post logs showing where it goes wrong?
>
Ok, i will redo and catch message to send you.
Thanks!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Re: [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time
  2012-09-26  2:43           ` NeilBrown
  2012-09-26  4:09             ` Jianpeng Ma
@ 2012-09-26  9:14             ` Jianpeng Ma
  1 sibling, 0 replies; 10+ messages in thread
From: Jianpeng Ma @ 2012-09-26  9:14 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 964 bytes --]

On 2012-09-26 10:43 NeilBrown <neilb@suse.de> Wrote:
>On Tue, 25 Sep 2012 17:29:12 +1000 NeilBrown <neilb@suse.de> wrote:
>
>> On Fri, 21 Sep 2012 10:24:45 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
>
>> > Hi,
>> >  i wrote a shell-script can reproduct this bug.
>> > Note: mdadm -V
>> > mdadm - v3.3-pre - Unreleased
>> > 
>...
>
>> Thanks a lot!
>> I'll try this out and see what I find.
>
>I've been running this script (with the mentioned version of mdadm) and
>nothing has gone wrong.
>The 'dd' keeps reporting read errors of course, as does the kernel, but not
>crash or freeze or anything.
>
>What exactly happens when you run it?  Do you get a kernel stack trace at all?
>Can you post logs showing where it goes wrong?
>
>Thanks,
>NeilBrown

Hi:
The attachement is the kerne log which enabling dynamic_debug.
I hope you can get some useful infos from this log.If can't,can you suggest what place to add debug?
 
Thanks!

[-- Attachment #2: kern.rar --]
[-- Type: application/octet-stream, Size: 34412 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-09-26  9:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-15  2:20 [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time Jianpeng Ma
2012-09-20  2:51 ` NeilBrown
2012-09-20  3:04   ` Jianpeng Ma
2012-09-20  3:24     ` NeilBrown
2012-09-20  6:04       ` Jianpeng Ma
2012-09-21  2:24       ` Jianpeng Ma
2012-09-25  7:29         ` NeilBrown
2012-09-26  2:43           ` NeilBrown
2012-09-26  4:09             ` Jianpeng Ma
2012-09-26  9:14             ` Jianpeng Ma

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).