Bad raid0 bio too large problem

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Bad raid0 bio too large problem
@ 2015-09-22 15:30 Jes Sorensen
  2015-09-22 16:07 ` Jes Sorensen
  2015-09-23  2:25 ` Neil Brown
  0 siblings, 2 replies; 10+ messages in thread
From: Jes Sorensen @ 2015-09-22 15:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: Xiao Ni, linux-raid, yizhan

Hi Neil,

I think we have some bad side effects with this patch:

commit 199dc6ed5179251fa6158a461499c24bdd99c836
Author: NeilBrown <neilb@suse.com>
Date:   Mon Aug 3 13:11:47 2015 +1000

    md/raid0: update queue parameter in a safer location.
    
    When a (e.g.) RAID5 array is reshaped to RAID0, the updating
    of queue parameters (e.g. max number of sectors per bio) is
    done in the wrong place.
    It should be part of ->run, but it is actually part of ->takeover.
    This means it happens before level_store() calls:
    
        blk_set_stacking_limits(&mddev->queue->limits);
    
Running the '03r0assem' test suite fills my kernel log with output like
below. Yi Zhang also had issues where writes failed too.

robably something we need to resolve for 4.2-final or revert the
offending patch.

Cheers,
Jes

md: bind<loop0>
md: bind<loop1>
md: bind<loop2>
md/raid0:md2: md_size is 116736 sectors.
md: RAID0 configuration for md2 - 1 zone
md: zone0=[loop0/loop1/loop2]
      zone-offset=         0KB, device-offset=         0KB, size=     58368KB

md2: detected capacity change from 0 to 59768832
bio too big device loop0 (296 > 255)
bio too big device loop0 (272 > 255)
bio too big device loop1 (672 > 255)
bio too big device loop1 (352 > 255)
bio too big device loop2 (1024 > 255)
bio too big device loop0 (672 > 255)
bio too big device loop1 (256 > 255)
bio too big device loop2 (288 > 255)
bio too big device loop2 (736 > 255)
bio too big device loop0 (288 > 255)
bio too big device loop2 (256 > 255)
bio too big device loop2 (368 > 255)
bio too big device loop0 (488 > 255)
bio too big device loop0 (360 > 255)
bio too big device loop0 (256 > 255)
bio too big device loop1 (288 > 255)
bio too big device loop1 (736 > 255)
bio too big device loop2 (288 > 255)
bio too big device loop1 (256 > 255)
bio too big device loop1 (512 > 255)
bio too big device loop2 (856 > 255)
bio too big device loop2 (256 > 255)
bio too big device loop0 (288 > 255)
bio too big device loop0 (736 > 255)
bio too big device loop1 (288 > 255)
bio too big device loop0 (256 > 255)
bio too big device loop0 (512 > 255)
bio too big device loop1 (856 > 255)
bio too big device loop1 (256 > 255)
bio too big device loop2 (288 > 255)
bio too big device loop2 (736 > 255)
md2: detected capacity change from 59768832 to 0
md: md2 stopped.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bad raid0 bio too large problem
  2015-09-22 15:30 Bad raid0 bio too large problem Jes Sorensen
@ 2015-09-22 16:07 ` Jes Sorensen
  2015-09-23  2:25 ` Neil Brown
  1 sibling, 0 replies; 10+ messages in thread
From: Jes Sorensen @ 2015-09-22 16:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: Xiao Ni, linux-raid, yizhan

Jes Sorensen <Jes.Sorensen@redhat.com> writes:
> Hi Neil,
>
> I think we have some bad side effects with this patch:
>
> commit 199dc6ed5179251fa6158a461499c24bdd99c836
> Author: NeilBrown <neilb@suse.com>
> Date:   Mon Aug 3 13:11:47 2015 +1000
>
>     md/raid0: update queue parameter in a safer location.
>     
>     When a (e.g.) RAID5 array is reshaped to RAID0, the updating
>     of queue parameters (e.g. max number of sectors per bio) is
>     done in the wrong place.
>     It should be part of ->run, but it is actually part of ->takeover.
>     This means it happens before level_store() calls:
>     
>         blk_set_stacking_limits(&mddev->queue->limits);
>     
> Running the '03r0assem' test suite fills my kernel log with output like
> below. Yi Zhang also had issues where writes failed too.
>
> robably something we need to resolve for 4.2-final or revert the
> offending patch.

Obviously I meant 4.3 here :)

Jes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bad raid0 bio too large problem
  2015-09-22 15:30 Bad raid0 bio too large problem Jes Sorensen
  2015-09-22 16:07 ` Jes Sorensen
@ 2015-09-23  2:25 ` Neil Brown
  2015-09-23 11:18   ` Jes Sorensen
  1 sibling, 1 reply; 10+ messages in thread
From: Neil Brown @ 2015-09-23  2:25 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: Xiao Ni, linux-raid, yizhan

[-- Attachment #1: Type: text/plain, Size: 1727 bytes --]

Jes Sorensen <Jes.Sorensen@redhat.com> writes:

> Hi Neil,
>
> I think we have some bad side effects with this patch:
>
> commit 199dc6ed5179251fa6158a461499c24bdd99c836
> Author: NeilBrown <neilb@suse.com>
> Date:   Mon Aug 3 13:11:47 2015 +1000
>
>     md/raid0: update queue parameter in a safer location.
>     
>     When a (e.g.) RAID5 array is reshaped to RAID0, the updating
>     of queue parameters (e.g. max number of sectors per bio) is
>     done in the wrong place.
>     It should be part of ->run, but it is actually part of ->takeover.
>     This means it happens before level_store() calls:
>     
>         blk_set_stacking_limits(&mddev->queue->limits);
>     
> Running the '03r0assem' test suite fills my kernel log with output like
> below. Yi Zhang also had issues where writes failed too.
>
> robably something we need to resolve for 4.2-final or revert the
> offending patch.
>
> Cheers,
> Jes
>
> md: bind<loop0>
> md: bind<loop1>
> md: bind<loop2>
> md/raid0:md2: md_size is 116736 sectors.
> md: RAID0 configuration for md2 - 1 zone
> md: zone0=[loop0/loop1/loop2]
>       zone-offset=         0KB, device-offset=         0KB, size=     58368KB
>
> md2: detected capacity change from 0 to 59768832
> bio too big device loop0 (296 > 255)
> bio too big device loop0 (272 > 255)

1/ Why do you blame that particular patch?

2/ Where is that error message coming from?  I cannot find "bio too big"
  in the kernel (except in a comment).
  Commit: 54efd50bfd87 ("block: make generic_make_request handle arbitrarily sized bios")
  removed the only instance of the error message that I know of.

Which kernel exactly are you testing?

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bad raid0 bio too large problem
  2015-09-23  2:25 ` Neil Brown
@ 2015-09-23 11:18   ` Jes Sorensen
  2015-09-23 11:20     ` Jes Sorensen
  2015-09-24  2:53     ` Neil Brown
  0 siblings, 2 replies; 10+ messages in thread
From: Jes Sorensen @ 2015-09-23 11:18 UTC (permalink / raw)
  To: Neil Brown; +Cc: Xiao Ni, linux-raid, yizhan

Neil Brown <neilb@suse.de> writes:
> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
>
>> Hi Neil,
>>
>> I think we have some bad side effects with this patch:
>>
>> commit 199dc6ed5179251fa6158a461499c24bdd99c836
>> Author: NeilBrown <neilb@suse.com>
>> Date:   Mon Aug 3 13:11:47 2015 +1000
>>
>>     md/raid0: update queue parameter in a safer location.
>>     
>>     When a (e.g.) RAID5 array is reshaped to RAID0, the updating
>>     of queue parameters (e.g. max number of sectors per bio) is
>>     done in the wrong place.
>>     It should be part of ->run, but it is actually part of ->takeover.
>>     This means it happens before level_store() calls:
>>     
>>         blk_set_stacking_limits(&mddev->queue->limits);
>>     
>> Running the '03r0assem' test suite fills my kernel log with output like
>> below. Yi Zhang also had issues where writes failed too.
>>
>> robably something we need to resolve for 4.2-final or revert the
>> offending patch.
>>
>> Cheers,
>> Jes
>>
>> md: bind<loop0>
>> md: bind<loop1>
>> md: bind<loop2>
>> md/raid0:md2: md_size is 116736 sectors.
>> md: RAID0 configuration for md2 - 1 zone
>> md: zone0=[loop0/loop1/loop2]
>>       zone-offset=         0KB, device-offset=         0KB, size=     58368KB
>>
>> md2: detected capacity change from 0 to 59768832
>> bio too big device loop0 (296 > 255)
>> bio too big device loop0 (272 > 255)
>
> 1/ Why do you blame that particular patch?
>
> 2/ Where is that error message coming from?  I cannot find "bio too big"
>   in the kernel (except in a comment).
>   Commit: 54efd50bfd87 ("block: make generic_make_request handle
> arbitrarily sized bios")
>   removed the only instance of the error message that I know of.
>
> Which kernel exactly are you testing?

I blame it because of bisect - I revert that patch and the issue goes
away.

I checked out 199dc6ed5179251fa6158a461499c24bdd99c836 in Linus' tree,
see the bio too large. I revert it and it goes away.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bad raid0 bio too large problem
  2015-09-23 11:18   ` Jes Sorensen
@ 2015-09-23 11:20     ` Jes Sorensen
  2015-09-23 11:42       ` Jes Sorensen
  2015-09-24  2:53     ` Neil Brown
  1 sibling, 1 reply; 10+ messages in thread
From: Jes Sorensen @ 2015-09-23 11:20 UTC (permalink / raw)
  To: Neil Brown; +Cc: Xiao Ni, linux-raid, yizhan

Jes Sorensen <Jes.Sorensen@redhat.com> writes:
> Neil Brown <neilb@suse.de> writes:
>> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
>>
>>> Hi Neil,
>>>
>>> I think we have some bad side effects with this patch:
>>>
>>> commit 199dc6ed5179251fa6158a461499c24bdd99c836
>>> Author: NeilBrown <neilb@suse.com>
>>> Date:   Mon Aug 3 13:11:47 2015 +1000
>>>
>>>     md/raid0: update queue parameter in a safer location.
>>>     
>>>     When a (e.g.) RAID5 array is reshaped to RAID0, the updating
>>>     of queue parameters (e.g. max number of sectors per bio) is
>>>     done in the wrong place.
>>>     It should be part of ->run, but it is actually part of ->takeover.
>>>     This means it happens before level_store() calls:
>>>     
>>>         blk_set_stacking_limits(&mddev->queue->limits);
>>>     
>>> Running the '03r0assem' test suite fills my kernel log with output like
>>> below. Yi Zhang also had issues where writes failed too.
>>>
>>> robably something we need to resolve for 4.2-final or revert the
>>> offending patch.
>>>
>>> Cheers,
>>> Jes
>>>
>>> md: bind<loop0>
>>> md: bind<loop1>
>>> md: bind<loop2>
>>> md/raid0:md2: md_size is 116736 sectors.
>>> md: RAID0 configuration for md2 - 1 zone
>>> md: zone0=[loop0/loop1/loop2]
>>>       zone-offset=         0KB, device-offset=         0KB, size=     58368KB
>>>
>>> md2: detected capacity change from 0 to 59768832
>>> bio too big device loop0 (296 > 255)
>>> bio too big device loop0 (272 > 255)
>>
>> 1/ Why do you blame that particular patch?
>>
>> 2/ Where is that error message coming from?  I cannot find "bio too big"
>>   in the kernel (except in a comment).
>>   Commit: 54efd50bfd87 ("block: make generic_make_request handle
>> arbitrarily sized bios")
>>   removed the only instance of the error message that I know of.
>>
>> Which kernel exactly are you testing?
>
> I blame it because of bisect - I revert that patch and the issue goes
> away.
>
> I checked out 199dc6ed5179251fa6158a461499c24bdd99c836 in Linus' tree,
> see the bio too large. I revert it and it goes away.

Hmmm Xiao tells me that the warning message is not in upstream, but I
was sure I reproduced it in the upstream kernel as well. If I screwed
up, my apologies, I am going back to my cave and will investigate
further.

Jes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bad raid0 bio too large problem
  2015-09-23 11:20     ` Jes Sorensen
@ 2015-09-23 11:42       ` Jes Sorensen
  0 siblings, 0 replies; 10+ messages in thread
From: Jes Sorensen @ 2015-09-23 11:42 UTC (permalink / raw)
  To: Neil Brown; +Cc: Xiao Ni, linux-raid, yizhan

Jes Sorensen <Jes.Sorensen@redhat.com> writes:
> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
>> Neil Brown <neilb@suse.de> writes:
>>> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
>>>
>>>> Hi Neil,
>>>>
>>>> I think we have some bad side effects with this patch:
>>>>
>>>> commit 199dc6ed5179251fa6158a461499c24bdd99c836
>>>> Author: NeilBrown <neilb@suse.com>
>>>> Date:   Mon Aug 3 13:11:47 2015 +1000
>>>>
>>>>     md/raid0: update queue parameter in a safer location.
>>>>     
>>>>     When a (e.g.) RAID5 array is reshaped to RAID0, the updating
>>>>     of queue parameters (e.g. max number of sectors per bio) is
>>>>     done in the wrong place.
>>>>     It should be part of ->run, but it is actually part of ->takeover.
>>>>     This means it happens before level_store() calls:
>>>>     
>>>>         blk_set_stacking_limits(&mddev->queue->limits);
>>>>     
>>>> Running the '03r0assem' test suite fills my kernel log with output like
>>>> below. Yi Zhang also had issues where writes failed too.
>>>>
>>>> robably something we need to resolve for 4.2-final or revert the
>>>> offending patch.
>>>>
>>>> Cheers,
>>>> Jes
>>>>
>>>> md: bind<loop0>
>>>> md: bind<loop1>
>>>> md: bind<loop2>
>>>> md/raid0:md2: md_size is 116736 sectors.
>>>> md: RAID0 configuration for md2 - 1 zone
>>>> md: zone0=[loop0/loop1/loop2]
>>>>       zone-offset=         0KB, device-offset=         0KB, size=     58368KB
>>>>
>>>> md2: detected capacity change from 0 to 59768832
>>>> bio too big device loop0 (296 > 255)
>>>> bio too big device loop0 (272 > 255)
>>>
>>> 1/ Why do you blame that particular patch?
>>>
>>> 2/ Where is that error message coming from?  I cannot find "bio too big"
>>>   in the kernel (except in a comment).
>>>   Commit: 54efd50bfd87 ("block: make generic_make_request handle
>>> arbitrarily sized bios")
>>>   removed the only instance of the error message that I know of.
>>>
>>> Which kernel exactly are you testing?
>>
>> I blame it because of bisect - I revert that patch and the issue goes
>> away.
>>
>> I checked out 199dc6ed5179251fa6158a461499c24bdd99c836 in Linus' tree,
>> see the bio too large. I revert it and it goes away.
>
> Hmmm Xiao tells me that the warning message is not in upstream, but I
> was sure I reproduced it in the upstream kernel as well. If I screwed
> up, my apologies, I am going back to my cave and will investigate
> further.

Digging a bit further, I was testing on a 4.2.0-rc5+ where I saw the
issue. That does raise the question of what this patch does to older
kernels, since it was submitted it for 2.6.35+.

"bio too big" message went away with this commit:

commit 54efd50bfd873e2dbf784e0b21a8027ba4299a3e
Author: Kent Overstreet <kent.overstreet@gmail.com>
Date:   Thu Apr 23 22:37:18 2015 -0700

    block: make generic_make_request handle arbitrarily sized bios

Cheers,
Jes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bad raid0 bio too large problem
  2015-09-23 11:18   ` Jes Sorensen
  2015-09-23 11:20     ` Jes Sorensen
@ 2015-09-24  2:53     ` Neil Brown
  2015-09-24  8:48       ` Xiao Ni
  2015-09-24 12:59       ` Jes Sorensen
  1 sibling, 2 replies; 10+ messages in thread
From: Neil Brown @ 2015-09-24  2:53 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: Xiao Ni, linux-raid, yizhan

[-- Attachment #1: Type: text/plain, Size: 3671 bytes --]

Jes Sorensen <Jes.Sorensen@redhat.com> writes:

> Neil Brown <neilb@suse.de> writes:
>> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
>>
>>> Hi Neil,
>>>
>>> I think we have some bad side effects with this patch:
>>>
>>> commit 199dc6ed5179251fa6158a461499c24bdd99c836
>>> Author: NeilBrown <neilb@suse.com>
>>> Date:   Mon Aug 3 13:11:47 2015 +1000
>>>
>>>     md/raid0: update queue parameter in a safer location.
>>>     
>>>     When a (e.g.) RAID5 array is reshaped to RAID0, the updating
>>>     of queue parameters (e.g. max number of sectors per bio) is
>>>     done in the wrong place.
>>>     It should be part of ->run, but it is actually part of ->takeover.
>>>     This means it happens before level_store() calls:
>>>     
>>>         blk_set_stacking_limits(&mddev->queue->limits);
>>>     
>>> Running the '03r0assem' test suite fills my kernel log with output like
>>> below. Yi Zhang also had issues where writes failed too.
>>>
>>> robably something we need to resolve for 4.2-final or revert the
>>> offending patch.
>>>
>>> Cheers,
>>> Jes
>>>
>>> md: bind<loop0>
>>> md: bind<loop1>
>>> md: bind<loop2>
>>> md/raid0:md2: md_size is 116736 sectors.
>>> md: RAID0 configuration for md2 - 1 zone
>>> md: zone0=[loop0/loop1/loop2]
>>>       zone-offset=         0KB, device-offset=         0KB, size=     58368KB
>>>
>>> md2: detected capacity change from 0 to 59768832
>>> bio too big device loop0 (296 > 255)
>>> bio too big device loop0 (272 > 255)
>>
>> 1/ Why do you blame that particular patch?
>>
>> 2/ Where is that error message coming from?  I cannot find "bio too big"
>>   in the kernel (except in a comment).
>>   Commit: 54efd50bfd87 ("block: make generic_make_request handle
>> arbitrarily sized bios")
>>   removed the only instance of the error message that I know of.
>>
>> Which kernel exactly are you testing?
>
> I blame it because of bisect - I revert that patch and the issue goes
> away.
>
> I checked out 199dc6ed5179251fa6158a461499c24bdd99c836 in Linus' tree,
> see the bio too large. I revert it and it goes away.

Well that's pretty convincing - thanks.
And as you say - it is tagged for -stable so really needs to be fixed.

Stares at the code again.  And again.

Ahhh.  that patch moved the
  blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
to after
 disk_stack_limits(...);

That is wrong.

Could you confirm that this fixes your test?

Thanks,
NeilBrown

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 4a13c3cb940b..0875e5e7e09a 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -431,12 +431,6 @@ static int raid0_run(struct mddev *mddev)
 		struct md_rdev *rdev;
 		bool discard_supported = false;
 
-		rdev_for_each(rdev, mddev) {
-			disk_stack_limits(mddev->gendisk, rdev->bdev,
-					  rdev->data_offset << 9);
-			if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
-				discard_supported = true;
-		}
 		blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
 		blk_queue_max_write_same_sectors(mddev->queue, mddev->chunk_sectors);
 		blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
@@ -445,6 +439,12 @@ static int raid0_run(struct mddev *mddev)
 		blk_queue_io_opt(mddev->queue,
 				 (mddev->chunk_sectors << 9) * mddev->raid_disks);
 
+		rdev_for_each(rdev, mddev) {
+			disk_stack_limits(mddev->gendisk, rdev->bdev,
+					  rdev->data_offset << 9);
+			if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+				discard_supported = true;
+		}
 		if (!discard_supported)
 			queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
 		else

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Bad raid0 bio too large problem
  2015-09-24  2:53     ` Neil Brown
@ 2015-09-24  8:48       ` Xiao Ni
  2015-09-24 12:59       ` Jes Sorensen
  1 sibling, 0 replies; 10+ messages in thread
From: Xiao Ni @ 2015-09-24  8:48 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jes Sorensen, linux-raid, yizhan



----- Original Message -----
> From: "Neil Brown" <neilb@suse.de>
> To: "Jes Sorensen" <Jes.Sorensen@redhat.com>
> Cc: "Xiao Ni" <xni@redhat.com>, "linux-raid" <linux-raid@vger.kernel.org>, yizhan@redhat.com
> Sent: Thursday, September 24, 2015 10:53:06 AM
> Subject: Re: Bad raid0 bio too large problem
> 
> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
> 
> > Neil Brown <neilb@suse.de> writes:
> >> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
> >>
> >>> Hi Neil,
> >>>
> >>> I think we have some bad side effects with this patch:
> >>>
> >>> commit 199dc6ed5179251fa6158a461499c24bdd99c836
> >>> Author: NeilBrown <neilb@suse.com>
> >>> Date:   Mon Aug 3 13:11:47 2015 +1000
> >>>
> >>>     md/raid0: update queue parameter in a safer location.
> >>>     
> >>>     When a (e.g.) RAID5 array is reshaped to RAID0, the updating
> >>>     of queue parameters (e.g. max number of sectors per bio) is
> >>>     done in the wrong place.
> >>>     It should be part of ->run, but it is actually part of ->takeover.
> >>>     This means it happens before level_store() calls:
> >>>     
> >>>         blk_set_stacking_limits(&mddev->queue->limits);
> >>>     
> >>> Running the '03r0assem' test suite fills my kernel log with output like
> >>> below. Yi Zhang also had issues where writes failed too.
> >>>
> >>> robably something we need to resolve for 4.2-final or revert the
> >>> offending patch.
> >>>
> >>> Cheers,
> >>> Jes
> >>>
> >>> md: bind<loop0>
> >>> md: bind<loop1>
> >>> md: bind<loop2>
> >>> md/raid0:md2: md_size is 116736 sectors.
> >>> md: RAID0 configuration for md2 - 1 zone
> >>> md: zone0=[loop0/loop1/loop2]
> >>>       zone-offset=         0KB, device-offset=         0KB, size=
> >>>       58368KB
> >>>
> >>> md2: detected capacity change from 0 to 59768832
> >>> bio too big device loop0 (296 > 255)
> >>> bio too big device loop0 (272 > 255)
> >>
> >> 1/ Why do you blame that particular patch?
> >>
> >> 2/ Where is that error message coming from?  I cannot find "bio too big"
> >>   in the kernel (except in a comment).
> >>   Commit: 54efd50bfd87 ("block: make generic_make_request handle
> >> arbitrarily sized bios")
> >>   removed the only instance of the error message that I know of.
> >>
> >> Which kernel exactly are you testing?
> >
> > I blame it because of bisect - I revert that patch and the issue goes
> > away.
> >
> > I checked out 199dc6ed5179251fa6158a461499c24bdd99c836 in Linus' tree,
> > see the bio too large. I revert it and it goes away.
> 
> Well that's pretty convincing - thanks.
> And as you say - it is tagged for -stable so really needs to be fixed.
> 
> Stares at the code again.  And again.
> 
> Ahhh.  that patch moved the
>   blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
> to after
>  disk_stack_limits(...);
> 
> That is wrong.
> 
> Could you confirm that this fixes your test?
> 
> Thanks,
> NeilBrown
> 
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 4a13c3cb940b..0875e5e7e09a 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -431,12 +431,6 @@ static int raid0_run(struct mddev *mddev)
>  		struct md_rdev *rdev;
>  		bool discard_supported = false;
>  
> -		rdev_for_each(rdev, mddev) {
> -			disk_stack_limits(mddev->gendisk, rdev->bdev,
> -					  rdev->data_offset << 9);
> -			if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
> -				discard_supported = true;
> -		}
>  		blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
>  		blk_queue_max_write_same_sectors(mddev->queue, mddev->chunk_sectors);
>  		blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
> @@ -445,6 +439,12 @@ static int raid0_run(struct mddev *mddev)
>  		blk_queue_io_opt(mddev->queue,
>  				 (mddev->chunk_sectors << 9) * mddev->raid_disks);
>  
> +		rdev_for_each(rdev, mddev) {
> +			disk_stack_limits(mddev->gendisk, rdev->bdev,
> +					  rdev->data_offset << 9);
> +			if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
> +				discard_supported = true;
> +		}
>  		if (!discard_supported)
>  			queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
>  		else
> 

Hi Neil, Jes

The problem is fixed with this patch.

Best Regards
Xiao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bad raid0 bio too large problem
  2015-09-24  2:53     ` Neil Brown
  2015-09-24  8:48       ` Xiao Ni
@ 2015-09-24 12:59       ` Jes Sorensen
  2015-09-25  4:23         ` Neil Brown
  1 sibling, 1 reply; 10+ messages in thread
From: Jes Sorensen @ 2015-09-24 12:59 UTC (permalink / raw)
  To: Neil Brown; +Cc: Xiao Ni, linux-raid, yizhan

Neil Brown <neilb@suse.de> writes:
> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
>
>> Neil Brown <neilb@suse.de> writes:
>>> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
>>>
>>>> Hi Neil,
>>>>
>>>> I think we have some bad side effects with this patch:
>>>>
>>>> commit 199dc6ed5179251fa6158a461499c24bdd99c836
>>>> Author: NeilBrown <neilb@suse.com>
>>>> Date:   Mon Aug 3 13:11:47 2015 +1000
>>>>
>>>>     md/raid0: update queue parameter in a safer location.
>>>>     
>>>>     When a (e.g.) RAID5 array is reshaped to RAID0, the updating
>>>>     of queue parameters (e.g. max number of sectors per bio) is
>>>>     done in the wrong place.
>>>>     It should be part of ->run, but it is actually part of ->takeover.
>>>>     This means it happens before level_store() calls:
>>>>     
>>>>         blk_set_stacking_limits(&mddev->queue->limits);
>>>>     
>>>> Running the '03r0assem' test suite fills my kernel log with output like
>>>> below. Yi Zhang also had issues where writes failed too.
>>>>
>>>> robably something we need to resolve for 4.2-final or revert the
>>>> offending patch.
>>>>
>>>> Cheers,
>>>> Jes
>>>>
>>>> md: bind<loop0>
>>>> md: bind<loop1>
>>>> md: bind<loop2>
>>>> md/raid0:md2: md_size is 116736 sectors.
>>>> md: RAID0 configuration for md2 - 1 zone
>>>> md: zone0=[loop0/loop1/loop2]
>>>>       zone-offset=         0KB, device-offset=         0KB, size=     58368KB
>>>>
>>>> md2: detected capacity change from 0 to 59768832
>>>> bio too big device loop0 (296 > 255)
>>>> bio too big device loop0 (272 > 255)
>>>
>>> 1/ Why do you blame that particular patch?
>>>
>>> 2/ Where is that error message coming from?  I cannot find "bio too big"
>>>   in the kernel (except in a comment).
>>>   Commit: 54efd50bfd87 ("block: make generic_make_request handle
>>> arbitrarily sized bios")
>>>   removed the only instance of the error message that I know of.
>>>
>>> Which kernel exactly are you testing?
>>
>> I blame it because of bisect - I revert that patch and the issue goes
>> away.
>>
>> I checked out 199dc6ed5179251fa6158a461499c24bdd99c836 in Linus' tree,
>> see the bio too large. I revert it and it goes away.
>
> Well that's pretty convincing - thanks.
> And as you say - it is tagged for -stable so really needs to be fixed.
>
> Stares at the code again.  And again.
>
> Ahhh.  that patch moved the
>   blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
> to after
>  disk_stack_limits(...);
>
> That is wrong.
>
> Could you confirm that this fixes your test?

I refuse to do that! .... since Xiao beat me to it! Thanks!

I was half way bisecting my way through it last night. For some reason
the problem was reproducible in 4.2 if I applied the offending patch,
but not in 4.3-rc2.

Any chance you'll push these to your git tree in the near future?

Thanks!
Jes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bad raid0 bio too large problem
  2015-09-24 12:59       ` Jes Sorensen
@ 2015-09-25  4:23         ` Neil Brown
  0 siblings, 0 replies; 10+ messages in thread
From: Neil Brown @ 2015-09-25  4:23 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: Xiao Ni, linux-raid, yizhan

[-- Attachment #1: Type: text/plain, Size: 3060 bytes --]

Jes Sorensen <Jes.Sorensen@redhat.com> writes:

> Neil Brown <neilb@suse.de> writes:
>> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
>>
>>> Neil Brown <neilb@suse.de> writes:
>>>> Jes Sorensen <Jes.Sorensen@redhat.com> writes:
>>>>
>>>>> Hi Neil,
>>>>>
>>>>> I think we have some bad side effects with this patch:
>>>>>
>>>>> commit 199dc6ed5179251fa6158a461499c24bdd99c836
>>>>> Author: NeilBrown <neilb@suse.com>
>>>>> Date:   Mon Aug 3 13:11:47 2015 +1000
>>>>>
>>>>>     md/raid0: update queue parameter in a safer location.
>>>>>     
>>>>>     When a (e.g.) RAID5 array is reshaped to RAID0, the updating
>>>>>     of queue parameters (e.g. max number of sectors per bio) is
>>>>>     done in the wrong place.
>>>>>     It should be part of ->run, but it is actually part of ->takeover.
>>>>>     This means it happens before level_store() calls:
>>>>>     
>>>>>         blk_set_stacking_limits(&mddev->queue->limits);
>>>>>     
>>>>> Running the '03r0assem' test suite fills my kernel log with output like
>>>>> below. Yi Zhang also had issues where writes failed too.
>>>>>
>>>>> robably something we need to resolve for 4.2-final or revert the
>>>>> offending patch.
>>>>>
>>>>> Cheers,
>>>>> Jes
>>>>>
>>>>> md: bind<loop0>
>>>>> md: bind<loop1>
>>>>> md: bind<loop2>
>>>>> md/raid0:md2: md_size is 116736 sectors.
>>>>> md: RAID0 configuration for md2 - 1 zone
>>>>> md: zone0=[loop0/loop1/loop2]
>>>>>       zone-offset=         0KB, device-offset=         0KB, size=     58368KB
>>>>>
>>>>> md2: detected capacity change from 0 to 59768832
>>>>> bio too big device loop0 (296 > 255)
>>>>> bio too big device loop0 (272 > 255)
>>>>
>>>> 1/ Why do you blame that particular patch?
>>>>
>>>> 2/ Where is that error message coming from?  I cannot find "bio too big"
>>>>   in the kernel (except in a comment).
>>>>   Commit: 54efd50bfd87 ("block: make generic_make_request handle
>>>> arbitrarily sized bios")
>>>>   removed the only instance of the error message that I know of.
>>>>
>>>> Which kernel exactly are you testing?
>>>
>>> I blame it because of bisect - I revert that patch and the issue goes
>>> away.
>>>
>>> I checked out 199dc6ed5179251fa6158a461499c24bdd99c836 in Linus' tree,
>>> see the bio too large. I revert it and it goes away.
>>
>> Well that's pretty convincing - thanks.
>> And as you say - it is tagged for -stable so really needs to be fixed.
>>
>> Stares at the code again.  And again.
>>
>> Ahhh.  that patch moved the
>>   blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
>> to after
>>  disk_stack_limits(...);
>>
>> That is wrong.
>>
>> Could you confirm that this fixes your test?
>
> I refuse to do that! .... since Xiao beat me to it! Thanks!
>
> I was half way bisecting my way through it last night. For some reason
> the problem was reproducible in 4.2 if I applied the offending patch,
> but not in 4.3-rc2.
>
> Any chance you'll push these to your git tree in the near future?

Pushed.
Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-09-25  4:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-22 15:30 Bad raid0 bio too large problem Jes Sorensen
2015-09-22 16:07 ` Jes Sorensen
2015-09-23  2:25 ` Neil Brown
2015-09-23 11:18   ` Jes Sorensen
2015-09-23 11:20     ` Jes Sorensen
2015-09-23 11:42       ` Jes Sorensen
2015-09-24  2:53     ` Neil Brown
2015-09-24  8:48       ` Xiao Ni
2015-09-24 12:59       ` Jes Sorensen
2015-09-25  4:23         ` Neil Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).