[PATCH] block: don't check request size in blk_cloned_rq_check

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] block: don't check request size in blk_cloned_rq_check_limits()
@ 2016-05-30  7:24 Hannes Reinecke
  2016-06-10 13:19 ` Mike Snitzer
  0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2016-05-30  7:24 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Mike Snitzer, Brian King, linux-scsi, linux-block,
	Hannes Reinecke

When checking a cloned request there is no need to check
the overall request size; this won't have changed even
when resubmitting to another queue.
Without this patch ppc64le on ibmvfc fails to boot.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-core.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2475b1c7..e108bf0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2160,11 +2160,6 @@ EXPORT_SYMBOL(submit_bio);
 static int blk_cloned_rq_check_limits(struct request_queue *q,
 				      struct request *rq)
 {
-	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
-		printk(KERN_ERR "%s: over max size limit.\n", __func__);
-		return -EIO;
-	}
-
 	/*
 	 * queue's settings related to segment counting like q->bounce_pfn
 	 * may differ from that of other stacking queues.
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-05-30  7:24 [PATCH] block: don't check request size in blk_cloned_rq_check_limits() Hannes Reinecke
@ 2016-06-10 13:19 ` Mike Snitzer
  2016-06-10 13:30   ` Hannes Reinecke
  2016-06-11  2:22   ` Martin K. Petersen
  0 siblings, 2 replies; 20+ messages in thread
From: Mike Snitzer @ 2016-06-10 13:19 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, Brian King, linux-scsi, linux-block, mark.bergman,
	Martin K. Petersen

On Mon, May 30 2016 at  3:24am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> When checking a cloned request there is no need to check
> the overall request size; this won't have changed even
> when resubmitting to another queue.
> Without this patch ppc64le on ibmvfc fails to boot.

By simply removing the check aren't you papering over the real problem?
Looking at Martin's commit f31dc1cd490539 (which introduced the current
variant of the limits check) I'm not convinced it is equivalent to what
he replaced.  I'll look closer in a bit.

Also you categorized your fix was for "ppc64le on ibmvfc"; whereas Mark
has reported this issue (off-list) against x86_64.  By making it seem
ppc64le specific I didn't take this patch to be generally applicable.

Mike

> ---
>  block/blk-core.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 2475b1c7..e108bf0 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2160,11 +2160,6 @@ EXPORT_SYMBOL(submit_bio);
>  static int blk_cloned_rq_check_limits(struct request_queue *q,
>  				      struct request *rq)
>  {
> -	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
> -		printk(KERN_ERR "%s: over max size limit.\n", __func__);
> -		return -EIO;
> -	}
> -
>  	/*
>  	 * queue's settings related to segment counting like q->bounce_pfn
>  	 * may differ from that of other stacking queues.
> -- 
> 1.8.5.6
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-10 13:19 ` Mike Snitzer
@ 2016-06-10 13:30   ` Hannes Reinecke
  2016-06-10 14:18     ` Mike Snitzer
  2016-06-11  2:22   ` Martin K. Petersen
  1 sibling, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2016-06-10 13:30 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, Brian King, linux-scsi, linux-block, mark.bergman,
	Martin K. Petersen

On 06/10/2016 03:19 PM, Mike Snitzer wrote:
> On Mon, May 30 2016 at  3:24am -0400,
> Hannes Reinecke <hare@suse.de> wrote:
> 
>> When checking a cloned request there is no need to check
>> the overall request size; this won't have changed even
>> when resubmitting to another queue.
>> Without this patch ppc64le on ibmvfc fails to boot.
> 
> By simply removing the check aren't you papering over the real problem?
> Looking at Martin's commit f31dc1cd490539 (which introduced the current
> variant of the limits check) I'm not convinced it is equivalent to what
> he replaced.  I'll look closer in a bit.
> 
The check itself is wrong, as we need (at least) to check the
max_hw_sectors here; the request is already fully assembled, so there is
a really good chance he's going beyond the max_sectors.
But trying the error still was found to be present.
So I decided to rip it out, as the overall value of this check is zero.

> Also you categorized your fix was for "ppc64le on ibmvfc"; whereas Mark
> has reported this issue (off-list) against x86_64.  By making it seem
> ppc64le specific I didn't take this patch to be generally applicable.
> 
Well, it has been observed on ppc64. That doesn't mean _only_ ppc64 is
affected. If it were ppc64 only it should've been marked as such, right?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nï¿½rnberg
GF: F. Imendï¿½rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nï¿½rnberg)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-10 13:30   ` Hannes Reinecke
@ 2016-06-10 14:18     ` Mike Snitzer
  2016-06-11 10:05       ` Hannes Reinecke
  0 siblings, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-06-10 14:18 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, Brian King, linux-scsi, linux-block, mark.bergman,
	Martin K. Petersen

On Fri, Jun 10 2016 at  9:30am -0400,
Hannes Reinecke <hare@suse.de> wrote:

> On 06/10/2016 03:19 PM, Mike Snitzer wrote:
> > On Mon, May 30 2016 at  3:24am -0400,
> > Hannes Reinecke <hare@suse.de> wrote:
> > 
> >> When checking a cloned request there is no need to check
> >> the overall request size; this won't have changed even
> >> when resubmitting to another queue.
> >> Without this patch ppc64le on ibmvfc fails to boot.
> > 
> > By simply removing the check aren't you papering over the real problem?
> > Looking at Martin's commit f31dc1cd490539 (which introduced the current
> > variant of the limits check) I'm not convinced it is equivalent to what
> > he replaced.  I'll look closer in a bit.
> > 
> The check itself is wrong, as we need (at least) to check the
> max_hw_sectors here; the request is already fully assembled, so there is
> a really good chance he's going beyond the max_sectors.
> But trying the error still was found to be present.
> So I decided to rip it out, as the overall value of this check is zero.

fine, any chance you can improve the header to include these details.
At least mention that commit f31dc1cd490539 incorrectly removed the
max_hw_sectors checking.  And then please add these tags to a v2 repost:

Fixes: f31dc1cd490539 ("block: Consolidate command flag and queue limit checks for merges")
Reported-by: Mark Bergman <mark.bergman@uphs.upenn.edu>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.7+

> > Also you categorized your fix was for "ppc64le on ibmvfc"; whereas Mark
> > has reported this issue (off-list) against x86_64.  By making it seem
> > ppc64le specific I didn't take this patch to be generally applicable.
> > 
> Well, it has been observed on ppc64. That doesn't mean _only_ ppc64 is
> affected. If it were ppc64 only it should've been marked as such, right?

If it is a generic problem, being specific about the hardware you saw it
on leads idiots like me to filter unnecessarily ;)

Though I'm curious what you mean by "it should've been marked as
such".. "it" being what?  The patch?  And how would it have been marked
as ppc64 only?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-10 13:19 ` Mike Snitzer
  2016-06-10 13:30   ` Hannes Reinecke
@ 2016-06-11  2:22   ` Martin K. Petersen
  2016-06-11 10:01     ` Hannes Reinecke
  1 sibling, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2016-06-11  2:22 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Hannes Reinecke, Jens Axboe, Brian King, linux-scsi, linux-block,
	mark.bergman, Martin K. Petersen

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

>> When checking a cloned request there is no need to check the overall
>> request size; this won't have changed even when resubmitting to
>> another queue.  Without this patch ppc64le on ibmvfc fails to boot.

Why is the number of sectors in the request bigger than the queue limit?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-11  2:22   ` Martin K. Petersen
@ 2016-06-11 10:01     ` Hannes Reinecke
  2016-06-11 11:06       ` Martin K. Petersen
  0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2016-06-11 10:01 UTC (permalink / raw)
  To: Martin K. Petersen, Mike Snitzer
  Cc: Jens Axboe, Brian King, linux-scsi, linux-block, mark.bergman

On 06/11/2016 04:22 AM, Martin K. Petersen wrote:
>>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>
>>> When checking a cloned request there is no need to check the overall
>>> request size; this won't have changed even when resubmitting to
>>> another queue.  Without this patch ppc64le on ibmvfc fails to boot.
>
> Why is the number of sectors in the request bigger than the queue limit?
>
Because we're checking the wrong limit.
blk_queue_get_max_sectors() is checking limits.max_sectors(), but the 
requests are already fully formed and can extend up to 
limits.max_hw_sectors().

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nï¿½rnberg
GF: J. Hawn, J. Guild, F. Imendï¿½rffer, HRB 16746 (AG Nï¿½rnberg)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-10 14:18     ` Mike Snitzer
@ 2016-06-11 10:05       ` Hannes Reinecke
  0 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2016-06-11 10:05 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, Brian King, linux-scsi, linux-block, mark.bergman,
	Martin K. Petersen

On 06/10/2016 04:18 PM, Mike Snitzer wrote:
> On Fri, Jun 10 2016 at  9:30am -0400,
> Hannes Reinecke <hare@suse.de> wrote:
>
>> On 06/10/2016 03:19 PM, Mike Snitzer wrote:
>>> On Mon, May 30 2016 at  3:24am -0400,
>>> Hannes Reinecke <hare@suse.de> wrote:
>>>
>>>> When checking a cloned request there is no need to check
>>>> the overall request size; this won't have changed even
>>>> when resubmitting to another queue.
>>>> Without this patch ppc64le on ibmvfc fails to boot.
>>>
>>> By simply removing the check aren't you papering over the real problem?
>>> Looking at Martin's commit f31dc1cd490539 (which introduced the current
>>> variant of the limits check) I'm not convinced it is equivalent to what
>>> he replaced.  I'll look closer in a bit.
>>>
>> The check itself is wrong, as we need (at least) to check the
>> max_hw_sectors here; the request is already fully assembled, so there is
>> a really good chance he's going beyond the max_sectors.
>> But trying the error still was found to be present.
>> So I decided to rip it out, as the overall value of this check is zero.
>
> fine, any chance you can improve the header to include these details.
> At least mention that commit f31dc1cd490539 incorrectly removed the
> max_hw_sectors checking.  And then please add these tags to a v2 repost:
>
> Fixes: f31dc1cd490539 ("block: Consolidate command flag and queue limit checks for merges")
> Reported-by: Mark Bergman <mark.bergman@uphs.upenn.edu>
> Acked-by: Mike Snitzer <snitzer@redhat.com>
> Cc: stable@vger.kernel.org # 3.7+
>
Okay, will be doing for a repost.

>>> Also you categorized your fix was for "ppc64le on ibmvfc"; whereas Mark
>>> has reported this issue (off-list) against x86_64.  By making it seem
>>> ppc64le specific I didn't take this patch to be generally applicable.
>>>
>> Well, it has been observed on ppc64. That doesn't mean _only_ ppc64 is
>> affected. If it were ppc64 only it should've been marked as such, right?
>
> If it is a generic problem, being specific about the hardware you saw it
> on leads idiots like me to filter unnecessarily ;)
>
> Though I'm curious what you mean by "it should've been marked as
> such".. "it" being what?  The patch?  And how would it have been marked
> as ppc64 only?
Exactly my point.
I was just trying to figure out what caused you to ignore the patch.

Anyway.

Will be reposting a v2 once Martin is happy.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nï¿½rnberg
GF: J. Hawn, J. Guild, F. Imendï¿½rffer, HRB 16746 (AG Nï¿½rnberg)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-11 10:01     ` Hannes Reinecke
@ 2016-06-11 11:06       ` Martin K. Petersen
  2016-06-11 13:10         ` Hannes Reinecke
  0 siblings, 1 reply; 20+ messages in thread
From: Martin K. Petersen @ 2016-06-11 11:06 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Mike Snitzer, Jens Axboe, Brian King,
	linux-scsi, linux-block, mark.bergman

>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:

Hannes> Because we're checking the wrong limit.

The original code checked that the request was smaller than both
max_sectors and max_hw_sectors so it would have failed here as well.

Hannes> blk_queue_get_max_sectors() is checking limits.max_sectors(),
Hannes> but the requests are already fully formed and can extend up to
Hannes> limits.max_hw_sectors().

We should not be issuing REQ_TYPE_FS requests larger than max_sectors.
How did we get here?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-11 11:06       ` Martin K. Petersen
@ 2016-06-11 13:10         ` Hannes Reinecke
  2016-06-13  8:07           ` Christoph Hellwig
  2016-06-15  1:39           ` Martin K. Petersen
  0 siblings, 2 replies; 20+ messages in thread
From: Hannes Reinecke @ 2016-06-11 13:10 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Mike Snitzer, Jens Axboe, Brian King, linux-scsi, linux-block,
	mark.bergman

On 06/11/2016 01:06 PM, Martin K. Petersen wrote:
>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:
>
> Hannes> Because we're checking the wrong limit.
>
> The original code checked that the request was smaller than both
> max_sectors and max_hw_sectors so it would have failed here as well.
>
> Hannes> blk_queue_get_max_sectors() is checking limits.max_sectors(),
> Hannes> but the requests are already fully formed and can extend up to
> Hannes> limits.max_hw_sectors().
>
> We should not be issuing REQ_TYPE_FS requests larger than max_sectors.
> How did we get here?
>
Well, the primary issue is that 'blk_cloned_rq_check_limits()' doesn't 
check for BLOCK_PC, so this particular check would be applied for every 
request.

But as it turns out, even adding a check for BLOCK_PC doesn't help, so 
we're indeed seeing REQ_TYPE_FS requests with larger max_sector counts.

As to _why_ this happens I frankly have no idea. I have been staring at 
this particular code for over a year now (I've got another bug pending 
where we hit the _other_ if clause), but to no avail.
So I've resolved to drop the check altogether, seeing that max_sector 
size is _not_ something which gets changed during failover.
Therefore if the max_sector count is wrong for the cloned request it was 
already wrong for the original request, and we should've errored it out 
far earlier.
The max_segments count, OTOH, _might_ change during failover (different 
hardware has different max_segments setting, and this is being changed 
during sg mapping), so there is some value to be had from testing it here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nï¿½rnberg
GF: J. Hawn, J. Guild, F. Imendï¿½rffer, HRB 16746 (AG Nï¿½rnberg)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-11 13:10         ` Hannes Reinecke
@ 2016-06-13  8:07           ` Christoph Hellwig
  2016-06-15  1:39           ` Martin K. Petersen
  1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-06-13  8:07 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Mike Snitzer, Jens Axboe, Brian King,
	linux-scsi, linux-block, mark.bergman

On Sat, Jun 11, 2016 at 03:10:06PM +0200, Hannes Reinecke wrote:
> Well, the primary issue is that 'blk_cloned_rq_check_limits()' doesn't check
> for BLOCK_PC, so this particular check would be applied for every request.

So fix it..

> But as it turns out, even adding a check for BLOCK_PC doesn't help, so we're
> indeed seeing REQ_TYPE_FS requests with larger max_sector counts.
> 
> As to _why_ this happens I frankly have no idea. I have been staring at this
> particular code for over a year now (I've got another bug pending where we
> hit the _other_ if clause), but to no avail.
> So I've resolved to drop the check altogether, seeing that max_sector size
> is _not_ something which gets changed during failover.
> Therefore if the max_sector count is wrong for the cloned request it was
> already wrong for the original request, and we should've errored it out far
> earlier.
> The max_segments count, OTOH, _might_ change during failover (different
> hardware has different max_segments setting, and this is being changed
> during sg mapping), so there is some value to be had from testing it here.

I really think we need to drill down and figure out what's going on here
first.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-11 13:10         ` Hannes Reinecke
  2016-06-13  8:07           ` Christoph Hellwig
@ 2016-06-15  1:39           ` Martin K. Petersen
  2016-06-15  2:29             ` Mike Snitzer
  2016-06-15  6:33             ` Hannes Reinecke
  1 sibling, 2 replies; 20+ messages in thread
From: Martin K. Petersen @ 2016-06-15  1:39 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Mike Snitzer, Jens Axboe, Brian King,
	linux-scsi, linux-block, mark.bergman

>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:

Hannes> Well, the primary issue is that 'blk_cloned_rq_check_limits()'
Hannes> doesn't check for BLOCK_PC,

Yes it does. It calls blk_rq_get_max_sectors() which has an explicit
check for this:

static inline unsigned int blk_rq_get_max_sectors(struct request *rq)
{
        struct request_queue *q = rq->q;

        if (unlikely(rq->cmd_type != REQ_TYPE_FS))
                        return q->limits.max_hw_sectors;
[...]

Hannes> The max_segments count, OTOH, _might_ change during failover
Hannes> (different hardware has different max_segments setting, and this
Hannes> is being changed during sg mapping), so there is some value to
Hannes> be had from testing it here.

Oh, this happens during failover? Are you sure it's not because DM is
temporarily resetting the queue limits? max_sectors is going to be a
single page in that case. I just discussed a backport regression in this
department with Mike at LSF/MM. But that was for an older kernel.

Accidentally resetting the limits during table swaps has happened a
couple of times over the years. We trip it instantly with the database
in failover testing.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-15  1:39           ` Martin K. Petersen
@ 2016-06-15  2:29             ` Mike Snitzer
  2016-06-15  2:32               ` Martin K. Petersen
  2016-06-15  6:33             ` Hannes Reinecke
  1 sibling, 1 reply; 20+ messages in thread
From: Mike Snitzer @ 2016-06-15  2:29 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Hannes Reinecke, Jens Axboe, Brian King, linux-scsi, linux-block,
	mark.bergman

On Tue, Jun 14 2016 at  9:39pm -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:
> 
> Hannes> Well, the primary issue is that 'blk_cloned_rq_check_limits()'
> Hannes> doesn't check for BLOCK_PC,
> 
> Yes it does. It calls blk_rq_get_max_sectors() which has an explicit
> check for this:
> 
> static inline unsigned int blk_rq_get_max_sectors(struct request *rq)
> {
>         struct request_queue *q = rq->q;
> 
>         if (unlikely(rq->cmd_type != REQ_TYPE_FS))
>                         return q->limits.max_hw_sectors;
> [...]
> 
> Hannes> The max_segments count, OTOH, _might_ change during failover
> Hannes> (different hardware has different max_segments setting, and this
> Hannes> is being changed during sg mapping), so there is some value to
> Hannes> be had from testing it here.
> 
> Oh, this happens during failover? Are you sure it's not because DM is
> temporarily resetting the queue limits? max_sectors is going to be a
> single page in that case. I just discussed a backport regression in this
> department with Mike at LSF/MM. But that was for an older kernel.

Not aware of any limits reset issue now...

> Accidentally resetting the limits during table swaps has happened a
> couple of times over the years. We trip it instantly with the database
> in failover testing.

But feel free to throw your DB in failover tests (w/ dm-mpath) at a
recent kernel ;)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-15  2:29             ` Mike Snitzer
@ 2016-06-15  2:32               ` Martin K. Petersen
  0 siblings, 0 replies; 20+ messages in thread
From: Martin K. Petersen @ 2016-06-15  2:32 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, Hannes Reinecke, Jens Axboe, Brian King,
	linux-scsi, linux-block, mark.bergman

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

>> Oh, this happens during failover? Are you sure it's not because DM is
>> temporarily resetting the queue limits? max_sectors is going to be a
>> single page in that case. I just discussed a backport regression in
>> this department with Mike at LSF/MM. But that was for an older
>> kernel.

Mike> Not aware of any limits reset issue now...

Me neither, I was assuming that Hannes is seeing this with SLES.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-15  1:39           ` Martin K. Petersen
  2016-06-15  2:29             ` Mike Snitzer
@ 2016-06-15  6:33             ` Hannes Reinecke
  2016-06-15 10:03               ` Jens Axboe
  1 sibling, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2016-06-15  6:33 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Mike Snitzer, Jens Axboe, Brian King, linux-scsi, linux-block,
	mark.bergman

On 06/15/2016 03:39 AM, Martin K. Petersen wrote:
>>>>>> "Hannes" == Hannes Reinecke <hare@suse.de> writes:
> 
> Hannes> Well, the primary issue is that 'blk_cloned_rq_check_limits()'
> Hannes> doesn't check for BLOCK_PC,
> 
> Yes it does. It calls blk_rq_get_max_sectors() which has an explicit
> check for this:
> 
> static inline unsigned int blk_rq_get_max_sectors(struct request *rq)
> {
>         struct request_queue *q = rq->q;
> 
>         if (unlikely(rq->cmd_type != REQ_TYPE_FS))
>                         return q->limits.max_hw_sectors;
> [...]
> 
> Hannes> The max_segments count, OTOH, _might_ change during failover
> Hannes> (different hardware has different max_segments setting, and this
> Hannes> is being changed during sg mapping), so there is some value to
> Hannes> be had from testing it here.
> 
> Oh, this happens during failover? Are you sure it's not because DM is
> temporarily resetting the queue limits? max_sectors is going to be a
> single page in that case. I just discussed a backport regression in this
> department with Mike at LSF/MM. But that was for an older kernel.
> 
> Accidentally resetting the limits during table swaps has happened a
> couple of times over the years. We trip it instantly with the database
> in failover testing.
> 
Unfortunately, we have two distinct bugs lurking in that function.
The resetting limits during failover is something we've probably hitting
with a mixed initiator setting, but it will typically materialize as a
transient error when tripping over the _other_ check.
But this particular issue is seen directly during booting.

And as I've mentioned before: what is the purpose of this check?

'max_sectors' and 'max_hw_sectors' are checked during request assembly,
and those limits are not changed even after calling
blk_recalc_rq_segments(). And if we go over any device-imposed
restrictions we'll be getting an I/O error from the driver anyway.
So why have it at all?

Especially as the system boots happily with this check removed...

Cheers,

-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nï¿½rnberg
GF: F. Imendï¿½rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nï¿½rnberg)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-15  6:33             ` Hannes Reinecke
@ 2016-06-15 10:03               ` Jens Axboe
  2016-06-15 10:33                 ` Hannes Reinecke
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2016-06-15 10:03 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Mike Snitzer, Brian King, linux-scsi, linux-block, mark.bergman

On 06/15/2016 08:33 AM, Hannes Reinecke wrote:
> And as I've mentioned before: what is the purpose of this check?
>
> 'max_sectors' and 'max_hw_sectors' are checked during request assembly,
> and those limits are not changed even after calling
> blk_recalc_rq_segments(). And if we go over any device-imposed
> restrictions we'll be getting an I/O error from the driver anyway.
> So why have it at all?

You don't know that to be the case. The driver asked for certain limits, 
the core MUST obey them. The driver should not need to check for these 
limits, outside of in a BUG_ON() like manner.

> Especially as the system boots happily with this check removed...

That's the case for you, but you can't assume this to be the case in 
general.

There's a _lot_ of hand waving in this thread, Hannes. How do we 
reproduce this? We need to get this fixed for real, not just delete some 
check that triggers for you and that it just happens to work without. 
That's not how you fix problems.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-15 10:03               ` Jens Axboe
@ 2016-06-15 10:33                 ` Hannes Reinecke
  2016-06-15 16:34                   ` Brian King
  0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2016-06-15 10:33 UTC (permalink / raw)
  To: Jens Axboe, Martin K. Petersen
  Cc: Mike Snitzer, Brian King, linux-scsi, linux-block, mark.bergman,
	Brian King

On 06/15/2016 12:03 PM, Jens Axboe wrote:
> On 06/15/2016 08:33 AM, Hannes Reinecke wrote:
>> And as I've mentioned before: what is the purpose of this check?
>>
>> 'max_sectors' and 'max_hw_sectors' are checked during request assembly,
>> and those limits are not changed even after calling
>> blk_recalc_rq_segments(). And if we go over any device-imposed
>> restrictions we'll be getting an I/O error from the driver anyway.
>> So why have it at all?
> 
> You don't know that to be the case. The driver asked for certain limits,
> the core MUST obey them. The driver should not need to check for these
> limits, outside of in a BUG_ON() like manner.
> 
Okay.
But again, what is the purpose of this check?
I do agree that we need to do a sanity check after we're recalculated
the sg elements, but we never recalculate the overall size of the request.
So why do we check only max_sector_size?
Why not max_segment_size?
Surely the queue limits can be different for that, too?

>> Especially as the system boots happily with this check removed...
> 
> That's the case for you, but you can't assume this to be the case in
> general.
> 
> There's a _lot_ of hand waving in this thread, Hannes. How do we
> reproduce this? We need to get this fixed for real, not just delete some
> check that triggers for you and that it just happens to work without.
> That's not how you fix problems.
> 
Well. Yes, I know.
This issue is ATM only ever reproduced on ppc64le running on ibmvfc. And
has been reported by a customer to us.
So there is not much I can do with reproducing here, sadly.

Maybe Brian King has some ideas/possibilities for this.
Brian, can you reproduce this with latest upstream kernel?
If so, can you file a bug at kernel.org?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nï¿½rnberg
GF: F. Imendï¿½rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nï¿½rnberg)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-15 10:33                 ` Hannes Reinecke
@ 2016-06-15 16:34                   ` Brian King
  2016-06-16 12:35                     ` Mauricio Faria de Oliveira
  0 siblings, 1 reply; 20+ messages in thread
From: Brian King @ 2016-06-15 16:34 UTC (permalink / raw)
  To: Hannes Reinecke, Jens Axboe, Martin K. Petersen
  Cc: Mike Snitzer, Brian King, linux-scsi, linux-block, mark.bergman,
	Mauricio Faria de Oliveira

On 06/15/2016 05:33 AM, Hannes Reinecke wrote:
> On 06/15/2016 12:03 PM, Jens Axboe wrote:
>> On 06/15/2016 08:33 AM, Hannes Reinecke wrote:
>>> And as I've mentioned before: what is the purpose of this check?
>>>
>>> 'max_sectors' and 'max_hw_sectors' are checked during request assembly,
>>> and those limits are not changed even after calling
>>> blk_recalc_rq_segments(). And if we go over any device-imposed
>>> restrictions we'll be getting an I/O error from the driver anyway.
>>> So why have it at all?
>>
>> You don't know that to be the case. The driver asked for certain limits,
>> the core MUST obey them. The driver should not need to check for these
>> limits, outside of in a BUG_ON() like manner.
>>
> Okay.
> But again, what is the purpose of this check?
> I do agree that we need to do a sanity check after we're recalculated
> the sg elements, but we never recalculate the overall size of the request.
> So why do we check only max_sector_size?
> Why not max_segment_size?
> Surely the queue limits can be different for that, too?
> 
>>> Especially as the system boots happily with this check removed...
>>
>> That's the case for you, but you can't assume this to be the case in
>> general.
>>
>> There's a _lot_ of hand waving in this thread, Hannes. How do we
>> reproduce this? We need to get this fixed for real, not just delete some
>> check that triggers for you and that it just happens to work without.
>> That's not how you fix problems.
>>
> Well. Yes, I know.
> This issue is ATM only ever reproduced on ppc64le running on ibmvfc. And
> has been reported by a customer to us.
> So there is not much I can do with reproducing here, sadly.
> 
> Maybe Brian King has some ideas/possibilities for this.
> Brian, can you reproduce this with latest upstream kernel?
> If so, can you file a bug at kernel.org?

Mauricio was looking at this, adding him to cc. We did have a KVM config
where we could reproduce this issue as well, I think with some PCI passthrough
adapters. Mauricio - do you have any more details about the KVM config that
reproduced this issue and did you ever try to reproduce this with an upstream
kernel?

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-15 16:34                   ` Brian King
@ 2016-06-16 12:35                     ` Mauricio Faria de Oliveira
  2016-06-16 21:59                       ` Mauricio Faria de Oliveira
  0 siblings, 1 reply; 20+ messages in thread
From: Mauricio Faria de Oliveira @ 2016-06-16 12:35 UTC (permalink / raw)
  To: Brian King, Hannes Reinecke, Jens Axboe, Martin K. Petersen
  Cc: Mike Snitzer, Brian King, linux-scsi, linux-block, mark.bergman

Hey,

On 06/15/2016 01:34 PM, Brian King wrote:
> Mauricio was looking at this, adding him to cc. We did have a KVM config
> where we could reproduce this issue as well, I think with some PCI passthrough
> adapters. Mauricio - do you have any more details about the KVM config that
> reproduced this issue and did you ever try to reproduce this with an upstream
> kernel?

It's KVM guest w/ SLES 12 SP1 + kernel updates (3.12.53-60.30 introduced
the error) and PCI passthrough of Emulex FC and FCoE adapters, connected
to 4 storage systems (DS8000, XIV, SVC, and FlashSystem 840).

It seems only the XIV LUNs never hit the problem, but it didn't seem to
be storage-specific, as FS840 LUNs had a mix of hit/not-hit the problem.
One thing is that all paths of a LUN were either failed or not - no mix
within a LUN.

Unfortunately not too much analysis was performed on this system at the
time -- Hannes had already made good progress w/ the customer, and some
test kernel builds that resolved the issue were made available soon.

Now that the topic is under discussion, I've asked for some time slots
on that system, so we can test an upstream kernel and try to reproduce
the problem and analyze it more closely.

-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-16 12:35                     ` Mauricio Faria de Oliveira
@ 2016-06-16 21:59                       ` Mauricio Faria de Oliveira
  2016-06-17  6:59                         ` Hannes Reinecke
  0 siblings, 1 reply; 20+ messages in thread
From: Mauricio Faria de Oliveira @ 2016-06-16 21:59 UTC (permalink / raw)
  To: Brian King, Hannes Reinecke, Jens Axboe, Martin K. Petersen
  Cc: Mike Snitzer, Brian King, linux-scsi, linux-block, mark.bergman

On 06/16/2016 09:35 AM, Mauricio Faria de Oliveira wrote:
> On 06/15/2016 01:34 PM, Brian King wrote:
>> [...] did you ever try to reproduce this with an
>> upstream
>> kernel?

> Now that the topic is under discussion, I've asked for some time slots
> on that system, so we can test an upstream kernel and try to reproduce
> the problem and analyze it more closely.

The problem is not reproducible w/ 4.7.0-rc3 even after a few hours;
it is instantly reproducible w/ 3.12.57-60.35 (SLES 12 SP1 updates).

Hannes et al,

If you'd like some data collection from this system, feel free to ping
me about it.

-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: block: don't check request size in blk_cloned_rq_check_limits()
  2016-06-16 21:59                       ` Mauricio Faria de Oliveira
@ 2016-06-17  6:59                         ` Hannes Reinecke
  0 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2016-06-17  6:59 UTC (permalink / raw)
  To: Mauricio Faria de Oliveira, Brian King, Jens Axboe,
	Martin K. Petersen
  Cc: Mike Snitzer, Brian King, linux-scsi, linux-block, mark.bergman

On 06/16/2016 11:59 PM, Mauricio Faria de Oliveira wrote:
> On 06/16/2016 09:35 AM, Mauricio Faria de Oliveira wrote:
>> On 06/15/2016 01:34 PM, Brian King wrote:
>>> [...] did you ever try to reproduce this with an
>>> upstream
>>> kernel?
>
>> Now that the topic is under discussion, I've asked for some time slots
>> on that system, so we can test an upstream kernel and try to reproduce
>> the problem and analyze it more closely.
>
> The problem is not reproducible w/ 4.7.0-rc3 even after a few hours;
> it is instantly reproducible w/ 3.12.57-60.35 (SLES 12 SP1 updates).
>
> Hannes et al,
>
> If you'd like some data collection from this system, feel free to ping
> me about it.
>
Ouch.

Okay, continue discussion on the SUSE bugzilla.
Guess we've missed some backports then.

Sorry for the noise.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nï¿½rnberg
GF: J. Hawn, J. Guild, F. Imendï¿½rffer, HRB 16746 (AG Nï¿½rnberg)

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-06-17  6:59 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-30  7:24 [PATCH] block: don't check request size in blk_cloned_rq_check_limits() Hannes Reinecke
2016-06-10 13:19 ` Mike Snitzer
2016-06-10 13:30   ` Hannes Reinecke
2016-06-10 14:18     ` Mike Snitzer
2016-06-11 10:05       ` Hannes Reinecke
2016-06-11  2:22   ` Martin K. Petersen
2016-06-11 10:01     ` Hannes Reinecke
2016-06-11 11:06       ` Martin K. Petersen
2016-06-11 13:10         ` Hannes Reinecke
2016-06-13  8:07           ` Christoph Hellwig
2016-06-15  1:39           ` Martin K. Petersen
2016-06-15  2:29             ` Mike Snitzer
2016-06-15  2:32               ` Martin K. Petersen
2016-06-15  6:33             ` Hannes Reinecke
2016-06-15 10:03               ` Jens Axboe
2016-06-15 10:33                 ` Hannes Reinecke
2016-06-15 16:34                   ` Brian King
2016-06-16 12:35                     ` Mauricio Faria de Oliveira
2016-06-16 21:59                       ` Mauricio Faria de Oliveira
2016-06-17  6:59                         ` Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).