public inbox for linux-kernel@vger.kernel.org
* [patch|rfc] mtip32x: fix regression introduced by blk-mq per-hctx flush
@ 2015-07-29 14:22 Jeff Moyer
  2015-07-29 14:44 ` Jens Axboe
  2015-07-29 15:23 ` Ming Lei
  0 siblings, 2 replies; 5+ messages in thread
From: Jeff Moyer @ 2015-07-29 14:22 UTC (permalink / raw)
  To: Ming Lei, Sam Bradshaw, Asai Thambi SP; +Cc: linux-kernel, axboe, dmilburn

Hi,

After commit f70ced091707 (blk-mq: support per-distpatch_queue flush
machinery), the mtip32xx driver may oops upon module load due to walking
off the end of an array in mtip_init_cmd.  On initialization of the
flush_rq, init_request is called with request_index >= the maximum queue
depth the driver supports.  For mtip32xx, this value is used to index
into an array.  What this means is that the driver will walk off the end
of the array, and either oops or cause random memory corruption.

The problem is easily reproduced by doing modprobe/rmmod of the mtip32xx
driver in a loop.  I can typically reproduce the problem in about 30
seconds.

Now, in the case of mtip32xx, it actually doesn't support flush/fua, so
I think we can simply return without doing anything.  In addition, no
other mq-enabled driver does anything with the request_index passed into
init_request(), so no other driver is affected.  However, I'm not really
sure what is expected of drivers.  Ming, what did you envision drivers
would do when initializing the flush requests?

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 4a2ef09..f504232 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3756,6 +3756,14 @@ static int mtip_init_cmd(void *data, struct request *rq, unsigned int hctx_idx,
 	struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
 	u32 host_cap_64 = readl(dd->mmio + HOST_CAP) & HOST_CAP_64;
 
+	/*
+	 * For flush requests, request_idx starts at the end of the
+	 * tag space.  Since we don't support FLUSH/FUA, simply return
+	 * 0 as there's nothing to be done.
+	 */
+	if (request_idx >= MTIP_MAX_COMMAND_SLOTS)
+		return 0;
+
 	cmd->command = dmam_alloc_coherent(&dd->pdev->dev, CMD_DMA_ALLOC_SZ,
 			&cmd->command_dma, GFP_KERNEL);
 	if (!cmd->command)


* Re: [patch|rfc] mtip32x: fix regression introduced by blk-mq per-hctx flush
  2015-07-29 14:22 [patch|rfc] mtip32x: fix regression introduced by blk-mq per-hctx flush Jeff Moyer
@ 2015-07-29 14:44 ` Jens Axboe
  2015-08-05 20:44   ` Jeff Moyer
  2015-07-29 15:23 ` Ming Lei
  1 sibling, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2015-07-29 14:44 UTC (permalink / raw)
  To: Jeff Moyer, Ming Lei, Sam Bradshaw, Asai Thambi SP; +Cc: linux-kernel, dmilburn

On 07/29/2015 08:22 AM, Jeff Moyer wrote:
> Hi,
>
> After commit f70ced091707 (blk-mq: support per-distpatch_queue flush
> machinery), the mtip32xx driver may oops upon module load due to walking
> off the end of an array in mtip_init_cmd.  On initialization of the
> flush_rq, init_request is called with request_index >= the maximum queue
> depth the driver supports.  For mtip32xx, this value is used to index
> into an array.  What this means is that the driver will walk off the end
> of the array, and either oops or cause random memory corruption.
>
> The problem is easily reproduced by doing modprobe/rmmod of the mtip32xx
> driver in a loop.  I can typically reproduce the problem in about 30
> seconds.
>
> Now, in the case of mtip32xx, it actually doesn't support flush/fua, so
> I think we can simply return without doing anything.  In addition, no
> other mq-enabled driver does anything with the request_index passed into
> init_request(), so no other driver is affected.  However, I'm not really
> sure what is expected of drivers.  Ming, what did you envision drivers
> would do when initializing the flush requests?

This is really a bug in the core, we should not have to work around this 
in the driver. I'll take a look at this.

-- 
Jens Axboe



* Re: [patch|rfc] mtip32x: fix regression introduced by blk-mq per-hctx flush
  2015-07-29 14:22 [patch|rfc] mtip32x: fix regression introduced by blk-mq per-hctx flush Jeff Moyer
  2015-07-29 14:44 ` Jens Axboe
@ 2015-07-29 15:23 ` Ming Lei
  1 sibling, 0 replies; 5+ messages in thread
From: Ming Lei @ 2015-07-29 15:23 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Sam Bradshaw, Asai Thambi SP, Linux Kernel Mailing List,
	Jens Axboe, dmilburn

On Wed, Jul 29, 2015 at 10:22 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Hi,
>
> After commit f70ced091707 (blk-mq: support per-distpatch_queue flush
> machinery), the mtip32xx driver may oops upon module load due to walking
> off the end of an array in mtip_init_cmd.  On initialization of the
> flush_rq, init_request is called with request_index >= the maximum queue
> depth the driver supports.  For mtip32xx, this value is used to index
> into an array.  What this means is that the driver will walk off the end
> of the array, and either oops or cause random memory corruption.
>
> The problem is easily reproduced by doing modprobe/rmmod of the mtip32xx
> driver in a loop.  I can typically reproduce the problem in about 30
> seconds.
>
> Now, in the case of mtip32xx, it actually doesn't support flush/fua, so
> I think we can simply return without doing anything.  In addition, no
> other mq-enabled driver does anything with the request_index passed into
> init_request(), so no other driver is affected.  However, I'm not really
> sure what is expected of drivers.  Ming, what did you envision drivers
> would do when initializing the flush requests?

Sorry for not checking this driver.

The flush command's index is deliberately set as this value, and it was
documented in include/linux/blk-mq.h:

         * Tag greater than or equal to queue_depth is for setting up
         * flush request.

Thanks,

>
> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
>
> diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
> index 4a2ef09..f504232 100644
> --- a/drivers/block/mtip32xx/mtip32xx.c
> +++ b/drivers/block/mtip32xx/mtip32xx.c
> @@ -3756,6 +3756,14 @@ static int mtip_init_cmd(void *data, struct request *rq, unsigned int hctx_idx,
>         struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
>         u32 host_cap_64 = readl(dd->mmio + HOST_CAP) & HOST_CAP_64;
>
> +       /*
> +        * For flush requests, request_idx starts at the end of the
> +        * tag space.  Since we don't support FLUSH/FUA, simply return
> +        * 0 as there's nothing to be done.
> +        */
> +       if (request_idx >= MTIP_MAX_COMMAND_SLOTS)
> +               return 0;
> +
>         cmd->command = dmam_alloc_coherent(&dd->pdev->dev, CMD_DMA_ALLOC_SZ,
>                         &cmd->command_dma, GFP_KERNEL);
>         if (!cmd->command)


* Re: [patch|rfc] mtip32x: fix regression introduced by blk-mq per-hctx flush
  2015-07-29 14:44 ` Jens Axboe
@ 2015-08-05 20:44   ` Jeff Moyer
  2015-08-25 20:36     ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Moyer @ 2015-08-05 20:44 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, Sam Bradshaw, Asai Thambi SP, linux-kernel, dmilburn

Jens Axboe <axboe@kernel.dk> writes:

> On 07/29/2015 08:22 AM, Jeff Moyer wrote:
>> Hi,
>>
>> After commit f70ced091707 (blk-mq: support per-distpatch_queue flush
>> machinery), the mtip32xx driver may oops upon module load due to walking
>> off the end of an array in mtip_init_cmd.  On initialization of the
>> flush_rq, init_request is called with request_index >= the maximum queue
>> depth the driver supports.  For mtip32xx, this value is used to index
>> into an array.  What this means is that the driver will walk off the end
>> of the array, and either oops or cause random memory corruption.
>>
>> The problem is easily reproduced by doing modprobe/rmmod of the mtip32xx
>> driver in a loop.  I can typically reproduce the problem in about 30
>> seconds.
>>
>> Now, in the case of mtip32xx, it actually doesn't support flush/fua, so
>> I think we can simply return without doing anything.  In addition, no
>> other mq-enabled driver does anything with the request_index passed into
>> init_request(), so no other driver is affected.  However, I'm not really
>> sure what is expected of drivers.  Ming, what did you envision drivers
>> would do when initializing the flush requests?
>
> This is really a bug in the core, we should not have to work around
> this in the driver. I'll take a look at this.

Hi, Jens,

Any update on this?

-Jeff


* Re: [patch|rfc] mtip32x: fix regression introduced by blk-mq per-hctx flush
  2015-08-05 20:44   ` Jeff Moyer
@ 2015-08-25 20:36     ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2015-08-25 20:36 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Ming Lei, Sam Bradshaw, Asai Thambi SP, linux-kernel, dmilburn

On 08/05/2015 02:44 PM, Jeff Moyer wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>
>> On 07/29/2015 08:22 AM, Jeff Moyer wrote:
>>> Hi,
>>>
>>> After commit f70ced091707 (blk-mq: support per-distpatch_queue flush
>>> machinery), the mtip32xx driver may oops upon module load due to walking
>>> off the end of an array in mtip_init_cmd.  On initialization of the
>>> flush_rq, init_request is called with request_index >= the maximum queue
>>> depth the driver supports.  For mtip32xx, this value is used to index
>>> into an array.  What this means is that the driver will walk off the end
>>> of the array, and either oops or cause random memory corruption.
>>>
>>> The problem is easily reproduced by doing modprobe/rmmod of the mtip32xx
>>> driver in a loop.  I can typically reproduce the problem in about 30
>>> seconds.
>>>
>>> Now, in the case of mtip32xx, it actually doesn't support flush/fua, so
>>> I think we can simply return without doing anything.  In addition, no
>>> other mq-enabled driver does anything with the request_index passed into
>>> init_request(), so no other driver is affected.  However, I'm not really
>>> sure what is expected of drivers.  Ming, what did you envision drivers
>>> would do when initializing the flush requests?
>>
>> This is really a bug in the core, we should not have to work around
>> this in the driver. I'll take a look at this.
>
> Hi, Jens,
>
> Any update on this?

To avoid stalling further on this, I'll apply the simple fix for 4.2 so 
we can move forward. It's a memory corruption issue and should get 
fixed, we can argue details later.

-- 
Jens Axboe


