public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
* Regression: NVMe: kernel BUG at lib/sg_pool.c:103!
@ 2019-02-20  3:11 Ming Lei
  2019-02-20  3:13 ` Ming Lei
  2019-02-20 14:17 ` Christoph Hellwig
  0 siblings, 2 replies; 11+ messages in thread
From: Ming Lei @ 2019-02-20  3:11 UTC (permalink / raw)
  To: Christoph Hellwig, linux-block, linux-nvme, Chaitanya Kulkarni,
	Martin K. Petersen, Bart Van Assche

Hi,

The kernel oops[1] is observed when the following commit is applied.
And the panic disappears after it is reverted.


commit 6e02318eaea53eaafe628c4ffc254f57b2704561
Author: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Date:   Mon Dec 17 22:42:03 2018 -0500

    nvme: add support for the Write Zeroes command



[1] panic log
[   40.360720] ------------[ cut here ]------------
[   40.361396] kernel BUG at lib/sg_pool.c:103!
[   40.362042] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   40.362918] CPU: 2 PID: 400 Comm: kworker/2:1H Not tainted 5.0.0-rc4_6e02318eaea5+ #125
[   40.364021] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
[   40.365224] Workqueue: kblockd blk_mq_run_work_fn
[   40.365930] RIP: 0010:sg_alloc_table_chained+0x7/0x6b
[   40.366632] Code: 8d 56 ff 0f bd c6 85 f2 0f 95 c2 0f b6 d2 01 d0 83 e8 03 48 c1 e0 05 48 8b b0 98 b7 13 82 e9 4f ed e0 ff 85 f6 55 53 51 75 02 <0f> 0b 48 85 d2 48 89 d1 48 89 fb 40 0f 95 c5 81 fe 80 00 00 00 7f
[   40.369197] RSP: 0000:ffffc9000075bc60 EFLAGS: 00010246
[   40.369927] RAX: ffff8880251b2698 RBX: ffff8880255db7c0 RCX: ffff88803696a368
[   40.370913] RDX: ffff8880251b26a8 RSI: 0000000000000000 RDI: ffff8880251b2698
[   40.371896] RBP: ffff888072048950 R08: 00000000000003e8 R09: 0000000000000001
[   40.372877] R10: ffff8880255e0868 R11: 0000000000000001 R12: ffff888035d4c100
[   40.373853] R13: ffff88806b282d80 R14: ffff8880251b2480 R15: ffffc9000075bd3c
[   40.374835] FS:  0000000000000000(0000) GS:ffff888079c00000(0000) knlGS:0000000000000000
[   40.375941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   40.376739] CR2: 00007f1e739be000 CR3: 0000000024cb0002 CR4: 0000000000760ee0
[   40.377732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   40.378711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   40.379692] PKRU: 55555554
[   40.380076] Call Trace:
[   40.380447]  nvme_rdma_queue_rq+0x310/0x5d7 [nvme_rdma]
[   40.381185]  blk_mq_try_issue_directly+0x112/0x1f0
[   40.381868]  blk_insert_cloned_request+0xdf/0xfb
[   40.382522]  ? ktime_get+0x3f/0x92
[   40.383032]  dm_mq_queue_rq+0x29f/0x36b [dm_mod]
[   40.383696]  ? __switch_to_asm+0x40/0x70
[   40.384246]  blk_mq_dispatch_rq_list+0x28d/0x45d
[   40.384898]  ? _raw_spin_unlock+0x16/0x27
[   40.385451]  ? blk_mq_flush_busy_ctxs+0x8a/0x17c
[   40.386105]  blk_mq_sched_dispatch_requests+0x129/0x14b
[   40.386864]  __blk_mq_run_hw_queue+0xa4/0xcc
[   40.387475]  process_one_work+0x1c9/0x302
[   40.388054]  ? rescuer_thread+0x282/0x282
[   40.388614]  worker_thread+0x1ca/0x295
[   40.389151]  kthread+0x115/0x11d
[   40.389612]  ? kthread_park+0x76/0x76
[   40.390128]  ret_from_fork+0x35/0x40
[   40.390640] Modules linked in: nvme_rdma nvme_fabrics nvmet_rdma rdma_cm iw_cm ib_cm nvmet crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel ib_core null_blk scsi_debug isofs dm_service_time iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core lpc_ich mfd_core ip_tables sr_mod cdrom usb_storage sd_mod ahci libahci libata crc32c_intel virtio_scsi qemu_fw_cfg dm_mirror dm_region_hash dm_log dm_multipath dm_mod
[   40.395481] Dumping ftrace buffer:
[   40.395996]    (ftrace buffer empty)
[   40.396532] ---[ end trace c0ae1e79f5f72e15 ]---

Thanks,
Ming

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Regression: NVMe: kernel BUG at lib/sg_pool.c:103!
  2019-02-20  3:11 Regression: NVMe: kernel BUG at lib/sg_pool.c:103! Ming Lei
@ 2019-02-20  3:13 ` Ming Lei
  2019-02-20 14:17 ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Ming Lei @ 2019-02-20  3:13 UTC (permalink / raw)
  To: Christoph Hellwig, linux-block, linux-nvme, Chaitanya Kulkarni,
	Martin K. Petersen, Bart Van Assche

On Wed, Feb 20, 2019 at 11:11:22AM +0800, Ming Lei wrote:
> Hi,
> 
> The kernel oops[1] is observed when the following commit is applied.
> And the panic disappears after it is reverted.

Forget to mention, it is triggered by running 'nvmeof-mp/002' of blktests.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Regression: NVMe: kernel BUG at lib/sg_pool.c:103!
  2019-02-20  3:11 Regression: NVMe: kernel BUG at lib/sg_pool.c:103! Ming Lei
  2019-02-20  3:13 ` Ming Lei
@ 2019-02-20 14:17 ` Christoph Hellwig
  2019-02-20 16:59   ` Chaitanya Kulkarni
  2019-02-20 18:16   ` Chaitanya Kulkarni
  1 sibling, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2019-02-20 14:17 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, linux-block, linux-nvme, Chaitanya Kulkarni,
	Martin K. Petersen, Bart Van Assche

We shouldn't be allocating a scatterlist for a command that doesn't
have a payload.

The blk_rq_payload_bytes check in nvme_rdma_map_data is supposed to
prevent that.

Chaitanya, can you try to debug why this is not working?  I'm on
vacation and don't have much time right now unfortunately.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Regression: NVMe: kernel BUG at lib/sg_pool.c:103!
  2019-02-20 14:17 ` Christoph Hellwig
@ 2019-02-20 16:59   ` Chaitanya Kulkarni
  2019-02-20 18:16   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 11+ messages in thread
From: Chaitanya Kulkarni @ 2019-02-20 16:59 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	Martin K. Petersen, Bart Van Assche

On 02/20/2019 06:17 AM, Christoph Hellwig wrote:
> We shouldn't be allocating a scatterlist for a command that doesn't
> have a payload.
>
> The blk_rq_payload_bytes check in nvme_rdma_map_data is supposed to
> prevent that.
>
> Chaitanya, can you try to debug why this is not working?  I'm on
> vacation and don't have much time right now unfortunately.
>
Yes, already working it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Regression: NVMe: kernel BUG at lib/sg_pool.c:103!
  2019-02-20 14:17 ` Christoph Hellwig
  2019-02-20 16:59   ` Chaitanya Kulkarni
@ 2019-02-20 18:16   ` Chaitanya Kulkarni
  2019-02-20 22:55     ` Martin K. Petersen
  2019-02-21  2:16     ` Ming Lei
  1 sibling, 2 replies; 11+ messages in thread
From: Chaitanya Kulkarni @ 2019-02-20 18:16 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	Martin K. Petersen, Bart Van Assche

On 02/20/2019 06:17 AM, Christoph Hellwig wrote:
> We shouldn't be allocating a scatterlist for a command that doesn't
> have a payload.
>
> The blk_rq_payload_bytes check in nvme_rdma_map_data is supposed to
> prevent that.
>
> Chaitanya, can you try to debug why this is not working?  I'm on
> vacation and don't have much time right now unfortunately.
>


Hi Ming,

Can you please test following patch on your system ?



 From 79e178a53a9f101d1f5f6a4923298bb1ffe936ef Mon Sep 17 00:00:00 2001
From: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Date: Wed, 20 Feb 2019 09:16:58 -0800
Subject: [PATCH] nvme-rdma: use nr_phys_segments when map rq to sgl

Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check
if a command contains data to be mapped.  This fixes the case where
a struct request contains LBAs, but it has no payload, such as
Write Zeroes support.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
  drivers/nvme/host/rdma.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index ac365366c2ec..c6a489049fd5 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1150,7 +1150,7 @@ static void nvme_rdma_unmap_data(struct 
nvme_rdma_queue *queue,
  	struct nvme_rdma_device *dev = queue->device;
  	struct ib_device *ibdev = dev->dev;

-	if (!blk_rq_payload_bytes(rq))
+	if (!blk_rq_nr_phys_segments(rq))
  		return;

  	if (req->mr) {
@@ -1273,7 +1273,7 @@ static int nvme_rdma_map_data(struct 
nvme_rdma_queue *queue,

  	c->common.flags |= NVME_CMD_SGL_METABUF;

-	if (!blk_rq_payload_bytes(rq))
+	if (!blk_rq_nr_phys_segments(rq))
  		return nvme_rdma_set_sg_null(c);

  	req->sg_table.sgl = req->first_sgl;
-- 
2.17.0


Also can you please share all the QEMU config/setup information that
you have used to run the tests ? I want to add this test to the blktests 
with QEMU platform.





^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Regression: NVMe: kernel BUG at lib/sg_pool.c:103!
  2019-02-20 18:16   ` Chaitanya Kulkarni
@ 2019-02-20 22:55     ` Martin K. Petersen
  2019-02-21  1:29       ` Chaitanya Kulkarni
  2019-02-21  2:16     ` Ming Lei
  1 sibling, 1 reply; 11+ messages in thread
From: Martin K. Petersen @ 2019-02-20 22:55 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Christoph Hellwig, Ming Lei, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org, Martin K. Petersen,
	Bart Van Assche


Chaitanya,

> -	if (!blk_rq_payload_bytes(rq))
> +	if (!blk_rq_nr_phys_segments(rq))

Wouldn't it be better to set RQF_SPECIAL_PAYLOAD and friends in
nvme_setup_write_zeroes() like it's done for discard?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Regression: NVMe: kernel BUG at lib/sg_pool.c:103!
  2019-02-20 22:55     ` Martin K. Petersen
@ 2019-02-21  1:29       ` Chaitanya Kulkarni
  2019-02-21 13:37         ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Chaitanya Kulkarni @ 2019-02-21  1:29 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Ming Lei, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org, Bart Van Assche

Hi Martin,

I don't mind going though that route, here are some points about
benefits of not using REQ_SPECIAL_PAYLOAD for write-zeroes :-

1. We are using RQF_SPECIAL_PAYLOAD for only discard commands and not for
     write-zeroes because it does not have any payload. Using this in the code will
     trigger more code changes to handle in the completion path.

2. Right now, blk_rq_nr_phys_segments() is used in:-
    nvme/host/pci.c
    nvme/target/loop.c

    In order to keep the code consistent, I think we should use the same function
    call everywhere in the nvme code base:-
    nvme/host/rdam.c : nvme_rdma_map_data() for the first check.
    nvme/host/fc.c : nvme_fc_queue_rq().

 3. Also blk_rq_nr_phys_segments() takes RQF_SPECIAL_PAYLOAD into account so
     no more changes required for discard and write-zeroes cases.

Regards,
-Chaitanya

On 2/20/19, 2:55 PM, "Martin K. Petersen" <martin.petersen@oracle.com> wrote:

    
    Chaitanya,
    
    > -	if (!blk_rq_payload_bytes(rq))
    > +	if (!blk_rq_nr_phys_segments(rq))
    
    Wouldn't it be better to set RQF_SPECIAL_PAYLOAD and friends in
    nvme_setup_write_zeroes() like it's done for discard?
    
    -- 
    Martin K. Petersen	Oracle Linux Engineering
    


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Regression: NVMe: kernel BUG at lib/sg_pool.c:103!
  2019-02-20 18:16   ` Chaitanya Kulkarni
  2019-02-20 22:55     ` Martin K. Petersen
@ 2019-02-21  2:16     ` Ming Lei
  2019-02-21  2:21       ` Chaitanya Kulkarni
  1 sibling, 1 reply; 11+ messages in thread
From: Ming Lei @ 2019-02-21  2:16 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Christoph Hellwig, Ming Lei, linux-block@vger.kernel.org,
	Martin K. Petersen, linux-nvme@lists.infradead.org,
	Bart Van Assche

On Thu, Feb 21, 2019 at 2:16 AM Chaitanya Kulkarni
<Chaitanya.Kulkarni@wdc.com> wrote:
>
> On 02/20/2019 06:17 AM, Christoph Hellwig wrote:
> > We shouldn't be allocating a scatterlist for a command that doesn't
> > have a payload.
> >
> > The blk_rq_payload_bytes check in nvme_rdma_map_data is supposed to
> > prevent that.
> >
> > Chaitanya, can you try to debug why this is not working?  I'm on
> > vacation and don't have much time right now unfortunately.
> >
>
>
> Hi Ming,
>
> Can you please test following patch on your system ?

With this patch, the test can pass.

[root@ktest-04 blktests]# ./check nvmeof-mp/002
nvmeof-mp/002 (File I/O on top of multipath concurrently with logout
and login (mq)) [passed]
    runtime  75.478s  ...  54.317s

>
>
>
>  From 79e178a53a9f101d1f5f6a4923298bb1ffe936ef Mon Sep 17 00:00:00 2001
> From: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> Date: Wed, 20 Feb 2019 09:16:58 -0800
> Subject: [PATCH] nvme-rdma: use nr_phys_segments when map rq to sgl
>
> Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check
> if a command contains data to be mapped.  This fixes the case where
> a struct request contains LBAs, but it has no payload, such as
> Write Zeroes support.
>
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>   drivers/nvme/host/rdma.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index ac365366c2ec..c6a489049fd5 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1150,7 +1150,7 @@ static void nvme_rdma_unmap_data(struct
> nvme_rdma_queue *queue,
>         struct nvme_rdma_device *dev = queue->device;
>         struct ib_device *ibdev = dev->dev;
>
> -       if (!blk_rq_payload_bytes(rq))
> +       if (!blk_rq_nr_phys_segments(rq))
>                 return;
>
>         if (req->mr) {
> @@ -1273,7 +1273,7 @@ static int nvme_rdma_map_data(struct
> nvme_rdma_queue *queue,
>
>         c->common.flags |= NVME_CMD_SGL_METABUF;
>
> -       if (!blk_rq_payload_bytes(rq))
> +       if (!blk_rq_nr_phys_segments(rq))
>                 return nvme_rdma_set_sg_null(c);
>
>         req->sg_table.sgl = req->first_sgl;
> --
> 2.17.0
>
>
> Also can you please share all the QEMU config/setup information that
> you have used to run the tests ? I want to add this test to the blktests
> with QEMU platform.

Just enable the required kernel config options and install
device-mapper-multipath,
then run './check nvmeof-mp/002', it will be started.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Regression: NVMe: kernel BUG at lib/sg_pool.c:103!
  2019-02-21  2:16     ` Ming Lei
@ 2019-02-21  2:21       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 11+ messages in thread
From: Chaitanya Kulkarni @ 2019-02-21  2:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Ming Lei, linux-block@vger.kernel.org,
	Martin K. Petersen, linux-nvme@lists.infradead.org,
	Bart Van Assche

On 02/20/2019 06:16 PM, Ming Lei wrote:
> On Thu, Feb 21, 2019 at 2:16 AM Chaitanya Kulkarni
> <Chaitanya.Kulkarni@wdc.com> wrote:
>>
>> On 02/20/2019 06:17 AM, Christoph Hellwig wrote:
>>> We shouldn't be allocating a scatterlist for a command that doesn't
>>> have a payload.
>>>
>>> The blk_rq_payload_bytes check in nvme_rdma_map_data is supposed to
>>> prevent that.
>>>
>>> Chaitanya, can you try to debug why this is not working?  I'm on
>>> vacation and don't have much time right now unfortunately.
>>>
>>
>>
>> Hi Ming,
>>
>> Can you please test following patch on your system ?
>
> With this patch, the test can pass.
>
> [root@ktest-04 blktests]# ./check nvmeof-mp/002
> nvmeof-mp/002 (File I/O on top of multipath concurrently with logout
> and login (mq)) [passed]
>      runtime  75.478s  ...  54.317s
>
>>
>>
>>
>>   From 79e178a53a9f101d1f5f6a4923298bb1ffe936ef Mon Sep 17 00:00:00 2001
>> From: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
>> Date: Wed, 20 Feb 2019 09:16:58 -0800
>> Subject: [PATCH] nvme-rdma: use nr_phys_segments when map rq to sgl
>>
>> Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check
>> if a command contains data to be mapped.  This fixes the case where
>> a struct request contains LBAs, but it has no payload, such as
>> Write Zeroes support.
>>
>> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
>> ---
>>    drivers/nvme/host/rdma.c | 4 ++--
>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index ac365366c2ec..c6a489049fd5 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -1150,7 +1150,7 @@ static void nvme_rdma_unmap_data(struct
>> nvme_rdma_queue *queue,
>>          struct nvme_rdma_device *dev = queue->device;
>>          struct ib_device *ibdev = dev->dev;
>>
>> -       if (!blk_rq_payload_bytes(rq))
>> +       if (!blk_rq_nr_phys_segments(rq))
>>                  return;
>>
>>          if (req->mr) {
>> @@ -1273,7 +1273,7 @@ static int nvme_rdma_map_data(struct
>> nvme_rdma_queue *queue,
>>
>>          c->common.flags |= NVME_CMD_SGL_METABUF;
>>
>> -       if (!blk_rq_payload_bytes(rq))
>> +       if (!blk_rq_nr_phys_segments(rq))
>>                  return nvme_rdma_set_sg_null(c);
>>
>>          req->sg_table.sgl = req->first_sgl;
>> --
>> 2.17.0
>>
>>
>> Also can you please share all the QEMU config/setup information that
>> you have used to run the tests ? I want to add this test to the blktests
>> with QEMU platform.
>
> Just enable the required kernel config options and install
> device-mapper-multipath,
> then run './check nvmeof-mp/002', it will be started.
>
> Thanks,
> Ming Lei
>

Thanks Ming for testing, I'll send an official patch to the mailing list.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Regression: NVMe: kernel BUG at lib/sg_pool.c:103!
  2019-02-21  1:29       ` Chaitanya Kulkarni
@ 2019-02-21 13:37         ` Christoph Hellwig
  2019-02-21 14:22           ` Martin K. Petersen
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2019-02-21 13:37 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Martin K. Petersen, Christoph Hellwig, Ming Lei,
	linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	Bart Van Assche

On Thu, Feb 21, 2019 at 01:29:57AM +0000, Chaitanya Kulkarni wrote:
> Hi Martin,
> 
> I don't mind going though that route, here are some points about
> benefits of not using REQ_SPECIAL_PAYLOAD for write-zeroes :-
> 
> 1. We are using RQF_SPECIAL_PAYLOAD for only discard commands and not for
>      write-zeroes because it does not have any payload. Using this in the code will
>      trigger more code changes to handle in the completion path.

Yes.  And that is the big difference to SCSI where REQ_OP_WRITE_ZEROES
turns into a WRITE SAME command that has a payload.  So for SCSI
RQF_SPECIAL_PAYLOAD for REQ_OP_WRITE_ZEROES makes a lot of sense, for
NVMe it does not.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Regression: NVMe: kernel BUG at lib/sg_pool.c:103!
  2019-02-21 13:37         ` Christoph Hellwig
@ 2019-02-21 14:22           ` Martin K. Petersen
  0 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2019-02-21 14:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chaitanya Kulkarni, Martin K. Petersen, Ming Lei,
	linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	Bart Van Assche


Christoph,

>> 1. We are using RQF_SPECIAL_PAYLOAD for only discard commands and not
>> for write-zeroes because it does not have any payload. Using this in
>> the code will trigger more code changes to handle in the completion
>> path.
>
> Yes.  And that is the big difference to SCSI where REQ_OP_WRITE_ZEROES
> turns into a WRITE SAME command that has a payload.  So for SCSI
> RQF_SPECIAL_PAYLOAD for REQ_OP_WRITE_ZEROES makes a lot of sense, for
> NVMe it does not.

I don't actually care about using RQF_SPECIAL, it just seemed like a
quick workaround to set it and make bv_len 0.

My concern is purely rooted in all the grief we've had throughout the
block I/O stack distinguishing between the bytes acted upon on media and
the DMA transfer length. And consequently, I don't particularly like
that blk_rq_payload_bytes() doesn't handle the NVMe WRITE ZEROES
command. That seems like something that will cause us headaches in the
future...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-02-21 14:23 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-02-20  3:11 Regression: NVMe: kernel BUG at lib/sg_pool.c:103! Ming Lei
2019-02-20  3:13 ` Ming Lei
2019-02-20 14:17 ` Christoph Hellwig
2019-02-20 16:59   ` Chaitanya Kulkarni
2019-02-20 18:16   ` Chaitanya Kulkarni
2019-02-20 22:55     ` Martin K. Petersen
2019-02-21  1:29       ` Chaitanya Kulkarni
2019-02-21 13:37         ` Christoph Hellwig
2019-02-21 14:22           ` Martin K. Petersen
2019-02-21  2:16     ` Ming Lei
2019-02-21  2:21       ` Chaitanya Kulkarni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox