linux-block.vger.kernel.org archive mirror
* kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
       [not found] <cki.F3E139361A.EN5MUSJKK9@redhat.com>
@ 2021-02-06  3:08 ` Yi Zhang
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Yi Zhang @ 2021-02-06  3:08 UTC (permalink / raw)
  To: linux-nvme, linux-block; +Cc: axboe, Rachel Sibley, CKI Project, Sagi Grimberg

Hello

We found this kernel NULL pointer issue with the latest linux-block/for-next and it's 100% reproducible; let me know if you need more info/testing, thanks

Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next

Reproducer: blktests nvme-tcp/012


[  124.458121] run blktests nvme/012 at 2021-02-05 21:53:34 
[  125.525568] BUG: kernel NULL pointer dereference, address: 0000000000000008 
[  125.532524] #PF: supervisor read access in kernel mode 
[  125.537665] #PF: error_code(0x0000) - not-present page 
[  125.542803] PGD 0 P4D 0  
[  125.545343] Oops: 0000 [#1] SMP NOPTI 
[  125.549009] CPU: 15 PID: 12069 Comm: kworker/15:2H Tainted: G S        I       5.11.0-rc6+ #1 
[  125.557528] Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020 
[  125.565093] Workqueue: kblockd blk_mq_run_work_fn 
[  125.569797] RIP: 0010:nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] 
[  125.575544] Code: 8b 75 68 44 8b 45 28 44 8b 7d 30 49 89 d4 48 c1 e2 04 4c 01 f2 45 89 fb 44 89 c7 85 ff 74 4d 44 89 e0 44 8b 55 10 48 c1 e0 04 <41> 8b 5c 06 08 45 0f b6 ca 89 d8 44 29 d8 39 f8 0f 47 c7 41 83 e9 
[  125.594290] RSP: 0018:ffffbd084447bd18 EFLAGS: 00010246 
[  125.599515] RAX: 0000000000000000 RBX: ffffa0bba9f3ce80 RCX: 0000000000000000 
[  125.606648] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000002000000 
[  125.613781] RBP: ffffa0ba8ac6fec0 R08: 0000000002000000 R09: 0000000000000000 
[  125.620914] R10: 0000000002800809 R11: 0000000000000000 R12: 0000000000000000 
[  125.628045] R13: ffffa0bba9f3cf90 R14: 0000000000000000 R15: 0000000000000000 
[  125.635178] FS:  0000000000000000(0000) GS:ffffa0c9ff9c0000(0000) knlGS:0000000000000000 
[  125.643264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[  125.649009] CR2: 0000000000000008 CR3: 00000001c9c6c005 CR4: 00000000007706e0 
[  125.656142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[  125.663274] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
[  125.670407] PKRU: 55555554 
[  125.673119] Call Trace: 
[  125.675575]  nvme_tcp_queue_rq+0xef/0x330 [nvme_tcp] 
[  125.680537]  blk_mq_dispatch_rq_list+0x11c/0x7c0 
[  125.685157]  ? blk_mq_flush_busy_ctxs+0xf6/0x110 
[  125.689775]  __blk_mq_sched_dispatch_requests+0x12b/0x170 
[  125.695175]  blk_mq_sched_dispatch_requests+0x30/0x60 
[  125.700227]  __blk_mq_run_hw_queue+0x2b/0x60 
[  125.704500]  process_one_work+0x1cb/0x360 
[  125.708513]  ? process_one_work+0x360/0x360 
[  125.712699]  worker_thread+0x30/0x370 
[  125.716365]  ? process_one_work+0x360/0x360 
[  125.720550]  kthread+0x116/0x130 
[  125.723782]  ? kthread_park+0x80/0x80 
[  125.727448]  ret_from_fork+0x1f/0x30 
[  125.731028] Modules linked in: nvme_tcp nvme_fabrics nvmet_tcp nvmet loop nvme nvme_core rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common isst_if_common skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif mgag200 i2c_algo_bit drm_kms_helper irqbypass iTCO_wdt crct10dif_pclmul iTCO_vendor_support syscopyarea crc32_pclmul sysfillrect sysimgblt ghash_clmulni_intel fb_sys_fops dcdbas rapl drm intel_cstate acpi_ipmi ipmi_si mei_me dell_smbios intel_uncore i2c_i801 mei dax_pmem_compat wmi_bmof dell_wmi_descriptor ipmi_devintf device_dax intel_pch_thermal pcspkr i2c_smbus lpc_ich ipmi_msghandler dax_pmem_core acpi_power_meter ip_tables xfs libcrc32c nd_pmem nd_btt sd_mod t10_pi sg ahci libahci libata tg3 megaraid_sas crc32c_intel nfit wmi libnvdimm dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nvmet] 
[  125.806140] CR2: 0000000000000008 
[  125.809467] ---[ end trace 312795dd33fab339 ]--- 
[  125.824717] RIP: 0010:nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] 
[  125.830465] Code: 8b 75 68 44 8b 45 28 44 8b 7d 30 49 89 d4 48 c1 e2 04 4c 01 f2 45 89 fb 44 89 c7 85 ff 74 4d 44 89 e0 44 8b 55 10 48 c1 e0 04 <41> 8b 5c 06 08 45 0f b6 ca 89 d8 44 29 d8 39 f8 0f 47 c7 41 83 e9 
[  125.849212] RSP: 0018:ffffbd084447bd18 EFLAGS: 00010246 
[  125.854436] RAX: 0000000000000000 RBX: ffffa0bba9f3ce80 RCX: 0000000000000000 
[  125.861560] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000002000000 
[  125.868692] RBP: ffffa0ba8ac6fec0 R08: 0000000002000000 R09: 0000000000000000 
[  125.875817] R10: 0000000002800809 R11: 0000000000000000 R12: 0000000000000000 
[  125.882948] R13: ffffa0bba9f3cf90 R14: 0000000000000000 R15: 0000000000000000 
[  125.890072] FS:  0000000000000000(0000) GS:ffffa0c9ff9c0000(0000) knlGS:0000000000000000 
[  125.898158] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[  125.903897] CR2: 0000000000000008 CR3: 00000001c9c6c005 CR4: 00000000007706e0 
[  125.911029] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[  125.918160] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
[  125.925292] PKRU: 55555554 
[  125.927998] Kernel panic - not syncing: Fatal exception 
[  126.309099] Kernel Offset: 0x2d400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) 
[  126.330633] ---[ end Kernel panic - not syncing: Fatal exception ]--- 
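For readers decoding the trace above: the faulting instruction in the Code: line (41 8b 5c 06 08) is mov ebx, [r14+rax+8], with R14 = 0 and RAX = 0 in the register dump, and CR2 = 0x8 matches the offset of bv_len within struct bio_vec on x86_64. That is consistent with nvme_tcp_init_iter() walking a bio whose bi_io_vec table is NULL, possibly a request whose bio carries no data pages. A minimal sketch of the layout involved (the struct mirrors include/linux/bvec.h; the assertion is illustrative, not driver source):

#include <stddef.h>

/* Mirror of struct bio_vec's layout on x86_64 (include/linux/bvec.h). */
struct bio_vec_sketch {
	void		*bv_page;	/* offset 0, 8 bytes        */
	unsigned int	bv_len;		/* offset 8, matches CR2    */
	unsigned int	bv_offset;	/* offset 12                */
};

/*
 * bio_for_each_bvec() starts from bio->bi_io_vec and reads
 * bvec[bi_idx].bv_len to size each multipage segment.  With a NULL
 * bi_io_vec and bi_idx == 0, the first such read touches address
 * 0 + 0 * sizeof(struct bio_vec) + 8 == 0x8, the address in the oops.
 */
_Static_assert(offsetof(struct bio_vec_sketch, bv_len) == 8,
	       "bv_len sits at offset 8, matching CR2 above");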

Best Regards,
  Yi Zhang


----- Original Message -----
From: "CKI Project" <cki-project@redhat.com>
To: axboe@kernel.dk
Cc: "Yi Zhang" <yizhan@redhat.com>
Sent: Saturday, February 6, 2021 7:36:43 AM
Subject: ✅ PASS: Test report for kernel 5.11.0-rc6 (block)


Hello,

We ran automated tests on a recent commit from this kernel tree:

       Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
            Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next

The results of these automated tests are provided below.

    Overall result: PASSED
             Merge: OK
           Compile: OK
             Tests: OK

All kernel binaries, config files, and logs are available for download here:

  https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefix=datawarehouse-public/2021/02/05/623209

Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.

        ,-.   ,-.
       ( C ) ( K )  Continuous
        `-',-.`-'   Kernel
          ( I )     Integration
           `-'
______________________________________________________________________________

Compile testing
---------------

We compiled the kernel for 4 architectures:

    aarch64:
      make options: make  -j30 INSTALL_MOD_STRIP=1 targz-pkg

    ppc64le:
      make options: make  -j30 INSTALL_MOD_STRIP=1 targz-pkg

    s390x:
      make options: make  -j30 INSTALL_MOD_STRIP=1 targz-pkg

    x86_64:
      make options: make  -j30 INSTALL_MOD_STRIP=1 targz-pkg



Hardware testing
----------------
We booted each kernel and ran the following tests:

  aarch64:
    Host 1:
       ✅ Boot test
       ✅ ACPI table test
       ✅ LTP
       ✅ Loopdev Sanity
       ✅ Memory: fork_mem
       ✅ Memory function: memfd_create
       ✅ AMTU (Abstract Machine Test Utility)
       ✅ storage: SCSI VPD
       🚧 ✅ CIFS Connectathon
       🚧 ✅ POSIX pjd-fstest suites
       🚧 ✅ Ethernet drivers sanity

    Host 2:
       ✅ Boot test
       ✅ storage: software RAID testing
       🚧 ✅ xfstests - ext4
       🚧 ✅ xfstests - xfs
       🚧 ✅ xfstests - btrfs
       🚧 ✅ Storage blktests
       🚧 ✅ Storage block - filesystem fio test
       🚧 ✅ Storage block - queue scheduler test
       🚧 💥 Storage nvme - tcp
       🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test
       🚧 ⚡⚡⚡ stress: stress-ng

  ppc64le:
    Host 1:

       ⚡ Internal infrastructure issues prevented one or more tests (marked
       with ⚡⚡⚡) from running on this architecture.
       This is not the fault of the kernel that was tested.

       ⚡⚡⚡ Boot test
       ⚡⚡⚡ LTP
       ⚡⚡⚡ Loopdev Sanity
       ⚡⚡⚡ Memory: fork_mem
       ⚡⚡⚡ Memory function: memfd_create
       ⚡⚡⚡ AMTU (Abstract Machine Test Utility)
       🚧 ⚡⚡⚡ CIFS Connectathon
       🚧 ⚡⚡⚡ POSIX pjd-fstest suites
       🚧 ⚡⚡⚡ Ethernet drivers sanity

    Host 2:
       ✅ Boot test
       ✅ storage: software RAID testing
       🚧 ✅ xfstests - ext4
       🚧 ✅ xfstests - xfs
       🚧 ✅ xfstests - btrfs
       🚧 ✅ Storage blktests
       🚧 ✅ Storage block - filesystem fio test
       🚧 ✅ Storage block - queue scheduler test
       🚧 💥 Storage nvme - tcp
       🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test

    Host 3:
       ✅ Boot test
       ✅ LTP
       ✅ Loopdev Sanity
       ✅ Memory: fork_mem
       ✅ Memory function: memfd_create
       ✅ AMTU (Abstract Machine Test Utility)
       🚧 ✅ CIFS Connectathon
       🚧 ✅ POSIX pjd-fstest suites
       🚧 ✅ Ethernet drivers sanity

  s390x:
    Host 1:

       ⚡ Internal infrastructure issues prevented one or more tests (marked
       with ⚡⚡⚡) from running on this architecture.
       This is not the fault of the kernel that was tested.

       ✅ Boot test
       🚧 ✅ Storage blktests
       🚧 ⚡⚡⚡ Storage nvme - tcp
       🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test
       🚧 ⚡⚡⚡ stress: stress-ng

    Host 2:
       ✅ Boot test
       ✅ LTP
       ✅ Loopdev Sanity
       ✅ Memory: fork_mem
       ✅ Memory function: memfd_create
       ✅ AMTU (Abstract Machine Test Utility)
       🚧 ✅ CIFS Connectathon
       🚧 ✅ POSIX pjd-fstest suites
       🚧 ✅ Ethernet drivers sanity

  x86_64:
    Host 1:
       ✅ Boot test
       ✅ Storage SAN device stress - qedf driver

    Host 2:
       ✅ Boot test
       ✅ Storage SAN device stress - qla2xxx driver

    Host 3:
       ✅ Boot test
       ✅ storage: software RAID testing
       🚧 ✅ xfstests - ext4
       🚧 ✅ xfstests - xfs
       🚧 ✅ xfstests - btrfs
       🚧 ✅ xfstests - nfsv4.2
       🚧 ✅ xfstests - cifsv3.11
       🚧 ✅ Storage blktests
       🚧 ✅ Storage block - filesystem fio test
       🚧 ✅ Storage block - queue scheduler test
       🚧 💥 Storage nvme - tcp
       🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test
       🚧 ⚡⚡⚡ stress: stress-ng

    Host 4:
       ✅ Boot test
       ✅ Storage SAN device stress - mpt3sas_gen1

    Host 5:
       ✅ Boot test
       ✅ ACPI table test
       ✅ LTP
       ✅ Loopdev Sanity
       ✅ Memory: fork_mem
       ✅ Memory function: memfd_create
       ✅ AMTU (Abstract Machine Test Utility)
       ✅ storage: SCSI VPD
       🚧 ✅ CIFS Connectathon
       🚧 ✅ POSIX pjd-fstest suites
       🚧 ✅ Ethernet drivers sanity

    Host 6:
       ✅ Boot test
       ✅ Storage SAN device stress - lpfc driver

  Test sources: https://gitlab.com/cki-project/kernel-tests
    💚 Pull requests are welcome for new tests or improvements to existing tests!

Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.

Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.

Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012
  2021-02-06  3:08 ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Yi Zhang
@ 2021-02-07  4:50   ` Yi Zhang
  2021-02-07  5:25     ` Chaitanya Kulkarni
                       ` (2 more replies)
  2021-02-07  7:14   ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Chaitanya Kulkarni
  2021-02-08  9:46   ` Sagi Grimberg
  2 siblings, 3 replies; 27+ messages in thread
From: Yi Zhang @ 2021-02-07  4:50 UTC (permalink / raw)
  To: linux-nvme, linux-block; +Cc: axboe, Rachel Sibley, CKI Project, Sagi Grimberg


The issue was introduced after merging the NVMe updates

commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
Merge: 44d10e4b2f2c 0d7389718c32
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Feb 2 07:12:06 2021 -0700

     Merge branch 'for-5.12/drivers' into for-next

     * for-5.12/drivers: (22 commits)
       nvme-tcp: use cancel tagset helper for tear down
       nvme-rdma: use cancel tagset helper for tear down
       nvme-tcp: add clean action for failed reconnection
       nvme-rdma: add clean action for failed reconnection
       nvme-core: add cancel tagset helpers
       nvme-core: get rid of the extra space
       nvme: add tracing of zns commands
       nvme: parse format nvm command details when tracing
       nvme: update enumerations for status codes
       nvmet: add lba to sect conversion helpers
       nvmet: remove extra variable in identify ns
       nvmet: remove extra variable in id-desclist
       nvmet: remove extra variable in smart log nsid
       nvme: refactor ns->ctrl by request
       nvme-tcp: pass multipage bvec to request iov_iter
       nvme-tcp: get rid of unused helper function
       nvme-tcp: fix wrong setting of request iov_iter
       nvme: support command retry delay for admin command
       nvme: constify static attribute_group structs
       nvmet-fc: use RCU proctection for assoc_list


On 2/6/21 11:08 AM, Yi Zhang wrote:
> blktests nvme-tcp/012


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
@ 2021-02-07  5:25     ` Chaitanya Kulkarni
  2021-02-08  6:48       ` Yi Zhang
  2021-02-07  5:48     ` Chaitanya Kulkarni
  2021-02-07  5:58     ` Chaitanya Kulkarni
  2 siblings, 1 reply; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-07  5:25 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme@lists.infradead.org, linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Sagi Grimberg

Yi,

On 2/6/21 20:51, Yi Zhang wrote:
> The issue was introduced after merging the NVMe updates
>
> commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
> Merge: 44d10e4b2f2c 0d7389718c32
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Tue Feb 2 07:12:06 2021 -0700
>
>      Merge branch 'for-5.12/drivers' into for-next
>
>      * for-5.12/drivers: (22 commits)
>        nvme-tcp: use cancel tagset helper for tear down
>        nvme-rdma: use cancel tagset helper for tear down
>        nvme-tcp: add clean action for failed reconnection
>        nvme-rdma: add clean action for failed reconnection
>        nvme-core: add cancel tagset helpers
>        nvme-core: get rid of the extra space
>        nvme: add tracing of zns commands
>        nvme: parse format nvm command details when tracing
>        nvme: update enumerations for status codes
>        nvmet: add lba to sect conversion helpers
>        nvmet: remove extra variable in identify ns
>        nvmet: remove extra variable in id-desclist
>        nvmet: remove extra variable in smart log nsid
>        nvme: refactor ns->ctrl by request
>        nvme-tcp: pass multipage bvec to request iov_iter
>        nvme-tcp: get rid of unused helper function
>        nvme-tcp: fix wrong setting of request iov_iter
>        nvme: support command retry delay for admin command
>        nvme: constify static attribute_group structs
>        nvmet-fc: use RCU proctection for assoc_list
>
>
> On 2/6/21 11:08 AM, Yi Zhang wrote:
>> blktests nvme-tcp/012
Thanks for reporting; you can further bisect this using the following
NVMe tree, if the issue is from the NVMe tree :- http://git.infradead.org/nvme.git
Branch :- 5.12

Looking at the commit log it seems like the nvme-tcp commits might be
the issue, given that you got the problem in nvme_tcp_init_iter();
just eliminating candidates by looking at the code,
commit 9f99a29e5307 could lead to this behavior.

*I may be completely wrong as I'm not an expert in the nvme-tcp host.*
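For context on the suspected area: as its subject says, the commit "nvme-tcp: pass multipage bvec to request iov_iter" from the merge above switches the iov_iter setup from single-page segments to multipage bvecs. A hedged sketch of the difference, using the real block-layer iteration macros (the two counting helpers are illustrative, not driver code):

#include <linux/bio.h>

/* One entry per page: the pre-change view of a bio's payload. */
static int count_single_page_segments(struct bio *bio)
{
	struct bio_vec bv;
	struct bvec_iter iter;
	int nr = 0;

	bio_for_each_segment(bv, bio, iter)
		nr++;
	return nr;
}

/*
 * One entry per physically contiguous range: the multipage view fed
 * to iov_iter_bvec() after the change.  Fewer, larger entries for the
 * same payload, and different assumptions about the bvec table.
 */
static int count_multipage_bvecs(struct bio *bio)
{
	struct bio_vec bv;
	struct bvec_iter iter;
	int nr = 0;

	bio_for_each_bvec(bv, bio, iter)
		nr++;
	return nr;
}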



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
  2021-02-07  5:25     ` Chaitanya Kulkarni
@ 2021-02-07  5:48     ` Chaitanya Kulkarni
  2021-02-07  5:58     ` Chaitanya Kulkarni
  2 siblings, 0 replies; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-07  5:48 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme@lists.infradead.org, linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Sagi Grimberg

On 2/6/21 20:51, Yi Zhang wrote:
> The issue was introduced after merging the NVMe updates
>
> commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
> Merge: 44d10e4b2f2c 0d7389718c32
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Tue Feb 2 07:12:06 2021 -0700
>
>      Merge branch 'for-5.12/drivers' into for-next
>
>      * for-5.12/drivers: (22 commits)
>        nvme-tcp: use cancel tagset helper for tear down
>        nvme-rdma: use cancel tagset helper for tear down
>        nvme-tcp: add clean action for failed reconnection
>        nvme-rdma: add clean action for failed reconnection
>        nvme-core: add cancel tagset helpers
>        nvme-core: get rid of the extra space
>        nvme: add tracing of zns commands
>        nvme: parse format nvm command details when tracing
>        nvme: update enumerations for status codes
>        nvmet: add lba to sect conversion helpers
>        nvmet: remove extra variable in identify ns
>        nvmet: remove extra variable in id-desclist
>        nvmet: remove extra variable in smart log nsid
>        nvme: refactor ns->ctrl by request
>        nvme-tcp: pass multipage bvec to request iov_iter
>        nvme-tcp: get rid of unused helper function
>        nvme-tcp: fix wrong setting of request iov_iter
>        nvme: support command retry delay for admin command
>        nvme: constify static attribute_group structs
>        nvmet-fc: use RCU proctection for assoc_list
>
>
> On 2/6/21 11:08 AM, Yi Zhang wrote:
>> blktests nvme-tcp/012
>
Running the test a few times I got this once, with no sign of an Oops though :-


# nvme_trtype=tcp ./check nvme/012
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [failed]
    runtime  37.624s  ...  42.272s
    something found in dmesg:
    [   69.198819] run blktests nvme/012 at 2021-02-06 21:32:23
    [   69.330277] loop: module loaded
    [   69.333383] loop0: detected capacity change from 2097152 to 0
    [   69.351091] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
    [   69.372439] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
    [   69.396581] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:e4cfc949-8f19-4db2-a232-ab360b79204a.
    [   69.399482] nvme nvme1: creating 64 I/O queues.
    [   69.414721] nvme nvme1: mapped 64/0/0 default/read/poll queues.
    [   69.448193] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
    [   72.381282] XFS (nvme1n1): Mounting V5 Filesystem
    ...
    (See '/root/blktests/results/nodev/nvme/012.dmesg' for the entire message)
blktests (master) # dmesg  -c
[   38.347661] nvme nvme0: 63/0/0 default/read/poll queues
[   69.198819] run blktests nvme/012 at 2021-02-06 21:32:23
[   69.330277] loop: module loaded
[   69.333383] loop0: detected capacity change from 2097152 to 0
[   69.351091] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   69.372439] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[   69.396581] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:e4cfc949-8f19-4db2-a232-ab360b79204a.
[   69.399482] nvme nvme1: creating 64 I/O queues.
[   69.414721] nvme nvme1: mapped 64/0/0 default/read/poll queues.
[   69.448193] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
[   72.381282] XFS (nvme1n1): Mounting V5 Filesystem
[   72.394603] XFS (nvme1n1): Ending clean mount
[   72.395463] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)

[   73.402838] ======================================================
[   73.403491] WARNING: possible circular locking dependency detected
[   73.404125] 5.11.0-rc6blk+ #166 Tainted: G           OE   
[   73.404690] ------------------------------------------------------
[   73.405332] fio/3671 is trying to acquire lock:
[   73.405803] ffff88881d110cb0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendpage+0x23/0x50
[   73.406627]
but task is already holding lock:
[   73.407246] ffff888128cabcf0 (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xbf/0x250 [xfs]
[   73.408233]
which lock already depends on the new lock.

[   73.409096]
the existing dependency chain (in reverse order) is:
[   73.409885]
-> #3 (&xfs_dir_ilock_class/5){+.+.}-{3:3}:
[   73.410594]        lock_acquire+0xd2/0x390
[   73.411045]        down_write_nested+0x47/0x110
[   73.411530]        xfs_ilock+0xbf/0x250 [xfs]
[   73.412068]        xfs_create+0x1d9/0x6b0 [xfs]
[   73.412626]        xfs_generic_create+0x205/0x2c0 [xfs]
[   73.413255]        lookup_open+0x4f6/0x630
[   73.413710]        path_openat+0x298/0xa80
[   73.414158]        do_filp_open+0x93/0x100
[   73.414606]        do_sys_openat2+0x24b/0x310
[   73.415090]        do_sys_open+0x4b/0x80
[   73.415569]        do_syscall_64+0x33/0x40
[   73.416075]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.416747]
-> #2 (sb_internal){.+.+}-{0:0}:
[   73.417436]        lock_acquire+0xd2/0x390
[   73.417938]        xfs_trans_alloc+0x19e/0x2e0 [xfs]
[   73.418607]        xfs_vn_update_time+0xc8/0x250 [xfs]
[   73.419295]        touch_atime+0x16b/0x200
[   73.419814]        xfs_file_mmap+0xa7/0xb0 [xfs]
[   73.420447]        mmap_region+0x3f6/0x690
[   73.420955]        do_mmap+0x379/0x580
[   73.421415]        vm_mmap_pgoff+0xdf/0x170
[   73.421929]        ksys_mmap_pgoff+0x1dd/0x240
[   73.422477]        do_syscall_64+0x33/0x40
[   73.422980]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.423672]
-> #1 (&mm->mmap_lock#2){++++}-{3:3}:
[   73.424426]        lock_acquire+0xd2/0x390
[   73.424932]        __might_fault+0x5e/0x80
[   73.425433]        _copy_from_user+0x1e/0xa0
[   73.425966]        do_ip_setsockopt.isra.15+0xbba/0x12f0
[   73.426616]        ip_setsockopt+0x34/0x90
[   73.427120]        __sys_setsockopt+0x8f/0x110
[   73.427681]        __x64_sys_setsockopt+0x20/0x30
[   73.428252]        do_syscall_64+0x33/0x40
[   73.428751]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.429422]
-> #0 (sk_lock-AF_INET){+.+.}-{0:0}:
[   73.430154]        validate_chain+0x8fa/0xec0
[   73.430681]        __lock_acquire+0x563/0x940
[   73.431212]        lock_acquire+0xd2/0x390
[   73.431719]        lock_sock_nested+0x72/0xa0
[   73.432252]        tcp_sendpage+0x23/0x50
[   73.432740]        inet_sendpage+0x4f/0x80
[   73.433240]        kernel_sendpage+0x57/0xc0
[   73.433760]        nvme_tcp_try_send+0x137/0x7d0 [nvme_tcp]
[   73.434435]        nvme_tcp_queue_rq+0x317/0x330 [nvme_tcp]
[   73.435104]        __blk_mq_try_issue_directly+0x109/0x1b0
[   73.435768]        blk_mq_request_issue_directly+0x4b/0x80
[   73.436433]        blk_mq_try_issue_list_directly+0x62/0xf0
[   73.437104]        blk_mq_sched_insert_requests+0x192/0x2b0
[   73.437774]        blk_mq_flush_plug_list+0x13c/0x260
[   73.438389]        blk_flush_plug_list+0xd7/0x100
[   73.438960]        blk_finish_plug+0x21/0x30
[   73.439485]        _xfs_buf_ioapply+0x2cc/0x3c0 [xfs]
[   73.440171]        __xfs_buf_submit+0x85/0x220 [xfs]
[   73.440833]        xfs_buf_read_map+0x105/0x2a0 [xfs]
[   73.441511]        xfs_trans_read_buf_map+0x2b2/0x610 [xfs]
[   73.442254]        xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.442874]        xfs_ialloc_read_agi+0x48/0x170 [xfs]
[   73.443568]        xfs_dialloc_select_ag+0x94/0x2a0 [xfs]
[   73.444277]        xfs_dir_ialloc+0x72/0x630 [xfs]
[   73.444927]        xfs_create+0x241/0x6b0 [xfs]
[   73.445550]        xfs_generic_create+0x205/0x2c0 [xfs]
[   73.446255]        lookup_open+0x4f6/0x630
[   73.446759]        path_openat+0x298/0xa80
[   73.447268]        do_filp_open+0x93/0x100
[   73.447770]        do_sys_openat2+0x24b/0x310
[   73.448301]        do_sys_open+0x4b/0x80
[   73.448779]        do_syscall_64+0x33/0x40
[   73.449275]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.449944]
other info that might help us debug this:

[   73.450899] Chain exists of:
  sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5

[   73.452222]  Possible unsafe locking scenario:

[   73.452931]        CPU0                    CPU1
[   73.453479]        ----                    ----
[   73.454024]   lock(&xfs_dir_ilock_class/5);
[   73.454528]                                lock(sb_internal);
[   73.455217]                                lock(&xfs_dir_ilock_class/5);
[   73.456027]   lock(sk_lock-AF_INET);
[   73.456463]
 *** DEADLOCK ***

[   73.457175] 6 locks held by fio/3671:
[   73.457618]  #0: ffff8881185a9488 (sb_writers#9){.+.+}-{0:0}, at: path_openat+0xa61/0xa80
[   73.458604]  #1: ffff888128cabfe0 (&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at: path_openat+0x287/0xa80
[   73.459847]  #2: ffff8881185a96a8 (sb_internal){.+.+}-{0:0}, at: xfs_create+0x1b4/0x6b0 [xfs]
[   73.460926]  #3: ffff888128cabcf0 (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xbf/0x250 [xfs]
[   73.462100]  #4: ffff88881bd54c58 (hctx->srcu){....}-{0:0}, at: hctx_lock+0x62/0xe0
[   73.463029]  #5: ffff888142ab29d0 (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0x2fc/0x330 [nvme_tcp]
[   73.464288]
stack backtrace:
[   73.464824] CPU: 16 PID: 3671 Comm: fio Tainted: G           OE     5.11.0-rc6blk+ #166
[   73.465785] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[   73.467176] Call Trace:
[   73.467483]  dump_stack+0x8d/0xb5
[   73.467910]  check_noncircular+0x119/0x130
[   73.468413]  ? save_trace+0x3d/0x2c0
[   73.468854]  validate_chain+0x8fa/0xec0
[   73.469329]  __lock_acquire+0x563/0x940
[   73.469800]  lock_acquire+0xd2/0x390
[   73.470240]  ? tcp_sendpage+0x23/0x50
[   73.470689]  lock_sock_nested+0x72/0xa0
[   73.471163]  ? tcp_sendpage+0x23/0x50
[   73.471624]  tcp_sendpage+0x23/0x50
[   73.472059]  inet_sendpage+0x4f/0x80
[   73.472504]  kernel_sendpage+0x57/0xc0
[   73.472965]  ? mark_held_locks+0x4f/0x70
[   73.473445]  nvme_tcp_try_send+0x137/0x7d0 [nvme_tcp]
[   73.474062]  nvme_tcp_queue_rq+0x317/0x330 [nvme_tcp]
[   73.474697]  __blk_mq_try_issue_directly+0x109/0x1b0
[   73.475324]  blk_mq_request_issue_directly+0x4b/0x80
[   73.475953]  blk_mq_try_issue_list_directly+0x62/0xf0
[   73.476587]  blk_mq_sched_insert_requests+0x192/0x2b0
[   73.477225]  blk_mq_flush_plug_list+0x13c/0x260
[   73.477802]  blk_flush_plug_list+0xd7/0x100
[   73.478338]  blk_finish_plug+0x21/0x30
[   73.478818]  _xfs_buf_ioapply+0x2cc/0x3c0 [xfs]
[   73.479457]  __xfs_buf_submit+0x85/0x220 [xfs]
[   73.480091]  xfs_buf_read_map+0x105/0x2a0 [xfs]
[   73.480728]  ? xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.481331]  xfs_trans_read_buf_map+0x2b2/0x610 [xfs]
[   73.482036]  ? xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.482635]  xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.483221]  xfs_ialloc_read_agi+0x48/0x170 [xfs]
[   73.483866]  xfs_dialloc_select_ag+0x94/0x2a0 [xfs]
[   73.484513]  xfs_dir_ialloc+0x72/0x630 [xfs]
[   73.485092]  ? xfs_ilock+0xbf/0x250 [xfs]
[   73.485648]  xfs_create+0x241/0x6b0 [xfs]
[   73.486198]  xfs_generic_create+0x205/0x2c0 [xfs]
[   73.486857]  lookup_open+0x4f6/0x630
[   73.487317]  path_openat+0x298/0xa80
[   73.487786]  ? __lock_acquire+0x581/0x940
[   73.488302]  do_filp_open+0x93/0x100
[   73.488769]  ? do_raw_spin_unlock+0x49/0xc0
[   73.489307]  ? _raw_spin_unlock+0x1f/0x30
[   73.489825]  do_sys_openat2+0x24b/0x310
[   73.490321]  do_sys_open+0x4b/0x80
[   73.490764]  do_syscall_64+0x33/0x40
[   73.491230]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.491890] RIP: 0033:0x7f7d6f2eaead
[   73.492353] Code: c5 20 00 00 75 10 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 7e f4 ff ff 48 89 04 24 b8 02 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 c7 f4 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[   73.494713] RSP: 002b:00007ffc072e8910 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[   73.495671] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7d6f2eaead
[   73.496526] RDX: 00000000000001a4 RSI: 0000000000000041 RDI: 00007f7d6d666690
[   73.497381] RBP: 0000000000000000 R08: 0000000000000041 R09: 0000000000000041
[   73.498232] R10: 00007ffc072e85a0 R11: 0000000000000293 R12: 0000000000000000
[   73.499121] R13: 000000003b600000 R14: 00007f7d47136000 R15: 00007f7d6d666510
[  111.135896] XFS (nvme1n1): Unmounting Filesystem
[  111.308034] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
  2021-02-07  5:25     ` Chaitanya Kulkarni
  2021-02-07  5:48     ` Chaitanya Kulkarni
@ 2021-02-07  5:58     ` Chaitanya Kulkarni
  2 siblings, 0 replies; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-07  5:58 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme@lists.infradead.org, linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Sagi Grimberg

On 2/6/21 20:51, Yi Zhang wrote:
> The issue was introduced after merging the NVMe updates
>
> commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
> Merge: 44d10e4b2f2c 0d7389718c32
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Tue Feb 2 07:12:06 2021 -0700
>
>      Merge branch 'for-5.12/drivers' into for-next
>
>      * for-5.12/drivers: (22 commits)
>        nvme-tcp: use cancel tagset helper for tear down
>        nvme-rdma: use cancel tagset helper for tear down
>        nvme-tcp: add clean action for failed reconnection
>        nvme-rdma: add clean action for failed reconnection
>        nvme-core: add cancel tagset helpers
>        nvme-core: get rid of the extra space
>        nvme: add tracing of zns commands
>        nvme: parse format nvm command details when tracing
>        nvme: update enumerations for status codes
>        nvmet: add lba to sect conversion helpers
>        nvmet: remove extra variable in identify ns
>        nvmet: remove extra variable in id-desclist
>        nvmet: remove extra variable in smart log nsid
>        nvme: refactor ns->ctrl by request
>        nvme-tcp: pass multipage bvec to request iov_iter
>        nvme-tcp: get rid of unused helper function
>        nvme-tcp: fix wrong setting of request iov_iter
>        nvme: support command retry delay for admin command
>        nvme: constify static attribute_group structs
>        nvmet-fc: use RCU proctection for assoc_list
>
>
> On 2/6/21 11:08 AM, Yi Zhang wrote:
>> blktests nvme-tcp/012
Can you try the following patch and see if you get this message in your
Oops, so that we can at least eliminate one case?

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 619b0d8f6e38..13d44d155478 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2271,8 +2271,11 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
        req->data_len = blk_rq_nr_phys_segments(rq) ?
                                blk_rq_payload_bytes(rq) : 0;
        req->curr_bio = rq->bio;
-       if (req->curr_bio)
+       if (req->curr_bio) {
+               if (rq->bio != rq->biotail)
+               if (rq->bio != rq->biotail)
+                       printk(KERN_INFO"%s %d req->bio != req->biotail\n", __func__, __LINE__);
                nvme_tcp_init_iter(req, rq_data_dir(rq));
+       }
 
        if (rq_data_dir(rq) == WRITE &&
            req->data_len <= nvme_tcp_inline_data_size(queue))


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-06  3:08 ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Yi Zhang
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
@ 2021-02-07  7:14   ` Chaitanya Kulkarni
  2021-02-08  9:53     ` Sagi Grimberg
  2021-02-08  9:46   ` Sagi Grimberg
  2 siblings, 1 reply; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-07  7:14 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Yi Zhang, linux-nvme@lists.infradead.org, linux-block,
	axboe@kernel.dk, Rachel Sibley, CKI Project

Sagi,

On 2/5/21 19:19, Yi Zhang wrote:
> Hello
>
> We found this kernel NULL pointer issue with the latest linux-block/for-next and it's 100% reproducible; let me know if you need more info/testing, thanks
>
> Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>
> Reproducer: blktests nvme-tcp/012
>
>
> [  124.458121] run blktests nvme/012 at 2021-02-05 21:53:34 
> [  125.525568] BUG: kernel NULL pointer dereference, address: 0000000000000008 
> [  125.532524] #PF: supervisor read access in kernel mode 
> [  125.537665] #PF: error_code(0x0000) - not-present page 
> [  125.542803] PGD 0 P4D 0  
> [  125.545343] Oops: 0000 [#1] SMP NOPTI 
> [  125.549009] CPU: 15 PID: 12069 Comm: kworker/15:2H Tainted: G S        I       5.11.0-rc6+ #1 
> [  125.557528] Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020 
> [  125.565093] Workqueue: kblockd blk_mq_run_work_fn 
> [  125.569797] RIP: 0010:nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] 
> [  125.575544] Code: 8b 75 68 44 8b 45 28 44 8b 7d 30 49 89 d4 48 c1 e2 04 4c 01 f2 45 89 fb 44 89 c7 85 ff 74 4d 44 89 e0 44 8b 55 10 48 c1 e0 04 <41> 8b 5c 06 08 45 0f b6 ca 89 d8 44 29 d8 39 f8 0f 47 c7 41 83 e9 
> [  125.594290] RSP: 0018:ffffbd084447bd18 EFLAGS: 00010246 
> [  125.599515] RAX: 0000000000000000 RBX: ffffa0bba9f3ce80 RCX: 0000000000000000 
> [  125.606648] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000002000000 
> [  125.613781] RBP: ffffa0ba8ac6fec0 R08: 0000000002000000 R09: 0000000000000000 
> [  125.620914] R10: 0000000002800809 R11: 0000000000000000 R12: 0000000000000000 
> [  125.628045] R13: ffffa0bba9f3cf90 R14: 0000000000000000 R15: 0000000000000000 
> [  125.635178] FS:  0000000000000000(0000) GS:ffffa0c9ff9c0000(0000) knlGS:0000000000000000 
> [  125.643264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
> [  125.649009] CR2: 0000000000000008 CR3: 00000001c9c6c005 CR4: 00000000007706e0 
> [  125.656142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
> [  125.663274] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
> [  125.670407] PKRU: 55555554 
> [  125.673119] Call Trace: 
> [  125.675575]  nvme_tcp_queue_rq+0xef/0x330 [nvme_tcp] 
> [  125.680537]  blk_mq_dispatch_rq_list+0x11c/0x7c0 
> [  125.685157]  ? blk_mq_flush_busy_ctxs+0xf6/0x110 
> [  125.689775]  __blk_mq_sched_dispatch_requests+0x12b/0x170 
> [  125.695175]  blk_mq_sched_dispatch_requests+0x30/0x60 
> [  125.700227]  __blk_mq_run_hw_queue+0x2b/0x60 
> [  125.704500]  process_one_work+0x1cb/0x360 
> [  125.708513]  ? process_one_work+0x360/0x360 
> [  125.712699]  worker_thread+0x30/0x370 
> [  125.716365]  ? process_one_work+0x360/0x360 
> [  125.720550]  kthread+0x116/0x130 
> [  125.723782]  ? kthread_park+0x80/0x80 
> [  125.727448]  ret_from_fork+0x1f/0x30 
NVMe TCP does support merging for non-admin queues
(see nvme_tcp_alloc_tagset()).

Based on what is done for bvecs in other places in the
kernel, especially when merging is enabled, the bio split case seems to be
missing from the TCP driver when building the bvec. What I mean is the
following completely untested patch :-

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 619b0d8f6e38..dabb2633b28c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -222,7 +222,9 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
         unsigned int dir)
 {
     struct request *rq = blk_mq_rq_from_pdu(req);
-    struct bio_vec *vec;
+    struct req_iterator rq_iter;
+    struct bio_vec *vec, *tvec;
+    struct bio_vec tmp;
     unsigned int size;
     int nr_bvec;
     size_t offset;
@@ -233,17 +235,29 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
         size = blk_rq_payload_bytes(rq);
         offset = 0;
     } else {
-        struct bio *bio = req->curr_bio;
-        struct bvec_iter bi;
-        struct bio_vec bv;
-
-        vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
-        nr_bvec = 0;
-        bio_for_each_bvec(bv, bio, bi) {
+        rq_for_each_bvec(tmp, rq, rq_iter)
             nr_bvec++;
+
+        if (rq->bio != rq->biotail) {
+            vec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
+                    GFP_NOIO);
+            if (!vec)
+                return;
+
+            tvec = vec;
+            rq_for_each_bvec(tmp, rq, rq_iter) {
+                *vec = tmp;
+                vec++;
+            }
+            vec = tvec;
+            offset = 0;
+        } else {
+            struct bio *bio = req->curr_bio;
+
+            vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+            size = bio->bi_iter.bi_size;
+            offset = bio->bi_iter.bi_bvec_done;
         }
-        size = bio->bi_iter.bi_size;
-        offset = bio->bi_iter.bi_bvec_done;
     }
 
     iov_iter_bvec(&req->iter, dir, vec, nr_bvec, size);
@@ -2271,8 +2285,11 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
     req->data_len = blk_rq_nr_phys_segments(rq) ?
                 blk_rq_payload_bytes(rq) : 0;
     req->curr_bio = rq->bio;
-    if (req->curr_bio)
+    if (req->curr_bio) {
+        if (rq->bio != rq->biotail)
+            printk(KERN_INFO"%s %d req->bio != req->biotail\n", __func__, __LINE__);
         nvme_tcp_init_iter(req, rq_data_dir(rq));
+    }
 
     if (rq_data_dir(rq) == WRITE &&
         req->data_len <= nvme_tcp_inline_data_size(queue))
-- 
2.22.1


Feel free to ignore this as I might be *completely wrong* here, since I don't
have a reproducer; everything is working just fine on my machine.








^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012
  2021-02-07  5:25     ` Chaitanya Kulkarni
@ 2021-02-08  6:48       ` Yi Zhang
  0 siblings, 0 replies; 27+ messages in thread
From: Yi Zhang @ 2021-02-08  6:48 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme@lists.infradead.org, linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, Sagi Grimberg, CKI Project



On 2/7/21 1:25 PM, Chaitanya Kulkarni wrote:
> Yi,
>
> On 2/6/21 20:51, Yi Zhang wrote:
>> The issue was introduced after merging the NVMe updates
>>
>> commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
>> Merge: 44d10e4b2f2c 0d7389718c32
>> Author: Jens Axboe <axboe@kernel.dk>
>> Date:   Tue Feb 2 07:12:06 2021 -0700
>>
>>       Merge branch 'for-5.12/drivers' into for-next
>>
>>       * for-5.12/drivers: (22 commits)
>>         nvme-tcp: use cancel tagset helper for tear down
>>         nvme-rdma: use cancel tagset helper for tear down
>>         nvme-tcp: add clean action for failed reconnection
>>         nvme-rdma: add clean action for failed reconnection
>>         nvme-core: add cancel tagset helpers
>>         nvme-core: get rid of the extra space
>>         nvme: add tracing of zns commands
>>         nvme: parse format nvm command details when tracing
>>         nvme: update enumerations for status codes
>>         nvmet: add lba to sect conversion helpers
>>         nvmet: remove extra variable in identify ns
>>         nvmet: remove extra variable in id-desclist
>>         nvmet: remove extra variable in smart log nsid
>>         nvme: refactor ns->ctrl by request
>>         nvme-tcp: pass multipage bvec to request iov_iter
>>         nvme-tcp: get rid of unused helper function
>>         nvme-tcp: fix wrong setting of request iov_iter
>>         nvme: support command retry delay for admin command
>>         nvme: constify static attribute_group structs
>>         nvmet-fc: use RCU proctection for assoc_list
>>
>>
>> On 2/6/21 11:08 AM, Yi Zhang wrote:
>>> blktests nvme-tcp/012
> Thanks for reporting; you can further bisect this using the following
> NVMe tree, if the issue is from the NVMe tree :- http://git.infradead.org/nvme.git
> Branch :- 5.12
I've tried that branch, but encountered another NULL pointer.

[  224.290720] run blktests nvme/012 at 2021-02-08 01:46:25
[  224.515479] BUG: kernel NULL pointer dereference, address: 0000000000000368
[  224.522442] #PF: supervisor read access in kernel mode
[  224.527590] #PF: error_code(0x0000) - not-present page
[  224.532737] PGD 0 P4D 0
[  224.535276] Oops: 0000 [#1] SMP PTI
[  224.538770] CPU: 0 PID: 2403 Comm: multipath Tainted: G S I 5.11.0-rc5+ #1
[  224.546776] Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.10.0 11/12/2020
[  224.554340] RIP: 0010:bio_associate_blkg_from_css+0xbd/0x2c0
[  224.560009] Code: 03 75 79 65 48 ff 00 e8 b1 94 cc ff e8 ac 94 cc ff 49 89 5c 24 48 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 44 24 08 <48> 8b 808
[  224.578761] RSP: 0018:ffffad748ec5bd78 EFLAGS: 00010246
[  224.583988] RAX: 0000000000000000 RBX: ffffa0e96452b3c0 RCX: 0000000000000000
[  224.591120] RDX: ffffa11077088000 RSI: ffffffffa06edb20 RDI: ffffa0e96452b3c0
[  224.598253] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
[  224.605385] R10: 0000000000000022 R11: 0000000000000c10 R12: ffffa0e96452b3c0
[  224.612516] R13: ffffffffa06edb20 R14: 0000000000001000 R15: 0000000000000000
[  224.619649] FS:  00007fb6647feb00(0000) GS:ffffa107f0000000(0000) knlGS:0000000000000000
[  224.627734] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  224.633479] CR2: 0000000000000368 CR3: 0000002884b54002 CR4: 00000000007706f0
[  224.640611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  224.647751] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  224.654884] PKRU: 55555554
[  224.657595] Call Trace:
[  224.660043]  bio_associate_blkg+0x20/0x70
[  224.664055]  nvme_submit_user_cmd+0xd7/0x340 [nvme_core]
[  224.669375]  nvme_user_cmd.isra.82+0x120/0x190 [nvme_core]
[  224.674859]  nvme_dev_ioctl+0x13b/0x150 [nvme_core]
[  224.679738]  __x64_sys_ioctl+0x84/0xc0
[  224.683500]  do_syscall_64+0x33/0x40
[  224.687088]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  224.692149] RIP: 0033:0x7fb66338161b
[  224.695726] Code: 0f 1e fa 48 8b 05 6d b8 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 018
[  224.714469] RSP: 002b:00007ffeae59aba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  224.722036] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb66338161b
[  224.729168] RDX: 00007ffeae59abb0 RSI: 00000000c0484e41 RDI: 0000000000000003
[  224.736300] RBP: 00007ffeae59ac10 R08: 0000559af0bb3d20 R09: 00007fb66340e580
[  224.743431] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffeae59bcb0
[  224.750563] R13: 0000559af0bb3d20 R14: 0000000000000003 R15: 0000000000000000
[  224.757697] Modules linked in: loop nvme_tcp nvme_fabrics nvme_core nvmet_tcp nvmet rfkill sunrpc dm_multipath intel_rapl_msr intel_rapl_common isst_if_cod
[  224.833264] CR2: 0000000000000368
[  224.836604] ---[ end trace fa77e7cc26f8cfc1 ]---
[  224.883915] RIP: 0010:bio_associate_blkg_from_css+0xbd/0x2c0
[  224.889584] Code: 03 75 79 65 48 ff 00 e8 b1 94 cc ff e8 ac 94 cc ff 49 89 5c 24 48 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 44 24 08 <48> 8b 808
[  224.908329] RSP: 0018:ffffad748ec5bd78 EFLAGS: 00010246
[  224.913554] RAX: 0000000000000000 RBX: ffffa0e96452b3c0 RCX: 0000000000000000
[  224.920686] RDX: ffffa11077088000 RSI: ffffffffa06edb20 RDI: ffffa0e96452b3c0
[  224.927818] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
[  224.934948] R10: 0000000000000022 R11: 0000000000000c10 R12: ffffa0e96452b3c0
[  224.942082] R13: ffffffffa06edb20 R14: 0000000000001000 R15: 0000000000000000
[  224.949213] FS:  00007fb6647feb00(0000) GS:ffffa107f0000000(0000) knlGS:0000000000000000
[  224.957301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  224.963044] CR2: 0000000000000368 CR3: 0000002884b54002 CR4: 00000000007706f0
[  224.970177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  224.977309] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  224.984441] PKRU: 55555554
[  224.987155] Kernel panic - not syncing: Fatal exception
[  225.425279] Kernel Offset: 0x1d400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  225.478268] ---[ end Kernel panic - not syncing: Fatal exception ]---

> Looking at the commit log it seems like the nvme-tcp commits might be
> the issue, given that you got the problem in nvme_tcp_init_iter();
> just eliminating candidates by looking at the code,
> commit 9f99a29e5307 could lead to this behavior.
>
> *I may be completely wrong as I'm not an expert in the nvme-tcp host.*
>
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-06  3:08 ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Yi Zhang
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
  2021-02-07  7:14   ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Chaitanya Kulkarni
@ 2021-02-08  9:46   ` Sagi Grimberg
       [not found]     ` <5848858e-239d-acb2-fa24-c371a3360557@redhat.com>
  2 siblings, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-08  9:46 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme, linux-block; +Cc: axboe, Rachel Sibley, CKI Project


> Hello
> 
> We found this kernel NULL pointer issue with the latest linux-block/for-next and it's 100% reproducible; let me know if you need more info/testing, thanks
> 
> Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
> 
> Reproducer: blktests nvme-tcp/012

Thanks for reporting Ming, I've tried to reproduce this on my VM
but did not succeed. Given that you have it 100% reproducible,
can you try to revert commit:

0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter

Also, would it be possible to share your config?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-07  7:14   ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Chaitanya Kulkarni
@ 2021-02-08  9:53     ` Sagi Grimberg
  0 siblings, 0 replies; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-08  9:53 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: axboe@kernel.dk, Rachel Sibley, Yi Zhang,
	linux-nvme@lists.infradead.org, linux-block, CKI Project



On 2/6/21 11:14 PM, Chaitanya Kulkarni wrote:
> Sagi,
> 
> On 2/5/21 19:19, Yi Zhang wrote:
>> Hello
>>
>> We found this kernel NULL pointer issue with the latest linux-block/for-next and it's 100% reproducible; let me know if you need more info/testing, thanks
>>
>> Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>
>> Reproducer: blktests nvme-tcp/012
>>
>>
>> [  124.458121] run blktests nvme/012 at 2021-02-05 21:53:34
>> [  125.525568] BUG: kernel NULL pointer dereference, address: 0000000000000008
>> [  125.532524] #PF: supervisor read access in kernel mode
>> [  125.537665] #PF: error_code(0x0000) - not-present page
>> [  125.542803] PGD 0 P4D 0
>> [  125.545343] Oops: 0000 [#1] SMP NOPTI
>> [  125.549009] CPU: 15 PID: 12069 Comm: kworker/15:2H Tainted: G S        I       5.11.0-rc6+ #1
>> [  125.557528] Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020
>> [  125.565093] Workqueue: kblockd blk_mq_run_work_fn
>> [  125.569797] RIP: 0010:nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
>> [  125.575544] Code: 8b 75 68 44 8b 45 28 44 8b 7d 30 49 89 d4 48 c1 e2 04 4c 01 f2 45 89 fb 44 89 c7 85 ff 74 4d 44 89 e0 44 8b 55 10 48 c1 e0 04 <41> 8b 5c 06 08 45 0f b6 ca 89 d8 44 29 d8 39 f8 0f 47 c7 41 83 e9
>> [  125.594290] RSP: 0018:ffffbd084447bd18 EFLAGS: 00010246
>> [  125.599515] RAX: 0000000000000000 RBX: ffffa0bba9f3ce80 RCX: 0000000000000000
>> [  125.606648] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000002000000
>> [  125.613781] RBP: ffffa0ba8ac6fec0 R08: 0000000002000000 R09: 0000000000000000
>> [  125.620914] R10: 0000000002800809 R11: 0000000000000000 R12: 0000000000000000
>> [  125.628045] R13: ffffa0bba9f3cf90 R14: 0000000000000000 R15: 0000000000000000
>> [  125.635178] FS:  0000000000000000(0000) GS:ffffa0c9ff9c0000(0000) knlGS:0000000000000000
>> [  125.643264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  125.649009] CR2: 0000000000000008 CR3: 00000001c9c6c005 CR4: 00000000007706e0
>> [  125.656142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  125.663274] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  125.670407] PKRU: 55555554
>> [  125.673119] Call Trace:
>> [  125.675575]  nvme_tcp_queue_rq+0xef/0x330 [nvme_tcp]
>> [  125.680537]  blk_mq_dispatch_rq_list+0x11c/0x7c0
>> [  125.685157]  ? blk_mq_flush_busy_ctxs+0xf6/0x110
>> [  125.689775]  __blk_mq_sched_dispatch_requests+0x12b/0x170
>> [  125.695175]  blk_mq_sched_dispatch_requests+0x30/0x60
>> [  125.700227]  __blk_mq_run_hw_queue+0x2b/0x60
>> [  125.704500]  process_one_work+0x1cb/0x360
>> [  125.708513]  ? process_one_work+0x360/0x360
>> [  125.712699]  worker_thread+0x30/0x370
>> [  125.716365]  ? process_one_work+0x360/0x360
>> [  125.720550]  kthread+0x116/0x130
>> [  125.723782]  ? kthread_park+0x80/0x80
>> [  125.727448]  ret_from_fork+0x1f/0x30
> NVMe TCP does support merging for non-admin queues
> (see nvme_tcp_alloc_tagset()).
> 
> Based on what is done for bvecs in other places in the
> kernel, especially when merging is enabled, the bio split case seems to be
> missing from the TCP driver when building the bvec. What I mean is the
> following completely untested patch :-
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 619b0d8f6e38..dabb2633b28c 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -222,7 +222,9 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>           unsigned int dir)
>   {
>       struct request *rq = blk_mq_rq_from_pdu(req);
> -    struct bio_vec *vec;
> +    struct req_iterator rq_iter;
> +    struct bio_vec *vec, *tvec;
> +    struct bio_vec tmp;
>       unsigned int size;
>       int nr_bvec;
>       size_t offset;
> @@ -233,17 +235,29 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>           size = blk_rq_payload_bytes(rq);
>           offset = 0;
>       } else {
> -        struct bio *bio = req->curr_bio;
> -        struct bvec_iter bi;
> -        struct bio_vec bv;
> -
> -        vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> -        nr_bvec = 0;
> -        bio_for_each_bvec(bv, bio, bi) {
> +        rq_for_each_bvec(tmp, rq, rq_iter)
>               nr_bvec++;
> +
> +        if (rq->bio != rq->biotail) {
> +            vec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
> +                    GFP_NOIO);
> +            if (!vec)
> +                return;
> +
> +            tvec = vec;
> +            rq_for_each_bvec(tmp, rq, rq_iter) {
> +                *vec = tmp;
> +                vec++;
> +            }
> +            vec = tvec;
> +            offset = 0;
> +        } else {
> +            struct bio *bio = req->curr_bio;
> +
> +            vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> +            size = bio->bi_iter.bi_size;
> +            offset = bio->bi_iter.bi_bvec_done;
>           }
> -        size = bio->bi_iter.bi_size;
> -        offset = bio->bi_iter.bi_bvec_done;
>       }
>   
>       iov_iter_bvec(&req->iter, dir, vec, nr_bvec, size);
> @@ -2271,8 +2285,11 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
>       req->data_len = blk_rq_nr_phys_segments(rq) ?
>                   blk_rq_payload_bytes(rq) : 0;
>       req->curr_bio = rq->bio;
> -    if (req->curr_bio)
> +    if (req->curr_bio) {
> +        if (rq->bio != rq->biotail)
> +            printk(KERN_INFO"%s %d req->bio != req->biotail\n", __func__, __LINE__);
>           nvme_tcp_init_iter(req, rq_data_dir(rq));
> +    }
>   
>       if (rq_data_dir(rq) == WRITE &&
>           req->data_len <= nvme_tcp_inline_data_size(queue))
> 

That cannot work; nvme-tcp basically initializes the iter based
on the request's bios, and as it iterates it may continue onto the next
bio if the request spans more than one. We could change that, but it
would mean allocating in the fast path, which is something I'd prefer
to avoid. This is a regression, so it must be coming from one of the
latest patches, and that is what should be addressed.
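For reference, the allocation-free scheme described here works by never building a flat bvec array for the whole request: req->iter only ever covers req->curr_bio, and the send path hops to the next chained bio once the current one is consumed. A sketch reconstructed from the behavior described in this thread (curr_bio, data_len and iter appear in the diffs above; the data_sent running total is assumed here), not the exact driver source:

/*
 * Sketch: after sending len bytes, advance the iter; when the current
 * bio is fully consumed but the request still has data, move to the
 * next chained bio and re-initialize the iter -- no allocation needed
 * in the fast path.
 */
static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
		int len)
{
	req->data_sent += len;		/* assumed running total */
	iov_iter_advance(&req->iter, len);
	if (!iov_iter_count(&req->iter) &&
	    req->data_sent < req->data_len) {
		req->curr_bio = req->curr_bio->bi_next;
		nvme_tcp_init_iter(req, WRITE);
	}
}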

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
       [not found]     ` <5848858e-239d-acb2-fa24-c371a3360557@redhat.com>
@ 2021-02-08 17:54       ` Sagi Grimberg
  2021-02-08 18:42         ` Sagi Grimberg
  2021-02-09 10:25       ` Sagi Grimberg
  1 sibling, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-08 17:54 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni


> Hi Sagi
> 
> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>
>>> Hello
>>>
>>> We found this kernel NULL pointer issue with the latest 
>>> linux-block/for-next and it's 100% reproducible; let me know if you 
>>> need more info/testing, thanks
>>>
>>> Kernel repo: 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>
>>> Reproducer: blktests nvme-tcp/012
>>
>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>> but did not succeed. Given that you have it 100% reproducible,
>> can you try to revert commit:
>>
>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>
> 
> Reverting this commit fixed the issue and I've attached the config. :)

Good to know,

I see some differences that I should probably change to hit this:
--
@@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
  # end of Kernel Performance Events And Counters

  CONFIG_VM_EVENT_COUNTERS=y
+CONFIG_SLUB_DEBUG=y
  # CONFIG_COMPAT_BRK is not set
-CONFIG_SLAB=y
-# CONFIG_SLUB is not set
-# CONFIG_SLOB is not set
-CONFIG_SLAB_MERGE_DEFAULT=y
-# CONFIG_SLAB_FREELIST_RANDOM is not set
+# CONFIG_SLAB is not set
+CONFIG_SLUB=y
+# CONFIG_SLAB_MERGE_DEFAULT is not set
+CONFIG_SLAB_FREELIST_RANDOM=y
  # CONFIG_SLAB_FREELIST_HARDENED is not set
-# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
+CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
+CONFIG_SLUB_CPU_PARTIAL=y
  CONFIG_SYSTEM_DATA_VERIFICATION=y
  CONFIG_PROFILING=y
  CONFIG_TRACEPOINTS=y
@@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
  CONFIG_X86_64_SMP=y
  CONFIG_ARCH_SUPPORTS_UPROBES=y
  CONFIG_FIX_EARLYCON_MEM=y
-CONFIG_PGTABLE_LEVELS=4
+CONFIG_DYNAMIC_PHYSICAL_MASK=y
+CONFIG_PGTABLE_LEVELS=5
  CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
--

Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-08 17:54       ` Sagi Grimberg
@ 2021-02-08 18:42         ` Sagi Grimberg
  2021-02-09  4:21           ` Ming Lei
  0 siblings, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-08 18:42 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni


>> Hi Sagi
>>
>> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>>
>>>> Hello
>>>>
>>>> We found this kernel NULL pointer issue with the latest 
>>>> linux-block/for-next and it's 100% reproducible; let me know if you 
>>>> need more info/testing, thanks
>>>>
>>>> Kernel repo: 
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>>
>>>> Reproducer: blktests nvme-tcp/012
>>>
>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>> but did not succeed. Given that you have it 100% reproducible,
>>> can you try to revert commit:
>>>
>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>
>>
>> Revert this commit fixed the issue and I've attached the config. :)
> 
> Good to know,
> 
> I see some differences that I should probably change to hit this:
> -- 
> @@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
>   # end of Kernel Performance Events And Counters
> 
>   CONFIG_VM_EVENT_COUNTERS=y
> +CONFIG_SLUB_DEBUG=y
>   # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLAB=y
> -# CONFIG_SLUB is not set
> -# CONFIG_SLOB is not set
> -CONFIG_SLAB_MERGE_DEFAULT=y
> -# CONFIG_SLAB_FREELIST_RANDOM is not set
> +# CONFIG_SLAB is not set
> +CONFIG_SLUB=y
> +# CONFIG_SLAB_MERGE_DEFAULT is not set
> +CONFIG_SLAB_FREELIST_RANDOM=y
>   # CONFIG_SLAB_FREELIST_HARDENED is not set
> -# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
> +CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> +CONFIG_SLUB_CPU_PARTIAL=y
>   CONFIG_SYSTEM_DATA_VERIFICATION=y
>   CONFIG_PROFILING=y
>   CONFIG_TRACEPOINTS=y
> @@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
>   CONFIG_X86_64_SMP=y
>   CONFIG_ARCH_SUPPORTS_UPROBES=y
>   CONFIG_FIX_EARLYCON_MEM=y
> -CONFIG_PGTABLE_LEVELS=4
> +CONFIG_DYNAMIC_PHYSICAL_MASK=y
> +CONFIG_PGTABLE_LEVELS=5
>   CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
> -- 
> 
> Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.

Used your profile and this still does not happen :(

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-08 18:42         ` Sagi Grimberg
@ 2021-02-09  4:21           ` Ming Lei
  2021-02-09  7:21             ` Sagi Grimberg
  0 siblings, 1 reply; 27+ messages in thread
From: Ming Lei @ 2021-02-09  4:21 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project

On Mon, Feb 08, 2021 at 10:42:28AM -0800, Sagi Grimberg wrote:
> 
> > > Hi Sagi
> > > 
> > > On 2/8/21 5:46 PM, Sagi Grimberg wrote:
> > > > 
> > > > > Hello
> > > > > 
> > > > > We found this kernel NULL pointer issue with latest
> > > > > linux-block/for-next and it's 100% reproduced, let me know
> > > > > if you need more info/testing, thanks
> > > > > 
> > > > > Kernel repo:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
> > > > > Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
> > > > > 
> > > > > Reproducer: blktests nvme-tcp/012
> > > > 
> > > > Thanks for reporting Ming, I've tried to reproduce this on my VM
> > > > but did not succeed. Given that you have it 100% reproducible,
> > > > can you try to revert commit:
> > > > 
> > > > 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
> > > > 
> > > 
> > > Revert this commit fixed the issue and I've attached the config. :)
> > 
> > Good to know,
> > 
> > I see some differences that I should probably change to hit this:
> > -- 
> > @@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
> >   # end of Kernel Performance Events And Counters
> > 
> >   CONFIG_VM_EVENT_COUNTERS=y
> > +CONFIG_SLUB_DEBUG=y
> >   # CONFIG_COMPAT_BRK is not set
> > -CONFIG_SLAB=y
> > -# CONFIG_SLUB is not set
> > -# CONFIG_SLOB is not set
> > -CONFIG_SLAB_MERGE_DEFAULT=y
> > -# CONFIG_SLAB_FREELIST_RANDOM is not set
> > +# CONFIG_SLAB is not set
> > +CONFIG_SLUB=y
> > +# CONFIG_SLAB_MERGE_DEFAULT is not set
> > +CONFIG_SLAB_FREELIST_RANDOM=y
> >   # CONFIG_SLAB_FREELIST_HARDENED is not set
> > -# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
> > +CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> > +CONFIG_SLUB_CPU_PARTIAL=y
> >   CONFIG_SYSTEM_DATA_VERIFICATION=y
> >   CONFIG_PROFILING=y
> >   CONFIG_TRACEPOINTS=y
> > @@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
> >   CONFIG_X86_64_SMP=y
> >   CONFIG_ARCH_SUPPORTS_UPROBES=y
> >   CONFIG_FIX_EARLYCON_MEM=y
> > -CONFIG_PGTABLE_LEVELS=4
> > +CONFIG_DYNAMIC_PHYSICAL_MASK=y
> > +CONFIG_PGTABLE_LEVELS=5
> >   CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
> > -- 
> > 
> > Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.
> 
> Used your profile and this still does not happen :(

One obvious error is that nr_segments is computed wrong.

Yi, can you try the following patch?

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 881d28eb15e9..a393d99b74e1 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
 		offset = 0;
 	} else {
 		struct bio *bio = req->curr_bio;
+		struct bio_vec bv;
+		struct bvec_iter iter;
+
+		nsegs = 0;
+		bio_for_each_bvec(bv, bio, iter)
+			nsegs++;
 
 		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
-		nsegs = bio_segments(bio);
 		size = bio->bi_iter.bi_size;
 		offset = bio->bi_iter.bi_bvec_done;
 	}

-- 
Ming


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09  4:21           ` Ming Lei
@ 2021-02-09  7:21             ` Sagi Grimberg
  2021-02-09  7:50               ` Ming Lei
  0 siblings, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09  7:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project



On 2/8/21 8:21 PM, Ming Lei wrote:
> On Mon, Feb 08, 2021 at 10:42:28AM -0800, Sagi Grimberg wrote:
>>
>>>> Hi Sagi
>>>>
>>>> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> We found this kernel NULL pointer issue with latest
>>>>>> linux-block/for-next and it's 100% reproduced, let me know
>>>>>> if you need more info/testing, thanks
>>>>>>
>>>>>> Kernel repo:
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>>>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>>>>
>>>>>> Reproducer: blktests nvme-tcp/012
>>>>>
>>>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>>>> but did not succeed. Given that you have it 100% reproducible,
>>>>> can you try to revert commit:
>>>>>
>>>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>>>
>>>>
>>>> Revert this commit fixed the issue and I've attached the config. :)
>>>
>>> Good to know,
>>>
>>> I see some differences that I should probably change to hit this:
>>> -- 
>>> @@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
>>>    # end of Kernel Performance Events And Counters
>>>
>>>    CONFIG_VM_EVENT_COUNTERS=y
>>> +CONFIG_SLUB_DEBUG=y
>>>    # CONFIG_COMPAT_BRK is not set
>>> -CONFIG_SLAB=y
>>> -# CONFIG_SLUB is not set
>>> -# CONFIG_SLOB is not set
>>> -CONFIG_SLAB_MERGE_DEFAULT=y
>>> -# CONFIG_SLAB_FREELIST_RANDOM is not set
>>> +# CONFIG_SLAB is not set
>>> +CONFIG_SLUB=y
>>> +# CONFIG_SLAB_MERGE_DEFAULT is not set
>>> +CONFIG_SLAB_FREELIST_RANDOM=y
>>>    # CONFIG_SLAB_FREELIST_HARDENED is not set
>>> -# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
>>> +CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
>>> +CONFIG_SLUB_CPU_PARTIAL=y
>>>    CONFIG_SYSTEM_DATA_VERIFICATION=y
>>>    CONFIG_PROFILING=y
>>>    CONFIG_TRACEPOINTS=y
>>> @@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
>>>    CONFIG_X86_64_SMP=y
>>>    CONFIG_ARCH_SUPPORTS_UPROBES=y
>>>    CONFIG_FIX_EARLYCON_MEM=y
>>> -CONFIG_PGTABLE_LEVELS=4
>>> +CONFIG_DYNAMIC_PHYSICAL_MASK=y
>>> +CONFIG_PGTABLE_LEVELS=5
>>>    CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
>>> -- 
>>>
>>> Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.
>>
>> Used your profile and this still does not happen :(
> 
> One obvious error is that nr_segments is computed wrong.
> 
> Yi, can you try the following patch?
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 881d28eb15e9..a393d99b74e1 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>   		offset = 0;
>   	} else {
>   		struct bio *bio = req->curr_bio;
> +		struct bio_vec bv;
> +		struct bvec_iter iter;
> +
> +		nsegs = 0;
> +		bio_for_each_bvec(bv, bio, iter)
> +			nsegs++;
>   
>   		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> -		nsegs = bio_segments(bio);

This was exactly the patch that caused the issue.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09  7:21             ` Sagi Grimberg
@ 2021-02-09  7:50               ` Ming Lei
  2021-02-09  8:34                 ` Chaitanya Kulkarni
  2021-02-09 10:07                 ` Sagi Grimberg
  0 siblings, 2 replies; 27+ messages in thread
From: Ming Lei @ 2021-02-09  7:50 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project

On Mon, Feb 08, 2021 at 11:21:53PM -0800, Sagi Grimberg wrote:
> 
> 
> On 2/8/21 8:21 PM, Ming Lei wrote:
> > On Mon, Feb 08, 2021 at 10:42:28AM -0800, Sagi Grimberg wrote:
> > > 
> > > > > Hi Sagi
> > > > > 
> > > > > On 2/8/21 5:46 PM, Sagi Grimberg wrote:
> > > > > > 
> > > > > > > Hello
> > > > > > > 
> > > > > > > We found this kernel NULL pointer issue with latest
> > > > > > > linux-block/for-next and it's 100% reproduced, let me know
> > > > > > > if you need more info/testing, thanks
> > > > > > > 
> > > > > > > Kernel repo:
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
> > > > > > > Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
> > > > > > > 
> > > > > > > Reproducer: blktests nvme-tcp/012
> > > > > > 
> > > > > > Thanks for reporting Ming, I've tried to reproduce this on my VM
> > > > > > but did not succeed. Given that you have it 100% reproducible,
> > > > > > can you try to revert commit:
> > > > > > 
> > > > > > 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
> > > > > > 
> > > > > 
> > > > > Revert this commit fixed the issue and I've attached the config. :)
> > > > 
> > > > Good to know,
> > > > 
> > > > I see some differences that I should probably change to hit this:
> > > > -- 
> > > > @@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
> > > >    # end of Kernel Performance Events And Counters
> > > > 
> > > >    CONFIG_VM_EVENT_COUNTERS=y
> > > > +CONFIG_SLUB_DEBUG=y
> > > >    # CONFIG_COMPAT_BRK is not set
> > > > -CONFIG_SLAB=y
> > > > -# CONFIG_SLUB is not set
> > > > -# CONFIG_SLOB is not set
> > > > -CONFIG_SLAB_MERGE_DEFAULT=y
> > > > -# CONFIG_SLAB_FREELIST_RANDOM is not set
> > > > +# CONFIG_SLAB is not set
> > > > +CONFIG_SLUB=y
> > > > +# CONFIG_SLAB_MERGE_DEFAULT is not set
> > > > +CONFIG_SLAB_FREELIST_RANDOM=y
> > > >    # CONFIG_SLAB_FREELIST_HARDENED is not set
> > > > -# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
> > > > +CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> > > > +CONFIG_SLUB_CPU_PARTIAL=y
> > > >    CONFIG_SYSTEM_DATA_VERIFICATION=y
> > > >    CONFIG_PROFILING=y
> > > >    CONFIG_TRACEPOINTS=y
> > > > @@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
> > > >    CONFIG_X86_64_SMP=y
> > > >    CONFIG_ARCH_SUPPORTS_UPROBES=y
> > > >    CONFIG_FIX_EARLYCON_MEM=y
> > > > -CONFIG_PGTABLE_LEVELS=4
> > > > +CONFIG_DYNAMIC_PHYSICAL_MASK=y
> > > > +CONFIG_PGTABLE_LEVELS=5
> > > >    CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
> > > > -- 
> > > > 
> > > > Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.
> > > 
> > > Used your profile and this still does not happen :(
> > 
> > One obvious error is that nr_segments is computed wrong.
> > 
> > Yi, can you try the following patch?
> > 
> > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> > index 881d28eb15e9..a393d99b74e1 100644
> > --- a/drivers/nvme/host/tcp.c
> > +++ b/drivers/nvme/host/tcp.c
> > @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
> >   		offset = 0;
> >   	} else {
> >   		struct bio *bio = req->curr_bio;
> > +		struct bio_vec bv;
> > +		struct bvec_iter iter;
> > +
> > +		nsegs = 0;
> > +		bio_for_each_bvec(bv, bio, iter)
> > +			nsegs++;
> >   		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> > -		nsegs = bio_segments(bio);
> 
> This was exactly the patch that caused the issue.

What was the issue you are talking about? Any link or commit hash?

nvme-tcp builds a bvec iov_iter from __bvec_iter_bvec(), so the segment
count has to be the actual number of bvecs. But bio_segments() just
returns the number of single-page segments, which is wrong for the
iov_iter.

Please see the same usage in lo_rw_aio().
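
To make the distinction concrete, here is a hedged sketch (illustrative
helpers, not kernel code) of the two counting schemes:
--
#include <linux/bio.h>

/* What bio_segments() counts: single-page segments. */
static unsigned int count_singlepage_segments(struct bio *bio)
{
	struct bio_vec bv;
	struct bvec_iter iter;
	unsigned int n = 0;

	bio_for_each_segment(bv, bio, iter)
		n++;
	return n;
}

/* What a bvec iov_iter built on bio->bi_io_vec actually needs: the
 * number of (possibly multi-page) bvecs in the underlying table. */
static unsigned int count_multipage_bvecs(struct bio *bio)
{
	struct bio_vec bv;
	struct bvec_iter iter;
	unsigned int n = 0;

	bio_for_each_bvec(bv, bio, iter)
		n++;
	return n;
}
--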

-- 
Ming


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09  7:50               ` Ming Lei
@ 2021-02-09  8:34                 ` Chaitanya Kulkarni
  2021-02-09 10:09                   ` Sagi Grimberg
  2021-02-09 10:07                 ` Sagi Grimberg
  1 sibling, 1 reply; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-09  8:34 UTC (permalink / raw)
  To: Ming Lei
  Cc: Sagi Grimberg, Yi Zhang, linux-nvme@lists.infradead.org,
	linux-block, axboe@kernel.dk, Rachel Sibley, CKI Project


> On Feb 8, 2021, at 11:50 PM, Ming Lei <ming.lei@redhat.com> wrote:
> 
> On Mon, Feb 08, 2021 at 11:21:53PM -0800, Sagi Grimberg wrote:
>> 
>> 
>>> On 2/8/21 8:21 PM, Ming Lei wrote:
>>> On Mon, Feb 08, 2021 at 10:42:28AM -0800, Sagi Grimberg wrote:
>>>> 
>>>>>> Hi Sagi
>>>>>> 
>>>>>> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>>>>>> 
>>>>>>>> Hello
>>>>>>>> 
>>>>>>>> We found this kernel NULL pointer issue with latest
>>>>>>>> linux-block/for-next and it's 100% reproduced, let me know
>>>>>>>> if you need more info/testing, thanks
>>>>>>>> 
>>>>>>>> Kernel repo:
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>>>>>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>>>>>> 
>>>>>>>> Reproducer: blktests nvme-tcp/012
>>>>>>> 
>>>>>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>>>>>> but did not succeed. Given that you have it 100% reproducible,
>>>>>>> can you try to revert commit:
>>>>>>> 
>>>>>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>>>>> 
>>>>>> 
>>>>>> Revert this commit fixed the issue and I've attached the config. :)
>>>>> 
>>>>> Good to know,
>>>>> 
>>>>> I see some differences that I should probably change to hit this:
>>>>> -- 
>>>>> @@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
>>>>>   # end of Kernel Performance Events And Counters
>>>>> 
>>>>>   CONFIG_VM_EVENT_COUNTERS=y
>>>>> +CONFIG_SLUB_DEBUG=y
>>>>>   # CONFIG_COMPAT_BRK is not set
>>>>> -CONFIG_SLAB=y
>>>>> -# CONFIG_SLUB is not set
>>>>> -# CONFIG_SLOB is not set
>>>>> -CONFIG_SLAB_MERGE_DEFAULT=y
>>>>> -# CONFIG_SLAB_FREELIST_RANDOM is not set
>>>>> +# CONFIG_SLAB is not set
>>>>> +CONFIG_SLUB=y
>>>>> +# CONFIG_SLAB_MERGE_DEFAULT is not set
>>>>> +CONFIG_SLAB_FREELIST_RANDOM=y
>>>>>   # CONFIG_SLAB_FREELIST_HARDENED is not set
>>>>> -# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
>>>>> +CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
>>>>> +CONFIG_SLUB_CPU_PARTIAL=y
>>>>>   CONFIG_SYSTEM_DATA_VERIFICATION=y
>>>>>   CONFIG_PROFILING=y
>>>>>   CONFIG_TRACEPOINTS=y
>>>>> @@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
>>>>>   CONFIG_X86_64_SMP=y
>>>>>   CONFIG_ARCH_SUPPORTS_UPROBES=y
>>>>>   CONFIG_FIX_EARLYCON_MEM=y
>>>>> -CONFIG_PGTABLE_LEVELS=4
>>>>> +CONFIG_DYNAMIC_PHYSICAL_MASK=y
>>>>> +CONFIG_PGTABLE_LEVELS=5
>>>>>   CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
>>>>> -- 
>>>>> 
>>>>> Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.
>>>> 
>>>> Used your profile and this still does not happen :(
>>> 
>>> One obvious error is that nr_segments is computed wrong.
>>> 
>>> Yi, can you try the following patch?
>>> 
>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>> index 881d28eb15e9..a393d99b74e1 100644
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>> @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>>>          offset = 0;
>>>      } else {
>>>          struct bio *bio = req->curr_bio;
>>> +        struct bio_vec bv;
>>> +        struct bvec_iter iter;
>>> +
>>> +        nsegs = 0;
>>> +        bio_for_each_bvec(bv, bio, iter)
>>> +            nsegs++;
>>>          vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>>> -        nsegs = bio_segments(bio);
>> 
>> This was exactly the patch that caused the issue.
> 
> What was the issue you are talking about? Any link or commit hash?
> 
> nvme-tcp builds iov_iter(BVEC) from __bvec_iter_bvec(), the segment
> number has to be the actual bvec number. But bio_segment() just returns
> number of the single-page segment, which is wrong for iov_iter.
> 
> Please see the same usage in lo_rw_aio().
> 
That is what I suggested, but I also suggested the memory allocation part, which Sagi explained is better to avoid.

In my opinion we should at least try the bvec calculation done in lo_rw_aio() and see whether that fixes the problem, unless reverting the commit is the right approach for some reason.
> -- 
> Ming
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09  7:50               ` Ming Lei
  2021-02-09  8:34                 ` Chaitanya Kulkarni
@ 2021-02-09 10:07                 ` Sagi Grimberg
  2021-02-09 10:33                   ` Ming Lei
  1 sibling, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09 10:07 UTC (permalink / raw)
  To: Ming Lei
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project


>>>
>>> One obvious error is that nr_segments is computed wrong.
>>>
>>> Yi, can you try the following patch?
>>>
>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>> index 881d28eb15e9..a393d99b74e1 100644
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>> @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>>>    		offset = 0;
>>>    	} else {
>>>    		struct bio *bio = req->curr_bio;
>>> +		struct bio_vec bv;
>>> +		struct bvec_iter iter;
>>> +
>>> +		nsegs = 0;
>>> +		bio_for_each_bvec(bv, bio, iter)
>>> +			nsegs++;
>>>    		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>>> -		nsegs = bio_segments(bio);
>>
>> This was exactly the patch that caused the issue.
> 
> What was the issue you are talking about? Any link or commit hash?

The commit that caused the crash is:
0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter

> 
> nvme-tcp builds iov_iter(BVEC) from __bvec_iter_bvec(), the segment
> number has to be the actual bvec number. But bio_segment() just returns
> number of the single-page segment, which is wrong for iov_iter.

That is what I thought, but it's causing a crash, and things were fine
with bio_segments(). So I'm trying to understand why that is.

> Please see the same usage in lo_rw_aio().

nvme-tcp works on a per-bio basis to avoid bvec allocation
in the data path. Hence the iterator is fed directly by
the bio's bvec array and is re-initialized on every bio
spanned by the request.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09  8:34                 ` Chaitanya Kulkarni
@ 2021-02-09 10:09                   ` Sagi Grimberg
  0 siblings, 0 replies; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09 10:09 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Ming Lei
  Cc: axboe@kernel.dk, Rachel Sibley, Yi Zhang,
	linux-nvme@lists.infradead.org, linux-block, CKI Project


>>>> Yi, can you try the following patch?
>>>>
>>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>>> index 881d28eb15e9..a393d99b74e1 100644
>>>> --- a/drivers/nvme/host/tcp.c
>>>> +++ b/drivers/nvme/host/tcp.c
>>>> @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>>>>           offset = 0;
>>>>       } else {
>>>>           struct bio *bio = req->curr_bio;
>>>> +        struct bio_vec bv;
>>>> +        struct bvec_iter iter;
>>>> +
>>>> +        nsegs = 0;
>>>> +        bio_for_each_bvec(bv, bio, iter)
>>>> +            nsegs++;
>>>>           vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>>>> -        nsegs = bio_segments(bio);
>>>
>>> This was exactly the patch that caused the issue.
>>
>> What was the issue you are talking about? Any link or commit hash?
>>
>> nvme-tcp builds iov_iter(BVEC) from __bvec_iter_bvec(), the segment
>> number has to be the actual bvec number. But bio_segment() just returns
>> number of the single-page segment, which is wrong for iov_iter.
>>
>> Please see the same usage in lo_rw_aio().
>>
> That what I have suggested but I've also suggested the memory allocation part which Sagi explained why it is better to avoid.
> 
> In my opinion we should at least try bvec calculation in lo_aio_rw() and see the problem can be fixed or not, unless reverting the commit it right approach for some reason.

I'm trying to understand what this is, but I'm failing to reproduce
it. I may ask Yi to add some debug code for this.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
       [not found]     ` <5848858e-239d-acb2-fa24-c371a3360557@redhat.com>
  2021-02-08 17:54       ` Sagi Grimberg
@ 2021-02-09 10:25       ` Sagi Grimberg
  2021-02-09 12:57         ` Yi Zhang
  1 sibling, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09 10:25 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni


> Hi Sagi
> 
> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>
>>> Hello
>>>
>>> We found this kernel NULL pointer issue with latest 
>>> linux-block/for-next and it's 100% reproduced, let me know if you 
>>> need more info/testing, thanks
>>>
>>> Kernel repo: 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>
>>> Reproducer: blktests nvme-tcp/012
>>
>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>> but did not succeed. Given that you have it 100% reproducible,
>> can you try to revert commit:
>>
>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>
> 
> Revert this commit fixed the issue and I've attached the config. :)

Hey Ming,

Instead of the revert, does this patch make the issue go away?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 619b0d8f6e38..69f59d2c5799 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2271,7 +2271,7 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
         req->data_len = blk_rq_nr_phys_segments(rq) ?
                                 blk_rq_payload_bytes(rq) : 0;
         req->curr_bio = rq->bio;
-       if (req->curr_bio)
+       if (req->curr_bio && req->data_len)
                 nvme_tcp_init_iter(req, rq_data_dir(rq));

         if (rq_data_dir(rq) == WRITE &&
--

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09 10:07                 ` Sagi Grimberg
@ 2021-02-09 10:33                   ` Ming Lei
  2021-02-09 10:36                     ` Sagi Grimberg
  0 siblings, 1 reply; 27+ messages in thread
From: Ming Lei @ 2021-02-09 10:33 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project

On Tue, Feb 09, 2021 at 02:07:15AM -0800, Sagi Grimberg wrote:
> 
> > > > 
> > > > One obvious error is that nr_segments is computed wrong.
> > > > 
> > > > Yi, can you try the following patch?
> > > > 
> > > > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> > > > index 881d28eb15e9..a393d99b74e1 100644
> > > > --- a/drivers/nvme/host/tcp.c
> > > > +++ b/drivers/nvme/host/tcp.c
> > > > @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
> > > >    		offset = 0;
> > > >    	} else {
> > > >    		struct bio *bio = req->curr_bio;
> > > > +		struct bio_vec bv;
> > > > +		struct bvec_iter iter;
> > > > +
> > > > +		nsegs = 0;
> > > > +		bio_for_each_bvec(bv, bio, iter)
> > > > +			nsegs++;
> > > >    		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> > > > -		nsegs = bio_segments(bio);
> > > 
> > > This was exactly the patch that caused the issue.
> > 
> > What was the issue you are talking about? Any link or commit hash?
> 
> The commit that caused the crash is:
> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter

I can't find this commit in Linus' tree :-(

> 
> > 
> > nvme-tcp builds iov_iter(BVEC) from __bvec_iter_bvec(), the segment
> > number has to be the actual bvec number. But bio_segment() just returns
> > number of the single-page segment, which is wrong for iov_iter.
> 
> That is what I thought, but its causing a crash, and was fine with
> bio_segments. So I'm trying to understand why is that.

I tested this patch, and it works just fine.

> 
> > Please see the same usage in lo_rw_aio().
> 
> nvme-tcp works on the bio basis to avoid bvec allocation
> in the data path. Hence the iterator is fed directly by
> the bio bvec and will re-initialize on every bio that
> is spanned by the request.

Yeah, I know that. What I meant is that rq_for_each_bvec() is used
to figure out the bvec count in loop, which may feed the bio bvecs
directly to the fs via an iov_iter too, just like nvme-tcp.

The difference is that loop will switch to allocating a new bvec
table and copying the bios' bvecs into that table when bios have
been merged.
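
For reference, a simplified sketch of that loop pattern (modeled
loosely on lo_rw_aio(); the helper name is made up and error paths
are trimmed):
--
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/slab.h>
#include <linux/uio.h>

/* Sketch: a single-bio request feeds its bvec array to the iov_iter
 * directly; a merged (multi-bio) request gets a freshly allocated
 * flat table with all bvecs copied in. *table must be freed by the
 * caller once the I/O completes. */
static int loop_style_init_iter(struct request *rq, struct iov_iter *iter,
				struct bio_vec **table, int dir)
{
	struct req_iterator rq_iter;
	struct bio_vec tmp, *bvec;
	unsigned int nr_bvec = 0;

	/* count full (possibly multi-page) bvecs across all bios */
	rq_for_each_bvec(tmp, rq, rq_iter)
		nr_bvec++;

	*table = NULL;
	if (rq->bio != rq->biotail) {
		unsigned int i = 0;

		bvec = kmalloc_array(nr_bvec, sizeof(*bvec), GFP_NOIO);
		if (!bvec)
			return -ENOMEM;
		rq_for_each_bvec(tmp, rq, rq_iter)
			bvec[i++] = tmp;	/* flatten the merged bios */
		*table = bvec;
	} else {
		struct bio *bio = rq->bio;

		/* skip bvecs that have already been completed */
		bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
	}

	iov_iter_bvec(iter, dir, bvec, nr_bvec, blk_rq_bytes(rq));
	return 0;
}
--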

-- 
Ming


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09 10:33                   ` Ming Lei
@ 2021-02-09 10:36                     ` Sagi Grimberg
  0 siblings, 0 replies; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09 10:36 UTC (permalink / raw)
  To: Ming Lei
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project



On 2/9/21 2:33 AM, Ming Lei wrote:
> On Tue, Feb 09, 2021 at 02:07:15AM -0800, Sagi Grimberg wrote:
>>
>>>>>
>>>>> One obvious error is that nr_segments is computed wrong.
>>>>>
>>>>> Yi, can you try the following patch?
>>>>>
>>>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>>>> index 881d28eb15e9..a393d99b74e1 100644
>>>>> --- a/drivers/nvme/host/tcp.c
>>>>> +++ b/drivers/nvme/host/tcp.c
>>>>> @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>>>>>     		offset = 0;
>>>>>     	} else {
>>>>>     		struct bio *bio = req->curr_bio;
>>>>> +		struct bio_vec bv;
>>>>> +		struct bvec_iter iter;
>>>>> +
>>>>> +		nsegs = 0;
>>>>> +		bio_for_each_bvec(bv, bio, iter)
>>>>> +			nsegs++;
>>>>>     		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>>>>> -		nsegs = bio_segments(bio);
>>>>
>>>> This was exactly the patch that caused the issue.
>>>
>>> What was the issue you are talking about? Any link or commit hash?
>>
>> The commit that caused the crash is:
>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
> 
> Not found this commit in linus tree, :-(

The original report is on:
Kernel repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next

>>> nvme-tcp builds iov_iter(BVEC) from __bvec_iter_bvec(), the segment
>>> number has to be the actual bvec number. But bio_segment() just returns
>>> number of the single-page segment, which is wrong for iov_iter.
>>
>> That is what I thought, but its causing a crash, and was fine with
>> bio_segments. So I'm trying to understand why is that.
> 
> I tested this patch, and it works just fine.

Me too, but Yi hits this crash; I even recompiled with his
config, but still no luck.

>>> Please see the same usage in lo_rw_aio().
>>
>> nvme-tcp works on the bio basis to avoid bvec allocation
>> in the data path. Hence the iterator is fed directly by
>> the bio bvec and will re-initialize on every bio that
>> is spanned by the request.
> 
> Yeah, I know that. What I meant is that rq_for_each_bvec() is used
> to figure out bvec number in loop, which may feed the bio bvec
> directly to fs via iov_iter too, just similar with nvme-tcp.
> 
> The difference is that loop will switch to allocate a new bvec
> table and copy bios's bvec to the new table in case of bios merge.

So nvme-tcp now uses bio_for_each_bvec(), which seems appropriate;
we just need to understand what is causing this.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09 10:25       ` Sagi Grimberg
@ 2021-02-09 12:57         ` Yi Zhang
  2021-02-09 18:01           ` Sagi Grimberg
  0 siblings, 1 reply; 27+ messages in thread
From: Yi Zhang @ 2021-02-09 12:57 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni, Ming Lei



On 2/9/21 6:25 PM, Sagi Grimberg wrote:
>
>> Hi Sagi
>>
>> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>>
>>>> Hello
>>>>
>>>> We found this kernel NULL pointer issue with latest 
>>>> linux-block/for-next and it's 100% reproduced, let me know if you 
>>>> need more info/testing, thanks
>>>>
>>>> Kernel repo: 
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>>
>>>> Reproducer: blktests nvme-tcp/012
>>>
>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>> but did not succeed. Given that you have it 100% reproducible,
>>> can you try to revert commit:
>>>
>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>
>>
>> Revert this commit fixed the issue and I've attached the config. :)
>
> Hey Ming,
>
> Instead of revert, does this patch makes the issue go away?
Hi Sagi

The patch below fixed the issue; let me know if you need more testing. :)

Thanks
Yi

> -- 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 619b0d8f6e38..69f59d2c5799 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2271,7 +2271,7 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
>         req->data_len = blk_rq_nr_phys_segments(rq) ?
>                                 blk_rq_payload_bytes(rq) : 0;
>         req->curr_bio = rq->bio;
> -       if (req->curr_bio)
> +       if (req->curr_bio && req->data_len)
>                 nvme_tcp_init_iter(req, rq_data_dir(rq));
>
>         if (rq_data_dir(rq) == WRITE &&
> -- 
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09 12:57         ` Yi Zhang
@ 2021-02-09 18:01           ` Sagi Grimberg
  2021-02-10  2:51             ` Yi Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09 18:01 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni, Ming Lei


>>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>>> but did not succeed. Given that you have it 100% reproducible,
>>>> can you try to revert commit:
>>>>
>>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>>
>>>
>>> Revert this commit fixed the issue and I've attached the config. :)
>>
>> Hey Ming,
>>
>> Instead of revert, does this patch makes the issue go away?
> Hi Sagi
> 
> Below patch fixed the issue, let me know if you need more testing. :)

Thanks Yi,

I'll submit a proper patch, but can you run this change
to see which command has a bio but no data?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 619b0d8f6e38..311f1b78a9d4 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2271,8 +2271,13 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
         req->data_len = blk_rq_nr_phys_segments(rq) ?
                                 blk_rq_payload_bytes(rq) : 0;
         req->curr_bio = rq->bio;
-       if (req->curr_bio)
+       if (req->curr_bio) {
+               if (!req->data_len) {
+                       pr_err("rq %d opcode %d\n", rq->tag,
+                              pdu->cmd.common.opcode);
+                       return BLK_STS_IOERR;
+               }
                 nvme_tcp_init_iter(req, rq_data_dir(rq));
+       }

         if (rq_data_dir(rq) == WRITE &&
             req->data_len <= nvme_tcp_inline_data_size(queue))
--

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09 18:01           ` Sagi Grimberg
@ 2021-02-10  2:51             ` Yi Zhang
  2021-02-10  3:38               ` Keith Busch
  2021-02-10  5:03               ` Chaitanya Kulkarni
  0 siblings, 2 replies; 27+ messages in thread
From: Yi Zhang @ 2021-02-10  2:51 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni, Ming Lei



On 2/10/21 2:01 AM, Sagi Grimberg wrote:
>
>>>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>>>> but did not succeed. Given that you have it 100% reproducible,
>>>>> can you try to revert commit:
>>>>>
>>>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>>>
>>>>
>>>> Revert this commit fixed the issue and I've attached the config. :)
>>>
>>> Hey Ming,
>>>
>>> Instead of revert, does this patch makes the issue go away?
>> Hi Sagi
>>
>> Below patch fixed the issue, let me know if you need more testing. :)
>
> Thanks Yi,
>
So it's nvme_admin_abort_cmd here

[   74.017450] run blktests nvme/012 at 2021-02-09 21:41:55
[   74.111311] loop: module loaded
[   74.125717] loop0: detected capacity change from 2097152 to 0
[   74.141026] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   74.149395] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[   74.158298] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:41131d88-02ca-4ccc-87b3-6ca3f28b13a4.
[   74.158742] nvme nvme0: creating 48 I/O queues.
[   74.163391] nvme nvme0: mapped 48/0/0 default/read/poll queues.
[   74.184623] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
[   75.235059] nvme_tcp: rq 38 opcode 8
[   75.238653] blk_update_request: I/O error, dev nvme0c0n1, sector 1048624 op 0x9:(WRITE_ZEROES) flags 0x2800800 phys_seg 0 prio class 0
[   75.380179] XFS (nvme0n1): Mounting V5 Filesystem
[   75.387457] XFS (nvme0n1): Ending clean mount
[   75.388555] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
[   91.035659] XFS (nvme0n1): Unmounting Filesystem
[   91.043334] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"


> I'll submit a proper patch, but can you run this change
> to see what command has a bio but without any data?
> -- 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 619b0d8f6e38..311f1b78a9d4 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2271,8 +2271,13 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
>         req->data_len = blk_rq_nr_phys_segments(rq) ?
>                                 blk_rq_payload_bytes(rq) : 0;
>         req->curr_bio = rq->bio;
> -       if (req->curr_bio)
> +       if (req->curr_bio) {
> +               if (!req->data_len) {
> +                       pr_err("rq %d opcode %d\n", rq->tag,
> +                              pdu->cmd.common.opcode);
> +                       return BLK_STS_IOERR;
> +               }
>                 nvme_tcp_init_iter(req, rq_data_dir(rq));
> +       }
>
>         if (rq_data_dir(rq) == WRITE &&
>             req->data_len <= nvme_tcp_inline_data_size(queue))
> -- 
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-10  2:51             ` Yi Zhang
@ 2021-02-10  3:38               ` Keith Busch
  2021-02-10  5:03               ` Chaitanya Kulkarni
  1 sibling, 0 replies; 27+ messages in thread
From: Keith Busch @ 2021-02-10  3:38 UTC (permalink / raw)
  To: Yi Zhang
  Cc: Sagi Grimberg, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project, Ming Lei

On Wed, Feb 10, 2021 at 10:51:56AM +0800, Yi Zhang wrote:
> On 2/10/21 2:01 AM, Sagi Grimberg wrote:
> So it's nvme_admin_abort_cmd here

The opcode 8 would be abort if this were an admin command, but we can't
tell that from the nvme print. The subsequent blk_update_request log
indicates this is the IO command for a Write Zeroes operation.
 
> [   75.235059] nvme_tcp: rq 38 opcode 8
> [   75.238653] blk_update_request: I/O error, dev nvme0c0n1, sector 1048624 op 0x9:(WRITE_ZEROES) flags 0x2800800 phys_seg 0 prio class 0
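
For reference, admin and I/O opcodes live in separate enums and the
value 0x08 appears in both (abridged from include/linux/nvme.h):
--
/* abridged from include/linux/nvme.h */
enum nvme_admin_opcode {
	nvme_admin_abort_cmd	= 0x08,		/* admin queue */
};

enum nvme_opcode {
	nvme_cmd_write_zeroes	= 0x08,		/* I/O queue */
};
--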

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-10  2:51             ` Yi Zhang
  2021-02-10  3:38               ` Keith Busch
@ 2021-02-10  5:03               ` Chaitanya Kulkarni
  2021-02-10 22:06                 ` Sagi Grimberg
  1 sibling, 1 reply; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-10  5:03 UTC (permalink / raw)
  To: Yi Zhang, Sagi Grimberg, linux-nvme@lists.infradead.org,
	linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Ming Lei

On 2/9/21 18:52, Yi Zhang wrote:
> So it's nvme_admin_abort_cmd here
>
> [   74.017450] run blktests nvme/012 at 2021-02-09 21:41:55
> [   74.111311] loop: module loaded
> [   74.125717] loop0: detected capacity change from 2097152 to 0
> [   74.141026] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [   74.149395] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [   74.158298] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:41131d88-02ca-4ccc-87b3-6ca3f28b13a4.
> [   74.158742] nvme nvme0: creating 48 I/O queues.
> [   74.163391] nvme nvme0: mapped 48/0/0 default/read/poll queues.
> [   74.184623] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
> [   75.235059] nvme_tcp: rq 38 opcode 8
> [   75.238653] blk_update_request: I/O error, dev nvme0c0n1, sector 1048624 op 0x9:(WRITE_ZEROES) flags 0x2800800 phys_seg 0 prio class 0
> [   75.380179] XFS (nvme0n1): Mounting V5 Filesystem
> [   75.387457] XFS (nvme0n1): Ending clean mount
> [   75.388555] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
> [   91.035659] XFS (nvme0n1): Unmounting Filesystem
> [   91.043334] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"

But write-zeroes is also a data-less command and should not fail.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-10  5:03               ` Chaitanya Kulkarni
@ 2021-02-10 22:06                 ` Sagi Grimberg
  2021-02-10 22:15                   ` Chaitanya Kulkarni
  0 siblings, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-10 22:06 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Yi Zhang, linux-nvme@lists.infradead.org,
	linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Ming Lei


>> So it's nvme_admin_abort_cmd here
>>
>> [   74.017450] run blktests nvme/012 at 2021-02-09 21:41:55
>> [   74.111311] loop: module loaded
>> [   74.125717] loop0: detected capacity change from 2097152 to 0
>> [   74.141026] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>> [   74.149395] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
>> [   74.158298] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:41131d88-02ca-4ccc-87b3-6ca3f28b13a4.
>> [   74.158742] nvme nvme0: creating 48 I/O queues.
>> [   74.163391] nvme nvme0: mapped 48/0/0 default/read/poll queues.
>> [   74.184623] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
>> [   75.235059] nvme_tcp: rq 38 opcode 8
>> [   75.238653] blk_update_request: I/O error, dev nvme0c0n1, sector 1048624 op 0x9:(WRITE_ZEROES) flags 0x2800800 phys_seg 0 prio class 0
>> [   75.380179] XFS (nvme0n1): Mounting V5 Filesystem
>> [   75.387457] XFS (nvme0n1): Ending clean mount
>> [   75.388555] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
>> [   91.035659] XFS (nvme0n1): Unmounting Filesystem
>> [   91.043334] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
> 
> But write-zeroes is also a data-less command and should not fail.

And it has a bio, which means that nvme-tcp tries to init an iter
for it when it shouldn't. So the actual offending commit is
cb9b870fba3e, which cleaned up how the iter is initialized but
introduced this issue.
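
In code terms, the guard Yi verified earlier in the thread restores the
data_len check, so data-less commands that still carry a bio (such as
Write Zeroes) no longer get an iter:
--
	req->data_len = blk_rq_nr_phys_segments(rq) ?
				blk_rq_payload_bytes(rq) : 0;
	req->curr_bio = rq->bio;
	/* only build an iterator when the request carries payload */
	if (req->curr_bio && req->data_len)
		nvme_tcp_init_iter(req, rq_data_dir(rq));
--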

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-10 22:06                 ` Sagi Grimberg
@ 2021-02-10 22:15                   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-10 22:15 UTC (permalink / raw)
  To: Sagi Grimberg, Yi Zhang, linux-nvme@lists.infradead.org,
	linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Ming Lei

On 2/10/21 2:06 PM, Sagi Grimberg wrote:
>>> [   75.235059] nvme_tcp: rq 38 opcode 8
>>> [   75.238653] blk_update_request: I/O error, dev nvme0c0n1, sector 1048624 op 0x9:(WRITE_ZEROES) flags 0x2800800 phys_seg 0 prio class 0
>>> [   75.380179] XFS (nvme0n1): Mounting V5 Filesystem
>>> [   75.387457] XFS (nvme0n1): Ending clean mount
>>> [   75.388555] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
>>> [   91.035659] XFS (nvme0n1): Unmounting Filesystem
>>> [   91.043334] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
>> But write-zeroes is also a data-less command and should not fail.
> And it has a bio, which means that nvme-tcp tries to init an iter
> for it when it shouldn't. So the actual offending commit is:
> cb9b870fba3e, which cleaned up how the iter is initialized but 
> introduced this issue.
>
Looking at cb9b870fba3e, that should work, but I'm really surprised
that it never got triggered even once in my testing since the issue
was reported.

^ permalink raw reply	[flat|nested] 27+ messages in thread

Thread overview: 27+ messages
     [not found] <cki.F3E139361A.EN5MUSJKK9@redhat.com>
2021-02-06  3:08 ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Yi Zhang
2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
2021-02-07  5:25     ` Chaitanya Kulkarni
2021-02-08  6:48       ` Yi Zhang
2021-02-07  5:48     ` Chaitanya Kulkarni
2021-02-07  5:58     ` Chaitanya Kulkarni
2021-02-07  7:14   ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Chaitanya Kulkarni
2021-02-08  9:53     ` Sagi Grimberg
2021-02-08  9:46   ` Sagi Grimberg
     [not found]     ` <5848858e-239d-acb2-fa24-c371a3360557@redhat.com>
2021-02-08 17:54       ` Sagi Grimberg
2021-02-08 18:42         ` Sagi Grimberg
2021-02-09  4:21           ` Ming Lei
2021-02-09  7:21             ` Sagi Grimberg
2021-02-09  7:50               ` Ming Lei
2021-02-09  8:34                 ` Chaitanya Kulkarni
2021-02-09 10:09                   ` Sagi Grimberg
2021-02-09 10:07                 ` Sagi Grimberg
2021-02-09 10:33                   ` Ming Lei
2021-02-09 10:36                     ` Sagi Grimberg
2021-02-09 10:25       ` Sagi Grimberg
2021-02-09 12:57         ` Yi Zhang
2021-02-09 18:01           ` Sagi Grimberg
2021-02-10  2:51             ` Yi Zhang
2021-02-10  3:38               ` Keith Busch
2021-02-10  5:03               ` Chaitanya Kulkarni
2021-02-10 22:06                 ` Sagi Grimberg
2021-02-10 22:15                   ` Chaitanya Kulkarni
