linux-block.vger.kernel.org archive mirror
* kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
       [not found] <cki.F3E139361A.EN5MUSJKK9@redhat.com>
@ 2021-02-06  3:08 ` Yi Zhang
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Yi Zhang @ 2021-02-06  3:08 UTC (permalink / raw)
  To: linux-nvme, linux-block; +Cc: axboe, Rachel Sibley, CKI Project, Sagi Grimberg

Hello

We found this kernel NULL pointer issue with the latest linux-block/for-next and it's 100% reproducible; let me know if you need more info/testing, thanks

Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next

Reproducer: blktests nvme-tcp/012


[  124.458121] run blktests nvme/012 at 2021-02-05 21:53:34 
[  125.525568] BUG: kernel NULL pointer dereference, address: 0000000000000008 
[  125.532524] #PF: supervisor read access in kernel mode 
[  125.537665] #PF: error_code(0x0000) - not-present page 
[  125.542803] PGD 0 P4D 0  
[  125.545343] Oops: 0000 [#1] SMP NOPTI 
[  125.549009] CPU: 15 PID: 12069 Comm: kworker/15:2H Tainted: G S        I       5.11.0-rc6+ #1 
[  125.557528] Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020 
[  125.565093] Workqueue: kblockd blk_mq_run_work_fn 
[  125.569797] RIP: 0010:nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] 
[  125.575544] Code: 8b 75 68 44 8b 45 28 44 8b 7d 30 49 89 d4 48 c1 e2 04 4c 01 f2 45 89 fb 44 89 c7 85 ff 74 4d 44 89 e0 44 8b 55 10 48 c1 e0 04 <41> 8b 5c 06 08 45 0f b6 ca 89 d8 44 29 d8 39 f8 0f 47 c7 41 83 e9 
[  125.594290] RSP: 0018:ffffbd084447bd18 EFLAGS: 00010246 
[  125.599515] RAX: 0000000000000000 RBX: ffffa0bba9f3ce80 RCX: 0000000000000000 
[  125.606648] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000002000000 
[  125.613781] RBP: ffffa0ba8ac6fec0 R08: 0000000002000000 R09: 0000000000000000 
[  125.620914] R10: 0000000002800809 R11: 0000000000000000 R12: 0000000000000000 
[  125.628045] R13: ffffa0bba9f3cf90 R14: 0000000000000000 R15: 0000000000000000 
[  125.635178] FS:  0000000000000000(0000) GS:ffffa0c9ff9c0000(0000) knlGS:0000000000000000 
[  125.643264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[  125.649009] CR2: 0000000000000008 CR3: 00000001c9c6c005 CR4: 00000000007706e0 
[  125.656142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[  125.663274] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
[  125.670407] PKRU: 55555554 
[  125.673119] Call Trace: 
[  125.675575]  nvme_tcp_queue_rq+0xef/0x330 [nvme_tcp] 
[  125.680537]  blk_mq_dispatch_rq_list+0x11c/0x7c0 
[  125.685157]  ? blk_mq_flush_busy_ctxs+0xf6/0x110 
[  125.689775]  __blk_mq_sched_dispatch_requests+0x12b/0x170 
[  125.695175]  blk_mq_sched_dispatch_requests+0x30/0x60 
[  125.700227]  __blk_mq_run_hw_queue+0x2b/0x60 
[  125.704500]  process_one_work+0x1cb/0x360 
[  125.708513]  ? process_one_work+0x360/0x360 
[  125.712699]  worker_thread+0x30/0x370 
[  125.716365]  ? process_one_work+0x360/0x360 
[  125.720550]  kthread+0x116/0x130 
[  125.723782]  ? kthread_park+0x80/0x80 
[  125.727448]  ret_from_fork+0x1f/0x30 
[  125.731028] Modules linked in: nvme_tcp nvme_fabrics nvmet_tcp nvmet loop nvme nvme_core rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common isst_if_common skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif mgag200 i2c_algo_bit drm_kms_helper irqbypass iTCO_wdt crct10dif_pclmul iTCO_vendor_support syscopyarea crc32_pclmul sysfillrect sysimgblt ghash_clmulni_intel fb_sys_fops dcdbas rapl drm intel_cstate acpi_ipmi ipmi_si mei_me dell_smbios intel_uncore i2c_i801 mei dax_pmem_compat wmi_bmof dell_wmi_descriptor ipmi_devintf device_dax intel_pch_thermal pcspkr i2c_smbus lpc_ich ipmi_msghandler dax_pmem_core acpi_power_meter ip_tables xfs libcrc32c nd_pmem nd_btt sd_mod t10_pi sg ahci libahci libata tg3 megaraid_sas crc32c_intel nfit wmi libnvdimm dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nvmet] 
[  125.806140] CR2: 0000000000000008 
[  125.809467] ---[ end trace 312795dd33fab339 ]--- 
[  125.824717] RIP: 0010:nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] 
[  125.830465] Code: 8b 75 68 44 8b 45 28 44 8b 7d 30 49 89 d4 48 c1 e2 04 4c 01 f2 45 89 fb 44 89 c7 85 ff 74 4d 44 89 e0 44 8b 55 10 48 c1 e0 04 <41> 8b 5c 06 08 45 0f b6 ca 89 d8 44 29 d8 39 f8 0f 47 c7 41 83 e9 
[  125.849212] RSP: 0018:ffffbd084447bd18 EFLAGS: 00010246 
[  125.854436] RAX: 0000000000000000 RBX: ffffa0bba9f3ce80 RCX: 0000000000000000 
[  125.861560] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000002000000 
[  125.868692] RBP: ffffa0ba8ac6fec0 R08: 0000000002000000 R09: 0000000000000000 
[  125.875817] R10: 0000000002800809 R11: 0000000000000000 R12: 0000000000000000 
[  125.882948] R13: ffffa0bba9f3cf90 R14: 0000000000000000 R15: 0000000000000000 
[  125.890072] FS:  0000000000000000(0000) GS:ffffa0c9ff9c0000(0000) knlGS:0000000000000000 
[  125.898158] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[  125.903897] CR2: 0000000000000008 CR3: 00000001c9c6c005 CR4: 00000000007706e0 
[  125.911029] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[  125.918160] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
[  125.925292] PKRU: 55555554 
[  125.927998] Kernel panic - not syncing: Fatal exception 
[  126.309099] Kernel Offset: 0x2d400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) 
[  126.330633] ---[ end Kernel panic - not syncing: Fatal exception ]--- 
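For readers decoding the trace above: the faulting instruction in the Code: line (41 8b 5c 06 08) is mov ebx, [r14+rax+8], with R14 = 0 and RAX = 0 in the register dump, and CR2 = 0x8 matches the offset of bv_len within struct bio_vec on x86_64. That is consistent with nvme_tcp_init_iter() walking a bio whose bi_io_vec table is NULL, possibly a request whose bio carries no data pages. A minimal sketch of the layout involved (the struct mirrors include/linux/bvec.h; the assertion is illustrative, not driver source):

#include <stddef.h>

/* Mirror of struct bio_vec's layout on x86_64 (include/linux/bvec.h). */
struct bio_vec_sketch {
	void		*bv_page;	/* offset 0, 8 bytes        */
	unsigned int	bv_len;		/* offset 8, matches CR2    */
	unsigned int	bv_offset;	/* offset 12                */
};

/*
 * bio_for_each_bvec() starts from bio->bi_io_vec and reads
 * bvec[bi_idx].bv_len to size each multipage segment.  With a NULL
 * bi_io_vec and bi_idx == 0, the first such read touches address
 * 0 + 0 * sizeof(struct bio_vec) + 8 == 0x8, the address in the oops.
 */
_Static_assert(offsetof(struct bio_vec_sketch, bv_len) == 8,
	       "bv_len sits at offset 8, matching CR2 above");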

Best Regards,
  Yi Zhang


----- Original Message -----
From: "CKI Project" <cki-project@redhat.com>
To: axboe@kernel.dk
Cc: "Yi Zhang" <yizhan@redhat.com>
Sent: Saturday, February 6, 2021 7:36:43 AM
Subject: ✅ PASS: Test report for kernel 5.11.0-rc6 (block)


Hello,

We ran automated tests on a recent commit from this kernel tree:

       Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
            Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next

The results of these automated tests are provided below.

    Overall result: PASSED
             Merge: OK
           Compile: OK
             Tests: OK

All kernel binaries, config files, and logs are available for download here:

  https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefix=datawarehouse-public/2021/02/05/623209

Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.

        ,-.   ,-.
       ( C ) ( K )  Continuous
        `-',-.`-'   Kernel
          ( I )     Integration
           `-'
______________________________________________________________________________

Compile testing
---------------

We compiled the kernel for 4 architectures:

    aarch64:
      make options: make  -j30 INSTALL_MOD_STRIP=1 targz-pkg

    ppc64le:
      make options: make  -j30 INSTALL_MOD_STRIP=1 targz-pkg

    s390x:
      make options: make  -j30 INSTALL_MOD_STRIP=1 targz-pkg

    x86_64:
      make options: make  -j30 INSTALL_MOD_STRIP=1 targz-pkg



Hardware testing
----------------
We booted each kernel and ran the following tests:

  aarch64:
    Host 1:
       ✅ Boot test
       ✅ ACPI table test
       ✅ LTP
       ✅ Loopdev Sanity
       ✅ Memory: fork_mem
       ✅ Memory function: memfd_create
       ✅ AMTU (Abstract Machine Test Utility)
       ✅ storage: SCSI VPD
       🚧 ✅ CIFS Connectathon
       🚧 ✅ POSIX pjd-fstest suites
       🚧 ✅ Ethernet drivers sanity

    Host 2:
       ✅ Boot test
       ✅ storage: software RAID testing
       🚧 ✅ xfstests - ext4
       🚧 ✅ xfstests - xfs
       🚧 ✅ xfstests - btrfs
       🚧 ✅ Storage blktests
       🚧 ✅ Storage block - filesystem fio test
       🚧 ✅ Storage block - queue scheduler test
       🚧 💥 Storage nvme - tcp
       🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test
       🚧 ⚡⚡⚡ stress: stress-ng

  ppc64le:
    Host 1:

       ⚡ Internal infrastructure issues prevented one or more tests (marked
       with ⚡⚡⚡) from running on this architecture.
       This is not the fault of the kernel that was tested.

       ⚡⚡⚡ Boot test
       ⚡⚡⚡ LTP
       ⚡⚡⚡ Loopdev Sanity
       ⚡⚡⚡ Memory: fork_mem
       ⚡⚡⚡ Memory function: memfd_create
       ⚡⚡⚡ AMTU (Abstract Machine Test Utility)
       🚧 ⚡⚡⚡ CIFS Connectathon
       🚧 ⚡⚡⚡ POSIX pjd-fstest suites
       🚧 ⚡⚡⚡ Ethernet drivers sanity

    Host 2:
       ✅ Boot test
       ✅ storage: software RAID testing
       🚧 ✅ xfstests - ext4
       🚧 ✅ xfstests - xfs
       🚧 ✅ xfstests - btrfs
       🚧 ✅ Storage blktests
       🚧 ✅ Storage block - filesystem fio test
       🚧 ✅ Storage block - queue scheduler test
       🚧 💥 Storage nvme - tcp
       🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test

    Host 3:
       ✅ Boot test
       ✅ LTP
       ✅ Loopdev Sanity
       ✅ Memory: fork_mem
       ✅ Memory function: memfd_create
       ✅ AMTU (Abstract Machine Test Utility)
       🚧 ✅ CIFS Connectathon
       🚧 ✅ POSIX pjd-fstest suites
       🚧 ✅ Ethernet drivers sanity

  s390x:
    Host 1:

       ⚡ Internal infrastructure issues prevented one or more tests (marked
       with ⚡⚡⚡) from running on this architecture.
       This is not the fault of the kernel that was tested.

       ✅ Boot test
       🚧 ✅ Storage blktests
       🚧 ⚡⚡⚡ Storage nvme - tcp
       🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test
       🚧 ⚡⚡⚡ stress: stress-ng

    Host 2:
       ✅ Boot test
       ✅ LTP
       ✅ Loopdev Sanity
       ✅ Memory: fork_mem
       ✅ Memory function: memfd_create
       ✅ AMTU (Abstract Machine Test Utility)
       🚧 ✅ CIFS Connectathon
       🚧 ✅ POSIX pjd-fstest suites
       🚧 ✅ Ethernet drivers sanity

  x86_64:
    Host 1:
       ✅ Boot test
       ✅ Storage SAN device stress - qedf driver

    Host 2:
       ✅ Boot test
       ✅ Storage SAN device stress - qla2xxx driver

    Host 3:
       ✅ Boot test
       ✅ storage: software RAID testing
       🚧 ✅ xfstests - ext4
       🚧 ✅ xfstests - xfs
       🚧 ✅ xfstests - btrfs
       🚧 ✅ xfstests - nfsv4.2
       🚧 ✅ xfstests - cifsv3.11
       🚧 ✅ Storage blktests
       🚧 ✅ Storage block - filesystem fio test
       🚧 ✅ Storage block - queue scheduler test
       🚧 💥 Storage nvme - tcp
       🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test
       🚧 ⚡⚡⚡ stress: stress-ng

    Host 4:
       ✅ Boot test
       ✅ Storage SAN device stress - mpt3sas_gen1

    Host 5:
       ✅ Boot test
       ✅ ACPI table test
       ✅ LTP
       ✅ Loopdev Sanity
       ✅ Memory: fork_mem
       ✅ Memory function: memfd_create
       ✅ AMTU (Abstract Machine Test Utility)
       ✅ storage: SCSI VPD
       🚧 ✅ CIFS Connectathon
       🚧 ✅ POSIX pjd-fstest suites
       🚧 ✅ Ethernet drivers sanity

    Host 6:
       ✅ Boot test
       ✅ Storage SAN device stress - lpfc driver

  Test sources: https://gitlab.com/cki-project/kernel-tests
    💚 Pull requests are welcome for new tests or improvements to existing tests!

Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.

Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.

Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012
  2021-02-06  3:08 ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Yi Zhang
@ 2021-02-07  4:50   ` Yi Zhang
  2021-02-07  5:25     ` Chaitanya Kulkarni
                       ` (2 more replies)
  2021-02-07  7:14   ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Chaitanya Kulkarni
  2021-02-08  9:46   ` Sagi Grimberg
  2 siblings, 3 replies; 27+ messages in thread
From: Yi Zhang @ 2021-02-07  4:50 UTC (permalink / raw)
  To: linux-nvme, linux-block; +Cc: axboe, Rachel Sibley, CKI Project, Sagi Grimberg


The issue was introduced after merging the NVMe updates

commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
Merge: 44d10e4b2f2c 0d7389718c32
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Feb 2 07:12:06 2021 -0700

     Merge branch 'for-5.12/drivers' into for-next

     * for-5.12/drivers: (22 commits)
       nvme-tcp: use cancel tagset helper for tear down
       nvme-rdma: use cancel tagset helper for tear down
       nvme-tcp: add clean action for failed reconnection
       nvme-rdma: add clean action for failed reconnection
       nvme-core: add cancel tagset helpers
       nvme-core: get rid of the extra space
       nvme: add tracing of zns commands
       nvme: parse format nvm command details when tracing
       nvme: update enumerations for status codes
       nvmet: add lba to sect conversion helpers
       nvmet: remove extra variable in identify ns
       nvmet: remove extra variable in id-desclist
       nvmet: remove extra variable in smart log nsid
       nvme: refactor ns->ctrl by request
       nvme-tcp: pass multipage bvec to request iov_iter
       nvme-tcp: get rid of unused helper function
       nvme-tcp: fix wrong setting of request iov_iter
       nvme: support command retry delay for admin command
       nvme: constify static attribute_group structs
       nvmet-fc: use RCU proctection for assoc_list


On 2/6/21 11:08 AM, Yi Zhang wrote:
> blktests nvme-tcp/012


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
@ 2021-02-07  5:25     ` Chaitanya Kulkarni
  2021-02-08  6:48       ` Yi Zhang
  2021-02-07  5:48     ` Chaitanya Kulkarni
  2021-02-07  5:58     ` Chaitanya Kulkarni
  2 siblings, 1 reply; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-07  5:25 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme@lists.infradead.org, linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Sagi Grimberg

Yi,

On 2/6/21 20:51, Yi Zhang wrote:
> The issue was introduced after merging the NVMe updates
>
> commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
> Merge: 44d10e4b2f2c 0d7389718c32
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Tue Feb 2 07:12:06 2021 -0700
>
>      Merge branch 'for-5.12/drivers' into for-next
>
>      * for-5.12/drivers: (22 commits)
>        nvme-tcp: use cancel tagset helper for tear down
>        nvme-rdma: use cancel tagset helper for tear down
>        nvme-tcp: add clean action for failed reconnection
>        nvme-rdma: add clean action for failed reconnection
>        nvme-core: add cancel tagset helpers
>        nvme-core: get rid of the extra space
>        nvme: add tracing of zns commands
>        nvme: parse format nvm command details when tracing
>        nvme: update enumerations for status codes
>        nvmet: add lba to sect conversion helpers
>        nvmet: remove extra variable in identify ns
>        nvmet: remove extra variable in id-desclist
>        nvmet: remove extra variable in smart log nsid
>        nvme: refactor ns->ctrl by request
>        nvme-tcp: pass multipage bvec to request iov_iter
>        nvme-tcp: get rid of unused helper function
>        nvme-tcp: fix wrong setting of request iov_iter
>        nvme: support command retry delay for admin command
>        nvme: constify static attribute_group structs
>        nvmet-fc: use RCU proctection for assoc_list
>
>
> On 2/6/21 11:08 AM, Yi Zhang wrote:
>> blktests nvme-tcp/012
Thanks for reporting; you can further bisect this using the following
NVMe tree, if the issue is from the NVMe tree :- http://git.infradead.org/nvme.git
Branch :- 5.12

Looking at the commit log it seems like the nvme-tcp commits might be
the issue, given that you got the problem in nvme_tcp_init_iter();
just eliminating candidates by looking at the code,
commit 9f99a29e5307 could lead to this behavior.

*I may be completely wrong as I'm not an expert in the nvme-tcp host.*
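For context on the suspected area: as its subject says, the commit "nvme-tcp: pass multipage bvec to request iov_iter" from the merge above switches the iov_iter setup from single-page segments to multipage bvecs. A hedged sketch of the difference, using the real block-layer iteration macros (the two counting helpers are illustrative, not driver code):

#include <linux/bio.h>

/* One entry per page: the pre-change view of a bio's payload. */
static int count_single_page_segments(struct bio *bio)
{
	struct bio_vec bv;
	struct bvec_iter iter;
	int nr = 0;

	bio_for_each_segment(bv, bio, iter)
		nr++;
	return nr;
}

/*
 * One entry per physically contiguous range: the multipage view fed
 * to iov_iter_bvec() after the change.  Fewer, larger entries for the
 * same payload, and different assumptions about the bvec table.
 */
static int count_multipage_bvecs(struct bio *bio)
{
	struct bio_vec bv;
	struct bvec_iter iter;
	int nr = 0;

	bio_for_each_bvec(bv, bio, iter)
		nr++;
	return nr;
}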



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
  2021-02-07  5:25     ` Chaitanya Kulkarni
@ 2021-02-07  5:48     ` Chaitanya Kulkarni
  2021-02-07  5:58     ` Chaitanya Kulkarni
  2 siblings, 0 replies; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-07  5:48 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme@lists.infradead.org, linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Sagi Grimberg

On 2/6/21 20:51, Yi Zhang wrote:
> The issue was introduced after merging the NVMe updates
>
> commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
> Merge: 44d10e4b2f2c 0d7389718c32
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Tue Feb 2 07:12:06 2021 -0700
>
>      Merge branch 'for-5.12/drivers' into for-next
>
>      * for-5.12/drivers: (22 commits)
>        nvme-tcp: use cancel tagset helper for tear down
>        nvme-rdma: use cancel tagset helper for tear down
>        nvme-tcp: add clean action for failed reconnection
>        nvme-rdma: add clean action for failed reconnection
>        nvme-core: add cancel tagset helpers
>        nvme-core: get rid of the extra space
>        nvme: add tracing of zns commands
>        nvme: parse format nvm command details when tracing
>        nvme: update enumerations for status codes
>        nvmet: add lba to sect conversion helpers
>        nvmet: remove extra variable in identify ns
>        nvmet: remove extra variable in id-desclist
>        nvmet: remove extra variable in smart log nsid
>        nvme: refactor ns->ctrl by request
>        nvme-tcp: pass multipage bvec to request iov_iter
>        nvme-tcp: get rid of unused helper function
>        nvme-tcp: fix wrong setting of request iov_iter
>        nvme: support command retry delay for admin command
>        nvme: constify static attribute_group structs
>        nvmet-fc: use RCU proctection for assoc_list
>
>
> On 2/6/21 11:08 AM, Yi Zhang wrote:
>> blktests nvme-tcp/012
>
Running the test a few times I got this once, with no sign of an Oops though :-


# nvme_trtype=tcp ./check nvme/012
nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [failed]
    runtime  37.624s  ...  42.272s
    something found in dmesg:
    [   69.198819] run blktests nvme/012 at 2021-02-06 21:32:23
    [   69.330277] loop: module loaded
    [   69.333383] loop0: detected capacity change from 2097152 to 0
    [   69.351091] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
    [   69.372439] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
    [   69.396581] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:e4cfc949-8f19-4db2-a232-ab360b79204a.
    [   69.399482] nvme nvme1: creating 64 I/O queues.
    [   69.414721] nvme nvme1: mapped 64/0/0 default/read/poll queues.
    [   69.448193] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
    [   72.381282] XFS (nvme1n1): Mounting V5 Filesystem
    ...
    (See '/root/blktests/results/nodev/nvme/012.dmesg' for the entire message)
blktests (master) # dmesg  -c
[   38.347661] nvme nvme0: 63/0/0 default/read/poll queues
[   69.198819] run blktests nvme/012 at 2021-02-06 21:32:23
[   69.330277] loop: module loaded
[   69.333383] loop0: detected capacity change from 2097152 to 0
[   69.351091] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   69.372439] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[   69.396581] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:e4cfc949-8f19-4db2-a232-ab360b79204a.
[   69.399482] nvme nvme1: creating 64 I/O queues.
[   69.414721] nvme nvme1: mapped 64/0/0 default/read/poll queues.
[   69.448193] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
[   72.381282] XFS (nvme1n1): Mounting V5 Filesystem
[   72.394603] XFS (nvme1n1): Ending clean mount
[   72.395463] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)

[   73.402838] ======================================================
[   73.403491] WARNING: possible circular locking dependency detected
[   73.404125] 5.11.0-rc6blk+ #166 Tainted: G           OE   
[   73.404690] ------------------------------------------------------
[   73.405332] fio/3671 is trying to acquire lock:
[   73.405803] ffff88881d110cb0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendpage+0x23/0x50
[   73.406627]
but task is already holding lock:
[   73.407246] ffff888128cabcf0 (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xbf/0x250 [xfs]
[   73.408233]
which lock already depends on the new lock.

[   73.409096]
the existing dependency chain (in reverse order) is:
[   73.409885]
-> #3 (&xfs_dir_ilock_class/5){+.+.}-{3:3}:
[   73.410594]        lock_acquire+0xd2/0x390
[   73.411045]        down_write_nested+0x47/0x110
[   73.411530]        xfs_ilock+0xbf/0x250 [xfs]
[   73.412068]        xfs_create+0x1d9/0x6b0 [xfs]
[   73.412626]        xfs_generic_create+0x205/0x2c0 [xfs]
[   73.413255]        lookup_open+0x4f6/0x630
[   73.413710]        path_openat+0x298/0xa80
[   73.414158]        do_filp_open+0x93/0x100
[   73.414606]        do_sys_openat2+0x24b/0x310
[   73.415090]        do_sys_open+0x4b/0x80
[   73.415569]        do_syscall_64+0x33/0x40
[   73.416075]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.416747]
-> #2 (sb_internal){.+.+}-{0:0}:
[   73.417436]        lock_acquire+0xd2/0x390
[   73.417938]        xfs_trans_alloc+0x19e/0x2e0 [xfs]
[   73.418607]        xfs_vn_update_time+0xc8/0x250 [xfs]
[   73.419295]        touch_atime+0x16b/0x200
[   73.419814]        xfs_file_mmap+0xa7/0xb0 [xfs]
[   73.420447]        mmap_region+0x3f6/0x690
[   73.420955]        do_mmap+0x379/0x580
[   73.421415]        vm_mmap_pgoff+0xdf/0x170
[   73.421929]        ksys_mmap_pgoff+0x1dd/0x240
[   73.422477]        do_syscall_64+0x33/0x40
[   73.422980]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.423672]
-> #1 (&mm->mmap_lock#2){++++}-{3:3}:
[   73.424426]        lock_acquire+0xd2/0x390
[   73.424932]        __might_fault+0x5e/0x80
[   73.425433]        _copy_from_user+0x1e/0xa0
[   73.425966]        do_ip_setsockopt.isra.15+0xbba/0x12f0
[   73.426616]        ip_setsockopt+0x34/0x90
[   73.427120]        __sys_setsockopt+0x8f/0x110
[   73.427681]        __x64_sys_setsockopt+0x20/0x30
[   73.428252]        do_syscall_64+0x33/0x40
[   73.428751]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.429422]
-> #0 (sk_lock-AF_INET){+.+.}-{0:0}:
[   73.430154]        validate_chain+0x8fa/0xec0
[   73.430681]        __lock_acquire+0x563/0x940
[   73.431212]        lock_acquire+0xd2/0x390
[   73.431719]        lock_sock_nested+0x72/0xa0
[   73.432252]        tcp_sendpage+0x23/0x50
[   73.432740]        inet_sendpage+0x4f/0x80
[   73.433240]        kernel_sendpage+0x57/0xc0
[   73.433760]        nvme_tcp_try_send+0x137/0x7d0 [nvme_tcp]
[   73.434435]        nvme_tcp_queue_rq+0x317/0x330 [nvme_tcp]
[   73.435104]        __blk_mq_try_issue_directly+0x109/0x1b0
[   73.435768]        blk_mq_request_issue_directly+0x4b/0x80
[   73.436433]        blk_mq_try_issue_list_directly+0x62/0xf0
[   73.437104]        blk_mq_sched_insert_requests+0x192/0x2b0
[   73.437774]        blk_mq_flush_plug_list+0x13c/0x260
[   73.438389]        blk_flush_plug_list+0xd7/0x100
[   73.438960]        blk_finish_plug+0x21/0x30
[   73.439485]        _xfs_buf_ioapply+0x2cc/0x3c0 [xfs]
[   73.440171]        __xfs_buf_submit+0x85/0x220 [xfs]
[   73.440833]        xfs_buf_read_map+0x105/0x2a0 [xfs]
[   73.441511]        xfs_trans_read_buf_map+0x2b2/0x610 [xfs]
[   73.442254]        xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.442874]        xfs_ialloc_read_agi+0x48/0x170 [xfs]
[   73.443568]        xfs_dialloc_select_ag+0x94/0x2a0 [xfs]
[   73.444277]        xfs_dir_ialloc+0x72/0x630 [xfs]
[   73.444927]        xfs_create+0x241/0x6b0 [xfs]
[   73.445550]        xfs_generic_create+0x205/0x2c0 [xfs]
[   73.446255]        lookup_open+0x4f6/0x630
[   73.446759]        path_openat+0x298/0xa80
[   73.447268]        do_filp_open+0x93/0x100
[   73.447770]        do_sys_openat2+0x24b/0x310
[   73.448301]        do_sys_open+0x4b/0x80
[   73.448779]        do_syscall_64+0x33/0x40
[   73.449275]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.449944]
other info that might help us debug this:

[   73.450899] Chain exists of:
  sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5

[   73.452222]  Possible unsafe locking scenario:

[   73.452931]        CPU0                    CPU1
[   73.453479]        ----                    ----
[   73.454024]   lock(&xfs_dir_ilock_class/5);
[   73.454528]                                lock(sb_internal);
[   73.455217]                                lock(&xfs_dir_ilock_class/5);
[   73.456027]   lock(sk_lock-AF_INET);
[   73.456463]
 *** DEADLOCK ***

[   73.457175] 6 locks held by fio/3671:
[   73.457618]  #0: ffff8881185a9488 (sb_writers#9){.+.+}-{0:0}, at: path_openat+0xa61/0xa80
[   73.458604]  #1: ffff888128cabfe0 (&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at: path_openat+0x287/0xa80
[   73.459847]  #2: ffff8881185a96a8 (sb_internal){.+.+}-{0:0}, at: xfs_create+0x1b4/0x6b0 [xfs]
[   73.460926]  #3: ffff888128cabcf0 (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xbf/0x250 [xfs]
[   73.462100]  #4: ffff88881bd54c58 (hctx->srcu){....}-{0:0}, at: hctx_lock+0x62/0xe0
[   73.463029]  #5: ffff888142ab29d0 (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0x2fc/0x330 [nvme_tcp]
[   73.464288]
stack backtrace:
[   73.464824] CPU: 16 PID: 3671 Comm: fio Tainted: G           OE     5.11.0-rc6blk+ #166
[   73.465785] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[   73.467176] Call Trace:
[   73.467483]  dump_stack+0x8d/0xb5
[   73.467910]  check_noncircular+0x119/0x130
[   73.468413]  ? save_trace+0x3d/0x2c0
[   73.468854]  validate_chain+0x8fa/0xec0
[   73.469329]  __lock_acquire+0x563/0x940
[   73.469800]  lock_acquire+0xd2/0x390
[   73.470240]  ? tcp_sendpage+0x23/0x50
[   73.470689]  lock_sock_nested+0x72/0xa0
[   73.471163]  ? tcp_sendpage+0x23/0x50
[   73.471624]  tcp_sendpage+0x23/0x50
[   73.472059]  inet_sendpage+0x4f/0x80
[   73.472504]  kernel_sendpage+0x57/0xc0
[   73.472965]  ? mark_held_locks+0x4f/0x70
[   73.473445]  nvme_tcp_try_send+0x137/0x7d0 [nvme_tcp]
[   73.474062]  nvme_tcp_queue_rq+0x317/0x330 [nvme_tcp]
[   73.474697]  __blk_mq_try_issue_directly+0x109/0x1b0
[   73.475324]  blk_mq_request_issue_directly+0x4b/0x80
[   73.475953]  blk_mq_try_issue_list_directly+0x62/0xf0
[   73.476587]  blk_mq_sched_insert_requests+0x192/0x2b0
[   73.477225]  blk_mq_flush_plug_list+0x13c/0x260
[   73.477802]  blk_flush_plug_list+0xd7/0x100
[   73.478338]  blk_finish_plug+0x21/0x30
[   73.478818]  _xfs_buf_ioapply+0x2cc/0x3c0 [xfs]
[   73.479457]  __xfs_buf_submit+0x85/0x220 [xfs]
[   73.480091]  xfs_buf_read_map+0x105/0x2a0 [xfs]
[   73.480728]  ? xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.481331]  xfs_trans_read_buf_map+0x2b2/0x610 [xfs]
[   73.482036]  ? xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.482635]  xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.483221]  xfs_ialloc_read_agi+0x48/0x170 [xfs]
[   73.483866]  xfs_dialloc_select_ag+0x94/0x2a0 [xfs]
[   73.484513]  xfs_dir_ialloc+0x72/0x630 [xfs]
[   73.485092]  ? xfs_ilock+0xbf/0x250 [xfs]
[   73.485648]  xfs_create+0x241/0x6b0 [xfs]
[   73.486198]  xfs_generic_create+0x205/0x2c0 [xfs]
[   73.486857]  lookup_open+0x4f6/0x630
[   73.487317]  path_openat+0x298/0xa80
[   73.487786]  ? __lock_acquire+0x581/0x940
[   73.488302]  do_filp_open+0x93/0x100
[   73.488769]  ? do_raw_spin_unlock+0x49/0xc0
[   73.489307]  ? _raw_spin_unlock+0x1f/0x30
[   73.489825]  do_sys_openat2+0x24b/0x310
[   73.490321]  do_sys_open+0x4b/0x80
[   73.490764]  do_syscall_64+0x33/0x40
[   73.491230]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.491890] RIP: 0033:0x7f7d6f2eaead
[   73.492353] Code: c5 20 00 00 75 10 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 7e f4 ff ff 48 89 04 24 b8 02 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 c7 f4 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[   73.494713] RSP: 002b:00007ffc072e8910 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[   73.495671] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7d6f2eaead
[   73.496526] RDX: 00000000000001a4 RSI: 0000000000000041 RDI: 00007f7d6d666690
[   73.497381] RBP: 0000000000000000 R08: 0000000000000041 R09: 0000000000000041
[   73.498232] R10: 00007ffc072e85a0 R11: 0000000000000293 R12: 0000000000000000
[   73.499121] R13: 000000003b600000 R14: 00007f7d47136000 R15: 00007f7d6d666510
[  111.135896] XFS (nvme1n1): Unmounting Filesystem
[  111.308034] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
  2021-02-07  5:25     ` Chaitanya Kulkarni
  2021-02-07  5:48     ` Chaitanya Kulkarni
@ 2021-02-07  5:58     ` Chaitanya Kulkarni
  2 siblings, 0 replies; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-07  5:58 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme@lists.infradead.org, linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Sagi Grimberg

On 2/6/21 20:51, Yi Zhang wrote:
> The issue was introduced after merging the NVMe updates
>
> commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
> Merge: 44d10e4b2f2c 0d7389718c32
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Tue Feb 2 07:12:06 2021 -0700
>
>      Merge branch 'for-5.12/drivers' into for-next
>
>      * for-5.12/drivers: (22 commits)
>        nvme-tcp: use cancel tagset helper for tear down
>        nvme-rdma: use cancel tagset helper for tear down
>        nvme-tcp: add clean action for failed reconnection
>        nvme-rdma: add clean action for failed reconnection
>        nvme-core: add cancel tagset helpers
>        nvme-core: get rid of the extra space
>        nvme: add tracing of zns commands
>        nvme: parse format nvm command details when tracing
>        nvme: update enumerations for status codes
>        nvmet: add lba to sect conversion helpers
>        nvmet: remove extra variable in identify ns
>        nvmet: remove extra variable in id-desclist
>        nvmet: remove extra variable in smart log nsid
>        nvme: refactor ns->ctrl by request
>        nvme-tcp: pass multipage bvec to request iov_iter
>        nvme-tcp: get rid of unused helper function
>        nvme-tcp: fix wrong setting of request iov_iter
>        nvme: support command retry delay for admin command
>        nvme: constify static attribute_group structs
>        nvmet-fc: use RCU proctection for assoc_list
>
>
> On 2/6/21 11:08 AM, Yi Zhang wrote:
>> blktests nvme-tcp/012
Can you try the following patch and see if you get this message in your
Oops, so that we can at least eliminate one case?

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 619b0d8f6e38..13d44d155478 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2271,8 +2271,11 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
        req->data_len = blk_rq_nr_phys_segments(rq) ?
                                blk_rq_payload_bytes(rq) : 0;
        req->curr_bio = rq->bio;
-       if (req->curr_bio)
+       if (req->curr_bio) {
+               if (rq->bio != rq->biotail)
+               if (rq->bio != rq->biotail)
+                       printk(KERN_INFO"%s %d req->bio != req->biotail\n", __func__, __LINE__);
                nvme_tcp_init_iter(req, rq_data_dir(rq));
+       }
 
        if (rq_data_dir(rq) == WRITE &&
            req->data_len <= nvme_tcp_inline_data_size(queue))


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-06  3:08 ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Yi Zhang
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
@ 2021-02-07  7:14   ` Chaitanya Kulkarni
  2021-02-08  9:53     ` Sagi Grimberg
  2021-02-08  9:46   ` Sagi Grimberg
  2 siblings, 1 reply; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-07  7:14 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Yi Zhang, linux-nvme@lists.infradead.org, linux-block,
	axboe@kernel.dk, Rachel Sibley, CKI Project

Sagi,

On 2/5/21 19:19, Yi Zhang wrote:
> Hello
>
> We found this kernel NULL pointer issue with the latest linux-block/for-next and it's 100% reproducible; let me know if you need more info/testing, thanks
>
> Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>
> Reproducer: blktests nvme-tcp/012
>
>
> [  124.458121] run blktests nvme/012 at 2021-02-05 21:53:34 
> [  125.525568] BUG: kernel NULL pointer dereference, address: 0000000000000008 
> [  125.532524] #PF: supervisor read access in kernel mode 
> [  125.537665] #PF: error_code(0x0000) - not-present page 
> [  125.542803] PGD 0 P4D 0  
> [  125.545343] Oops: 0000 [#1] SMP NOPTI 
> [  125.549009] CPU: 15 PID: 12069 Comm: kworker/15:2H Tainted: G S        I       5.11.0-rc6+ #1 
> [  125.557528] Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020 
> [  125.565093] Workqueue: kblockd blk_mq_run_work_fn 
> [  125.569797] RIP: 0010:nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] 
> [  125.575544] Code: 8b 75 68 44 8b 45 28 44 8b 7d 30 49 89 d4 48 c1 e2 04 4c 01 f2 45 89 fb 44 89 c7 85 ff 74 4d 44 89 e0 44 8b 55 10 48 c1 e0 04 <41> 8b 5c 06 08 45 0f b6 ca 89 d8 44 29 d8 39 f8 0f 47 c7 41 83 e9 
> [  125.594290] RSP: 0018:ffffbd084447bd18 EFLAGS: 00010246 
> [  125.599515] RAX: 0000000000000000 RBX: ffffa0bba9f3ce80 RCX: 0000000000000000 
> [  125.606648] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000002000000 
> [  125.613781] RBP: ffffa0ba8ac6fec0 R08: 0000000002000000 R09: 0000000000000000 
> [  125.620914] R10: 0000000002800809 R11: 0000000000000000 R12: 0000000000000000 
> [  125.628045] R13: ffffa0bba9f3cf90 R14: 0000000000000000 R15: 0000000000000000 
> [  125.635178] FS:  0000000000000000(0000) GS:ffffa0c9ff9c0000(0000) knlGS:0000000000000000 
> [  125.643264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
> [  125.649009] CR2: 0000000000000008 CR3: 00000001c9c6c005 CR4: 00000000007706e0 
> [  125.656142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
> [  125.663274] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
> [  125.670407] PKRU: 55555554 
> [  125.673119] Call Trace: 
> [  125.675575]  nvme_tcp_queue_rq+0xef/0x330 [nvme_tcp] 
> [  125.680537]  blk_mq_dispatch_rq_list+0x11c/0x7c0 
> [  125.685157]  ? blk_mq_flush_busy_ctxs+0xf6/0x110 
> [  125.689775]  __blk_mq_sched_dispatch_requests+0x12b/0x170 
> [  125.695175]  blk_mq_sched_dispatch_requests+0x30/0x60 
> [  125.700227]  __blk_mq_run_hw_queue+0x2b/0x60 
> [  125.704500]  process_one_work+0x1cb/0x360 
> [  125.708513]  ? process_one_work+0x360/0x360 
> [  125.712699]  worker_thread+0x30/0x370 
> [  125.716365]  ? process_one_work+0x360/0x360 
> [  125.720550]  kthread+0x116/0x130 
> [  125.723782]  ? kthread_park+0x80/0x80 
> [  125.727448]  ret_from_fork+0x1f/0x30 
NVMe TCP does support merging for non-admin queues
(see nvme_tcp_alloc_tagset()).

Based on what is done for bvecs in other places in the
kernel, especially when merging is enabled, the bio split case seems to be
missing from the TCP driver when building the bvec. What I mean is the
following completely untested patch :-

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 619b0d8f6e38..dabb2633b28c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -222,7 +222,9 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
         unsigned int dir)
 {
     struct request *rq = blk_mq_rq_from_pdu(req);
-    struct bio_vec *vec;
+    struct req_iterator rq_iter;
+    struct bio_vec *vec, *tvec;
+    struct bio_vec tmp;
     unsigned int size;
     int nr_bvec;
     size_t offset;
@@ -233,17 +235,29 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
         size = blk_rq_payload_bytes(rq);
         offset = 0;
     } else {
-        struct bio *bio = req->curr_bio;
-        struct bvec_iter bi;
-        struct bio_vec bv;
-
-        vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
-        nr_bvec = 0;
-        bio_for_each_bvec(bv, bio, bi) {
+        rq_for_each_bvec(tmp, rq, rq_iter)
             nr_bvec++;
+
+        if (rq->bio != rq->biotail) {
+            vec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
+                    GFP_NOIO);
+            if (!vec)
+                return;
+
+            tvec = vec;
+            rq_for_each_bvec(tmp, rq, rq_iter) {
+                *vec = tmp;
+                vec++;
+            }
+            vec = tvec;
+            offset = 0;
+        } else {
+            struct bio *bio = req->curr_bio;
+
+            vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+            size = bio->bi_iter.bi_size;
+            offset = bio->bi_iter.bi_bvec_done;
         }
-        size = bio->bi_iter.bi_size;
-        offset = bio->bi_iter.bi_bvec_done;
     }
 
     iov_iter_bvec(&req->iter, dir, vec, nr_bvec, size);
@@ -2271,8 +2285,11 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
     req->data_len = blk_rq_nr_phys_segments(rq) ?
                 blk_rq_payload_bytes(rq) : 0;
     req->curr_bio = rq->bio;
-    if (req->curr_bio)
+    if (req->curr_bio) {
+        if (rq->bio != rq->biotail)
+            printk(KERN_INFO"%s %d req->bio != req->biotail\n", __func__, __LINE__);
         nvme_tcp_init_iter(req, rq_data_dir(rq));
+    }
 
     if (rq_data_dir(rq) == WRITE &&
         req->data_len <= nvme_tcp_inline_data_size(queue))
-- 
2.22.1


Feel free to ignore this as I might be *completely wrong* here, since I don't
have a reproducer; everything is working just fine on my machine.








^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012
  2021-02-07  5:25     ` Chaitanya Kulkarni
@ 2021-02-08  6:48       ` Yi Zhang
  0 siblings, 0 replies; 27+ messages in thread
From: Yi Zhang @ 2021-02-08  6:48 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme@lists.infradead.org, linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, Sagi Grimberg, CKI Project



On 2/7/21 1:25 PM, Chaitanya Kulkarni wrote:
> Yi,
>
> On 2/6/21 20:51, Yi Zhang wrote:
>> The issue was introduced after merging the NVMe updates
>>
>> commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
>> Merge: 44d10e4b2f2c 0d7389718c32
>> Author: Jens Axboe <axboe@kernel.dk>
>> Date:   Tue Feb 2 07:12:06 2021 -0700
>>
>>       Merge branch 'for-5.12/drivers' into for-next
>>
>>       * for-5.12/drivers: (22 commits)
>>         nvme-tcp: use cancel tagset helper for tear down
>>         nvme-rdma: use cancel tagset helper for tear down
>>         nvme-tcp: add clean action for failed reconnection
>>         nvme-rdma: add clean action for failed reconnection
>>         nvme-core: add cancel tagset helpers
>>         nvme-core: get rid of the extra space
>>         nvme: add tracing of zns commands
>>         nvme: parse format nvm command details when tracing
>>         nvme: update enumerations for status codes
>>         nvmet: add lba to sect conversion helpers
>>         nvmet: remove extra variable in identify ns
>>         nvmet: remove extra variable in id-desclist
>>         nvmet: remove extra variable in smart log nsid
>>         nvme: refactor ns->ctrl by request
>>         nvme-tcp: pass multipage bvec to request iov_iter
>>         nvme-tcp: get rid of unused helper function
>>         nvme-tcp: fix wrong setting of request iov_iter
>>         nvme: support command retry delay for admin command
>>         nvme: constify static attribute_group structs
>>         nvmet-fc: use RCU proctection for assoc_list
>>
>>
>> On 2/6/21 11:08 AM, Yi Zhang wrote:
>>> blktests nvme-tcp/012
> Thanks for reporting; you can further bisect this using the following
> NVMe tree, if the issue is from the NVMe tree :- http://git.infradead.org/nvme.git
> Branch :- 5.12
I've tried that branch, but encountered another NULL pointer.

[  224.290720] run blktests nvme/012 at 2021-02-08 01:46:25
[  224.515479] BUG: kernel NULL pointer dereference, address: 0000000000000368
[  224.522442] #PF: supervisor read access in kernel mode
[  224.527590] #PF: error_code(0x0000) - not-present page
[  224.532737] PGD 0 P4D 0
[  224.535276] Oops: 0000 [#1] SMP PTI
[  224.538770] CPU: 0 PID: 2403 Comm: multipath Tainted: G S I 5.11.0-rc5+ #1
[  224.546776] Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.10.0 11/12/2020
[  224.554340] RIP: 0010:bio_associate_blkg_from_css+0xbd/0x2c0
[  224.560009] Code: 03 75 79 65 48 ff 00 e8 b1 94 cc ff e8 ac 94 cc ff 49 89 5c 24 48 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 44 24 08 <48> 8b 808
[  224.578761] RSP: 0018:ffffad748ec5bd78 EFLAGS: 00010246
[  224.583988] RAX: 0000000000000000 RBX: ffffa0e96452b3c0 RCX: 0000000000000000
[  224.591120] RDX: ffffa11077088000 RSI: ffffffffa06edb20 RDI: ffffa0e96452b3c0
[  224.598253] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
[  224.605385] R10: 0000000000000022 R11: 0000000000000c10 R12: ffffa0e96452b3c0
[  224.612516] R13: ffffffffa06edb20 R14: 0000000000001000 R15: 0000000000000000
[  224.619649] FS:  00007fb6647feb00(0000) GS:ffffa107f0000000(0000) knlGS:0000000000000000
[  224.627734] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  224.633479] CR2: 0000000000000368 CR3: 0000002884b54002 CR4: 00000000007706f0
[  224.640611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  224.647751] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  224.654884] PKRU: 55555554
[  224.657595] Call Trace:
[  224.660043]  bio_associate_blkg+0x20/0x70
[  224.664055]  nvme_submit_user_cmd+0xd7/0x340 [nvme_core]
[  224.669375]  nvme_user_cmd.isra.82+0x120/0x190 [nvme_core]
[  224.674859]  nvme_dev_ioctl+0x13b/0x150 [nvme_core]
[  224.679738]  __x64_sys_ioctl+0x84/0xc0
[  224.683500]  do_syscall_64+0x33/0x40
[  224.687088]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  224.692149] RIP: 0033:0x7fb66338161b
[  224.695726] Code: 0f 1e fa 48 8b 05 6d b8 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 018
[  224.714469] RSP: 002b:00007ffeae59aba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  224.722036] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb66338161b
[  224.729168] RDX: 00007ffeae59abb0 RSI: 00000000c0484e41 RDI: 0000000000000003
[  224.736300] RBP: 00007ffeae59ac10 R08: 0000559af0bb3d20 R09: 00007fb66340e580
[  224.743431] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffeae59bcb0
[  224.750563] R13: 0000559af0bb3d20 R14: 0000000000000003 R15: 0000000000000000
[  224.757697] Modules linked in: loop nvme_tcp nvme_fabrics nvme_core nvmet_tcp nvmet rfkill sunrpc dm_multipath intel_rapl_msr intel_rapl_common isst_if_cod
[  224.833264] CR2: 0000000000000368
[  224.836604] ---[ end trace fa77e7cc26f8cfc1 ]---
[  224.883915] RIP: 0010:bio_associate_blkg_from_css+0xbd/0x2c0
[  224.889584] Code: 03 75 79 65 48 ff 00 e8 b1 94 cc ff e8 ac 94 cc ff 49 89 5c 24 48 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 44 24 08 <48> 8b 808
[  224.908329] RSP: 0018:ffffad748ec5bd78 EFLAGS: 00010246
[  224.913554] RAX: 0000000000000000 RBX: ffffa0e96452b3c0 RCX: 0000000000000000
[  224.920686] RDX: ffffa11077088000 RSI: ffffffffa06edb20 RDI: ffffa0e96452b3c0
[  224.927818] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
[  224.934948] R10: 0000000000000022 R11: 0000000000000c10 R12: ffffa0e96452b3c0
[  224.942082] R13: ffffffffa06edb20 R14: 0000000000001000 R15: 0000000000000000
[  224.949213] FS:  00007fb6647feb00(0000) GS:ffffa107f0000000(0000) knlGS:0000000000000000
[  224.957301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  224.963044] CR2: 0000000000000368 CR3: 0000002884b54002 CR4: 00000000007706f0
[  224.970177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  224.977309] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  224.984441] PKRU: 55555554
[  224.987155] Kernel panic - not syncing: Fatal exception
[  225.425279] Kernel Offset: 0x1d400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  225.478268] ---[ end Kernel panic - not syncing: Fatal exception ]---

> Looking at the commit log it seems like the nvme-tcp commits might be
> the issue, given that you got the problem in nvme_tcp_init_iter();
> just eliminating candidates by looking at the code,
> commit 9f99a29e5307 could lead to this behavior.
>
> *I may be completely wrong as I'm not an expert in the nvme-tcp host.*
>
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-06  3:08 ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Yi Zhang
  2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
  2021-02-07  7:14   ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Chaitanya Kulkarni
@ 2021-02-08  9:46   ` Sagi Grimberg
       [not found]     ` <5848858e-239d-acb2-fa24-c371a3360557@redhat.com>
  2 siblings, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-08  9:46 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme, linux-block; +Cc: axboe, Rachel Sibley, CKI Project


> Hello
> 
> We found this kernel NULL pointer issue with the latest linux-block/for-next and it's 100% reproducible; let me know if you need more info/testing, thanks
> 
> Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
> 
> Reproducer: blktests nvme-tcp/012

Thanks for reporting Ming, I've tried to reproduce this on my VM
but did not succeed. Given that you have it 100% reproducible,
can you try to revert commit:

0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter

Also, would it be possible to share your config?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-07  7:14   ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Chaitanya Kulkarni
@ 2021-02-08  9:53     ` Sagi Grimberg
  0 siblings, 0 replies; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-08  9:53 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: axboe@kernel.dk, Rachel Sibley, Yi Zhang,
	linux-nvme@lists.infradead.org, linux-block, CKI Project



On 2/6/21 11:14 PM, Chaitanya Kulkarni wrote:
> Sagi,
> 
> On 2/5/21 19:19, Yi Zhang wrote:
>> Hello
>>
>> We found this kernel NULL pointer issue with the latest linux-block/for-next and it's 100% reproducible; let me know if you need more info/testing, thanks
>>
>> Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>
>> Reproducer: blktests nvme-tcp/012
>>
>>
>> [  124.458121] run blktests nvme/012 at 2021-02-05 21:53:34
>> [  125.525568] BUG: kernel NULL pointer dereference, address: 0000000000000008
>> [  125.532524] #PF: supervisor read access in kernel mode
>> [  125.537665] #PF: error_code(0x0000) - not-present page
>> [  125.542803] PGD 0 P4D 0
>> [  125.545343] Oops: 0000 [#1] SMP NOPTI
>> [  125.549009] CPU: 15 PID: 12069 Comm: kworker/15:2H Tainted: G S        I       5.11.0-rc6+ #1
>> [  125.557528] Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020
>> [  125.565093] Workqueue: kblockd blk_mq_run_work_fn
>> [  125.569797] RIP: 0010:nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
>> [  125.575544] Code: 8b 75 68 44 8b 45 28 44 8b 7d 30 49 89 d4 48 c1 e2 04 4c 01 f2 45 89 fb 44 89 c7 85 ff 74 4d 44 89 e0 44 8b 55 10 48 c1 e0 04 <41> 8b 5c 06 08 45 0f b6 ca 89 d8 44 29 d8 39 f8 0f 47 c7 41 83 e9
>> [  125.594290] RSP: 0018:ffffbd084447bd18 EFLAGS: 00010246
>> [  125.599515] RAX: 0000000000000000 RBX: ffffa0bba9f3ce80 RCX: 0000000000000000
>> [  125.606648] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000002000000
>> [  125.613781] RBP: ffffa0ba8ac6fec0 R08: 0000000002000000 R09: 0000000000000000
>> [  125.620914] R10: 0000000002800809 R11: 0000000000000000 R12: 0000000000000000
>> [  125.628045] R13: ffffa0bba9f3cf90 R14: 0000000000000000 R15: 0000000000000000
>> [  125.635178] FS:  0000000000000000(0000) GS:ffffa0c9ff9c0000(0000) knlGS:0000000000000000
>> [  125.643264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  125.649009] CR2: 0000000000000008 CR3: 00000001c9c6c005 CR4: 00000000007706e0
>> [  125.656142] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  125.663274] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  125.670407] PKRU: 55555554
>> [  125.673119] Call Trace:
>> [  125.675575]  nvme_tcp_queue_rq+0xef/0x330 [nvme_tcp]
>> [  125.680537]  blk_mq_dispatch_rq_list+0x11c/0x7c0
>> [  125.685157]  ? blk_mq_flush_busy_ctxs+0xf6/0x110
>> [  125.689775]  __blk_mq_sched_dispatch_requests+0x12b/0x170
>> [  125.695175]  blk_mq_sched_dispatch_requests+0x30/0x60
>> [  125.700227]  __blk_mq_run_hw_queue+0x2b/0x60
>> [  125.704500]  process_one_work+0x1cb/0x360
>> [  125.708513]  ? process_one_work+0x360/0x360
>> [  125.712699]  worker_thread+0x30/0x370
>> [  125.716365]  ? process_one_work+0x360/0x360
>> [  125.720550]  kthread+0x116/0x130
>> [  125.723782]  ? kthread_park+0x80/0x80
>> [  125.727448]  ret_from_fork+0x1f/0x30
> NVMe TCP does support merging for non-admin queues
> (see nvme_tcp_alloc_tagset()).
> 
> Based on what is done for bvecs in other places in the
> kernel, especially when merging is enabled, the bio split case seems to be
> missing from the TCP driver when building the bvec. What I mean is the
> following completely untested patch :-
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 619b0d8f6e38..dabb2633b28c 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -222,7 +222,9 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>           unsigned int dir)
>   {
>       struct request *rq = blk_mq_rq_from_pdu(req);
> -    struct bio_vec *vec;
> +    struct req_iterator rq_iter;
> +    struct bio_vec *vec, *tvec;
> +    struct bio_vec tmp;
>       unsigned int size;
>       int nr_bvec;
>       size_t offset;
> @@ -233,17 +235,29 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>           size = blk_rq_payload_bytes(rq);
>           offset = 0;
>       } else {
> -        struct bio *bio = req->curr_bio;
> -        struct bvec_iter bi;
> -        struct bio_vec bv;
> -
> -        vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> -        nr_bvec = 0;
> -        bio_for_each_bvec(bv, bio, bi) {
> +        rq_for_each_bvec(tmp, rq, rq_iter)
>               nr_bvec++;
> +
> +        if (rq->bio != rq->biotail) {
> +            vec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
> +                    GFP_NOIO);
> +            if (!vec)
> +                return;
> +
> +            tvec = vec;
> +            rq_for_each_bvec(tmp, rq, rq_iter) {
> +                *vec = tmp;
> +                vec++;
> +            }
> +            vec = tvec;
> +            offset = 0;
> +        } else {
> +            struct bio *bio = req->curr_bio;
> +
> +            vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> +            size = bio->bi_iter.bi_size;
> +            offset = bio->bi_iter.bi_bvec_done;
>           }
> -        size = bio->bi_iter.bi_size;
> -        offset = bio->bi_iter.bi_bvec_done;
>       }
>   
>       iov_iter_bvec(&req->iter, dir, vec, nr_bvec, size);
> @@ -2271,8 +2285,11 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
>       req->data_len = blk_rq_nr_phys_segments(rq) ?
>                   blk_rq_payload_bytes(rq) : 0;
>       req->curr_bio = rq->bio;
> -    if (req->curr_bio)
> +    if (req->curr_bio) {
> +        if (rq->bio != rq->biotail)
> +            printk(KERN_INFO"%s %d req->bio != req->biotail\n", __func__, __LINE__);
>           nvme_tcp_init_iter(req, rq_data_dir(rq));
> +    }
>   
>       if (rq_data_dir(rq) == WRITE &&
>           req->data_len <= nvme_tcp_inline_data_size(queue))
> 

That cannot work; nvme-tcp basically initializes the iter based
on the request's bios, and as it iterates it may continue onto the next
bio if the request spans more than one. We could change that, but it
would mean allocating in the fast path, which is something I'd prefer
to avoid. This is a regression, so it must be coming from one of the
latest patches, and that is what should be addressed.
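For reference, the allocation-free scheme described here works by never building a flat bvec array for the whole request: req->iter only ever covers req->curr_bio, and the send path hops to the next chained bio once the current one is consumed. A sketch reconstructed from the behavior described in this thread (curr_bio, data_len and iter appear in the diffs above; the data_sent running total is assumed here), not the exact driver source:

/*
 * Sketch: after sending len bytes, advance the iter; when the current
 * bio is fully consumed but the request still has data, move to the
 * next chained bio and re-initialize the iter -- no allocation needed
 * in the fast path.
 */
static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
		int len)
{
	req->data_sent += len;		/* assumed running total */
	iov_iter_advance(&req->iter, len);
	if (!iov_iter_count(&req->iter) &&
	    req->data_sent < req->data_len) {
		req->curr_bio = req->curr_bio->bi_next;
		nvme_tcp_init_iter(req, WRITE);
	}
}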

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
       [not found]     ` <5848858e-239d-acb2-fa24-c371a3360557@redhat.com>
@ 2021-02-08 17:54       ` Sagi Grimberg
  2021-02-08 18:42         ` Sagi Grimberg
  2021-02-09 10:25       ` Sagi Grimberg
  1 sibling, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-08 17:54 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni


> Hi Sagi
> 
> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>
>>> Hello
>>>
>>> We found this kernel NULL pointer issue with the latest 
>>> linux-block/for-next and it's 100% reproducible; let me know if you 
>>> need more info/testing, thanks
>>>
>>> Kernel repo: 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>
>>> Reproducer: blktests nvme-tcp/012
>>
>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>> but did not succeed. Given that you have it 100% reproducible,
>> can you try to revert commit:
>>
>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>
> 
> Reverting this commit fixed the issue and I've attached the config. :)

Good to know,

I see some differences that I should probably change to hit this:
--
@@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
  # end of Kernel Performance Events And Counters

  CONFIG_VM_EVENT_COUNTERS=y
+CONFIG_SLUB_DEBUG=y
  # CONFIG_COMPAT_BRK is not set
-CONFIG_SLAB=y
-# CONFIG_SLUB is not set
-# CONFIG_SLOB is not set
-CONFIG_SLAB_MERGE_DEFAULT=y
-# CONFIG_SLAB_FREELIST_RANDOM is not set
+# CONFIG_SLAB is not set
+CONFIG_SLUB=y
+# CONFIG_SLAB_MERGE_DEFAULT is not set
+CONFIG_SLAB_FREELIST_RANDOM=y
  # CONFIG_SLAB_FREELIST_HARDENED is not set
-# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
+CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
+CONFIG_SLUB_CPU_PARTIAL=y
  CONFIG_SYSTEM_DATA_VERIFICATION=y
  CONFIG_PROFILING=y
  CONFIG_TRACEPOINTS=y
@@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
  CONFIG_X86_64_SMP=y
  CONFIG_ARCH_SUPPORTS_UPROBES=y
  CONFIG_FIX_EARLYCON_MEM=y
-CONFIG_PGTABLE_LEVELS=4
+CONFIG_DYNAMIC_PHYSICAL_MASK=y
+CONFIG_PGTABLE_LEVELS=5
  CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
--

Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-08 17:54       ` Sagi Grimberg
@ 2021-02-08 18:42         ` Sagi Grimberg
  2021-02-09  4:21           ` Ming Lei
  0 siblings, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-08 18:42 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni


>> Hi Sagi
>>
>> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>>
>>>> Hello
>>>>
>>>> We found this kernel NULL pointer issue with the latest 
>>>> linux-block/for-next and it's 100% reproducible; let me know if you 
>>>> need more info/testing, thanks
>>>>
>>>> Kernel repo: 
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>>
>>>> Reproducer: blktests nvme-tcp/012
>>>
>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>> but did not succeed. Given that you have it 100% reproducible,
>>> can you try to revert commit:
>>>
>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>
>>
>> Revert this commit fixed the issue and I've attached the config. :)
> 
> Good to know,
> 
> I see some differences that I should probably change to hit this:
> -- 
> @@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
>   # end of Kernel Performance Events And Counters
> 
>   CONFIG_VM_EVENT_COUNTERS=y
> +CONFIG_SLUB_DEBUG=y
>   # CONFIG_COMPAT_BRK is not set
> -CONFIG_SLAB=y
> -# CONFIG_SLUB is not set
> -# CONFIG_SLOB is not set
> -CONFIG_SLAB_MERGE_DEFAULT=y
> -# CONFIG_SLAB_FREELIST_RANDOM is not set
> +# CONFIG_SLAB is not set
> +CONFIG_SLUB=y
> +# CONFIG_SLAB_MERGE_DEFAULT is not set
> +CONFIG_SLAB_FREELIST_RANDOM=y
>   # CONFIG_SLAB_FREELIST_HARDENED is not set
> -# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
> +CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> +CONFIG_SLUB_CPU_PARTIAL=y
>   CONFIG_SYSTEM_DATA_VERIFICATION=y
>   CONFIG_PROFILING=y
>   CONFIG_TRACEPOINTS=y
> @@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
>   CONFIG_X86_64_SMP=y
>   CONFIG_ARCH_SUPPORTS_UPROBES=y
>   CONFIG_FIX_EARLYCON_MEM=y
> -CONFIG_PGTABLE_LEVELS=4
> +CONFIG_DYNAMIC_PHYSICAL_MASK=y
> +CONFIG_PGTABLE_LEVELS=5
>   CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
> -- 
> 
> Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.

Used your profile and this still does not happen :(

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-08 18:42         ` Sagi Grimberg
@ 2021-02-09  4:21           ` Ming Lei
  2021-02-09  7:21             ` Sagi Grimberg
  0 siblings, 1 reply; 27+ messages in thread
From: Ming Lei @ 2021-02-09  4:21 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project

On Mon, Feb 08, 2021 at 10:42:28AM -0800, Sagi Grimberg wrote:
> 
> > > Hi Sagi
> > > 
> > > On 2/8/21 5:46 PM, Sagi Grimberg wrote:
> > > > 
> > > > > Hello
> > > > > 
> > > > > We found this kernel NULL pointer issue with latest
> > > > > linux-block/for-next and it's 100% reproduced, let me know
> > > > > if you need more info/testing, thanks
> > > > > 
> > > > > Kernel repo:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
> > > > > Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
> > > > > 
> > > > > Reproducer: blktests nvme-tcp/012
> > > > 
> > > > Thanks for reporting Ming, I've tried to reproduce this on my VM
> > > > but did not succeed. Given that you have it 100% reproducible,
> > > > can you try to revert commit:
> > > > 
> > > > 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
> > > > 
> > > 
> > > Revert this commit fixed the issue and I've attached the config. :)
> > 
> > Good to know,
> > 
> > I see some differences that I should probably change to hit this:
> > -- 
> > @@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
> >   # end of Kernel Performance Events And Counters
> > 
> >   CONFIG_VM_EVENT_COUNTERS=y
> > +CONFIG_SLUB_DEBUG=y
> >   # CONFIG_COMPAT_BRK is not set
> > -CONFIG_SLAB=y
> > -# CONFIG_SLUB is not set
> > -# CONFIG_SLOB is not set
> > -CONFIG_SLAB_MERGE_DEFAULT=y
> > -# CONFIG_SLAB_FREELIST_RANDOM is not set
> > +# CONFIG_SLAB is not set
> > +CONFIG_SLUB=y
> > +# CONFIG_SLAB_MERGE_DEFAULT is not set
> > +CONFIG_SLAB_FREELIST_RANDOM=y
> >   # CONFIG_SLAB_FREELIST_HARDENED is not set
> > -# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
> > +CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> > +CONFIG_SLUB_CPU_PARTIAL=y
> >   CONFIG_SYSTEM_DATA_VERIFICATION=y
> >   CONFIG_PROFILING=y
> >   CONFIG_TRACEPOINTS=y
> > @@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
> >   CONFIG_X86_64_SMP=y
> >   CONFIG_ARCH_SUPPORTS_UPROBES=y
> >   CONFIG_FIX_EARLYCON_MEM=y
> > -CONFIG_PGTABLE_LEVELS=4
> > +CONFIG_DYNAMIC_PHYSICAL_MASK=y
> > +CONFIG_PGTABLE_LEVELS=5
> >   CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
> > -- 
> > 
> > Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.
> 
> Used your profile and this still does not happen :(

One obvious error is that nr_segments is computed wrong.

Yi, can you try the following patch?

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 881d28eb15e9..a393d99b74e1 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
 		offset = 0;
 	} else {
 		struct bio *bio = req->curr_bio;
+		struct bio_vec bv;
+		struct bvec_iter iter;
+
+		nsegs = 0;
+		bio_for_each_bvec(bv, bio, iter)
+			nsegs++;
 
 		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
-		nsegs = bio_segments(bio);
 		size = bio->bi_iter.bi_size;
 		offset = bio->bi_iter.bi_bvec_done;
 	}

-- 
Ming


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09  4:21           ` Ming Lei
@ 2021-02-09  7:21             ` Sagi Grimberg
  2021-02-09  7:50               ` Ming Lei
  0 siblings, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09  7:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project



On 2/8/21 8:21 PM, Ming Lei wrote:
> On Mon, Feb 08, 2021 at 10:42:28AM -0800, Sagi Grimberg wrote:
>>
>>>> Hi Sagi
>>>>
>>>> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> We found this kernel NULL pointer issue with latest
>>>>>> linux-block/for-next and it's 100% reproduced, let me know
>>>>>> if you need more info/testing, thanks
>>>>>>
>>>>>> Kernel repo:
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>>>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>>>>
>>>>>> Reproducer: blktests nvme-tcp/012
>>>>>
>>>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>>>> but did not succeed. Given that you have it 100% reproducible,
>>>>> can you try to revert commit:
>>>>>
>>>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>>>
>>>>
>>>> Revert this commit fixed the issue and I've attached the config. :)
>>>
>>> Good to know,
>>>
>>> I see some differences that I should probably change to hit this:
>>> -- 
>>> @@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
>>>    # end of Kernel Performance Events And Counters
>>>
>>>    CONFIG_VM_EVENT_COUNTERS=y
>>> +CONFIG_SLUB_DEBUG=y
>>>    # CONFIG_COMPAT_BRK is not set
>>> -CONFIG_SLAB=y
>>> -# CONFIG_SLUB is not set
>>> -# CONFIG_SLOB is not set
>>> -CONFIG_SLAB_MERGE_DEFAULT=y
>>> -# CONFIG_SLAB_FREELIST_RANDOM is not set
>>> +# CONFIG_SLAB is not set
>>> +CONFIG_SLUB=y
>>> +# CONFIG_SLAB_MERGE_DEFAULT is not set
>>> +CONFIG_SLAB_FREELIST_RANDOM=y
>>>    # CONFIG_SLAB_FREELIST_HARDENED is not set
>>> -# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
>>> +CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
>>> +CONFIG_SLUB_CPU_PARTIAL=y
>>>    CONFIG_SYSTEM_DATA_VERIFICATION=y
>>>    CONFIG_PROFILING=y
>>>    CONFIG_TRACEPOINTS=y
>>> @@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
>>>    CONFIG_X86_64_SMP=y
>>>    CONFIG_ARCH_SUPPORTS_UPROBES=y
>>>    CONFIG_FIX_EARLYCON_MEM=y
>>> -CONFIG_PGTABLE_LEVELS=4
>>> +CONFIG_DYNAMIC_PHYSICAL_MASK=y
>>> +CONFIG_PGTABLE_LEVELS=5
>>>    CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
>>> -- 
>>>
>>> Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.
>>
>> Used your profile and this still does not happen :(
> 
> One obvious error is that nr_segments is computed wrong.
> 
> Yi, can you try the following patch?
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 881d28eb15e9..a393d99b74e1 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>   		offset = 0;
>   	} else {
>   		struct bio *bio = req->curr_bio;
> +		struct bio_vec bv;
> +		struct bvec_iter iter;
> +
> +		nsegs = 0;
> +		bio_for_each_bvec(bv, bio, iter)
> +			nsegs++;
>   
>   		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> -		nsegs = bio_segments(bio);

This was exactly the patch that caused the issue.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09  7:21             ` Sagi Grimberg
@ 2021-02-09  7:50               ` Ming Lei
  2021-02-09  8:34                 ` Chaitanya Kulkarni
  2021-02-09 10:07                 ` Sagi Grimberg
  0 siblings, 2 replies; 27+ messages in thread
From: Ming Lei @ 2021-02-09  7:50 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project

On Mon, Feb 08, 2021 at 11:21:53PM -0800, Sagi Grimberg wrote:
> 
> 
> On 2/8/21 8:21 PM, Ming Lei wrote:
> > On Mon, Feb 08, 2021 at 10:42:28AM -0800, Sagi Grimberg wrote:
> > > 
> > > > > Hi Sagi
> > > > > 
> > > > > On 2/8/21 5:46 PM, Sagi Grimberg wrote:
> > > > > > 
> > > > > > > Hello
> > > > > > > 
> > > > > > > We found this kernel NULL pointer issue with latest
> > > > > > > linux-block/for-next and it's 100% reproduced, let me know
> > > > > > > if you need more info/testing, thanks
> > > > > > > 
> > > > > > > Kernel repo:
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
> > > > > > > Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
> > > > > > > 
> > > > > > > Reproducer: blktests nvme-tcp/012
> > > > > > 
> > > > > > Thanks for reporting Ming, I've tried to reproduce this on my VM
> > > > > > but did not succeed. Given that you have it 100% reproducible,
> > > > > > can you try to revert commit:
> > > > > > 
> > > > > > 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
> > > > > > 
> > > > > 
> > > > > Revert this commit fixed the issue and I've attached the config. :)
> > > > 
> > > > Good to know,
> > > > 
> > > > I see some differences that I should probably change to hit this:
> > > > -- 
> > > > @@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
> > > >    # end of Kernel Performance Events And Counters
> > > > 
> > > >    CONFIG_VM_EVENT_COUNTERS=y
> > > > +CONFIG_SLUB_DEBUG=y
> > > >    # CONFIG_COMPAT_BRK is not set
> > > > -CONFIG_SLAB=y
> > > > -# CONFIG_SLUB is not set
> > > > -# CONFIG_SLOB is not set
> > > > -CONFIG_SLAB_MERGE_DEFAULT=y
> > > > -# CONFIG_SLAB_FREELIST_RANDOM is not set
> > > > +# CONFIG_SLAB is not set
> > > > +CONFIG_SLUB=y
> > > > +# CONFIG_SLAB_MERGE_DEFAULT is not set
> > > > +CONFIG_SLAB_FREELIST_RANDOM=y
> > > >    # CONFIG_SLAB_FREELIST_HARDENED is not set
> > > > -# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
> > > > +CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
> > > > +CONFIG_SLUB_CPU_PARTIAL=y
> > > >    CONFIG_SYSTEM_DATA_VERIFICATION=y
> > > >    CONFIG_PROFILING=y
> > > >    CONFIG_TRACEPOINTS=y
> > > > @@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
> > > >    CONFIG_X86_64_SMP=y
> > > >    CONFIG_ARCH_SUPPORTS_UPROBES=y
> > > >    CONFIG_FIX_EARLYCON_MEM=y
> > > > -CONFIG_PGTABLE_LEVELS=4
> > > > +CONFIG_DYNAMIC_PHYSICAL_MASK=y
> > > > +CONFIG_PGTABLE_LEVELS=5
> > > >    CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
> > > > -- 
> > > > 
> > > > Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.
> > > 
> > > Used your profile and this still does not happen :(
> > 
> > One obvious error is that nr_segments is computed wrong.
> > 
> > Yi, can you try the following patch?
> > 
> > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> > index 881d28eb15e9..a393d99b74e1 100644
> > --- a/drivers/nvme/host/tcp.c
> > +++ b/drivers/nvme/host/tcp.c
> > @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
> >   		offset = 0;
> >   	} else {
> >   		struct bio *bio = req->curr_bio;
> > +		struct bio_vec bv;
> > +		struct bvec_iter iter;
> > +
> > +		nsegs = 0;
> > +		bio_for_each_bvec(bv, bio, iter)
> > +			nsegs++;
> >   		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> > -		nsegs = bio_segments(bio);
> 
> This was exactly the patch that caused the issue.

What was the issue you are talking about? Any link or commit hash?

nvme-tcp builds a bvec iov_iter from __bvec_iter_bvec(), so the segment
count has to be the actual number of bvecs. But bio_segments() just
returns the number of single-page segments, which is wrong for the
iov_iter.

Please see the same usage in lo_rw_aio().
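
To make the distinction concrete, here is a hedged sketch (illustrative
helpers, not kernel code) of the two counting schemes:
--
#include <linux/bio.h>

/* What bio_segments() counts: single-page segments. */
static unsigned int count_singlepage_segments(struct bio *bio)
{
	struct bio_vec bv;
	struct bvec_iter iter;
	unsigned int n = 0;

	bio_for_each_segment(bv, bio, iter)
		n++;
	return n;
}

/* What a bvec iov_iter built on bio->bi_io_vec actually needs: the
 * number of (possibly multi-page) bvecs in the underlying table. */
static unsigned int count_multipage_bvecs(struct bio *bio)
{
	struct bio_vec bv;
	struct bvec_iter iter;
	unsigned int n = 0;

	bio_for_each_bvec(bv, bio, iter)
		n++;
	return n;
}
--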

-- 
Ming


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09  7:50               ` Ming Lei
@ 2021-02-09  8:34                 ` Chaitanya Kulkarni
  2021-02-09 10:09                   ` Sagi Grimberg
  2021-02-09 10:07                 ` Sagi Grimberg
  1 sibling, 1 reply; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-09  8:34 UTC (permalink / raw)
  To: Ming Lei
  Cc: Sagi Grimberg, Yi Zhang, linux-nvme@lists.infradead.org,
	linux-block, axboe@kernel.dk, Rachel Sibley, CKI Project


> On Feb 8, 2021, at 11:50 PM, Ming Lei <ming.lei@redhat.com> wrote:
> 
> On Mon, Feb 08, 2021 at 11:21:53PM -0800, Sagi Grimberg wrote:
>> 
>> 
>>> On 2/8/21 8:21 PM, Ming Lei wrote:
>>> On Mon, Feb 08, 2021 at 10:42:28AM -0800, Sagi Grimberg wrote:
>>>> 
>>>>>> Hi Sagi
>>>>>> 
>>>>>> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>>>>>> 
>>>>>>>> Hello
>>>>>>>> 
>>>>>>>> We found this kernel NULL pointer issue with latest
>>>>>>>> linux-block/for-next and it's 100% reproduced, let me know
>>>>>>>> if you need more info/testing, thanks
>>>>>>>> 
>>>>>>>> Kernel repo:
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>>>>>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>>>>>> 
>>>>>>>> Reproducer: blktests nvme-tcp/012
>>>>>>> 
>>>>>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>>>>>> but did not succeed. Given that you have it 100% reproducible,
>>>>>>> can you try to revert commit:
>>>>>>> 
>>>>>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>>>>> 
>>>>>> 
>>>>>> Revert this commit fixed the issue and I've attached the config. :)
>>>>> 
>>>>> Good to know,
>>>>> 
>>>>> I see some differences that I should probably change to hit this:
>>>>> -- 
>>>>> @@ -254,14 +256,15 @@ CONFIG_PERF_EVENTS=y
>>>>>   # end of Kernel Performance Events And Counters
>>>>> 
>>>>>   CONFIG_VM_EVENT_COUNTERS=y
>>>>> +CONFIG_SLUB_DEBUG=y
>>>>>   # CONFIG_COMPAT_BRK is not set
>>>>> -CONFIG_SLAB=y
>>>>> -# CONFIG_SLUB is not set
>>>>> -# CONFIG_SLOB is not set
>>>>> -CONFIG_SLAB_MERGE_DEFAULT=y
>>>>> -# CONFIG_SLAB_FREELIST_RANDOM is not set
>>>>> +# CONFIG_SLAB is not set
>>>>> +CONFIG_SLUB=y
>>>>> +# CONFIG_SLAB_MERGE_DEFAULT is not set
>>>>> +CONFIG_SLAB_FREELIST_RANDOM=y
>>>>>   # CONFIG_SLAB_FREELIST_HARDENED is not set
>>>>> -# CONFIG_SHUFFLE_PAGE_ALLOCATOR is not set
>>>>> +CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
>>>>> +CONFIG_SLUB_CPU_PARTIAL=y
>>>>>   CONFIG_SYSTEM_DATA_VERIFICATION=y
>>>>>   CONFIG_PROFILING=y
>>>>>   CONFIG_TRACEPOINTS=y
>>>>> @@ -299,7 +302,8 @@ CONFIG_HAVE_INTEL_TXT=y
>>>>>   CONFIG_X86_64_SMP=y
>>>>>   CONFIG_ARCH_SUPPORTS_UPROBES=y
>>>>>   CONFIG_FIX_EARLYCON_MEM=y
>>>>> -CONFIG_PGTABLE_LEVELS=4
>>>>> +CONFIG_DYNAMIC_PHYSICAL_MASK=y
>>>>> +CONFIG_PGTABLE_LEVELS=5
>>>>>   CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
>>>>> -- 
>>>>> 
>>>>> Probably CONFIG_SLUB and CONFIG_SLUB_DEBUG should be used.
>>>> 
>>>> Used your profile and this still does not happen :(
>>> 
>>> One obvious error is that nr_segments is computed wrong.
>>> 
>>> Yi, can you try the following patch?
>>> 
>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>> index 881d28eb15e9..a393d99b74e1 100644
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>> @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>>>          offset = 0;
>>>      } else {
>>>          struct bio *bio = req->curr_bio;
>>> +        struct bio_vec bv;
>>> +        struct bvec_iter iter;
>>> +
>>> +        nsegs = 0;
>>> +        bio_for_each_bvec(bv, bio, iter)
>>> +            nsegs++;
>>>          vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>>> -        nsegs = bio_segments(bio);
>> 
>> This was exactly the patch that caused the issue.
> 
> What was the issue you are talking about? Any link or commit hash?
> 
> nvme-tcp builds iov_iter(BVEC) from __bvec_iter_bvec(), the segment
> number has to be the actual bvec number. But bio_segment() just returns
> number of the single-page segment, which is wrong for iov_iter.
> 
> Please see the same usage in lo_rw_aio().
> 
That is what I suggested, but I also suggested the memory allocation part, which Sagi explained is better to avoid.

In my opinion we should at least try the bvec calculation done in lo_rw_aio() and see whether that fixes the problem, unless reverting the commit is the right approach for some reason.
> -- 
> Ming
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09  7:50               ` Ming Lei
  2021-02-09  8:34                 ` Chaitanya Kulkarni
@ 2021-02-09 10:07                 ` Sagi Grimberg
  2021-02-09 10:33                   ` Ming Lei
  1 sibling, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09 10:07 UTC (permalink / raw)
  To: Ming Lei
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project


>>>
>>> One obvious error is that nr_segments is computed wrong.
>>>
>>> Yi, can you try the following patch?
>>>
>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>> index 881d28eb15e9..a393d99b74e1 100644
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>> @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>>>    		offset = 0;
>>>    	} else {
>>>    		struct bio *bio = req->curr_bio;
>>> +		struct bio_vec bv;
>>> +		struct bvec_iter iter;
>>> +
>>> +		nsegs = 0;
>>> +		bio_for_each_bvec(bv, bio, iter)
>>> +			nsegs++;
>>>    		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>>> -		nsegs = bio_segments(bio);
>>
>> This was exactly the patch that caused the issue.
> 
> What was the issue you are talking about? Any link or commit hash?

The commit that caused the crash is:
0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter

> 
> nvme-tcp builds iov_iter(BVEC) from __bvec_iter_bvec(), the segment
> number has to be the actual bvec number. But bio_segment() just returns
> number of the single-page segment, which is wrong for iov_iter.

That is what I thought, but it's causing a crash, and things were fine
with bio_segments(). So I'm trying to understand why that is.

> Please see the same usage in lo_rw_aio().

nvme-tcp works on a per-bio basis to avoid bvec allocation
in the data path. Hence the iterator is fed directly by
the bio's bvec array and is re-initialized on every bio
spanned by the request.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09  8:34                 ` Chaitanya Kulkarni
@ 2021-02-09 10:09                   ` Sagi Grimberg
  0 siblings, 0 replies; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09 10:09 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Ming Lei
  Cc: axboe@kernel.dk, Rachel Sibley, Yi Zhang,
	linux-nvme@lists.infradead.org, linux-block, CKI Project


>>>> Yi, can you try the following patch?
>>>>
>>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>>> index 881d28eb15e9..a393d99b74e1 100644
>>>> --- a/drivers/nvme/host/tcp.c
>>>> +++ b/drivers/nvme/host/tcp.c
>>>> @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>>>>           offset = 0;
>>>>       } else {
>>>>           struct bio *bio = req->curr_bio;
>>>> +        struct bio_vec bv;
>>>> +        struct bvec_iter iter;
>>>> +
>>>> +        nsegs = 0;
>>>> +        bio_for_each_bvec(bv, bio, iter)
>>>> +            nsegs++;
>>>>           vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>>>> -        nsegs = bio_segments(bio);
>>>
>>> This was exactly the patch that caused the issue.
>>
>> What was the issue you are talking about? Any link or commit hash?
>>
>> nvme-tcp builds iov_iter(BVEC) from __bvec_iter_bvec(), the segment
>> number has to be the actual bvec number. But bio_segment() just returns
>> number of the single-page segment, which is wrong for iov_iter.
>>
>> Please see the same usage in lo_rw_aio().
>>
> That what I have suggested but I've also suggested the memory allocation part which Sagi explained why it is better to avoid.
> 
> In my opinion we should at least try bvec calculation in lo_aio_rw() and see the problem can be fixed or not, unless reverting the commit it right approach for some reason.

I'm trying to understand what this is, but I'm failing to reproduce
it. I may ask Yi to add some debug code for this.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
       [not found]     ` <5848858e-239d-acb2-fa24-c371a3360557@redhat.com>
  2021-02-08 17:54       ` Sagi Grimberg
@ 2021-02-09 10:25       ` Sagi Grimberg
  2021-02-09 12:57         ` Yi Zhang
  1 sibling, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09 10:25 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni


> Hi Sagi
> 
> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>
>>> Hello
>>>
>>> We found this kernel NULL pointer issue with latest 
>>> linux-block/for-next and it's 100% reproduced, let me know if you 
>>> need more info/testing, thanks
>>>
>>> Kernel repo: 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>
>>> Reproducer: blktests nvme-tcp/012
>>
>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>> but did not succeed. Given that you have it 100% reproducible,
>> can you try to revert commit:
>>
>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>
> 
> Revert this commit fixed the issue and I've attached the config. :)

Hey Ming,

Instead of the revert, does this patch make the issue go away?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 619b0d8f6e38..69f59d2c5799 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2271,7 +2271,7 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
         req->data_len = blk_rq_nr_phys_segments(rq) ?
                                 blk_rq_payload_bytes(rq) : 0;
         req->curr_bio = rq->bio;
-       if (req->curr_bio)
+       if (req->curr_bio && req->data_len)
                 nvme_tcp_init_iter(req, rq_data_dir(rq));

         if (rq_data_dir(rq) == WRITE &&
--

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09 10:07                 ` Sagi Grimberg
@ 2021-02-09 10:33                   ` Ming Lei
  2021-02-09 10:36                     ` Sagi Grimberg
  0 siblings, 1 reply; 27+ messages in thread
From: Ming Lei @ 2021-02-09 10:33 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project

On Tue, Feb 09, 2021 at 02:07:15AM -0800, Sagi Grimberg wrote:
> 
> > > > 
> > > > One obvious error is that nr_segments is computed wrong.
> > > > 
> > > > Yi, can you try the following patch?
> > > > 
> > > > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> > > > index 881d28eb15e9..a393d99b74e1 100644
> > > > --- a/drivers/nvme/host/tcp.c
> > > > +++ b/drivers/nvme/host/tcp.c
> > > > @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
> > > >    		offset = 0;
> > > >    	} else {
> > > >    		struct bio *bio = req->curr_bio;
> > > > +		struct bio_vec bv;
> > > > +		struct bvec_iter iter;
> > > > +
> > > > +		nsegs = 0;
> > > > +		bio_for_each_bvec(bv, bio, iter)
> > > > +			nsegs++;
> > > >    		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> > > > -		nsegs = bio_segments(bio);
> > > 
> > > This was exactly the patch that caused the issue.
> > 
> > What was the issue you are talking about? Any link or commit hash?
> 
> The commit that caused the crash is:
> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter

I can't find this commit in Linus' tree :-(

> 
> > 
> > nvme-tcp builds iov_iter(BVEC) from __bvec_iter_bvec(), the segment
> > number has to be the actual bvec number. But bio_segment() just returns
> > number of the single-page segment, which is wrong for iov_iter.
> 
> That is what I thought, but its causing a crash, and was fine with
> bio_segments. So I'm trying to understand why is that.

I tested this patch, and it works just fine.

> 
> > Please see the same usage in lo_rw_aio().
> 
> nvme-tcp works on the bio basis to avoid bvec allocation
> in the data path. Hence the iterator is fed directly by
> the bio bvec and will re-initialize on every bio that
> is spanned by the request.

Yeah, I know that. What I meant is that rq_for_each_bvec() is used
to figure out the bvec count in loop, which may feed the bio bvecs
directly to the fs via an iov_iter too, just like nvme-tcp.

The difference is that loop will switch to allocating a new bvec
table and copying the bios' bvecs into that table when bios have
been merged.
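
For reference, a simplified sketch of that loop pattern (modeled
loosely on lo_rw_aio(); the helper name is made up and error paths
are trimmed):
--
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/slab.h>
#include <linux/uio.h>

/* Sketch: a single-bio request feeds its bvec array to the iov_iter
 * directly; a merged (multi-bio) request gets a freshly allocated
 * flat table with all bvecs copied in. *table must be freed by the
 * caller once the I/O completes. */
static int loop_style_init_iter(struct request *rq, struct iov_iter *iter,
				struct bio_vec **table, int dir)
{
	struct req_iterator rq_iter;
	struct bio_vec tmp, *bvec;
	unsigned int nr_bvec = 0;

	/* count full (possibly multi-page) bvecs across all bios */
	rq_for_each_bvec(tmp, rq, rq_iter)
		nr_bvec++;

	*table = NULL;
	if (rq->bio != rq->biotail) {
		unsigned int i = 0;

		bvec = kmalloc_array(nr_bvec, sizeof(*bvec), GFP_NOIO);
		if (!bvec)
			return -ENOMEM;
		rq_for_each_bvec(tmp, rq, rq_iter)
			bvec[i++] = tmp;	/* flatten the merged bios */
		*table = bvec;
	} else {
		struct bio *bio = rq->bio;

		/* skip bvecs that have already been completed */
		bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
	}

	iov_iter_bvec(iter, dir, bvec, nr_bvec, blk_rq_bytes(rq));
	return 0;
}
--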

-- 
Ming


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09 10:33                   ` Ming Lei
@ 2021-02-09 10:36                     ` Sagi Grimberg
  0 siblings, 0 replies; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09 10:36 UTC (permalink / raw)
  To: Ming Lei
  Cc: Yi Zhang, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project



On 2/9/21 2:33 AM, Ming Lei wrote:
> On Tue, Feb 09, 2021 at 02:07:15AM -0800, Sagi Grimberg wrote:
>>
>>>>>
>>>>> One obvious error is that nr_segments is computed wrong.
>>>>>
>>>>> Yi, can you try the following patch?
>>>>>
>>>>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>>>>> index 881d28eb15e9..a393d99b74e1 100644
>>>>> --- a/drivers/nvme/host/tcp.c
>>>>> +++ b/drivers/nvme/host/tcp.c
>>>>> @@ -239,9 +239,14 @@ static void nvme_tcp_init_iter(struct nvme_tcp_request *req,
>>>>>     		offset = 0;
>>>>>     	} else {
>>>>>     		struct bio *bio = req->curr_bio;
>>>>> +		struct bio_vec bv;
>>>>> +		struct bvec_iter iter;
>>>>> +
>>>>> +		nsegs = 0;
>>>>> +		bio_for_each_bvec(bv, bio, iter)
>>>>> +			nsegs++;
>>>>>     		vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
>>>>> -		nsegs = bio_segments(bio);
>>>>
>>>> This was exactly the patch that caused the issue.
>>>
>>> What was the issue you are talking about? Any link or commit hash?
>>
>> The commit that caused the crash is:
>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
> 
> Not found this commit in linus tree, :-(

The original report is on:
Kernel repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next

>>> nvme-tcp builds iov_iter(BVEC) from __bvec_iter_bvec(), the segment
>>> number has to be the actual bvec number. But bio_segment() just returns
>>> number of the single-page segment, which is wrong for iov_iter.
>>
>> That is what I thought, but its causing a crash, and was fine with
>> bio_segments. So I'm trying to understand why is that.
> 
> I tested this patch, and it works just fine.

Me too, but Yi hits this crash; I even recompiled with his
config, but still no luck.

>>> Please see the same usage in lo_rw_aio().
>>
>> nvme-tcp works on the bio basis to avoid bvec allocation
>> in the data path. Hence the iterator is fed directly by
>> the bio bvec and will re-initialize on every bio that
>> is spanned by the request.
> 
> Yeah, I know that. What I meant is that rq_for_each_bvec() is used
> to figure out bvec number in loop, which may feed the bio bvec
> directly to fs via iov_iter too, just similar with nvme-tcp.
> 
> The difference is that loop will switch to allocate a new bvec
> table and copy bios's bvec to the new table in case of bios merge.

So nvme-tcp now uses bio_for_each_bvec(), which seems appropriate;
we just need to understand what is causing this.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09 10:25       ` Sagi Grimberg
@ 2021-02-09 12:57         ` Yi Zhang
  2021-02-09 18:01           ` Sagi Grimberg
  0 siblings, 1 reply; 27+ messages in thread
From: Yi Zhang @ 2021-02-09 12:57 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni, Ming Lei



On 2/9/21 6:25 PM, Sagi Grimberg wrote:
>
>> Hi Sagi
>>
>> On 2/8/21 5:46 PM, Sagi Grimberg wrote:
>>>
>>>> Hello
>>>>
>>>> We found this kernel NULL pointer issue with latest 
>>>> linux-block/for-next and it's 100% reproduced, let me know if you 
>>>> need more info/testing, thanks
>>>>
>>>> Kernel repo: 
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>>>> Commit: 11f8b6fd0db9 - Merge branch 'for-5.12/io_uring' into for-next
>>>>
>>>> Reproducer: blktests nvme-tcp/012
>>>
>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>> but did not succeed. Given that you have it 100% reproducible,
>>> can you try to revert commit:
>>>
>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>
>>
>> Revert this commit fixed the issue and I've attached the config. :)
>
> Hey Ming,
>
> Instead of revert, does this patch makes the issue go away?
Hi Sagi

The patch below fixed the issue; let me know if you need more testing. :)

Thanks
Yi

> -- 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 619b0d8f6e38..69f59d2c5799 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2271,7 +2271,7 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
>         req->data_len = blk_rq_nr_phys_segments(rq) ?
>                                 blk_rq_payload_bytes(rq) : 0;
>         req->curr_bio = rq->bio;
> -       if (req->curr_bio)
> +       if (req->curr_bio && req->data_len)
>                 nvme_tcp_init_iter(req, rq_data_dir(rq));
>
>         if (rq_data_dir(rq) == WRITE &&
> -- 
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09 12:57         ` Yi Zhang
@ 2021-02-09 18:01           ` Sagi Grimberg
  2021-02-10  2:51             ` Yi Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-09 18:01 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni, Ming Lei


>>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>>> but did not succeed. Given that you have it 100% reproducible,
>>>> can you try to revert commit:
>>>>
>>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>>
>>>
>>> Revert this commit fixed the issue and I've attached the config. :)
>>
>> Hey Ming,
>>
>> Instead of revert, does this patch makes the issue go away?
> Hi Sagi
> 
> Below patch fixed the issue, let me know if you need more testing. :)

Thanks Yi,

I'll submit a proper patch, but can you run this change
to see which command has a bio but no data?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 619b0d8f6e38..311f1b78a9d4 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2271,8 +2271,13 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
         req->data_len = blk_rq_nr_phys_segments(rq) ?
                                 blk_rq_payload_bytes(rq) : 0;
         req->curr_bio = rq->bio;
-       if (req->curr_bio)
+       if (req->curr_bio) {
+               if (!req->data_len) {
+                       pr_err("rq %d opcode %d\n", rq->tag,
+                              pdu->cmd.common.opcode);
+                       return BLK_STS_IOERR;
+               }
                 nvme_tcp_init_iter(req, rq_data_dir(rq));
+       }

         if (rq_data_dir(rq) == WRITE &&
             req->data_len <= nvme_tcp_inline_data_size(queue))
--

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-09 18:01           ` Sagi Grimberg
@ 2021-02-10  2:51             ` Yi Zhang
  2021-02-10  3:38               ` Keith Busch
  2021-02-10  5:03               ` Chaitanya Kulkarni
  0 siblings, 2 replies; 27+ messages in thread
From: Yi Zhang @ 2021-02-10  2:51 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, linux-block
  Cc: axboe, Rachel Sibley, CKI Project, Chaitanya.Kulkarni, Ming Lei



On 2/10/21 2:01 AM, Sagi Grimberg wrote:
>
>>>>> Thanks for reporting Ming, I've tried to reproduce this on my VM
>>>>> but did not succeed. Given that you have it 100% reproducible,
>>>>> can you try to revert commit:
>>>>>
>>>>> 0dc9edaf80ea nvme-tcp: pass multipage bvec to request iov_iter
>>>>>
>>>>
>>>> Revert this commit fixed the issue and I've attached the config. :)
>>>
>>> Hey Ming,
>>>
>>> Instead of revert, does this patch makes the issue go away?
>> Hi Sagi
>>
>> Below patch fixed the issue, let me know if you need more testing. :)
>
> Thanks Yi,
>
So it's nvme_admin_abort_cmd here

[   74.017450] run blktests nvme/012 at 2021-02-09 21:41:55
[   74.111311] loop: module loaded
[   74.125717] loop0: detected capacity change from 2097152 to 0
[   74.141026] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   74.149395] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[   74.158298] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:41131d88-02ca-4ccc-87b3-6ca3f28b13a4.
[   74.158742] nvme nvme0: creating 48 I/O queues.
[   74.163391] nvme nvme0: mapped 48/0/0 default/read/poll queues.
[   74.184623] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
[   75.235059] nvme_tcp: rq 38 opcode 8
[   75.238653] blk_update_request: I/O error, dev nvme0c0n1, sector 1048624 op 0x9:(WRITE_ZEROES) flags 0x2800800 phys_seg 0 prio class 0
[   75.380179] XFS (nvme0n1): Mounting V5 Filesystem
[   75.387457] XFS (nvme0n1): Ending clean mount
[   75.388555] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
[   91.035659] XFS (nvme0n1): Unmounting Filesystem
[   91.043334] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"


> I'll submit a proper patch, but can you run this change
> to see what command has a bio but without any data?
> -- 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 619b0d8f6e38..311f1b78a9d4 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2271,8 +2271,13 @@ static blk_status_t nvme_tcp_setup_cmd_pdu(struct nvme_ns *ns,
>         req->data_len = blk_rq_nr_phys_segments(rq) ?
>                                 blk_rq_payload_bytes(rq) : 0;
>         req->curr_bio = rq->bio;
> -       if (req->curr_bio)
> +       if (req->curr_bio) {
> +               if (!req->data_len) {
> +                       pr_err("rq %d opcode %d\n", rq->tag,
> +                              pdu->cmd.common.opcode);
> +                       return BLK_STS_IOERR;
> +               }
>                 nvme_tcp_init_iter(req, rq_data_dir(rq));
> +       }
>
>         if (rq_data_dir(rq) == WRITE &&
>             req->data_len <= nvme_tcp_inline_data_size(queue))
> -- 
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-10  2:51             ` Yi Zhang
@ 2021-02-10  3:38               ` Keith Busch
  2021-02-10  5:03               ` Chaitanya Kulkarni
  1 sibling, 0 replies; 27+ messages in thread
From: Keith Busch @ 2021-02-10  3:38 UTC (permalink / raw)
  To: Yi Zhang
  Cc: Sagi Grimberg, linux-nvme, linux-block, axboe, Rachel Sibley,
	Chaitanya.Kulkarni, CKI Project, Ming Lei

On Wed, Feb 10, 2021 at 10:51:56AM +0800, Yi Zhang wrote:
> On 2/10/21 2:01 AM, Sagi Grimberg wrote:
> So it's nvme_admin_abort_cmd here

The opcode 8 would be abort if this were an admin command, but we can't
tell that from the nvme print. The subsequent blk_update_request log
indicates this is the IO command for a Write Zeroes operation.
 
> [   75.235059] nvme_tcp: rq 38 opcode 8
> [   75.238653] blk_update_request: I/O error, dev nvme0c0n1, sector 1048624 op 0x9:(WRITE_ZEROES) flags 0x2800800 phys_seg 0 prio class 0
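
For reference, admin and I/O opcodes live in separate enums and the
value 0x08 appears in both (abridged from include/linux/nvme.h):
--
/* abridged from include/linux/nvme.h */
enum nvme_admin_opcode {
	nvme_admin_abort_cmd	= 0x08,		/* admin queue */
};

enum nvme_opcode {
	nvme_cmd_write_zeroes	= 0x08,		/* I/O queue */
};
--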

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-10  2:51             ` Yi Zhang
  2021-02-10  3:38               ` Keith Busch
@ 2021-02-10  5:03               ` Chaitanya Kulkarni
  2021-02-10 22:06                 ` Sagi Grimberg
  1 sibling, 1 reply; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-10  5:03 UTC (permalink / raw)
  To: Yi Zhang, Sagi Grimberg, linux-nvme@lists.infradead.org,
	linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Ming Lei

On 2/9/21 18:52, Yi Zhang wrote:
> So it's nvme_admin_abort_cmd here
>
> [   74.017450] run blktests nvme/012 at 2021-02-09 21:41:55
> [   74.111311] loop: module loaded
> [   74.125717] loop0: detected capacity change from 2097152 to 0
> [   74.141026] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [   74.149395] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [   74.158298] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:41131d88-02ca-4ccc-87b3-6ca3f28b13a4.
> [   74.158742] nvme nvme0: creating 48 I/O queues.
> [   74.163391] nvme nvme0: mapped 48/0/0 default/read/poll queues.
> [   74.184623] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
> [   75.235059] nvme_tcp: rq 38 opcode 8
> [   75.238653] blk_update_request: I/O error, dev nvme0c0n1, sector 1048624 op 0x9:(WRITE_ZEROES) flags 0x2800800 phys_seg 0 prio class 0
> [   75.380179] XFS (nvme0n1): Mounting V5 Filesystem
> [   75.387457] XFS (nvme0n1): Ending clean mount
> [   75.388555] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
> [   91.035659] XFS (nvme0n1): Unmounting Filesystem
> [   91.043334] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"

But write-zeroes is also a data-less command and should not fail.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-10  5:03               ` Chaitanya Kulkarni
@ 2021-02-10 22:06                 ` Sagi Grimberg
  2021-02-10 22:15                   ` Chaitanya Kulkarni
  0 siblings, 1 reply; 27+ messages in thread
From: Sagi Grimberg @ 2021-02-10 22:06 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Yi Zhang, linux-nvme@lists.infradead.org,
	linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Ming Lei


>> So it's nvme_admin_abort_cmd here
>>
>> [   74.017450] run blktests nvme/012 at 2021-02-09 21:41:55
>> [   74.111311] loop: module loaded
>> [   74.125717] loop0: detected capacity change from 2097152 to 0
>> [   74.141026] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>> [   74.149395] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
>> [   74.158298] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:41131d88-02ca-4ccc-87b3-6ca3f28b13a4.
>> [   74.158742] nvme nvme0: creating 48 I/O queues.
>> [   74.163391] nvme nvme0: mapped 48/0/0 default/read/poll queues.
>> [   74.184623] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
>> [   75.235059] nvme_tcp: rq 38 opcode 8
>> [   75.238653] blk_update_request: I/O error, dev nvme0c0n1, sector 1048624 op 0x9:(WRITE_ZEROES) flags 0x2800800 phys_seg 0 prio class 0
>> [   75.380179] XFS (nvme0n1): Mounting V5 Filesystem
>> [   75.387457] XFS (nvme0n1): Ending clean mount
>> [   75.388555] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
>> [   91.035659] XFS (nvme0n1): Unmounting Filesystem
>> [   91.043334] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
> 
> But write-zeroes is also a data-less command and should not fail.

And it has a bio, which means that nvme-tcp tries to init an iter
for it when it shouldn't. So the actual offending commit is
cb9b870fba3e, which cleaned up how the iter is initialized but
introduced this issue.
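
In code terms, the guard Yi verified earlier in the thread restores the
data_len check, so data-less commands that still carry a bio (such as
Write Zeroes) no longer get an iter:
--
	req->data_len = blk_rq_nr_phys_segments(rq) ?
				blk_rq_payload_bytes(rq) : 0;
	req->curr_bio = rq->bio;
	/* only build an iterator when the request carries payload */
	if (req->curr_bio && req->data_len)
		nvme_tcp_init_iter(req, rq_data_dir(rq));
--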

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
  2021-02-10 22:06                 ` Sagi Grimberg
@ 2021-02-10 22:15                   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 27+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-10 22:15 UTC (permalink / raw)
  To: Sagi Grimberg, Yi Zhang, linux-nvme@lists.infradead.org,
	linux-block
  Cc: axboe@kernel.dk, Rachel Sibley, CKI Project, Ming Lei

On 2/10/21 2:06 PM, Sagi Grimberg wrote:
>>> [   75.235059] nvme_tcp: rq 38 opcode 8
>>> [   75.238653] blk_update_request: I/O error, dev nvme0c0n1, sector 1048624 op 0x9:(WRITE_ZEROES) flags 0x2800800 phys_seg 0 prio class 0
>>> [   75.380179] XFS (nvme0n1): Mounting V5 Filesystem
>>> [   75.387457] XFS (nvme0n1): Ending clean mount
>>> [   75.388555] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
>>> [   91.035659] XFS (nvme0n1): Unmounting Filesystem
>>> [   91.043334] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
>> But write-zeroes is also a data-less command and should not fail.
> And it has a bio, which means that nvme-tcp tries to init an iter
> for it when it shouldn't. So the actual offending commit is:
> cb9b870fba3e, which cleaned up how the iter is initialized but 
> introduced this issue.
>
Looking at cb9b870fba3e, that should work, but I'm really surprised
that it never got triggered even once in my testing since the issue
was reported.

^ permalink raw reply	[flat|nested] 27+ messages in thread

Thread overview: 27+ messages
     [not found] <cki.F3E139361A.EN5MUSJKK9@redhat.com>
2021-02-06  3:08 ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Yi Zhang
2021-02-07  4:50   ` kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012 Yi Zhang
2021-02-07  5:25     ` Chaitanya Kulkarni
2021-02-08  6:48       ` Yi Zhang
2021-02-07  5:48     ` Chaitanya Kulkarni
2021-02-07  5:58     ` Chaitanya Kulkarni
2021-02-07  7:14   ` kernel null pointer at nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp] Chaitanya Kulkarni
2021-02-08  9:53     ` Sagi Grimberg
2021-02-08  9:46   ` Sagi Grimberg
     [not found]     ` <5848858e-239d-acb2-fa24-c371a3360557@redhat.com>
2021-02-08 17:54       ` Sagi Grimberg
2021-02-08 18:42         ` Sagi Grimberg
2021-02-09  4:21           ` Ming Lei
2021-02-09  7:21             ` Sagi Grimberg
2021-02-09  7:50               ` Ming Lei
2021-02-09  8:34                 ` Chaitanya Kulkarni
2021-02-09 10:09                   ` Sagi Grimberg
2021-02-09 10:07                 ` Sagi Grimberg
2021-02-09 10:33                   ` Ming Lei
2021-02-09 10:36                     ` Sagi Grimberg
2021-02-09 10:25       ` Sagi Grimberg
2021-02-09 12:57         ` Yi Zhang
2021-02-09 18:01           ` Sagi Grimberg
2021-02-10  2:51             ` Yi Zhang
2021-02-10  3:38               ` Keith Busch
2021-02-10  5:03               ` Chaitanya Kulkarni
2021-02-10 22:06                 ` Sagi Grimberg
2021-02-10 22:15                   ` Chaitanya Kulkarni
