Hi,
I am experiencing a repeatable (potential deadlock) issue when
hotplugging NVMe devices on kernel 5.15. The problem occurs on a Ubuntu
22.04.4 server environment when Intel IOMMU support is disabled in the
kernel configuration (CONFIG_INTEL_IOMMU is not set). This issue is also
reproducible on the native kernel 5.15.161 without any Ubuntu patches.
System Information:
Distribution: Ubuntu 22.04.4 server
Kernel Version: Ubuntu 5.15.152 (also repeated on native 5.15.161)
Hardware: NVME Micron SSDPE2KX020T8, Intel(R) Xeon(R) Gold 5222
Description of the Issue:
When hotplugging an NVMe device, the system experiences a potential
deadlock. The issue is 100% repeatable. The following log entries
indicate where the issue occurs (also https://pastebin.com/4CKCqhbJ,
https://pastebin.com/zadENEPt):
Jul 5 10:05:58 tarox1 kernel: [ 1151.450374] pci 0000:8c:00.0: BAR 0:
assigned [mem 0xd3a10000-0xd3a13fff 64bit]
Jul 5 10:05:58 tarox1 kernel: [ 1151.450470] pcieport 0000:88:03.0: PCI
bridge to [bus 8c]
Jul 5 10:05:58 tarox1 kernel: [ 1151.450525] pcieport 0000:88:03.0:
bridge window [mem 0xd3a00000-0xd3afffff]
Jul 5 10:05:58 tarox1 kernel: [ 1151.451213] nvme nvme11: pci function
0000:8c:00.0
Jul 5 10:05:58 tarox1 kernel: [ 1151.451294] nvme 0000:8c:00.0:
enabling device (0100 -> 0102)
Jul 5 10:06:00 tarox1 kernel: [ 1152.999223] ------------[ cut here
]------------
Jul 5 10:06:00 tarox1 kernel: [ 1152.999227] WARNING: CPU: 13 PID: 381
at arch/x86/kernel/apic/apic.c:2540 __irq_msi_compose_msg+0xe6/0xf0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999241] Modules linked in:
ipmi_ssif intel_rapl_msr intel_rapl_common isst_if_common binfmt_misc
skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
rapl intel_cstate irdma ice ib_uverbs ib_core input_leds joydev
switchtec acpi_ipmi ipmi_si mei_me ipmi_devintf ioatdma mei
intel_pch_thermal ipmi_msghandler acpi_power_meter acpi_pad mac_hid
sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr
efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
dax_pmem_compat nd_pmem device_dax nd_btt dax_pmem_core ast i2c_algo_bit
drm_vram_helper drm_ttm_helper hid_generic ttm usbhid drm_kms_helper
syscopyarea uas crct10dif_pclmul hid sysfillrect crc32_pclmul sysimgblt
usb_storage fb_sys_fops ghash_clmulni_intel sha256_ssse3 cec ixgbe
sha1_ssse3 xfrm_algo rc_core mpt3sas aesni_intel dca raid_class nvme
crypto_simd mdio
Jul 5 10:06:00 tarox1 kernel: [ 1152.999364] cryptd scsi_transport_sas
bnxt_en i40e drm i2c_i801 nvme_core xhci_pci lpc_ich ahci i2c_smbus
xhci_pci_renesas libahci wmi nfit
Jul 5 10:06:00 tarox1 kernel: [ 1152.999387] CPU: 13 PID: 381 Comm:
kworker/u34:10 Not tainted 5.15.152 #2
Jul 5 10:06:00 tarox1 kernel: [ 1152.999393] Hardware name: TAROX ParX
R2242i G6 Server/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218
07/07/2020
Jul 5 10:06:00 tarox1 kernel: [ 1152.999396] Workqueue: nvme-reset-wq
nvme_reset_work [nvme]
Jul 5 10:06:00 tarox1 kernel: [ 1152.999408] RIP:
0010:__irq_msi_compose_msg+0xe6/0xf0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999413] Code: 0f b7 13 c1 e8 08 83
e0 7f c1 e0 05 66 81 e2 1f f0 09 d0 66 89 03 5b 41 5c 41 5d 41 5e 5d c3
cc cc cc cc 3d ff 00 00 00 76 a6 <0f> 0b eb a2 66 0f 1f 44 00 00 0f 1f
44 00 00 8b 05 a1 98 b6 02 85
Jul 5 10:06:00 tarox1 kernel: [ 1152.999417] RSP: 0018:ffffaa1a8e8fbb58
EFLAGS: 00010016
Jul 5 10:06:00 tarox1 kernel: [ 1152.999422] RAX: 0000000000020010 RBX:
ffffaa1a8e8fbba8 RCX: 00000000fffffff0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999426] RDX: 0000000000010000 RSI:
ffffaa1a8e8fbba8 RDI: ffff910eaa3aad80
Jul 5 10:06:00 tarox1 kernel: [ 1152.999429] RBP: ffffaa1a8e8fbb78 R08:
ffff910ea7467978 R09: 0000000000000001
Jul 5 10:06:00 tarox1 kernel: [ 1152.999431] R10: 000000000000002c R11:
fffff80000000000 R12: 0000000000000000
Jul 5 10:06:00 tarox1 kernel: [ 1152.999434] R13: ffff910eaa3aad80 R14:
0000000000000000 R15: ffff910ebe436200
Jul 5 10:06:00 tarox1 kernel: [ 1152.999437] FS:
0000000000000000(0000) GS:ffff9125e1540000(0000) knlGS:0000000000000000
Jul 5 10:06:00 tarox1 kernel: [ 1152.999440] CS: 0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Jul 5 10:06:00 tarox1 kernel: [ 1152.999443] CR2: 0000562482ffe480 CR3:
000000d6e8010001 CR4: 00000000007706e0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999447] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Jul 5 10:06:00 tarox1 kernel: [ 1152.999449] DR3: 0000000000000000 DR6:
00000000fffe0ff0 DR7: 0000000000000400
Jul 5 10:06:00 tarox1 kernel: [ 1152.999451] PKRU: 55555554
Jul 5 10:06:00 tarox1 kernel: [ 1152.999453] Call Trace:
Jul 5 10:06:00 tarox1 kernel: [ 1152.999456]
Jul 5 10:06:00 tarox1 kernel: [ 1152.999459] ? show_regs.cold+0x1a/0x1f
Jul 5 10:06:00 tarox1 kernel: [ 1152.999471] ?
__irq_msi_compose_msg+0xe6/0xf0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999474] ? __warn+0x8c/0x100
Jul 5 10:06:00 tarox1 kernel: [ 1152.999483] ?
__irq_msi_compose_msg+0xe6/0xf0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999487] ? report_bug+0xa4/0xd0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999497] ? handle_bug+0x39/0x90
Jul 5 10:06:00 tarox1 kernel: [ 1152.999503] ? exc_invalid_op+0x19/0x70
Jul 5 10:06:00 tarox1 kernel: [ 1152.999507] ?
asm_exc_invalid_op+0x1b/0x20
Jul 5 10:06:00 tarox1 kernel: [ 1152.999516] ?
__irq_msi_compose_msg+0xe6/0xf0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999519]
x86_vector_msi_compose_msg+0x25/0x30
Jul 5 10:06:00 tarox1 kernel: [ 1152.999525]
irq_chip_compose_msi_msg+0x2f/0x50
Jul 5 10:06:00 tarox1 kernel: [ 1152.999532] msi_domain_activate+0x45/0xa0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999540]
__irq_domain_activate_irq+0x55/0x90
Jul 5 10:06:00 tarox1 kernel: [ 1152.999546]
irq_domain_activate_irq+0x2d/0x50
Jul 5 10:06:00 tarox1 kernel: [ 1152.999551] irq_activate+0x29/0x30
Jul 5 10:06:00 tarox1 kernel: [ 1152.999554] __setup_irq+0x2c3/0x780
Jul 5 10:06:00 tarox1 kernel: [ 1152.999561] ?
kmem_cache_alloc_trace+0x19e/0x2e0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999568]
request_threaded_irq+0x112/0x180
Jul 5 10:06:00 tarox1 kernel: [ 1152.999575] ?
nvme_poll_irqdisable+0x70/0x70 [nvme]
Jul 5 10:06:00 tarox1 kernel: [ 1152.999582] pci_request_irq+0xa1/0xf0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999591] ?
nvme_wait_ready+0x7e/0xf0 [nvme_core]
Jul 5 10:06:00 tarox1 kernel: [ 1152.999612]
queue_request_irq+0x74/0x80 [nvme]
Jul 5 10:06:00 tarox1 kernel: [ 1152.999619]
nvme_reset_work+0x355/0x1270 [nvme]
Jul 5 10:06:00 tarox1 kernel: [ 1152.999627] ? try_to_wake_up+0x200/0x5a0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999637] process_one_work+0x228/0x3d0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999644] worker_thread+0x53/0x420
Jul 5 10:06:00 tarox1 kernel: [ 1152.999648] ?
process_one_work+0x3d0/0x3d0
Jul 5 10:06:00 tarox1 kernel: [ 1152.999652] kthread+0x127/0x150
Jul 5 10:06:00 tarox1 kernel: [ 1152.999658] ?
set_kthread_struct+0x50/0x50
Jul 5 10:06:00 tarox1 kernel: [ 1152.999665] ret_from_fork+0x1f/0x30
Jul 5 10:06:00 tarox1 kernel: [ 1152.999678]
Jul 5 10:06:00 tarox1 kernel: [ 1152.999680] ---[ end trace
8fcd74d7a264655a ]---
Jul 5 10:06:00 tarox1 kernel: [ 1153.000136] DMAR: DRHD: handling fault
status reg 2
Jul 5 10:06:00 tarox1 kernel: [ 1153.000177] DMAR: [INTR-REMAP] Request
device [8b:00.0] fault index 0x8800 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:06:00 tarox1 kernel: [ 1153.025748] DMAR: DRHD: handling fault
status reg 102
Jul 5 10:06:00 tarox1 kernel: [ 1153.025790] DMAR: [INTR-REMAP] Request
device [8a:00.0] fault index 0x8200 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:06:00 tarox1 kernel: [ 1153.437571] DMAR: DRHD: handling fault
status reg 202
Jul 5 10:06:00 tarox1 kernel: [ 1153.437608] DMAR: [INTR-REMAP] Request
device [8d:00.0] fault index 0x9000 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:06:01 tarox1 kernel: [ 1153.903652] DMAR: DRHD: handling fault
status reg 302
Jul 5 10:06:03 tarox1 kernel: [ 1155.770154] pcieport 0000:88:05.0:
pciehp: Slot(6-2): Card present
Jul 5 10:06:04 tarox1 kernel: [ 1157.099007] pci 0000:8e:00.0:
[8086:0a54] type 00 class 0x010802
Jul 5 10:06:04 tarox1 kernel: [ 1157.099230] pci 0000:8e:00.0: reg
0x10: [mem 0x00000000-0x00003fff 64bit]
Jul 5 10:06:04 tarox1 kernel: [ 1157.100466] pci 0000:8e:00.0: reg
0x30: [mem 0x00000000-0x0000ffff pref]
Jul 5 10:06:04 tarox1 kernel: [ 1157.100547] pci 0000:8e:00.0: Max
Payload Size set to 256 (was 128, max 512)
Jul 5 10:06:04 tarox1 kernel: [ 1157.100574] pci 0000:8e:00.0: enabling
Extended Tags
Jul 5 10:06:04 tarox1 kernel: [ 1157.106549] pcieport 0000:88:05.0:
bridge window [io 0x1000-0x0fff] to [bus 8e] add_size 1000
Jul 5 10:06:04 tarox1 kernel: [ 1157.106556] pcieport 0000:88:05.0:
bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 8e]
add_size 200000 add_align 100000
Jul 5 10:06:04 tarox1 kernel: [ 1157.106567] pcieport 0000:88:05.0: BAR
15: no space for [mem size 0x00200000 64bit pref]
Jul 5 10:06:04 tarox1 kernel: [ 1157.106571] pcieport 0000:88:05.0: BAR
15: failed to assign [mem size 0x00200000 64bit pref]
Jul 5 10:06:04 tarox1 kernel: [ 1157.106575] pcieport 0000:88:05.0: BAR
13: no space for [io size 0x1000]
Jul 5 10:06:04 tarox1 kernel: [ 1157.106578] pcieport 0000:88:05.0: BAR
13: failed to assign [io size 0x1000]
Jul 5 10:06:04 tarox1 kernel: [ 1157.106583] pcieport 0000:88:05.0: BAR
15: no space for [mem size 0x00200000 64bit pref]
Jul 5 10:06:04 tarox1 kernel: [ 1157.106586] pcieport 0000:88:05.0: BAR
15: failed to assign [mem size 0x00200000 64bit pref]
Jul 5 10:06:04 tarox1 kernel: [ 1157.106589] pcieport 0000:88:05.0: BAR
13: no space for [io size 0x1000]
Jul 5 10:06:04 tarox1 kernel: [ 1157.106592] pcieport 0000:88:05.0: BAR
13: failed to assign [io size 0x1000]
Jul 5 10:06:04 tarox1 kernel: [ 1157.106597] pci 0000:8e:00.0: BAR 6:
assigned [mem 0xd3800000-0xd380ffff pref]
Jul 5 10:06:04 tarox1 kernel: [ 1157.106603] pci 0000:8e:00.0: BAR 0:
assigned [mem 0xd3810000-0xd3813fff 64bit]
Jul 5 10:06:04 tarox1 kernel: [ 1157.106709] pcieport 0000:88:05.0: PCI
bridge to [bus 8e]
Jul 5 10:06:04 tarox1 kernel: [ 1157.106765] pcieport 0000:88:05.0:
bridge window [mem 0xd3800000-0xd38fffff]
Jul 5 10:06:04 tarox1 kernel: [ 1157.107303] nvme nvme12: pci function
0000:8e:00.0
Jul 5 10:06:04 tarox1 kernel: [ 1157.107365] nvme 0000:8e:00.0:
enabling device (0100 -> 0102)
Jul 5 10:06:06 tarox1 kernel: [ 1159.456731] dmar_fault: 2 callbacks
suppressed
Jul 5 10:06:06 tarox1 kernel: [ 1159.456736] DMAR: DRHD: handling fault
status reg 402
Jul 5 10:06:06 tarox1 kernel: [ 1159.456771] DMAR: [INTR-REMAP] Request
device [8e:00.0] fault index 0x8000 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:07:01 tarox1 kernel: [ 1214.063901] nvme nvme8: I/O 24 QID 0
timeout, completion polled
Jul 5 10:07:01 tarox1 kernel: [ 1214.063903] nvme nvme11: I/O 20 QID 0
timeout, completion polled
Jul 5 10:07:01 tarox1 kernel: [ 1214.063907] nvme nvme10: I/O 16 QID 0
timeout, completion polled
Jul 5 10:07:01 tarox1 kernel: [ 1214.063908] nvme nvme9: I/O 0 QID 0
timeout, completion polled
Jul 5 10:07:01 tarox1 kernel: [ 1214.063964] DMAR: DRHD: handling fault
status reg 502
Jul 5 10:07:01 tarox1 kernel: [ 1214.063996] DMAR: [INTR-REMAP] Request
device [8d:00.0] fault index 0x9000 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:07:01 tarox1 kernel: [ 1214.064045] DMAR: [INTR-REMAP] Request
device [8b:00.0] fault index 0x8800 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:07:01 tarox1 kernel: [ 1214.064092] DMAR: [INTR-REMAP] Request
device [8a:00.0] fault index 0x8200 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:07:01 tarox1 kernel: [ 1214.064735] DMAR: [INTR-REMAP] Request
device [8c:00.0] fault index 0x8400 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:07:07 tarox1 kernel: [ 1220.207204] nvme nvme12: I/O 16 QID 0
timeout, completion polled
Jul 5 10:07:07 tarox1 kernel: [ 1220.207262] DMAR: DRHD: handling fault
status reg 102
Jul 5 10:07:07 tarox1 kernel: [ 1220.208237] DMAR: [INTR-REMAP] Request
device [8e:00.0] fault index 0x8000 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:07:51 tarox1 kernel: [ 1264.288236] RPC: Registered named UNIX
socket transport module.
Jul 5 10:07:51 tarox1 kernel: [ 1264.288240] RPC: Registered udp
transport module.
Jul 5 10:07:51 tarox1 kernel: [ 1264.288241] RPC: Registered tcp
transport module.
Jul 5 10:07:51 tarox1 kernel: [ 1264.288242] RPC: Registered tcp
NFSv4.1 backchannel transport module.
Jul 5 10:08:03 tarox1 kernel: [ 1275.496971] nvme nvme10: I/O 0 QID 0
timeout, completion polled
Jul 5 10:08:03 tarox1 kernel: [ 1275.496991] nvme nvme9: I/O 24 QID 0
timeout, completion polled
Jul 5 10:08:03 tarox1 kernel: [ 1275.497046] nvme nvme8: I/O 20 QID 0
timeout, completion polled
Jul 5 10:08:03 tarox1 kernel: [ 1275.497062] nvme nvme11: I/O 24 QID 0
timeout, completion polled
Jul 5 10:08:03 tarox1 kernel: [ 1275.497185] DMAR: DRHD: handling fault
status reg 202
Jul 5 10:08:03 tarox1 kernel: [ 1275.497973] DMAR: [INTR-REMAP] Request
device [8d:00.0] fault index 0x9000 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:08:03 tarox1 kernel: [ 1275.499108] DMAR: [INTR-REMAP] Request
device [8a:00.0] fault index 0x8200 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:08:03 tarox1 kernel: [ 1275.500145] DMAR: [INTR-REMAP] Request
device [8b:00.0] fault index 0x8800 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:08:03 tarox1 kernel: [ 1275.502092] DMAR: [INTR-REMAP] Request
device [8c:00.0] fault index 0x8400 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:08:09 tarox1 kernel: [ 1281.640331] nvme nvme12: I/O 17 QID 0
timeout, completion polled
Jul 5 10:08:09 tarox1 kernel: [ 1281.640524] DMAR: DRHD: handling fault
status reg 602
Jul 5 10:08:09 tarox1 kernel: [ 1281.641711] DMAR: [INTR-REMAP] Request
device [8e:00.0] fault index 0x8000 [fault reason 0x25] Blocked a
compatibility format interrupt request
Jul 5 10:08:58 tarox1 kernel: [ 1330.787095] INFO: task
kworker/u34:11:382 blocked for more than 120 seconds.
Jul 5 10:08:58 tarox1 kernel: [ 1330.788331] Tainted: G W
5.15.152 #2
Jul 5 10:08:58 tarox1 kernel: [ 1330.789536] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 5 10:08:58 tarox1 kernel: [ 1330.790748] task:kworker/u34:11
state:D stack: 0 pid: 382 ppid: 2 flags:0x00004000
Jul 5 10:08:58 tarox1 kernel: [ 1330.790757] Workqueue: events_unbound
async_run_entry_fn
Jul 5 10:08:58 tarox1 kernel: [ 1330.790768] Call Trace:
Jul 5 10:08:58 tarox1 kernel: [ 1330.790770]
Jul 5 10:08:58 tarox1 kernel: [ 1330.790775] __schedule+0x351/0x1740
Jul 5 10:08:58 tarox1 kernel: [ 1330.790783] ?
default_wake_function+0x1a/0x40
Jul 5 10:08:58 tarox1 kernel: [ 1330.790793] ? pollwake+0x72/0x90
Jul 5 10:08:58 tarox1 kernel: [ 1330.790804] ? wake_up_q+0x90/0x90
Jul 5 10:08:58 tarox1 kernel: [ 1330.790811] schedule+0x69/0x110
Jul 5 10:08:58 tarox1 kernel: [ 1330.790815] schedule_timeout+0x105/0x140
Jul 5 10:08:58 tarox1 kernel: [ 1330.790821] ? cpumask_next+0x23/0x30
Jul 5 10:08:58 tarox1 kernel: [ 1330.790827] __wait_for_common+0xab/0x150
Jul 5 10:08:58 tarox1 kernel: [ 1330.790831] ?
usleep_range_state+0x90/0x90
Jul 5 10:08:58 tarox1 kernel: [ 1330.790836] wait_for_completion+0x24/0x30
Jul 5 10:08:58 tarox1 kernel: [ 1330.790840]
__flush_work.isra.0+0x175/0x280
Jul 5 10:08:58 tarox1 kernel: [ 1330.790846] ?
worker_detach_from_pool+0xc0/0xc0
Jul 5 10:08:58 tarox1 kernel: [ 1330.790858] flush_work+0xe/0x20
Jul 5 10:08:58 tarox1 kernel: [ 1330.790863]
nvme_async_probe+0x19/0x40 [nvme]
Jul 5 10:08:58 tarox1 kernel: [ 1330.790873] async_run_entry_fn+0x30/0x120
Jul 5 10:08:58 tarox1 kernel: [ 1330.790878] process_one_work+0x228/0x3d0
Jul 5 10:08:58 tarox1 kernel: [ 1330.790883] worker_thread+0x53/0x420
Jul 5 10:08:58 tarox1 kernel: [ 1330.790887] ?
process_one_work+0x3d0/0x3d0
Jul 5 10:08:58 tarox1 kernel: [ 1330.790891] kthread+0x127/0x150
Jul 5 10:08:58 tarox1 kernel: [ 1330.790897] ?
set_kthread_struct+0x50/0x50
Jul 5 10:08:58 tarox1 kernel: [ 1330.790905] ret_from_fork+0x1f/0x30
Jul 5 10:08:58 tarox1 kernel: [ 1330.790915]