* Interesting 'list_add double add' with nvme drives
@ 2019-03-06 18:48 Alex_Gagniuc
From: Alex_Gagniuc @ 2019-03-06 18:48 UTC
To: linux-nvme, linux-block; +Cc: keith.busch
Hi,
I'm seeing a 'list_add double add' BUG when we hot-remove and then re-add a
bunch of NVMe drives. It's not easy to reproduce, and the one surviving log
is pasted below.
Alex
[ 111.808900] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Down
[ 117.496424] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Up
[ 117.508144] pciehp 0000:3c:06.0:pcie204: Slot(180): Link Up
[ 117.521525] pciehp 0000:b0:05.0:pcie204: Slot(179): Link Up
[ 117.764856] pci 0000:3f:00.0: [144d:a822] type 00 class 0x010802
[ 117.764897] pci 0000:3f:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[ 117.764948] pci 0000:3f:00.0: Max Payload Size set to 256 (was 128,
max 256)
[ 117.765671] pcieport 0000:3c:06.0: bridge window [io 0x1000-0x0fff]
to [bus 3f] add_size 1000
[ 117.765679] pcieport 0000:3c:06.0: BAR 13: no space for [io size 0x1000]
[ 117.765682] pcieport 0000:3c:06.0: BAR 13: failed to assign [io size
0x1000]
[ 117.765686] pcieport 0000:3c:06.0: BAR 13: no space for [io size 0x1000]
[ 117.765689] pcieport 0000:3c:06.0: BAR 13: failed to assign [io size
0x1000]
[ 117.765696] pci 0000:3f:00.0: BAR 0: assigned [mem
0xab500000-0xab503fff 64bit]
[ 117.765710] pcieport 0000:3c:06.0: PCI bridge to [bus 3f]
[ 117.765717] pcieport 0000:3c:06.0: bridge window [mem
0xab500000-0xab5fffff]
[ 117.765723] pcieport 0000:3c:06.0: bridge window [mem
0x382000400000-0x3820005fffff 64bit pref]
[ 117.766944] nvme nvme2: pci function 0000:3f:00.0
[ 117.767060] nvme 0000:3f:00.0: enabling device (0000 -> 0002)
[ 117.780851] pci 0000:b2:00.0: [144d:a822] type 00 class 0x010802
[ 117.780889] pci 0000:b2:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[ 117.780938] pci 0000:b2:00.0: Max Payload Size set to 256 (was 128,
max 256)
[ 117.781576] pcieport 0000:b0:05.0: bridge window [io 0x1000-0x0fff]
to [bus b2] add_size 1000
[ 117.781583] pcieport 0000:b0:05.0: BAR 13: no space for [io size 0x1000]
[ 117.781586] pcieport 0000:b0:05.0: BAR 13: failed to assign [io size
0x1000]
[ 117.781590] pcieport 0000:b0:05.0: BAR 13: no space for [io size 0x1000]
[ 117.781593] pcieport 0000:b0:05.0: BAR 13: failed to assign [io size
0x1000]
[ 117.781600] pci 0000:b2:00.0: BAR 0: assigned [mem
0xe1400000-0xe1403fff 64bit]
[ 117.781613] pcieport 0000:b0:05.0: PCI bridge to [bus b2]
[ 117.781620] pcieport 0000:b0:05.0: bridge window [mem
0xe1400000-0xe14fffff]
[ 117.781626] pcieport 0000:b0:05.0: bridge window [mem
0x386000200000-0x3860003fffff 64bit pref]
[ 117.782498] nvme nvme3: pci function 0000:b2:00.0
[ 117.782530] nvme 0000:b2:00.0: enabling device (0000 -> 0002)
[ 117.800846] pci 0000:b1:00.0: [8086:0a55] type 00 class 0x010802
[ 117.800883] pci 0000:b1:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[ 117.800927] pci 0000:b1:00.0: Max Payload Size set to 256 (was 128,
max 512)
[ 117.800932] pci 0000:b1:00.0: enabling Extended Tags
[ 117.801564] pcieport 0000:b0:04.0: bridge window [io 0x1000-0x0fff]
to [bus b1] add_size 1000
[ 117.801571] pcieport 0000:b0:04.0: BAR 13: no space for [io size 0x1000]
[ 117.801574] pcieport 0000:b0:04.0: BAR 13: failed to assign [io size
0x1000]
[ 117.801577] pcieport 0000:b0:04.0: BAR 13: no space for [io size 0x1000]
[ 117.801580] pcieport 0000:b0:04.0: BAR 13: failed to assign [io size
0x1000]
[ 117.801587] pci 0000:b1:00.0: BAR 0: assigned [mem
0xe1500000-0xe1503fff 64bit]
[ 117.801599] pcieport 0000:b0:04.0: PCI bridge to [bus b1]
[ 117.801606] pcieport 0000:b0:04.0: bridge window [mem
0xe1500000-0xe15fffff]
[ 117.801612] pcieport 0000:b0:04.0: bridge window [mem
0x386000000000-0x3860001fffff 64bit pref]
[ 117.802362] nvme nvme4: pci function 0000:b1:00.0
[ 117.802390] nvme 0000:b1:00.0: enabling device (0000 -> 0002)
[ 117.896666] pciehp 0000:b0:04.0:pcie204: Slot(178): Card not present
[ 117.896844] pciehp 0000:b0:05.0:pcie204: Slot(179): Card not present
[ 117.896944] pciehp 0000:3c:06.0:pcie204: Slot(180): Card not present
[ 120.225239] nvme nvme2: Shutdown timeout set to 10 seconds
[ 120.225299] nvme nvme3: Shutdown timeout set to 10 seconds
[ 121.336917] nvme nvme4: failed to mark controller CONNECTING
[ 121.336922] nvme nvme4: Removing after probe failure status: 0
[ 121.353534] pciehp 0000:b0:04.0:pcie204: Slot(178): Card present
[ 121.353538] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Up
[ 121.368290] list_add double add: new=ffff956b64c0c658,
prev=ffff956b64c0c658, next=ffff956f6f2ddfe0.
[ 121.368310] ------------[ cut here ]------------
[ 121.368312] kernel BUG at lib/list_debug.c:31!
[ 121.372769] invalid opcode: 0000 [#1] SMP PTI
[ 121.377132] CPU: 7 PID: 628 Comm: irq/45-pciehp Not tainted 5.0.0 #216
[ 121.383662] Hardware name: Dell Inc. PowerEdge R740xd/07X9K0, BIOS
1.4.4 [Recoverable-Unmask] 03/09/2018
[ 121.393137] RIP: 0010:__list_add_valid+0x41/0x50
[ 121.397751] Code: 85 94 00 00 00 48 39 c7 74 0b 48 39 d7 74 06 b8 01
00 00 00 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 50 25 12 af e8 1d 2e c9
ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8b 07 48 8b 57 08
[ 121.416495] RSP: 0018:ffffbe9708f9bbf0 EFLAGS: 00010046
[ 121.421723] RAX: 0000000000000058 RBX: ffff956f6f2ddfe0 RCX:
0000000000000000
[ 121.428854] RDX: 0000000000000000 RSI: ffff956f6f2d6908 RDI:
ffff956f6f2d6908
[ 121.435986] RBP: ffff956b64c0c600 R08: 000000000000087c R09:
0000000000000003
[ 121.443118] R10: 0000000000000000 R11: 0000000000000001 R12:
ffff956b64c0c658
[ 121.450250] R13: 0000000000000282 R14: ffff956b64c0c658 R15:
0000000000000000
[ 121.457383] FS: 0000000000000000(0000) GS:ffff956f6f2c0000(0000)
knlGS:0000000000000000
[ 121.465468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 121.471212] CR2: 00007f77870ba068 CR3: 000000083389e004 CR4:
00000000007606e0
[ 121.478345] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 121.485477] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 121.492610] PKRU: 55555554
[ 121.495320] Call Trace:
[ 121.497780] __blk_complete_request+0x74/0x110
[ 121.502222] blk_mq_complete_request+0xb6/0x100
[ 121.506759] nvme_cancel_request+0x27/0x70 [nvme_core]
[ 121.511896] blk_mq_tagset_busy_iter+0x203/0x270
[ 121.516510] ? nvme_complete_rq+0x210/0x210 [nvme_core]
[ 121.521736] ? nvme_complete_rq+0x210/0x210 [nvme_core]
[ 121.526964] nvme_dev_disable+0xfb/0x1d0 [nvme]
[ 121.531493] nvme_remove+0x12c/0x170 [nvme]
[ 121.535681] pci_device_remove+0x3b/0xc0
[ 121.539608] device_release_driver_internal+0x183/0x240
[ 121.544834] pci_stop_bus_device+0x69/0x90
[ 121.548931] pci_stop_and_remove_bus_device+0xe/0x20
[ 121.553899] pciehp_unconfigure_device+0x84/0x140
[ 121.558608] pciehp_disable_slot+0x67/0x110
[ 121.562796] pciehp_handle_presence_or_link_change+0x25f/0x400
[ 121.568630] ? __synchronize_hardirq+0x43/0x50
[ 121.573074] pciehp_ist+0x1bb/0x1c0
[ 121.576567] ? irq_finalize_oneshot.part.43+0xe0/0xe0
[ 121.581617] irq_thread_fn+0x1f/0x60
[ 121.585198] irq_thread+0xe7/0x170
[ 121.588602] ? irq_forced_thread_fn+0x70/0x70
[ 121.592963] ? irq_thread_check_affinity+0x90/0x90
[ 121.597754] kthread+0x112/0x130
[ 121.600987] ? kthread_create_on_node+0x60/0x60
[ 121.605521] ret_from_fork+0x35/0x40
[ 121.609098] Modules linked in: xt_CHECKSUM ipt_MASQUERADE tun bridge
stp llc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack
ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat_ipv4 nf_nat devlink iptable_mangle
iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables
sunrpc f2fs vfat fat intel_rapl skx_edac nfit x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm ses enclosure irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate joydev
iTCO_wdt iTCO_vendor_support ipmi_ssif dcdbas intel_uncore
intel_rapl_perf mei_me pcspkr i2c_i801 mei ioatdma lpc_ich ipmi_si
ipmi_devintf ipmi_msghandler acpi_power_meter raid1 dm_raid raid456
libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx
raid6_pq mgag200 drm_kms_helper ttm drm mpt3sas igb nvme crc32c_intel
raid_class nvme_core uas scsi_transport_sas usb_storage
[ 121.609134] dca i2c_algo_bit
[ 121.699398] ---[ end trace 8704317f268b2403 ]---
[ 121.743228] RIP: 0010:__list_add_valid+0x41/0x50
[ 121.747858] Code: 85 94 00 00 00 48 39 c7 74 0b 48 39 d7 74 06 b8 01
00 00 00 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 50 25 12 af e8 1d 2e c9
ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8b 07 48 8b 57 08
[ 121.766601] RSP: 0018:ffffbe9708f9bbf0 EFLAGS: 00010046
[ 121.771827] RAX: 0000000000000058 RBX: ffff956f6f2ddfe0 RCX:
0000000000000000
[ 121.778958] RDX: 0000000000000000 RSI: ffff956f6f2d6908 RDI:
ffff956f6f2d6908
[ 121.786089] RBP: ffff956b64c0c600 R08: 000000000000087c R09:
0000000000000003
[ 121.793222] R10: 0000000000000000 R11: 0000000000000001 R12:
ffff956b64c0c658
[ 121.800353] R13: 0000000000000282 R14: ffff956b64c0c658 R15:
0000000000000000
[ 121.807487] FS: 0000000000000000(0000) GS:ffff956f6f2c0000(0000)
knlGS:0000000000000000
[ 121.815573] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 121.821319] CR2: 00007f77870ba068 CR3: 000000083389e004 CR4:
00000000007606e0
[ 121.828448] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 121.835583] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 121.842713] PKRU: 55555554
[ 121.845506] nvme nvme3: IO queues not created
[ 121.849891] nvme nvme3: failed to mark controller state 2
[ 121.855298] nvme nvme3: Removing after probe failure status: 0
[ 124.879721] md/raid1:md126: active with 1 out of 2 mirrors
[ 124.885245] md126: failed to create bitmap (-5)
* Re: Interesting 'list_add double add' with nvme drives
@ 2019-03-06 18:53 ` Keith Busch
From: Keith Busch @ 2019-03-06 18:53 UTC
To: Alex_Gagniuc; +Cc: linux-nvme, linux-block, keith.busch
On Wed, Mar 06, 2019 at 06:48:28PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> Hi,
>
> I'm seeing a 'list_add double add' BUG when we hot-remove and then re-add a
> bunch of NVMe drives. It's not easy to reproduce, and the one surviving log
> is pasted below.
This looks like a double completion coming from the busy request
iterator. I'm suspicious it's because that iterator considers
MQ_RQ_COMPLETE requests as "started". That doesn't really make much sense,
and I can't find a single user of this interface that actually wants to
see such requests in their callbacks.
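Roughly the path I have in mind, paraphrased from the v5.0 sources and
abbreviated (not a verbatim copy):

	/* blk-mq-tag.c: bt_tags_iter() hands the request to the callback
	 * whenever blk_mq_request_started() says yes, which today also
	 * covers requests already in MQ_RQ_COMPLETE.
	 */
	if (rq && blk_mq_request_started(rq))
		iter_data->fn(rq, iter_data->data, reserved);

	/* nvme core: nvme_cancel_request() then completes the request a
	 * second time ...
	 */
	nvme_req(req)->status = NVME_SC_ABORT_REQ;
	blk_mq_complete_request(req);

	/* ... and __blk_complete_request() does another list_add of the
	 * same request onto the per-cpu completion list, which is the
	 * "list_add double add" in the log above.
	 */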
I know you said it's difficult to repro, but could you see if the
following makes it go away?
---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 54535f4c4570..0ddcac44f912 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -659,7 +659,7 @@ EXPORT_SYMBOL(blk_mq_complete_request);
 
 int blk_mq_request_started(struct request *rq)
 {
-	return blk_mq_rq_state(rq) != MQ_RQ_IDLE;
+	return blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT;
 }
 EXPORT_SYMBOL_GPL(blk_mq_request_started);
 
--