* Should I submit bugs against RC kernels?
@ 2018-11-21 18:21 Alex_Gagniuc
2018-11-21 19:19 ` Keith Busch
2018-11-22 8:38 ` Ming Lei
0 siblings, 2 replies; 4+ messages in thread
From: Alex_Gagniuc @ 2018-11-21 18:21 UTC
Hi,
I'm not sure if submitting bugs against RC is a good idea. I am seeing
more issues with SURPRISE!!! removal of drives under v4.20-rc3.
Alex
APPENDIX A: Example issue
[ 1042.415286] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Down
[ 1042.415346] pciehp 0000:b0:05.0:pcie204: Slot(179): Link Down
[ 1042.415431] pciehp 0000:3c:07.0:pcie204: Slot(181): Link Down
[ 1042.447734] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 1042.455586] PGD 0 P4D 0
[ 1042.458135] Oops: 0000 [#1] SMP PTI
[ 1042.461625] CPU: 5 PID: 622 Comm: irq/44-pciehp Not tainted 4.20.0-rc3+ #66
[ 1042.468582] Hardware name: Dell Inc. PowerEdge R740xd/07X9K0, BIOS 1.4.5 [HPX test BIOS 2] 03/30/2018
[ 1042.477801] RIP: 0010:sbitmap_any_bit_set+0xb/0x40
[ 1042.482586] Code: c8 83 c2 01 45 89 ca 4c 89 54 01 08 48 8b 4f 10 2b 74 01 08 39 57 08 77 d8 c3 0f 1f 44 00 00 8b 57 08 85 d2 74 2a 48 8b 47 10 <48> 83 38 00 75 23 83 ea 01 48 83 c0 40 48 c1 e2 06 48 01 c2 eb 0b
[ 1042.501341] RSP: 0018:ffffbb1ec8f6bc90 EFLAGS: 00010206
[ 1042.506566] RAX: 0000000000000000 RBX: ffff94409e924000 RCX: 0000000000000000
[ 1042.513696] RDX: 000000000000000a RSI: 0000000000000000 RDI: ffff94409e9240d8
[ 1042.520839] RBP: 0000000000000001 R08: ffff9440aec00b68 R09: ffff9440aec00ca0
[ 1042.527971] R10: 0000000000000000 R11: ffffffffb624a418 R12: 0000000000000001
[ 1042.535102] R13: ffffffffc00f60d0 R14: 0000000000000060 R15: ffff9440a2deca80
[ 1042.542236] FS: 0000000000000000(0000) GS:ffff9444af280000(0000) knlGS:0000000000000000
[ 1042.550318] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1042.556066] CR2: 0000000000000000 CR3: 00000005f420a003 CR4: 00000000007606e0
[ 1042.563199] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1042.570329] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1042.577462] PKRU: 55555554
[ 1042.580176] Call Trace:
[ 1042.582631] blk_mq_run_hw_queue+0xdd/0x120
[ 1042.586815] blk_mq_run_hw_queues+0x3a/0x50
[ 1042.591006] nvme_kill_queues+0x26/0x50 [nvme_core]
[ 1042.595888] nvme_remove_namespaces+0xbf/0xd0 [nvme_core]
[ 1042.601295] nvme_remove+0x60/0x170 [nvme]
[ 1042.605395] pci_device_remove+0x3b/0xc0
[ 1042.609329] device_release_driver_internal+0x180/0x240
[ 1042.614555] pci_stop_bus_device+0x69/0x90
[ 1042.618652] pci_stop_and_remove_bus_device+0xe/0x20
[ 1042.623620] pciehp_unconfigure_device+0x84/0x140
[ 1042.628324] pciehp_disable_slot+0x67/0x110
[ 1042.632512] pciehp_handle_presence_or_link_change+0xd8/0x400
[ 1042.638264] ? __synchronize_hardirq+0x43/0x50
[ 1042.642709] pciehp_ist+0x1bb/0x1c0
[ 1042.646201] ? irq_forced_thread_fn+0x70/0x70
[ 1042.650559] irq_thread_fn+0x1f/0x60
[ 1042.654140] irq_thread+0xf3/0x190
[ 1042.657554] ? irq_thread_fn+0x60/0x60
[ 1042.661306] ? irq_thread_check_affinity.part.45+0x80/0x80
[ 1042.666802] kthread+0x112/0x130
[ 1042.670036] ? kthread_park+0x80/0x80
[ 1042.673709] ret_from_fork+0x35/0x40
[ 1042.677286] Modules linked in: xt_CHECKSUM ipt_MASQUERADE tun
ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink
ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_nat_ipv6
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4
nf_nat nf_conntrack devlink nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle
iptable_raw iptable_security ebtable_filter ebtables ip6table_filter
ip6_tables vfat fat sunrpc intel_rapl skx_edac nfit x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore
intel_rapl_perf iTCO_wdt iTCO_vendor_support dcdbas pcc_cpufreq ses
joydev enclosure pcspkr mei_me ioatdma mei dca lpc_ich i2c_i801
ipmi_si(+) ipmi_devintf ipmi_msghandler acpi_power_meter raid1 dm_raid
raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor
async_tx raid6_pq mgag200 i2c_algo_bit drm_kms_helper mpt3sas ttm nvme
raid_class drm crc32c_intel nvme_core uas
[ 1042.677321] scsi_transport_sas usb_storage tg3
[ 1042.768888] CR2: 0000000000000000
[ 1042.772207] ---[ end trace fba458b21588ceca ]---
[ 1042.820816] RIP: 0010:sbitmap_any_bit_set+0xb/0x40
[ 1042.825613] Code: c8 83 c2 01 45 89 ca 4c 89 54 01 08 48 8b 4f 10 2b 74 01 08 39 57 08 77 d8 c3 0f 1f 44 00 00 8b 57 08 85 d2 74 2a 48 8b 47 10 <48> 83 38 00 75 23 83 ea 01 48 83 c0 40 48 c1 e2 06 48 01 c2 eb 0b
[ 1042.844358] RSP: 0018:ffffbb1ec8f6bc90 EFLAGS: 00010206
[ 1042.849608] RAX: 0000000000000000 RBX: ffff94409e924000 RCX: 0000000000000000
[ 1042.856739] RDX: 000000000000000a RSI: 0000000000000000 RDI: ffff94409e9240d8
[ 1042.863872] RBP: 0000000000000001 R08: ffff9440aec00b68 R09: ffff9440aec00ca0
[ 1042.871003] R10: 0000000000000000 R11: ffffffffb624a418 R12: 0000000000000001
[ 1042.878136] R13: ffffffffc00f60d0 R14: 0000000000000060 R15: ffff9440a2deca80
[ 1042.885269] FS: 0000000000000000(0000) GS:ffff9444af280000(0000) knlGS:0000000000000000
[ 1042.893352] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1042.899102] CR2: 0000000000000000 CR3: 00000005f420a003 CR4: 00000000007606e0
[ 1042.906241] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1042.913372] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1042.920505] PKRU: 55555554
From: Keith Busch @ 2018-11-21 19:19 UTC
On Wed, Nov 21, 2018 at 06:21:49PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> Hi,
>
> I'm not sure if submitting bugs against RC is a good idea. I am seeing
> more issues with SURPRISE!!! removal of drives under v4.20-rc3.
>
> Alex
Hi Alex,
Yes, you can (and should!) report sightings for issues on rc kernels to
the appropriate subsystems.
The following should resolve the issue you're seeing. I won't be able to
test it till next Monday, though.
---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4f504e8f0669..b1ce747411de 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1495,6 +1495,7 @@ static void nvme_dev_remove_admin(struct nvme_dev *dev)
 		blk_mq_unquiesce_queue(dev->ctrl.admin_q);
 		blk_cleanup_queue(dev->ctrl.admin_q);
 		blk_mq_free_tag_set(&dev->admin_tagset);
+		dev->ctrl.admin_q = NULL;
 	}
 }
 
--
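The idea behind the one-line patch above can be modelled in user space. This is a minimal sketch, not kernel code: `struct queue`, `struct ctrl`, `remove_admin()` and `kill_queues()` are hypothetical stand-ins, and it assumes (as the patch implies, though the thread does not show the code) that a later step of the removal path, `nvme_kill_queues()` in the trace above, only touches the admin queue when `ctrl->admin_q` is non-NULL. Clearing the pointer right after teardown is what makes such a check meaningful; without it, the later step sees a stale pointer to released state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the admin request queue; "runs" counts
 * how often a later removal step touched it. */
struct queue {
	int runs;
};

/* Hypothetical stand-in for the controller that owns the pointer. */
struct ctrl {
	struct queue *admin_q;
};

/* Teardown: release the queue, then clear the pointer (the step the
 * patch adds with "dev->ctrl.admin_q = NULL;"). */
static void remove_admin(struct ctrl *ctrl)
{
	if (ctrl->admin_q) {
		free(ctrl->admin_q);
		ctrl->admin_q = NULL;
	}
}

/* A later removal step that only runs the queue if it still exists.
 * With a stale pointer instead of NULL, this would touch freed memory. */
static bool kill_queues(struct ctrl *ctrl)
{
	if (!ctrl->admin_q)
		return false;
	ctrl->admin_q->runs++;
	return true;
}
```

Freeing and clearing in one place also makes teardown idempotent. The trade-off is that any path which reads the pointer without checking it will now fault on NULL instead of on freed memory.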
From: Ming Lei @ 2018-11-22 8:38 UTC
On Wed, Nov 21, 2018 at 06:21:49PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> Hi,
>
> I'm not sure if submitting bugs against RC is a good idea. I am seeing
> more issues with SURPRISE!!! removal of drives under v4.20-rc3.
>
> Alex
It is the same as the recent SCSI report, and I guess the following patch
should work:
--
diff --git a/block/blk-core.c b/block/blk-core.c
index 04f5be473638..f6943f4a4d16 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -355,7 +355,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	 * We rely on driver to deal with the race in case that queue
 	 * initialization isn't done.
 	 */
-	if (queue_is_mq(q) && blk_queue_init_done(q))
+	if (queue_is_mq(q))
 		blk_mq_quiesce_queue(q);
 
 	/* for synchronous bio-based driver finish in-flight integrity i/o */
--
Ming
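A toy model of the condition Ming's patch changes may help show what it affects. It is only a sketch: `struct rq_model`, `cleanup_old()` and `cleanup_new()` are hypothetical, the two booleans stand in for `queue_is_mq(q)` and `blk_queue_init_done(q)`, and it assumes (unconfirmed in this thread) that the queue hit here is a blk-mq queue whose init-done flag was never set, so the old condition skipped `blk_mq_quiesce_queue()` for it. It does not model why quiescing avoids the crash.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a request queue: only the two properties the
 * patched condition looks at, plus a record of whether it was quiesced. */
struct rq_model {
	bool is_mq;      /* stands in for queue_is_mq(q) */
	bool init_done;  /* stands in for blk_queue_init_done(q) */
	bool quiesced;   /* set when the model "quiesces" the queue */
};

/* Condition before the patch: quiesce only fully initialised queues. */
static void cleanup_old(struct rq_model *q)
{
	if (q->is_mq && q->init_done)
		q->quiesced = true;
}

/* Condition after the patch: quiesce every blk-mq queue. */
static void cleanup_new(struct rq_model *q)
{
	if (q->is_mq)
		q->quiesced = true;
}
```

The only behavioural difference is the first case: a blk-mq queue without the init-done flag is now quiesced during cleanup instead of being skipped.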
From: Igor Konopko @ 2018-11-23 16:08 UTC
On 21.11.2018 20:19, Keith Busch wrote:
> On Wed, Nov 21, 2018 at 06:21:49PM +0000, Alex_Gagniuc@Dellteam.com wrote:
>> Hi,
>>
>> I'm not sure if submitting bugs against RC is a good idea. I am seeing
>> more issues with SURPRISE!!! removal of drives under v4.20-rc3.
>>
>> Alex
>
> Hi Alex,
>
> Yes, you can (and should!) report sightings for issues on rc kernels to
> the appropriate subsystems.
>
> The following should resolve the issue you're seeing. I won't be able to
> test it till next Monday, though.
>
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 4f504e8f0669..b1ce747411de 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1495,6 +1495,7 @@ static void nvme_dev_remove_admin(struct nvme_dev *dev)
> blk_mq_unquiesce_queue(dev->ctrl.admin_q);
> blk_cleanup_queue(dev->ctrl.admin_q);
> blk_mq_free_tag_set(&dev->admin_tagset);
> + dev->ctrl.admin_q = NULL;
> }
> }
>
> --
Hi,
I also hit the same issue as Alex. I tried the proposed fix, and it helps
in the surprise removal scenario, but it causes another issue: if there
are ongoing calls to blk_execute_rq() (which uses the admin queue) during
hot removal, we hit a NULL pointer dereference on that path.
I just submitted another fix, which on my side helps in the hot removal
scenario and also does not break the blk_execute_rq() path.
Igor
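Igor's message does not include his fix, so the following is not it. It is a generic sketch of one common way to close this kind of race, where teardown detaches a shared pointer while another path (here, something like `blk_execute_rq()`) may still be using the object: a lock-protected reference count. All names are hypothetical, the lock is a plain C11 `atomic_flag` spinlock, and the kernel has its own mechanisms for this that the sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical queue; refs counts the owner plus every active user.
 * refs is only read or written while holding the controller's lock. */
struct queue {
	int refs;
};

/* Hypothetical controller; initialise lock with ATOMIC_FLAG_INIT. */
struct ctrl {
	atomic_flag lock;   /* tiny spinlock protecting q and q->refs */
	struct queue *q;
};

static void ctrl_lock(struct ctrl *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire)) {
		/* spin until the holder releases the lock */
	}
}

static void ctrl_unlock(struct ctrl *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Allocate the queue; the controller holds the first reference. */
static bool ctrl_attach_queue(struct ctrl *c)
{
	c->q = malloc(sizeof(*c->q));
	if (!c->q)
		return false;
	c->q->refs = 1;
	return true;
}

/* Drop one reference; caller holds the lock. Last one frees. */
static void queue_put_locked(struct queue *q)
{
	if (--q->refs == 0)
		free(q);
}

/* User side: take a reference, or get NULL once teardown has
 * already detached the queue from the controller. */
static struct queue *queue_get(struct ctrl *c)
{
	ctrl_lock(c);
	struct queue *q = c->q;
	if (q)
		q->refs++;
	ctrl_unlock(c);
	return q;
}

static void queue_put(struct ctrl *c, struct queue *q)
{
	ctrl_lock(c);
	queue_put_locked(q);
	ctrl_unlock(c);
}

/* Removal side: detach the pointer and drop the owner's reference.
 * The memory is freed only after the last user calls queue_put(). */
static void queue_teardown(struct ctrl *c)
{
	ctrl_lock(c);
	struct queue *q = c->q;
	c->q = NULL;
	if (q)
		queue_put_locked(q);
	ctrl_unlock(c);
}
```

Because lookups and reference changes all happen under the controller's lock, a user either obtains a reference before teardown detaches the queue (keeping the memory alive until `queue_put()`) or sees NULL afterwards; it is never left holding a pointer that is freed underneath it.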