* BUG: NULL pointer at IP: blk_mq_map_swqueue+0xbc/0x200 on 4.15.0-rc2
From: Yi Zhang @ 2017-12-08 7:24 UTC

Hi

I found this issue during nvme blk-mq io scheduler test on 4.15.0-rc2, let me know if you need more info, thanks.

Reproduce steps:

MQ_IOSCHEDS=`sed 's/[][]//g' /sys/block/nvme0n1/queue/scheduler`
dd if=/dev/nvme0n1p1 of=/dev/null bs=4096 &
while kill -0 $! 2>/dev/null; do
    for SCHEDULER in $MQ_IOSCHEDS; do
        echo "INFO: BLK-MQ IO SCHEDULER:$SCHEDULER testing during IO"
        echo $SCHEDULER > /sys/block/nvme0n1/queue/scheduler
        echo 1 >/sys/bus/pci/devices/0000\:84\:00.0/reset
        sleep 0.5
    done
done

Kernel log:

[ 101.202734] BUG: unable to handle kernel NULL pointer dereference at 0000000094d3013f
[ 101.211487] IP: blk_mq_map_swqueue+0xbc/0x200
[ 101.216346] PGD 0 P4D 0
[ 101.219171] Oops: 0000 [#1] SMP
[ 101.222674] Modules linked in: sunrpc ipmi_ssif vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mxm_wmi intel_rapl_perf iTCO_wdt ipmi_si ipmi_devintf pcspkr iTCO_vendor_support sg dcdbas ipmi_msghandler wmi mei_me lpc_ich shpchp mei acpi_power_meter dm_multipath ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci crc32c_intel libata tg3 nvme nvme_core megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 101.284881] CPU: 0 PID: 504 Comm: kworker/u25:5 Not tainted 4.15.0-rc2 #1
[ 101.292455] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
[ 101.301001] Workqueue: nvme-wq nvme_reset_work [nvme]
[ 101.306636] task: 00000000f2c53190 task.stack: 000000002da874f9
[ 101.313241] RIP: 0010:blk_mq_map_swqueue+0xbc/0x200
[ 101.318681] RSP: 0018:ffffc9000234fd70 EFLAGS: 00010282
[ 101.324511] RAX: ffff88047ffc9480 RBX: ffff88047e130850 RCX: 0000000000000000
[ 101.332471] RDX: ffffe8ffffd40580 RSI: ffff88047e509b40 RDI: ffff88046f37a008
[ 101.340432] RBP: 000000000000000b R08: ffff88046f37a008 R09: 0000000011f94280
[ 101.348392] R10: ffff88047ffd4d00 R11: 0000000000000000 R12: ffff88046f37a008
[ 101.356353] R13: ffff88047e130f38 R14: 000000000000000b R15: ffff88046f37a558
[ 101.364314] FS:  0000000000000000(0000) GS:ffff880277c00000(0000) knlGS:0000000000000000
[ 101.373342] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 101.379753] CR2: 0000000000000098 CR3: 000000047f409004 CR4: 00000000001606f0
[ 101.387714] Call Trace:
[ 101.390445]  blk_mq_update_nr_hw_queues+0xbf/0x130
[ 101.395791]  nvme_reset_work+0x6f4/0xc06 [nvme]
[ 101.400848]  ? pick_next_task_fair+0x290/0x5f0
[ 101.405807]  ? __switch_to+0x1f5/0x430
[ 101.409988]  ? put_prev_entity+0x2f/0xd0
[ 101.414365]  process_one_work+0x141/0x340
[ 101.418836]  worker_thread+0x47/0x3e0
[ 101.422921]  kthread+0xf5/0x130
[ 101.426424]  ? rescuer_thread+0x380/0x380
[ 101.430896]  ? kthread_associate_blkcg+0x90/0x90
[ 101.436048]  ret_from_fork+0x1f/0x30
[ 101.440034] Code: 48 83 3c ca 00 0f 84 2b 01 00 00 48 63 cd 48 8b 93 10 01 00 00 8b 0c 88 48 8b 83 20 01 00 00 4a 03 14 f5 60 04 af 81 48 8b 0c c8 <48> 8b 81 98 00 00 00 f0 4c 0f ab 30 8b 81 f8 00 00 00 89 42 44
[ 101.461116] RIP: blk_mq_map_swqueue+0xbc/0x200 RSP: ffffc9000234fd70
[ 101.468205] CR2: 0000000000000098
[ 101.471907] ---[ end trace 5fe710f98228a3ca ]---
[ 101.482489] Kernel panic - not syncing: Fatal exception
[ 101.488505] Kernel Offset: disabled
[ 101.497752] ---[ end Kernel panic - not syncing: Fatal exception
[ 101.504491] ------------[ cut here ]------------
[ 101.509641] sched: Unexpected reschedule of offline CPU#1!
[ 101.515773] WARNING: CPU: 0 PID: 504 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x36/0x40
[ 101.526157] Modules linked in: sunrpc ipmi_ssif vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mxm_wmi intel_rapl_perf iTCO_wdt ipmi_si ipmi_devintf pcspkr iTCO_vendor_support sg dcdbas ipmi_msghandler wmi mei_me lpc_ich shpchp mei acpi_power_meter dm_multipath ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci crc32c_intel libata tg3 nvme nvme_core megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 101.588366] CPU: 0 PID: 504 Comm: kworker/u25:5 Tainted: G D 4.15.0-rc2 #1
[ 101.597395] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
[ 101.605942] Workqueue: nvme-wq nvme_reset_work [nvme]
[ 101.611578] task: 00000000f2c53190 task.stack: 000000002da874f9
[ 101.618182] RIP: 0010:native_smp_send_reschedule+0x36/0x40
[ 101.624301] RSP: 0018:ffff880277c03ed0 EFLAGS: 00010086
[ 101.630130] RAX: 0000000000000000 RBX: ffff880277c1b640 RCX: 0000000000000006
[ 101.638091] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff880277c0e050
[ 101.646051] RBP: ffff880277c1b640 R08: 0000000000000000 R09: 0000000000000637
[ 101.654012] R10: 0000000000000000 R11: ffff880277c03c38 R12: 0000000000000000
[ 101.661973] R13: ffff880271680000 R14: 0000000000000001 R15: ffff880277c14240
[ 101.669934] FS:  0000000000000000(0000) GS:ffff880277c00000(0000) knlGS:0000000000000000
[ 101.678962] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 101.685373] CR2: 0000000000000098 CR3: 000000047f409004 CR4: 00000000001606f0
[ 101.693334] Call Trace:
[ 101.696062]  <IRQ>
[ 101.698307]  scheduler_tick+0xa4/0xd0
[ 101.702390]  ? tick_sched_do_timer+0x60/0x60
[ 101.707153]  update_process_times+0x40/0x50
[ 101.711820]  tick_sched_handle+0x26/0x60
[ 101.716196]  tick_sched_timer+0x34/0x70
[ 101.720474]  __hrtimer_run_queues+0xdc/0x220
[ 101.725237]  hrtimer_interrupt+0x99/0x190
[ 101.729715]  smp_apic_timer_interrupt+0x56/0x120
[ 101.734868]  apic_timer_interrupt+0x96/0xa0
[ 101.739532]  </IRQ>
[ 101.741872] RIP: 0010:panic+0x1fa/0x23c
[ 101.746150] RSP: 0018:ffffc9000234fb30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
[ 101.754597] RAX: 0000000000000034 RBX: 0000000000000000 RCX: 0000000000000006
[ 101.762558] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff880277c0e050
[ 101.770518] RBP: ffffc9000234fba0 R08: 0000000000000000 R09: 0000000000000635
[ 101.778479] R10: 0000000000000000 R11: ffffc9000234f8a0 R12: ffffffff81a3e998
[ 101.786439] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[ 101.794405]  oops_end+0xaf/0xc0
[ 101.797911]  no_context+0x1a9/0x3f0
[ 101.801793]  __do_page_fault+0x97/0x4d0
[ 101.806073]  do_page_fault+0x33/0x120
[ 101.810158]  page_fault+0x22/0x30
[ 101.813859] RIP: 0010:blk_mq_map_swqueue+0xbc/0x200
[ 101.819299] RSP: 0018:ffffc9000234fd70 EFLAGS: 00010282
[ 101.825128] RAX: ffff88047ffc9480 RBX: ffff88047e130850 RCX: 0000000000000000
[ 101.833089] RDX: ffffe8ffffd40580 RSI: ffff88047e509b40 RDI: ffff88046f37a008
[ 101.841050] RBP: 000000000000000b R08: ffff88046f37a008 R09: 0000000011f94280
[ 101.849010] R10: ffff88047ffd4d00 R11: 0000000000000000 R12: ffff88046f37a008
[ 101.856971] R13: ffff88047e130f38 R14: 000000000000000b R15: ffff88046f37a558
[ 101.864935]  blk_mq_update_nr_hw_queues+0xbf/0x130
[ 101.870281]  nvme_reset_work+0x6f4/0xc06 [nvme]
[ 101.875337]  ? pick_next_task_fair+0x290/0x5f0
[ 101.880297]  ? __switch_to+0x1f5/0x430
[ 101.884478]  ? put_prev_entity+0x2f/0xd0
[ 101.888854]  process_one_work+0x141/0x340
[ 101.893328]  worker_thread+0x47/0x3e0
[ 101.897412]  kthread+0xf5/0x130
[ 101.900915]  ? rescuer_thread+0x380/0x380
[ 101.905387]  ? kthread_associate_blkcg+0x90/0x90
[ 101.910538]  ret_from_fork+0x1f/0x30
[ 101.914524] Code: d1 67 dc 00 0f 92 c0 84 c0 74 12 48 8b 05 63 a1 ab 00 be fd 00 00 00 48 8b 40 30 ff e0 89 fe 48 c7 c7 10 5a a4 81 e8 4a d5 05 00 <0f> ff c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 83 ec 20 65 48
[ 101.935597] ---[ end trace 5fe710f98228a3cb ]---

Best Regards,
Yi Zhang
* BUG: NULL pointer at IP: blk_mq_map_swqueue+0xbc/0x200 on 4.15.0-rc2
From: Ming Lei @ 2017-12-11 3:58 UTC

Hi Zhang Yi,

On Fri, Dec 08, 2017 at 02:24:29AM -0500, Yi Zhang wrote:
> Hi
> I found this issue during nvme blk-mq io scheduler test on 4.15.0-rc2, let me know if you need more info, thanks.
>
> Reproduce steps:
>
> MQ_IOSCHEDS=`sed 's/[][]//g' /sys/block/nvme0n1/queue/scheduler`
> dd if=/dev/nvme0n1p1 of=/dev/null bs=4096 &
> while kill -0 $! 2>/dev/null; do
>     for SCHEDULER in $MQ_IOSCHEDS; do
>         echo "INFO: BLK-MQ IO SCHEDULER:$SCHEDULER testing during IO"
>         echo $SCHEDULER > /sys/block/nvme0n1/queue/scheduler
>         echo 1 >/sys/bus/pci/devices/0000\:84\:00.0/reset
>         sleep 0.5
>     done
> done
>
> Kernel log:
> [ 101.202734] BUG: unable to handle kernel NULL pointer dereference at 0000000094d3013f
> [ 101.211487] IP: blk_mq_map_swqueue+0xbc/0x200

As we talked offline, this IP points to cpumask_set_cpu(), seems this
case may happen when one CPU isn't mapped to any hw queue, could you test
the following patch to see if it helps your issue?

--
diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index 76944e3271bf..c60d06bfa76e 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -33,6 +33,9 @@ int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
 	const struct cpumask *mask;
 	unsigned int queue, cpu;
 
+	for_each_possible_cpu(cpu)
+		set->mq_map[cpu] = 0;
+
 	for (queue = 0; queue < set->nr_hw_queues; queue++) {
 		mask = pci_irq_get_affinity(pdev, queue);
 		if (!mask)

Thanks,
Ming
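[Illustration] For readers unfamiliar with the faulting path: a simplified, non-authoritative sketch of the per-CPU mapping loop inside blk_mq_map_swqueue(). The helper and field names are approximations of the 4.15-rc2 code rather than the literal source; it only shows how a stale mq_map[] entry ends in the NULL dereference inside cpumask_set_cpu() reported above.

	/* Sketch only: roughly what blk_mq_map_swqueue() does per CPU. */
	static void map_swqueue_sketch(struct request_queue *q)
	{
		struct blk_mq_hw_ctx *hctx;
		unsigned int cpu;

		for_each_possible_cpu(cpu) {
			/*
			 * q->mq_map[cpu] is the hw queue index this CPU was
			 * mapped to.  If the map still holds an index built
			 * for the old, larger nr_hw_queues, it can point past
			 * the queues that survived the controller reset.
			 */
			hctx = q->queue_hw_ctx[q->mq_map[cpu]];

			/*
			 * With such a stale index, hctx is NULL (or already
			 * freed), so touching hctx->cpumask here is the
			 * reported crash at blk_mq_map_swqueue+0xbc.
			 */
			cpumask_set_cpu(cpu, hctx->cpumask);
		}
	}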
* BUG: NULL pointer at IP: blk_mq_map_swqueue+0xbc/0x200 on 4.15.0-rc2
From: Yi Zhang @ 2017-12-11 13:29 UTC

On 12/11/2017 11:58 AM, Ming Lei wrote:
> Hi Zhang Yi,
>
> On Fri, Dec 08, 2017 at 02:24:29AM -0500, Yi Zhang wrote:
>> Hi
>> I found this issue during nvme blk-mq io scheduler test on 4.15.0-rc2, let me know if you need more info, thanks.
>>
>> Reproduce steps:
>>
>> MQ_IOSCHEDS=`sed 's/[][]//g' /sys/block/nvme0n1/queue/scheduler`
>> dd if=/dev/nvme0n1p1 of=/dev/null bs=4096 &
>> while kill -0 $! 2>/dev/null; do
>>     for SCHEDULER in $MQ_IOSCHEDS; do
>>         echo "INFO: BLK-MQ IO SCHEDULER:$SCHEDULER testing during IO"
>>         echo $SCHEDULER > /sys/block/nvme0n1/queue/scheduler
>>         echo 1 >/sys/bus/pci/devices/0000\:84\:00.0/reset
>>         sleep 0.5
>>     done
>> done
>>
>> Kernel log:
>> [ 101.202734] BUG: unable to handle kernel NULL pointer dereference at 0000000094d3013f
>> [ 101.211487] IP: blk_mq_map_swqueue+0xbc/0x200
> As we talked offline, this IP points to cpumask_set_cpu(), seems this
> case may happen when one CPU isn't mapped to any hw queue, could you test
> the following patch to see if it helps your issue?

Hi Ming

With this patch, I reproduced another BUG, here is part of the log:

[   93.263237] ------------[ cut here ]------------
[   93.268391] kernel BUG at drivers/nvme/host/pci.c:408!
[   93.274146] invalid opcode: 0000 [#1] SMP
[   93.278618] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc ipmi_ssif vfat fat intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt intel_cstate ipmi_si iTCO_vendor_support intel_uncore mxm_wmi mei_me ipmi_devintf intel_rapl_perf pcspkr sg ipmi_msghandler lpc_ich dcdbas mei shpchp acpi_power_meter wmi dm_multipath ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci nvme libata crc32c_intel nvme_core tg3 megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[   93.349071] CPU: 5 PID: 1842 Comm: sh Not tainted 4.15.0-rc2.ming+ #4
[   93.356256] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.5.5 08/16/2017
[   93.364801] task: 00000000fb8abf2a task.stack: 0000000028bd82d1
[   93.371408] RIP: 0010:nvme_init_request+0x36/0x40 [nvme]
[   93.377333] RSP: 0018:ffffc90002537ca8 EFLAGS: 00010246
[   93.383161] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000008
[   93.391122] RDX: 0000000000000000 RSI: ffff880276ae0000 RDI: ffff88047bae9008
[   93.399084] RBP: ffff88047bae9008 R08: ffff88047bae9008 R09: 0000000009dabc00
[   93.407045] R10: 0000000000000004 R11: 000000000000299c R12: ffff880186bc1f00
[   93.415007] R13: ffff880276ae0000 R14: 0000000000000000 R15: 0000000000000071
[   93.422969] FS:  00007f33cf288740(0000) GS:ffff88047ba80000(0000) knlGS:0000000000000000
[   93.431996] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.438407] CR2: 00007f33cf28e000 CR3: 000000047e5bb006 CR4: 00000000001606e0
[   93.446368] Call Trace:
[   93.449103]  blk_mq_alloc_rqs+0x231/0x2a0
[   93.453579]  blk_mq_sched_alloc_tags.isra.8+0x42/0x80
[   93.459214]  blk_mq_init_sched+0x7e/0x140
[   93.463687]  elevator_switch+0x5a/0x1f0
[   93.467966]  ? elevator_get.isra.17+0x52/0xc0
[   93.472826]  elv_iosched_store+0xde/0x150
[   93.477299]  queue_attr_store+0x4e/0x90
[   93.481580]  kernfs_fop_write+0xfa/0x180
[   93.485958]  __vfs_write+0x33/0x170
[   93.489851]  ? __inode_security_revalidate+0x4c/0x60
[   93.495390]  ? selinux_file_permission+0xda/0x130
[   93.500641]  ? _cond_resched+0x15/0x30
[   93.504815]  vfs_write+0xad/0x1a0
[   93.508512]  SyS_write+0x52/0xc0
[   93.512113]  do_syscall_64+0x61/0x1a0
[   93.516199]  entry_SYSCALL64_slow_path+0x25/0x25
[   93.521351] RIP: 0033:0x7f33ce96aab0
[   93.525337] RSP: 002b:00007ffe57570238 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   93.533785] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f33ce96aab0
[   93.541746] RDX: 0000000000000006 RSI: 00007f33cf28e000 RDI: 0000000000000001
[   93.549707] RBP: 00007f33cf28e000 R08: 000000000000000a R09: 00007f33cf288740
[   93.557669] R10: 00007f33cf288740 R11: 0000000000000246 R12: 00007f33cec42400
[   93.565630] R13: 0000000000000006 R14: 0000000000000001 R15: 0000000000000000
[   93.573592] Code: 4c 8d 40 08 4c 39 c7 74 16 48 8b 00 48 8b 04 08 48 85 c0 74 16 48 89 86 78 01 00 00 31 c0 c3 8d 4a 01 48 63 c9 48 c1 e1 03 eb de <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 f6 53 48 89
[   93.594676] RIP: nvme_init_request+0x36/0x40 [nvme] RSP: ffffc90002537ca8
[   93.602273] ---[ end trace 810dde3993e5f14e ]---

Full log: https://pastebin.com/iafzB2DE

> --
> diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
> index 76944e3271bf..c60d06bfa76e 100644
> --- a/block/blk-mq-pci.c
> +++ b/block/blk-mq-pci.c
> @@ -33,6 +33,9 @@ int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
>  	const struct cpumask *mask;
>  	unsigned int queue, cpu;
> 
> +	for_each_possible_cpu(cpu)
> +		set->mq_map[cpu] = 0;
> +
>  	for (queue = 0; queue < set->nr_hw_queues; queue++) {
>  		mask = pci_irq_get_affinity(pdev, queue);
>  		if (!mask)
> Thanks,
> Ming
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
* BUG: NULL pointer at IP: blk_mq_map_swqueue+0xbc/0x200 on 4.15.0-rc2
From: Ming Lei @ 2017-12-11 13:47 UTC

On Mon, Dec 11, 2017 at 09:29:40PM +0800, Yi Zhang wrote:
> 
> 
> On 12/11/2017 11:58 AM, Ming Lei wrote:
> > Hi Zhang Yi,
> >
> > On Fri, Dec 08, 2017 at 02:24:29AM -0500, Yi Zhang wrote:
> > > Hi
> > > I found this issue during nvme blk-mq io scheduler test on 4.15.0-rc2, let me know if you need more info, thanks.
> > >
> > > Reproduce steps:
> > >
> > > MQ_IOSCHEDS=`sed 's/[][]//g' /sys/block/nvme0n1/queue/scheduler`
> > > dd if=/dev/nvme0n1p1 of=/dev/null bs=4096 &
> > > while kill -0 $! 2>/dev/null; do
> > >     for SCHEDULER in $MQ_IOSCHEDS; do
> > >         echo "INFO: BLK-MQ IO SCHEDULER:$SCHEDULER testing during IO"
> > >         echo $SCHEDULER > /sys/block/nvme0n1/queue/scheduler
> > >         echo 1 >/sys/bus/pci/devices/0000\:84\:00.0/reset
> > >         sleep 0.5
> > >     done
> > > done
> > >
> > > Kernel log:
> > > [ 101.202734] BUG: unable to handle kernel NULL pointer dereference at 0000000094d3013f
> > > [ 101.211487] IP: blk_mq_map_swqueue+0xbc/0x200
> > As we talked offline, this IP points to cpumask_set_cpu(), seems this
> > case may happen when one CPU isn't mapped to any hw queue, could you test
> > the following patch to see if it helps your issue?
> 
> Hi Ming
> With this patch, I reproduced another BUG, here is part of the log
> 
> [   93.263237] ------------[ cut here ]------------
> [   93.268391] kernel BUG at drivers/nvme/host/pci.c:408!

Hi Zhang Yi,

Thanks for your test!

That is the race between updating hw queue and switching io scheduler,
especially on q->nr_hw_queues.

Could you run the following patch to see if it fixes both?

--
diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index 76944e3271bf..c60d06bfa76e 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -33,6 +33,9 @@ int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
 	const struct cpumask *mask;
 	unsigned int queue, cpu;
 
+	for_each_possible_cpu(cpu)
+		set->mq_map[cpu] = 0;
+
 	for (queue = 0; queue < set->nr_hw_queues; queue++) {
 		mask = pci_irq_get_affinity(pdev, queue);
 		if (!mask)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..3e91819fc8e8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2415,6 +2415,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 		}
 		blk_mq_hctx_kobj_init(hctxs[i]);
 	}
+	mutex_lock(&q->sysfs_lock);
 	for (j = i; j < q->nr_hw_queues; j++) {
 		struct blk_mq_hw_ctx *hctx = hctxs[j];
 
@@ -2428,6 +2429,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 		}
 	}
 	q->nr_hw_queues = i;
+	mutex_unlock(&q->sysfs_lock);
 	blk_mq_sysfs_register(q);
 }

Thanks,
Ming
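[Illustration] The blk-mq.c hunk above serializes the writer that shrinks q->nr_hw_queues against readers such as elevator_switch(), which already runs under q->sysfs_lock via queue_attr_store(). Below is a minimal user-space analogue of that pattern with made-up names; it is not kernel code, it only shows why the count and the backing table must change under the same lock the readers take.

	#include <pthread.h>
	#include <stdio.h>

	#define MAX_QUEUES 12

	static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER; /* plays the role of q->sysfs_lock   */
	static int nr_queues = MAX_QUEUES;                             /* plays the role of q->nr_hw_queues */
	static int queue_data[MAX_QUEUES];                             /* plays the role of the hctx array  */

	/* Writer: shrink the table, as blk_mq_realloc_hw_ctxs() does after a reset. */
	static void shrink_table(int new_nr)
	{
		pthread_mutex_lock(&table_lock);
		for (int i = new_nr; i < nr_queues; i++)
			queue_data[i] = -1;     /* "free" the dropped entries */
		nr_queues = new_nr;             /* publish the new count      */
		pthread_mutex_unlock(&table_lock);
	}

	/* Reader: walk the table, as an elevator switch walks the hw queues. */
	static void walk_table(void)
	{
		pthread_mutex_lock(&table_lock);
		for (int i = 0; i < nr_queues; i++)  /* count and entries now agree */
			printf("queue %d -> %d\n", i, queue_data[i]);
		pthread_mutex_unlock(&table_lock);
	}

	int main(void)
	{
		walk_table();
		shrink_table(5);
		walk_table();
		return 0;
	}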
* BUG: NULL pointer at IP: blk_mq_map_swqueue+0xbc/0x200 on 4.15.0-rc2
From: Christoph Hellwig @ 2017-12-12 8:35 UTC

On Mon, Dec 11, 2017 at 11:58:45AM +0800, Ming Lei wrote:
> As we talked offline, this IP points to cpumask_set_cpu(), seems this
> case may happen when one CPU isn't mapped to any hw queue, could you test
> the following patch to see if it helps your issue?

This looks odd as mq_map should be pre-zeroed.  If it isn't we have
a few more problems..
* BUG: NULL pointer at IP: blk_mq_map_swqueue+0xbc/0x200 on 4.15.0-rc2
From: Ming Lei @ 2017-12-12 9:16 UTC

On Tue, Dec 12, 2017 at 12:35:02AM -0800, Christoph Hellwig wrote:
> On Mon, Dec 11, 2017 at 11:58:45AM +0800, Ming Lei wrote:
> > As we talked offline, this IP points to cpumask_set_cpu(), seems this
> > case may happen when one CPU isn't mapped to any hw queue, could you test
> > the following patch to see if it helps your issue?
> 
> This looks odd as mq_map should be pre-zeroed.  If it isn't we have
> a few more problems..

This happens in case of updating nr_hw_queue, for example, when the
number becomes 5 from 12, then the table has to be zeroed for avoiding
obsolete mapping if one CPU isn't mapped to any hw queue.

Thanks,
Ming
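[Illustration] A toy, user-space rendering of the stale-map situation described above, with made-up numbers: entries written while there were 12 hw queues are left behind when the count drops to 5, so any CPU that is not remapped can still index a hw queue that no longer exists. This sketches only the data-layout problem, not the actual blk-mq code.

	#include <stdio.h>

	#define NR_CPUS 16

	int main(void)
	{
		unsigned int mq_map[NR_CPUS];   /* stands in for set->mq_map */
		unsigned int cpu, nr_hw_queues;

		/* First mapping: 12 hw queues, every CPU gets a valid entry. */
		nr_hw_queues = 12;
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			mq_map[cpu] = cpu % nr_hw_queues;

		/*
		 * The controller comes back from reset with only 5 queues, and
		 * only the CPUs with IRQ affinity (say the first 8) are remapped.
		 */
		nr_hw_queues = 5;
		for (cpu = 0; cpu < 8; cpu++)
			mq_map[cpu] = cpu % nr_hw_queues;

		/*
		 * CPUs that were not remapped keep their old entries; some of
		 * those now point past the last existing hw queue -- the
		 * "obsolete mapping" mentioned above.
		 */
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			if (mq_map[cpu] >= nr_hw_queues)
				printf("cpu %u -> stale hw queue %u (only %u exist)\n",
				       cpu, mq_map[cpu], nr_hw_queues);
		return 0;
	}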
* BUG: NULL pointer at IP: blk_mq_map_swqueue+0xbc/0x200 on 4.15.0-rc2
From: Jens Axboe @ 2018-01-05 16:39 UTC

On 12/12/17 2:16 AM, Ming Lei wrote:
> On Tue, Dec 12, 2017 at 12:35:02AM -0800, Christoph Hellwig wrote:
>> On Mon, Dec 11, 2017 at 11:58:45AM +0800, Ming Lei wrote:
>>> As we talked offline, this IP points to cpumask_set_cpu(), seems this
>>> case may happen when one CPU isn't mapped to any hw queue, could you test
>>> the following patch to see if it helps your issue?
>>
>> This looks odd as mq_map should be pre-zeroed.  If it isn't we have
>> a few more problems..
> 
> This happens in case of updating nr_hw_queue, for example, when the
> number becomes 5 from 12, then the table has to be zeroed for avoiding
> obsolete mapping if one CPU isn't mapped to any hw queue.

Do you have a proper complete patch for this?

-- 
Jens Axboe
* BUG: NULL pointer at IP: blk_mq_map_swqueue+0xbc/0x200 on 4.15.0-rc2
From: Ming Lei @ 2018-01-06 8:38 UTC

On Fri, Jan 05, 2018 at 09:39:48AM -0700, Jens Axboe wrote:
> On 12/12/17 2:16 AM, Ming Lei wrote:
> > On Tue, Dec 12, 2017 at 12:35:02AM -0800, Christoph Hellwig wrote:
> >> On Mon, Dec 11, 2017 at 11:58:45AM +0800, Ming Lei wrote:
> >>> As we talked offline, this IP points to cpumask_set_cpu(), seems this
> >>> case may happen when one CPU isn't mapped to any hw queue, could you test
> >>> the following patch to see if it helps your issue?
> >>
> >> This looks odd as mq_map should be pre-zeroed.  If it isn't we have
> >> a few more problems..
> > 
> > This happens in case of updating nr_hw_queue, for example, when the
> > number becomes 5 from 12, then the table has to be zeroed for avoiding
> > obsolete mapping if one CPU isn't mapped to any hw queue.
> 
> Do you have a proper complete patch for this?

Please see the latest version:

https://marc.info/?l=linux-block&m=151522728023803&w=2

Thanks,
Ming