[BUG] USB xHCI driver NULL pointer dereference

linux-usb.vger.kernel.org archive mirror

* [BUG] USB xHCI driver NULL pointer dereference
@ 2024-08-10 22:11 Karel Balej
  2024-08-13 11:49 ` Mathias Nyman
  0 siblings, 1 reply; 5+ messages in thread
From: Karel Balej @ 2024-08-10 22:11 UTC
  To: linux-usb

Hello,

my machine crashed twice in the past week. The second time, I was able
to recover the log output (including the stack trace run through
scripts/decode_stacktrace.sh), which seems to suggest a bug in the xHCI
driver:

	[44193.556677] usb 2-1-port5: disabled by hub (EMI?), re-enabling...
	[44193.556692] usb 2-1.5: USB disconnect, device number 6
	[44193.558532] cdc_ncm 2-1.5:1.0 enp0s29u1u5: unregister 'cdc_ncm' usb-0000:00:1d.0-1.5, CDC NCM (NO ZLP)
	[44193.739545] usb 2-1.5: new high-speed USB device number 7 using ehci-pci
	[44193.819628] usb 2-1.5: New USB device found, idVendor=18d1, idProduct=d001, bcdDevice= 6.10
	[44193.819637] usb 2-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
	[44193.819641] usb 2-1.5: Product: Samsung Galaxy Core Prime VE LTE
	[44193.819644] usb 2-1.5: Manufacturer: Samsung
	[44193.819646] usb 2-1.5: SerialNumber: postmarketOS
	[44193.842472] cdc_ncm 2-1.5:1.0: MAC-Address: [...]
	[44193.842770] cdc_ncm 2-1.5:1.0 usb0: register 'cdc_ncm' at usb-0000:00:1d.0-1.5, CDC NCM (NO ZLP), [...]
	[44193.845829] cdc_ncm 2-1.5:1.0 enp0s29u1u5: renamed from usb0
	[46253.017991] perf: interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
	[46709.344533] usb 3-1: new full-speed USB device number 3 using xhci_hcd
	[46709.458560] usb 3-1: device descriptor read/64, error -71
	[46709.679562] usb 3-1: device descriptor read/64, error -71
	[46709.895544] usb 3-1: new full-speed USB device number 4 using xhci_hcd
	[46710.009563] usb 3-1: device descriptor read/64, error -71
	[46710.231579] usb 3-1: device descriptor read/64, error -71
	[46710.333629] usb usb3-port1: attempt power cycle
	[46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd
	[46710.713699] usb 3-1: Device not responding to setup address.
	[46710.917684] usb 3-1: Device not responding to setup address.
	[46711.125536] usb 3-1: device not accepting address 5, error -71
	[46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008
	[46711.125600] #PF: supervisor read access in kernel mode
	[46711.125603] #PF: error_code(0x0000) - not-present page
	[46711.125606] PGD 0 P4D 0
	[46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
	[46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.3_2 #1
	[46711.125620] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77-D3H, BIOS F18 08/21/2012
	[46711.125623] Workqueue: usb_hub_wq hub_event [usbcore]
	[46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c:2392 drivers/usb/host/xhci.c:2758) xhci_hcd
	[46711.125701] Code: 24 30 b8 47 06 00 00 0f 45 c2 48 c7 c2 08 67 45 c0 89 44 24 08 0f 94 c0 0f b6 c0 8d 44 40 01 89 44 24 10 48 8b 85 90 11 00 00 <8b> 48 08 83 c1 01 48 83 bd a0 11 00 00 00 0f 84 4e 03 00 00 e8 3b
	All code
	========
	   0:	24 30                	and    $0x30,%al
	   2:	b8 47 06 00 00       	mov    $0x647,%eax
	   7:	0f 45 c2             	cmovne %edx,%eax
	   a:	48 c7 c2 08 67 45 c0 	mov    $0xffffffffc0456708,%rdx
	  11:	89 44 24 08          	mov    %eax,0x8(%rsp)
	  15:	0f 94 c0             	sete   %al
	  18:	0f b6 c0             	movzbl %al,%eax
	  1b:	8d 44 40 01          	lea    0x1(%rax,%rax,2),%eax
	  1f:	89 44 24 10          	mov    %eax,0x10(%rsp)
	  23:	48 8b 85 90 11 00 00 	mov    0x1190(%rbp),%rax
	  2a:*	8b 48 08             	mov    0x8(%rax),%ecx		<-- trapping instruction
	  2d:	83 c1 01             	add    $0x1,%ecx
	  30:	48 83 bd a0 11 00 00 	cmpq   $0x0,0x11a0(%rbp)
	  37:	00 
	  38:	0f 84 4e 03 00 00    	je     0x38c
	  3e:	e8                   	.byte 0xe8
	  3f:	3b                   	.byte 0x3b
	
	Code starting with the faulting instruction
	===========================================
	   0:	8b 48 08             	mov    0x8(%rax),%ecx
	   3:	83 c1 01             	add    $0x1,%ecx
	   6:	48 83 bd a0 11 00 00 	cmpq   $0x0,0x11a0(%rbp)
	   d:	00 
	   e:	0f 84 4e 03 00 00    	je     0x362
	  14:	e8                   	.byte 0xe8
	  15:	3b                   	.byte 0x3b
	[46711.125705] RSP: 0018:ffffbb3d88c4f938 EFLAGS: 00010097
	[46711.125709] RAX: 0000000000000000 RBX: ffffbb3d80187000 RCX: 000000000000001f
	[46711.125712] RDX: ffffffffc0456708 RSI: ffffffffc040c810 RDI: ffff96b1813f3250
	[46711.125715] RBP: ffff96b23c08a000 R08: ffff96b23c08a020 R09: 0000000000000000
	[46711.125718] R10: ffff96b08884c5c0 R11: 0000000000000001 R12: ffffbb3d88c4f970
	[46711.125720] R13: ffff96b1813f3250 R14: 0000000000000000 R15: 000000000000001f
	[46711.125723] FS:  0000000000000000(0000) GS:ffff96b296c80000(0000) knlGS:0000000000000000
	[46711.125727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[46711.125730] CR2: 0000000000000008 CR3: 0000000030c20002 CR4: 00000000001706f0
	[46711.125733] Call Trace:
	[46711.125736]  <TASK>
	[46711.125739] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) 
	[46711.125747] ? page_fault_oops (arch/x86/mm/fault.c:715 (discriminator 1)) 
	[46711.125754] ? xhci_reserve_bandwidth (drivers/usb/host/xhci.c:2392 drivers/usb/host/xhci.c:2758) xhci_hcd
	[46711.125781] ? search_bpf_extables (kernel/bpf/core.c:799) 
	[46711.125790] ? exc_page_fault (./arch/x86/include/asm/paravirt.h:693 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539) 
	[46711.125796] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) 
	[46711.125803] ? __pfx_trace_xhci_dbg_quirks (drivers/usb/host/xhci-trace.h:48) xhci_hcd
	[46711.125830] ? xhci_reserve_bandwidth (drivers/usb/host/xhci.c:2392 drivers/usb/host/xhci.c:2758) xhci_hcd
	[46711.125857] ? xhci_reserve_bandwidth (drivers/usb/host/xhci.c:2750) xhci_hcd
	[46711.125885] ? update_load_avg (kernel/sched/fair.c:4410 kernel/sched/fair.c:4747) 
	[46711.125891] ? local_clock (./arch/x86/include/asm/preempt.h:94 (discriminator 1) kernel/sched/clock.c:316 (discriminator 1)) 
	[46711.125897] ? metadata_update_state (mm/kfence/core.c:298 (discriminator 1)) 
	[46711.125911] xhci_configure_endpoint (drivers/usb/host/xhci.c:2840 (discriminator 1)) xhci_hcd
	[46711.125940] xhci_endpoint_reset (drivers/usb/host/xhci.c:1525 drivers/usb/host/xhci.c:3144) xhci_hcd
	[46711.125969] ? hub_port_init (drivers/usb/core/hub.c:5182) usbcore
	[46711.126004] ? preempt_count_add (./include/linux/ftrace.h:975 kernel/sched/core.c:5850 kernel/sched/core.c:5847 kernel/sched/core.c:5875) 
	[46711.126011] usb_enable_endpoint (drivers/usb/core/message.c:1461) usbcore
	[46711.126052] hub_event (drivers/usb/core/hub.c:5548 drivers/usb/core/hub.c:5661 drivers/usb/core/hub.c:5821 drivers/usb/core/hub.c:5903) usbcore
	[46711.126090] ? __mod_timer (kernel/time/timer.c:1189) 
	[46711.126096] process_one_work (kernel/workqueue.c:3257) 
	[46711.126104] worker_thread (kernel/workqueue.c:3327 (discriminator 2) kernel/workqueue.c:3413 (discriminator 2)) 
	[46711.126110] ? _raw_spin_lock_irqsave (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:111 (discriminator 4) kernel/locking/spinlock.c:162 (discriminator 4)) 
	[46711.126115] ? __pfx_worker_thread (kernel/workqueue.c:3360) 
	[46711.126121] kthread (kernel/kthread.c:389) 
	[46711.126126] ? __pfx_kthread (kernel/kthread.c:342) 
	[46711.126130] ret_from_fork (arch/x86/kernel/process.c:153) 
	[46711.126135] ? __pfx_kthread (kernel/kthread.c:342) 
	[46711.126139] ret_from_fork_asm (arch/x86/entry/entry_64.S:257) 
	[46711.126146]  </TASK>
	[46711.126148] Modules linked in: cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet tls cfg80211 8021q garp mrp stp llc ip6table_filter ip6_tables xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables nls_iso8859_1 nls_cp437 vfat fat raid1 intel_rapl_msr mei_hdcp mei_pxp intel_rapl_common md_mod x86_pkg_temp_thermal intel_powerclamp at24 snd_hda_codec_via iTCO_wdt intel_pmc_bxt coretemp snd_hda_codec_generic iTCO_vendor_support snd_hda_codec_hdmi rapl intel_cstate snd_hda_intel intel_uncore snd_intel_dspcfg psmouse snd_intel_sdw_acpi pcspkr snd_hda_codec mei_me joydev i2c_i801 evdev mei input_leds snd_hda_core i2c_smbus mac_hid snd_hwdep alx i2c_mux snd_pcm lpc_ich mdio thermal fan tiny_power_button button sg snd_seq snd_seq_device snd_timer snd soundcore vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap hci_vhci bluetooth rfkill vfio_iommu_type1 vfio iommufd uhid uinput userio ppp_generic slhc tun loop nvram btrfs blake2b_generic xor ...
	[46711.126249]  libcrc32c cuse fuse ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt encrypted_keys trusted asn1_encoder tee tpm rng_core libaescfb ecdh_generic dm_mod ecc amdgpu amdxcp drm_exec gpu_sched drm_buddy radeon drm_ttm_helper ttm agpgart hid_generic i2c_algo_bit sr_mod sd_mod drm_suballoc_helper cdrom drm_display_helper ata_generic usbhid pata_acpi crct10dif_pclmul uas crc32_pclmul hid usb_storage crc32c_intel drm_kms_helper polyval_clmulni xhci_pci polyval_generic gf128mul ghash_clmulni_intel ata_piix xhci_pci_renesas drm libata sha512_ssse3 sha256_ssse3 xhci_hcd ehci_pci sha1_ssse3 aesni_intel ehci_hcd crypto_simd cryptd scsi_mod serio_raw usbcore scsi_common usb_common video wmi
	[46711.126321] CR2: 0000000000000008
	[46711.126325] ---[ end trace 0000000000000000 ]---
	[46711.126328] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c:2392 drivers/usb/host/xhci.c:2758) xhci_hcd
	[46711.126357] Code: 24 30 b8 47 06 00 00 0f 45 c2 48 c7 c2 08 67 45 c0 89 44 24 08 0f 94 c0 0f b6 c0 8d 44 40 01 89 44 24 10 48 8b 85 90 11 00 00 <8b> 48 08 83 c1 01 48 83 bd a0 11 00 00 00 0f 84 4e 03 00 00 e8 3b
	All code
	========
	   0:	24 30                	and    $0x30,%al
	   2:	b8 47 06 00 00       	mov    $0x647,%eax
	   7:	0f 45 c2             	cmovne %edx,%eax
	   a:	48 c7 c2 08 67 45 c0 	mov    $0xffffffffc0456708,%rdx
	  11:	89 44 24 08          	mov    %eax,0x8(%rsp)
	  15:	0f 94 c0             	sete   %al
	  18:	0f b6 c0             	movzbl %al,%eax
	  1b:	8d 44 40 01          	lea    0x1(%rax,%rax,2),%eax
	  1f:	89 44 24 10          	mov    %eax,0x10(%rsp)
	  23:	48 8b 85 90 11 00 00 	mov    0x1190(%rbp),%rax
	  2a:*	8b 48 08             	mov    0x8(%rax),%ecx		<-- trapping instruction
	  2d:	83 c1 01             	add    $0x1,%ecx
	  30:	48 83 bd a0 11 00 00 	cmpq   $0x0,0x11a0(%rbp)
	  37:	00 
	  38:	0f 84 4e 03 00 00    	je     0x38c
	  3e:	e8                   	.byte 0xe8
	  3f:	3b                   	.byte 0x3b
	
	Code starting with the faulting instruction
	===========================================
	   0:	8b 48 08             	mov    0x8(%rax),%ecx
	   3:	83 c1 01             	add    $0x1,%ecx
	   6:	48 83 bd a0 11 00 00 	cmpq   $0x0,0x11a0(%rbp)
	   d:	00 
	   e:	0f 84 4e 03 00 00    	je     0x362
	  14:	e8                   	.byte 0xe8
	  15:	3b                   	.byte 0x3b
	[46711.126360] RSP: 0018:ffffbb3d88c4f938 EFLAGS: 00010097
	[46711.126364] RAX: 0000000000000000 RBX: ffffbb3d80187000 RCX: 000000000000001f
	[46711.126367] RDX: ffffffffc0456708 RSI: ffffffffc040c810 RDI: ffff96b1813f3250
	[46711.126369] RBP: ffff96b23c08a000 R08: ffff96b23c08a020 R09: 0000000000000000
	[46711.126372] R10: ffff96b08884c5c0 R11: 0000000000000001 R12: ffffbb3d88c4f970
	[46711.126374] R13: ffff96b1813f3250 R14: 0000000000000000 R15: 000000000000001f
	[46711.126377] FS:  0000000000000000(0000) GS:ffff96b296c80000(0000) knlGS:0000000000000000
	[46711.126380] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[46711.126383] CR2: 0000000000000008 CR3: 0000000030c20002 CR4: 00000000001706f0
	[46711.126386] note: kworker/1:2[25760] exited with irqs disabled
	[46711.126388] note: kworker/1:2[25760] exited with preempt_count 1
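Reading the oops: RAX is 0 and the trapping instruction is `mov 0x8(%rax),%ecx`, so the load targets address 0 + 8, which is exactly the CR2 value (and the "address: 0000000000000008" in the BUG line). A minimal sketch of that address arithmetic, assuming a hypothetical structure with a 32-bit field at offset 8; the trace alone does not tell us which real xHCI field is being read:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the structure the faulting code reads. Only
 * one property matters: a 32-bit field 8 bytes from the start, matching
 * the trapping instruction "mov 0x8(%rax),%ecx".
 */
struct example_table {
	uint64_t first; /* bytes 0..7 */
	uint32_t field; /* byte 8: the value that would be loaded into %ecx */
};

static_assert(offsetof(struct example_table, field) == 8,
	      "field must sit at offset 8 to match the disassembly");

/*
 * Address the CPU touches when reading ptr->field, computed from the
 * pointer's numeric value without dereferencing anything.
 */
uintptr_t field_access_address(uintptr_t ptr_value)
{
	return ptr_value + offsetof(struct example_table, field);
}
```

With RAX = 0, `field_access_address(0)` evaluates to 0x8, which is why a small CR2 like this usually points at a NULL base pointer plus a member offset rather than at a wild pointer.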

This second crash occurred upon plugging a USB stick into one of the
front ports of my machine. At the time, I also had my phone connected
to the computer (apart from the usual peripherals such as the keyboard);
the phone uses the NCM gadget for interfacing with the computer, as can be
seen from the above. There seems to have been some error with the port
getting disabled some time prior to the crash, causing disconnection of
the phone; I don't know whether this is related. Immediately
preceding the crash (with the stick already attached), however, there are
some device descriptor read errors.

I am running Linux 6.10.3 (stable) as packaged by my distribution, Void
Linux. The build configuration can be found here [1].

I have not been able to reproduce the bug at will, so I have not
attempted to narrow down in which kernel version the problem might have
been introduced. However, I have never encountered it before.
I hope the report is of at least some use; please let me know if I
should provide any additional information.

[1] https://raw.githubusercontent.com/void-linux/void-packages/e0334d3395b2/srcpkgs/linux6.10/files/x86_64-dotconfig

Kind regards,
K. B.

* Re: [BUG] USB xHCI driver NULL pointer dereference
  2024-08-10 22:11 [BUG] USB xHCI driver NULL pointer dereference Karel Balej
@ 2024-08-13 11:49 ` Mathias Nyman
  2024-08-14 13:28   ` Mathias Nyman
  0 siblings, 1 reply; 5+ messages in thread
From: Mathias Nyman @ 2024-08-13 11:49 UTC
  To: Karel Balej, linux-usb

On 11.8.2024 1.11, Karel Balej wrote:
> Hello,
> 
> my machine crashed twice in the past week. The second time, I was able
> to recover the log output (including the stack trace run through
> scripts/decode_stacktrace.sh), which seems to suggest a bug in the xHCI
> driver:
> 
> 	[44193.556677] usb 2-1-port5: disabled by hub (EMI?), re-enabling...
> 	[44193.556692] usb 2-1.5: USB disconnect, device number 6
> 	[44193.558532] cdc_ncm 2-1.5:1.0 enp0s29u1u5: unregister 'cdc_ncm' usb-0000:00:1d.0-1.5, CDC NCM (NO ZLP)
> 	[44193.739545] usb 2-1.5: new high-speed USB device number 7 using ehci-pci
> 	[44193.819628] usb 2-1.5: New USB device found, idVendor=18d1, idProduct=d001, bcdDevice= 6.10
> 	[44193.819637] usb 2-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> 	[44193.819641] usb 2-1.5: Product: Samsung Galaxy Core Prime VE LTE
> 	[44193.819644] usb 2-1.5: Manufacturer: Samsung
> 	[44193.819646] usb 2-1.5: SerialNumber: postmarketOS
> 	[44193.842472] cdc_ncm 2-1.5:1.0: MAC-Address: [...]
> 	[44193.842770] cdc_ncm 2-1.5:1.0 usb0: register 'cdc_ncm' at usb-0000:00:1d.0-1.5, CDC NCM (NO ZLP), [...]
> 	[44193.845829] cdc_ncm 2-1.5:1.0 enp0s29u1u5: renamed from usb0
> 	[46253.017991] perf: interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
> 	[46709.344533] usb 3-1: new full-speed USB device number 3 using xhci_hcd
> 	[46709.458560] usb 3-1: device descriptor read/64, error -71
> 	[46709.679562] usb 3-1: device descriptor read/64, error -71
> 	[46709.895544] usb 3-1: new full-speed USB device number 4 using xhci_hcd
> 	[46710.009563] usb 3-1: device descriptor read/64, error -71
> 	[46710.231579] usb 3-1: device descriptor read/64, error -71
> 	[46710.333629] usb usb3-port1: attempt power cycle
> 	[46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd
> 	[46710.713699] usb 3-1: Device not responding to setup address.
> 	[46710.917684] usb 3-1: Device not responding to setup address.
> 	[46711.125536] usb 3-1: device not accepting address 5, error -71
> 	[46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008
> 	[46711.125600] #PF: supervisor read access in kernel mode
> 	[46711.125603] #PF: error_code(0x0000) - not-present page
> 	[46711.125606] PGD 0 P4D 0
> 	[46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> 	[46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.3_2 #1
> 	[46711.125620] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77-D3H, BIOS F18 08/21/2012
> 	[46711.125623] Workqueue: usb_hub_wq hub_event [usbcore]
> 	[46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c:2392 drivers/usb/host/xhci.c:2758) xhci_hcd

Thanks for the report.

You have an unlucky setup here.
This could only happen when a full-speed device fails enumeration while connected to a
Pantherpoint xHC.

Only the Pantherpoint xHC (PCI ID 0x1e31) does bandwidth calculation in software and
calls xhci_reserve_bandwidth(). In this case we unintentionally end up calling it
after a failed Address Device attempt, when the USB core re-initializes endpoint 0 before retrying.
At this point the xhci side of the device isn't properly allocated or set up so
we hit a NULL pointer dereference.
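The conditions in the explanation above can be written down as a small predicate. This is a minimal userspace sketch, not driver code: the quirk bit value and all `sketch_*` names are made up for illustration, and only the Intel vendor ID and the Pantherpoint device ID 0x1e31 come from the real hardware described in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_INTEL        0x8086u
#define PCI_DEVICE_PANTHERPOINT 0x1e31u   /* PCI ID named in the thread */
#define SKETCH_SW_BW_CHECKING   (1u << 0) /* hypothetical quirk bit */

/* Only the Pantherpoint xHC gets software bandwidth checking. */
unsigned int sketch_quirks_for(uint16_t vendor, uint16_t device)
{
	if (vendor == PCI_VENDOR_INTEL && device == PCI_DEVICE_PANTHERPOINT)
		return SKETCH_SW_BW_CHECKING;
	return 0;
}

/*
 * All three conditions from the explanation must hold for the unintended
 * xhci_reserve_bandwidth() call to run against a device whose xHCI-side
 * state was never set up.
 */
bool sketch_can_hit_null_deref(unsigned int quirks, bool full_speed,
			       bool address_device_failed)
{
	return (quirks & SKETCH_SW_BW_CHECKING) && full_speed &&
	       address_device_failed;
}
```

This also explains why the crash is rare: the reporter's Z77 board has the Pantherpoint controller, and the stick on usb 3-1 was full-speed and failed addressing with -71.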

I'll look into it more.

Thanks
Mathias

* Re: [BUG] USB xHCI driver NULL pointer dereference
  2024-08-13 11:49 ` Mathias Nyman
@ 2024-08-14 13:28   ` Mathias Nyman
  2024-08-15 13:10     ` Mathias Nyman
  0 siblings, 1 reply; 5+ messages in thread
From: Mathias Nyman @ 2024-08-14 13:28 UTC
  To: Karel Balej, linux-usb

On 13.8.2024 14.49, Mathias Nyman wrote:
> On 11.8.2024 1.11, Karel Balej wrote:
>> Hello,
>>
>> my machine crashed twice in the past week. The second time, I was able
>> to recover the log output (including the stack trace run through
>> scripts/decode_stacktrace.sh), which seems to suggest a bug in the xHCI
>> driver:
>>
>>     [44193.556677] usb 2-1-port5: disabled by hub (EMI?), re-enabling...
>>     [44193.556692] usb 2-1.5: USB disconnect, device number 6
>>     [44193.558532] cdc_ncm 2-1.5:1.0 enp0s29u1u5: unregister 'cdc_ncm' usb-0000:00:1d.0-1.5, CDC NCM (NO ZLP)
>>     [44193.739545] usb 2-1.5: new high-speed USB device number 7 using ehci-pci
>>     [44193.819628] usb 2-1.5: New USB device found, idVendor=18d1, idProduct=d001, bcdDevice= 6.10
>>     [44193.819637] usb 2-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
>>     [44193.819641] usb 2-1.5: Product: Samsung Galaxy Core Prime VE LTE
>>     [44193.819644] usb 2-1.5: Manufacturer: Samsung
>>     [44193.819646] usb 2-1.5: SerialNumber: postmarketOS
>>     [44193.842472] cdc_ncm 2-1.5:1.0: MAC-Address: [...]
>>     [44193.842770] cdc_ncm 2-1.5:1.0 usb0: register 'cdc_ncm' at usb-0000:00:1d.0-1.5, CDC NCM (NO ZLP), [...]
>>     [44193.845829] cdc_ncm 2-1.5:1.0 enp0s29u1u5: renamed from usb0
>>     [46253.017991] perf: interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
>>     [46709.344533] usb 3-1: new full-speed USB device number 3 using xhci_hcd
>>     [46709.458560] usb 3-1: device descriptor read/64, error -71
>>     [46709.679562] usb 3-1: device descriptor read/64, error -71
>>     [46709.895544] usb 3-1: new full-speed USB device number 4 using xhci_hcd
>>     [46710.009563] usb 3-1: device descriptor read/64, error -71
>>     [46710.231579] usb 3-1: device descriptor read/64, error -71
>>     [46710.333629] usb usb3-port1: attempt power cycle
>>     [46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd
>>     [46710.713699] usb 3-1: Device not responding to setup address.
>>     [46710.917684] usb 3-1: Device not responding to setup address.
>>     [46711.125536] usb 3-1: device not accepting address 5, error -71
>>     [46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>     [46711.125600] #PF: supervisor read access in kernel mode
>>     [46711.125603] #PF: error_code(0x0000) - not-present page
>>     [46711.125606] PGD 0 P4D 0
>>     [46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>>     [46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.3_2 #1
>>     [46711.125620] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77-D3H, BIOS F18 08/21/2012
>>     [46711.125623] Workqueue: usb_hub_wq hub_event [usbcore]
>>     [46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c:2392 drivers/usb/host/xhci.c:2758) xhci_hcd
> 
> Thanks for the report.
> 
> You have an unlucky setup here.
> This could only happen when a full-speed device fails enumeration while connected to a
> Pantherpoint xHC.
>
> Only the Pantherpoint xHC (PCI ID 0x1e31) does bandwidth calculation in software and
> calls xhci_reserve_bandwidth(). In this case we unintentionally end up calling it
> after a failed Address Device attempt, when the USB core re-initializes endpoint 0 before retrying.
> At this point the xhci side of the device isn't properly allocated or set up so
> we hit a NULL pointer dereference.
> 
> I'll look into it more.

The following change should resolve this issue; any chance you could try it out?

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 9a8627e42898..a69245074395 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -2837,7 +2837,7 @@ static int xhci_configure_endpoint(struct xhci_hcd *xhci,
                                 xhci->num_active_eps);
                 return -ENOMEM;
         }
-       if ((xhci->quirks & XHCI_SW_BW_CHECKING) &&
+       if ((xhci->quirks & XHCI_SW_BW_CHECKING) && !ctx_change &&
             xhci_reserve_bandwidth(xhci, virt_dev, command->in_ctx)) {
                 if ((xhci->quirks & XHCI_EP_LIMIT_QUIRK))
                         xhci_free_host_resources(xhci, ctrl_ctx);
@@ -4200,8 +4200,10 @@ static int xhci_setup_device(struct usb_hcd *hcd, struct usb_device *udev,
                 mutex_unlock(&xhci->mutex);
                 ret = xhci_disable_slot(xhci, udev->slot_id);
                 xhci_free_virt_device(xhci, udev->slot_id);
-               if (!ret)
-                       xhci_alloc_dev(hcd, udev);
+               if (!ret) {
+                       if (xhci_alloc_dev(hcd, udev) == 1)
+                               xhci_setup_addressable_virt_dev(xhci, udev);
+               }
                 kfree(command->completion);
                 kfree(command);
                 return -EPROTO;
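Read as a predicate, the first hunk narrows when the software bandwidth reservation runs. In xhci_configure_endpoint(), ctx_change is true when an Evaluate Context command is queued instead of Configure Endpoint, which is presumably why the endpoint 0 re-init path can skip the reservation. A minimal userspace sketch comparing the old and new conditions; the bit value is made up and only stands in for XHCI_SW_BW_CHECKING:

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up bit standing in for XHCI_SW_BW_CHECKING in this sketch. */
#define SKETCH_SW_BW_CHECKING (1u << 0)

/* Before the patch: reserve whenever the quirk is set. */
bool reserve_bw_before(unsigned int quirks, bool ctx_change)
{
	(void)ctx_change; /* the old condition ignores the command type */
	return (quirks & SKETCH_SW_BW_CHECKING) != 0;
}

/*
 * After the first hunk: also require !ctx_change, so an Evaluate Context
 * command (ctx_change == true), as issued when endpoint 0 is
 * re-initialized, no longer reaches xhci_reserve_bandwidth().
 */
bool reserve_bw_after(unsigned int quirks, bool ctx_change)
{
	return (quirks & SKETCH_SW_BW_CHECKING) != 0 && !ctx_change;
}
```

Configure Endpoint commands on Pantherpoint still get the software check; only the Evaluate Context path changes behavior.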

Thanks
Mathias

* Re: [BUG] USB xHCI driver NULL pointer dereference
  2024-08-14 13:28   ` Mathias Nyman
@ 2024-08-15 13:10     ` Mathias Nyman
  2024-08-16  7:35       ` Karel Balej
  0 siblings, 1 reply; 5+ messages in thread
From: Mathias Nyman @ 2024-08-15 13:10 UTC
  To: Karel Balej, linux-usb

On 14.8.2024 16.28, Mathias Nyman wrote:
> On 13.8.2024 14.49, Mathias Nyman wrote:
>> On 11.8.2024 1.11, Karel Balej wrote:
>>> Hello,
>>>
>>> my machine crashed twice in the past week. The second time, I was able
>>> to recover the log output (including the stack trace run through
>>> scripts/decode_stacktrace.sh), which seems to suggest a bug in the xHCI
>>> driver:
>
>>
>> You have an unlucky setup here.
>> This could only happen when a full-speed device fails enumeration while connected to a
>> Pantherpoint xHC.
>>
>> Only the Pantherpoint xHC (PCI ID 0x1e31) does bandwidth calculation in software and
>> calls xhci_reserve_bandwidth(). In this case we unintentionally end up calling it
>> after a failed Address Device attempt, when the USB core re-initializes endpoint 0 before retrying.
>> At this point the xhci side of the device isn't properly allocated or set up so
>> we hit a NULL pointer dereference.
>>
>> I'll look into it more.
> 
> The following change should resolve this issue; any chance you could try it out?

I was able to trigger this myself by forcing XHCI_SW_BW_CHECKING and faking a failure of the
Address Device command:

[  270.538134] usb 3-6: new full-speed USB device number 3 using xhci_hcd
[  270.670313] xhci_hcd 0000:00:14.0: Faking a Device not respoinding to setup address
[  270.886142] usb 3-6: device not accepting address 3, error -71
[  270.892091] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  270.899034] #PF: supervisor read access in kernel mode
[  270.904150] #PF: error_code(0x0000) - not-present page
[  270.909267] PGD 0 P4D 0
[  270.911799] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[  270.916660] CPU: 3 UID: 0 PID: 301 Comm: kworker/3:2 Tainted: G        W          6.11.0-rc1+ #4291
[  270.925651] Tainted: [W]=WARN
[  270.928615] Workqueue: usb_hub_wq hub_event
[  270.932787] RIP: 0010:xhci_reserve_bandwidth+0x243/0x6d0 [xhci_hcd]
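The reproduction above (quirk forced on, Address Device failure faked) can be replayed as a tiny userspace model of the retry sequence, with and without both hunks of the proposed fix. Everything here is hypothetical: `bw_state` is not a real kernel field, it only stands for whatever xHCI-side state xhci_setup_addressable_virt_dev() fills in and the bandwidth code later reads, and the model assumes the re-allocation succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified per-device xHCI state. */
struct sketch_vdev {
	int *bw_state; /* stand-in for state only set up for addressing */
};

static int sketch_bw_storage;

/* Models (re)allocating the virt dev: nothing set up yet. */
static void sketch_alloc_dev(struct sketch_vdev *v)
{
	v->bw_state = NULL;
}

/* Models xhci_setup_addressable_virt_dev() filling in the state. */
static void sketch_setup_addressable(struct sketch_vdev *v)
{
	v->bw_state = &sketch_bw_storage;
}

/*
 * Replays: allocate and set up, faked Address Device failure (virt dev
 * freed and re-allocated), then endpoint 0 re-init. Returns true if the
 * modeled bandwidth reservation would read through a NULL pointer.
 */
bool sketch_retry_hits_null(bool force_sw_bw_quirk, bool patched)
{
	struct sketch_vdev v;

	sketch_alloc_dev(&v);
	sketch_setup_addressable(&v);

	/* Faked failure: slot disabled, virt dev freed and re-allocated. */
	sketch_alloc_dev(&v);
	if (patched)
		sketch_setup_addressable(&v); /* second hunk */

	/* ep0 re-init goes through Evaluate Context: ctx_change == true. */
	bool ctx_change = true;
	bool reserve = force_sw_bw_quirk && (!patched || !ctx_change); /* first hunk */

	return reserve && v.bw_state == NULL;
}
```

In this model either hunk alone avoids the NULL read; the patch carries both, presumably so that the ep0 re-init path skips the reservation and the re-allocated device is also left in an addressable state for the retry.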

The code snippet I suggested did fix the NULL pointer dereference.

I'll turn it into a proper patch.

Thanks
Mathias

* Re: [BUG] USB xHCI driver NULL pointer dereference
  2024-08-15 13:10     ` Mathias Nyman
@ 2024-08-16  7:35       ` Karel Balej
  0 siblings, 0 replies; 5+ messages in thread
From: Karel Balej @ 2024-08-16  7:35 UTC
  To: Mathias Nyman; +Cc: linux-usb

Mathias Nyman, 2024-08-15T16:10:32+03:00:
> On 14.8.2024 16.28, Mathias Nyman wrote:
> > On 13.8.2024 14.49, Mathias Nyman wrote:
> >> On 11.8.2024 1.11, Karel Balej wrote:
> >>> Hello,
> >>>
> >>> my machine crashed twice in the past week. The second time, I was able
> >>> to recover the log output (including the stack trace run through
> >>> scripts/decode_stacktrace.sh), which seems to suggest a bug in the xHCI
> >>> driver:
> >
> >>
> >> You have an unlucky setup here.
> >> This could only happen when a full-speed device fails enumeration while connected to a
> >> Pantherpoint xHC.
> >>
> >> Only the Pantherpoint xHC (PCI ID 0x1e31) does bandwidth calculation in software and
> >> calls xhci_reserve_bandwidth(). In this case we unintentionally end up calling it
> >> after a failed Address Device attempt, when the USB core re-initializes endpoint 0 before retrying.
> >> At this point the xhci side of the device isn't properly allocated or set up so
> >> we hit a NULL pointer dereference.
> >>
> >> I'll look into it more.
> > 
> > The following change should resolve this issue; any chance you could try it out?
>
> I was able to trigger this myself by forcing XHCI_SW_BW_CHECKING and faking a failure of the
> Address Device command:
>
> [  270.538134] usb 3-6: new full-speed USB device number 3 using xhci_hcd
> [  270.670313] xhci_hcd 0000:00:14.0: Faking a Device not respoinding to setup address
> [  270.886142] usb 3-6: device not accepting address 3, error -71
> [  270.892091] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [  270.899034] #PF: supervisor read access in kernel mode
> [  270.904150] #PF: error_code(0x0000) - not-present page
> [  270.909267] PGD 0 P4D 0
> [  270.911799] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [  270.916660] CPU: 3 UID: 0 PID: 301 Comm: kworker/3:2 Tainted: G        W          6.11.0-rc1+ #4291
> [  270.925651] Tainted: [W]=WARN
> [  270.928615] Workqueue: usb_hub_wq hub_event
> [  270.932787] RIP: 0010:xhci_reserve_bandwidth+0x243/0x6d0 [xhci_hcd]
>
> The code snippet I suggested did fix the NULL pointer dereference.
>
> I'll turn it into a proper patch.

It seems I'm too late with a Tested-by tag, but for what it's worth,
I ran the machine with your patch all day yesterday and didn't
observe any regressions. I have not been able to verify whether it
fixes the issue, as I haven't found a way to trigger it deliberately,
but it seems that you were able to do that.

Thank you very much for looking into this.

Kind regards,
K. B.

end of thread

Thread overview: 5+ messages
2024-08-10 22:11 [BUG] USB xHCI driver NULL pointer dereference Karel Balej
2024-08-13 11:49 ` Mathias Nyman
2024-08-14 13:28   ` Mathias Nyman
2024-08-15 13:10     ` Mathias Nyman
2024-08-16  7:35       ` Karel Balej