linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
@ 2025-08-29 18:13 David Wang
  2025-08-30  9:48 ` Michał Pecio
  0 siblings, 1 reply; 13+ messages in thread
From: David Wang @ 2025-08-29 18:13 UTC (permalink / raw)
  To: WeitaoWang-oc, mathias.nyman, gregkh
  Cc: linux-usb, regressions, linux-kernel, surenb, kent.overstreet

Hi,

I have been watching kernel memory usage for drivers for a while, via /proc/allocinfo.
After upgrade to 6.17-rc3, I notice memory usage behavior changes for usb drivers:

Before rc3, after several suspend/resume cycles, usb devices's memory usage is very stable:

       40960        5 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
        1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
         320       10 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 31
        1920       15 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 31
         112       12 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 32
        1792       28 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 59

But with rc3, the memory usage increase after each suspend/resume cycle: 

#1:
       49152        6 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
        1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
         384       12 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 32
        2176       17 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 32
         128       14 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 34
        2048       32 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 61
#2:
       57344        7 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 13
        1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 3
         448       14 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 46
        2432       19 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 43
         144       16 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 44
        2304       36 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 82
#3:
       65536        8 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 17
        1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 4
         512       16 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 60
        2688       21 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 54
         160       18 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 54
        2560       40 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 103

The memory increasing pattern keeps going on for each suspend/resume afterwards, I am not
sure whether those memory would be released sometime later.

And in kernel log, two lines of error always showed up after suspend/resume:

	[  295.613598] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI ring segments busy
	[  295.613605] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI input/output contexts busy

And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
"usb: xhci: Fix slot_id resource race conflict"):

	git bisect start 'drivers/usb'
	# status: waiting for both good and bad commits
	# good: [c17b750b3ad9f45f2b6f7e6f7f4679844244f0b9] Linux 6.17-rc2
	git bisect good c17b750b3ad9f45f2b6f7e6f7f4679844244f0b9
	# status: waiting for bad commit, 1 good commit known
	# bad: [1b237f190eb3d36f52dffe07a40b5eb210280e00] Linux 6.17-rc3
	git bisect bad 1b237f190eb3d36f52dffe07a40b5eb210280e00
	# good: [e86ba12cf84ab9cf42fbc2382235fa7ba616e18b] Merge tag 'nfs-for-6.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
	git bisect good e86ba12cf84ab9cf42fbc2382235fa7ba616e18b
	# good: [471b25a2fcbb25dccd7c9bece30313f2440a554e] Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
	git bisect good 471b25a2fcbb25dccd7c9bece30313f2440a554e
	# good: [52025b8fc992972168128be40bffee7eafa532b5] Merge tag 'driver-core-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
	git bisect good 52025b8fc992972168128be40bffee7eafa532b5
	# bad: [8004d08330e1aa7ae797778509e864f7ac3da687] Merge tag 'usb-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
	git bisect bad 8004d08330e1aa7ae797778509e864f7ac3da687
	# good: [a381c6d6f646226924809d0ad01a9465786da463] usb: typec: maxim_contaminant: re-enable cc toggle if cc is open and port is clean
	git bisect good a381c6d6f646226924809d0ad01a9465786da463
	# good: [4013aef2ced9b756a410f50d12df9ebe6a883e4a] ftrace: Fix potential warning in trace_printk_seq during ftrace_dump
	git bisect good 4013aef2ced9b756a410f50d12df9ebe6a883e4a
	# bad: [2eb03376151bb8585caa23ed2673583107bb5193] usb: xhci: Fix slot_id resource race conflict
	git bisect bad 2eb03376151bb8585caa23ed2673583107bb5193
	# good: [309b6341d5570fb2b41b923de2fc9bb147106b80] usb: typec: fusb302: Revert incorrect threaded irq fix
	git bisect good 309b6341d5570fb2b41b923de2fc9bb147106b80
	# first bad commit: [2eb03376151bb8585caa23ed2673583107bb5193] usb: xhci: Fix slot_id resource race conflict



Thanks
David


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
  2025-08-29 18:13 [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle David Wang
@ 2025-08-30  9:48 ` Michał Pecio
  2025-08-30 10:06   ` David Wang
  2025-08-30 10:17   ` David Wang
  0 siblings, 2 replies; 13+ messages in thread
From: Michał Pecio @ 2025-08-30  9:48 UTC (permalink / raw)
  To: David Wang
  Cc: WeitaoWang-oc, mathias.nyman, gregkh, linux-usb, regressions,
	linux-kernel, surenb, kent.overstreet

On Sat, 30 Aug 2025 02:13:54 +0800, David Wang wrote:
> Hi,
>
> I have been watching kernel memory usage for drivers for a while, via /proc/allocinfo.
> After upgrade to 6.17-rc3, I notice memory usage behavior changes for usb drivers:
> 
> Before rc3, after several suspend/resume cycles, usb devices's memory usage is very stable:
> 
>        40960        5 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>          320       10 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 31
>         1920       15 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 31
>          112       12 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 32
>         1792       28 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 59
> 
> But with rc3, the memory usage increase after each suspend/resume cycle: 
> 
> #1:
>        49152        6 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>          384       12 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 32
>         2176       17 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 32
>          128       14 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 34
>         2048       32 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 61
> #2:
>        57344        7 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 13
>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 3
>          448       14 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 46
>         2432       19 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 43
>          144       16 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 44
>         2304       36 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 82
> #3:
>        65536        8 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 17
>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 4
>          512       16 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 60
>         2688       21 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 54
>          160       18 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 54
>         2560       40 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 103
> 
> The memory increasing pattern keeps going on for each suspend/resume afterwards, I am not
> sure whether those memory would be released sometime later.
> 
> And in kernel log, two lines of error always showed up after suspend/resume:
> 
> 	[  295.613598] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI ring segments busy
> 	[  295.613605] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI input/output contexts busy

Hi,

Good work, looks like suspend/resume is a little understested corner
of this driver.

Did you check whether the same leak occurs if you simply disconnect
a device or if it's truly unique to suspend?

> And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
> "usb: xhci: Fix slot_id resource race conflict"):

I see a trivial bug which everyone (myself included tbh) missed before.
Does this help?

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index f11e13f9cdb4..f294032c2ad7 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
  */
 static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
 {
-	struct xhci_virt_device *vdev;
+	struct xhci_virt_device *vdev, *tmp_vdev;
 	struct list_head *tt_list_head;
 	struct xhci_tt_bw_info *tt_info, *next;
 	int i;
@@ -952,8 +952,8 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
 		if (tt_info->slot_id == slot_id) {
 			/* are any devices using this tt_info? */
 			for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
-				vdev = xhci->devs[i];
-				if (vdev && (vdev->tt_info == tt_info))
+				tmp_vdev = xhci->devs[i];
+				if (tmp_vdev && (tmp_vdev->tt_info == tt_info))
 					xhci_free_virt_devices_depth_first(
 						xhci, i);
 			}



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
  2025-08-30  9:48 ` Michał Pecio
@ 2025-08-30 10:06   ` David Wang
  2025-08-30 10:17   ` David Wang
  1 sibling, 0 replies; 13+ messages in thread
From: David Wang @ 2025-08-30 10:06 UTC (permalink / raw)
  To: Michał Pecio
  Cc: WeitaoWang-oc, mathias.nyman, gregkh, linux-usb, regressions,
	linux-kernel, surenb, kent.overstreet



At 2025-08-30 17:48:28, "Michał Pecio" <michal.pecio@gmail.com> wrote:
>On Sat, 30 Aug 2025 02:13:54 +0800, David Wang wrote:
>> Hi,
>>
>> I have been watching kernel memory usage for drivers for a while, via /proc/allocinfo.
>> After upgrade to 6.17-rc3, I notice memory usage behavior changes for usb drivers:
>> 
>> Before rc3, after several suspend/resume cycles, usb devices's memory usage is very stable:
>> 
>>        40960        5 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>>          320       10 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 31
>>         1920       15 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 31
>>          112       12 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 32
>>         1792       28 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 59
>> 
>> But with rc3, the memory usage increase after each suspend/resume cycle: 
>> 
>> #1:
>>        49152        6 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>>          384       12 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 32
>>         2176       17 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 32
>>          128       14 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 34
>>         2048       32 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 61
>> #2:
>>        57344        7 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 13
>>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 3
>>          448       14 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 46
>>         2432       19 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 43
>>          144       16 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 44
>>         2304       36 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 82
>> #3:
>>        65536        8 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 17
>>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 4
>>          512       16 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 60
>>         2688       21 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 54
>>          160       18 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 54
>>         2560       40 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 103
>> 
>> The memory increasing pattern keeps going on for each suspend/resume afterwards, I am not
>> sure whether those memory would be released sometime later.
>> 
>> And in kernel log, two lines of error always showed up after suspend/resume:
>> 
>> 	[  295.613598] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI ring segments busy
>> 	[  295.613605] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI input/output contexts busy
>
>Hi,
>
>Good work, looks like suspend/resume is a little understested corner
>of this driver.
>
>Did you check whether the same leak occurs if you simply disconnect
>a device or if it's truly unique to suspend?
>
>> And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
>> "usb: xhci: Fix slot_id resource race conflict"):
>
>I see a trivial bug which everyone (myself included tbh) missed before.
>Does this help?
>
>diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>index f11e13f9cdb4..f294032c2ad7 100644
>--- a/drivers/usb/host/xhci-mem.c
>+++ b/drivers/usb/host/xhci-mem.c
>@@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
>  */
> static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
> {
>-	struct xhci_virt_device *vdev;
>+	struct xhci_virt_device *vdev, *tmp_vdev;
> 	struct list_head *tt_list_head;
> 	struct xhci_tt_bw_info *tt_info, *next;
> 	int i;
>@@ -952,8 +952,8 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
> 		if (tt_info->slot_id == slot_id) {
> 			/* are any devices using this tt_info? */
> 			for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>-				vdev = xhci->devs[i];
>-				if (vdev && (vdev->tt_info == tt_info))
>+				tmp_vdev = xhci->devs[i];
>+				if (tmp_vdev && (tmp_vdev->tt_info == tt_info))
> 					xhci_free_virt_devices_depth_first(
> 						xhci, i);
> 			}


I notice this too,  just a few minutes ago, I just started building a patch with this *silly* bug fixed.  
(The device pointer is wrong, that is most likely the culprit: the virtual device is not properly freed and hence memory leak.)
Will update later.


David
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
  2025-08-30  9:48 ` Michał Pecio
  2025-08-30 10:06   ` David Wang
@ 2025-08-30 10:17   ` David Wang
  2025-09-01 10:14     ` Mathias Nyman
  1 sibling, 1 reply; 13+ messages in thread
From: David Wang @ 2025-08-30 10:17 UTC (permalink / raw)
  To: Michał Pecio
  Cc: WeitaoWang-oc, mathias.nyman, gregkh, linux-usb, regressions,
	linux-kernel, surenb, kent.overstreet


At 2025-08-30 17:48:28, "Michał Pecio" <michal.pecio@gmail.com> wrote:
>On Sat, 30 Aug 2025 02:13:54 +0800, David Wang wrote:
>> Hi,
>>
>> I have been watching kernel memory usage for drivers for a while, via /proc/allocinfo.
>> After upgrade to 6.17-rc3, I notice memory usage behavior changes for usb drivers:
>> 
>> Before rc3, after several suspend/resume cycles, usb devices's memory usage is very stable:
>> 
>>        40960        5 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>>          320       10 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 31
>>         1920       15 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 31
>>          112       12 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 32
>>         1792       28 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 59
>> 
>> But with rc3, the memory usage increase after each suspend/resume cycle: 
>> 
>> #1:
>>        49152        6 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>>          384       12 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 32
>>         2176       17 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 32
>>          128       14 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 34
>>         2048       32 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 61
>> #2:
>>        57344        7 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 13
>>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 3
>>          448       14 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 46
>>         2432       19 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 43
>>          144       16 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 44
>>         2304       36 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 82
>> #3:
>>        65536        8 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 17
>>         1024        1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 4
>>          512       16 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 60
>>         2688       21 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 54
>>          160       18 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 54
>>         2560       40 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 103
>> 
>> The memory increasing pattern keeps going on for each suspend/resume afterwards, I am not
>> sure whether those memory would be released sometime later.
>> 
>> And in kernel log, two lines of error always showed up after suspend/resume:
>> 
>> 	[  295.613598] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI ring segments busy
>> 	[  295.613605] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI input/output contexts busy
>
>Hi,
>
>Good work, looks like suspend/resume is a little understested corner
>of this driver.
>
>Did you check whether the same leak occurs if you simply disconnect
>a device or if it's truly unique to suspend?
>
>> And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
>> "usb: xhci: Fix slot_id resource race conflict"):
>
>I see a trivial bug which everyone (myself included tbh) missed before.
>Does this help?
>
>diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>index f11e13f9cdb4..f294032c2ad7 100644
>--- a/drivers/usb/host/xhci-mem.c
>+++ b/drivers/usb/host/xhci-mem.c
>@@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
>  */
> static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
> {
>-	struct xhci_virt_device *vdev;
>+	struct xhci_virt_device *vdev, *tmp_vdev;
> 	struct list_head *tt_list_head;
> 	struct xhci_tt_bw_info *tt_info, *next;
> 	int i;
>@@ -952,8 +952,8 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
> 		if (tt_info->slot_id == slot_id) {
> 			/* are any devices using this tt_info? */
> 			for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>-				vdev = xhci->devs[i];
>-				if (vdev && (vdev->tt_info == tt_info))
>+				tmp_vdev = xhci->devs[i];
>+				if (tmp_vdev && (tmp_vdev->tt_info == tt_info))
> 					xhci_free_virt_devices_depth_first(
> 						xhci, i);

I confirmed this *silly* code is the root cause of this memory leak.
And I would suggest simpler code changes (which is what I was testing):  


diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 81eaad87a3d9..c4a6544aa107 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -962,7 +962,7 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
 out:
        /* we are now at a leaf device */
        xhci_debugfs_remove_slot(xhci, slot_id);
-       xhci_free_virt_device(xhci, vdev, slot_id);
+       xhci_free_virt_device(xhci, xhci->devs[slot_id], slot_id);
 }
 
 int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id,



Thanks
David

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
  2025-08-30 10:17   ` David Wang
@ 2025-09-01 10:14     ` Mathias Nyman
  2025-09-01 11:17       ` David Wang
  2025-09-02  7:30       ` [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first() Michal Pecio
  0 siblings, 2 replies; 13+ messages in thread
From: Mathias Nyman @ 2025-09-01 10:14 UTC (permalink / raw)
  To: David Wang, Michał Pecio
  Cc: WeitaoWang-oc, gregkh, linux-usb, regressions, linux-kernel,
	surenb, kent.overstreet

On 30.8.2025 13.17, David Wang wrote:
> 
> At 2025-08-30 17:48:28, "Michał Pecio" <michal.pecio@gmail.com> wrote:
>>
>> Hi,
>>
>> Good work, looks like suspend/resume is a little understested corner
>> of this driver.
>>
>> Did you check whether the same leak occurs if you simply disconnect
>> a device or if it's truly unique to suspend?
>>
>>> And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
>>> "usb: xhci: Fix slot_id resource race conflict"):
>>
>> I see a trivial bug which everyone (myself included tbh) missed before.
>> Does this help?
>>
>> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>> index f11e13f9cdb4..f294032c2ad7 100644
>> --- a/drivers/usb/host/xhci-mem.c
>> +++ b/drivers/usb/host/xhci-mem.c
>> @@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
>>   */
>> static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
>> {
>> -	struct xhci_virt_device *vdev;
>> +	struct xhci_virt_device *vdev, *tmp_vdev;
>> 	struct list_head *tt_list_head;
>> 	struct xhci_tt_bw_info *tt_info, *next;
>> 	int i;
>> @@ -952,8 +952,8 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
>> 		if (tt_info->slot_id == slot_id) {
>> 			/* are any devices using this tt_info? */
>> 			for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>> -				vdev = xhci->devs[i];
>> -				if (vdev && (vdev->tt_info == tt_info))
>> +				tmp_vdev = xhci->devs[i];
>> +				if (tmp_vdev && (tmp_vdev->tt_info == tt_info))
>> 					xhci_free_virt_devices_depth_first(
>> 						xhci, i);
> 
> I confirmed this *silly* code is the root cause of this memory leak.
> And I would suggest simpler code changes (which is what I was testing):
> 
> 
> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
> index 81eaad87a3d9..c4a6544aa107 100644
> --- a/drivers/usb/host/xhci-mem.c
> +++ b/drivers/usb/host/xhci-mem.c
> @@ -962,7 +962,7 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
>   out:
>          /* we are now at a leaf device */
>          xhci_debugfs_remove_slot(xhci, slot_id);
> -       xhci_free_virt_device(xhci, vdev, slot_id);
> +       xhci_free_virt_device(xhci, xhci->devs[slot_id], slot_id);
>   }
>   
>   int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id,
> 

Thanks to both for catching this

I can quickly turn this into a proper patch unless one of you would like to submit one?

Thanks
Mathias

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
  2025-09-01 10:14     ` Mathias Nyman
@ 2025-09-01 11:17       ` David Wang
  2025-09-02  7:30       ` [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first() Michal Pecio
  1 sibling, 0 replies; 13+ messages in thread
From: David Wang @ 2025-09-01 11:17 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Michał Pecio, WeitaoWang-oc, gregkh, linux-usb, regressions,
	linux-kernel, surenb, kent.overstreet


At 2025-09-01 18:14:32, "Mathias Nyman" <mathias.nyman@linux.intel.com> wrote:
>On 30.8.2025 13.17, David Wang wrote:
>> 
>> At 2025-08-30 17:48:28, "Michał Pecio" <michal.pecio@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Good work, looks like suspend/resume is a little understested corner
>>> of this driver.
>>>
>>> Did you check whether the same leak occurs if you simply disconnect
>>> a device or if it's truly unique to suspend?
>>>
>>>> And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
>>>> "usb: xhci: Fix slot_id resource race conflict"):
>>>
>>> I see a trivial bug which everyone (myself included tbh) missed before.
>>> Does this help?
>>>
>>> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>>> index f11e13f9cdb4..f294032c2ad7 100644
>>> --- a/drivers/usb/host/xhci-mem.c
>>> +++ b/drivers/usb/host/xhci-mem.c
>>> @@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
>>>   */
>>> static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
>>> {
>>> -	struct xhci_virt_device *vdev;
>>> +	struct xhci_virt_device *vdev, *tmp_vdev;
>>> 	struct list_head *tt_list_head;
>>> 	struct xhci_tt_bw_info *tt_info, *next;
>>> 	int i;
>>> @@ -952,8 +952,8 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
>>> 		if (tt_info->slot_id == slot_id) {
>>> 			/* are any devices using this tt_info? */
>>> 			for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>>> -				vdev = xhci->devs[i];
>>> -				if (vdev && (vdev->tt_info == tt_info))
>>> +				tmp_vdev = xhci->devs[i];
>>> +				if (tmp_vdev && (tmp_vdev->tt_info == tt_info))
>>> 					xhci_free_virt_devices_depth_first(
>>> 						xhci, i);
>> 
>> I confirmed this *silly* code is the root cause of this memory leak.
>> And I would suggest simpler code changes (which is what I was testing):
>> 
>> 
>> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>> index 81eaad87a3d9..c4a6544aa107 100644
>> --- a/drivers/usb/host/xhci-mem.c
>> +++ b/drivers/usb/host/xhci-mem.c
>> @@ -962,7 +962,7 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
>>   out:
>>          /* we are now at a leaf device */
>>          xhci_debugfs_remove_slot(xhci, slot_id);
>> -       xhci_free_virt_device(xhci, vdev, slot_id);
>> +       xhci_free_virt_device(xhci, xhci->devs[slot_id], slot_id);
>>   }
>>   
>>   int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id,
>> 
>
>Thanks to both for catching this
>
>I can quickly turn this into a proper patch unless one of you would like to submit one?

Oh, I was not planning to submit a patch at all, since Michał Pecio got the credit of publishing the first patch.

Thanks
David

>
>Thanks
>Mathias

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
  2025-09-01 10:14     ` Mathias Nyman
  2025-09-01 11:17       ` David Wang
@ 2025-09-02  7:30       ` Michal Pecio
  2025-09-02  8:30         ` David Wang
  1 sibling, 1 reply; 13+ messages in thread
From: Michal Pecio @ 2025-09-02  7:30 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: David Wang, WeitaoWang-oc, gregkh, linux-usb, regressions,
	linux-kernel, surenb, kent.overstreet

Reusing 'vdev' for iteration caused a recent commit to malfunction
uexpectedly, resulting in a reported memory leak and potential UAF
if devices are freed in bad order. Using a second variable solves
this problem, and maybe others later.

HCS_MAX_SLOTS(xhci->hcs_params1) is the highest possible slot_id,
so change the iteration range to include it. Currently this doesn't
seem to cause problems because the only caller begins with freeing
the topmost slot_id, but it breaks documented functionality.

Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/linux-usb/20250829181354.4450-1-00107082@163.com/
Fixes: 2eb03376151b ("usb: xhci: Fix slot_id resource race conflict")
Fixes: ee8665e28e8d ("xhci: free xhci virtual devices with leaf nodes first")
Cc: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-mem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index eed5926b200e..db7dc70c37e5 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
  */
 static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
 {
-	struct xhci_virt_device *vdev;
+	struct xhci_virt_device *vdev, *vdev_i;
 	struct list_head *tt_list_head;
 	struct xhci_tt_bw_info *tt_info, *next;
 	int i;
@@ -951,9 +951,9 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
 		/* is this a hub device that added a tt_info to the tts list */
 		if (tt_info->slot_id == slot_id) {
 			/* are any devices using this tt_info? */
-			for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
-				vdev = xhci->devs[i];
-				if (vdev && (vdev->tt_info == tt_info))
+			for (i = 1; i <= HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
+				vdev_i = xhci->devs[i];
+				if (vdev_i && (vdev_i->tt_info == tt_info))
 					xhci_free_virt_devices_depth_first(
 						xhci, i);
 			}
-- 
2.48.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re:[PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
  2025-09-02  7:30       ` [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first() Michal Pecio
@ 2025-09-02  8:30         ` David Wang
  2025-09-02  8:46           ` [PATCH] " Michał Pecio
  0 siblings, 1 reply; 13+ messages in thread
From: David Wang @ 2025-09-02  8:30 UTC (permalink / raw)
  To: Michal Pecio
  Cc: Mathias Nyman, WeitaoWang-oc, gregkh, linux-usb, regressions,
	linux-kernel, surenb, kent.overstreet

At 2025-09-02 15:30:17, "Michal Pecio" <michal.pecio@gmail.com> wrote:
>Reusing 'vdev' for iteration caused a recent commit to malfunction
>uexpectedly, resulting in a reported memory leak and potential UAF
>if devices are freed in bad order. Using a second variable solves
>this problem, and maybe others later.
>
>HCS_MAX_SLOTS(xhci->hcs_params1) is the highest possible slot_id,
>so change the iteration range to include it. Currently this doesn't
>seem to cause problems because the only caller begins with freeing
>the topmost slot_id, but it breaks documented functionality.
>
>Reported-by: David Wang <00107082@163.com>
>Closes: https://lore.kernel.org/linux-usb/20250829181354.4450-1-00107082@163.com/
>Fixes: 2eb03376151b ("usb: xhci: Fix slot_id resource race conflict")
>Fixes: ee8665e28e8d ("xhci: free xhci virtual devices with leaf nodes first")
>Cc: stable@vger.kernel.org
>Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
>---
> drivers/usb/host/xhci-mem.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>index eed5926b200e..db7dc70c37e5 100644
>--- a/drivers/usb/host/xhci-mem.c
>+++ b/drivers/usb/host/xhci-mem.c
>@@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
>  */
> static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
> {
>-	struct xhci_virt_device *vdev;
>+	struct xhci_virt_device *vdev, *vdev_i;
> 	struct list_head *tt_list_head;
> 	struct xhci_tt_bw_info *tt_info, *next;
> 	int i;
>@@ -951,9 +951,9 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
> 		/* is this a hub device that added a tt_info to the tts list */
> 		if (tt_info->slot_id == slot_id) {
> 			/* are any devices using this tt_info? */
>-			for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>-				vdev = xhci->devs[i];
>-				if (vdev && (vdev->tt_info == tt_info))
>+			for (i = 1; i <= HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>+				vdev_i = xhci->devs[i];
>+				if (vdev_i && (vdev_i->tt_info == tt_info))
> 					xhci_free_virt_devices_depth_first(
> 						xhci, i);
> 			}
>-- 
>2.48.1

Tested-by: David Wang <00107082@163.com>


About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?

Thanks
David

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
  2025-09-02  8:30         ` David Wang
@ 2025-09-02  8:46           ` Michał Pecio
  2025-09-02  9:07             ` Michał Pecio
  0 siblings, 1 reply; 13+ messages in thread
From: Michał Pecio @ 2025-09-02  8:46 UTC (permalink / raw)
  To: David Wang
  Cc: Mathias Nyman, WeitaoWang-oc, gregkh, linux-usb, regressions,
	linux-kernel, surenb, kent.overstreet

On Tue, 2 Sep 2025 16:30:48 +0800 (CST), David Wang wrote:
> About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?

This too, you would need to fiddle with devices (or connect enough
of them) to reach Slot ID 255 (probably the highest on most systems),
depending on the xHCI controller and its ID allocation policy.

But also as explained, this bug doesn't make things go boom just yet.

Except if combined with your bug in an obscure edge case:

1. A high speed hub has slot ID HCS_MAX_SLOTS-1 and some TT children.
2. Another high speed hub has slot ID HCS_MAX_SLOTS.
3. We start with freeing the second hub.
4. The loop is entered and leaves vdev pointing at the first hub.
5. The first hub is freed instead of the second one.
6. Then its children are freed and UAF its tt_info.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
  2025-09-02  8:46           ` [PATCH] " Michał Pecio
@ 2025-09-02  9:07             ` Michał Pecio
  2025-09-02 10:13               ` Mathias Nyman
  0 siblings, 1 reply; 13+ messages in thread
From: Michał Pecio @ 2025-09-02  9:07 UTC (permalink / raw)
  To: David Wang
  Cc: Mathias Nyman, WeitaoWang-oc, gregkh, linux-usb, regressions,
	linux-kernel, surenb, kent.overstreet

On Tue, 2 Sep 2025 10:46:30 +0200, Michał Pecio wrote:
> On Tue, 2 Sep 2025 16:30:48 +0800 (CST), David Wang wrote:
> > About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?  
> 
> This too, you would need to fiddle with devices (or connect enough
> of them) to reach Slot ID 255 (probably the highest on most systems),
> depending on the xHCI controller and its ID allocation policy.

This made me wonder what those policies are. I'm too lazy for thorough
testing, but I plugged and unplugged the same device a few times.

Most HCs kept assigning ID 1, so they likely always pick the lowest.

My AMD chipset, two ASMedia USB 3.1 controllers and a Fresco FL1100
kept assigning sequentially increasing IDs, so I suppose I could pump
it up near the top, connect two high speed hubs and trigger this bug.

> But also as explained, this bug doesn't make things go boom just yet.
> 
> Except if combined with your bug in an obscure edge case:
> 
> 1. A high speed hub has slot ID HCS_MAX_SLOTS-1 and some TT children.
> 2. Another high speed hub has slot ID HCS_MAX_SLOTS.
> 3. We start with freeing the second hub.
> 4. The loop is entered and leaves vdev pointing at the first hub.
> 5. The first hub is freed instead of the second one.
> 6. Then its children are freed and UAF its tt_info.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
  2025-09-02  9:07             ` Michał Pecio
@ 2025-09-02 10:13               ` Mathias Nyman
  2025-09-02 10:55                 ` Michał Pecio
  0 siblings, 1 reply; 13+ messages in thread
From: Mathias Nyman @ 2025-09-02 10:13 UTC (permalink / raw)
  To: Michał Pecio, David Wang
  Cc: WeitaoWang-oc, gregkh, linux-usb, regressions, linux-kernel,
	surenb, kent.overstreet

On 2.9.2025 12.07, Michał Pecio wrote:
> On Tue, 2 Sep 2025 10:46:30 +0200, Michał Pecio wrote:
>> On Tue, 2 Sep 2025 16:30:48 +0800 (CST), David Wang wrote:
>>> About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?
>>
>> This too, you would need to fiddle with devices (or connect enough
>> of them) to reach Slot ID 255 (probably the highest on most systems),
>> depending on the xHCI controller and its ID allocation policy.
> 
> This made me wonder what those policies are. I'm too lazy for thorough
> testing, but I plugged and unplugged the same device a few times.
> 
> Most HCs kept assigning ID 1, so they likely always pick the lowest.
> 
> My AMD chipset, two ASMedia USB 3.1 controllers and a Fresco FL1100
> kept assigning sequentially increasing IDs, so I suppose I could pump
> it up near the top, connect two high speed hubs and trigger this bug.
> 
>> But also as explained, this bug doesn't make things go boom just yet.
>>
>> Except if combined with your bug in an obscure edge case:
>>
>> 1. A high speed hub has slot ID HCS_MAX_SLOTS-1 and some TT children.
>> 2. Another high speed hub has slot ID HCS_MAX_SLOTS.
>> 3. We start with freeing the second hub.
>> 4. The loop is entered and leaves vdev pointing at the first hub.
>> 5. The first hub is freed instead of the second one.
>> 6. Then its children are freed and UAF its tt_info.

I'm not sure I follow the above.

I agree that changing the "<" to "<=" makes sense, but fortunately for us there shouldn't be any
issue with current implementation as xhci_free_virt_devices_depth_first() is called with highest possible
slot_id value first:

in xhci-memm.c:
  for (i = HCS_MAX_SLOTS(xhci->hcs_params1); i > 0; i--)
                 xhci_free_virt_devices_depth_first(xhci, i);

if HCS_MAX_SLOTS slot_id is a hs-hub then all its children have slot_id < HCS_MAX_SLOTS,
and loop works fine.

If HCS_MXA_SLOT slot_id is a leaf node then it is freed first.

Thanks
Mathias

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
  2025-09-02 10:13               ` Mathias Nyman
@ 2025-09-02 10:55                 ` Michał Pecio
  2025-09-02 12:58                   ` Mathias Nyman
  0 siblings, 1 reply; 13+ messages in thread
From: Michał Pecio @ 2025-09-02 10:55 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: David Wang, WeitaoWang-oc, gregkh, linux-usb, regressions,
	linux-kernel, surenb, kent.overstreet

On Tue, 2 Sep 2025 13:13:12 +0300, Mathias Nyman wrote:
> On 2.9.2025 12.07, Michał Pecio wrote:
> > On Tue, 2 Sep 2025 10:46:30 +0200, Michał Pecio wrote:  
> >> On Tue, 2 Sep 2025 16:30:48 +0800 (CST), David Wang wrote:  
> >>> About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?  
> >>
> >> This too, you would need to fiddle with devices (or connect enough
> >> of them) to reach Slot ID 255 (probably the highest on most systems),
> >> depending on the xHCI controller and its ID allocation policy.  
> > 
> > This made me wonder what those policies are. I'm too lazy for thorough
> > testing, but I plugged and unplugged the same device a few times.
> > 
> > Most HCs kept assigning ID 1, so they likely always pick the lowest.
> > 
> > My AMD chipset, two ASMedia USB 3.1 controllers and a Fresco FL1100
> > kept assigning sequentially increasing IDs, so I suppose I could pump
> > it up near the top, connect two high speed hubs and trigger this bug.
> >   
> >> But also as explained, this bug doesn't make things go boom just yet.
> >>
> >> Except if combined with your bug in an obscure edge case:
> >>
> >> 1. A high speed hub has slot ID HCS_MAX_SLOTS-1 and some TT children.
> >> 2. Another high speed hub has slot ID HCS_MAX_SLOTS.
> >> 3. We start with freeing the second hub.
> >> 4. The loop is entered and leaves vdev pointing at the first hub.
> >> 5. The first hub is freed instead of the second one.
> >> 6. Then its children are freed and UAF its tt_info.  
> 
> I'm not sure I follow the above.
> 
> I agree that changing the "<" to "<=" makes sense, but fortunately for us there shouldn't be any
> issue with current implementation as xhci_free_virt_devices_depth_first() is called with highest possible
> slot_id value first:
> 
> in xhci-memm.c:
>   for (i = HCS_MAX_SLOTS(xhci->hcs_params1); i > 0; i--)
>                  xhci_free_virt_devices_depth_first(xhci, i);
> 
> if HCS_MAX_SLOTS slot_id is a hs-hub then all its children have slot_id < HCS_MAX_SLOTS,
> and loop works fine.

The loop works fine, but it exists with vdev pointing at MAX_SLOTS-1
due to off by one and then this happens:

	xhci_free_virt_device(xhci, vdev, slot_id);

which means:

	xhci_free_virt_device(xhci, xhci->devs[MAX_SLOTS-1], MAX_SLOTS);

If MAX_SLOTS-1 is a high speed hub, it will be freed right now, without
freeing its children first.

And whatever this device is, it will be freed without nulling
xhci->devs[MAX_SLOTS-1], which might cause other UAF later (not sure).

I think it's possible, though I haven't tried actually triggering it.
The problem didn't exist before this recent patch.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
  2025-09-02 10:55                 ` Michał Pecio
@ 2025-09-02 12:58                   ` Mathias Nyman
  0 siblings, 0 replies; 13+ messages in thread
From: Mathias Nyman @ 2025-09-02 12:58 UTC (permalink / raw)
  To: Michał Pecio
  Cc: David Wang, WeitaoWang-oc, gregkh, linux-usb, regressions,
	linux-kernel, surenb, kent.overstreet

On 2.9.2025 13.55, Michał Pecio wrote:
> On Tue, 2 Sep 2025 13:13:12 +0300, Mathias Nyman wrote:
>> On 2.9.2025 12.07, Michał Pecio wrote:
>>> On Tue, 2 Sep 2025 10:46:30 +0200, Michał Pecio wrote:
>>>> On Tue, 2 Sep 2025 16:30:48 +0800 (CST), David Wang wrote:
>>>>> About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?
>>>>
>>>> This too, you would need to fiddle with devices (or connect enough
>>>> of them) to reach Slot ID 255 (probably the highest on most systems),
>>>> depending on the xHCI controller and its ID allocation policy.
>>>
>>> This made me wonder what those policies are. I'm too lazy for thorough
>>> testing, but I plugged and unplugged the same device a few times.
>>>
>>> Most HCs kept assigning ID 1, so they likely always pick the lowest.
>>>
>>> My AMD chipset, two ASMedia USB 3.1 controllers and a Fresco FL1100
>>> kept assigning sequentially increasing IDs, so I suppose I could pump
>>> it up near the top, connect two high speed hubs and trigger this bug.
>>>    
>>>> But also as explained, this bug doesn't make things go boom just yet.
>>>>
>>>> Except if combined with your bug in an obscure edge case:
>>>>
>>>> 1. A high speed hub has slot ID HCS_MAX_SLOTS-1 and some TT children.
>>>> 2. Another high speed hub has slot ID HCS_MAX_SLOTS.
>>>> 3. We start with freeing the second hub.
>>>> 4. The loop is entered and leaves vdev pointing at the first hub.
>>>> 5. The first hub is freed instead of the second one.
>>>> 6. Then its children are freed and UAF its tt_info.
>>
>> I'm not sure I follow the above.
>>
>> I agree that changing the "<" to "<=" makes sense, but fortunately for us there shouldn't be any
>> issue with current implementation as xhci_free_virt_devices_depth_first() is called with highest possible
>> slot_id value first:
>>
>> in xhci-memm.c:
>>    for (i = HCS_MAX_SLOTS(xhci->hcs_params1); i > 0; i--)
>>                   xhci_free_virt_devices_depth_first(xhci, i);
>>
>> if HCS_MAX_SLOTS slot_id is a hs-hub then all its children have slot_id < HCS_MAX_SLOTS,
>> and loop works fine.
> 
> The loop works fine, but it exists with vdev pointing at MAX_SLOTS-1
> due to off by one and then this happens:
> 
> 	xhci_free_virt_device(xhci, vdev, slot_id);
> 
> which means:
> 
> 	xhci_free_virt_device(xhci, xhci->devs[MAX_SLOTS-1], MAX_SLOTS);
> 
> If MAX_SLOTS-1 is a high speed hub, it will be freed right now, without
> freeing its children first.
> 
> And whatever this device is, it will be freed without nulling
> xhci->devs[MAX_SLOTS-1], which might cause other UAF later (not sure).
> 
> I think it's possible, though I haven't tried actually triggering it.
> The problem didn't exist before this recent patch.
> 

I see, yes with the slot_id regression the incorrect vdev will cause issues.

Sent the patch to fix it forward

Thanks
Mathias

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-09-02 12:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-29 18:13 [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle David Wang
2025-08-30  9:48 ` Michał Pecio
2025-08-30 10:06   ` David Wang
2025-08-30 10:17   ` David Wang
2025-09-01 10:14     ` Mathias Nyman
2025-09-01 11:17       ` David Wang
2025-09-02  7:30       ` [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first() Michal Pecio
2025-09-02  8:30         ` David Wang
2025-09-02  8:46           ` [PATCH] " Michał Pecio
2025-09-02  9:07             ` Michał Pecio
2025-09-02 10:13               ` Mathias Nyman
2025-09-02 10:55                 ` Michał Pecio
2025-09-02 12:58                   ` Mathias Nyman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).