* [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
@ 2025-08-29 18:13 David Wang
2025-08-30 9:48 ` Michał Pecio
0 siblings, 1 reply; 13+ messages in thread
From: David Wang @ 2025-08-29 18:13 UTC (permalink / raw)
To: WeitaoWang-oc, mathias.nyman, gregkh
Cc: linux-usb, regressions, linux-kernel, surenb, kent.overstreet
Hi,
I have been watching kernel memory usage for drivers for a while, via /proc/allocinfo.
After upgrade to 6.17-rc3, I notice memory usage behavior changes for usb drivers:
Before rc3, after several suspend/resume cycles, usb devices's memory usage is very stable:
40960 5 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
320 10 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 31
1920 15 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 31
112 12 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 32
1792 28 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 59
But with rc3, the memory usage increase after each suspend/resume cycle:
#1:
49152 6 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
384 12 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 32
2176 17 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 32
128 14 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 34
2048 32 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 61
#2:
57344 7 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 13
1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 3
448 14 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 46
2432 19 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 43
144 16 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 44
2304 36 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 82
#3:
65536 8 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 17
1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 4
512 16 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 60
2688 21 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 54
160 18 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 54
2560 40 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 103
The memory increasing pattern keeps going on for each suspend/resume afterwards, I am not
sure whether those memory would be released sometime later.
And in kernel log, two lines of error always showed up after suspend/resume:
[ 295.613598] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI ring segments busy
[ 295.613605] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI input/output contexts busy
And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
"usb: xhci: Fix slot_id resource race conflict"):
git bisect start 'drivers/usb'
# status: waiting for both good and bad commits
# good: [c17b750b3ad9f45f2b6f7e6f7f4679844244f0b9] Linux 6.17-rc2
git bisect good c17b750b3ad9f45f2b6f7e6f7f4679844244f0b9
# status: waiting for bad commit, 1 good commit known
# bad: [1b237f190eb3d36f52dffe07a40b5eb210280e00] Linux 6.17-rc3
git bisect bad 1b237f190eb3d36f52dffe07a40b5eb210280e00
# good: [e86ba12cf84ab9cf42fbc2382235fa7ba616e18b] Merge tag 'nfs-for-6.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
git bisect good e86ba12cf84ab9cf42fbc2382235fa7ba616e18b
# good: [471b25a2fcbb25dccd7c9bece30313f2440a554e] Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
git bisect good 471b25a2fcbb25dccd7c9bece30313f2440a554e
# good: [52025b8fc992972168128be40bffee7eafa532b5] Merge tag 'driver-core-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
git bisect good 52025b8fc992972168128be40bffee7eafa532b5
# bad: [8004d08330e1aa7ae797778509e864f7ac3da687] Merge tag 'usb-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect bad 8004d08330e1aa7ae797778509e864f7ac3da687
# good: [a381c6d6f646226924809d0ad01a9465786da463] usb: typec: maxim_contaminant: re-enable cc toggle if cc is open and port is clean
git bisect good a381c6d6f646226924809d0ad01a9465786da463
# good: [4013aef2ced9b756a410f50d12df9ebe6a883e4a] ftrace: Fix potential warning in trace_printk_seq during ftrace_dump
git bisect good 4013aef2ced9b756a410f50d12df9ebe6a883e4a
# bad: [2eb03376151bb8585caa23ed2673583107bb5193] usb: xhci: Fix slot_id resource race conflict
git bisect bad 2eb03376151bb8585caa23ed2673583107bb5193
# good: [309b6341d5570fb2b41b923de2fc9bb147106b80] usb: typec: fusb302: Revert incorrect threaded irq fix
git bisect good 309b6341d5570fb2b41b923de2fc9bb147106b80
# first bad commit: [2eb03376151bb8585caa23ed2673583107bb5193] usb: xhci: Fix slot_id resource race conflict
Thanks
David
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
2025-08-29 18:13 [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle David Wang
@ 2025-08-30 9:48 ` Michał Pecio
2025-08-30 10:06 ` David Wang
2025-08-30 10:17 ` David Wang
0 siblings, 2 replies; 13+ messages in thread
From: Michał Pecio @ 2025-08-30 9:48 UTC (permalink / raw)
To: David Wang
Cc: WeitaoWang-oc, mathias.nyman, gregkh, linux-usb, regressions,
linux-kernel, surenb, kent.overstreet
On Sat, 30 Aug 2025 02:13:54 +0800, David Wang wrote:
> Hi,
>
> I have been watching kernel memory usage for drivers for a while, via /proc/allocinfo.
> After upgrade to 6.17-rc3, I notice memory usage behavior changes for usb drivers:
>
> Before rc3, after several suspend/resume cycles, usb devices's memory usage is very stable:
>
> 40960 5 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
> 320 10 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 31
> 1920 15 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 31
> 112 12 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 32
> 1792 28 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 59
>
> But with rc3, the memory usage increase after each suspend/resume cycle:
>
> #1:
> 49152 6 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
> 384 12 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 32
> 2176 17 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 32
> 128 14 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 34
> 2048 32 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 61
> #2:
> 57344 7 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 13
> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 3
> 448 14 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 46
> 2432 19 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 43
> 144 16 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 44
> 2304 36 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 82
> #3:
> 65536 8 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 17
> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 4
> 512 16 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 60
> 2688 21 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 54
> 160 18 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 54
> 2560 40 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 103
>
> The memory increasing pattern keeps going on for each suspend/resume afterwards, I am not
> sure whether those memory would be released sometime later.
>
> And in kernel log, two lines of error always showed up after suspend/resume:
>
> [ 295.613598] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI ring segments busy
> [ 295.613605] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI input/output contexts busy
Hi,
Good work, looks like suspend/resume is a little understested corner
of this driver.
Did you check whether the same leak occurs if you simply disconnect
a device or if it's truly unique to suspend?
> And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
> "usb: xhci: Fix slot_id resource race conflict"):
I see a trivial bug which everyone (myself included tbh) missed before.
Does this help?
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index f11e13f9cdb4..f294032c2ad7 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
*/
static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
{
- struct xhci_virt_device *vdev;
+ struct xhci_virt_device *vdev, *tmp_vdev;
struct list_head *tt_list_head;
struct xhci_tt_bw_info *tt_info, *next;
int i;
@@ -952,8 +952,8 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
if (tt_info->slot_id == slot_id) {
/* are any devices using this tt_info? */
for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
- vdev = xhci->devs[i];
- if (vdev && (vdev->tt_info == tt_info))
+ tmp_vdev = xhci->devs[i];
+ if (tmp_vdev && (tmp_vdev->tt_info == tt_info))
xhci_free_virt_devices_depth_first(
xhci, i);
}
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
2025-08-30 9:48 ` Michał Pecio
@ 2025-08-30 10:06 ` David Wang
2025-08-30 10:17 ` David Wang
1 sibling, 0 replies; 13+ messages in thread
From: David Wang @ 2025-08-30 10:06 UTC (permalink / raw)
To: Michał Pecio
Cc: WeitaoWang-oc, mathias.nyman, gregkh, linux-usb, regressions,
linux-kernel, surenb, kent.overstreet
At 2025-08-30 17:48:28, "Michał Pecio" <michal.pecio@gmail.com> wrote:
>On Sat, 30 Aug 2025 02:13:54 +0800, David Wang wrote:
>> Hi,
>>
>> I have been watching kernel memory usage for drivers for a while, via /proc/allocinfo.
>> After upgrade to 6.17-rc3, I notice memory usage behavior changes for usb drivers:
>>
>> Before rc3, after several suspend/resume cycles, usb devices's memory usage is very stable:
>>
>> 40960 5 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>> 320 10 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 31
>> 1920 15 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 31
>> 112 12 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 32
>> 1792 28 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 59
>>
>> But with rc3, the memory usage increase after each suspend/resume cycle:
>>
>> #1:
>> 49152 6 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>> 384 12 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 32
>> 2176 17 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 32
>> 128 14 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 34
>> 2048 32 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 61
>> #2:
>> 57344 7 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 13
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 3
>> 448 14 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 46
>> 2432 19 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 43
>> 144 16 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 44
>> 2304 36 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 82
>> #3:
>> 65536 8 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 17
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 4
>> 512 16 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 60
>> 2688 21 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 54
>> 160 18 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 54
>> 2560 40 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 103
>>
>> The memory increasing pattern keeps going on for each suspend/resume afterwards, I am not
>> sure whether those memory would be released sometime later.
>>
>> And in kernel log, two lines of error always showed up after suspend/resume:
>>
>> [ 295.613598] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI ring segments busy
>> [ 295.613605] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI input/output contexts busy
>
>Hi,
>
>Good work, looks like suspend/resume is a little understested corner
>of this driver.
>
>Did you check whether the same leak occurs if you simply disconnect
>a device or if it's truly unique to suspend?
>
>> And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
>> "usb: xhci: Fix slot_id resource race conflict"):
>
>I see a trivial bug which everyone (myself included tbh) missed before.
>Does this help?
>
>diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>index f11e13f9cdb4..f294032c2ad7 100644
>--- a/drivers/usb/host/xhci-mem.c
>+++ b/drivers/usb/host/xhci-mem.c
>@@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
> */
> static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
> {
>- struct xhci_virt_device *vdev;
>+ struct xhci_virt_device *vdev, *tmp_vdev;
> struct list_head *tt_list_head;
> struct xhci_tt_bw_info *tt_info, *next;
> int i;
>@@ -952,8 +952,8 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
> if (tt_info->slot_id == slot_id) {
> /* are any devices using this tt_info? */
> for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>- vdev = xhci->devs[i];
>- if (vdev && (vdev->tt_info == tt_info))
>+ tmp_vdev = xhci->devs[i];
>+ if (tmp_vdev && (tmp_vdev->tt_info == tt_info))
> xhci_free_virt_devices_depth_first(
> xhci, i);
> }
I notice this too, just a few minutes ago, I just started building a patch with this *silly* bug fixed.
(The device pointer is wrong, that is most likely the culprit: the virtual device is not properly freed and hence memory leak.)
Will update later.
David
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
2025-08-30 9:48 ` Michał Pecio
2025-08-30 10:06 ` David Wang
@ 2025-08-30 10:17 ` David Wang
2025-09-01 10:14 ` Mathias Nyman
1 sibling, 1 reply; 13+ messages in thread
From: David Wang @ 2025-08-30 10:17 UTC (permalink / raw)
To: Michał Pecio
Cc: WeitaoWang-oc, mathias.nyman, gregkh, linux-usb, regressions,
linux-kernel, surenb, kent.overstreet
At 2025-08-30 17:48:28, "Michał Pecio" <michal.pecio@gmail.com> wrote:
>On Sat, 30 Aug 2025 02:13:54 +0800, David Wang wrote:
>> Hi,
>>
>> I have been watching kernel memory usage for drivers for a while, via /proc/allocinfo.
>> After upgrade to 6.17-rc3, I notice memory usage behavior changes for usb drivers:
>>
>> Before rc3, after several suspend/resume cycles, usb devices's memory usage is very stable:
>>
>> 40960 5 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>> 320 10 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 31
>> 1920 15 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 31
>> 112 12 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 32
>> 1792 28 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 59
>>
>> But with rc3, the memory usage increase after each suspend/resume cycle:
>>
>> #1:
>> 49152 6 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>> 384 12 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 32
>> 2176 17 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 32
>> 128 14 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 34
>> 2048 32 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 61
>> #2:
>> 57344 7 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 13
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 3
>> 448 14 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 46
>> 2432 19 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 43
>> 144 16 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 44
>> 2304 36 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 82
>> #3:
>> 65536 8 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 17
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 4
>> 512 16 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 60
>> 2688 21 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 54
>> 160 18 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 54
>> 2560 40 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 103
>>
>> The memory increasing pattern keeps going on for each suspend/resume afterwards, I am not
>> sure whether those memory would be released sometime later.
>>
>> And in kernel log, two lines of error always showed up after suspend/resume:
>>
>> [ 295.613598] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI ring segments busy
>> [ 295.613605] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI input/output contexts busy
>
>Hi,
>
>Good work, looks like suspend/resume is a little understested corner
>of this driver.
>
>Did you check whether the same leak occurs if you simply disconnect
>a device or if it's truly unique to suspend?
>
>> And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
>> "usb: xhci: Fix slot_id resource race conflict"):
>
>I see a trivial bug which everyone (myself included tbh) missed before.
>Does this help?
>
>diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>index f11e13f9cdb4..f294032c2ad7 100644
>--- a/drivers/usb/host/xhci-mem.c
>+++ b/drivers/usb/host/xhci-mem.c
>@@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
> */
> static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
> {
>- struct xhci_virt_device *vdev;
>+ struct xhci_virt_device *vdev, *tmp_vdev;
> struct list_head *tt_list_head;
> struct xhci_tt_bw_info *tt_info, *next;
> int i;
>@@ -952,8 +952,8 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
> if (tt_info->slot_id == slot_id) {
> /* are any devices using this tt_info? */
> for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>- vdev = xhci->devs[i];
>- if (vdev && (vdev->tt_info == tt_info))
>+ tmp_vdev = xhci->devs[i];
>+ if (tmp_vdev && (tmp_vdev->tt_info == tt_info))
> xhci_free_virt_devices_depth_first(
> xhci, i);
I confirmed this *silly* code is the root cause of this memory leak.
And I would suggest simpler code changes (which is what I was testing):
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 81eaad87a3d9..c4a6544aa107 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -962,7 +962,7 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
out:
/* we are now at a leaf device */
xhci_debugfs_remove_slot(xhci, slot_id);
- xhci_free_virt_device(xhci, vdev, slot_id);
+ xhci_free_virt_device(xhci, xhci->devs[slot_id], slot_id);
}
int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id,
Thanks
David
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
2025-08-30 10:17 ` David Wang
@ 2025-09-01 10:14 ` Mathias Nyman
2025-09-01 11:17 ` David Wang
2025-09-02 7:30 ` [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first() Michal Pecio
0 siblings, 2 replies; 13+ messages in thread
From: Mathias Nyman @ 2025-09-01 10:14 UTC (permalink / raw)
To: David Wang, Michał Pecio
Cc: WeitaoWang-oc, gregkh, linux-usb, regressions, linux-kernel,
surenb, kent.overstreet
On 30.8.2025 13.17, David Wang wrote:
>
> At 2025-08-30 17:48:28, "Michał Pecio" <michal.pecio@gmail.com> wrote:
>>
>> Hi,
>>
>> Good work, looks like suspend/resume is a little understested corner
>> of this driver.
>>
>> Did you check whether the same leak occurs if you simply disconnect
>> a device or if it's truly unique to suspend?
>>
>>> And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
>>> "usb: xhci: Fix slot_id resource race conflict"):
>>
>> I see a trivial bug which everyone (myself included tbh) missed before.
>> Does this help?
>>
>> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>> index f11e13f9cdb4..f294032c2ad7 100644
>> --- a/drivers/usb/host/xhci-mem.c
>> +++ b/drivers/usb/host/xhci-mem.c
>> @@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
>> */
>> static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
>> {
>> - struct xhci_virt_device *vdev;
>> + struct xhci_virt_device *vdev, *tmp_vdev;
>> struct list_head *tt_list_head;
>> struct xhci_tt_bw_info *tt_info, *next;
>> int i;
>> @@ -952,8 +952,8 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
>> if (tt_info->slot_id == slot_id) {
>> /* are any devices using this tt_info? */
>> for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>> - vdev = xhci->devs[i];
>> - if (vdev && (vdev->tt_info == tt_info))
>> + tmp_vdev = xhci->devs[i];
>> + if (tmp_vdev && (tmp_vdev->tt_info == tt_info))
>> xhci_free_virt_devices_depth_first(
>> xhci, i);
>
> I confirmed this *silly* code is the root cause of this memory leak.
> And I would suggest simpler code changes (which is what I was testing):
>
>
> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
> index 81eaad87a3d9..c4a6544aa107 100644
> --- a/drivers/usb/host/xhci-mem.c
> +++ b/drivers/usb/host/xhci-mem.c
> @@ -962,7 +962,7 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
> out:
> /* we are now at a leaf device */
> xhci_debugfs_remove_slot(xhci, slot_id);
> - xhci_free_virt_device(xhci, vdev, slot_id);
> + xhci_free_virt_device(xhci, xhci->devs[slot_id], slot_id);
> }
>
> int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id,
>
Thanks to both for catching this
I can quickly turn this into a proper patch unless one of you would like to submit one?
Thanks
Mathias
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle.
2025-09-01 10:14 ` Mathias Nyman
@ 2025-09-01 11:17 ` David Wang
2025-09-02 7:30 ` [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first() Michal Pecio
1 sibling, 0 replies; 13+ messages in thread
From: David Wang @ 2025-09-01 11:17 UTC (permalink / raw)
To: Mathias Nyman
Cc: Michał Pecio, WeitaoWang-oc, gregkh, linux-usb, regressions,
linux-kernel, surenb, kent.overstreet
At 2025-09-01 18:14:32, "Mathias Nyman" <mathias.nyman@linux.intel.com> wrote:
>On 30.8.2025 13.17, David Wang wrote:
>>
>> At 2025-08-30 17:48:28, "Michał Pecio" <michal.pecio@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Good work, looks like suspend/resume is a little understested corner
>>> of this driver.
>>>
>>> Did you check whether the same leak occurs if you simply disconnect
>>> a device or if it's truly unique to suspend?
>>>
>>>> And bisect narrow down to commit 2eb03376151bb8585caa23ed2673583107bb5193(
>>>> "usb: xhci: Fix slot_id resource race conflict"):
>>>
>>> I see a trivial bug which everyone (myself included tbh) missed before.
>>> Does this help?
>>>
>>> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>>> index f11e13f9cdb4..f294032c2ad7 100644
>>> --- a/drivers/usb/host/xhci-mem.c
>>> +++ b/drivers/usb/host/xhci-mem.c
>>> @@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
>>> */
>>> static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
>>> {
>>> - struct xhci_virt_device *vdev;
>>> + struct xhci_virt_device *vdev, *tmp_vdev;
>>> struct list_head *tt_list_head;
>>> struct xhci_tt_bw_info *tt_info, *next;
>>> int i;
>>> @@ -952,8 +952,8 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
>>> if (tt_info->slot_id == slot_id) {
>>> /* are any devices using this tt_info? */
>>> for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>>> - vdev = xhci->devs[i];
>>> - if (vdev && (vdev->tt_info == tt_info))
>>> + tmp_vdev = xhci->devs[i];
>>> + if (tmp_vdev && (tmp_vdev->tt_info == tt_info))
>>> xhci_free_virt_devices_depth_first(
>>> xhci, i);
>>
>> I confirmed this *silly* code is the root cause of this memory leak.
>> And I would suggest simpler code changes (which is what I was testing):
>>
>>
>> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>> index 81eaad87a3d9..c4a6544aa107 100644
>> --- a/drivers/usb/host/xhci-mem.c
>> +++ b/drivers/usb/host/xhci-mem.c
>> @@ -962,7 +962,7 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
>> out:
>> /* we are now at a leaf device */
>> xhci_debugfs_remove_slot(xhci, slot_id);
>> - xhci_free_virt_device(xhci, vdev, slot_id);
>> + xhci_free_virt_device(xhci, xhci->devs[slot_id], slot_id);
>> }
>>
>> int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id,
>>
>
>Thanks to both for catching this
>
>I can quickly turn this into a proper patch unless one of you would like to submit one?
Oh, I was not planning to submit a patch at all, since Michał Pecio got the credit of publishing the first patch.
Thanks
David
>
>Thanks
>Mathias
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
2025-09-01 10:14 ` Mathias Nyman
2025-09-01 11:17 ` David Wang
@ 2025-09-02 7:30 ` Michal Pecio
2025-09-02 8:30 ` David Wang
1 sibling, 1 reply; 13+ messages in thread
From: Michal Pecio @ 2025-09-02 7:30 UTC (permalink / raw)
To: Mathias Nyman
Cc: David Wang, WeitaoWang-oc, gregkh, linux-usb, regressions,
linux-kernel, surenb, kent.overstreet
Reusing 'vdev' for iteration caused a recent commit to malfunction
uexpectedly, resulting in a reported memory leak and potential UAF
if devices are freed in bad order. Using a second variable solves
this problem, and maybe others later.
HCS_MAX_SLOTS(xhci->hcs_params1) is the highest possible slot_id,
so change the iteration range to include it. Currently this doesn't
seem to cause problems because the only caller begins with freeing
the topmost slot_id, but it breaks documented functionality.
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/linux-usb/20250829181354.4450-1-00107082@163.com/
Fixes: 2eb03376151b ("usb: xhci: Fix slot_id resource race conflict")
Fixes: ee8665e28e8d ("xhci: free xhci virtual devices with leaf nodes first")
Cc: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-mem.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index eed5926b200e..db7dc70c37e5 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
*/
static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
{
- struct xhci_virt_device *vdev;
+ struct xhci_virt_device *vdev, *vdev_i;
struct list_head *tt_list_head;
struct xhci_tt_bw_info *tt_info, *next;
int i;
@@ -951,9 +951,9 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
/* is this a hub device that added a tt_info to the tts list */
if (tt_info->slot_id == slot_id) {
/* are any devices using this tt_info? */
- for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
- vdev = xhci->devs[i];
- if (vdev && (vdev->tt_info == tt_info))
+ for (i = 1; i <= HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
+ vdev_i = xhci->devs[i];
+ if (vdev_i && (vdev_i->tt_info == tt_info))
xhci_free_virt_devices_depth_first(
xhci, i);
}
--
2.48.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re:[PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
2025-09-02 7:30 ` [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first() Michal Pecio
@ 2025-09-02 8:30 ` David Wang
2025-09-02 8:46 ` [PATCH] " Michał Pecio
0 siblings, 1 reply; 13+ messages in thread
From: David Wang @ 2025-09-02 8:30 UTC (permalink / raw)
To: Michal Pecio
Cc: Mathias Nyman, WeitaoWang-oc, gregkh, linux-usb, regressions,
linux-kernel, surenb, kent.overstreet
At 2025-09-02 15:30:17, "Michal Pecio" <michal.pecio@gmail.com> wrote:
>Reusing 'vdev' for iteration caused a recent commit to malfunction
>uexpectedly, resulting in a reported memory leak and potential UAF
>if devices are freed in bad order. Using a second variable solves
>this problem, and maybe others later.
>
>HCS_MAX_SLOTS(xhci->hcs_params1) is the highest possible slot_id,
>so change the iteration range to include it. Currently this doesn't
>seem to cause problems because the only caller begins with freeing
>the topmost slot_id, but it breaks documented functionality.
>
>Reported-by: David Wang <00107082@163.com>
>Closes: https://lore.kernel.org/linux-usb/20250829181354.4450-1-00107082@163.com/
>Fixes: 2eb03376151b ("usb: xhci: Fix slot_id resource race conflict")
>Fixes: ee8665e28e8d ("xhci: free xhci virtual devices with leaf nodes first")
>Cc: stable@vger.kernel.org
>Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
>---
> drivers/usb/host/xhci-mem.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>index eed5926b200e..db7dc70c37e5 100644
>--- a/drivers/usb/host/xhci-mem.c
>+++ b/drivers/usb/host/xhci-mem.c
>@@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
> */
> static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
> {
>- struct xhci_virt_device *vdev;
>+ struct xhci_virt_device *vdev, *vdev_i;
> struct list_head *tt_list_head;
> struct xhci_tt_bw_info *tt_info, *next;
> int i;
>@@ -951,9 +951,9 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
> /* is this a hub device that added a tt_info to the tts list */
> if (tt_info->slot_id == slot_id) {
> /* are any devices using this tt_info? */
>- for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>- vdev = xhci->devs[i];
>- if (vdev && (vdev->tt_info == tt_info))
>+ for (i = 1; i <= HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>+ vdev_i = xhci->devs[i];
>+ if (vdev_i && (vdev_i->tt_info == tt_info))
> xhci_free_virt_devices_depth_first(
> xhci, i);
> }
>--
>2.48.1
Tested-by: David Wang <00107082@163.com>
About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?
Thanks
David
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
2025-09-02 8:30 ` David Wang
@ 2025-09-02 8:46 ` Michał Pecio
2025-09-02 9:07 ` Michał Pecio
0 siblings, 1 reply; 13+ messages in thread
From: Michał Pecio @ 2025-09-02 8:46 UTC (permalink / raw)
To: David Wang
Cc: Mathias Nyman, WeitaoWang-oc, gregkh, linux-usb, regressions,
linux-kernel, surenb, kent.overstreet
On Tue, 2 Sep 2025 16:30:48 +0800 (CST), David Wang wrote:
> About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?
This too, you would need to fiddle with devices (or connect enough
of them) to reach Slot ID 255 (probably the highest on most systems),
depending on the xHCI controller and its ID allocation policy.
But also as explained, this bug doesn't make things go boom just yet.
Except if combined with your bug in an obscure edge case:
1. A high speed hub has slot ID HCS_MAX_SLOTS-1 and some TT children.
2. Another high speed hub has slot ID HCS_MAX_SLOTS.
3. We start with freeing the second hub.
4. The loop is entered and leaves vdev pointing at the first hub.
5. The first hub is freed instead of the second one.
6. Then its children are freed and UAF its tt_info.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
2025-09-02 8:46 ` [PATCH] " Michał Pecio
@ 2025-09-02 9:07 ` Michał Pecio
2025-09-02 10:13 ` Mathias Nyman
0 siblings, 1 reply; 13+ messages in thread
From: Michał Pecio @ 2025-09-02 9:07 UTC (permalink / raw)
To: David Wang
Cc: Mathias Nyman, WeitaoWang-oc, gregkh, linux-usb, regressions,
linux-kernel, surenb, kent.overstreet
On Tue, 2 Sep 2025 10:46:30 +0200, Michał Pecio wrote:
> On Tue, 2 Sep 2025 16:30:48 +0800 (CST), David Wang wrote:
> > About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?
>
> This too, you would need to fiddle with devices (or connect enough
> of them) to reach Slot ID 255 (probably the highest on most systems),
> depending on the xHCI controller and its ID allocation policy.
This made me wonder what those policies are. I'm too lazy for thorough
testing, but I plugged and unplugged the same device a few times.
Most HCs kept assigning ID 1, so they likely always pick the lowest.
My AMD chipset, two ASMedia USB 3.1 controllers and a Fresco FL1100
kept assigning sequentially increasing IDs, so I suppose I could pump
it up near the top, connect two high speed hubs and trigger this bug.
> But also as explained, this bug doesn't make things go boom just yet.
>
> Except if combined with your bug in an obscure edge case:
>
> 1. A high speed hub has slot ID HCS_MAX_SLOTS-1 and some TT children.
> 2. Another high speed hub has slot ID HCS_MAX_SLOTS.
> 3. We start with freeing the second hub.
> 4. The loop is entered and leaves vdev pointing at the first hub.
> 5. The first hub is freed instead of the second one.
> 6. Then its children are freed and UAF its tt_info.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
2025-09-02 9:07 ` Michał Pecio
@ 2025-09-02 10:13 ` Mathias Nyman
2025-09-02 10:55 ` Michał Pecio
0 siblings, 1 reply; 13+ messages in thread
From: Mathias Nyman @ 2025-09-02 10:13 UTC (permalink / raw)
To: Michał Pecio, David Wang
Cc: WeitaoWang-oc, gregkh, linux-usb, regressions, linux-kernel,
surenb, kent.overstreet
On 2.9.2025 12.07, Michał Pecio wrote:
> On Tue, 2 Sep 2025 10:46:30 +0200, Michał Pecio wrote:
>> On Tue, 2 Sep 2025 16:30:48 +0800 (CST), David Wang wrote:
>>> About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?
>>
>> This too, you would need to fiddle with devices (or connect enough
>> of them) to reach Slot ID 255 (probably the highest on most systems),
>> depending on the xHCI controller and its ID allocation policy.
>
> This made me wonder what those policies are. I'm too lazy for thorough
> testing, but I plugged and unplugged the same device a few times.
>
> Most HCs kept assigning ID 1, so they likely always pick the lowest.
>
> My AMD chipset, two ASMedia USB 3.1 controllers and a Fresco FL1100
> kept assigning sequentially increasing IDs, so I suppose I could pump
> it up near the top, connect two high speed hubs and trigger this bug.
>
>> But also as explained, this bug doesn't make things go boom just yet.
>>
>> Except if combined with your bug in an obscure edge case:
>>
>> 1. A high speed hub has slot ID HCS_MAX_SLOTS-1 and some TT children.
>> 2. Another high speed hub has slot ID HCS_MAX_SLOTS.
>> 3. We start with freeing the second hub.
>> 4. The loop is entered and leaves vdev pointing at the first hub.
>> 5. The first hub is freed instead of the second one.
>> 6. Then its children are freed and UAF its tt_info.
I'm not sure I follow the above.
I agree that changing the "<" to "<=" makes sense, but fortunately for us there shouldn't be any
issue with current implementation as xhci_free_virt_devices_depth_first() is called with highest possible
slot_id value first:
in xhci-memm.c:
for (i = HCS_MAX_SLOTS(xhci->hcs_params1); i > 0; i--)
xhci_free_virt_devices_depth_first(xhci, i);
if HCS_MAX_SLOTS slot_id is a hs-hub then all its children have slot_id < HCS_MAX_SLOTS,
and loop works fine.
If HCS_MXA_SLOT slot_id is a leaf node then it is freed first.
Thanks
Mathias
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
2025-09-02 10:13 ` Mathias Nyman
@ 2025-09-02 10:55 ` Michał Pecio
2025-09-02 12:58 ` Mathias Nyman
0 siblings, 1 reply; 13+ messages in thread
From: Michał Pecio @ 2025-09-02 10:55 UTC (permalink / raw)
To: Mathias Nyman
Cc: David Wang, WeitaoWang-oc, gregkh, linux-usb, regressions,
linux-kernel, surenb, kent.overstreet
On Tue, 2 Sep 2025 13:13:12 +0300, Mathias Nyman wrote:
> On 2.9.2025 12.07, Michał Pecio wrote:
> > On Tue, 2 Sep 2025 10:46:30 +0200, Michał Pecio wrote:
> >> On Tue, 2 Sep 2025 16:30:48 +0800 (CST), David Wang wrote:
> >>> About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?
> >>
> >> This too, you would need to fiddle with devices (or connect enough
> >> of them) to reach Slot ID 255 (probably the highest on most systems),
> >> depending on the xHCI controller and its ID allocation policy.
> >
> > This made me wonder what those policies are. I'm too lazy for thorough
> > testing, but I plugged and unplugged the same device a few times.
> >
> > Most HCs kept assigning ID 1, so they likely always pick the lowest.
> >
> > My AMD chipset, two ASMedia USB 3.1 controllers and a Fresco FL1100
> > kept assigning sequentially increasing IDs, so I suppose I could pump
> > it up near the top, connect two high speed hubs and trigger this bug.
> >
> >> But also as explained, this bug doesn't make things go boom just yet.
> >>
> >> Except if combined with your bug in an obscure edge case:
> >>
> >> 1. A high speed hub has slot ID HCS_MAX_SLOTS-1 and some TT children.
> >> 2. Another high speed hub has slot ID HCS_MAX_SLOTS.
> >> 3. We start with freeing the second hub.
> >> 4. The loop is entered and leaves vdev pointing at the first hub.
> >> 5. The first hub is freed instead of the second one.
> >> 6. Then its children are freed and UAF its tt_info.
>
> I'm not sure I follow the above.
>
> I agree that changing the "<" to "<=" makes sense, but fortunately for us there shouldn't be any
> issue with current implementation as xhci_free_virt_devices_depth_first() is called with highest possible
> slot_id value first:
>
> in xhci-memm.c:
> for (i = HCS_MAX_SLOTS(xhci->hcs_params1); i > 0; i--)
> xhci_free_virt_devices_depth_first(xhci, i);
>
> if HCS_MAX_SLOTS slot_id is a hs-hub then all its children have slot_id < HCS_MAX_SLOTS,
> and loop works fine.
The loop works fine, but it exists with vdev pointing at MAX_SLOTS-1
due to off by one and then this happens:
xhci_free_virt_device(xhci, vdev, slot_id);
which means:
xhci_free_virt_device(xhci, xhci->devs[MAX_SLOTS-1], MAX_SLOTS);
If MAX_SLOTS-1 is a high speed hub, it will be freed right now, without
freeing its children first.
And whatever this device is, it will be freed without nulling
xhci->devs[MAX_SLOTS-1], which might cause other UAF later (not sure).
I think it's possible, though I haven't tried actually triggering it.
The problem didn't exist before this recent patch.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first()
2025-09-02 10:55 ` Michał Pecio
@ 2025-09-02 12:58 ` Mathias Nyman
0 siblings, 0 replies; 13+ messages in thread
From: Mathias Nyman @ 2025-09-02 12:58 UTC (permalink / raw)
To: Michał Pecio
Cc: David Wang, WeitaoWang-oc, gregkh, linux-usb, regressions,
linux-kernel, surenb, kent.overstreet
On 2.9.2025 13.55, Michał Pecio wrote:
> On Tue, 2 Sep 2025 13:13:12 +0300, Mathias Nyman wrote:
>> On 2.9.2025 12.07, Michał Pecio wrote:
>>> On Tue, 2 Sep 2025 10:46:30 +0200, Michał Pecio wrote:
>>>> On Tue, 2 Sep 2025 16:30:48 +0800 (CST), David Wang wrote:
>>>>> About the change from "<" to "<=", I did not observe any difference on my system. Is it because my system does not use up all slots?
>>>>
>>>> This too, you would need to fiddle with devices (or connect enough
>>>> of them) to reach Slot ID 255 (probably the highest on most systems),
>>>> depending on the xHCI controller and its ID allocation policy.
>>>
>>> This made me wonder what those policies are. I'm too lazy for thorough
>>> testing, but I plugged and unplugged the same device a few times.
>>>
>>> Most HCs kept assigning ID 1, so they likely always pick the lowest.
>>>
>>> My AMD chipset, two ASMedia USB 3.1 controllers and a Fresco FL1100
>>> kept assigning sequentially increasing IDs, so I suppose I could pump
>>> it up near the top, connect two high speed hubs and trigger this bug.
>>>
>>>> But also as explained, this bug doesn't make things go boom just yet.
>>>>
>>>> Except if combined with your bug in an obscure edge case:
>>>>
>>>> 1. A high speed hub has slot ID HCS_MAX_SLOTS-1 and some TT children.
>>>> 2. Another high speed hub has slot ID HCS_MAX_SLOTS.
>>>> 3. We start with freeing the second hub.
>>>> 4. The loop is entered and leaves vdev pointing at the first hub.
>>>> 5. The first hub is freed instead of the second one.
>>>> 6. Then its children are freed and UAF its tt_info.
>>
>> I'm not sure I follow the above.
>>
>> I agree that changing the "<" to "<=" makes sense, but fortunately for us there shouldn't be any
>> issue with current implementation as xhci_free_virt_devices_depth_first() is called with highest possible
>> slot_id value first:
>>
>> in xhci-memm.c:
>> for (i = HCS_MAX_SLOTS(xhci->hcs_params1); i > 0; i--)
>> xhci_free_virt_devices_depth_first(xhci, i);
>>
>> if HCS_MAX_SLOTS slot_id is a hs-hub then all its children have slot_id < HCS_MAX_SLOTS,
>> and loop works fine.
>
> The loop works fine, but it exists with vdev pointing at MAX_SLOTS-1
> due to off by one and then this happens:
>
> xhci_free_virt_device(xhci, vdev, slot_id);
>
> which means:
>
> xhci_free_virt_device(xhci, xhci->devs[MAX_SLOTS-1], MAX_SLOTS);
>
> If MAX_SLOTS-1 is a high speed hub, it will be freed right now, without
> freeing its children first.
>
> And whatever this device is, it will be freed without nulling
> xhci->devs[MAX_SLOTS-1], which might cause other UAF later (not sure).
>
> I think it's possible, though I haven't tried actually triggering it.
> The problem didn't exist before this recent patch.
>
I see, yes with the slot_id regression the incorrect vdev will cause issues.
Sent the patch to fix it forward
Thanks
Mathias
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-09-02 12:58 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-29 18:13 [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after suspend/resume cycle David Wang
2025-08-30 9:48 ` Michał Pecio
2025-08-30 10:06 ` David Wang
2025-08-30 10:17 ` David Wang
2025-09-01 10:14 ` Mathias Nyman
2025-09-01 11:17 ` David Wang
2025-09-02 7:30 ` [PATCH] usb: xhci: Fix xhci_free_virt_devices_depth_first() Michal Pecio
2025-09-02 8:30 ` David Wang
2025-09-02 8:46 ` [PATCH] " Michał Pecio
2025-09-02 9:07 ` Michał Pecio
2025-09-02 10:13 ` Mathias Nyman
2025-09-02 10:55 ` Michał Pecio
2025-09-02 12:58 ` Mathias Nyman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).