linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1]  usb: xhci: Fix NULL pointer dereference on certain command aborts
@ 2024-12-03 19:51 Michal Pecio
  2024-12-03 19:52 ` [PATCH 1/1] " Michal Pecio
  2024-12-19 20:55 ` [PATCH v2] " Michal Pecio
  0 siblings, 2 replies; 5+ messages in thread
From: Michal Pecio @ 2024-12-03 19:51 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Greg Kroah-Hartman, linux-usb, linux-kernel

Hi,

This fixes an issue discovered by code review prompted by this crash:
https://bugzilla.kernel.org/show_bug.cgi?id=219532

It has not been confirmed by the reporter and doesn't agree with some
of his own analysis, but:

1. it is consistent with logs provided by the reporter
2. it looks like a real bug in its own right anyway
3. it is rare enough that it may not happen again soon


I tried to minimize this patch and only address the potential crash.
The code looks like it could use more cleanup, though. Any comments
regarding the following suggestions?

1. It looks like we don't really need to ring the doorbell if cmd_list
   and cur_cmd are empty. Even if commands were no-oped, they are still
   on the list. And if there is anything between enqueue and dequeue
   which is not a link TRB and not on the list, we will have no idea
   how to handle the resulting event. IOW, it's not supposed to happen
   and probably never happening, or we would hear about it.

2. The whole business about turning commands into no-ops doesn't seem
   to happen in mainline today. Only the pending command gets aborted
   and it is removed from the list before the no-op loop runs.

3. xhci->current_cmd seems redundant - is there ever a valid reason for
   it being anything other than xhci->cmd_list first element?

   I find it suspicious that handle_cmd_completion() complains if the
   completed command isn't the first on cmd_list, but not if it doesn't
   match current_cmd and that current_cmd is then left unchanged.

Regards,
Michal

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/1] usb: xhci: Fix NULL pointer dereference on certain command aborts
  2024-12-03 19:51 [PATCH 0/1] usb: xhci: Fix NULL pointer dereference on certain command aborts Michal Pecio
@ 2024-12-03 19:52 ` Michal Pecio
  2024-12-04 17:53   ` Michal Pecio
  2024-12-19 20:55 ` [PATCH v2] " Michal Pecio
  1 sibling, 1 reply; 5+ messages in thread
From: Michal Pecio @ 2024-12-03 19:52 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Greg Kroah-Hartman, linux-usb, linux-kernel

xhci_handle_stopped_cmd_ring() attempts to restart the command ring if
there are pending TRBs on it, which it detects by comparing the enqueue
and dequeue pointers for equality. It assumes that pending TRBs imply a
pending command, and blindly dereferences cur_cmd. This can be wrong.

If a command is queued to the final usable TRB of a ring segment, the
enqueue pointer is advanced to the subsequent link TRB and no further.
If the command is later aborted, when the abort completion is handled
the dequeue pointer is advanced to the first TRB of the next segment.

If no further commands are queued, the pointers stay this way and then
xhci_handle_stopped_cmd_ring() is called by xhci_abort_cmd_ring() with
NULL cur_cmd, which triggers cur_cmd dereference as described above.

Fix this by omitting timer setup if cur_cmd is NULL. Leave the rest
unchanged, including ringing the doorbell each time the ring pointers
aren't equal. An unnecessary doorbell ring should be harmless.

This is probably Bug 219532, but no confirmation has been received.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219532
Fixes: c311e391a7ef ("xhci: rework command timeout and cancellation,")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 4cf5363875c7..da26e317ab0c 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -422,7 +422,8 @@ static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
 	if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) &&
 	    !(xhci->xhc_state & XHCI_STATE_DYING)) {
 		xhci->current_cmd = cur_cmd;
-		xhci_mod_cmd_timer(xhci);
+		if (cur_cmd)
+			xhci_mod_cmd_timer(xhci);
 		xhci_ring_cmd_db(xhci);
 	}
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] usb: xhci: Fix NULL pointer dereference on certain command aborts
  2024-12-03 19:52 ` [PATCH 1/1] " Michal Pecio
@ 2024-12-04 17:53   ` Michal Pecio
  0 siblings, 0 replies; 5+ messages in thread
From: Michal Pecio @ 2024-12-04 17:53 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Greg Kroah-Hartman, linux-usb, linux-kernel

I confirmed that the bug is real and behaves exactly as expected, using
a USB microcontroller programmed to NAK the status stage of SET_ADDRESS
requests forever and to reconnect if the host gives up enumerating it.

Command timeout was reduced to 500ms to sooner reach the segment's end
and some relevant debug info was added, hopefully self-explanatory:

[  +0,378926] usb 10-1: new full-speed USB device number 109 using xhci_hcd
[  +0,501006] xhci_hcd 0000:03:00.0: cur_cmd 0000000000000000 enq ffff88814671bff0 deq ffff88814671b000
[  +0,000001] xhci_hcd 0000:03:00.0: Timeout while waiting for setup device command
[  +0,000005] xhci_hcd 0000:03:00.0: !!! avoiding dereferencing a NULL pointer !!!
[  +0,712001] xhci_hcd 0000:03:00.0: cur_cmd 0000000000000000 enq ffff88814671b010 deq ffff88814671b010
[  +0,000001] xhci_hcd 0000:03:00.0: Timeout while waiting for setup device command
[  +0,207981] usb 10-1: device not accepting address 109, error -62

The driver and host controller continue working normally after one hour
of testing and several avoided crashes.

The only thing I haven't tried is actually crashing the kernel, but
considering what's inside xhci_mod_cmd_timer() I think it's obvious
that this is exactly what would happen next without this patch.

Regards,
Michal

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2] usb: xhci: Fix NULL pointer dereference on certain command aborts
  2024-12-03 19:51 [PATCH 0/1] usb: xhci: Fix NULL pointer dereference on certain command aborts Michal Pecio
  2024-12-03 19:52 ` [PATCH 1/1] " Michal Pecio
@ 2024-12-19 20:55 ` Michal Pecio
  2024-12-20 12:47   ` Mathias Nyman
  1 sibling, 1 reply; 5+ messages in thread
From: Michal Pecio @ 2024-12-19 20:55 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Greg Kroah-Hartman, linux-usb, linux-kernel

If a command is queued to the final usable TRB of a ring segment, the
enqueue pointer is advanced to the subsequent link TRB and no further.
If the command is later aborted, when the abort completion is handled
the dequeue pointer is advanced to the first TRB of the next segment.

If no further commands are queued, xhci_handle_stopped_cmd_ring() sees
the ring pointers unequal and assumes that there is a pending command,
so it calls xhci_mod_cmd_timer() which crashes if cur_cmd was NULL.

Don't attempt timer setup if cur_cmd is NULL. The subsequent doorbell
ring likely is unnecessary too, but it's harmless. Leave it alone.

This is probably Bug 219532, but no confirmation has been received.

The issue has been independently reproduced and confirmed fixed using
a USB MCU programmed to NAK the Status stage of SET_ADDRESS forever.
Everything continued working normally after several prevented crashes.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219532
Fixes: c311e391a7ef ("xhci: rework command timeout and cancellation,")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---

This is practically a RESEND of a patch submitted earlier this month,
but with updated commit message.

 drivers/usb/host/xhci-ring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 09b05a62375e..dfe1a676d487 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -422,7 +422,8 @@ static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
 	if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) &&
 	    !(xhci->xhc_state & XHCI_STATE_DYING)) {
 		xhci->current_cmd = cur_cmd;
-		xhci_mod_cmd_timer(xhci);
+		if (cur_cmd)
+			xhci_mod_cmd_timer(xhci);
 		xhci_ring_cmd_db(xhci);
 	}
 }
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] usb: xhci: Fix NULL pointer dereference on certain command aborts
  2024-12-19 20:55 ` [PATCH v2] " Michal Pecio
@ 2024-12-20 12:47   ` Mathias Nyman
  0 siblings, 0 replies; 5+ messages in thread
From: Mathias Nyman @ 2024-12-20 12:47 UTC (permalink / raw)
  To: Michal Pecio, Mathias Nyman; +Cc: Greg Kroah-Hartman, linux-usb, linux-kernel

On 19.12.2024 22.55, Michal Pecio wrote:
> If a command is queued to the final usable TRB of a ring segment, the
> enqueue pointer is advanced to the subsequent link TRB and no further.
> If the command is later aborted, when the abort completion is handled
> the dequeue pointer is advanced to the first TRB of the next segment.
> 
> If no further commands are queued, xhci_handle_stopped_cmd_ring() sees
> the ring pointers unequal and assumes that there is a pending command,
> so it calls xhci_mod_cmd_timer() which crashes if cur_cmd was NULL.

Nice catch.

That enqueue-dequeue pointer comparison should probably be changed to
checking cmd_list directly. We do use list_empty() and list_is_singular()
in other places

But that would be a separate cleanup later.

> 
> Don't attempt timer setup if cur_cmd is NULL. The subsequent doorbell
> ring likely is unnecessary too, but it's harmless. Leave it alone.
> 
> This is probably Bug 219532, but no confirmation has been received.
> 
> The issue has been independently reproduced and confirmed fixed using
> a USB MCU programmed to NAK the Status stage of SET_ADDRESS forever.
> Everything continued working normally after several prevented crashes.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219532
> Fixes: c311e391a7ef ("xhci: rework command timeout and cancellation,")
> CC: stable@vger.kernel.org
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> ---

Adding to queue

Thanks
-Mathias


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-12-20 12:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-03 19:51 [PATCH 0/1] usb: xhci: Fix NULL pointer dereference on certain command aborts Michal Pecio
2024-12-03 19:52 ` [PATCH 1/1] " Michal Pecio
2024-12-04 17:53   ` Michal Pecio
2024-12-19 20:55 ` [PATCH v2] " Michal Pecio
2024-12-20 12:47   ` Mathias Nyman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).