[PATCH v3 0/3] xhci: Fix Stop Endpoint problems

linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3 0/3] xhci: Fix Stop Endpoint problems
@ 2024-10-31  9:41 Michal Pecio
  2024-10-31  9:43 ` [PATCH v3 1/3] usb: xhci: Limit Stop Endpoint retries Michal Pecio
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Michal Pecio @ 2024-10-31  9:41 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

This is a thorough rework of "Fix the NEC stop bug workaround" series.

The focus is on simple patches which fix one bug at a time, hopefully
for good, using localized changes touching only 1-2 related functions.

1/3 resolves the problem of infinite retries on NEC by limiting them
2/3 resolves broken cancellation on stream endpoints (purely a SW bug)
3/3 resolves almost all known cases of pointless retries, which is an
    optimization at this point, as it falls back on 1/1 when it fails

At this point zero issues are seen on NEC, but other HCs with known HW
bugs are yet to be tested fully. However, ASM3142 has just passed the
"ifconfig up/down" test, so there is that. ASM1042 not tried at all.

If this works out, the ASMedia chips could be enabled. Or maybe all
chips could be enabled. Technically, retrying Stop Endpoint doesn't
seem to violate the spec. But we know the chips already have one bug,
so who knows if they don't have other bugs...

Maybe some internal operation scheduled by the doorbell ring could
interacs badly with commands. It's clear that some HC vendor haven't
thought very hard about what happens in those cases.

Regards,
Michal

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3 1/3] usb: xhci: Limit Stop Endpoint retries
  2024-10-31  9:41 [PATCH v3 0/3] xhci: Fix Stop Endpoint problems Michal Pecio
@ 2024-10-31  9:43 ` Michal Pecio
  2024-10-31  9:44 ` [PATCH v3 2/3] usb: xhci: Fix TD invalidation under pending Set TR Dequeue Michal Pecio
  2024-10-31  9:45 ` [PATCH v3 3/3] usb: xhci: Avoid queuing redundant Stop Endpoint commands Michal Pecio
  2 siblings, 0 replies; 6+ messages in thread
From: Michal Pecio @ 2024-10-31  9:43 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

Some host controllers fail to atomically transition an endpoint to the
Running state on a doorbell ring and enter a hidden "Restarting" state,
which looks very much like Stopped, with the important difference that
it will spontaneously transition to Running anytime soon.

A Stop Endpoint command queued in the Restarting state typically fails
with Context State Error and the completion handler sees the Endpoint
Context State as either still Stopped or already Running. Even a case
of Halted was observed, when an error occurred right after the restart.

The Halted state is already recovered from by resetting the endpoint.
The Running state is handled by retrying Stop Endpoint.

The Stopped state was recognized as a problem on NEC controllers and
worked around also by retrying, because the endpoint soon restarts and
then stops for good. But there is a risk: the command may fail if the
endpoint is "stopped for good" already, and retries will fail forever.

The possibility of this was not realized at the time, but a number of
cases was discovered later and reproduced. Some proved difficult to
deal with, and it is outright impossible to predict if an endpoint may
fail to ever restart at all due to hardware bug. One bug of this kind
(albeit not on NEC) was discovered and is reproducible from userspace:

  while true; do ifconfig eth1 up; ifconfig eth1 down; done

An endless retries storm is quite nasty. Besides putting needless load
on the xHC and CPU, it causes URBs never to be given back, paralyzing
the device and connection/disconnection logic for the whole bus if the
device is unplugged. User processes waiting for URBs become unkillable
and xhci_hcd cannot be unbound from the HC or reloaded.

For peace of mind, impose a timeout on Stop Endpoint retries in this
case. If they don't succeed in 100ms, consider the endpoint stopped
permanently for some reason and just give back the unlinked URBs. This
failure case is rare already and work is under way to make it rarer.

Start this work today by also handling one simple case of race with
Reset Endpoint, because it costs practically nothing to implement.

Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 28 ++++++++++++++++++++++++----
 drivers/usb/host/xhci.c      |  2 ++
 drivers/usb/host/xhci.h      |  1 +
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index b6eb928e260f..98e12267bbf6 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -52,6 +52,7 @@
  *   endpoint rings; it generates events on the event ring for these.
  */

+#include <linux/jiffies.h>
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
 #include <linux/dma-mapping.h>
@@ -1151,16 +1152,35 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 			return;
 		case EP_STATE_STOPPED:
 			/*
-			 * NEC uPD720200 sometimes sets this state and fails with
-			 * Context Error while continuing to process TRBs.
-			 * Be conservative and trust EP_CTX_STATE on other chips.
+			 * Per xHCI 4.6.9, Stop Endpoint command on a Stopped
+			 * EP is a Context State Error, and EP stays Stopped.
+			 *
+			 * But maybe it failed on Halted, and somebody ran Reset
+			 * Endpoint later. EP state is now Stopped and EP_HALTED
+			 * still set because Reset EP handler will run after us.
+			 */
+			if (ep->ep_state & EP_HALTED)
+				break;
+			/*
+			 * On some HCs EP state remains Stopped for some tens of
+			 * us to a few ms or more after a doorbell ring, and any
+			 * new Stop Endpoint fails without aborting the restart.
+			 * This handler may run quickly enough to still see this
+			 * Stopped state, but it will soon change to Running.
+			 *
+			 * Assume this bug on unexpected Stop Endpoint failures.
+			 * Keep retrying until the EP starts and stops again, on
+			 * chips where this is known to help. Wait for 100ms.
 			 */
 			if (!(xhci->quirks & XHCI_NEC_HOST))
 				break;
+			if (time_is_before_jiffies(ep->stop_time + msecs_to_jiffies(100)))
+				break;
 			fallthrough;
 		case EP_STATE_RUNNING:
 			/* Race, HW handled stop ep cmd before ep was running */
-			xhci_dbg(xhci, "Stop ep completion ctx error, ep is running\n");
+			xhci_dbg(xhci, "Stop ep completion ctx error, ctx_state %d\n",
+					GET_EP_CTX_STATE(ep_ctx));

 			command = xhci_alloc_command(xhci, false, GFP_ATOMIC);
 			if (!command) {
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 899c0effb5d3..1ed4dce7eeb7 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -8,6 +8,7 @@
  * Some code borrowed from the Linux EHCI driver.
  */

+#include <linux/jiffies.h>
 #include <linux/pci.h>
 #include <linux/iommu.h>
 #include <linux/iopoll.h>
@@ -1777,6 +1778,7 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 			ret = -ENOMEM;
 			goto done;
 		}
+		ep->stop_time = jiffies;
 		ep->ep_state |= EP_STOP_CMD_PENDING;
 		xhci_queue_stop_endpoint(xhci, command, urb->dev->slot_id,
 					 ep_index, 0);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index f0fb696d5619..0bef6c8b51cf 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -690,6 +690,7 @@ struct xhci_virt_ep {
 	/* Bandwidth checking storage */
 	struct xhci_bw_info	bw_info;
 	struct list_head	bw_endpoint_list;
+	unsigned long		stop_time;
 	/* Isoch Frame ID checking storage */
 	int			next_frame_id;
 	/* Use new Isoch TRB layout needed for extended TBC support */
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 2/3] usb: xhci: Fix TD invalidation under pending Set TR Dequeue
  2024-10-31  9:41 [PATCH v3 0/3] xhci: Fix Stop Endpoint problems Michal Pecio
  2024-10-31  9:43 ` [PATCH v3 1/3] usb: xhci: Limit Stop Endpoint retries Michal Pecio
@ 2024-10-31  9:44 ` Michal Pecio
  2024-10-31 13:57   ` Mathias Nyman
  2024-10-31  9:45 ` [PATCH v3 3/3] usb: xhci: Avoid queuing redundant Stop Endpoint commands Michal Pecio
  2 siblings, 1 reply; 6+ messages in thread
From: Michal Pecio @ 2024-10-31  9:44 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

xhci_invalidate_cancelled_tds() may work incorrectly if the hardware
is modifying some Endpoint or Stream Context by executing a Set TR
Dequeue command at the same time. And even if it worked, it couldn't
queue another Set TR Dequeue if required, failing to clear xHC cache.

Yet this function is called by completion handlers of commands like Stop
Endpoint and Reset Endpoint without any testing for SET_DEQ_PENDING. The
problem remained undetected for long time because the UAS driver rarely
cancels URBs, but it does cancel them sometimes, e.g. on I/O errors.

Make it legal to call xhci_invalidate_cancelled_tds() under a pending
Set TR Dequeue. Bail out of the function when SET_DEQ_PENDING is set,
then make the completion handler call the function again and also call
xhci_giveback_invalidated_tds(), which typically must be called next.

Clean up some code which duplicates functionality of these functions.

Fixes: 5ceac4402f5d ("xhci: Handle TD clearing for multiple streams case")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 45 ++++++++++++------------------------
 1 file changed, 15 insertions(+), 30 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 98e12267bbf6..52eb3ee1d7bf 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -959,7 +959,7 @@ static int xhci_handle_halted_endpoint(struct xhci_hcd *xhci,
  * We're also in the event handler, so we can't get re-interrupted if another
  * Stop Endpoint command completes.
  *
- * only call this when ring is not in a running state
+ * only call this on a ring in Stopped state (we may need to queue Set TR Deq)
  */
 
 static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
@@ -973,6 +973,13 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
 	unsigned int		slot_id = ep->vdev->slot_id;
 	int			err;
 
+	/*
+	 * This is not going to work if the hardware is changing its dequeue
+	 * pointers as we look at them. Completion handler will call us later.
+	 */
+	if (ep->ep_state & SET_DEQ_PENDING)
+		return 0;
+
 	xhci = ep->xhci;
 
 	list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list, cancelled_td_list) {
@@ -1001,7 +1008,9 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
 		if (td->cancel_status == TD_HALTED || trb_in_td(xhci, td, hw_deq, false)) {
 			switch (td->cancel_status) {
 			case TD_CLEARED: /* TD is already no-op */
-			case TD_CLEARING_CACHE: /* set TR deq command already queued */
+				break;
+			case TD_CLEARING_CACHE: /* set TR deq command completed */
+				td->cancel_status = TD_CLEARED;
 				break;
 			case TD_DIRTY: /* TD is cached, clear it */
 			case TD_HALTED:
@@ -1358,8 +1367,6 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
 	struct xhci_virt_ep *ep;
 	struct xhci_ep_ctx *ep_ctx;
 	struct xhci_slot_ctx *slot_ctx;
-	struct xhci_td *td, *tmp_td;
-	bool deferred = false;
 
 	ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3]));
 	stream_id = TRB_TO_STREAM_ID(le32_to_cpu(trb->generic.field[2]));
@@ -1451,37 +1458,15 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
 				  ep->queued_deq_seg, ep->queued_deq_ptr);
 		}
 	}
-	/* HW cached TDs cleared from cache, give them back */
-	list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list,
-				 cancelled_td_list) {
-		ep_ring = xhci_urb_to_transfer_ring(ep->xhci, td->urb);
-		if (td->cancel_status == TD_CLEARING_CACHE) {
-			td->cancel_status = TD_CLEARED;
-			xhci_dbg(ep->xhci, "%s: Giveback cancelled URB %p TD\n",
-				 __func__, td->urb);
-			xhci_td_cleanup(ep->xhci, td, ep_ring, td->status);
-		} else if (td->cancel_status == TD_CLEARING_CACHE_DEFERRED) {
-			deferred = true;
-		} else {
-			xhci_dbg(ep->xhci, "%s: Keep cancelled URB %p TD as cancel_status is %d\n",
-				 __func__, td->urb, td->cancel_status);
-		}
-	}
 cleanup:
 	ep->ep_state &= ~SET_DEQ_PENDING;
 	ep->queued_deq_seg = NULL;
 	ep->queued_deq_ptr = NULL;
 
-	if (deferred) {
-		/* We have more streams to clear */
-		xhci_dbg(ep->xhci, "%s: Pending TDs to clear, continuing with invalidation\n",
-			 __func__);
-		xhci_invalidate_cancelled_tds(ep);
-	} else {
-		/* Restart any rings with pending URBs */
-		xhci_dbg(ep->xhci, "%s: All TDs cleared, ring doorbell\n", __func__);
-		ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
-	}
+	/* Continue cancellations or restart the ring if done */
+	xhci_invalidate_cancelled_tds(ep);
+	ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
+	xhci_giveback_invalidated_tds(ep);
 }
 
 static void xhci_handle_cmd_reset_ep(struct xhci_hcd *xhci, int slot_id,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 3/3] usb: xhci: Avoid queuing redundant Stop Endpoint commands
  2024-10-31  9:41 [PATCH v3 0/3] xhci: Fix Stop Endpoint problems Michal Pecio
  2024-10-31  9:43 ` [PATCH v3 1/3] usb: xhci: Limit Stop Endpoint retries Michal Pecio
  2024-10-31  9:44 ` [PATCH v3 2/3] usb: xhci: Fix TD invalidation under pending Set TR Dequeue Michal Pecio
@ 2024-10-31  9:45 ` Michal Pecio
  2 siblings, 0 replies; 6+ messages in thread
From: Michal Pecio @ 2024-10-31  9:45 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

Stop Endpoint command on an already stopped endpoint fails, which may be
misinterpreted as a hardware bug by the completion handler. This results
in an unnecessary delay with repeated retries of the command.

Avoid queuing this command when our endpoint state indicates that it is
pointless: when other endpoint management commands are pending and will
process cancelled TDs on completion, or when the endpoint is waiting for
TT buffer to be cleared.

This should reduce unnecessary retries to almost zero. Known exceptions
include hard endpoint reset not followed by a ring restart and HW faults
causing the endpoint never to restart at all.

The endpoint reset case is likely a bug, so dealing with it is left for
later, when the matter is analyzed and resolved satisfactorily.

Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 13 +++++++++++++
 drivers/usb/host/xhci.c      | 24 ++++++++++++++++++++----
 drivers/usb/host/xhci.h      |  1 +
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 52eb3ee1d7bf..7726168b72ab 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1071,6 +1071,19 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
 	return 0;
 }
 
+/*
+ * Erase queued TDs from transfer ring(s) and give back those the xHC didn't
+ * stop on. If necessary, queue commands to move the xHC off cancelled TDs it
+ * stopped on. Those will be given back later when the commands complete.
+ *
+ * Call under xhci->lock on a stopped endpoint.
+ */
+void xhci_process_cancelled_tds(struct xhci_virt_ep *ep)
+{
+	xhci_invalidate_cancelled_tds(ep);
+	xhci_giveback_invalidated_tds(ep);
+}
+
 /*
  * Returns the TD the endpoint ring halted on.
  * Only call for non-running rings without streams.
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 1ed4dce7eeb7..e75a6bf4e8d5 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1680,6 +1680,7 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 	unsigned long flags;
 	int ret, i;
 	u32 temp;
+	bool cancelled = false;
 	struct xhci_hcd *xhci;
 	struct urb_priv	*urb_priv;
 	struct xhci_td *td;
@@ -1766,13 +1767,28 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 			td->cancel_status = TD_DIRTY;
 			list_add_tail(&td->cancelled_td_list,
 				      &ep->cancelled_td_list);
+			cancelled = true;
 		}
 	}
 
-	/* Queue a stop endpoint command, but only if this is
-	 * the first cancellation to be handled.
-	 */
-	if (!(ep->ep_state & EP_STOP_CMD_PENDING)) {
+	if (!cancelled)
+		goto done;
+
+	/* These completion handlers will sort out cancelled TDs for us */
+	if (ep->ep_state & (EP_STOP_CMD_PENDING | EP_HALTED | SET_DEQ_PENDING)) {
+		xhci_dbg(xhci, "Not queuing Stop Endpoint on slot %d ep %d in state 0x%x\n",
+				urb->dev->slot_id, ep_index, ep->ep_state);
+		goto done;
+	}
+
+	/* In this case no commands are pending but the endpoint is stopped */
+	if (ep->ep_state & EP_CLEARING_TT) {
+		/* and cancelled TDs can be given back right away */
+		xhci_dbg(xhci, "Invalidating TDs instantly on slot %d ep %d in state 0x%x\n",
+				urb->dev->slot_id, ep_index, ep->ep_state);
+		xhci_process_cancelled_tds(ep);
+	} else {
+		/* Otherwise, queue a new Stop Endpoint command */
 		command = xhci_alloc_command(xhci, false, GFP_ATOMIC);
 		if (!command) {
 			ret = -ENOMEM;
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 0bef6c8b51cf..33e57f408e86 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1914,6 +1914,7 @@ void xhci_ring_doorbell_for_active_rings(struct xhci_hcd *xhci,
 void xhci_cleanup_command_queue(struct xhci_hcd *xhci);
 void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring);
 unsigned int count_trbs(u64 addr, u64 len);
+void xhci_process_cancelled_tds(struct xhci_virt_ep *ep);
 
 /* xHCI roothub code */
 void xhci_set_link_state(struct xhci_hcd *xhci, struct xhci_port *port,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 2/3] usb: xhci: Fix TD invalidation under pending Set TR Dequeue
  2024-10-31  9:44 ` [PATCH v3 2/3] usb: xhci: Fix TD invalidation under pending Set TR Dequeue Michal Pecio
@ 2024-10-31 13:57   ` Mathias Nyman
  2024-11-04  9:43     ` Michał Pecio
  0 siblings, 1 reply; 6+ messages in thread
From: Mathias Nyman @ 2024-10-31 13:57 UTC (permalink / raw)
  To: Michal Pecio; +Cc: linux-usb

On 31.10.2024 11.44, Michal Pecio wrote:
> xhci_invalidate_cancelled_tds() may work incorrectly if the hardware
> is modifying some Endpoint or Stream Context by executing a Set TR
> Dequeue command at the same time. And even if it worked, it couldn't
> queue another Set TR Dequeue if required, failing to clear xHC cache.
> 
> Yet this function is called by completion handlers of commands like Stop
> Endpoint and Reset Endpoint without any testing for SET_DEQ_PENDING. The
> problem remained undetected for long time because the UAS driver rarely
> cancels URBs, but it does cancel them sometimes, e.g. on I/O errors.
> 
> Make it legal to call xhci_invalidate_cancelled_tds() under a pending
> Set TR Dequeue. Bail out of the function when SET_DEQ_PENDING is set,
> then make the completion handler call the function again and also call
> xhci_giveback_invalidated_tds(), which typically must be called next.
> 
> Clean up some code which duplicates functionality of these functions.
> 
> Fixes: 5ceac4402f5d ("xhci: Handle TD clearing for multiple streams case")
> CC: stable@vger.kernel.org
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> ---
>   drivers/usb/host/xhci-ring.c | 45 ++++++++++++------------------------
>   1 file changed, 15 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 98e12267bbf6..52eb3ee1d7bf 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -959,7 +959,7 @@ static int xhci_handle_halted_endpoint(struct xhci_hcd *xhci,
>    * We're also in the event handler, so we can't get re-interrupted if another
>    * Stop Endpoint command completes.
>    *
> - * only call this when ring is not in a running state
> + * only call this on a ring in Stopped state (we may need to queue Set TR Deq)
>    */
>   
>   static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
> @@ -973,6 +973,13 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
>   	unsigned int		slot_id = ep->vdev->slot_id;
>   	int			err;
>   
> +	/*
> +	 * This is not going to work if the hardware is changing its dequeue
> +	 * pointers as we look at them. Completion handler will call us later.
> +	 */
> +	if (ep->ep_state & SET_DEQ_PENDING)
> +		return 0;
> +
>   	xhci = ep->xhci;
>   
>   	list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list, cancelled_td_list) {
> @@ -1001,7 +1008,9 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
>   		if (td->cancel_status == TD_HALTED || trb_in_td(xhci, td, hw_deq, false)) {
>   			switch (td->cancel_status) {
>   			case TD_CLEARED: /* TD is already no-op */
> -			case TD_CLEARING_CACHE: /* set TR deq command already queued */
> +				break;
> +			case TD_CLEARING_CACHE: /* set TR deq command completed */
> +				td->cancel_status = TD_CLEARED;

This should only happen if 'Set TR Deq' fails as in success cases the hw_deq is moved past this TD.
when it fails the cache may not be cleared. i.e. rd is not TD_CLEARED

We do have the same problem in current code, but moving it here makes that even harder to fix.

>   				break;
>   			case TD_DIRTY: /* TD is cached, clear it */
>   			case TD_HALTED:
> @@ -1358,8 +1367,6 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
>   	struct xhci_virt_ep *ep;
>   	struct xhci_ep_ctx *ep_ctx;
>   	struct xhci_slot_ctx *slot_ctx;
> -	struct xhci_td *td, *tmp_td;
> -	bool deferred = false;
>   
>   	ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3]));
>   	stream_id = TRB_TO_STREAM_ID(le32_to_cpu(trb->generic.field[2]));
> @@ -1451,37 +1458,15 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
>   				  ep->queued_deq_seg, ep->queued_deq_ptr);
>   		}
>   	}
> -	/* HW cached TDs cleared from cache, give them back */
> -	list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list,
> -				 cancelled_td_list) {
> -		ep_ring = xhci_urb_to_transfer_ring(ep->xhci, td->urb);
> -		if (td->cancel_status == TD_CLEARING_CACHE) {
> -			td->cancel_status = TD_CLEARED;
> -			xhci_dbg(ep->xhci, "%s: Giveback cancelled URB %p TD\n",
> -				 __func__, td->urb);
> -			xhci_td_cleanup(ep->xhci, td, ep_ring, td->status);
> -		} else if (td->cancel_status == TD_CLEARING_CACHE_DEFERRED) {
> -			deferred = true;
> -		} else {
> -			xhci_dbg(ep->xhci, "%s: Keep cancelled URB %p TD as cancel_status is %d\n",
> -				 __func__, td->urb, td->cancel_status);
> -		}
> -	}
>   cleanup:
>   	ep->ep_state &= ~SET_DEQ_PENDING;
>   	ep->queued_deq_seg = NULL;
>   	ep->queued_deq_ptr = NULL;
>   
> -	if (deferred) {
> -		/* We have more streams to clear */
> -		xhci_dbg(ep->xhci, "%s: Pending TDs to clear, continuing with invalidation\n",
> -			 __func__);
> -		xhci_invalidate_cancelled_tds(ep);
> -	} else {
> -		/* Restart any rings with pending URBs */
> -		xhci_dbg(ep->xhci, "%s: All TDs cleared, ring doorbell\n", __func__);
> -		ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
> -	}
> +	/* Continue cancellations or restart the ring if done */
> +	xhci_invalidate_cancelled_tds(ep);

xhci_invalidate_cancelled_tds() now gets a secondary purpose which is not obvious from
reading the code.
its repurposed here mainly to turn cancelled tds from TD_CLEARING_CACHE to
TD_CLEARED in cases where hw_deq is _NOT_ in the td.

xhci_invalidate_cancelled_tds() will now also perform some extra work on each
already cleared td that is still in the ep->cancelled_td_list

I think its clearer to run though the cancelled td list above, and only
call xhci_invalidate_cancelled_tds() if needed.

i.e.

list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list, cancelled_td_list) {
	switch (td->cancel_status) {
	case TD_CLEARING_CACHE:
		td->cancel_status = TD_CLEARED;
		break;
	case TD_CLEARING_CACHE_DEFERRED:
	case TD_DIRTY:
		pending_td_cancels = true;
		break;
	default:
		break;
	}
}
...
if (pending_td_cancels)
	xhci_invalidate_cancelled_tds(ep);

Thanks
Mathias


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 2/3] usb: xhci: Fix TD invalidation under pending Set TR Dequeue
  2024-10-31 13:57   ` Mathias Nyman
@ 2024-11-04  9:43     ` Michał Pecio
  0 siblings, 0 replies; 6+ messages in thread
From: Michał Pecio @ 2024-11-04  9:43 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

On Thu, 31 Oct 2024 15:57:16 +0200, Mathias Nyman wrote:
> On 31.10.2024 11.44, Michal Pecio wrote:
> > +			case TD_CLEARING_CACHE: /* set TR deq command completed */
> > +				td->cancel_status = TD_CLEARED;  
> 
> This should only happen if 'Set TR Deq' fails as in success cases the
> hw_deq is moved past this TD. when it fails the cache may not be
> cleared. i.e. rd is not TD_CLEARED
Thanks, that was a bug, completely unintended. I should have placed
this line elsewhere.
 
> We do have the same problem in current code, but moving it here makes
> that even harder to fix.
Right, I will no longer try to move this code at all.

> xhci_invalidate_cancelled_tds() will now also perform some extra work
> on each already cleared td that is still in the ep->cancelled_td_list
I think this shouldn't be a problem, because every call to invalidate()
is now followed by a call to giveback(), so a future invalidate() should
never seen stale cleared TDs left by a past invalidate().

> I think its clearer to run though the cancelled td list above, and
> only call xhci_invalidate_cancelled_tds() if needed.
> 
> i.e.
> 
> list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list,
> cancelled_td_list) { switch (td->cancel_status) {
> 	case TD_CLEARING_CACHE:
> 		td->cancel_status = TD_CLEARED;
> 		break;
> 	case TD_CLEARING_CACHE_DEFERRED:
> 	case TD_DIRTY:
> 		pending_td_cancels = true;
> 		break;
> 	default:
> 		break;
> 	}
> }
> ...
> if (pending_td_cancels)
> 	xhci_invalidate_cancelled_tds(ep);
I have done something similar, but without an extra bool. Simply test
for unempty cancelled_td_list.


Regards,
Michal

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-11-04  9:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-31  9:41 [PATCH v3 0/3] xhci: Fix Stop Endpoint problems Michal Pecio
2024-10-31  9:43 ` [PATCH v3 1/3] usb: xhci: Limit Stop Endpoint retries Michal Pecio
2024-10-31  9:44 ` [PATCH v3 2/3] usb: xhci: Fix TD invalidation under pending Set TR Dequeue Michal Pecio
2024-10-31 13:57   ` Mathias Nyman
2024-11-04  9:43     ` Michał Pecio
2024-10-31  9:45 ` [PATCH v3 3/3] usb: xhci: Avoid queuing redundant Stop Endpoint commands Michal Pecio

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).