public inbox for linux-usb@vger.kernel.org
From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: Michal Pecio <michal.pecio@gmail.com>,
	Mathias Nyman <mathias.nyman@intel.com>
Cc: linux-usb@vger.kernel.org
Subject: Re: [PATCH 1/2] usb: xhci: Fix the NEC stop bug workaround
Date: Tue, 15 Oct 2024 14:05:22 +0300
Message-ID: <033e1f4e-c64c-4e8e-b249-02303e75baa8@linux.intel.com>
In-Reply-To: <20241014211005.07562933@foxbook>

On 14.10.2024 22.10, Michal Pecio wrote:
> The NEC uPD720200 has a bug, which prevents reliably stopping
> an endpoint shortly after it has been restarted. This usually
> happens when a driver kills many URBs in quick succession and
> it results in concurrent execution and cancellation of TDs.
> 
> This is handled by stopping the endpoint again if in doubt.
> 
> This "doubt" turns out to be a problem, because Stop Endpoint
> may be queued when the EP is already Stopped (for Set TR Deq
> execution, for example) or becomes Stopped concurrently (by
> Reset Endpoint, for example). If the EP is truly Stopped, the
> command fails and further retries just keep failing forever.
> 
> This is easily triggered by modifying uvcvideo to unlink its
> isochronous URBs in 100us intervals instead of poisoning them.
> Any driver that unlinks URBs asynchronously may trigger this,
> and any URB unlink during ongoing halt recovery also can.
> 
> Fix the problem by tracking redundant Stop Endpoint commands
> which are sure to fail, and by not retrying them. It's easy,
> because xhci_urb_dequeue() is the only user ever queuing the
> command with the default handler and without ensuring that
> the endpoint is Running and will not Halt before it Stops.
> For this case, we assume that an endpoint with pending URBs
> is always Running, unless certain operations are pending on
> it which indicate known exceptions.
> 
> Note that we need to catch those exceptions when they occur,
> because their flags may be cleared before our handler runs.
> 
> It's possible that other HCs have similar bugs (see also the
> related "Running" case below), but the workaround is limited
> to NEC because no such chips are currently known and tested.
> 
> Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers")
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> ---
>   drivers/usb/host/xhci-ring.c | 44 +++++++++++++++++++++++++++++++++---
>   drivers/usb/host/xhci.h      |  2 ++
>   2 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 4d664ba53fe9..c0efb4d34ab9 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -911,6 +911,21 @@ static int xhci_reset_halted_ep(struct xhci_hcd *xhci, unsigned int slot_id,
>   	return ret;
>   }
>   
> +/*
> + * A Stop Endpoint command is redundant if the EP is not in the Running state.
> + * It will fail with Context State Error. We sometimes queue redundant Stop EP
> + * commands when the EP is held Stopped for Set TR Deq execution, or Halted.
> + * A pending Stop Endpoint command *becomes* redundant if the EP halts before
> + * its completion, and this flag needs to be updated in those cases too.
> + */
> +static void xhci_update_stop_cmd_redundant(struct xhci_virt_ep *ep)
> +{
> +	if (ep->ep_state & (SET_DEQ_PENDING | EP_HALTED | EP_CLEARING_TT))
> +		ep->ep_state |= EP_STOP_CMD_REDUNDANT;
> +	else
> +		ep->ep_state &= ~EP_STOP_CMD_REDUNDANT;
> +}
> +
>   static int xhci_handle_halted_endpoint(struct xhci_hcd *xhci,
>   				struct xhci_virt_ep *ep,
>   				struct xhci_td *td,
> @@ -946,6 +961,7 @@ static int xhci_handle_halted_endpoint(struct xhci_hcd *xhci,
>   		return err;
>   
>   	ep->ep_state |= EP_HALTED;
> +	xhci_update_stop_cmd_redundant(ep);
>   
>   	xhci_ring_cmd_db(xhci);
>   
> @@ -1149,15 +1165,31 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
>   				break;
>   			ep->ep_state &= ~EP_STOP_CMD_PENDING;
>   			return;
> +
>   		case EP_STATE_STOPPED:
>   			/*
> -			 * NEC uPD720200 sometimes sets this state and fails with
> -			 * Context Error while continuing to process TRBs.
> -			 * Be conservative and trust EP_CTX_STATE on other chips.
> +			 * Per xHCI 4.6.9, Stop Endpoint command on a Stopped
> +			 * EP is a Context State Error, and EP stays Stopped.
> +			 * The EP could be stopped by some concurrent job, so
> +			 * ignore this error when that's the case.
> +			 */
> +			if (ep->ep_state & EP_STOP_CMD_REDUNDANT)
> +				break;

Can we skip the new flag and just check for the correct flags here directly?

if (ep->ep_state & (SET_DEQ_PENDING | EP_HALTED | EP_CLEARING_TT))
	break;

Thanks
Mathias
