From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: Michal Pecio <michal.pecio@gmail.com>,
Mathias Nyman <mathias.nyman@intel.com>
Cc: linux-usb@vger.kernel.org
Subject: Re: [PATCH 1/2] usb: xhci: Fix the NEC stop bug workaround
Date: Tue, 15 Oct 2024 14:05:22 +0300
Message-ID: <033e1f4e-c64c-4e8e-b249-02303e75baa8@linux.intel.com>
In-Reply-To: <20241014211005.07562933@foxbook>
On 14.10.2024 22.10, Michal Pecio wrote:
> The NEC uPD720200 has a bug, which prevents reliably stopping
> an endpoint shortly after it has been restarted. This usually
> happens when a driver kills many URBs in quick succession and
> it results in concurrent execution and cancellation of TDs.
>
> This is handled by stopping the endpoint again if in doubt.
>
> This "doubt" turns out to be a problem, because Stop Endpoint
> may be queued when the EP is already Stopped (for Set TR Deq
> execution, for example) or becomes Stopped concurrently (by
> Reset Endpoint, for example). If the EP is truly Stopped, the
> command fails and further retries just keep failing forever.
>
> This is easily triggered by modifying uvcvideo to unlink its
> isochronous URBs in 100us intervals instead of poisoning them.
> Any driver that unlinks URBs asynchronously may trigger this,
> and any URB unlink during ongoing halt recovery also can.
>
> Fix the problem by tracking redundant Stop Endpoint commands
> which are sure to fail, and by not retrying them. It's easy,
> because xhci_urb_dequeue() is the only user ever queuing the
> command with the default handler and without ensuring that
> the endpoint is Running and will not Halt before it Stops.
> For this case, we assume that an endpoint with pending URBs
> is always Running, unless certain operations are pending on
> it which indicate known exceptions.
>
> Note that we need to catch those exceptions when they occur,
> because their flags may be cleared before our handler runs.
>
> It's possible that other HCs have similar bugs (see also the
> related "Running" case below), but the workaround is limited
> to NEC because no such chips are currently known and tested.
>
> Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers")
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> ---
> drivers/usb/host/xhci-ring.c | 44 +++++++++++++++++++++++++++++++++---
> drivers/usb/host/xhci.h | 2 ++
> 2 files changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 4d664ba53fe9..c0efb4d34ab9 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -911,6 +911,21 @@ static int xhci_reset_halted_ep(struct xhci_hcd *xhci, unsigned int slot_id,
> return ret;
> }
>
> +/*
> + * A Stop Endpoint command is redundant if the EP is not in the Running state.
> + * It will fail with Context State Error. We sometimes queue redundant Stop EP
> + * commands when the EP is held Stopped for Set TR Deq execution, or Halted.
> + * A pending Stop Endpoint command *becomes* redundant if the EP halts before
> + * its completion, and this flag needs to be updated in those cases too.
> + */
> +static void xhci_update_stop_cmd_redundant(struct xhci_virt_ep *ep)
> +{
> +	if (ep->ep_state & (SET_DEQ_PENDING | EP_HALTED | EP_CLEARING_TT))
> +		ep->ep_state |= EP_STOP_CMD_REDUNDANT;
> +	else
> +		ep->ep_state &= ~EP_STOP_CMD_REDUNDANT;
> +}
> +
> static int xhci_handle_halted_endpoint(struct xhci_hcd *xhci,
> struct xhci_virt_ep *ep,
> struct xhci_td *td,
> @@ -946,6 +961,7 @@ static int xhci_handle_halted_endpoint(struct xhci_hcd *xhci,
> return err;
>
> ep->ep_state |= EP_HALTED;
> + xhci_update_stop_cmd_redundant(ep);
>
> xhci_ring_cmd_db(xhci);
>
> @@ -1149,15 +1165,31 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
> break;
> ep->ep_state &= ~EP_STOP_CMD_PENDING;
> return;
> +
> case EP_STATE_STOPPED:
> /*
> - * NEC uPD720200 sometimes sets this state and fails with
> - * Context Error while continuing to process TRBs.
> - * Be conservative and trust EP_CTX_STATE on other chips.
> + * Per xHCI 4.6.9, Stop Endpoint command on a Stopped
> + * EP is a Context State Error, and EP stays Stopped.
> + * The EP could be stopped by some concurrent job, so
> + * ignore this error when that's the case.
> + */
> + if (ep->ep_state & EP_STOP_CMD_REDUNDANT)
> + break;
Can we skip the new flag and just check for the correct flags here directly?

	if (ep->ep_state & (SET_DEQ_PENDING | EP_HALTED | EP_CLEARING_TT))
		break;

Thanks
Mathias
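
A minimal, hypothetical userspace model of the trade-off discussed above: latching EP_STOP_CMD_REDUNDANT when the exception is seen versus checking the exception flags directly when the Stop Endpoint completion is handled. It assumes, as the patch comment implies, that the flag is latched when Stop Endpoint is queued and again when the EP halts. The bit values, struct model_ep, and all model_* names are invented for illustration and are not the kernel's definitions. The sequence modeled is the one the commit message warns about: an exception flag (here SET_DEQ_PENDING) cleared before the Stop Endpoint handler runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for this model only; the real definitions
 * live in drivers/usb/host/xhci.h and are not reproduced here. */
#define SET_DEQ_PENDING        (1u << 0)
#define EP_HALTED              (1u << 1)
#define EP_CLEARING_TT         (1u << 2)
#define EP_STOP_CMD_REDUNDANT  (1u << 3)
#define EP_EXCEPTION_FLAGS     (SET_DEQ_PENDING | EP_HALTED | EP_CLEARING_TT)

/* Hypothetical stand-in for struct xhci_virt_ep: only the bitmask matters. */
struct model_ep {
	unsigned int ep_state;
};

/* Same logic as xhci_update_stop_cmd_redundant() in the quoted patch. */
static void model_update_stop_cmd_redundant(struct model_ep *ep)
{
	if (ep->ep_state & EP_EXCEPTION_FLAGS)
		ep->ep_state |= EP_STOP_CMD_REDUNDANT;
	else
		ep->ep_state &= ~EP_STOP_CMD_REDUNDANT;
}

/* Patch approach: trust the flag latched when the exception was seen. */
static bool model_ignore_latched(const struct model_ep *ep)
{
	return (ep->ep_state & EP_STOP_CMD_REDUNDANT) != 0;
}

/* Suggested approach: look only at the exception flags set right now. */
static bool model_ignore_direct(const struct model_ep *ep)
{
	return (ep->ep_state & EP_EXCEPTION_FLAGS) != 0;
}

/*
 * Hypothetical ordering: Stop Endpoint is queued while Set TR Deq is
 * pending, then the Set TR Deq completion clears SET_DEQ_PENDING before
 * the Stop Endpoint completion (Context State Error, EP Stopped) is
 * handled. Each approach's decision is written to *latched / *direct.
 */
static void model_set_deq_race(bool *latched, bool *direct)
{
	struct model_ep ep = { .ep_state = SET_DEQ_PENDING };

	/* Stop EP queued: latch the known exception at queue time. */
	model_update_stop_cmd_redundant(&ep);

	/* Set TR Deq completion runs first and clears its pending flag. */
	ep.ep_state &= ~SET_DEQ_PENDING;

	/* Stop EP completion handler now decides whether to retry. */
	*latched = model_ignore_latched(&ep);
	*direct = model_ignore_direct(&ep);
}
```

Under these assumptions, the latched flag still tells the handler to ignore the Stopped-state error, while a direct check at completion time sees no exception flag and would retry. If the real handlers can never run in that order, the direct check would behave the same; the model only makes the ordering question concrete.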