linux-usb.vger.kernel.org archive mirror
* [PATCH 0/2] xhci: Fix the NEC stop bug workaround
@ 2024-10-25 10:18 Michal Pecio
  2024-10-25 10:19 ` [PATCH v2 1/2] usb: " Michal Pecio
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Michal Pecio @ 2024-10-25 10:18 UTC
  To: Mathias Nyman, linux-usb

Hi,


This is the promised v2 of this bugfix. It took longer than expected
because I got sidetracked by two (related) issues:

1. looking for similar bugs in other chips
2. simplifying this to avoid adding the STOP_CMD_REDUNDANT flag

Changes in v2:

1. Dropped the warning patch, because dealing with other chips is a
   separate issue from fixing the bug in existing code.
2. Added CC:stable to make the patch bot happy.
3. Some comment updates/clarifications to address questions asked by
   reviewers. Comments made vendor-agnostic, no longer mention NEC in
   preparation for other buggy vendors.
4. Added an RFC patch to simplify things and avoid queuing redundant
   commands instead of trying to handle them. Still a little dodgy in
   one place, problem described in a C99 comment. But it works for me.

The simplification is a separate patch because that's how the code
evolved and because it could enable the more straightforward and lower
risk patch 1/2 to be used in stable without patch 2/2, if desired.

Or otherwise, I could squash the patches together, of course.


Regarding other chips, the following was found:
1. NEC discovered this bug and fixed it in a silicon or FW revision.
   Some chips have the bug, but I have one which doesn't.
2. I couldn't reproduce this bug on VIA VL805 and Etron EJ168A.
3. Both ASMedia chips I tested have the bug. On ASM3142 (aka ASM2142)
   it is quite subtle, because it only seems to happen when multiple EPs
   are used at the same time. I suspect that either the command ring
   fetches commands asynchronously before we ring the command doorbell,
   or increased xHC load simply triggers some internal bug.

ASMedia presents an additional challenge because it sometimes gets
stuck: Stop Endpoint fails in Stopped state even though our ep_state
says the endpoint should be running, and it never starts. I need to
investigate what exactly goes wrong and whether our ep_state is wrong
or the chip is at fault.

This is dangerous, because the naive workaround would simply retry
the command forever. I suppose it may be a very good idea to add some
timeout. Say, if 100ms passes and the commands are still failing, just
assume that it is stopped for good and go ahead.
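
To make the timeout idea concrete, here is a rough user-space sketch of
the bookkeeping it would need. This is not the actual xhci code: every
name in it (stop_retry_state, stop_cmd_failed, stop_cmd_succeeded,
STOP_RETRY_BUDGET_MS) is hypothetical, and the caller is assumed to
pass in a monotonic timestamp in milliseconds. The budget is counted
from the first failed attempt, which is only one possible reading of
"100ms passes".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical retry budget, taken from the 100ms figure above. */
#define STOP_RETRY_BUDGET_MS 100

/* What to do after a Stop Endpoint command fails (illustrative names). */
enum stop_action {
	STOP_RETRY,	/* still within budget: queue the command again */
	STOP_GIVE_UP,	/* budget exhausted: assume stopped for good */
};

/* Per-endpoint retry bookkeeping (hypothetical, not an xhci struct). */
struct stop_retry_state {
	bool active;		/* a failure streak is in progress */
	uint64_t first_fail_ms;	/* time of the first failure in the streak */
};

/* Call on every failed Stop Endpoint; now_ms must be monotonic. */
static enum stop_action stop_cmd_failed(struct stop_retry_state *st,
					uint64_t now_ms)
{
	assert(st);

	if (!st->active) {
		/* First failure: start the clock. */
		st->active = true;
		st->first_fail_ms = now_ms;
	}

	if (now_ms - st->first_fail_ms >= STOP_RETRY_BUDGET_MS)
		return STOP_GIVE_UP;

	return STOP_RETRY;
}

/* Call when a Stop Endpoint finally succeeds, to reset the streak. */
static void stop_cmd_succeeded(struct stop_retry_state *st)
{
	assert(st);
	st->active = false;
}
```

A time budget rather than a retry count seems preferable here, because
how fast the commands fail depends on the chip and on its load.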


Regards,
Michal



Thread overview: 13+ messages
2024-10-25 10:18 [PATCH 0/2] xhci: Fix the NEC stop bug workaround Michal Pecio
2024-10-25 10:19 ` [PATCH v2 1/2] usb: " Michal Pecio
2024-10-25 10:20 ` [PATCH v2 2/2 RFC] usb: xhci: Don't queue redundant Stop Endpoint commands Michal Pecio
2024-10-28  7:33 ` [PATCH 0/2] xhci: Fix the NEC stop bug workaround Michal Pecio
2024-10-28  9:54   ` Mathias Nyman
2024-10-29  8:28     ` Michał Pecio
2024-10-29  9:16       ` Mathias Nyman
2024-10-30  8:29         ` Mathias Nyman
2024-10-31  8:13           ` Michał Pecio
2024-10-31 10:49             ` Michał Pecio
2024-10-31 11:17               ` Michał Pecio
2024-10-31 14:22                 ` Mathias Nyman
2024-11-01  9:10                   ` Michał Pecio
