From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: "Michał Pecio" <michal.pecio@gmail.com>
Cc: Mathias Nyman <mathias.nyman@intel.com>, linux-usb@vger.kernel.org
Subject: Re: [PATCH 0/2] Fix the NEC stop bug workaround
Date: Thu, 24 Oct 2024 18:29:31 +0300
Message-ID: <1c54f2f7-46bb-4ab3-b447-04a07318d200@linux.intel.com>
In-Reply-To: <20241016074711.247ff14e@foxbook>
On 16.10.2024 8.47, Michał Pecio wrote:
>>> With some experimentation I found that the bug is a variant of the
>>> old "stop after restart" issue - the doorbell ring is internally
>>> reordered after the subsequent command. By busy-waiting I confirmed
>>> that EP state which is initially seen as Stopped becomes Running
>>> some time later.
>>
>> Seems host controllers aren't designed to stop, move dequeue, and
>> restart an endpoint in quick succession.
>
> As it was you who added the Running case handling, do you know hardware
> other than NEC which triggers this? Or could it be just a single vendor
> who screwed up once 15 years ago and caused all the chaos?
>
> NEC sometimes triggers the Running case too and it is obvious why. I'm
> not sure how I missed it back in January and assumed it's some sort of
> random failure for no reason.
>
> BTW, the NEC problem appears to be limited to periodic endpoints. I am
> unable to reproduce it on bulk. I thought that I reproduced it on bulk
> back then, but on second thought it may have been interrupt, which that
> device also has. Unfortunately I wasn't printing endpoint numbers then.
>
> Regards,
> Michal
Sorry about the reply delay.
I don't think this is a NEC-only issue.
I was originally fixing halted endpoints at stop endpoint command completion,
did some stress testing, and was able to hit that running case on Intel
xHC controllers.
See:
9ebf30007858 xhci: Fix halted endpoint at stop endpoint command completion
1174d44906d5 xhci: handle stop endpoint command completion with endpoint in running state.
I also just got an off-list report of a case very similar to yours: the stop
endpoint command completed with a context error, and the endpoint state was still
Stopped even though the doorbell had already been rung.
This caused the Set TR Deq command to fail with a context error, as the endpoint
was running by the time that command was processed.
This was on an Intel host, so we need a generic solution to this.
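To make the sequence above concrete, here is a small user-space model of the race. It is only a sketch, not driver code: every name in it (model_ep, hc_stop_endpoint, naive_cancel, retrying_cancel and so on) is hypothetical, and the single doorbell_pending flag stands in for both "the driver rang the doorbell" and "the controller has not restarted the endpoint yet". Retrying the stop is just one possible approach, assumed here for illustration; it is not the generic solution, which this thread has not settled on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the race, not xhci driver code. */
enum ep_state { EP_STOPPED, EP_RUNNING };
enum cmd_status { CMD_SUCCESS, CMD_CTX_STATE_ERROR };

struct model_ep {
	enum ep_state state;    /* state as read from the endpoint context */
	bool doorbell_pending;  /* doorbell rung, restart not yet visible */
};

/* The controller finally acts on a doorbell that was rung earlier. */
static void hc_process_doorbell(struct model_ep *ep)
{
	if (ep->doorbell_pending) {
		ep->state = EP_RUNNING;
		ep->doorbell_pending = false;
	}
}

/* Stop Endpoint: context error unless the endpoint is Running. */
static enum cmd_status hc_stop_endpoint(struct model_ep *ep)
{
	if (ep->state != EP_RUNNING)
		return CMD_CTX_STATE_ERROR;
	ep->state = EP_STOPPED;
	return CMD_SUCCESS;
}

/* Set TR Deq: context error unless the endpoint is Stopped. */
static enum cmd_status hc_set_tr_deq(struct model_ep *ep)
{
	return ep->state == EP_STOPPED ? CMD_SUCCESS : CMD_CTX_STATE_ERROR;
}

/* Trusts the Stopped state seen right after the failed stop. */
static enum cmd_status naive_cancel(struct model_ep *ep)
{
	enum cmd_status st = hc_stop_endpoint(ep);

	if (st == CMD_SUCCESS)
		return hc_set_tr_deq(ep);
	/* State reads Stopped, but the delayed restart lands now. */
	hc_process_doorbell(ep);
	return hc_set_tr_deq(ep);
}

/* Assumed approach: retry the stop if a doorbell ring was outstanding. */
static enum cmd_status retrying_cancel(struct model_ep *ep, int max_retries)
{
	assert(max_retries >= 0);
	for (int i = 0; i <= max_retries; i++) {
		bool rung = ep->doorbell_pending;

		if (hc_stop_endpoint(ep) == CMD_SUCCESS || !rung)
			return hc_set_tr_deq(ep);
		/* Time passes; the pending restart becomes visible. */
		hc_process_doorbell(ep);
	}
	return CMD_CTX_STATE_ERROR;
}
```

Starting from { EP_STOPPED, true }, naive_cancel() ends with Set TR Deq failing on a Running endpoint, while retrying_cancel() stops the endpoint on the second attempt and Set TR Deq succeeds.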
Thanks
-Mathias