public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed
From: Michal Pecio <michal.pecio@gmail.com>
To: Niklas Neronin <niklas.neronin@linux.intel.com>
Cc: mathias.nyman@linux.intel.com, linux-usb@vger.kernel.org
Subject: Re: [PATCH 1/2] usb: xhci: fix Command Aborting
Date: Sat, 21 Mar 2026 14:30:57 +0100	[thread overview]
Message-ID: <20260321143057.1bf31b1b.michal.pecio@gmail.com> (raw)
In-Reply-To: <20260316142720.1471906-2-niklas.neronin@linux.intel.com>

On Mon, 16 Mar 2026 15:27:19 +0100, Niklas Neronin wrote:
> Aborting the command ring requires setting the Command Abort (CA) bit
> in the 64-bit Command Ring Control Register (CRCR). Historically,
> several attempts have been made to implement this correctly, but each
> introduced its own problems. This patch fixes the final remaining
> issue: accidental modification of the Command Ring Pointer (i'll
> abbreviate it to CRP).
> 
> Originally [1], the full 64-bit CRCR value was read and written back
> after setting CA. This is a bit unnecessary, only RsvdP bits (5:4)
> should be read and written back (for future-proofing). All other
> writable fields read as zero.
> 
> Later patches attempted to solve an issues, caused by 64-bit writes
> being split into two 32-bit writes. Writing the lower 31:0 bits first
> immediately stopped the ring (CRR=0), and the following upper-half
> write then overwrote part of CRP with zeroes, thus corrupting the CRP.
> Patch [2] avoided this by writing only the lower 31:0 bits with CA
> set, but that broke controllers that latch 64-bit registers only when
> the upper bits are written, as reported in [3].

I too have HW which waits for the high DWORD after low DWORD write.

And I have HW which doesn't. If I write 0xdeadbeef to the high DWORD
after waiting for CRR=0, some HW will ignore the write and some will
IOMMU fault at 0xdeadbeefsomething.

The abort itself takes a few microseconds in my tests.
Is this race causing problems in the real world?

> The current solution [4] attempted to fix this by writing the full
> 64-bit CRCR with both CA and an updated CRP. This does not work. The
> patch tries to modify CRP while setting CA, but with CRR=1 all writes
> to CRP are ignored. Updating CRP requires writing only the CA bit,
> waiting for the controller to process the abort and clear CRR, and
> only then writing the full CRP value.

That's not a problem yet because we *don't* want to change anything.

> Writing a new CRP after CA clears CRR is also unsafe:
>  * TRBs are 16-byte aligned (bits 3:0 clear)
>  * CRP requires 64-byte alignment (bits 5:0 clear)
> Writing a TRB pointer into CRP truncates bits 5:4

But this is a problem and I found HW (NEC, AMD) where waiting for
CRR=0 and writing correct high DWORD causes "mismatched completion"
errors, which is almost certainly (I haven't debugged) this thing.

> For a Command Abort to succeed, the CA bit must be set without
> modifying the CRP. The following sequence ensures this:
> 
>   1. Write the lower 31:0 bits with only the CA bit set. Since CRR=1,
>      CRP write is ignored.
> 
>   2. Poll CRR. If CRR becomes 0, the abort succeeded with CRP
>      preserved.
> 
>   3. If CRR does not clear (timeout), test if controller requires an
>      upper bits write to latch the register. Write the upper 63:32
>      bits (which does not update the CRP because CRR=1).

Unless it just flipped to zero and CRP becomes corrupted as usual.

>      Then poll CRR again. If CRR becomes 0, it was a latching issue
>      and the abort succeeded with CRP preserved.

Another problem is that this slows down the double-write chips and
polls them under spinlock, which is completely crazy because it blocks
xhci_irq() and other IRQs and generally freezes the whole system.

Not sure if there is a way out of this madness without writeq(), quirks
or a completely out-of-the-box solution, like making it possible to
restore CRP. Truncation can be dealt with by ensuring that a few TRBs
before Dequeue are unused and can be made No-Ops at any time.

Regards,
Michal


  reply	other threads:[~2026-03-21 13:31 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-16 14:27 [PATCH 0/2] usb: xhci: correct Command Abort bit handling Niklas Neronin
2026-03-16 14:27 ` [PATCH 1/2] usb: xhci: fix Command Aborting Niklas Neronin
2026-03-21 13:30   ` Michal Pecio [this message]
2026-03-23 10:25     ` Neronin, Niklas
2026-03-23 11:24       ` Michal Pecio
2026-03-23 14:00         ` Neronin, Niklas
2026-03-23 21:40           ` Michal Pecio
2026-03-16 14:27 ` [PATCH 2/2] usb: xhci: use writeq() for CA write on 64-bit architectures Niklas Neronin
2026-03-22 11:36   ` Michal Pecio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260321143057.1bf31b1b.michal.pecio@gmail.com \
    --to=michal.pecio@gmail.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    --cc=niklas.neronin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox