From: Michal Pecio <michal.pecio@gmail.com>
To: Niklas Neronin <niklas.neronin@linux.intel.com>
Cc: mathias.nyman@linux.intel.com, linux-usb@vger.kernel.org
Subject: Re: [PATCH 1/2] usb: xhci: fix Command Aborting
Date: Sat, 21 Mar 2026 14:30:57 +0100 [thread overview]
Message-ID: <20260321143057.1bf31b1b.michal.pecio@gmail.com> (raw)
In-Reply-To: <20260316142720.1471906-2-niklas.neronin@linux.intel.com>
On Mon, 16 Mar 2026 15:27:19 +0100, Niklas Neronin wrote:
> Aborting the command ring requires setting the Command Abort (CA) bit
> in the 64-bit Command Ring Control Register (CRCR). Historically,
> several attempts have been made to implement this correctly, but each
> introduced its own problems. This patch fixes the final remaining
> issue: accidental modification of the Command Ring Pointer (i'll
> abbreviate it to CRP).
>
> Originally [1], the full 64-bit CRCR value was read and written back
> after setting CA. This is a bit unnecessary, only RsvdP bits (5:4)
> should be read and written back (for future-proofing). All other
> writable fields read as zero.
>
> Later patches attempted to solve an issues, caused by 64-bit writes
> being split into two 32-bit writes. Writing the lower 31:0 bits first
> immediately stopped the ring (CRR=0), and the following upper-half
> write then overwrote part of CRP with zeroes, thus corrupting the CRP.
> Patch [2] avoided this by writing only the lower 31:0 bits with CA
> set, but that broke controllers that latch 64-bit registers only when
> the upper bits are written, as reported in [3].
I too have HW which waits for the high DWORD after low DWORD write.
And I have HW which doesn't. If I write 0xdeadbeef to the high DWORD
after waiting for CRR=0, some HW will ignore the write and some will
IOMMU fault at 0xdeadbeefsomething.
The abort itself takes a few microseconds in my tests.
Is this race causing problems in the real world?
> The current solution [4] attempted to fix this by writing the full
> 64-bit CRCR with both CA and an updated CRP. This does not work. The
> patch tries to modify CRP while setting CA, but with CRR=1 all writes
> to CRP are ignored. Updating CRP requires writing only the CA bit,
> waiting for the controller to process the abort and clear CRR, and
> only then writing the full CRP value.
That's not a problem yet because we *don't* want to change anything.
> Writing a new CRP after CA clears CRR is also unsafe:
> * TRBs are 16-byte aligned (bits 3:0 clear)
> * CRP requires 64-byte alignment (bits 5:0 clear)
> Writing a TRB pointer into CRP truncates bits 5:4
But this is a problem and I found HW (NEC, AMD) where waiting for
CRR=0 and writing correct high DWORD causes "mismatched completion"
errors, which is almost certainly (I haven't debugged) this thing.
> For a Command Abort to succeed, the CA bit must be set without
> modifying the CRP. The following sequence ensures this:
>
> 1. Write the lower 31:0 bits with only the CA bit set. Since CRR=1,
> CRP write is ignored.
>
> 2. Poll CRR. If CRR becomes 0, the abort succeeded with CRP
> preserved.
>
> 3. If CRR does not clear (timeout), test if controller requires an
> upper bits write to latch the register. Write the upper 63:32
> bits (which does not update the CRP because CRR=1).
Unless it just flipped to zero and CRP becomes corrupted as usual.
> Then poll CRR again. If CRR becomes 0, it was a latching issue
> and the abort succeeded with CRP preserved.
Another problem is that this slows down the double-write chips and
polls them under spinlock, which is completely crazy because it blocks
xhci_irq() and other IRQs and generally freezes the whole system.
Not sure if there is a way out of this madness without writeq(), quirks
or a completely out-of-the-box solution, like making it possible to
restore CRP. Truncation can be dealt with by ensuring that a few TRBs
before Dequeue are unused and can be made No-Ops at any time.
Regards,
Michal
next prev parent reply other threads:[~2026-03-21 13:31 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-16 14:27 [PATCH 0/2] usb: xhci: correct Command Abort bit handling Niklas Neronin
2026-03-16 14:27 ` [PATCH 1/2] usb: xhci: fix Command Aborting Niklas Neronin
2026-03-21 13:30 ` Michal Pecio [this message]
2026-03-23 10:25 ` Neronin, Niklas
2026-03-23 11:24 ` Michal Pecio
2026-03-23 14:00 ` Neronin, Niklas
2026-03-23 21:40 ` Michal Pecio
2026-03-31 11:46 ` [PATCH RFC] xhci: fix Command Aborting, diffferently Michal Pecio
2026-03-16 14:27 ` [PATCH 2/2] usb: xhci: use writeq() for CA write on 64-bit architectures Niklas Neronin
2026-03-22 11:36 ` Michal Pecio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260321143057.1bf31b1b.michal.pecio@gmail.com \
--to=michal.pecio@gmail.com \
--cc=linux-usb@vger.kernel.org \
--cc=mathias.nyman@linux.intel.com \
--cc=niklas.neronin@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.