[PATCH 0/2] usb: xhci: correct Command Abort bit handling

public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] usb: xhci: correct Command Abort bit handling
@ 2026-03-16 14:27 Niklas Neronin
  2026-03-16 14:27 ` [PATCH 1/2] usb: xhci: fix Command Aborting Niklas Neronin
  2026-03-16 14:27 ` [PATCH 2/2] usb: xhci: use writeq() for CA write on 64-bit architectures Niklas Neronin
  0 siblings, 2 replies; 9+ messages in thread
From: Niklas Neronin @ 2026-03-16 14:27 UTC (permalink / raw)
  To: mathias.nyman; +Cc: linux-usb, Niklas Neronin

Address problems with xHCI Command Abort bit handling.

First patch reworks the Command Abort procedure so that the CA bit is
asserted without modifying Command Ring Pointer under any circumstances.
It documents and corrects all previously encountered corner cases.

The second patch refines this by using writeq() for the CA write on 64-bit
architectures. Since writeq() performs a true atomic 64-bit MMIO write,
it should avoid the split-write hazards that have historically caused
Command Ring Pointer corruption.

Niklas Neronin (2):
  usb: xhci: fix Command Aborting
  usb: xhci: use writeq() for CA write on 64-bit architectures

 drivers/usb/host/xhci-ring.c | 47 ++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 24 deletions(-)

-- 
2.50.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] usb: xhci: fix Command Aborting
  2026-03-16 14:27 [PATCH 0/2] usb: xhci: correct Command Abort bit handling Niklas Neronin
@ 2026-03-16 14:27 ` Niklas Neronin
  2026-03-21 13:30   ` Michal Pecio
  2026-03-16 14:27 ` [PATCH 2/2] usb: xhci: use writeq() for CA write on 64-bit architectures Niklas Neronin
  1 sibling, 1 reply; 9+ messages in thread
From: Niklas Neronin @ 2026-03-16 14:27 UTC (permalink / raw)
  To: mathias.nyman; +Cc: linux-usb, Niklas Neronin

Aborting the command ring requires setting the Command Abort (CA) bit in
the 64-bit Command Ring Control Register (CRCR). Historically, several
attempts have been made to implement this correctly, but each introduced
its own problems. This patch fixes the final remaining issue: accidental
modification of the Command Ring Pointer (i'll abbreviate it to CRP).

Originally [1], the full 64-bit CRCR value was read and written back after
setting CA. This is a bit unnecessary, only RsvdP bits (5:4) should be
read and written back (for future-proofing). All other writable fields
read as zero.

Later patches attempted to solve an issues, caused by 64-bit writes being
split into two 32-bit writes. Writing the lower 31:0 bits first immediately
stopped the ring (CRR=0), and the following upper-half write then overwrote
part of CRP with zeroes, thus corrupting the CRP.
Patch [2] avoided this by writing only the lower 31:0 bits with CA set,
but that broke controllers that latch 64-bit registers only when the upper
bits are written, as reported in [3].

The current solution [4] attempted to fix this by writing the full 64-bit
CRCR with both CA and an updated CRP. This does not work. The patch tries
to modify CRP while setting CA, but with CRR=1 all writes to CRP are
ignored. Updating CRP requires writing only the CA bit, waiting for the
controller to process the abort and clear CRR, and only then writing the
full CRP value.
Writing a new CRP after CA clears CRR is also unsafe:
 * TRBs are 16-byte aligned (bits 3:0 clear)
 * CRP requires 64-byte alignment (bits 5:0 clear)
Writing a TRB pointer into CRP truncates bits 5:4 and may produce an
incorrect address. Restoring the original CRP is also not possible,
CRCR reads return zero, and writing the original CRP back forces the
controller to restart execution from the beginning of the ring.

For a Command Abort to succeed, the CA bit must be set without modifying
the CRP. The following sequence ensures this:

  1. Write the lower 31:0 bits with only the CA bit set. Since CRR=1,
     CRP write is ignored.

  2. Poll CRR. If CRR becomes 0, the abort succeeded with CRP preserved.

  3. If CRR does not clear (timeout), test if controller requires an upper
     bits write to latch the register. Write the upper 63:32 bits (which
     does not update the CRP because CRR=1).
     Then poll CRR again. If CRR becomes 0, it was a latching issue and the
     abort succeeded with CRP preserved.

Because there are now two polling stages, each timeout has been reduced
from five seconds to three seconds. The comment explaining the 5 seconds
poll has been removed.

Link: https://git.kernel.org/torvalds/c/b92cc66c047f [1]
Link: https://lore.kernel.org/all/20211008092547.3996295-5-mathias.nyman@linux.intel.com/ [2]
Link: https://lore.kernel.org/all/20211022105913.7671-1-youling257@gmail.com/ [3]
Link: https://lore.kernel.org/20211126122340.1193239-2-mathias.nyman@linux.intel.com [4]
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 42 ++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 1cbefee3c4ca..10160e76df68 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -489,38 +489,32 @@ static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
 /* Must be called with xhci->lock held, releases and acquires lock back */
 static int xhci_abort_cmd_ring(struct xhci_hcd *xhci, unsigned long flags)
 {
-	struct xhci_segment *new_seg	= xhci->cmd_ring->deq_seg;
-	union xhci_trb *new_deq		= xhci->cmd_ring->dequeue;
-	u64 crcr;
+	u32 crcr_lo;
 	int ret;

 	xhci_dbg(xhci, "Abort command ring\n");

 	reinit_completion(&xhci->cmd_ring_stop_completion);

-	/*
-	 * The control bits like command stop, abort are located in lower
-	 * dword of the command ring control register.
-	 * Some controllers require all 64 bits to be written to abort the ring.
-	 * Make sure the upper dword is valid, pointing to the next command,
-	 * avoiding corrupting the command ring pointer in case the command ring
-	 * is stopped by the time the upper dword is written.
-	 */
-	next_trb(&new_seg, &new_deq);
-	if (trb_is_link(new_deq))
-		next_trb(&new_seg, &new_deq);
+	/* Preserve RsvdP (5:4), other writable bits read 0. */
+	crcr_lo = readl(&xhci->op_regs->cmd_ring);
+	crcr_lo |= CMD_RING_ABORT;
+	writel(crcr_lo, &xhci->op_regs->cmd_ring);

-	crcr = xhci_trb_virt_to_dma(new_seg, new_deq);
-	xhci_write_64(xhci, crcr | CMD_RING_ABORT, &xhci->op_regs->cmd_ring);
+	/* In the future we should try to recover a -ETIMEDOUT with a host controller reset */
+	ret = xhci_handshake(&xhci->op_regs->cmd_ring, CMD_RING_RUNNING, 0, 3 * USEC_PER_SEC);
+	if (ret == -ETIMEDOUT) {
+		/*
+		 * Some controllers only latch 64-bit registers when the upper (63:32) bits
+		 * are written.
+		 * While the ring is running, writes to bits 63:6 and bit 0 are ignored.
+		 */
+		xhci_dbg(xhci, "Ring still running, checking if HC needs full 64-bit CRCR write\n");
+		writel(0, (void __iomem *)(&xhci->op_regs->cmd_ring) + 4);
+		ret = xhci_handshake(&xhci->op_regs->cmd_ring, CMD_RING_RUNNING, 0,
+				     3 * USEC_PER_SEC);
+	}

-	/* Section 4.6.1.2 of xHCI 1.0 spec says software should also time the
-	 * completion of the Command Abort operation. If CRR is not negated in 5
-	 * seconds then driver handles it as if host died (-ENODEV).
-	 * In the future we should distinguish between -ENODEV and -ETIMEDOUT
-	 * and try to recover a -ETIMEDOUT with a host controller reset.
-	 */
-	ret = xhci_handshake(&xhci->op_regs->cmd_ring,
-			CMD_RING_RUNNING, 0, 5 * 1000 * 1000);
 	if (ret < 0) {
 		xhci_err(xhci, "Abort failed to stop command ring: %d\n", ret);
 		xhci_halt(xhci);
-- 
2.50.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] usb: xhci: use writeq() for CA write on 64-bit architectures
  2026-03-16 14:27 [PATCH 0/2] usb: xhci: correct Command Abort bit handling Niklas Neronin
  2026-03-16 14:27 ` [PATCH 1/2] usb: xhci: fix Command Aborting Niklas Neronin
@ 2026-03-16 14:27 ` Niklas Neronin
  2026-03-22 11:36   ` Michal Pecio
  1 sibling, 1 reply; 9+ messages in thread
From: Niklas Neronin @ 2026-03-16 14:27 UTC (permalink / raw)
  To: mathias.nyman; +Cc: linux-usb, Niklas Neronin

Setting the Command Abort (CA) bit in the Command Ring Control Register
(CRCR) stops command execution by clearing the Command Ring Running (CRR).
Some controllers latch CRCR only when the upper 32 bits are written,
requiring a retry sequence when the initial lower 32-bit write does not
update CRR. While CRR=1, the controller ignores all CRP updates, so no
field other than CA may be modified.

On 64-bit architectures (CONFIG_64BIT=y), writeq() performs a single,
atomic 64-bit MMIO write. Using writeq() for the CA write ensures that all
64 bits reach the controller in one bus transaction, without giving
controller the opportunity to process the abort between 32-bit writes.

All xHCI 64-bit registers are accessed via lo_hi_writeq().
Earlier attempts to replace these with writeq()/readq() caused regressions
and were reverted [1]. The underlying cause was never identified [2].
It may have been a quirk in writeq() implementation or controller-specific
hardware behavior, both of which are likely no longer relevant after more
than a decade of kernel and hardware evolution.

To reduce risk, this change introduces writeq() only for the CA write path
while retaining the fallback upper 32-bit write. This keeps the change
contained and allows gradual validation across different hardware.

Link: https://git.kernel.org/torvalds/c/477632dff5c7 [1]
Link: https://marc.info/?t=139093294600002&r=1&w=2 [2]
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 10160e76df68..e7910ae2e488 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -499,7 +499,12 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci, unsigned long flags)
 	/* Preserve RsvdP (5:4), other writable bits read 0. */
 	crcr_lo = readl(&xhci->op_regs->cmd_ring);
 	crcr_lo |= CMD_RING_ABORT;
+
+#ifdef CONFIG_64BIT
+	writeq(crcr_lo, &xhci->op_regs->cmd_ring);
+#else
 	writel(crcr_lo, &xhci->op_regs->cmd_ring);
+#endif

 	/* In the future we should try to recover a -ETIMEDOUT with a host controller reset */
 	ret = xhci_handshake(&xhci->op_regs->cmd_ring, CMD_RING_RUNNING, 0, 3 * USEC_PER_SEC);
-- 
2.50.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] usb: xhci: fix Command Aborting
  2026-03-16 14:27 ` [PATCH 1/2] usb: xhci: fix Command Aborting Niklas Neronin
@ 2026-03-21 13:30   ` Michal Pecio
  2026-03-23 10:25     ` Neronin, Niklas
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Pecio @ 2026-03-21 13:30 UTC (permalink / raw)
  To: Niklas Neronin; +Cc: mathias.nyman, linux-usb

On Mon, 16 Mar 2026 15:27:19 +0100, Niklas Neronin wrote:
> Aborting the command ring requires setting the Command Abort (CA) bit
> in the 64-bit Command Ring Control Register (CRCR). Historically,
> several attempts have been made to implement this correctly, but each
> introduced its own problems. This patch fixes the final remaining
> issue: accidental modification of the Command Ring Pointer (i'll
> abbreviate it to CRP).
> 
> Originally [1], the full 64-bit CRCR value was read and written back
> after setting CA. This is a bit unnecessary, only RsvdP bits (5:4)
> should be read and written back (for future-proofing). All other
> writable fields read as zero.
> 
> Later patches attempted to solve an issues, caused by 64-bit writes
> being split into two 32-bit writes. Writing the lower 31:0 bits first
> immediately stopped the ring (CRR=0), and the following upper-half
> write then overwrote part of CRP with zeroes, thus corrupting the CRP.
> Patch [2] avoided this by writing only the lower 31:0 bits with CA
> set, but that broke controllers that latch 64-bit registers only when
> the upper bits are written, as reported in [3].

I too have HW which waits for the high DWORD after low DWORD write.

And I have HW which doesn't. If I write 0xdeadbeef to the high DWORD
after waiting for CRR=0, some HW will ignore the write and some will
IOMMU fault at 0xdeadbeefsomething.

The abort itself takes a few microseconds in my tests.
Is this race causing problems in the real world?

> The current solution [4] attempted to fix this by writing the full
> 64-bit CRCR with both CA and an updated CRP. This does not work. The
> patch tries to modify CRP while setting CA, but with CRR=1 all writes
> to CRP are ignored. Updating CRP requires writing only the CA bit,
> waiting for the controller to process the abort and clear CRR, and
> only then writing the full CRP value.

That's not a problem yet because we *don't* want to change anything.

> Writing a new CRP after CA clears CRR is also unsafe:
>  * TRBs are 16-byte aligned (bits 3:0 clear)
>  * CRP requires 64-byte alignment (bits 5:0 clear)
> Writing a TRB pointer into CRP truncates bits 5:4

But this is a problem and I found HW (NEC, AMD) where waiting for
CRR=0 and writing correct high DWORD causes "mismatched completion"
errors, which is almost certainly (I haven't debugged) this thing.

> For a Command Abort to succeed, the CA bit must be set without
> modifying the CRP. The following sequence ensures this:
> 
>   1. Write the lower 31:0 bits with only the CA bit set. Since CRR=1,
>      CRP write is ignored.
> 
>   2. Poll CRR. If CRR becomes 0, the abort succeeded with CRP
>      preserved.
> 
>   3. If CRR does not clear (timeout), test if controller requires an
>      upper bits write to latch the register. Write the upper 63:32
>      bits (which does not update the CRP because CRR=1).

Unless it just flipped to zero and CRP becomes corrupted as usual.

>      Then poll CRR again. If CRR becomes 0, it was a latching issue
>      and the abort succeeded with CRP preserved.

Another problem is that this slows down the double-write chips and
polls them under spinlock, which is completely crazy because it blocks
xhci_irq() and other IRQs and generally freezes the whole system.

Not sure if there is a way out of this madness without writeq(), quirks
or a completely out-of-the-box solution, like making it possible to
restore CRP. Truncation can be dealt with by ensuring that a few TRBs
before Dequeue are unused and can be made No-Ops at any time.

Regards,
Michal


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] usb: xhci: use writeq() for CA write on 64-bit architectures
  2026-03-16 14:27 ` [PATCH 2/2] usb: xhci: use writeq() for CA write on 64-bit architectures Niklas Neronin
@ 2026-03-22 11:36   ` Michal Pecio
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Pecio @ 2026-03-22 11:36 UTC (permalink / raw)
  To: Niklas Neronin; +Cc: mathias.nyman, linux-usb

On Mon, 16 Mar 2026 15:27:20 +0100, Niklas Neronin wrote:
> Setting the Command Abort (CA) bit in the Command Ring Control
> Register (CRCR) stops command execution by clearing the Command Ring
> Running (CRR). Some controllers latch CRCR only when the upper 32
> bits are written, requiring a retry sequence when the initial lower
> 32-bit write does not update CRR. While CRR=1, the controller ignores
> all CRP updates, so no field other than CA may be modified.
> 
> On 64-bit architectures (CONFIG_64BIT=y), writeq() performs a single,
> atomic 64-bit MMIO write. Using writeq() for the CA write ensures
> that all 64 bits reach the controller in one bus transaction, without
> giving controller the opportunity to process the abort between 32-bit
> writes.
> 
> All xHCI 64-bit registers are accessed via lo_hi_writeq().
> Earlier attempts to replace these with writeq()/readq() caused
> regressions and were reverted [1]. The underlying cause was never
> identified [2]. It may have been a quirk in writeq() implementation
> or controller-specific hardware behavior, both of which are likely no
> longer relevant after more than a decade of kernel and hardware
> evolution.

The cause was HW misbehavior and nothing has changed. I have this
ASM1042 piece of crap here and it still behaves the same. Using
writeq() to implement xhci_write_64() breaks it - no events and no
IRQs are generated for Enable Slot commands or port connections.

This HC doesn't have AC64 capability and it's unclear if such HW can
be expected to handle writeq() or readq(). Per 5.3.6 and 5.1, upper
bits are "not implemented" or "unused" and software "should" use DWORD
writes to access registers.

> To reduce risk, this change introduces writeq() only for the CA write
> path while retaining the fallback upper 32-bit write. This keeps the
> change contained and allows gradual validation across different
> hardware.

That being said, setting the CA bit with writeq() does actually work
on ASM1042. But I haven't tested all other 32 bit HCs yet ;)

And this chip doesn't really abort commands. The operation can only
succeed if the command accidentally completes by itself. For example,
Address Device completes with Transaction Error if I unplug the device.

Later ASM chips are similar, but they added internal Address Device
timeout so the command always completes with TrErr after a few seconds.
If you try to abort it earlier (e.g. while attempting to abort the
previous command too late) then abort waits for the internal timeout. 

So command abort on ASM can take any arbitrary time and still succeed.

Regards,
Michal

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] usb: xhci: fix Command Aborting
  2026-03-21 13:30   ` Michal Pecio
@ 2026-03-23 10:25     ` Neronin, Niklas
  2026-03-23 11:24       ` Michal Pecio
  0 siblings, 1 reply; 9+ messages in thread
From: Neronin, Niklas @ 2026-03-23 10:25 UTC (permalink / raw)
  To: Michal Pecio; +Cc: mathias.nyman, linux-usb



On 21/03/2026 15.30, Michal Pecio wrote:
> On Mon, 16 Mar 2026 15:27:19 +0100, Niklas Neronin wrote:
>> Aborting the command ring requires setting the Command Abort (CA) bit
>> in the 64-bit Command Ring Control Register (CRCR). Historically,
>> several attempts have been made to implement this correctly, but each
>> introduced its own problems. This patch fixes the final remaining
>> issue: accidental modification of the Command Ring Pointer (i'll
>> abbreviate it to CRP).
>>
>> Originally [1], the full 64-bit CRCR value was read and written back
>> after setting CA. This is a bit unnecessary, only RsvdP bits (5:4)
>> should be read and written back (for future-proofing). All other
>> writable fields read as zero.
>>
>> Later patches attempted to solve an issues, caused by 64-bit writes
>> being split into two 32-bit writes. Writing the lower 31:0 bits first
>> immediately stopped the ring (CRR=0), and the following upper-half
>> write then overwrote part of CRP with zeroes, thus corrupting the CRP.
>> Patch [2] avoided this by writing only the lower 31:0 bits with CA
>> set, but that broke controllers that latch 64-bit registers only when
>> the upper bits are written, as reported in [3].
> 
> I too have HW which waits for the high DWORD after low DWORD write.
> 
> And I have HW which doesn't. If I write 0xdeadbeef to the high DWORD
> after waiting for CRR=0, some HW will ignore the write and some will
> IOMMU fault at 0xdeadbeefsomething.
> 
> The abort itself takes a few microseconds in my tests.

Yes, abort completion itself is usually very fast. The 5 second timeout
comes directly from the xHCI 1.9 specification, which explicitly allows
for that duration.

4.6.1.2 Aborting a Command:
"If software doesn't see CRR negated in a timely manner (e.g. longer
 than 5 seconds), then it should assume that there are larger problems
 with the xHC and assert HCRST"

The timeout could probably be reduced to 1 second, but I did not do that
since I do not have evidence that a shorter timeout would be safe across
all hardware.

> Is this race causing problems in the real world?

I believe so, given that a fix was proposed specifically to address it:
https://lore.kernel.org/all/20211008092547.3996295-5-mathias.nyman@linux.intel.com/#t

> 
>> The current solution [4] attempted to fix this by writing the full
>> 64-bit CRCR with both CA and an updated CRP. This does not work. The
>> patch tries to modify CRP while setting CA, but with CRR=1 all writes
>> to CRP are ignored. Updating CRP requires writing only the CA bit,
>> waiting for the controller to process the abort and clear CRR, and
>> only then writing the full CRP value.
> 
> That's not a problem yet because we *don't* want to change anything.

The current implementation does attempt to change the CRP.
The sequence is as follows:
 1. Fetch the next TRB.
 2. If the TRB is a Link TRB, advance to the next TRB.
 3. Get the DMA address of that TRB.
 4. Write the lower DWORD with the new CRP address and the CA bit set.
 5. Write the upper DWORD with the new CRP address.

This results in two possible final states, depending on how quickly
CRR is cleared:
 1. The CRP is not updated at all.
 2. Only the upper DWORD of the CRP is updated.

In neither case is the CRP updated as intended by the current
implementation.

> 
>> Writing a new CRP after CA clears CRR is also unsafe:
>>  * TRBs are 16-byte aligned (bits 3:0 clear)
>>  * CRP requires 64-byte alignment (bits 5:0 clear)
>> Writing a TRB pointer into CRP truncates bits 5:4
> 
> But this is a problem and I found HW (NEC, AMD) where waiting for
> CRR=0 and writing correct high DWORD causes "mismatched completion"
> errors, which is almost certainly (I haven't debugged) this thing.

So you believe that when waiting for CRR=0 any software writes to CRCR,
which are not immediately acted on, i.e. Command Stop/Abort, are instead
buffered internally?

OR

Do you believe that the HW will update CRP, irrelevant of what CRR is?

If that interpretation is correct, it would mean that Command Stop and
Command Abort cannot be used without also updating the RCS and CRP.

> 
>> For a Command Abort to succeed, the CA bit must be set without
>> modifying the CRP. The following sequence ensures this:
>>
>>   1. Write the lower 31:0 bits with only the CA bit set. Since CRR=1,
>>      CRP write is ignored.
>>
>>   2. Poll CRR. If CRR becomes 0, the abort succeeded with CRP
>>      preserved.
>>
>>   3. If CRR does not clear (timeout), test if controller requires an
>>      upper bits write to latch the register. Write the upper 63:32
>>      bits (which does not update the CRP because CRR=1).
> 
> Unless it just flipped to zero and CRP becomes corrupted as usual.

Yes, that race still exists, but it is significantly less likely.

Thanks,
Niklas

> 
>>      Then poll CRR again. If CRR becomes 0, it was a latching issue
>>      and the abort succeeded with CRP preserved.
> 
> Another problem is that this slows down the double-write chips and
> polls them under spinlock, which is completely crazy because it blocks
> xhci_irq() and other IRQs and generally freezes the whole system.
> 
> Not sure if there is a way out of this madness without writeq(), quirks
> or a completely out-of-the-box solution, like making it possible to
> restore CRP. Truncation can be dealt with by ensuring that a few TRBs
> before Dequeue are unused and can be made No-Ops at any time.
> 
> Regards,
> Michal
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] usb: xhci: fix Command Aborting
  2026-03-23 10:25     ` Neronin, Niklas
@ 2026-03-23 11:24       ` Michal Pecio
  2026-03-23 14:00         ` Neronin, Niklas
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Pecio @ 2026-03-23 11:24 UTC (permalink / raw)
  To: Neronin, Niklas
  Cc: mathias.nyman, linux-usb, Pavankumar Kondeti, Pavankumar Kondeti

On Mon, 23 Mar 2026 12:25:00 +0200, Neronin, Niklas wrote:
> On 21/03/2026 15.30, Michal Pecio wrote:
> > On Mon, 16 Mar 2026 15:27:19 +0100, Niklas Neronin wrote:  
> >> Aborting the command ring requires setting the Command Abort (CA)
> >> bit in the 64-bit Command Ring Control Register (CRCR).
> >> Historically, several attempts have been made to implement this
> >> correctly, but each introduced its own problems. This patch fixes
> >> the final remaining issue: accidental modification of the Command
> >> Ring Pointer (i'll abbreviate it to CRP).
> >>
> >> Originally [1], the full 64-bit CRCR value was read and written
> >> back after setting CA. This is a bit unnecessary, only RsvdP bits
> >> (5:4) should be read and written back (for future-proofing). All
> >> other writable fields read as zero.
> >>
> >> Later patches attempted to solve an issues, caused by 64-bit writes
> >> being split into two 32-bit writes. Writing the lower 31:0 bits
> >> first immediately stopped the ring (CRR=0), and the following
> >> upper-half write then overwrote part of CRP with zeroes, thus
> >> corrupting the CRP. Patch [2] avoided this by writing only the
> >> lower 31:0 bits with CA set, but that broke controllers that latch
> >> 64-bit registers only when the upper bits are written, as reported
> >> in [3].  
> > 
> > I too have HW which waits for the high DWORD after low DWORD write.
> > 
> > And I have HW which doesn't. If I write 0xdeadbeef to the high DWORD
> > after waiting for CRR=0, some HW will ignore the write and some will
> > IOMMU fault at 0xdeadbeefsomething.
> > 
> > The abort itself takes a few microseconds in my tests.  
> 
> Yes, abort completion itself is usually very fast. The 5 second
> timeout comes directly from the xHCI 1.9 specification, which
> explicitly allows for that duration.
> 
> 4.6.1.2 Aborting a Command:
> "If software doesn't see CRR negated in a timely manner (e.g. longer
>  than 5 seconds), then it should assume that there are larger problems
>  with the xHC and assert HCRST"
> 
> The timeout could probably be reduced to 1 second, but I did not do
> that since I do not have evidence that a shorter timeout would be
> safe across all hardware.

As noted in my later response to patch 2/2, in rare corner cases
ASMedia command abort can take a few seconds and still succeed.

> > Is this race causing problems in the real world?  
> 
> I believe so, given that a fix was proposed specifically to address it:
> https://lore.kernel.org/all/20211008092547.3996295-5-mathias.nyman@linux.intel.com/#t

It does look like a problem which people would be unlikely to discover
without being affected by it, but the patch says nothing.

Hi Pavankumar,

Could you elaborate on the above issue, what was the affected host
controller and what sort of workloads made this race turn bad?

In theory the two DWORD writes could be separated by some NMI/SMI
hitting the CPU or by heavy traffic on the I/O bus.

And if probelms are known to happen here, then what about other users
of xhci_write_64(), for example updating ERDP aka erst_dequeue?

> The current implementation does attempt to change the CRP.
> The sequence is as follows:
>  1. Fetch the next TRB.
>  2. If the TRB is a Link TRB, advance to the next TRB.
>  3. Get the DMA address of that TRB.
>  4. Write the lower DWORD with the new CRP address and the CA bit set.
>  5. Write the upper DWORD with the new CRP address.
> 
> This results in two possible final states, depending on how quickly
> CRR is cleared:
>  1. The CRP is not updated at all.
>  2. Only the upper DWORD of the CRP is updated.

I stand corrected, I wasn't aware of some details. That said,

1. is a valid outcome and not a bug, jumping to the next TRB is not
   necessary, it was a workaround for previous issues.
2. is a bug, but it shouldn't matter because the ring has one segment

But I did have problems on NEC and AMD, which means that something else
is also possible. I think I will need to debug it deeper, but I suspect
that writing high DWORD alone is enough to trigger bit 4:5 truncation
or overwrite low DWORD with some garbage.

> So you believe that when waiting for CRR=0 any software writes to
> CRCR, which are not immediately acted on, i.e. Command Stop/Abort,
> are instead buffered internally?

We all believe that some HW does nothing and waits for high DWORD.

Other HW is seen to begin aborting more or less immediately. However,
until the abort is complete, any subsequent write to high DWORD may
be ignored because we can't change CRP while CRR=1.

Problems begin if CRR=0 before we write high DWORD. CRP bits become
writeable and any write may trigger truncation of bits 4:5.

Note that if the HW interprets this as a single write to high DWORD
separate from the previous abort operation (which it has already
completed) then this is out of spec, because we should write low-high.

It seems that my Etron HC ignores this high DWORD write after CRR=0.
I can write 0xdeadbeef there and it continues working normally.
Or maybe it's broken and should get the disable 64 bits quirk.

But NEC and AMD start getting "mismatched completion" errors, so it
seems that their CRP is corrupted somehow. And I could probably replace
waiting for CRR=0 with usleep(20) and get the same corruption.

I will later look closer what happens here.

Regards,
Michal

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] usb: xhci: fix Command Aborting
  2026-03-23 11:24       ` Michal Pecio
@ 2026-03-23 14:00         ` Neronin, Niklas
  2026-03-23 21:40           ` Michal Pecio
  0 siblings, 1 reply; 9+ messages in thread
From: Neronin, Niklas @ 2026-03-23 14:00 UTC (permalink / raw)
  To: Michal Pecio
  Cc: mathias.nyman, linux-usb, Pavankumar Kondeti, Pavankumar Kondeti



On 23/03/2026 13.24, Michal Pecio wrote:
> On Mon, 23 Mar 2026 12:25:00 +0200, Neronin, Niklas wrote:
>> On 21/03/2026 15.30, Michal Pecio wrote:
>>> On Mon, 16 Mar 2026 15:27:19 +0100, Niklas Neronin wrote:  
>>>> Aborting the command ring requires setting the Command Abort (CA)
>>>> bit in the 64-bit Command Ring Control Register (CRCR).
>>>> Historically, several attempts have been made to implement this
>>>> correctly, but each introduced its own problems. This patch fixes
>>>> the final remaining issue: accidental modification of the Command
>>>> Ring Pointer (i'll abbreviate it to CRP).
>>>>
>>>> Originally [1], the full 64-bit CRCR value was read and written
>>>> back after setting CA. This is a bit unnecessary, only RsvdP bits
>>>> (5:4) should be read and written back (for future-proofing). All
>>>> other writable fields read as zero.
>>>>
>>>> Later patches attempted to solve an issues, caused by 64-bit writes
>>>> being split into two 32-bit writes. Writing the lower 31:0 bits
>>>> first immediately stopped the ring (CRR=0), and the following
>>>> upper-half write then overwrote part of CRP with zeroes, thus
>>>> corrupting the CRP. Patch [2] avoided this by writing only the
>>>> lower 31:0 bits with CA set, but that broke controllers that latch
>>>> 64-bit registers only when the upper bits are written, as reported
>>>> in [3].  
>>>
>>> I too have HW which waits for the high DWORD after low DWORD write.
>>>
>>> And I have HW which doesn't. If I write 0xdeadbeef to the high DWORD
>>> after waiting for CRR=0, some HW will ignore the write and some will
>>> IOMMU fault at 0xdeadbeefsomething.
>>>
>>> The abort itself takes a few microseconds in my tests.  
>>
>> Yes, abort completion itself is usually very fast. The 5 second
>> timeout comes directly from the xHCI 1.9 specification, which
>> explicitly allows for that duration.
>>
>> 4.6.1.2 Aborting a Command:
>> "If software doesn't see CRR negated in a timely manner (e.g. longer
>>  than 5 seconds), then it should assume that there are larger problems
>>  with the xHC and assert HCRST"
>>
>> The timeout could probably be reduced to 1 second, but I did not do
>> that since I do not have evidence that a shorter timeout would be
>> safe across all hardware.
> 
> As noted in my later response to patch 2/2, in rare corner cases
> ASMedia command abort can take a few seconds and still succeed.
> 
>>> Is this race causing problems in the real world?  
>>
>> I believe so, given that a fix was proposed specifically to address it:
>> https://lore.kernel.org/all/20211008092547.3996295-5-mathias.nyman@linux.intel.com/#t
> 
> It does look like a problem which people would be unlikely to discover
> without being affected by it, but the patch says nothing.
>
> Hi Pavankumar,
> 
> Could you elaborate on the above issue, what was the affected host
> controller and what sort of workloads made this race turn bad?
> 
> In theory the two DWORD writes could be separated by some NMI/SMI
> hitting the CPU or by heavy traffic on the I/O bus.
> 
> 
> And if probelms are known to happen here, then what about other users
> of xhci_write_64(), for example updating ERDP aka erst_dequeue?

They are not related.

The ERDP bit field is 63:4 and can therefore hold any TRB address.
It is also not gated or "locked" behind a control bit in the same register.

Other 63:6 bit fields address in xHCI are all base addresses, which are
always programmed with correctly aligned values. The only 63:6‑bit
address field that is not a base address is the Command Ring Pointer.

> 
>> The current implementation does attempt to change the CRP.
>> The sequence is as follows:
>>  1. Fetch the next TRB.
>>  2. If the TRB is a Link TRB, advance to the next TRB.
>>  3. Get the DMA address of that TRB.
>>  4. Write the lower DWORD with the new CRP address and the CA bit set.
>>  5. Write the upper DWORD with the new CRP address.
>>
>> This results in two possible final states, depending on how quickly
>> CRR is cleared:
>>  1. The CRP is not updated at all.
>>  2. Only the upper DWORD of the CRP is updated.
> 
> I stand corrected, I wasn't aware of some details. That said,
> 
> 1. is a valid outcome and not a bug, jumping to the next TRB is not
>    necessary, it was a workaround for previous issues.
> 2. is a bug, but it shouldn't matter because the ring has one segment
> 
> But I did have problems on NEC and AMD, which means that something else
> is also possible. I think I will need to debug it deeper, but I suspect
> that writing high DWORD alone is enough to trigger bit 4:5 truncation
> or overwrite low DWORD with some garbage.
> 
>> So you believe that when waiting for CRR=0 any software writes to
>> CRCR, which are not immediately acted on, i.e. Command Stop/Abort,
>> are instead buffered internally?
> 
> We all believe that some HW does nothing and waits for high DWORD.
> 
> Other HW is seen to begin aborting more or less immediately. However,
> until the abort is complete, any subsequent write to high DWORD may
> be ignored because we can't change CRP while CRR=1.
> 
> Problems begin if CRR=0 before we write high DWORD. CRP bits become
> writeable and any write may trigger truncation of bits 4:5.

Should this be bits 31:6?

> 
> Note that if the HW interprets this as a single write to high DWORD
> separate from the previous abort operation (which it has already
> completed) then this is out of spec, because we should write low-high.

How I see it; hardware which requires an upper DWORD write will "buffer"
the lower DWORD and wait for the corresponding upper DWORD write.

However, because the specification states that the CRP and RCS are ignored
while CRR=1, the lower DWORD write ignores lower CRP bits and RCS bit.

As a result, if the upper DWORD is written after CRR=0, only upper CRP bits
and RsvdP bits are updated.
Conversely, if the upper DWORD is written before CRR=0, only RsvdP bits are
updated.

> 
> It seems that my Etron HC ignores this high DWORD write after CRR=0.
> I can write 0xdeadbeef there and it continues working normally.
> Or maybe it's broken and should get the disable 64 bits quirk.
> 
> But NEC and AMD start getting "mismatched completion" errors, so it
> seems that their CRP is corrupted somehow. And I could probably replace
> waiting for CRR=0 with usleep(20) and get the same corruption.
> 
> I will later look closer what happens here.
> 
> Regards,
> Michal


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] usb: xhci: fix Command Aborting
  2026-03-23 14:00         ` Neronin, Niklas
@ 2026-03-23 21:40           ` Michal Pecio
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Pecio @ 2026-03-23 21:40 UTC (permalink / raw)
  To: Neronin, Niklas; +Cc: mathias.nyman, linux-usb, Pavankumar Kondeti

On Mon, 23 Mar 2026 16:00:54 +0200, Neronin, Niklas wrote:
> The ERDP bit field is 63:4 and can therefore hold any TRB address.

Yes, we don't have truncation problems. But what if the high DWORD
write is delayed enough that some stupid hardware doesn't wait for it
and simply overwrites just the low DWORD and assumes that it now has
the new dequeue pointer?

Not good. But also not likely to matter, it seems we only get in
trouble if the event ring overfills and the HW fails to detect this,
exactly during the short window between the two DWORD writes.

> How I see it; hardware which requires an upper DWORD write will
> "buffer" the lower DWORD and wait for the corresponding upper DWORD
> write.

Hopefully that's what it does, yes.

> However, because the specification states that the CRP and RCS are
> ignored while CRR=1, the lower DWORD write ignores lower CRP bits and
> RCS bit.

Yes, I tried writing only the CA bit and it still works.

> As a result, if the upper DWORD is written after CRR=0, only upper
> CRP bits and RsvdP bits are updated.
> Conversely, if the upper DWORD is written before CRR=0, only RsvdP
> bits are updated.

If high DWORD is written before CRR=0 (current code executing quickly
enough) then nothnig bad happens, or at least nobody complains.

If high DWORD is written after CRR=0, then it's a spec violation (we
should write low before high in the same "transaction") and low DWORD
may become corrupted on some hardware:

# command at fffd4220 completes normally
[  +0,061649] xhci_hcd 0000:00:10.0: handle_cmd_completion cmd_dma fffd4220 cmd_comp_code 1
# command at fffd4230 is aborted and generates Aborted event
[  +5,501058] xhci_hcd 0000:00:10.0: abort took 69464ns
[  +0,000067] xhci_hcd 0000:00:10.0: handle_cmd_completion cmd_dma fffd4230 cmd_comp_code 25
# then the ring stops on the next command as it should
[  +0,000029] xhci_hcd 0000:00:10.0: handle_cmd_completion cmd_dma fffd4240 cmd_comp_code 24
# I wrote 0xdeadbeef after CRR=0, so CRP changes
# lower DWORD is corrupted, it should be fffd4240
[  +0,207941] xhci_hcd 0000:00:10.0: handle_cmd_completion cmd_dma deadbeeffffd4000 cmd_comp_code 5
[  +0,000015] xhci_hcd 0000:00:10.0: ERROR mismatched command completion event
[  +0,000004] xhci_hcd 0000:00:10.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0007 address=0xdeadbeeffffd4000 flags=0x0010]

This happens on NEC and a gen-1 AMD in-CPU xHCI (rumor has it that
AMD licensed this IP core from NEC/Renesas).

On Etron, the out of spec high DWORD write is completely ignored.
But if I wait for CRR=0 and write both DWORDs, then it accepts the
new pointer and IOMMU faults like NEC.

ASM1042 is 32 bit it so it ignores the upper DWORD completely.

ASM1142 first times out waiting for the upper DWORD, then accepts
the upper DWORD and IOMMU faults, without corrupting low DWORD.

ASM3142 fails to complete the abort, even on unpatched kernel.
Not sure what happens here. But this is not a usual case, normally
Address Device times out internally. I reduced command timeout to
500ms to force it to abort.

Regards,
Michal

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-03-23 21:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-16 14:27 [PATCH 0/2] usb: xhci: correct Command Abort bit handling Niklas Neronin
2026-03-16 14:27 ` [PATCH 1/2] usb: xhci: fix Command Aborting Niklas Neronin
2026-03-21 13:30   ` Michal Pecio
2026-03-23 10:25     ` Neronin, Niklas
2026-03-23 11:24       ` Michal Pecio
2026-03-23 14:00         ` Neronin, Niklas
2026-03-23 21:40           ` Michal Pecio
2026-03-16 14:27 ` [PATCH 2/2] usb: xhci: use writeq() for CA write on 64-bit architectures Niklas Neronin
2026-03-22 11:36   ` Michal Pecio

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox