Re: [PATCH 1/2] usb: xhci: fix Command Aborting

public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed

From: Michal Pecio <michal.pecio@gmail.com>
To: "Neronin, Niklas" <niklas.neronin@linux.intel.com>
Cc: mathias.nyman@linux.intel.com, linux-usb@vger.kernel.org,
	Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Subject: Re: [PATCH 1/2] usb: xhci: fix Command Aborting
Date: Mon, 23 Mar 2026 22:40:48 +0100	[thread overview]
Message-ID: <20260323224048.14426237.michal.pecio@gmail.com> (raw)
In-Reply-To: <9c6625f0-64d1-4f90-ba89-72eb9a153ea1@linux.intel.com>

On Mon, 23 Mar 2026 16:00:54 +0200, Neronin, Niklas wrote:
> The ERDP bit field is 63:4 and can therefore hold any TRB address.

Yes, we don't have truncation problems. But what if the high DWORD
write is delayed enough that some stupid hardware doesn't wait for it
and simply overwrites just the low DWORD and assumes that it now has
the new dequeue pointer?

Not good. But also not likely to matter, it seems we only get in
trouble if the event ring overfills and the HW fails to detect this,
exactly during the short window between the two DWORD writes.

> How I see it; hardware which requires an upper DWORD write will
> "buffer" the lower DWORD and wait for the corresponding upper DWORD
> write.

Hopefully that's what it does, yes.

> However, because the specification states that the CRP and RCS are
> ignored while CRR=1, the lower DWORD write ignores lower CRP bits and
> RCS bit.

Yes, I tried writing only the CA bit and it still works.

> As a result, if the upper DWORD is written after CRR=0, only upper
> CRP bits and RsvdP bits are updated.
> Conversely, if the upper DWORD is written before CRR=0, only RsvdP
> bits are updated.

If high DWORD is written before CRR=0 (current code executing quickly
enough) then nothnig bad happens, or at least nobody complains.

If high DWORD is written after CRR=0, then it's a spec violation (we
should write low before high in the same "transaction") and low DWORD
may become corrupted on some hardware:

# command at fffd4220 completes normally
[  +0,061649] xhci_hcd 0000:00:10.0: handle_cmd_completion cmd_dma fffd4220 cmd_comp_code 1
# command at fffd4230 is aborted and generates Aborted event
[  +5,501058] xhci_hcd 0000:00:10.0: abort took 69464ns
[  +0,000067] xhci_hcd 0000:00:10.0: handle_cmd_completion cmd_dma fffd4230 cmd_comp_code 25
# then the ring stops on the next command as it should
[  +0,000029] xhci_hcd 0000:00:10.0: handle_cmd_completion cmd_dma fffd4240 cmd_comp_code 24
# I wrote 0xdeadbeef after CRR=0, so CRP changes
# lower DWORD is corrupted, it should be fffd4240
[  +0,207941] xhci_hcd 0000:00:10.0: handle_cmd_completion cmd_dma deadbeeffffd4000 cmd_comp_code 5
[  +0,000015] xhci_hcd 0000:00:10.0: ERROR mismatched command completion event
[  +0,000004] xhci_hcd 0000:00:10.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0007 address=0xdeadbeeffffd4000 flags=0x0010]

This happens on NEC and a gen-1 AMD in-CPU xHCI (rumor has it that
AMD licensed this IP core from NEC/Renesas).

On Etron, the out of spec high DWORD write is completely ignored.
But if I wait for CRR=0 and write both DWORDs, then it accepts the
new pointer and IOMMU faults like NEC.

ASM1042 is 32 bit it so it ignores the upper DWORD completely.

ASM1142 first times out waiting for the upper DWORD, then accepts
the upper DWORD and IOMMU faults, without corrupting low DWORD.

ASM3142 fails to complete the abort, even on unpatched kernel.
Not sure what happens here. But this is not a usual case, normally
Address Device times out internally. I reduced command timeout to
500ms to force it to abort.

Regards,
Michal

next prev parent reply	other threads:[~2026-03-23 21:40 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-16 14:27 [PATCH 0/2] usb: xhci: correct Command Abort bit handling Niklas Neronin
2026-03-16 14:27 ` [PATCH 1/2] usb: xhci: fix Command Aborting Niklas Neronin
2026-03-21 13:30   ` Michal Pecio
2026-03-23 10:25     ` Neronin, Niklas
2026-03-23 11:24       ` Michal Pecio
2026-03-23 14:00         ` Neronin, Niklas
2026-03-23 21:40           ` Michal Pecio [this message]
2026-03-16 14:27 ` [PATCH 2/2] usb: xhci: use writeq() for CA write on 64-bit architectures Niklas Neronin
2026-03-22 11:36   ` Michal Pecio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260323224048.14426237.michal.pecio@gmail.com \
    --to=michal.pecio@gmail.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    --cc=niklas.neronin@linux.intel.com \
    --cc=quic_pkondeti@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox