* [PATCH 0/4] xhci fixes for usb-linus
@ 2024-10-16 13:59 Mathias Nyman
2024-10-16 13:59 ` [PATCH 1/4] xhci: Fix incorrect stream context type macro Mathias Nyman
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Mathias Nyman @ 2024-10-16 13:59 UTC
To: gregkh; +Cc: linux-usb, Mathias Nyman
Hi Greg
A few xhci fixes for usb-linus to better handle errors during transfer
events and URB cancel.
Also fixes an issue in how DbC might squash data intended to be separate
bulk transfers into one single transfer.
Mathias Nyman (3):
xhci: Fix incorrect stream context type macro
xhci: Mitigate failed set dequeue pointer commands
xhci: dbc: honor usb transfer size boundaries.
Michal Pecio (1):
usb: xhci: Fix handling errors mid TD followed by other errors
drivers/usb/host/xhci-dbgcap.h | 1 +
drivers/usb/host/xhci-dbgtty.c | 55 ++++++++++++++++++++++++---
drivers/usb/host/xhci-ring.c | 68 +++++++++++++++-------------------
drivers/usb/host/xhci.h | 2 +-
4 files changed, 82 insertions(+), 44 deletions(-)
--
2.25.1
* [PATCH 1/4] xhci: Fix incorrect stream context type macro
2024-10-16 13:59 [PATCH 0/4] xhci fixes for usb-linus Mathias Nyman
@ 2024-10-16 13:59 ` Mathias Nyman
2024-10-16 13:59 ` [PATCH 2/4] xhci: Mitigate failed set dequeue pointer commands Mathias Nyman
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Mathias Nyman @ 2024-10-16 13:59 UTC
To: gregkh; +Cc: linux-usb, Mathias Nyman, stable
The stream context type (SCT) bitfield is used both in the stream context
data structure, and in the 'Set TR Dequeue pointer' command TRB.
In both cases it uses bits 3:1.
The SCT_FOR_TRB(p) macro used to set the stream context type (SCT) field
for the 'Set TR Dequeue pointer' command TRB incorrectly shifts the value
1 bit left before masking the three bits.
Fix this by first masking and then shifting left, just like the similar
SCT_FOR_CTX(p) macro does.
This issue has not been visible as the lost bit 3 is only used with
secondary stream arrays (SSA). The xhci driver currently only supports using
a primary stream array with linear stream addressing.
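The effect of the operand order can be checked in userspace. The following is only a sketch: the two macro bodies are copied from the diff below, while the _OLD/_NEW suffixes, sct_field() and sct_selfcheck() are names made up for this illustration and do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Both forms of the macro, renamed so they can coexist in one file. */
#define SCT_FOR_TRB_OLD(p)	(((p) << 1) & 0x7)	/* shift, then mask */
#define SCT_FOR_TRB_NEW(p)	(((p) & 0x7) << 1)	/* mask, then shift */

/* Hypothetical helper: read back bits 3:1, where the SCT field lives. */
static inline uint32_t sct_field(uint32_t v)
{
	return (v >> 1) & 0x7;
}

/* Values 0..3 survive both forms; values 4..7 survive only the new one. */
static void sct_selfcheck(void)
{
	assert(sct_field(SCT_FOR_TRB_OLD(3)) == 3);
	assert(sct_field(SCT_FOR_TRB_NEW(3)) == 3);
	assert(sct_field(SCT_FOR_TRB_OLD(4)) == 0);	/* bit 3 masked away */
	assert(sct_field(SCT_FOR_TRB_NEW(4)) == 4);
}
```

Masking after the shift clears bit 3 of the result, so any SCT value of 4 or above loses its top bit with the old form.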
Fixes: 95241dbdf828 ("xhci: Set SCT field for Set TR dequeue on streams")
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
drivers/usb/host/xhci.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 620502de971a..f0fb696d5619 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1001,7 +1001,7 @@ enum xhci_setup_dev {
/* Set TR Dequeue Pointer command TRB fields, 6.4.3.9 */
#define TRB_TO_STREAM_ID(p) ((((p) & (0xffff << 16)) >> 16))
#define STREAM_ID_FOR_TRB(p) ((((p)) & 0xffff) << 16)
-#define SCT_FOR_TRB(p) (((p) << 1) & 0x7)
+#define SCT_FOR_TRB(p) (((p) & 0x7) << 1)
/* Link TRB specific fields */
#define TRB_TC (1<<1)
--
2.25.1
* [PATCH 2/4] xhci: Mitigate failed set dequeue pointer commands
2024-10-16 13:59 [PATCH 0/4] xhci fixes for usb-linus Mathias Nyman
2024-10-16 13:59 ` [PATCH 1/4] xhci: Fix incorrect stream context type macro Mathias Nyman
@ 2024-10-16 13:59 ` Mathias Nyman
2024-10-17 6:40 ` Michał Pecio
2024-10-16 13:59 ` [PATCH 3/4] usb: xhci: Fix handling errors mid TD followed by other errors Mathias Nyman
2024-10-16 14:00 ` [PATCH 4/4] xhci: dbc: honor usb transfer size boundaries Mathias Nyman
3 siblings, 1 reply; 9+ messages in thread
From: Mathias Nyman @ 2024-10-16 13:59 UTC
To: gregkh; +Cc: linux-usb, Mathias Nyman, stable
Prevent the xHC host from processing a cancelled URB by always turning
cancelled URB TDs into no-op TRBs before queuing a 'Set TR Deq' command.
If the command fails then xHC will start processing the cancelled TD
instead of skipping it once the endpoint is restarted, causing issues like
Babble errors.
This is not a complete solution as a failed 'Set TR Deq' command does not
guarantee xHC TRB caches are cleared.
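For readers unfamiliar with what turning a TD into no-ops means, here is a simplified userspace sketch. It is not the driver's td_to_noop(): struct demo_trb and the demo_* functions are hypothetical, link TRBs are ignored, and the bit definitions are assumed to match those in xhci.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror xhci.h; treat these values as illustrative. */
#define TRB_CYCLE	(1u << 0)
#define TRB_TYPE(p)	((uint32_t)(p) << 10)
#define TRB_TR_NOOP	8

/* Hypothetical stand-in for a 16-byte transfer TRB. */
struct demo_trb {
	uint32_t field[4];
};

/* Rewrite every TRB of a TD as a no-op, keeping only the cycle bit. */
static void demo_td_to_noop(struct demo_trb *trbs, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		trbs[i].field[0] = 0;		/* buffer pointer, low half */
		trbs[i].field[1] = 0;		/* buffer pointer, high half */
		trbs[i].field[2] = 0;		/* transfer length and related bits */
		trbs[i].field[3] &= TRB_CYCLE;	/* ring ownership must not change */
		trbs[i].field[3] |= TRB_TYPE(TRB_TR_NOOP);
	}
}

static void demo_noop_selfcheck(void)
{
	struct demo_trb td[2] = {
		{ { 0x1000, 0, 512, TRB_CYCLE | TRB_TYPE(1) } },
		{ { 0x2000, 0, 512, TRB_TYPE(1) } },
	};

	demo_td_to_noop(td, 2);
	assert(td[0].field[0] == 0 && td[0].field[2] == 0);
	assert(td[0].field[3] == (TRB_CYCLE | TRB_TYPE(TRB_TR_NOOP)));
	assert(td[1].field[3] == TRB_TYPE(TRB_TR_NOOP));
}
```

Once the buffer pointers are zeroed, a controller that does fetch these TRBs has no URB DMA address left to write to.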
Fixes: 4db356924a50 ("xhci: turn cancelled td cleanup to its own function")
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 4d664ba53fe9..7dedf31bbddd 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1023,7 +1023,7 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
td_to_noop(xhci, ring, cached_td, false);
cached_td->cancel_status = TD_CLEARED;
}
-
+ td_to_noop(xhci, ring, td, false);
td->cancel_status = TD_CLEARING_CACHE;
cached_td = td;
break;
--
2.25.1
* [PATCH 3/4] usb: xhci: Fix handling errors mid TD followed by other errors
2024-10-16 13:59 [PATCH 0/4] xhci fixes for usb-linus Mathias Nyman
2024-10-16 13:59 ` [PATCH 1/4] xhci: Fix incorrect stream context type macro Mathias Nyman
2024-10-16 13:59 ` [PATCH 2/4] xhci: Mitigate failed set dequeue pointer commands Mathias Nyman
@ 2024-10-16 13:59 ` Mathias Nyman
2024-10-16 14:00 ` [PATCH 4/4] xhci: dbc: honor usb transfer size boundaries Mathias Nyman
3 siblings, 0 replies; 9+ messages in thread
From: Mathias Nyman @ 2024-10-16 13:59 UTC
To: gregkh; +Cc: linux-usb, Michal Pecio, Mathias Nyman
From: Michal Pecio <michal.pecio@gmail.com>
Some host controllers fail to produce the final completion event on an
isochronous TD which experienced an error mid TD. We deal with it by
flagging such TDs and checking if the next event points at the flagged
TD or at the next one, and giving back the flagged TD if the latter.
This is not enough, because the next TD may be missed by the xHC. Or
there may be no next TD but a ring underrun. We also need to get such
TD quickly out of the way, or errors on later TDs may be handled wrong.
If the next TD experiences a Missed Service Error, we will set the skip
flag on the endpoint and then attempt skipping TDs when yet another
event arrives. In such scenario, we ought to report the 'error mid TD'
transfer as such rather than skip it.
Another problematic case is Stopped events. If we see one after an error
mid TD, we naively assume that it's a Force Stopped Event because it
doesn't match the pending TD, but in reality it might be an ordinary
Stopped event for the next TD, which we fail to recognize and handle.
Fix this by moving error mid TD handling before the whole TD skipping
loop. Remove unnecessary conditions, always give back the TD if the new
event points to any TRB outside it or if the pointer is NULL, as may be
the case in Ring Underrun and Overrun events on 1st gen hardware. Only
if the pending TD isn't flagged, consider other actions like skipping.
As a side effect of reordering with skip and FSE cases, error mid TD is
reordered with last_td_was_short check. This is harmless, because the
two cases are mutually exclusive - only one can happen in any given run
of handle_tx_event().
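The new give-back condition can be modelled in a few lines of userspace C. This is a sketch only: struct demo_td and the demo_* functions are hypothetical, and the TD is assumed to occupy one contiguous DMA range, which the real trb_in_td() does not require.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened view of a pending TD. */
struct demo_td {
	uint64_t first_trb_dma;
	uint64_t last_trb_dma;
	bool error_mid_td;
};

/* Simplified range check; a NULL (zero) event pointer is never inside a TD. */
static bool demo_trb_in_td(const struct demo_td *td, uint64_t dma)
{
	return dma != 0 && dma >= td->first_trb_dma && dma <= td->last_trb_dma;
}

/*
 * Give back the first pending TD when it was flagged with a mid TD error
 * and the new event points anywhere outside it.
 */
static bool demo_should_give_back(const struct demo_td *td, uint64_t ep_trb_dma)
{
	return td != NULL && td->error_mid_td &&
	       !demo_trb_in_td(td, ep_trb_dma);
}

static void demo_giveback_selfcheck(void)
{
	struct demo_td td = { 0x1000, 0x1030, true };

	assert(!demo_should_give_back(&td, 0x1030));	/* final event arrived */
	assert(demo_should_give_back(&td, 0x1040));	/* event for a later TD */
	assert(demo_should_give_back(&td, 0));		/* underrun, NULL pointer */
}
```

The check no longer depends on whether a next TD exists or on which TD the event belongs to, which is why it can run before the skipping logic.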
Tested on an NEC host and a USB camera with a flaky cable. Dynamic debug
confirmed that Transaction Errors are sometimes seen, sometimes mid-TD,
sometimes followed by Missed Service. In such cases, they were finished
properly before skipping began.
[Rebase on 6.12-rc1 -Mathias]
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 66 ++++++++++++++++--------------------
1 file changed, 29 insertions(+), 37 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 7dedf31bbddd..b6eb928e260f 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2775,6 +2775,29 @@ static int handle_tx_event(struct xhci_hcd *xhci,
return 0;
}
+ /*
+ * xhci 4.10.2 states isoc endpoints should continue
+ * processing the next TD if there was an error mid TD.
+ * So hosts like NEC don't generate an event for the last
+ * isoc TRB even if the IOC flag is set.
+ * xhci 4.9.1 states that if there are errors in multi-TRB
+ * TDs xHC should generate an error for that TRB, and if xHC
+ * proceeds to the next TD it should generate an event for
+ * any TRB with IOC flag on the way. Other hosts follow this.
+ *
+ * We wait for the final IOC event, but if we get an event
+ * anywhere outside this TD, just give it back already.
+ */
+ td = list_first_entry_or_null(&ep_ring->td_list, struct xhci_td, td_list);
+
+ if (td && td->error_mid_td && !trb_in_td(xhci, td, ep_trb_dma, false)) {
+ xhci_dbg(xhci, "Missing TD completion event after mid TD error\n");
+ ep_ring->dequeue = td->last_trb;
+ ep_ring->deq_seg = td->last_trb_seg;
+ inc_deq(xhci, ep_ring);
+ xhci_td_cleanup(xhci, td, ep_ring, td->status);
+ }
+
if (list_empty(&ep_ring->td_list)) {
/*
* Don't print wanings if ring is empty due to a stopped endpoint generating an
@@ -2836,44 +2859,13 @@ static int handle_tx_event(struct xhci_hcd *xhci,
return 0;
}
- /*
- * xhci 4.10.2 states isoc endpoints should continue
- * processing the next TD if there was an error mid TD.
- * So host like NEC don't generate an event for the last
- * isoc TRB even if the IOC flag is set.
- * xhci 4.9.1 states that if there are errors in mult-TRB
- * TDs xHC should generate an error for that TRB, and if xHC
- * proceeds to the next TD it should genete an event for
- * any TRB with IOC flag on the way. Other host follow this.
- * So this event might be for the next TD.
- */
- if (td->error_mid_td &&
- !list_is_last(&td->td_list, &ep_ring->td_list)) {
- struct xhci_td *td_next = list_next_entry(td, td_list);
-
- ep_seg = trb_in_td(xhci, td_next, ep_trb_dma, false);
- if (ep_seg) {
- /* give back previous TD, start handling new */
- xhci_dbg(xhci, "Missing TD completion event after mid TD error\n");
- ep_ring->dequeue = td->last_trb;
- ep_ring->deq_seg = td->last_trb_seg;
- inc_deq(xhci, ep_ring);
- xhci_td_cleanup(xhci, td, ep_ring, td->status);
- td = td_next;
- }
- }
-
- if (!ep_seg) {
- /* HC is busted, give up! */
- xhci_err(xhci,
- "ERROR Transfer event TRB DMA ptr not "
- "part of current TD ep_index %d "
- "comp_code %u\n", ep_index,
- trb_comp_code);
- trb_in_td(xhci, td, ep_trb_dma, true);
+ /* HC is busted, give up! */
+ xhci_err(xhci,
+ "ERROR Transfer event TRB DMA ptr not part of current TD ep_index %d comp_code %u\n",
+ ep_index, trb_comp_code);
+ trb_in_td(xhci, td, ep_trb_dma, true);
- return -ESHUTDOWN;
- }
+ return -ESHUTDOWN;
}
if (ep->skip) {
--
2.25.1
* [PATCH 4/4] xhci: dbc: honor usb transfer size boundaries.
2024-10-16 13:59 [PATCH 0/4] xhci fixes for usb-linus Mathias Nyman
` (2 preceding siblings ...)
2024-10-16 13:59 ` [PATCH 3/4] usb: xhci: Fix handling errors mid TD followed by other errors Mathias Nyman
@ 2024-10-16 14:00 ` Mathias Nyman
3 siblings, 0 replies; 9+ messages in thread
From: Mathias Nyman @ 2024-10-16 14:00 UTC
To: gregkh; +Cc: linux-usb, Mathias Nyman, Uday M Bhat, Łukasz Bartosik,
stable
Treat each completed full size write to /dev/ttyDBC0 as a separate usb
transfer. Make sure the size of the TRBs matches the size of the tty
write by first queuing as many max packet size TRBs as possible up to
the last TRB which will be cut short to match the size of the tty write.
This solves an issue where userspace writes several transfers back to
back via /dev/ttyDBC0 into a kfifo before dbgtty can find an available
request to turn that kfifo data into TRBs on the transfer ring.
The boundary between transfers was lost as xhci-dbgtty then turned
everything in the kfifo into as many 'max packet size' TRBs as possible.
DbC would then send more data to the host than intended for that
transfer, causing the host to issue a babble error.
Refuse to write more data to the kfifo until the previous tty write data is
turned into properly sized TRBs with data size boundaries matching the tty
write size.
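The boundary bookkeeping can be followed with a small userspace model. It is a sketch under assumptions: the kfifo is reduced to a byte counter, DEMO_FIFO_SIZE and DEMO_MAX_PACKET are illustrative values, and struct demo_port and the demo_* functions are hypothetical stand-ins for dbc_tty_write() and dbc_kfifo_to_req().

```c
#include <assert.h>

#define DEMO_FIFO_SIZE	4096u	/* illustrative fifo capacity */
#define DEMO_MAX_PACKET	1024u	/* illustrative max request size */

struct demo_port {
	unsigned int fifo_len;		/* bytes waiting in the fifo */
	unsigned int tx_boundary;	/* bytes left of the current tty write */
};

/* Model of the write path: refuse data while a boundary is pending. */
static unsigned int demo_write(struct demo_port *port, unsigned int count)
{
	unsigned int avail, written = 0;

	if (port->tx_boundary)
		return 0;

	if (count) {
		avail = DEMO_FIFO_SIZE - port->fifo_len;
		written = count < avail ? count : avail;
		port->fifo_len += written;
		if (written == count)
			port->tx_boundary = port->fifo_len;
	}
	return written;
}

/* Model of request sizing: never cross the pending write boundary. */
static unsigned int demo_next_req_len(struct demo_port *port)
{
	unsigned int len = port->fifo_len;

	if (len == 0)
		return 0;
	if (len > DEMO_MAX_PACKET)
		len = DEMO_MAX_PACKET;
	if (port->tx_boundary && port->tx_boundary < len)
		len = port->tx_boundary;

	port->fifo_len -= len;
	if (port->tx_boundary)
		port->tx_boundary -= len;
	return len;
}

static void demo_boundary_selfcheck(void)
{
	struct demo_port port = { 0, 0 };

	assert(demo_write(&port, 2500) == 2500);
	assert(demo_write(&port, 100) == 0);	/* refused until drained */
	assert(demo_next_req_len(&port) == 1024);
	assert(demo_next_req_len(&port) == 1024);
	assert(demo_next_req_len(&port) == 452);	/* last request cut short */
	assert(demo_write(&port, 100) == 100);	/* accepted again */
	assert(demo_next_req_len(&port) == 100);
}
```

In this model the 100 byte write can never be merged into the tail of the 2500 byte one, because it is only accepted after the earlier write has been fully turned into requests.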
Tested-by: Uday M Bhat <uday.m.bhat@intel.com>
Tested-by: Łukasz Bartosik <ukaszb@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
drivers/usb/host/xhci-dbgcap.h | 1 +
drivers/usb/host/xhci-dbgtty.c | 55 ++++++++++++++++++++++++++++++----
2 files changed, 51 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/host/xhci-dbgcap.h b/drivers/usb/host/xhci-dbgcap.h
index 8ec813b6e9fd..9dc8f4d8077c 100644
--- a/drivers/usb/host/xhci-dbgcap.h
+++ b/drivers/usb/host/xhci-dbgcap.h
@@ -110,6 +110,7 @@ struct dbc_port {
struct tasklet_struct push;
struct list_head write_pool;
+ unsigned int tx_boundary;
bool registered;
};
diff --git a/drivers/usb/host/xhci-dbgtty.c b/drivers/usb/host/xhci-dbgtty.c
index b8e78867e25a..d719c16ea30b 100644
--- a/drivers/usb/host/xhci-dbgtty.c
+++ b/drivers/usb/host/xhci-dbgtty.c
@@ -24,6 +24,29 @@ static inline struct dbc_port *dbc_to_port(struct xhci_dbc *dbc)
return dbc->priv;
}
+static unsigned int
+dbc_kfifo_to_req(struct dbc_port *port, char *packet)
+{
+ unsigned int len;
+
+ len = kfifo_len(&port->port.xmit_fifo);
+
+ if (len == 0)
+ return 0;
+
+ len = min(len, DBC_MAX_PACKET);
+
+ if (port->tx_boundary)
+ len = min(port->tx_boundary, len);
+
+ len = kfifo_out(&port->port.xmit_fifo, packet, len);
+
+ if (port->tx_boundary)
+ port->tx_boundary -= len;
+
+ return len;
+}
+
static int dbc_start_tx(struct dbc_port *port)
__releases(&port->port_lock)
__acquires(&port->port_lock)
@@ -36,7 +59,7 @@ static int dbc_start_tx(struct dbc_port *port)
while (!list_empty(pool)) {
req = list_entry(pool->next, struct dbc_request, list_pool);
- len = kfifo_out(&port->port.xmit_fifo, req->buf, DBC_MAX_PACKET);
+ len = dbc_kfifo_to_req(port, req->buf);
if (len == 0)
break;
do_tty_wake = true;
@@ -200,14 +223,32 @@ static ssize_t dbc_tty_write(struct tty_struct *tty, const u8 *buf,
{
struct dbc_port *port = tty->driver_data;
unsigned long flags;
+ unsigned int written = 0;
spin_lock_irqsave(&port->port_lock, flags);
- if (count)
- count = kfifo_in(&port->port.xmit_fifo, buf, count);
- dbc_start_tx(port);
+
+ /*
+ * Treat tty write as one usb transfer. Make sure the writes are turned
+ * into TRB request having the same size boundaries as the tty writes.
+ * Don't add data to kfifo before previous write is turned into TRBs
+ */
+ if (port->tx_boundary) {
+ spin_unlock_irqrestore(&port->port_lock, flags);
+ return 0;
+ }
+
+ if (count) {
+ written = kfifo_in(&port->port.xmit_fifo, buf, count);
+
+ if (written == count)
+ port->tx_boundary = kfifo_len(&port->port.xmit_fifo);
+
+ dbc_start_tx(port);
+ }
+
spin_unlock_irqrestore(&port->port_lock, flags);
- return count;
+ return written;
}
static int dbc_tty_put_char(struct tty_struct *tty, u8 ch)
@@ -241,6 +282,10 @@ static unsigned int dbc_tty_write_room(struct tty_struct *tty)
spin_lock_irqsave(&port->port_lock, flags);
room = kfifo_avail(&port->port.xmit_fifo);
+
+ if (port->tx_boundary)
+ room = 0;
+
spin_unlock_irqrestore(&port->port_lock, flags);
return room;
--
2.25.1
* Re: [PATCH 2/4] xhci: Mitigate failed set dequeue pointer commands
2024-10-16 13:59 ` [PATCH 2/4] xhci: Mitigate failed set dequeue pointer commands Mathias Nyman
@ 2024-10-17 6:40 ` Michał Pecio
2024-10-17 13:10 ` Mathias Nyman
0 siblings, 1 reply; 9+ messages in thread
From: Michał Pecio @ 2024-10-17 6:40 UTC
To: mathias.nyman; +Cc: gregkh, linux-usb, stable
> Avoid xHC host from processing a cancelled URB by always turning
> cancelled URB TDs into no-op TRBs before queuing a 'Set TR Deq'
> command.
>
> If the command fails then xHC will start processing the cancelled TD
> instead of skipping it once endpoint is restarted, causing issues like
> Babble error.
>
> This is not a complete solution as a failed 'Set TR Deq' command does
> not guarantee xHC TRB caches are cleared.
Hmm, wouldn't a long and partially cached TD basically become corrupted
by this overwrite?
For instance, No Op following a chain bit TRB is prohibited by 4.11.7.
4.11.5.1 even goes as far as saying that there are no constraints on
the order in which TRBs are fetched from the ring, not sure how much
"out of order" it can be and if a cached TD could be left with a hole?
If the reason of Set TR Deq failure is an earlier Stop Endpoint failure,
the xHC is executing this TD right now. Or maybe the next one - I guess
the driver already risks UB when it misses any Stop EP failure.
If it didn't fail, xHC may store some "state" which allows it to restart
a TRB stopped in the middle. It might not expect the TRB to change.
Actually, it would *almost* be better to deal with it by simply leaving
the TRB on the ring and waiting for it to complete. Problem is when it
doesn't execute soon, or ever, leaving the urb_dequeue() caller hanging.
Regards,
Michal
* Re: [PATCH 2/4] xhci: Mitigate failed set dequeue pointer commands
2024-10-17 6:40 ` Michał Pecio
@ 2024-10-17 13:10 ` Mathias Nyman
2024-10-17 16:14 ` Michał Pecio
0 siblings, 1 reply; 9+ messages in thread
From: Mathias Nyman @ 2024-10-17 13:10 UTC
To: Michał Pecio; +Cc: gregkh, linux-usb, stable
On 17.10.2024 9.40, Michał Pecio wrote:
>> Avoid xHC host from processing a cancelled URB by always turning
>> cancelled URB TDs into no-op TRBs before queuing a 'Set TR Deq'
>> command.
>>
>> If the command fails then xHC will start processing the cancelled TD
>> instead of skipping it once endpoint is restarted, causing issues like
>> Babble error.
>>
>> This is not a complete solution as a failed 'Set TR Deq' command does
>> not guarantee xHC TRB caches are cleared.
>
> Hmm, wouldn't a long and partially cached TD basically become corrupted
> by this overwrite?
Unlikely but not impossible.
We already turn all cancelled TDs that we don't stop on into no-ops, so those
would already now experience the same problem.
We stopped the endpoint, and issued a 'Set TR deq' command which is supposed
to clear xHC TRB cache. I find it hard to believe xHC would continue
by caching some select TRBs of a TD to cache.
But let's say we end up corrupting the TD. It might still be better than
allowing xHC to process the TRBs and write to DMA addresses that might be
freed/reused already.
>
> For instance, No Op following a chain bit TRB is prohibited by 4.11.7.
>
> 4.11.5.1 even goes as far as saying that there are no constraints on
> the order in which TRBs are fetched from the ring, not sure how much
> "out of order" it can be and if a cached TD could be left with a hole?
>
> If the reason of Set TR Deq failure is an earlier Stop Endpoint failure,
> the xHC is executing this TD right now. Or maybe the next one - I guess
> the driver already risks UB when it misses any Stop EP failure.
>
> If it didn't fail, xHC may store some "state" which allows it to restart
> a TRB stopped in the middle. It might not expect the TRB to change.
This should not be an issue.
We don't queue a 'Set TR Deq' command if we intend to continue processing
a stopped TD, as the 'Set TR Deq' is designed to dump all transfer related
state of the endpoint.
>
>
> Actually, it would *almost* be better to deal with it by simply leaving
> the TRB on the ring and waiting for it to complete. Problem is when it
> doesn't execute soon, or ever, leaving the urb_dequeue() caller hanging.
We need to give back the cancelled URB at some point, and 'Set TR Deq'
command completion is the latest reasonable place to do it.
After this we should prevent xHC hw from accessing URB DMA pointers.
Thanks
Mathias
* Re: [PATCH 2/4] xhci: Mitigate failed set dequeue pointer commands
2024-10-17 13:10 ` Mathias Nyman
@ 2024-10-17 16:14 ` Michał Pecio
2024-10-18 9:59 ` Mathias Nyman
0 siblings, 1 reply; 9+ messages in thread
From: Michał Pecio @ 2024-10-17 16:14 UTC
To: Mathias Nyman; +Cc: gregkh, linux-usb, stable
On Thu, 17 Oct 2024 16:10:39 +0300, Mathias Nyman wrote:
> > Hmm, wouldn't a long and partially cached TD basically become
> > corrupted by this overwrite?
>
> Unlikely but not impossible.
> We already turn all cancelled TDs that we don't stop on into no-ops,
> so those would already now experience the same problem.
No, I think they wouldn't. A note in xHCI 1.2, 4.6.9, on page 135 states
clearly that xHC shall invalidate cached TRBs besides the current TD.
Same page, point 3, mentions that software "may not modify" the current
TD, whatever on earth is that supposed to mean. Unfortunately, I can't
find a clear "shall not" in 4.6.9, but I would see it as such.
> We stopped the endpoint, and issued a 'Set TR deq' command which is
> supposed to clear xHC TRB cache. I find it hard to believe xHC would
> continue by caching some select TRBs of a TD to cache.
The idea is, if Set TR Deq fails, the xHC preserves transfer state and
cache and tries to continue. If the TD wasn't fully cached when the xHC
stopped, it remains incomplete. Missing TRBs will be filled with No Ops
when it restarts, yielding an invalid TD (e.g. No Op chained at the end).
So it may turn out that instead of "EP TRB ptr not part of current TD"
something else would show up, perhaps TRB Errors.
> But lets say we end up corrupting the TD. It might still be better
> than allowing xHC to process the TRBs and write to DMA addresses that
> might be freed/reused already.
There is some truth to that, I guess. It's a bummer that those bugs are
here in the first place and no one seems to know where they come from.
Was this tested on HW? I suppose it wouldn't be hard to corrupt a Set
TR Deq command to make it fail, stream 0xffff or something like that.
It may be harder to come up with a realistic test case with long TDs.
Regards,
Michal
* Re: [PATCH 2/4] xhci: Mitigate failed set dequeue pointer commands
2024-10-17 16:14 ` Michał Pecio
@ 2024-10-18 9:59 ` Mathias Nyman
0 siblings, 0 replies; 9+ messages in thread
From: Mathias Nyman @ 2024-10-18 9:59 UTC
To: Michał Pecio; +Cc: gregkh, linux-usb, stable, Marc SCHAEFER
On 17.10.2024 19.14, Michał Pecio wrote:
> On Thu, 17 Oct 2024 16:10:39 +0300, Mathias Nyman wrote:
>>> Hmm, wouldn't a long and partially cached TD basically become
>>> corrupted by this overwrite?
>>
>> Unlikely but not impossible.
>> We already turn all cancelled TDs that we don't stop on into no-ops,
>> so those would already now experience the same problem.
>
> No, I think they wouldn't. Note in xHCI 1.2, 4.6.9, on page 135 states
> clearly that xHC shall invalidate cached TRBs besides the current TD.
>
> Same page, point 3, mentions that software "may not modify" the current
> TD, whatever on earth is that supposed to mean. Unfortunately, I can't
> find a clear "shall not" in 4.6.9, but I would see it as such.
>
Ok, I think we are talking about two different things here.
Point 3 you mentioned is about modifying TDs on the ring and then continuing.
And you are right, xHC should in this case invalidate all future TDs, but
not the current one it stopped on.
I'm talking about point 2, about aborting the current TD where we know
we are queuing a "Set TR Deq" command. Same section states that
Set TR Deq may be used to force xHC to dump any internal state it has for
the ring.
>> We stopped the endpoint, and issued a 'Set TR deq' command which is
>> supposed to clear xHC TRB cache. I find it hard to believe xHC would
>> continue by caching some select TRBs of a TD to cache.
>
> The idea is, if Set TR Deq fails, the xHC preserves transfer state and
> cache and tries to continue. If the TD wasn't fully cached when the xHC
> stopped, it remains incomplete. Missing TRBs will be filled with No Ops
> when it restarts, yielding an ivalid TD (e.g. No Op chained at the end).
>
> So it may turn out that instead of "EP TRB ptr not part of current TD"
> something else would show up, perhaps TRB Errors.
If this is how xHC behaves on failed Set TR Deq commands, then yes,
TRB errors are possible.
But if xHC does clear TD cache on failed Set TR Deq command then it's
smooth sailing.
If we don't turn the TD to no-op then xHC is more likely to write to
freed DMA address in both cases above, which I think is worse.
>
>> But lets say we end up corrupting the TD. It might still be better
>> than allowing xHC to process the TRBs and write to DMA addresses that
>> might be freed/reused already.
>
> There is some truth to that, I guess. It's bummer that those bugs are
> here in the first place and no one seems to know where they come from.
>
>
> Was this tested on HW? I suppose it wouldn't be hard to corrupt a Set
> TR Deq command to make it fail, stream 0xffff or something like that.
> It may be harder to come up with a realistic test case with long TDs.
Unfortunately no, this patch is an attempt to mitigate the issue seen in
"Strange issues with USB device" [1]. That discussion continued off-list
with a lot more testing and debugging, but I ran out of testing goodwill
before I came up with this partial solution.
1. https://lore.kernel.org/linux-usb/ZsjgmCjHdzck9UKd@alphanet.ch/
Thanks
Mathias