* xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
@ 2015-07-18 14:49 Arkadiusz Miskiewicz
2015-07-20 20:13 ` Arkadiusz Miskiewicz
0 siblings, 1 reply; 10+ messages in thread
From: Arkadiusz Miskiewicz @ 2015-07-18 14:49 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-usb
Hi.
I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file from
usb storage (sata disk behind sata-usb bridge or pendrive; hapens in both cases)
copying process hangs just early after start with:
[ 77.372137] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[ 77.388945] usb 2-1: New USB device found, idVendor=1f75, idProduct=0611
[ 77.388952] usb 2-1: New USB device strings: Mfr=4, Product=5, SerialNumber=6
[ 77.388956] usb 2-1: SerialNumber: 20130514
[ 77.402599] usb-storage 2-1:1.0: USB Mass Storage device detected
[ 77.403177] scsi host6: usb-storage 2-1:1.0
[ 77.403318] usbcore: registered new interface driver usb-storage
[ 77.407529] usbcore: registered new interface driver uas
[ 78.400954] scsi 6:0:0:0: scsi scan: INQUIRY result too short (5), using 36
[ 78.400961] scsi 6:0:0:0: Direct-Access Hitachi HDS723020BLA642 PQ: 0 ANSI: 0
[ 78.401401] sd 6:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 78.402126] sd 6:0:0:0: [sdb] Write Protect is off
[ 78.402130] sd 6:0:0:0: [sdb] Mode Sense: 3b 00 00 00
[ 78.402876] sd 6:0:0:0: [sdb] No Caching mode page found
[ 78.402882] sd 6:0:0:0: [sdb] Assuming drive cache: write through
[ 78.444310] sd 6:0:0:0: [sdb] Attached SCSI disk
[ 85.907972] EXT4-fs (sdb): mounted filesystem with ordered data mode. Opts: (null)
[ 113.556376] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[ 113.556383] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffa4000 trb-start 00000000fffa5fe0 trb-end 00000000fffa6000 seg-start 00000000fffa5000 seg-end 00000000fffa5ff0
[ 234.236311] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[ 234.890862] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[ 234.890869] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fff94000 trb-start 00000000fff95fd0 trb-end 00000000fff96000 seg-start 00000000fff95000 seg-end 00000000fff95ff0
[ 355.339935] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[ 355.574012] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[ 355.574018] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fff84000 trb-start 00000000fff85fb0 trb-end 00000000fff86000 seg-start 00000000fff85000 seg-end 00000000fff85ff0
[ 476.430339] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[ 476.728729] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[ 476.728738] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fff74000 trb-start 00000000fff75fe0 trb-end 00000000fff76000 seg-start 00000000fff75000 seg-end 00000000fff75ff0
3.19.3 works fine
4.0.8 fails
4.2.0-rc2-00077-gf760b87 fails
There was some similar thread in march 2015 but the issue there got resolved
by reverting one usb patch, so my issue has to be different (that revert is included
in 4.0 final I think).
.config -> http://sprunge.us/bACC
lsusb -> http://sprunge.us/DOWb
dmesg -> http://sprunge.us/UbTF
machine is dell xps 15 9530
--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
2015-07-18 14:49 xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1 Arkadiusz Miskiewicz
@ 2015-07-20 20:13 ` Arkadiusz Miskiewicz
2015-07-21 14:45 ` Mathias Nyman
0 siblings, 1 reply; 10+ messages in thread
From: Arkadiusz Miskiewicz @ 2015-07-20 20:13 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-usb
On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
> Hi.
>
> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file
> from usb storage (sata disk behind sata-usb bridge or pendrive; hapens in
> both cases) copying process hangs just early after start with:
Looks like suspend & resume is enough. Reloading bluetooth firmware done by kernel
triggers problem:
[ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
[ 106.313280] PM: resume of devices complete after 3003.032 msecs
[ 106.314079] Restarting tasks ... done.
[ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
[ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
[ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3000 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.400396] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.400402] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3030 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.402225] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.402228] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3060 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.404401] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.404408] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3090 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.406229] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.406232] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd30c0 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.408389] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.408395] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd30f0 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.410291] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.410294] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3120 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.412427] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.412433] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3150 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[ 106.414315] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.414318] xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3180 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000 seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
[...]
http://sprunge.us/IDUh
--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
2015-07-20 20:13 ` Arkadiusz Miskiewicz
@ 2015-07-21 14:45 ` Mathias Nyman
2015-07-22 14:12 ` Arkadiusz Miskiewicz
0 siblings, 1 reply; 10+ messages in thread
From: Mathias Nyman @ 2015-07-21 14:45 UTC (permalink / raw)
To: arekm, linux-kernel; +Cc: linux-usb
[-- Attachment #1: Type: text/plain, Size: 1565 bytes --]
On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
>> Hi.
>>
>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file
>> from usb storage (sata disk behind sata-usb bridge or pendrive; hapens in
>> both cases) copying process hangs just early after start with:
>
> Looks like suspend & resume is enough. Reloading bluetooth firmware done by kernel
> triggers problem:
>
> [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
> [ 106.313280] PM: resume of devices complete after 3003.032 msecs
> [ 106.314079] Restarting tasks ... done.
> [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
> [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
> [ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
Looks like we get an event for a really old transfer for some reason, it should probably be handled already.
I got a patch that adds more paranoid checks for TRB cancel, which has been one major reasons
for the "Transfer event TRB DMA ptr not part of current TD" Errors. It also adds some logging
to show what's went wrong. (patch attached, applies on 4.2-rc3) Can you see if it helps?
If it doesn't, then adding xhci debugging could give us some clue:
echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
You said 3.19.3 works fine, but 4.0.8 and 4.2-rc2 fail, any chance you could
bisect it?
Thanks
-Mathias
[-- Attachment #2: 0001-xhci-Don-t-touch-URB-TD-memory-if-they-are-no-longer.patch --]
[-- Type: text/x-patch, Size: 5425 bytes --]
>From cd27574451792569dff002ab33148cbaf9d52faf Mon Sep 17 00:00:00 2001
From: Mathias Nyman <mathias.nyman@linux.intel.com>
Date: Tue, 25 Nov 2014 17:35:27 +0200
Subject: [PATCH] xhci: Don't touch URB TD memory if they are no longer on the
endpoint ring
If a URB is cancelled we want to make sure the URB's TRBs still point
to memory on the endpoint ring. If the ring was already dropped then that
TRB may point to memory already in use by another ring, or freed.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 33 ++++++++++++++++++++++++++++++---
drivers/usb/host/xhci.c | 29 ++++++++++++++++++++++++++++-
drivers/usb/host/xhci.h | 1 +
3 files changed, 59 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..1e46d4f 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -136,6 +136,25 @@ static void next_trb(struct xhci_hcd *xhci,
}
}
+/* check if the TD is on the ring */
+bool xhci_td_on_ring(struct xhci_td *td, struct xhci_ring *ring)
+{
+ struct xhci_segment *seg;
+
+ if (!td->start_seg || !ring || !ring->first_seg)
+ return false;
+
+ seg = ring->first_seg;
+ do {
+ if (td->start_seg == seg)
+ return true;
+ seg = seg->next;
+ } while (seg != ring->first_seg);
+
+ return false;
+}
+
+
/*
* See Cycle bit rules. SW is the consumer for the event ring only.
* Don't make a ring full of link TRBs. That would be dumb and this would loop.
@@ -685,10 +704,16 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
cur_td->urb->stream_id);
goto remove_finished_td;
}
- /*
- * If we stopped on the TD we need to cancel, then we have to
- * move the xHC endpoint ring dequeue pointer past this TD.
+ /* In case ring was dropped and segments freed or cached we
+ * don't want to touch that memory anymore, or, if we stopped
+ * on the TD we want to remove we need to move the dq pointer
+ * past this TD, otherwise just turn TD to no-op
*/
+ if (!xhci_td_on_ring(cur_td, ep_ring)) {
+ xhci_err(xhci, "Cancelled TD not on stopped ring\n");
+ goto remove_finished_td;
+ }
+
if (cur_td == ep->stopped_td)
xhci_find_new_dequeue_state(xhci, slot_id, ep_index,
cur_td->urb->stream_id,
@@ -1295,11 +1320,13 @@ static void handle_cmd_completion(struct xhci_hcd *xhci,
/* Is the command ring deq ptr out of sync with the deq seg ptr? */
if (cmd_dequeue_dma == 0) {
xhci->error_bitmask |= 1 << 4;
+ xhci_err(xhci, "Command completion ptr and seg out of sync\n");
return;
}
/* Does the DMA address match our internal dequeue pointer address? */
if (cmd_dma != (u64) cmd_dequeue_dma) {
xhci->error_bitmask |= 1 << 5;
+ xhci_err(xhci, "Command completion DMA address mismatch\n");
return;
}
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 7da0d60..d72b46e 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1527,6 +1527,7 @@ int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
struct xhci_ring *ep_ring;
struct xhci_virt_ep *ep;
struct xhci_command *command;
+ bool remove_td_from_ring = false;
xhci = hcd_to_xhci(hcd);
spin_lock_irqsave(&xhci->lock, flags);
@@ -1587,9 +1588,28 @@ int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
urb_priv->td[i]->start_seg,
urb_priv->td[i]->first_trb));
+ /* make sure the TDs of the URB are still on the endpoint ring */
for (; i < urb_priv->length; i++) {
td = urb_priv->td[i];
- list_add_tail(&td->cancelled_td_list, &ep->cancelled_td_list);
+ if (xhci_td_on_ring(td, ep_ring)) {
+ list_add_tail(&td->cancelled_td_list,
+ &ep->cancelled_td_list);
+ remove_td_from_ring = true;
+ } else {
+ xhci_err(xhci, "Cancel URB NOT on current ring\n");
+ if (!list_empty(&td->td_list))
+ list_del_init(&td->td_list);
+ }
+ }
+
+
+ /* No need to stop the ring to remove the TD if it isn't on the ring */
+ if (!remove_td_from_ring) {
+ xhci_urb_free_priv(urb_priv);
+ usb_hcd_unlink_urb_from_ep(hcd, urb);
+ spin_unlock_irqrestore(&xhci->lock, flags);
+ usb_hcd_giveback_urb(hcd, urb, 0);
+ return ret;
}
/* Queue a stop endpoint command, but only if this is
@@ -3555,6 +3575,13 @@ int xhci_discover_or_reset_device(struct usb_hcd *hcd, struct usb_device *udev)
ep->ep_state &= ~EP_HAS_STREAMS;
}
+ if (!list_empty(&ep->cancelled_td_list))
+ xhci_err(xhci, "Error unhandled cancelled TD's after dev reset\n");
+
+ if (ep->ring && !list_empty(&ep->ring->td_list))
+ xhci_err(xhci, "Error unhandled TD's after dev reset\n");
+
+
if (ep->ring) {
xhci_free_or_cache_endpoint_ring(xhci, virt_dev, i);
last_freed_endpoint = i;
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 31e46cc..efb504c 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1808,6 +1808,7 @@ void xhci_reset_bandwidth(struct usb_hcd *hcd, struct usb_device *udev);
/* xHCI ring, segment, TRB, and TD functions */
dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg, union xhci_trb *trb);
+bool xhci_td_on_ring(struct xhci_td *td, struct xhci_ring *ring);
struct xhci_segment *trb_in_td(struct xhci_hcd *xhci,
struct xhci_segment *start_seg, union xhci_trb *start_trb,
union xhci_trb *end_trb, dma_addr_t suspect_dma, bool debug);
--
1.8.3.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
2015-07-21 14:45 ` Mathias Nyman
@ 2015-07-22 14:12 ` Arkadiusz Miskiewicz
2015-07-24 11:59 ` Mathias Nyman
0 siblings, 1 reply; 10+ messages in thread
From: Arkadiusz Miskiewicz @ 2015-07-22 14:12 UTC (permalink / raw)
To: Mathias Nyman; +Cc: linux-kernel, linux-usb
[sorry, resend from different email - vger postmaster team has stupid filters
in place]
On Tuesday 21 of July 2015, Mathias Nyman wrote:
> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
> > On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
> >> Hi.
> >>
> >> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
> >> file from usb storage (sata disk behind sata-usb bridge or pendrive;
> >> hapens in
> >
> >> both cases) copying process hangs just early after start with:
> > Looks like suspend & resume is enough. Reloading bluetooth firmware done
> > by kernel triggers problem:
> >
> > [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
> > [ 106.313280] PM: resume of devices complete after 3003.032 msecs
> > [ 106.314079] Restarting tasks ... done.
> > [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
> > [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
> > intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [ 106.398223] xhci_hcd
> > 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
> > ep_index 0 comp_code 1
>
> Looks like we get an event for a really old transfer for some reason, it
> should probably be handled already.
>
> I got a patch that adds more paranoid checks for TRB cancel, which has been
> one major reasons for the "Transfer event TRB DMA ptr not part of current
> TD" Errors. It also adds some logging to show what's went wrong. (patch
> attached, applies on 4.2-rc3) Can you see if it helps?
It doesn't unfortunately. 4.2.0-rc3-00017-gd725e66 + that patch
-> dmesg http://sprunge.us/PDIE
around 91s - after resume from ram bluetooth driver reloads
around 754s - tried to copy data from external usb disk
> If it doesn't, then adding xhci debugging could give us some clue:
> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
Ok, http://sprunge.us/GiHX
mounted fs around 1347s and started copying; TRB problems were still there but
file was being copied (in bursts)
> You said 3.19.3 works fine, but 4.0.8 and 4.2-rc2 fail, any chance you
> could bisect it?
Unfortunately some kernels around 3.19 don't even boot (grub says it loads
initrd and no progress from that) - some commit fixed that but no idea which
one.
> Thanks
> -Mathias
--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
2015-07-22 14:12 ` Arkadiusz Miskiewicz
@ 2015-07-24 11:59 ` Mathias Nyman
2015-07-24 13:39 ` Mathias Nyman
0 siblings, 1 reply; 10+ messages in thread
From: Mathias Nyman @ 2015-07-24 11:59 UTC (permalink / raw)
To: arekm; +Cc: linux-kernel, linux-usb
On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
>
> [sorry, resend from different email - vger postmaster team has stupid filters
> in place]
>
> On Tuesday 21 of July 2015, Mathias Nyman wrote:
>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
>>> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
>>>> Hi.
>>>>
>>>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
>>>> file from usb storage (sata disk behind sata-usb bridge or pendrive;
>>>> hapens in
>>>
>>>> both cases) copying process hangs just early after start with:
>>> Looks like suspend & resume is enough. Reloading bluetooth firmware done
>>> by kernel triggers problem:
>>>
>>> [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
>>> [ 106.313280] PM: resume of devices complete after 3003.032 msecs
>>> [ 106.314079] Restarting tasks ... done.
>>> [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
>>> [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
>>> intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [ 106.398223] xhci_hcd
>>> 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
>>> ep_index 0 comp_code 1
>>
>> Looks like we get an event for a really old transfer for some reason, it
>> should probably be handled already.
>>
>> I got a patch that adds more paranoid checks for TRB cancel, which has been
>> one major reasons for the "Transfer event TRB DMA ptr not part of current
>> TD" Errors. It also adds some logging to show what's went wrong. (patch
>> attached, applies on 4.2-rc3) Can you see if it helps?
>
> It doesn't unfortunately. 4.2.0-rc3-00017-gd725e66 + that patch
> -> dmesg http://sprunge.us/PDIE
>
> around 91s - after resume from ram bluetooth driver reloads
>
> around 754s - tried to copy data from external usb disk
>
>
>> If it doesn't, then adding xhci debugging could give us some clue:
>> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
>
> Ok, http://sprunge.us/GiHX
>
Thanks for the logs, They show that the error is related to transfer descriptors that wrap around
on the endpoint ring buffer by exactly one transfer block.
I don't know yet why this happens, and I might need some help running additional debug
patches to solve this. I'll take a more in depth look at the code one more time first.
A short explanation of the error, mostly for myself:
To transfer data we have a ring buffer that holds transfer requests blocks (TRBs).
The ring buffer is made up of two segments,
the last TRB in each segment is a special link TRB that points to the next segment.
Segment A: 0x3000 - 0x3ff0 (where link TRB at 0x3ff0 points to Segment B, 0x4000)
Segment B: 0x4000 - 0x4ff0 (where link TRB at 0x4ff0 points back at Segment A, 0x3000)
A tranfer descriptor (TD) can consist of many TRBs, in this case three TRBs.
When a TD is completed we will get an event telling where the last transferred TRB of the TD was.
So in this case the error:
xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fffd3000 trb-start 00000000fffd4fd0 trb-end 00000000fffd5000
seg-start 00000000fffd4000 seg-end 00000000fffd4ff0
tells us the first TRB of the TD (trb-start) is at 4fd0,
The second TRB is at 4fe0,
Then we have the special link TRB be at 4ff0, pointing us back to the first segment.
The third and final TRB should be back at the first segment at 3000.
We get an event for the last TRB at 3000 and all should be fine, but driver claims the TDs TRBs start at 4fd0,
and the last TRB is at 5000, missing the link TRB wrapping us back to the first segment.
-Mathias
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
2015-07-24 11:59 ` Mathias Nyman
@ 2015-07-24 13:39 ` Mathias Nyman
2015-07-24 13:53 ` Peter Stuge
2015-07-24 15:33 ` Arkadiusz Miskiewicz
0 siblings, 2 replies; 10+ messages in thread
From: Mathias Nyman @ 2015-07-24 13:39 UTC (permalink / raw)
To: arekm; +Cc: linux-kernel, linux-usb
[-- Attachment #1: Type: text/plain, Size: 2351 bytes --]
On 24.07.2015 14:59, Mathias Nyman wrote:
> On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
>>
>> On Tuesday 21 of July 2015, Mathias Nyman wrote:
>>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
>>>> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
>>>>> Hi.
>>>>>
>>>>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
>>>>> file from usb storage (sata disk behind sata-usb bridge or pendrive;
>>>>> hapens in
>>>>
>>>>> both cases) copying process hangs just early after start with:
>>>> Looks like suspend & resume is enough. Reloading bluetooth firmware done
>>>> by kernel triggers problem:
>>>>
>>>> [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
>>>> [ 106.313280] PM: resume of devices complete after 3003.032 msecs
>>>> [ 106.314079] Restarting tasks ... done.
>>>> [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
>>>> [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
>>>> intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [ 106.398223] xhci_hcd
>>>> 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
>>>> ep_index 0 comp_code 1
>>>
>
> Thanks for the logs, They show that the error is related to transfer descriptors that wrap around
> on the endpoint ring buffer by exactly one transfer block.
>
> I don't know yet why this happens, and I might need some help running additional debug
> patches to solve this. I'll take a more in depth look at the code one more time first.
>
I think I found something, The recent ring segment size increase exposed an off by one
error that has been in the driver for a long time. But you need to be unlucky and have
your memory pages allocated in a specific order to trigger it.
small fix, looks like this:
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..77da8fe 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
return 0;
/* offset in TRBs */
segment_offset = trb - seg->trbs;
- if (segment_offset > TRBS_PER_SEGMENT)
+ if (segment_offset > TRBS_PER_SEGMENT - 1)
return 0;
return seg->dma + (segment_offset * sizeof(*trb));
}
Patch attached, could you try it out?
Thanks
-Mathias
[-- Attachment #2: 0001-xhci-fix-off-by-one-error-in-TRB-DMA-address-boundar.patch --]
[-- Type: text/x-patch, Size: 1809 bytes --]
>From 10e909ee20846793e41973941b1367e2303ec313 Mon Sep 17 00:00:00 2001
From: Mathias Nyman <mathias.nyman@linux.intel.com>
Date: Fri, 24 Jul 2015 15:56:23 +0300
Subject: [PATCH] xhci: fix off by one error in TRB DMA address boundary check
We need to check that a TRB is part of the current segment in use
before calculating its DMA address.
Previously a ring segment didn't use a full memory page, and every
new ring segment got a new memory page, so the off by one
error in checking the upper bound was never seen.
Now that we use a full memory page, 256 TRBs (4096 bytes) the off by one
caused issues as it doesnt catch the case when a TRB was the first element
of the next segment.
This is triggered if the virtual memory pages for a ring segment are
next to each in increasing order where the ring buffer wraps around and causes
errors like:
[ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not
part of current TD ep_index 0 comp_code 1
[ 106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma fffd3000
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0
the trb-end address is one outside the end-seg address.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..77da8fe 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
return 0;
/* offset in TRBs */
segment_offset = trb - seg->trbs;
- if (segment_offset > TRBS_PER_SEGMENT)
+ if (segment_offset > TRBS_PER_SEGMENT - 1)
return 0;
return seg->dma + (segment_offset * sizeof(*trb));
}
--
1.8.3.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
2015-07-24 13:39 ` Mathias Nyman
@ 2015-07-24 13:53 ` Peter Stuge
2015-07-24 14:37 ` Mathias Nyman
2015-07-24 15:33 ` Arkadiusz Miskiewicz
1 sibling, 1 reply; 10+ messages in thread
From: Peter Stuge @ 2015-07-24 13:53 UTC (permalink / raw)
To: Mathias Nyman; +Cc: arekm, linux-kernel, linux-usb
Mathias Nyman wrote:
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
> return 0;
> /* offset in TRBs */
> segment_offset = trb - seg->trbs;
> - if (segment_offset > TRBS_PER_SEGMENT)
> + if (segment_offset > TRBS_PER_SEGMENT - 1)
Maybe change the > comparison to >= rather than add the extra "- 1"?
//Peter
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
2015-07-24 13:53 ` Peter Stuge
@ 2015-07-24 14:37 ` Mathias Nyman
0 siblings, 0 replies; 10+ messages in thread
From: Mathias Nyman @ 2015-07-24 14:37 UTC (permalink / raw)
To: Peter Stuge; +Cc: arekm, linux-kernel, linux-usb
On 24.07.2015 16:53, Peter Stuge wrote:
> Mathias Nyman wrote:
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
>> return 0;
>> /* offset in TRBs */
>> segment_offset = trb - seg->trbs;
>> - if (segment_offset > TRBS_PER_SEGMENT)
>> + if (segment_offset > TRBS_PER_SEGMENT - 1)
>
> Maybe change the > comparison to >= rather than add the extra "- 1"?
Yes, sure, I'll change it if this really was the cause.
Just happy to finally find a probable cause after staring at logs and code for some time
-Mathias
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
2015-07-24 13:39 ` Mathias Nyman
2015-07-24 13:53 ` Peter Stuge
@ 2015-07-24 15:33 ` Arkadiusz Miskiewicz
2015-08-03 13:12 ` Mathias Nyman
1 sibling, 1 reply; 10+ messages in thread
From: Arkadiusz Miskiewicz @ 2015-07-24 15:33 UTC (permalink / raw)
To: Mathias Nyman; +Cc: linux-kernel, linux-usb
On Friday 24 of July 2015, Mathias Nyman wrote:
> On 24.07.2015 14:59, Mathias Nyman wrote:
> > On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
> >> On Tuesday 21 of July 2015, Mathias Nyman wrote:
> >>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
> >>>> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
> >>>>> Hi.
> >>>>>
> >>>>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
> >>>>> file from usb storage (sata disk behind sata-usb bridge or pendrive;
> >>>>> hapens in
> >>>>
> >>>>> both cases) copying process hangs just early after start with:
> >>>> Looks like suspend & resume is enough. Reloading bluetooth firmware
> >>>> done by kernel triggers problem:
> >>>>
> >>>> [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
> >>>> [ 106.313280] PM: resume of devices complete after 3003.032 msecs
> >>>> [ 106.314079] Restarting tasks ... done.
> >>>> [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
> >>>> [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
> >>>> intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [ 106.398223] xhci_hcd
> >>>> 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
> >>>> ep_index 0 comp_code 1
> >
> > Thanks for the logs, They show that the error is related to transfer
> > descriptors that wrap around on the endpoint ring buffer by exactly one
> > transfer block.
> >
> > I don't know yet why this happens, and I might need some help running
> > additional debug patches to solve this. I'll take a more in depth look
> > at the code one more time first.
>
> I think I found something, The recent ring segment size increase exposed an
> off by one error that has been in the driver for a long time. But you need
> to be unlucky and have your memory pages allocated in a specific order to
> trigger it.
>
> small fix, looks like this:
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 94416ff..77da8fe 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
> return 0;
> /* offset in TRBs */
> segment_offset = trb - seg->trbs;
> - if (segment_offset > TRBS_PER_SEGMENT)
> + if (segment_offset > TRBS_PER_SEGMENT - 1)
> return 0;
> return seg->dma + (segment_offset * sizeof(*trb));
> }
>
>
> Patch attached, could you try it out?
Works fine with this patch, so:
Tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Thanks!
ps. please push to stable@, too
>
> Thanks
> -Mathias
--
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
2015-07-24 15:33 ` Arkadiusz Miskiewicz
@ 2015-08-03 13:12 ` Mathias Nyman
0 siblings, 0 replies; 10+ messages in thread
From: Mathias Nyman @ 2015-08-03 13:12 UTC (permalink / raw)
To: arekm; +Cc: linux-kernel, linux-usb
On 24.07.2015 18:33, Arkadiusz Miskiewicz wrote:
> On Friday 24 of July 2015, Mathias Nyman wrote:
>> On 24.07.2015 14:59, Mathias Nyman wrote:
>>> On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
>>>> On Tuesday 21 of July 2015, Mathias Nyman wrote:
>>>>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
>>>>>> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
>>>>>>> Hi.
>>>>>>>
>>>>>>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
>>>>>>> file from usb storage (sata disk behind sata-usb bridge or pendrive;
>>>>>>> hapens in
>>>>>>
>>>>>>> both cases) copying process hangs just early after start with:
>>>>>> Looks like suspend & resume is enough. Reloading bluetooth firmware
>>>>>> done by kernel triggers problem:
>>>>>>
>>>>>> [ 106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
>>>>>> [ 106.313280] PM: resume of devices complete after 3003.032 msecs
>>>>>> [ 106.314079] Restarting tasks ... done.
>>>>>> [ 106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
>>>>>> [ 106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
>>>>>> intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [ 106.398223] xhci_hcd
>>>>>> 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
>>>>>> ep_index 0 comp_code 1
>>>
>>> Thanks for the logs, They show that the error is related to transfer
>>> descriptors that wrap around on the endpoint ring buffer by exactly one
>>> transfer block.
>>>
>>> I don't know yet why this happens, and I might need some help running
>>> additional debug patches to solve this. I'll take a more in depth look
>>> at the code one more time first.
>>
>> I think I found something, The recent ring segment size increase exposed an
>> off by one error that has been in the driver for a long time. But you need
>> to be unlucky and have your memory pages allocated in a specific order to
>> trigger it.
>>
>> small fix, looks like this:
>>
>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>> index 94416ff..77da8fe 100644
>> --- a/drivers/usb/host/xhci-ring.c
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
>> return 0;
>> /* offset in TRBs */
>> segment_offset = trb - seg->trbs;
>> - if (segment_offset > TRBS_PER_SEGMENT)
>> + if (segment_offset > TRBS_PER_SEGMENT - 1)
>> return 0;
>> return seg->dma + (segment_offset * sizeof(*trb));
>> }
>>
>>
>> Patch attached, could you try it out?
>
> Works fine with this patch, so:
>
> Tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
>
> Thanks!
>
> ps. please push to stable@, too
>
Patch sent forward, added Tested-by and stable tags
Thanks
- Mathias
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-08-03 13:08 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-18 14:49 xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1 Arkadiusz Miskiewicz
2015-07-20 20:13 ` Arkadiusz Miskiewicz
2015-07-21 14:45 ` Mathias Nyman
2015-07-22 14:12 ` Arkadiusz Miskiewicz
2015-07-24 11:59 ` Mathias Nyman
2015-07-24 13:39 ` Mathias Nyman
2015-07-24 13:53 ` Peter Stuge
2015-07-24 14:37 ` Mathias Nyman
2015-07-24 15:33 ` Arkadiusz Miskiewicz
2015-08-03 13:12 ` Mathias Nyman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).