linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Strange issues with USB device
@ 2024-08-23 19:18 Marc SCHAEFER
  2024-08-24  6:44 ` Michał Pecio
  0 siblings, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-08-23 19:18 UTC (permalink / raw)
  To: linux-usb

Hello,
   
Doing a lot of reads on USB devices, I get sometimes these errors (~ every 20 to 30 minutes):
(each time on a different drive)
   
   Aug 23 19:13:02 video kernel: [ 9007.574197] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
   Aug 23 19:13:02 video udisksd[967]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K2RV8LZ4: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)
   Aug 23 19:13:02 video kernel: [ 9007.601188] sd 16:0:0:0: [sdc] tag#0 data cmplt err -75 uas-tag 5 inflight: CMD
   Aug 23 19:13:02 video kernel: [ 9007.601190] DMAR: DRHD: handling fault status reg 3
   Aug 23 19:13:02 video kernel: [ 9007.601194] DMAR: [DMA Write] Request device [01:00.0] PASID ffffffff fault addr f2e69000 [fault reason 05] PTE Write access is not set
   Aug 23 19:13:02 video kernel: [ 9007.601199] sd 16:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 ac d4 30 80 00 00 04 00 00 00
   Aug 23 19:13:02 video kernel: [ 9007.693901] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 10 ep 10 with no TDs queued?
   
   [ many ]
   
   Aug 23 19:13:21 video kernel: [ 9026.706893] xhci_hcd 0000:01:00.0: WARN Set TR deq ptr command for freed stream ID 37025
   Aug 23 19:13:21 video kernel: [ 9026.711876] sd 16:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=19s
   Aug 23 19:13:21 video kernel: [ 9026.711880] sd 16:0:0:0: [sdc] tag#0 Sense Key : Illegal Request [current]
   Aug 23 19:13:21 video kernel: [ 9026.711883] sd 16:0:0:0: [sdc] tag#0 Add. Sense: Invalid command operation code
   Aug 23 19:13:21 video kernel: [ 9026.711887] sd 16:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 ac d4 30 80 00 00 04 00 00 00
   Aug 23 19:13:21 video kernel: [ 9026.711891] blk_update_request: critical target error, dev sdc, sector 2899587200 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0
   Aug 23 19:13:21 video kernel: [ 9026.737762] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 4
   Aug 23 19:13:21 video kernel: [ 9026.737770] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000ffdce4e0 trb-start 00000000ffdcece0 trb-end 0000000000000000 seg-start 00000000ffdce000 seg-end 00000000ffdceff0
   Aug 23 19:13:21 video kernel: [ 9026.737773] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000ffdce4e0 trb-start 00000000ffdcd000 trb-end 00000000ffdcd4e0 seg-start 00000000ffdcd000 seg-end 00000000ffdcdff0
   Aug 23 19:13:32 video kernel: [ 9037.762813] sd 16:0:0:0: [sdc] tag#5 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD IN
   Aug 23 19:13:32 video kernel: [ 9037.762819] sd 16:0:0:0: [sdc] tag#5 CDB: Read(16) 88 00 00 00 00 00 ac d4 3c 80 00 00 04 00 00 00
   Aug 23 19:13:32 video kernel: [ 9037.763066] sd 16:0:0:0: [sdc] tag#4 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN
   Aug 23 19:13:32 video kernel: [ 9037.763072] sd 16:0:0:0: [sdc] tag#4 CDB: Read(16) 88 00 00 00 00 00 ac d4 38 80 00 00 04 00 00 00
   Aug 23 19:13:32 video kernel: [ 9037.763094] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
   Aug 23 19:13:32 video kernel: [ 9037.763269] sd 16:0:0:0: [sdc] tag#2 uas_eh_abort_handler 0 uas-tag 6 inflight: IN
   Aug 23 19:13:32 video kernel: [ 9037.763272] sd 16:0:0:0: [sdc] tag#2 CDB: Read(16) 88 00 00 00 00 00 ac d4 20 80 00 00 04 00 00 00
   Aug 23 19:13:32 video kernel: [ 9037.763318] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
   Aug 23 19:13:32 video kernel: [ 9037.763484] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
   Aug 23 19:13:40 video kernel: [ 9045.792699] usb 4-1.1.2: stat urb: no pending cmd for uas-tag 2
   Aug 23 19:13:40 video smartd[963]: Device: /dev/sdc [SAT], Read SMART Self Test Log Failed
   Aug 23 19:13:40 video smartd[963]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
   Aug 23 19:13:40 video kernel: [ 9045.792965] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
   Aug 23 19:13:40 video smartd[963]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
   Aug 23 19:13:52 video kernel: [ 9057.475000] sd 16:0:0:0: [sdc] tag#0 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD IN
   Aug 23 19:13:52 video kernel: [ 9057.475006] sd 16:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 ac d4 40 80 00 00 04 00 00 00
   Aug 23 19:13:52 video kernel: [ 9057.475248] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
   Aug 23 19:13:52 video kernel: [ 9057.498985] scsi host16: uas_eh_device_reset_handler start
   Aug 23 19:13:52 video kernel: [ 9057.591061] usb 4-1.1.2: reset SuperSpeedPlus Gen 2 USB device number 7 using xhci_hcd
   Aug 23 19:13:52 video kernel: [ 9057.638613] scsi host16: uas_eh_device_reset_handler success

It looks like there is a SMART query which maybe leads to a timeout, but does
it look more like a USB host or target issue?

I found some references to udisksd flooding logs when a disk is idle, but I am
sure it is not idle, there were a lot of reads on that device running according
to saidar.

Should I disable SMART udisksd polling on those drives?  It does not really
look it's easy to disable this with udisksd.

Hardware:
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M       
        |__ Port 1: Dev 4, If 0, Class=Hub, Driver=hub/4p, 10000M  
            |__ Port 2: Dev 7, If 0, Class=Mass Storage, Driver=uas, 10000M  
            |__ Port 3: Dev 8, If 0, Class=Mass Storage, Driver=uas, 10000M 
            |__ Port 1: Dev 6, If 0, Class=Mass Storage, Driver=uas, 10000M
            |__ Port 4: Dev 9, If 0, Class=Mass Storage, Driver=uas, 10000M

An USB enclosure:
   Bus 004 Device 009: ID 2109:0715 VIA Labs, Inc. VL817 SATA Adaptor
   Bus 004 Device 008: ID 2109:0715 VIA Labs, Inc. VL817 SATA Adaptor
   Bus 004 Device 007: ID 2109:0715 VIA Labs, Inc. VL817 SATA Adaptor
   Bus 004 Device 006: ID 2109:0715 VIA Labs, Inc. VL817 SATA Adaptor

   Bus 004 Device 004: ID 2109:8822 VIA Labs, Inc. USB3.1 Hub
   Bus 004 Device 002: ID 2109:0822 VIA Labs, Inc. USB3.1 Hub

the host, I think it's this one:
   01:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller [1b21:2142]

Smart info on those 4 drives:
   root@video:~# for i in b c d e; do smartctl -a /dev/sd$i|grep Device; done
   Device Model:     WDC WD40EURX-63BMCY0
   LU WWN Device Id: 5 0014ee 2bca07681
   Device is:        Not in smartctl database [for details use: -P showall]
   Device Model:     WDC WD40EURX-63BMCY0
   LU WWN Device Id: 5 0014ee 26720014d
   Device is:        Not in smartctl database [for details use: -P showall]
   Device Model:     WDC WD40EURX-63BMCY0
   LU WWN Device Id: 5 0014ee 267201249
   Device is:        Not in smartctl database [for details use: -P showall]
   Device Model:     WDC WD40EURX-63BMCY0
   LU WWN Device Id: 5 0014ee 2674ab7b3
   Device is:        Not in smartctl database [for details use: -P showall]

Do you have any suggestion?

Thank you.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with USB device
  2024-08-23 19:18 Strange issues with USB device Marc SCHAEFER
@ 2024-08-24  6:44 ` Michał Pecio
  2024-08-24  6:51   ` Marc SCHAEFER
  2024-08-25 16:32   ` Strange issues with USB device Marc SCHAEFER
  0 siblings, 2 replies; 25+ messages in thread
From: Michał Pecio @ 2024-08-24  6:44 UTC (permalink / raw)
  To: schaefer; +Cc: linux-usb

Hi,

What sort of kernel is this?

Looks like there are some stream-related xhci_hcd bugs here. You may
have more luck with UAS disabled if you need this system working ASAP.

Not only SMART is failing, but also ordinary block reads and maybe
writes as well, so it could potentially be a risk of data loss.

For debugging, it would help to have a reliable trigger, like "happens
every time I run smartctl manually" or "every time I do it while
'cat /dev/sdc >/dev/null'" or at least "often while <some workload>".

I am able to find the PCI adapter responsible for a USB bus like this:

$ ls -d /sys/bus/pci/devices/*/usb8
/sys/bus/pci/devices/0000:02:00.0/usb8

Regards,
Michal

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with USB device
  2024-08-24  6:44 ` Michał Pecio
@ 2024-08-24  6:51   ` Marc SCHAEFER
  2024-08-24  8:44     ` Michał Pecio
  2024-08-25 16:32   ` Strange issues with USB device Marc SCHAEFER
  1 sibling, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-08-24  6:51 UTC (permalink / raw)
  To: linux-usb

Hello,

On Sat, Aug 24, 2024 at 08:44:37AM +0200, Micha?? Pecio wrote:
> What sort of kernel is this?

5.10.0-32-amd64 #1 SMP Debian 5.10.223-1 (2024-08-10) x86_64 GNU/Linux

If required, I can try a backport 6.x, or even compile manually
a last stable 6.x

> Looks like there are some stream-related xhci_hcd bugs here. You may
> have more luck with UAS disabled if you need this system working ASAP.

There is no urgency, but yes, I will try that.

> Not only SMART is failing, but also ordinary block reads and maybe
> writes as well, so it could potentially be a risk of data loss.

I am test-qualifying this system for a few weeks now, and that bug
only happened under very high load, apparently.  So far no data
corruption (data comparison is part of the test).

> For debugging, it would help to have a reliable trigger, like "happens
> every time I run smartctl manually" or "every time I do it while
> 'cat /dev/sdc >/dev/null'" or at least "often while <some workload>".

This seems to be encountered when I do an array check with mdadm,
and when I adjust the min and max speed to the max.

> I am able to find the PCI adapter responsible for a USB bus like this:
> 
> $ ls -d /sys/bus/pci/devices/*/usb8
> /sys/bus/pci/devices/0000:02:00.0/usb8

Thank you.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with USB device
  2024-08-24  6:51   ` Marc SCHAEFER
@ 2024-08-24  8:44     ` Michał Pecio
  2024-08-26  5:17       ` Marc SCHAEFER
  0 siblings, 1 reply; 25+ messages in thread
From: Michał Pecio @ 2024-08-24  8:44 UTC (permalink / raw)
  To: schaefer; +Cc: linux-usb

This driver received lots of fixes and improvements in the last few
years.

The newer release you can get, the better. Fixes aren't backported
immediately, and some aren't backported at all because of too many
changes in surrounding code.

If it still doesn't work in v6.6, check v6.10 or v6.11-rc5 tomorrow
even if you aren't going to use these branches long term.

Regards,
Michal

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with USB device
  2024-08-24  6:44 ` Michał Pecio
  2024-08-24  6:51   ` Marc SCHAEFER
@ 2024-08-25 16:32   ` Marc SCHAEFER
  1 sibling, 0 replies; 25+ messages in thread
From: Marc SCHAEFER @ 2024-08-25 16:32 UTC (permalink / raw)
  To: linux-usb

Hello,

Thank you for your input.

On Sat, Aug 24, 2024 at 08:44:37AM +0200, Micha?? Pecio wrote:
> Looks like there are some stream-related xhci_hcd bugs here. You may
> have more luck with UAS disabled if you need this system working ASAP.

I did this first.  I still have regular SMART bugs every 20 minutes:

   Aug 25 17:58:35 video udisksd[896]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K7KNYR4H: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)

However, during ~6 hours of intense reading, I had only once an USB issue:

Aug 25 17:31:25 video kernel: [12176.640245] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Aug 25 17:31:25 video kernel: [12176.727876] usb 4-1.1.4: reset SuperSpeedPlus Gen 2 USB device number 9 using xhci_hcd

No other consequence.

My next step will be, as suggested, to try the latest kernel release of today,
this time without UAS disabled.

> I am able to find the PCI adapter responsible for a USB bus like this:
> 
> $ ls -d /sys/bus/pci/devices/*/usb8
> /sys/bus/pci/devices/0000:02:00.0/usb8

$ ls -d /sys/bus/pci/devices/*/usb4
/sys/bus/pci/devices/0000:01:00.0/usb4

01:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller
[ 1b21:2142 ] https://linux-hardware.org/?id=pci:1b21-2142-1043-8732

Have a nice day.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with USB device
  2024-08-24  8:44     ` Michał Pecio
@ 2024-08-26  5:17       ` Marc SCHAEFER
  2024-09-03  7:48         ` Strange issues with UAS URB cancellation Michał Pecio
  0 siblings, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-08-26  5:17 UTC (permalink / raw)
  To: linux-usb

Hello,

On Sat, Aug 24, 2024 at 10:44:28AM +0200, Micha?? Pecio wrote:
> The newer release you can get, the better. Fixes aren't backported

So, I did some tests (around 6 hours, a mdadm array check, read-only, with min
and max speed set to max, leading to 350 MByte/s at the start and around 250
MByte/s at the end) with

Linux video 6.11.0-rc5 #1 SMP PREEMPT_DYNAMIC Sun Aug 25 15:13:35 CEST 2024 x86_64 GNU/Linux

(and UAS activated on the 4 drives, opposite to the previous test):

root@video:~# lsusb -t | grep uas
            |__ Port 3: Dev 8, If 0, Class=Mass Storage, Driver=uas, 10000M
            |__ Port 1: Dev 6, If 0, Class=Mass Storage, Driver=uas, 10000M
            |__ Port 4: Dev 9, If 0, Class=Mass Storage, Driver=uas, 10000M
            |__ Port 2: Dev 7, If 0, Class=Mass Storage, Driver=uas, 10000M

The good news is that there aren't systematic USB issues, what I observed is:

1) very regular SMART errors (every ten minutes):

Aug 25 21:31:59 video udisksd[1004]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K2RV8LZ4: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)

2) from time to time, USB errors, sometimes maybe linked to the SMART errors:
   (6 times over 6 hours), see below

  [ sometimes the SMART error is NOT loggued just before or after, but the
    times are so similar every 10 minutes that I guess it is linked
  ]

Aug 25 23:11:59 video udisksd[1004]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K2RV8LZ4: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)
Aug 25 23:11:59 video kernel: [ 7214.544803] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
Aug 25 23:11:59 video kernel: [ 7214.544824] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000001207ab0 trb-start 0000000001207ac0 trb-end 0000000000000000 seg-start 0000000001207000 seg-end 0000000001207ff0
Aug 25 23:11:59 video kernel: [ 7214.544828] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000001207ab0 trb-start 0000000001208000 trb-end 00000000012082c0 seg-start 0000000001208000 seg-end 0000000001208ff0
Aug 25 23:12:30 video kernel: [ 7245.092841] sd 16:0:0:0: [sdc] tag#4 uas_eh_abort_handler 0 uas-tag 5 inflight: IN
Aug 25 23:12:30 video kernel: [ 7245.092851] sd 16:0:0:0: [sdc] tag#4 CDB: Read(16) 88 00 00 00 00 00 8c 5b 44 00 00 00 04 00 00 00
Aug 25 23:12:30 video kernel: [ 7245.093687] sd 16:0:0:0: [sdc] tag#2 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN
Aug 25 23:12:30 video kernel: [ 7245.093704] sd 16:0:0:0: [sdc] tag#2 CDB: Read(16) 88 00 00 00 00 00 8c 5b 64 00 00 00 04 00 00 00
Aug 25 23:12:30 video kernel: [ 7245.094530] sd 16:0:0:0: [sdc] tag#1 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN
Aug 25 23:12:30 video kernel: [ 7245.094548] sd 16:0:0:0: [sdc] tag#1 CDB: Read(16) 88 00 00 00 00 00 8c 5b 60 00 00 00 04 00 00 00
Aug 25 23:12:30 video kernel: [ 7245.095370] sd 16:0:0:0: [sdc] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
Aug 25 23:12:30 video kernel: [ 7245.095377] sd 16:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 8c 5b 68 00 00 00 04 00 00 00
Aug 25 23:12:30 video kernel: [ 7245.116843] scsi host16: uas_eh_device_reset_handler start
Aug 25 23:12:30 video kernel: [ 7245.209034] usb 3-1.1.2: reset SuperSpeed Plus Gen 2x1 USB device number 7 using xhci_hcd
Aug 25 23:12:30 video kernel: [ 7245.256495] scsi host16: uas_eh_device_reset_handler success
 

Aug 26 01:41:59 video udisksd[1004]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K7KNYR4H: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)
Aug 26 01:41:59 video kernel: [16214.490054] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
Aug 26 01:41:59 video kernel: [16214.490077] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000001280f90 trb-start 0000000001280fa0 trb-end 0000000000000000 seg-start 0000000001280000 seg-end 0000000001280ff0
Aug 26 01:41:59 video kernel: [16214.490082] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000001280f90 trb-start 0000000001281000 trb-end 00000000012817a0 seg-start 0000000001281000 seg-end 0000000001281ff0
Aug 26 01:42:31 video kernel: [16246.199014] sd 18:0:0:0: [sde] tag#5 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD IN
Aug 26 01:42:31 video kernel: [16246.199023] sd 18:0:0:0: [sde] tag#5 CDB: Read(16) 88 00 00 00 00 01 25 56 10 00 00 00 04 00 00 00
Aug 26 01:42:31 video kernel: [16246.199839] sd 18:0:0:0: [sde] tag#4 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD IN
Aug 26 01:42:31 video kernel: [16246.199847] sd 18:0:0:0: [sde] tag#4 CDB: Read(16) 88 00 00 00 00 01 25 56 0c 00 00 00 04 00 00 00
Aug 26 01:42:31 video kernel: [16246.200626] sd 18:0:0:0: [sde] tag#3 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD IN
Aug 26 01:42:31 video kernel: [16246.200635] sd 18:0:0:0: [sde] tag#3 CDB: Read(16) 88 00 00 00 00 01 25 56 08 00 00 00 04 00 00 00
Aug 26 01:42:31 video kernel: [16246.201463] sd 18:0:0:0: [sde] tag#1 uas_eh_abort_handler 0 uas-tag 2 inflight: IN
Aug 26 01:42:31 video kernel: [16246.201472] sd 18:0:0:0: [sde] tag#1 CDB: Read(16) 88 00 00 00 00 01 25 55 fc 00 00 00 04 00 00 00
Aug 26 01:42:31 video kernel: [16246.215042] scsi host18: uas_eh_device_reset_handler start
Aug 26 01:42:31 video kernel: [16246.555177] usb 3-1.1.4: reset SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
Aug 26 01:42:31 video kernel: [16246.602825] scsi host18: uas_eh_device_reset_handler success

Aug 26 02:31:59 video kernel: [19214.813966] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
Aug 26 02:31:59 video kernel: [19214.917797] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
Aug 26 02:31:59 video kernel: [19214.917809] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000001210360 trb-start 0000000001210380 trb-end 0000000001210b70 seg-start 0000000001210000 seg-end 0000000001210ff0
Aug 26 02:32:31 video kernel: [19246.567687] sd 16:0:0:0: [sdc] tag#4 uas_eh_abort_handler 0 uas-tag 3 inflight: IN
Aug 26 02:32:31 video kernel: [19246.567697] sd 16:0:0:0: [sdc] tag#4 CDB: Read(16) 88 00 00 00 00 01 51 34 ec 00 00 00 04 00 00 00
Aug 26 02:32:31 video kernel: [19246.568522] sd 16:0:0:0: [sdc] tag#3 uas_eh_abort_handler 0 uas-tag 2 inflight: IN
Aug 26 02:32:31 video kernel: [19246.568531] sd 16:0:0:0: [sdc] tag#3 CDB: Read(16) 88 00 00 00 00 01 51 34 e8 00 00 00 04 00 00 00
Aug 26 02:32:31 video kernel: [19246.569332] sd 16:0:0:0: [sdc] tag#1 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD IN
Aug 26 02:32:31 video kernel: [19246.569340] sd 16:0:0:0: [sdc] tag#1 CDB: Read(16) 88 00 00 00 00 01 51 35 08 00 00 00 04 00 00 00
Aug 26 02:32:31 video kernel: [19246.570142] sd 16:0:0:0: [sdc] tag#0 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD IN
Aug 26 02:32:31 video kernel: [19246.570151] sd 16:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 01 51 35 04 00 00 00 04 00 00 00
Aug 26 02:32:31 video kernel: [19246.591712] scsi host16: uas_eh_device_reset_handler start
Aug 26 02:32:31 video kernel: [19246.683819] usb 3-1.1.2: reset SuperSpeed Plus Gen 2x1 USB device number 7 using xhci_hcd
Aug 26 02:32:31 video kernel: [19246.731831] scsi host16: uas_eh_device_reset_handler success
 
Aug 26 03:21:59 video kernel: [22214.861402] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
Aug 26 03:21:59 video udisksd[1004]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K7KNYR4H: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)
Aug 26 03:21:59 video kernel: [22214.881310] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
Aug 26 03:21:59 video kernel: [22214.881318] xhci_hcd 0000:01:00.0: Looking for event-dma 000000000128ea40 trb-start 000000000128ea50 trb-end 0000000000000000 seg-start 000000000128e000 seg-end 000000000128eff0
Aug 26 03:21:59 video kernel: [22214.881320] xhci_hcd 0000:01:00.0: Looking for event-dma 000000000128ea40 trb-start 000000000128f000 trb-end 000000000128f250 seg-start 000000000128f000 seg-end 000000000128fff0
Aug 26 03:21:59 video udisksd[1004]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K2RV8LZ4: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)
Aug 26 03:22:29 video kernel: [22244.884466] sd 18:0:0:0: [sde] tag#5 uas_eh_abort_handler 0 uas-tag 5 inflight: IN
Aug 26 03:22:29 video kernel: [22244.884476] sd 18:0:0:0: [sde] tag#5 CDB: Read(16) 88 00 00 00 00 01 79 18 bc 00 00 00 04 00 00 00
Aug 26 03:22:29 video kernel: [22244.912424] sd 18:0:0:0: [sde] tag#1 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN
Aug 26 03:22:29 video kernel: [22244.912435] sd 18:0:0:0: [sde] tag#1 CDB: Read(16) 88 00 00 00 00 01 79 18 dc 00 00 00 04 00 00 00
Aug 26 03:22:29 video kernel: [22244.913011] sd 18:0:0:0: [sde] tag#0 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN
Aug 26 03:22:29 video kernel: [22244.913016] sd 18:0:0:0: [sde] tag#0 CDB: Read(16) 88 00 00 00 00 01 79 18 d8 00 00 00 04 00 00 00
Aug 26 03:22:29 video kernel: [22244.944420] sd 18:0:0:0: [sde] tag#2 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD IN
Aug 26 03:22:29 video kernel: [22244.944430] sd 18:0:0:0: [sde] tag#2 CDB: Read(16) 88 00 00 00 00 01 79 18 e0 00 00 00 04 00 00 00
Aug 26 03:22:29 video kernel: [22244.964465] scsi host18: uas_eh_device_reset_handler start
Aug 26 03:22:29 video kernel: [22245.056593] usb 3-1.1.4: reset SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
Aug 26 03:22:29 video kernel: [22245.104078] scsi host18: uas_eh_device_reset_handler success

Aug 26 04:11:59 video udisksd[1004]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K7KNYR4H: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)
Aug 26 04:11:59 video kernel: [25214.758361] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
Aug 26 04:11:59 video udisksd[1004]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K2RV8LZ4: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)
Aug 26 04:11:59 video kernel: [25214.781786] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
Aug 26 04:11:59 video kernel: [25214.781797] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000012046b0 trb-start 00000000012046c0 trb-end 0000000001204eb0 seg-start 0000000001204000 seg-end 0000000001204ff0
Aug 26 04:12:30 video kernel: [25245.249595] sd 16:0:0:0: [sdc] tag#5 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD IN
Aug 26 04:12:30 video kernel: [25245.249605] sd 16:0:0:0: [sdc] tag#5 CDB: Read(16) 88 00 00 00 00 01 9d 33 38 00 00 00 04 00 00 00
Aug 26 04:12:30 video kernel: [25245.250410] sd 16:0:0:0: [sdc] tag#4 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD IN
Aug 26 04:12:30 video kernel: [25245.250418] sd 16:0:0:0: [sdc] tag#4 CDB: Read(16) 88 00 00 00 00 01 9d 33 34 00 00 00 04 00 00 00
Aug 26 04:12:30 video kernel: [25245.251185] sd 16:0:0:0: [sdc] tag#1 uas_eh_abort_handler 0 uas-tag 3 inflight: IN
Aug 26 04:12:30 video kernel: [25245.251193] sd 16:0:0:0: [sdc] tag#1 CDB: Read(16) 88 00 00 00 00 01 9d 33 18 00 00 00 04 00 00 00
Aug 26 04:12:30 video kernel: [25245.251990] sd 16:0:0:0: [sdc] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
Aug 26 04:12:30 video kernel: [25245.251998] sd 16:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 01 9d 33 3c 00 00 00 04 00 00 00
Aug 26 04:12:30 video kernel: [25245.277631] scsi host16: uas_eh_device_reset_handler start
Aug 26 04:12:30 video kernel: [25245.373630] usb 3-1.1.2: reset SuperSpeed Plus Gen 2x1 USB device number 7 using xhci_hcd
Aug 26 04:12:30 video kernel: [25245.421419] scsi host16: uas_eh_device_reset_handler success

Aug 26 04:21:59 video kernel: [25815.021994] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
Aug 26 04:21:59 video kernel: [25815.040719] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
Aug 26 04:21:59 video kernel: [25815.040730] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000012836b0 trb-start 00000000012836c0 trb-end 0000000001283eb0 seg-start 0000000001283000 seg-end 0000000001283ff0
Aug 26 04:22:07 video kernel: [25822.291130] sd 18:0:0:0: [sde] tag#2 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD IN
Aug 26 04:22:07 video kernel: [25822.291140] sd 18:0:0:0: [sde] tag#2 CDB: ATA command pass through(12)/Blank a1 08 2e d0 01 00 4f c2 00 b0 00 00
Aug 26 04:22:18 video kernel: [25834.151565] usb 3-1.1.4: stat urb: no pending cmd for uas-tag 4
Aug 26 04:22:30 video kernel: [25845.331588] sd 18:0:0:0: [sde] tag#5 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD IN
Aug 26 04:22:30 video kernel: [25845.331607] sd 18:0:0:0: [sde] tag#5 CDB: Read(16) 88 00 00 00 00 01 a3 b9 1c 00 00 00 04 00 00 00
Aug 26 04:22:30 video kernel: [25845.332384] sd 18:0:0:0: [sde] tag#4 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
Aug 26 04:22:30 video kernel: [25845.332392] sd 18:0:0:0: [sde] tag#4 CDB: Read(16) 88 00 00 00 00 01 a3 b9 24 00 00 00 04 00 00 00
Aug 26 04:22:30 video kernel: [25845.333197] sd 18:0:0:0: [sde] tag#3 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD IN
Aug 26 04:22:30 video kernel: [25845.333205] sd 18:0:0:0: [sde] tag#3 CDB: Read(16) 88 00 00 00 00 01 a3 b9 20 00 00 00 04 00 00 00
Aug 26 04:22:30 video kernel: [25845.334032] sd 18:0:0:0: [sde] tag#1 uas_eh_abort_handler 0 uas-tag 3 inflight: IN
Aug 26 04:22:30 video kernel: [25845.334041] sd 18:0:0:0: [sde] tag#1 CDB: Read(16) 88 00 00 00 00 01 a3 b9 04 00 00 00 04 00 00 00
Aug 26 04:22:30 video kernel: [25845.355520] scsi host18: uas_eh_device_reset_handler start
Aug 26 04:22:30 video kernel: [25845.455779] usb 3-1.1.4: reset SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
Aug 26 04:22:30 video kernel: [25845.503395] scsi host18: uas_eh_device_reset_handler success

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-08-26  5:17       ` Marc SCHAEFER
@ 2024-09-03  7:48         ` Michał Pecio
  2024-09-03 12:55           ` Marc SCHAEFER
  0 siblings, 1 reply; 25+ messages in thread
From: Michał Pecio @ 2024-09-03  7:48 UTC (permalink / raw)
  To: schaefer; +Cc: linux-usb

[-- Attachment #1: Type: text/plain, Size: 1159 bytes --]

Hi,

[ I changed the title a little. And please use "reply to all" when
responding to the list, so that everybody involved in the thread gets
the response directly, regardless of linux-usb subscription status. ]

This 6.11-rc5 result is still not looking great, and neither is the
non-UAS 5.10.223 for that matter. These xHCI errors shouldn't be there;
it looks like there are still some bugs in URB cancellation code or in
the hardware maybe.

I'm not very familiar with streams stuff, but I remember having similar
symptoms on isochronous devices due to a particular hardware bug (not
on ASMedia though) and due to some event processing bugs.

I can offer this patch for testing (on 6.11-rc5), which tries to:
1. work around (hypothetical) similar HW bugs on non-NEC controllers
2. generally add more error checking in this area
3. log all transfer completion events and print the log on errors

This may shed more light on what's going on and why things are failing.
Knowing xHCI driver, in each of those cases it probably gets completely
stuck, until SCSI layer loses patience 30 seconds later and requests a
reset of the storage device.

Regards,
Michal

[-- Attachment #2: strange-debug.patch --]
[-- Type: text/x-patch, Size: 4073 bytes --]

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 4ea2c3e072a9..0acdcf4d28db 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -814,11 +814,16 @@ static void xhci_unmap_td_bounce_buffer(struct xhci_hcd *xhci,
 	seg->bounce_offs = 0;
 }
 
-static int xhci_td_cleanup(struct xhci_hcd *xhci, struct xhci_td *td,
-			   struct xhci_ring *ep_ring, int status)
+static int __xhci_td_cleanup(struct xhci_hcd *xhci, struct xhci_td *td,
+			   struct xhci_ring *ep_ring, int status, const char *from)
 {
 	struct urb *urb = NULL;
 
+	ep_ring->cleanup_first_dma = xhci_trb_virt_to_dma(td->start_seg, td->first_trb);
+	ep_ring->cleanup_last_dma = xhci_trb_virt_to_dma(td->last_trb_seg, td->last_trb);
+	ep_ring->cleanup_status = status;
+	ep_ring->cleanup_from = from;
+
 	/* Clean up the endpoint's TD list */
 	urb = td->urb;
 
@@ -862,6 +867,8 @@ static int xhci_td_cleanup(struct xhci_hcd *xhci, struct xhci_td *td,
 	return 0;
 }
 
+#define xhci_td_cleanup(xhci, td, ep_ring, status) __xhci_td_cleanup(xhci, td, ep_ring, status, __func__)
+
 
 /* Complete the cancelled URBs we unlinked from td_list. */
 static void xhci_giveback_invalidated_tds(struct xhci_virt_ep *ep)
@@ -1153,14 +1160,13 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 			/*
 			 * NEC uPD720200 sometimes sets this state and fails with
 			 * Context Error while continuing to process TRBs.
-			 * Be conservative and trust EP_CTX_STATE on other chips.
 			 */
-			if (!(xhci->quirks & XHCI_NEC_HOST))
-				break;
+			xhci_info(xhci, "Stop Endpoint Context Error & Stopped on slot %d ep %d\n",
+					slot_id, ep_index);
 			fallthrough;
 		case EP_STATE_RUNNING:
 			/* Race, HW handled stop ep cmd before ep was running */
-			xhci_dbg(xhci, "Stop ep completion ctx error, ep is running\n");
+			xhci_info(xhci, "Stop ep completion ctx error, ep is running\n");
 
 			command = xhci_alloc_command(xhci, false, GFP_ATOMIC);
 			if (!command) {
@@ -1176,6 +1182,11 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 		}
 	}
 
+	int state = GET_EP_CTX_STATE(ep_ctx);
+	if (comp_code != COMP_SUCCESS || state != EP_STATE_STOPPED)
+		xhci_err(xhci, "Stop Endpoint on slot %d ep %d failed with comp %d ctx_state %d\n",
+				slot_id, ep_index, comp_code, state);
+
 	/* will queue a set TR deq if stopped on a cancelled, uncleared TD */
 	xhci_invalidate_cancelled_tds(ep);
 	ep->ep_state &= ~EP_STOP_CMD_PENDING;
@@ -2636,6 +2647,11 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	if (!ep_ring)
 		return handle_transferless_tx_event(xhci, ep, trb_comp_code);
 
+	snprintf(&ep_ring->log_b[ep_ring->log_i++][0], 100,
+			"event %px ep_trb_dma %llx comp_code %d len %d slot %d ep %d",
+			event, ep_trb_dma, trb_comp_code, EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)), slot_id, ep_index);
+	ep_ring->log_i %= 5;
+
 	/* Look for common error cases */
 	switch (trb_comp_code) {
 	/* Skip codes that require special handling depending on
@@ -2863,6 +2879,11 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 					"comp_code %u\n", ep_index,
 					trb_comp_code);
 				trb_in_td(xhci, td, ep_trb_dma, true);
+				xhci_info(xhci, "last xhci_td_cleanup: first_dma %llx last_dma %llx status %d from %s\n",
+						ep_ring->cleanup_first_dma, ep_ring->cleanup_last_dma,
+						ep_ring->cleanup_status, ep_ring->cleanup_from);
+				for (int i = 0; i < 5; i++)
+					xhci_info(xhci, "handle_tx_event log %2d: %s\n", i-4, &ep_ring->log_b[(ep_ring->log_i + i) % 5][0]);
 
 				return -ESHUTDOWN;
 			}
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index ebd0afd59a60..cec743d86a3b 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1361,6 +1361,14 @@ struct xhci_ring {
 	enum xhci_ring_type	type;
 	bool			last_td_was_short;
 	struct radix_tree_root	*trb_address_map;
+
+	dma_addr_t		cleanup_first_dma;
+	dma_addr_t		cleanup_last_dma;
+	const char		*cleanup_from;
+	int			cleanup_status;
+
+	int			log_i;
+	char			log_b[5][100];
 };
 
 struct xhci_erst_entry {

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-03  7:48         ` Strange issues with UAS URB cancellation Michał Pecio
@ 2024-09-03 12:55           ` Marc SCHAEFER
  2024-09-03 13:22             ` Michał Pecio
  0 siblings, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-09-03 12:55 UTC (permalink / raw)
  To: Micha?? Pecio; +Cc: linux-usb

Hello,


On Tue, Sep 03, 2024 at 09:48:22AM +0200, Micha?? Pecio wrote:
> [ I changed the title a little. And please use "reply to all" when

I hope it now works like you want it.

> I can offer this patch for testing (on 6.11-rc5), which tries to:

Thank you, I applied + recompiled.  After a while (~1 hour), I got a lot of
messages in dmesg, slowing down the md raidcheck to a crawl.

So, after about two hours of md raidcheck of the array at ~330 MByte/s:

Sep  3 13:53:43 video kernel: [ 7193.705199] xhci_hcd 0000:01:00.0: Stop Endpoint Context Error & Stopped on slot 12 ep 10
Sep  3 13:53:43 video kernel: [ 7193.705209] xhci_hcd 0000:01:00.0: Stop ep completion ctx error, ep is running
Sep  3 13:53:43 video kernel: [ 7193.705461] xhci_hcd 0000:01:00.0: Stop Endpoint Context Error & Stopped on slot 12 ep 10
Sep  3 13:53:43 video kernel: [ 7193.705469] xhci_hcd 0000:01:00.0: Stop ep completion ctx error, ep is running
Sep  3 13:53:43 video kernel: [ 7193.705754] xhci_hcd 0000:01:00.0: Stop Endpoint Context Error & Stopped on slot 12 ep 10
Sep  3 13:53:43 video kernel: [ 7193.705759] xhci_hcd 0000:01:00.0: Stop ep completion ctx error, ep is running

[ ... and many like this ... ]

then it gets more interesting:

Sep  3 13:53:55 video kernel: [ 7205.983631] sd 18:0:0:0: [sde] tag#5 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD IN 
Sep  3 13:53:55 video kernel: [ 7205.983635] sd 18:0:0:0: [sde] tag#5 CDB: Read(16) 88 00 00 00 00 00 8c cf e4 00 00 00 04 00 00 00
Sep  3 13:54:21 video kernel: [ 7232.213557] usb 3-1.1.4: stat urb: no pending cmd for uas-tag 6

then the kernel complains about:

Sep  3 13:56:41 video kernel: [ 7371.838240] INFO: task smartd:906 blocked for more than 120 seconds.
Sep  3 13:56:41 video kernel: [ 7371.838246]       Tainted: G            E      6.11.0-rc5 #2
Sep  3 13:56:41 video kernel: [ 7371.838248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep  3 13:56:41 video kernel: [ 7371.838249] task:smartd          state:D stack:0     pid:906   tgid:906   ppid:1      flags:0x00000000

(and a few other utility programs, including the md0_resync kernel task)

then:

Sep  3 14:00:42 video kernel: [ 7613.502312] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

Thank you :)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-03 12:55           ` Marc SCHAEFER
@ 2024-09-03 13:22             ` Michał Pecio
  2024-09-03 13:50               ` Marc SCHAEFER
  0 siblings, 1 reply; 25+ messages in thread
From: Michał Pecio @ 2024-09-03 13:22 UTC (permalink / raw)
  To: Marc SCHAEFER; +Cc: linux-usb

[-- Attachment #1: Type: text/plain, Size: 637 bytes --]

> I hope it now works like you want it.
Yes, that was it.

> Thank you, I applied + recompiled.  After a while (~1 hour), I got a
> lot of messages in dmesg, slowing down the md raidcheck to a crawl.
Sorry about that, this was not intended. It looks like your controller
doesn't tolerate the NEC workaround.

Here's a fixed patch which shouldn't have this effect. It only prints
a warning that something fishy is going on and lets it happen. We will
see if it's correlated with the other problems or not.


If using the patch utility, you will need to undo the original change
with -R flag before applying this new one.

Regards,
Michal

[-- Attachment #2: strange-debug-2.patch --]
[-- Type: text/x-patch, Size: 4004 bytes --]

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 4ea2c3e072a9..0a055737ec33 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -814,11 +814,16 @@ static void xhci_unmap_td_bounce_buffer(struct xhci_hcd *xhci,
 	seg->bounce_offs = 0;
 }
 
-static int xhci_td_cleanup(struct xhci_hcd *xhci, struct xhci_td *td,
-			   struct xhci_ring *ep_ring, int status)
+static int __xhci_td_cleanup(struct xhci_hcd *xhci, struct xhci_td *td,
+			   struct xhci_ring *ep_ring, int status, const char *from)
 {
 	struct urb *urb = NULL;
 
+	ep_ring->cleanup_first_dma = xhci_trb_virt_to_dma(td->start_seg, td->first_trb);
+	ep_ring->cleanup_last_dma = xhci_trb_virt_to_dma(td->last_trb_seg, td->last_trb);
+	ep_ring->cleanup_status = status;
+	ep_ring->cleanup_from = from;
+
 	/* Clean up the endpoint's TD list */
 	urb = td->urb;
 
@@ -862,6 +867,8 @@ static int xhci_td_cleanup(struct xhci_hcd *xhci, struct xhci_td *td,
 	return 0;
 }
 
+#define xhci_td_cleanup(xhci, td, ep_ring, status) __xhci_td_cleanup(xhci, td, ep_ring, status, __func__)
+
 
 /* Complete the cancelled URBs we unlinked from td_list. */
 static void xhci_giveback_invalidated_tds(struct xhci_virt_ep *ep)
@@ -1155,12 +1162,14 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 			 * Context Error while continuing to process TRBs.
 			 * Be conservative and trust EP_CTX_STATE on other chips.
 			 */
+			xhci_info(xhci, "Stop Endpoint Context Error & Stopped on slot %d ep %d\n",
+					slot_id, ep_index);
 			if (!(xhci->quirks & XHCI_NEC_HOST))
 				break;
 			fallthrough;
 		case EP_STATE_RUNNING:
 			/* Race, HW handled stop ep cmd before ep was running */
-			xhci_dbg(xhci, "Stop ep completion ctx error, ep is running\n");
+			xhci_info(xhci, "Stop ep completion ctx error, ep is running\n");
 
 			command = xhci_alloc_command(xhci, false, GFP_ATOMIC);
 			if (!command) {
@@ -1176,6 +1185,11 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 		}
 	}
 
+	int state = GET_EP_CTX_STATE(ep_ctx);
+	if (comp_code != COMP_SUCCESS || state != EP_STATE_STOPPED)
+		xhci_err(xhci, "Stop Endpoint on slot %d ep %d failed with comp %d ctx_state %d\n",
+				slot_id, ep_index, comp_code, state);
+
 	/* will queue a set TR deq if stopped on a cancelled, uncleared TD */
 	xhci_invalidate_cancelled_tds(ep);
 	ep->ep_state &= ~EP_STOP_CMD_PENDING;
@@ -2636,6 +2650,11 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	if (!ep_ring)
 		return handle_transferless_tx_event(xhci, ep, trb_comp_code);
 
+	snprintf(&ep_ring->log_b[ep_ring->log_i++][0], 100,
+			"event %px ep_trb_dma %llx comp_code %d len %d slot %d ep %d",
+			event, ep_trb_dma, trb_comp_code, EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)), slot_id, ep_index);
+	ep_ring->log_i %= 5;
+
 	/* Look for common error cases */
 	switch (trb_comp_code) {
 	/* Skip codes that require special handling depending on
@@ -2863,6 +2882,11 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 					"comp_code %u\n", ep_index,
 					trb_comp_code);
 				trb_in_td(xhci, td, ep_trb_dma, true);
+				xhci_info(xhci, "last xhci_td_cleanup: first_dma %llx last_dma %llx status %d from %s\n",
+						ep_ring->cleanup_first_dma, ep_ring->cleanup_last_dma,
+						ep_ring->cleanup_status, ep_ring->cleanup_from);
+				for (int i = 0; i < 5; i++)
+					xhci_info(xhci, "handle_tx_event log %2d: %s\n", i-4, &ep_ring->log_b[(ep_ring->log_i + i) % 5][0]);
 
 				return -ESHUTDOWN;
 			}
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index ebd0afd59a60..cec743d86a3b 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1361,6 +1361,14 @@ struct xhci_ring {
 	enum xhci_ring_type	type;
 	bool			last_td_was_short;
 	struct radix_tree_root	*trb_address_map;
+
+	dma_addr_t		cleanup_first_dma;
+	dma_addr_t		cleanup_last_dma;
+	const char		*cleanup_from;
+	int			cleanup_status;
+
+	int			log_i;
+	char			log_b[5][100];
 };
 
 struct xhci_erst_entry {

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-03 13:22             ` Michał Pecio
@ 2024-09-03 13:50               ` Marc SCHAEFER
  2024-09-03 13:52                 ` Marc SCHAEFER
  0 siblings, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-09-03 13:50 UTC (permalink / raw)
  To: Micha?? Pecio; +Cc: linux-usb

Re,

On Tue, Sep 03, 2024 at 03:22:18PM +0200, Micha?? Pecio wrote:
> > Thank you, I applied + recompiled.  After a while (~1 hour), I got a
> > lot of messages in dmesg, slowing down the md raidcheck to a crawl.
> Sorry about that, this was not intended. It looks like your controller
> doesn't tolerate the NEC workaround.

It's a test system, no worry.

Just patched as instructed & recompiled & re-run the same test, probably won't
hear from me for the next few hours.

Sep  3 15:49:32 video kernel: [  110.660593] md: check of RAID array md0

Have a nice evening!

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-03 13:50               ` Marc SCHAEFER
@ 2024-09-03 13:52                 ` Marc SCHAEFER
  2024-09-03 13:55                   ` Marc SCHAEFER
  0 siblings, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-09-03 13:52 UTC (permalink / raw)
  To: Micha?? Pecio; +Cc: linux-usb

On Tue, Sep 03, 2024 at 03:50:17PM +0200, Marc SCHAEFER wrote:
> Just patched as instructed & recompiled & re-run the same test, probably won't
> hear from me for the next few hours.
> 
> Sep  3 15:49:32 video kernel: [  110.660593] md: check of RAID array md0

Well:

Sep  3 15:50:43 video kernel: [  181.320257] sd 15:0:0:0: [sdb] tag#5 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN 
Sep  3 15:50:43 video kernel: [  181.320268] sd 15:0:0:0: [sdb] tag#5 CDB: Read(16) 88 00 00 00 00 00 00 d5 10 00 00 00 04 00 00 00
Sep  3 15:50:43 video kernel: [  181.321080] sd 15:0:0:0: [sdb] tag#4 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD IN 
Sep  3 15:50:43 video kernel: [  181.321088] sd 15:0:0:0: [sdb] tag#4 CDB: Read(16) 88 00 00 00 00 00 00 d5 08 00 00 00 04 00 00 00
Sep  3 15:50:43 video kernel: [  181.321897] sd 15:0:0:0: [sdb] tag#3 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN 
Sep  3 15:50:43 video kernel: [  181.321905] sd 15:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 00 d5 14 00 00 00 04 00 00 00
Sep  3 15:50:43 video kernel: [  181.322675] sd 15:0:0:0: [sdb] tag#1 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN 
Sep  3 15:50:43 video kernel: [  181.322683] sd 15:0:0:0: [sdb] tag#1 CDB: Read(16) 88 00 00 00 00 00 00 d5 0c 00 00 00 04 00 00 00
Sep  3 15:50:43 video kernel: [  181.352247] scsi host15: uas_eh_device_reset_handler start
Sep  3 15:50:43 video kernel: [  181.444451] usb 3-1.1.1: reset SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
Sep  3 15:50:43 video kernel: [  181.491830] scsi host15: uas_eh_device_reset_handler success

Sep  3 15:51:14 video kernel: [  212.052745] scsi host15: uas_eh_device_reset_handler start
Sep  3 15:51:14 video kernel: [  212.057515] sd 15:0:0:0: [sdb] tag#3 uas_zap_pending 0 uas-tag 3 inflight: CMD 
Sep  3 15:51:14 video kernel: [  212.057524] sd 15:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 00 d5 10 00 00 00 04 00 00 00
Sep  3 15:51:14 video kernel: [  212.057529] sd 15:0:0:0: [sdb] tag#4 uas_zap_pending 0 uas-tag 4 inflight: CMD 
Sep  3 15:51:14 video kernel: [  212.057531] sd 15:0:0:0: [sdb] tag#4 CDB: Read(16) 88 00 00 00 00 00 00 d5 08 00 00 00 04 00 00 00
Sep  3 15:51:14 video kernel: [  212.057534] sd 15:0:0:0: [sdb] tag#0 uas_zap_pending 0 uas-tag 5 inflight: CMD 
Sep  3 15:51:14 video kernel: [  212.057537] sd 15:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 d5 14 00 00 00 04 00 00 00
Sep  3 15:51:14 video kernel: [  212.057540] sd 15:0:0:0: [sdb] tag#5 uas_zap_pending 0 uas-tag 6 inflight: CMD 
Sep  3 15:51:14 video kernel: [  212.057542] sd 15:0:0:0: [sdb] tag#5 CDB: Read(16) 88 00 00 00 00 00 00 d5 0c 00 00 00 04 00 00 00
Sep  3 15:51:14 video kernel: [  212.148874] usb 3-1.1.1: reset SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
Sep  3 15:51:14 video kernel: [  212.196890] scsi host15: uas_eh_device_reset_handler success

Sep  3 15:51:44 video kernel: [  242.757206] scsi host15: uas_eh_device_reset_handler start
Sep  3 15:51:44 video kernel: [  242.761872] sd 15:0:0:0: [sdb] tag#4 uas_zap_pending 0 uas-tag 3 inflight: CMD 
Sep  3 15:51:44 video kernel: [  242.761881] sd 15:0:0:0: [sdb] tag#4 CDB: Read(16) 88 00 00 00 00 00 00 d5 14 00 00 00 04 00 00 00
Sep  3 15:51:44 video kernel: [  242.761885] sd 15:0:0:0: [sdb] tag#0 uas_zap_pending 0 uas-tag 4 inflight: CMD 
Sep  3 15:51:44 video kernel: [  242.761888] sd 15:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 d5 10 00 00 00 04 00 00 00
Sep  3 15:51:44 video kernel: [  242.761891] sd 15:0:0:0: [sdb] tag#1 uas_zap_pending 0 uas-tag 5 inflight: CMD 
Sep  3 15:51:44 video kernel: [  242.761893] sd 15:0:0:0: [sdb] tag#1 CDB: Read(16) 88 00 00 00 00 00 00 d5 08 00 00 00 04 00 00 00
Sep  3 15:51:44 video kernel: [  242.761896] sd 15:0:0:0: [sdb] tag#5 uas_zap_pending 0 uas-tag 6 inflight: CMD 
Sep  3 15:51:44 video kernel: [  242.761898] sd 15:0:0:0: [sdb] tag#5 CDB: Read(16) 88 00 00 00 00 00 00 d5 0c 00 00 00 04 00 00 00
Sep  3 15:51:44 video kernel: [  242.853347] usb 3-1.1.1: reset SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
Sep  3 15:51:44 video kernel: [  242.901194] scsi host15: uas_eh_device_reset_handler success

Sep  3 15:52:15 video kernel: [  273.493674] scsi host15: uas_eh_device_reset_handler start
Sep  3 15:52:15 video kernel: [  273.498377] sd 15:0:0:0: [sdb] tag#3 uas_zap_pending 0 uas-tag 3 inflight: CMD 
Sep  3 15:52:15 video kernel: [  273.498387] sd 15:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 00 d5 10 00 00 00 04 00 00 00
Sep  3 15:52:15 video kernel: [  273.498391] sd 15:0:0:0: [sdb] tag#4 uas_zap_pending 0 uas-tag 4 inflight: CMD 
Sep  3 15:52:15 video kernel: [  273.498394] sd 15:0:0:0: [sdb] tag#4 CDB: Read(16) 88 00 00 00 00 00 00 d5 08 00 00 00 04 00 00 00
Sep  3 15:52:15 video kernel: [  273.498397] sd 15:0:0:0: [sdb] tag#0 uas_zap_pending 0 uas-tag 5 inflight: CMD 
Sep  3 15:52:15 video kernel: [  273.498399] sd 15:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 d5 14 00 00 00 04 00 00 00
Sep  3 15:52:15 video kernel: [  273.498402] sd 15:0:0:0: [sdb] tag#5 uas_zap_pending 0 uas-tag 6 inflight: CMD 
Sep  3 15:52:15 video kernel: [  273.498404] sd 15:0:0:0: [sdb] tag#5 CDB: Read(16) 88 00 00 00 00 00 00 d5 0c 00 00 00 04 00 00 00
Sep  3 15:52:15 video kernel: [  273.841780] usb 3-1.1.1: reset SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
Sep  3 15:52:15 video kernel: [  273.889421] scsi host15: uas_eh_device_reset_handler success

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-03 13:52                 ` Marc SCHAEFER
@ 2024-09-03 13:55                   ` Marc SCHAEFER
  2024-09-03 15:45                     ` Michał Pecio
  0 siblings, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-09-03 13:55 UTC (permalink / raw)
  To: Micha?? Pecio; +Cc: linux-usb

On Tue, Sep 03, 2024 at 03:52:43PM +0200, Marc SCHAEFER wrote:
> Sep  3 15:52:15 video kernel: [  273.889421] scsi host15: uas_eh_device_reset_handler success

and:

Sep  3 15:52:34 video kernel: [  293.012852] xhci_hcd 0000:01:00.0: Stop Endpoint Context Error & Stopped on slot 9 ep 10
Sep  3 15:52:34 video kernel: [  293.012862] xhci_hcd 0000:01:00.0: Stop Endpoint on slot 9 ep 10 failed with comp 19 ctx_state 3
Sep  3 15:52:46 video kernel: [  304.214157] scsi host15: uas_eh_device_reset_handler start
Sep  3 15:52:46 video kernel: [  304.216530] xhci_hcd 0000:01:00.0: Stop Endpoint Context Error & Stopped on slot 9 ep 10
Sep  3 15:52:46 video kernel: [  304.216538] xhci_hcd 0000:01:00.0: Stop Endpoint on slot 9 ep 10 failed with comp 19 ctx_state 3
Sep  3 15:52:46 video kernel: [  304.240297] sd 15:0:0:0: [sdb] tag#1 uas_zap_pending 0 uas-tag 3 inflight: CMD 
Sep  3 15:52:46 video kernel: [  304.240302] sd 15:0:0:0: [sdb] tag#1 CDB: Read(16) 88 00 00 00 00 00 00 d5 14 00 00 00 04 00 00 00
Sep  3 15:52:46 video kernel: [  304.240303] sd 15:0:0:0: [sdb] tag#2 uas_zap_pending 0 uas-tag 4 inflight: CMD 
Sep  3 15:52:46 video kernel: [  304.240304] sd 15:0:0:0: [sdb] tag#2 CDB: Read(16) 88 00 00 00 00 00 00 d5 10 00 00 00 04 00 00 00
Sep  3 15:52:46 video kernel: [  304.240306] sd 15:0:0:0: [sdb] tag#3 uas_zap_pending 0 uas-tag 5 inflight: CMD 
Sep  3 15:52:46 video kernel: [  304.240307] sd 15:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 00 d5 08 00 00 00 04 00 00 00
Sep  3 15:52:46 video kernel: [  304.240308] sd 15:0:0:0: [sdb] tag#4 uas_zap_pending 0 uas-tag 6 inflight: CMD 
Sep  3 15:52:46 video kernel: [  304.240309] sd 15:0:0:0: [sdb] tag#4 CDB: Read(16) 88 00 00 00 00 00 00 d5 0c 00 00 00 04 00 00 00
Sep  3 15:52:46 video kernel: [  304.248055] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 9 ep 10 with no TDs queued?
Sep  3 15:52:46 video kernel: [  304.334450] usb 3-1.1.1: reset SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
Sep  3 15:52:46 video kernel: [  304.381956] scsi host15: uas_eh_device_reset_handler success
Sep  3 15:53:05 video kernel: [  323.565319] xhci_hcd 0000:01:00.0: Stop Endpoint Context Error & Stopped on slot 9 ep 10
Sep  3 15:53:05 video kernel: [  323.565329] xhci_hcd 0000:01:00.0: Stop Endpoint on slot 9 ep 10 failed with comp 19 ctx_state 3
Sep  3 15:53:05 video kernel: [  323.565691] sd 15:0:0:0: [sdb] tag#4 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=173s
Sep  3 15:53:05 video kernel: [  323.565698] sd 15:0:0:0: [sdb] tag#4 Sense Key : Illegal Request [current] 
Sep  3 15:53:05 video kernel: [  323.565701] sd 15:0:0:0: [sdb] tag#4 Add. Sense: Invalid command operation code
Sep  3 15:53:05 video kernel: [  323.565705] sd 15:0:0:0: [sdb] tag#4 CDB: Read(16) 88 00 00 00 00 00 00 d5 00 00 00 00 04 00 00 00
Sep  3 15:53:05 video kernel: [  323.565707] critical target error, dev sdb, sector 13959168 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0
Sep  3 15:53:24 video kernel: [  342.838610] xhci_hcd 0000:01:00.0: Stop Endpoint Context Error & Stopped on slot 9 ep 10
Sep  3 15:53:24 video kernel: [  342.838619] xhci_hcd 0000:01:00.0: Stop Endpoint on slot 9 ep 10 failed with comp 19 ctx_state 3
Sep  3 15:53:37 video kernel: [  355.402892] sd 15:0:0:0: [sdb] tag#4 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN 
Sep  3 15:53:37 video kernel: [  355.402898] sd 15:0:0:0: [sdb] tag#4 CDB: Read(16) 88 00 00 00 00 00 00 d5 18 00 00 00 04 00 00 00
Sep  3 15:53:37 video kernel: [  355.403050] xhci_hcd 0000:01:00.0: Stop Endpoint Context Error & Stopped on slot 9 ep 10
Sep  3 15:53:37 video kernel: [  355.403054] xhci_hcd 0000:01:00.0: Stop Endpoint on slot 9 ep 10 failed with comp 19 ctx_state 3
Sep  3 15:53:37 video kernel: [  355.426900] scsi host15: uas_eh_device_reset_handler start
Sep  3 15:53:37 video kernel: [  355.434945] sd 15:0:0:0: [sdb] tag#3 uas_zap_pending 0 uas-tag 5 inflight: CMD 
Sep  3 15:53:37 video kernel: [  355.434951] sd 15:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 00 d5 08 00 00 00 04 00 00 00
Sep  3 15:53:37 video kernel: [  355.434953] sd 15:0:0:0: [sdb] tag#5 uas_zap_pending 0 uas-tag 6 inflight: CMD 
Sep  3 15:53:37 video kernel: [  355.434954] sd 15:0:0:0: [sdb] tag#5 CDB: Read(16) 88 00 00 00 00 00 00 d5 0c 00 00 00 04 00 00 00
Sep  3 15:53:37 video kernel: [  355.526944] usb 3-1.1.1: reset SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
Sep  3 15:53:37 video kernel: [  355.573871] scsi host15: uas_eh_device_reset_handler success
Sep  3 15:53:37 video kernel: [  355.638997] sd 15:0:0:0: [sdb] tag#2 timing out command, waited 180s
Sep  3 15:53:37 video kernel: [  355.639014] sd 15:0:0:0: [sdb] tag#2 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=205s
Sep  3 15:53:37 video kernel: [  355.639017] sd 15:0:0:0: [sdb] tag#2 Sense Key : Not Ready [current] 
Sep  3 15:53:37 video kernel: [  355.639019] sd 15:0:0:0: [sdb] tag#2 Add. Sense: Logical unit is in process of becoming ready
Sep  3 15:53:37 video kernel: [  355.639022] sd 15:0:0:0: [sdb] tag#2 CDB: Read(16) 88 00 00 00 00 00 00 d5 10 00 00 00 04 00 00 00
Sep  3 15:53:37 video kernel: [  355.639024] I/O error, dev sdb, sector 13963264 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0
Sep  3 15:53:37 video kernel: [  355.639043] sd 15:0:0:0: [sdb] tag#3 timing out command, waited 180s
Sep  3 15:53:37 video kernel: [  355.639046] sd 15:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_RESET driverbyte=DRIVER_OK cmd_age=205s
Sep  3 15:53:37 video kernel: [  355.639048] sd 15:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 00 d5 08 00 00 00 04 00 00 00
Sep  3 15:53:37 video kernel: [  355.639049] I/O error, dev sdb, sector 13961216 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0
Sep  3 15:53:37 video kernel: [  355.639059] sd 15:0:0:0: [sdb] tag#5 timing out command, waited 180s
Sep  3 15:53:37 video kernel: [  355.639061] sd 15:0:0:0: [sdb] tag#5 FAILED Result: hostbyte=DID_RESET driverbyte=DRIVER_OK cmd_age=205s
Sep  3 15:53:37 video kernel: [  355.639063] sd 15:0:0:0: [sdb] tag#5 CDB: Read(16) 88 00 00 00 00 00 00 d5 0c 00 00 00 04 00 00 00
Sep  3 15:53:37 video kernel: [  355.639064] I/O error, dev sdb, sector 13962240 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0
Sep  3 15:53:56 video kernel: [  374.766217] xhci_hcd 0000:01:00.0: Stop Endpoint Context Error & Stopped on slot 9 ep 10
Sep  3 15:53:56 video kernel: [  374.766227] xhci_hcd 0000:01:00.0: Stop Endpoint on slot 9 ep 10 failed with comp 19 ctx_state 3
Sep  3 15:53:56 video kernel: [  374.766587] sd 15:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=225s
Sep  3 15:53:56 video kernel: [  374.766593] sd 15:0:0:0: [sdb] tag#3 Sense Key : Illegal Request [current] 
Sep  3 15:53:56 video kernel: [  374.766596] sd 15:0:0:0: [sdb] tag#3 Add. Sense: Invalid command operation code
Sep  3 15:53:56 video kernel: [  374.766601] sd 15:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 00 d5 04 00 00 00 04 00 00 00
Sep  3 15:53:56 video kernel: [  374.766603] critical target error, dev sdb, sector 13960192 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-03 13:55                   ` Marc SCHAEFER
@ 2024-09-03 15:45                     ` Michał Pecio
  2024-09-03 19:40                       ` Marc SCHAEFER
  0 siblings, 1 reply; 25+ messages in thread
From: Michał Pecio @ 2024-09-03 15:45 UTC (permalink / raw)
  To: Marc SCHAEFER; +Cc: linux-usb

> On Tue, Sep 03, 2024 at 03:52:43PM +0200, Marc SCHAEFER wrote:
> Sep  3 15:53:37 video kernel: [  355.639064] I/O error, dev sdb, sector 13962240 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0
> Sep  3 15:53:56 video kernel: [  374.766217] xhci_hcd 0000:01:00.0: Stop Endpoint Context Error & Stopped on slot 9 ep 10
> Sep  3 15:53:56 video kernel: [  374.766227] xhci_hcd 0000:01:00.0: Stop Endpoint on slot 9 ep 10 failed with comp 19 ctx_state 3
> Sep  3 15:53:56 video kernel: [  374.766587] sd 15:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=225s
> Sep  3 15:53:56 video kernel: [  374.766593] sd 15:0:0:0: [sdb] tag#3 Sense Key : Illegal Request [current] 
> Sep  3 15:53:56 video kernel: [  374.766596] sd 15:0:0:0: [sdb] tag#3 Add. Sense: Invalid command operation code
> Sep  3 15:53:56 video kernel: [  374.766601] sd 15:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 00 d5 04 00 00 00 04 00 00 00
> Sep  3 15:53:56 video kernel: [  374.766603] critical target error, dev sdb, sector 13960192 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 0

Hmm, this is possibly not a coincidence, but it's also not the same
"ERROR Transfer event TRB DMA ptr not part of current TD" that happened
last week. Let me know if you see the other one, because those messages
should now contain more information thanks to the patch.

But this is interesting in its own right, I will have to think about it.
Maybe it's not a HW bug but a SW one, maybe also on my NEC controller.

Regards,
Michal

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-03 15:45                     ` Michał Pecio
@ 2024-09-03 19:40                       ` Marc SCHAEFER
  2024-09-04 14:26                         ` Mathias Nyman
  0 siblings, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-09-03 19:40 UTC (permalink / raw)
  To: Micha?? Pecio; +Cc: linux-usb

Re, 

On Tue, Sep 03, 2024 at 05:45:35PM +0200, Micha?? Pecio wrote:
> Hmm, this is possibly not a coincidence, but it's also not the same
> "ERROR Transfer event TRB DMA ptr not part of current TD" that happened

Got one:

Sep  3 21:32:58 video kernel: [11408.230466] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
Sep  3 21:32:58 video udisksd[962]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K6KTRC7F: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)
Sep  3 21:32:58 video kernel: [11408.247189] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
Sep  3 21:32:58 video kernel: [11408.247197] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000d9911140 trb-start 00000000d9911150 trb-end 00000000d9911940 seg-start 00000000d9911000 seg-end 00000000d9911ff0
Sep  3 21:32:58 video kernel: [11408.247199] xhci_hcd 0000:01:00.0: last xhci_td_cleanup: first_dma d9911140 last_dma d9911140 status 0 from xhci_handle_cmd_set_deq
Sep  3 21:32:58 video kernel: [11408.247201] xhci_hcd 0000:01:00.0: handle_tx_event log -4: event ffff9e6659920b10 ep_trb_dma d9911910 comp_code 1 len 0 slot 9 ep 10
Sep  3 21:32:58 video kernel: [11408.247203] xhci_hcd 0000:01:00.0: handle_tx_event log -3: event ffff9e66599210b0 ep_trb_dma d9910120 comp_code 1 len 0 slot 9 ep 10
Sep  3 21:32:58 video kernel: [11408.247204] xhci_hcd 0000:01:00.0: handle_tx_event log -2: event ffff9e66599215c0 ep_trb_dma d9910920 comp_code 1 len 0 slot 9 ep 10
Sep  3 21:32:58 video kernel: [11408.247204] xhci_hcd 0000:01:00.0: handle_tx_event log -1: event ffff9e6659921a60 ep_trb_dma d9911130 comp_code 1 len 0 slot 9 ep 10
Sep  3 21:32:58 video kernel: [11408.247205] xhci_hcd 0000:01:00.0: handle_tx_event log  0: event ffff9e6659920440 ep_trb_dma d9911140 comp_code 3 len 0 slot 9 ep 10
Sep  3 21:32:58 video udisksd[962]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K5ZZ7U4J: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)
Sep  3 21:33:29 video kernel: [11439.353265] sd 15:0:0:0: [sdb] tag#5 uas_eh_abort_handler 0 uas-tag 4 inflight: IN
Sep  3 21:33:29 video kernel: [11439.353283] sd 15:0:0:0: [sdb] tag#5 CDB: Read(16) 88 00 00 00 00 00 ca c1 6c 00 00 00 04 00 00 00
Sep  3 21:33:29 video kernel: [11439.354066] sd 15:0:0:0: [sdb] tag#4 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN
Sep  3 21:33:29 video kernel: [11439.354073] sd 15:0:0:0: [sdb] tag#4 CDB: Read(16) 88 00 00 00 00 00 ca c1 88 00 00 00 04 00 00 00
Sep  3 21:33:29 video kernel: [11439.354864] sd 15:0:0:0: [sdb] tag#3 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN
Sep  3 21:33:29 video kernel: [11439.354872] sd 15:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 ca c1 8c 00 00 00 04 00 00 00
Sep  3 21:33:29 video kernel: [11439.355647] sd 15:0:0:0: [sdb] tag#0 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD IN
Sep  3 21:33:29 video kernel: [11439.355655] sd 15:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 ca c1 90 00 00 00 04 00 00 00
Sep  3 21:33:29 video kernel: [11439.385194] scsi host15: uas_eh_device_reset_handler start
Sep  3 21:33:29 video kernel: [11439.477362] usb 4-1.1.1: reset SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
Sep  3 21:33:29 video kernel: [11439.524986] scsi host15: uas_eh_device_reset_handler success

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-03 19:40                       ` Marc SCHAEFER
@ 2024-09-04 14:26                         ` Mathias Nyman
  2024-09-04 16:36                           ` Marc SCHAEFER
  0 siblings, 1 reply; 25+ messages in thread
From: Mathias Nyman @ 2024-09-04 14:26 UTC (permalink / raw)
  To: Marc SCHAEFER, Micha?? Pecio; +Cc: linux-usb

Hi

Looks like there are two cases that might be related.

One is that the device seems to send more data than host asks for.
This triggers a babble error pointing to a already returned transfer, so error
does not get properly handled.

Other issue is the "WARN Set TR Deq Ptr cmd invalid because of stream ID configuration" error.

xhci driver queues a set TR Deq Ptr command when canceling transfers.
This error shown if there is an TRB_ERROR in the actual command we queue.

I can start working on some debugging patches as well if you have the time to try
them out.

More details inlined in log below:

On 3.9.2024 22.40, Marc SCHAEFER wrote:
> Re,
> 
> On Tue, Sep 03, 2024 at 05:45:35PM +0200, Micha?? Pecio wrote:
>> Hmm, this is possibly not a coincidence, but it's also not the same
>> "ERROR Transfer event TRB DMA ptr not part of current TD" that happened
> 
> Got one:
> 
> Sep  3 21:32:58 video kernel: [11408.230466] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
Set TR Deq command completes with TRB_ERROR, meaning the command xhci driver queues was faulty.
I'm guessing we somehow mess up the stream ID when xhci driver craetes the TRB.

> Sep  3 21:32:58 video udisksd[962]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WDC_WD40EURX_63BMCY0_WD_WCC7K6KTRC7F: Error updating SMART data: sk_disk_smart_read_data: Operation not supported (udisks-error-quark, 0)
> Sep  3 21:32:58 video kernel: [11408.247189] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
Transfer event with comp_code 3 which is "Babble Detected Error", but the event doesn't point to a pending transfer

> Sep  3 21:32:58 video kernel: [11408.247197] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000d9911140 trb-start 00000000d9911150 trb-end 00000000d9911940 seg-start 00000000d9911000 seg-end 00000000d9911ff0
The "Babble Detected Error" event points to transfer at 0000000d9911140,
this is one transfer block before the expected trasnfer 0000000d9911150.

This means the Babble Detected Error was intended for the previous transfers, which xhci driver has
already given back to class driver.

A Babble error will halt the endpoint, but xhci driver doesn't recover the endpoint as
event doesn't map to any transfer. This needs to be fixed.

Thanks
Mathias

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-04 14:26                         ` Mathias Nyman
@ 2024-09-04 16:36                           ` Marc SCHAEFER
  2024-09-05 13:52                             ` Mathias Nyman
  0 siblings, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-09-04 16:36 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Micha?? Pecio, linux-usb

Hello,

On Wed, Sep 04, 2024 at 05:26:28PM +0300, Mathias Nyman wrote:
> I can start working on some debugging patches as well if you have the time to try

Yes, with pleasure.  I often have time to recompile stuff, and I can leave the test
system running.

Have a nice evening.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-04 16:36                           ` Marc SCHAEFER
@ 2024-09-05 13:52                             ` Mathias Nyman
  2024-09-05 15:01                               ` Marc SCHAEFER
  0 siblings, 1 reply; 25+ messages in thread
From: Mathias Nyman @ 2024-09-05 13:52 UTC (permalink / raw)
  To: Marc SCHAEFER; +Cc: Micha?? Pecio, linux-usb

[-- Attachment #1: Type: text/plain, Size: 5704 bytes --]

On 4.9.2024 19.36, Marc SCHAEFER wrote:
> Hello,
> 
> On Wed, Sep 04, 2024 at 05:26:28PM +0300, Mathias Nyman wrote:
>> I can start working on some debugging patches as well if you have the time to try
> 
> Yes, with pleasure.  I often have time to recompile stuff, and I can leave the test
> system running.

Thanks, testpatch below, and also as attachment.

8<---

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index efdf4c228b8c..b5f97f75a5ff 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1749,14 +1749,14 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
  
         i = urb_priv->num_tds_done;
         if (i < urb_priv->num_tds)
-               xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
-                               "Cancel URB %p, dev %s, ep 0x%x, "
-                               "starting at offset 0x%llx",
-                               urb, urb->dev->devpath,
-                               urb->ep->desc.bEndpointAddress,
-                               (unsigned long long) xhci_trb_virt_to_dma(
-                                       urb_priv->td[i].start_seg,
-                                       urb_priv->td[i].first_trb));
+               xhci_warn(xhci,
+                         "Cancel URB %p, dev %s, ep 0x%x, stream_id %u starting at offset 0x%llx",
+                         urb, urb->dev->devpath,
+                         urb->ep->desc.bEndpointAddress,
+                         urb->stream_id,
+                         (unsigned long long) xhci_trb_virt_to_dma(
+                                 urb_priv->td[i].start_seg,
+                                 urb_priv->td[i].first_trb));
  
         for (; i < urb_priv->num_tds; i++) {
                 td = &urb_priv->td[i];
(END)
+       char str[XHCI_MSG_MAX];
  
         ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3]));
         stream_id = TRB_TO_STREAM_ID(le32_to_cpu(trb->generic.field[2]));
@@ -1351,6 +1352,11 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
         if (!ep_ring) {
                 xhci_warn(xhci, "WARN Set TR deq ptr command for freed stream ID %u\n",
                                 stream_id);
+               xhci_warn(xhci, "MN: %s\n" ,
+                            xhci_decode_trb(str, XHCI_MSG_MAX, le32_to_cpu(trb->generic.field[0]),
+                                           le32_to_cpu(trb->generic.field[1]),
+                                           le32_to_cpu(trb->generic.field[2]),
+                                          le32_to_cpu(trb->generic.field[3])));
                 /* XXX: Harmless??? */
                 goto cleanup;
         }
@@ -1386,6 +1392,12 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
                                         cmd_comp_code);
                         break;
                 }
+               xhci_warn(xhci, "MN: %s\n",
+                         xhci_decode_trb(str, XHCI_MSG_MAX, le32_to_cpu(trb->generic.field[0]),
+                                         le32_to_cpu(trb->generic.field[1]),
+                                         le32_to_cpu(trb->generic.field[2]),
+                                         le32_to_cpu(trb->generic.field[3])));
+
                 /* OK what do we do now?  The endpoint state is hosed, and we
                  * should never get to this point if the synchronization between
                  * queueing, and endpoint state are correct.  This might happen
@@ -2864,6 +2876,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
                                         trb_comp_code);
                                 trb_in_td(xhci, td, ep_trb_dma, true);
  
+                               if (xhci_halted_host_endpoint(ep_ctx, trb_comp_code)) {
+                                       xhci_err(xhci, "MN: No TD found, fix halted ep");
+                                       xhci_handle_halted_endpoint(xhci, ep, NULL, EP_HARD_RESET);
+                               } else {
+                                       xhci_err(xhci, "MN: No TD found, ep not halted");
+                               }
                                 return -ESHUTDOWN;
                         }
                 }
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index efdf4c228b8c..b5f97f75a5ff 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1749,14 +1749,14 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
  
         i = urb_priv->num_tds_done;
         if (i < urb_priv->num_tds)
-               xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
-                               "Cancel URB %p, dev %s, ep 0x%x, "
-                               "starting at offset 0x%llx",
-                               urb, urb->dev->devpath,
-                               urb->ep->desc.bEndpointAddress,
-                               (unsigned long long) xhci_trb_virt_to_dma(
-                                       urb_priv->td[i].start_seg,
-                                       urb_priv->td[i].first_trb));
+               xhci_warn(xhci,
+                         "Cancel URB %p, dev %s, ep 0x%x, stream_id %u starting at offset 0x%llx",
+                         urb, urb->dev->devpath,
+                         urb->ep->desc.bEndpointAddress,
+                         urb->stream_id,
+                         (unsigned long long) xhci_trb_virt_to_dma(
+                                 urb_priv->td[i].start_seg,
+                                 urb_priv->td[i].first_trb));
  
         for (; i < urb_priv->num_tds; i++) {
                 td = &urb_priv->td[i];

[-- Attachment #2: 0001-xhci-Debug-patch-handle-halted-ep-if-TD-is-not-found.patch --]
[-- Type: text/x-patch, Size: 4408 bytes --]

From bf7bbf8dbf92dc06e108a103f5f01b3f416339da Mon Sep 17 00:00:00 2001
From: Mathias Nyman <mathias.nyman@linux.intel.com>
Date: Thu, 5 Sep 2024 16:20:22 +0300
Subject: [PATCH] xhci: Debug patch: handle halted ep if TD is not found.

A host side halted endpoint may not be recovered if the Babble error
event does not point to any pending  tranfer descriptor (TD).

Handle halted endpoint event if transfer event does not point to
any pending TD. The Babble error may have been for a TD that has already
been given back.

Aklso add debuiggoing for TRB_ERROR issues seen when queuing a
Set TR Deq pointer command

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 20 +++++++++++++++++++-
 drivers/usb/host/xhci.c      | 16 ++++++++--------
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 4ea2c3e072a9..4ee8b2b42ac5 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1008,7 +1008,7 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
 				if (cached_td) {
 					if (cached_td->urb->stream_id != td->urb->stream_id) {
 						/* Multiple streams case, defer move dq */
-						xhci_dbg(xhci,
+						xhci_warn(xhci,
 							 "Move dq deferred: stream %u URB %p\n",
 							 td->urb->stream_id, td->urb);
 						td->cancel_status = TD_CLEARING_CACHE_DEFERRED;
@@ -1340,6 +1340,7 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
 	struct xhci_slot_ctx *slot_ctx;
 	struct xhci_td *td, *tmp_td;
 	bool deferred = false;
+	char str[XHCI_MSG_MAX];
 
 	ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3]));
 	stream_id = TRB_TO_STREAM_ID(le32_to_cpu(trb->generic.field[2]));
@@ -1351,6 +1352,11 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
 	if (!ep_ring) {
 		xhci_warn(xhci, "WARN Set TR deq ptr command for freed stream ID %u\n",
 				stream_id);
+		xhci_warn(xhci, "MN: %s\n" ,
+			     xhci_decode_trb(str, XHCI_MSG_MAX, le32_to_cpu(trb->generic.field[0]),
+                                           le32_to_cpu(trb->generic.field[1]),
+                                           le32_to_cpu(trb->generic.field[2]),
+	                                   le32_to_cpu(trb->generic.field[3])));
 		/* XXX: Harmless??? */
 		goto cleanup;
 	}
@@ -1386,6 +1392,12 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
 					cmd_comp_code);
 			break;
 		}
+		xhci_warn(xhci, "MN: %s\n",
+			  xhci_decode_trb(str, XHCI_MSG_MAX, le32_to_cpu(trb->generic.field[0]),
+					  le32_to_cpu(trb->generic.field[1]),
+					  le32_to_cpu(trb->generic.field[2]),
+					  le32_to_cpu(trb->generic.field[3])));
+
 		/* OK what do we do now?  The endpoint state is hosed, and we
 		 * should never get to this point if the synchronization between
 		 * queueing, and endpoint state are correct.  This might happen
@@ -2864,6 +2876,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 					trb_comp_code);
 				trb_in_td(xhci, td, ep_trb_dma, true);
 
+				if (xhci_halted_host_endpoint(ep_ctx, trb_comp_code)) {
+					xhci_err(xhci, "MN: No TD found, fix halted ep");
+					xhci_handle_halted_endpoint(xhci, ep, NULL, EP_HARD_RESET);
+				} else {
+					xhci_err(xhci, "MN: No TD found, ep not halted");
+				}
 				return -ESHUTDOWN;
 			}
 		}
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index efdf4c228b8c..b5f97f75a5ff 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1749,14 +1749,14 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 
 	i = urb_priv->num_tds_done;
 	if (i < urb_priv->num_tds)
-		xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
-				"Cancel URB %p, dev %s, ep 0x%x, "
-				"starting at offset 0x%llx",
-				urb, urb->dev->devpath,
-				urb->ep->desc.bEndpointAddress,
-				(unsigned long long) xhci_trb_virt_to_dma(
-					urb_priv->td[i].start_seg,
-					urb_priv->td[i].first_trb));
+		xhci_warn(xhci,
+			  "Cancel URB %p, dev %s, ep 0x%x, stream_id %u starting at offset 0x%llx",
+			  urb, urb->dev->devpath,
+			  urb->ep->desc.bEndpointAddress,
+			  urb->stream_id,
+			  (unsigned long long) xhci_trb_virt_to_dma(
+				  urb_priv->td[i].start_seg,
+				  urb_priv->td[i].first_trb));
 
 	for (; i < urb_priv->num_tds; i++) {
 		td = &urb_priv->td[i];
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-05 13:52                             ` Mathias Nyman
@ 2024-09-05 15:01                               ` Marc SCHAEFER
  2024-09-05 15:06                                 ` Marc SCHAEFER
  0 siblings, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-09-05 15:01 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Micha?? Pecio, linux-usb

Hello,

On Thu, Sep 05, 2024 at 04:52:41PM +0300, Mathias Nyman wrote:
> From bf7bbf8dbf92dc06e108a103f5f01b3f416339da Mon Sep 17 00:00:00 2001
> From: Mathias Nyman <mathias.nyman@linux.intel.com>
> Date: Thu, 5 Sep 2024 16:20:22 +0300
> Subject: [PATCH] xhci: Debug patch: handle halted ep if TD is not found.

[  608.093834] xhci_hcd 0000:01:00.0: Cancel URB 000000000dbb6adc, dev 1.1.1, ep 0x85, stream_id 4 starting at offset 0xd99660d0
[  608.094084] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
[  608.094092] xhci_hcd 0000:01:00.0: MN: Set TR Dequeue Pointer Command: deq 00000000d99660e3 stream 4 slot 9 ep 11 flags C
[  608.102907] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
[  608.102917] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000d99660d0 trb-start 00000000d99660e0 trb-end 00000000d99668d0 seg-start 00000000d9966000 seg-end 00000000d9966ff0
[  608.102932] xhci_hcd 0000:01:00.0: MN: No TD found, fix halted ep
[  615.255042] sd 15:0:0:0: [sdb] tag#1 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN 
[  615.255053] sd 15:0:0:0: [sdb] tag#1 CDB: ATA command pass through(12)/Blank a1 08 2e d0 01 00 4f c2 00 b0 00 00
[  615.255061] xhci_hcd 0000:01:00.0: Cancel URB 00000000d6ee9176, dev 1.1.1, ep 0x85, stream_id 3 starting at offset 0xd9964440
[  627.213658] xhci_hcd 0000:01:00.0: Cancel URB 000000003f481828, dev 1.1.1, ep 0x85, stream_id 4 starting at offset 0xd99660e0
[  627.213702] usb 3-1.1.1: stat urb: no pending cmd for uas-tag 3
[  640.087462] sd 15:0:0:0: [sdb] tag#5 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD IN 
[  640.087473] sd 15:0:0:0: [sdb] tag#5 CDB: Read(16) 88 00 00 00 00 00 0b b1 38 00 00 00 04 00 00 00
[  640.087480] xhci_hcd 0000:01:00.0: Cancel URB 0000000023438a34, dev 1.1.1, ep 0x85, stream_id 6 starting at offset 0xd996a7a0
[  640.088192] sd 15:0:0:0: [sdb] tag#4 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN 
[  640.088196] sd 15:0:0:0: [sdb] tag#4 CDB: Read(16) 88 00 00 00 00 00 0b b1 34 00 00 00 04 00 00 00
[  640.088199] xhci_hcd 0000:01:00.0: Cancel URB 000000004a1a39e1, dev 1.1.1, ep 0x85, stream_id 1 starting at offset 0xd99609a0
[  640.088906] sd 15:0:0:0: [sdb] tag#3 uas_eh_abort_handler 0 uas-tag 5 inflight: IN 
[  640.088910] sd 15:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 0b b1 1c 00 00 00 04 00 00 00
[  640.088913] xhci_hcd 0000:01:00.0: Cancel URB 0000000051d05311, dev 1.1.1, ep 0x85, stream_id 5 starting at offset 0xd9968c80
[  640.089630] sd 15:0:0:0: [sdb] tag#2 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN 
[  640.089633] sd 15:0:0:0: [sdb] tag#2 CDB: Read(16) 88 00 00 00 00 00 0b b1 3c 00 00 00 04 00 00 00
[  640.089636] xhci_hcd 0000:01:00.0: Cancel URB 0000000097b7903b, dev 1.1.1, ep 0x85, stream_id 2 starting at offset 0xd9962840
[  640.111444] scsi host15: uas_eh_device_reset_handler start
[  640.111455] xhci_hcd 0000:01:00.0: Cancel URB 0000000009cd9f08, dev 1.1.1, ep 0x87, stream_id 2 starting at offset 0xd999b580
[  640.111874] xhci_hcd 0000:01:00.0: Cancel URB 000000006f25243c, dev 1.1.1, ep 0x87, stream_id 6 starting at offset 0xd9988d60
[  640.112370] xhci_hcd 0000:01:00.0: Cancel URB 00000000e23080f4, dev 1.1.1, ep 0x87, stream_id 1 starting at offset 0x121d280
[  640.203759] usb 3-1.1.1: reset SuperSpeed Plus Gen 2x1 USB device number 6 using xhci_hcd
[  640.251136] scsi host15: uas_eh_device_reset_handler success
[  640.530396] xhci_hcd 0000:01:00.0: Cancel URB 00000000b1755fe5, dev 1.1.2, ep 0x85, stream_id 2 starting at offset 0x1241110
[  640.530677] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
[  640.530686] xhci_hcd 0000:01:00.0: MN: Set TR Dequeue Pointer Command: deq 0000000001241123 stream 2 slot 10 ep 11 flags C
[  640.601761] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
[  640.601773] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000001241110 trb-start 0000000001241120 trb-end 0000000001241910 seg-start 0000000001241000 seg-end 0000000001241ff0
[  640.601776] xhci_hcd 0000:01:00.0: MN: No TD found, fix halted ep
[  659.704587] xhci_hcd 0000:01:00.0: Cancel URB 000000006e0f08cb, dev 1.1.2, ep 0x85, stream_id 2 starting at offset 0x1241120
[  659.704641] xhci_hcd 0000:01:00.0: Cancel URB 00000000d606866e, dev 1.1.2, ep 0x85, stream_id 3 starting at offset 0x1242b10
[  659.704737] xhci_hcd 0000:01:00.0: Move dq deferred: stream 3 URB 00000000d606866e
[  659.797268] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 10 ep 10 with no TDs queued?
[  670.808024] sd 16:0:0:0: [sdc] tag#5 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD IN 
[  670.808041] sd 16:0:0:0: [sdc] tag#5 CDB: Read(16) 88 00 00 00 00 00 0b b1 e8 00 00 00 04 00 00 00
[  670.808049] xhci_hcd 0000:01:00.0: Cancel URB 00000000083b09de, dev 1.1.2, ep 0x85, stream_id 6 starting at offset 0x1249590
[  670.808671] sd 16:0:0:0: [sdc] tag#4 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD IN 
[  670.808680] sd 16:0:0:0: [sdc] tag#4 CDB: Read(16) 88 00 00 00 00 00 0b b1 e4 00 00 00 04 00 00 00
[  670.808685] xhci_hcd 0000:01:00.0: Cancel URB 00000000fa6ae129, dev 1.1.2, ep 0x85, stream_id 5 starting at offset 0x1246e80
[  670.816805] sd 16:0:0:0: [sdc] tag#3 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD IN 
[  670.816814] sd 16:0:0:0: [sdc] tag#3 CDB: Read(16) 88 00 00 00 00 00 0b b1 e0 00 00 00 04 00 00 00
[  670.816820] xhci_hcd 0000:01:00.0: Cancel URB 000000004f2648ab, dev 1.1.2, ep 0x85, stream_id 4 starting at offset 0x1245660
[  670.824382] sd 16:0:0:0: [sdc] tag#2 uas_eh_abort_handler 0 uas-tag 1 inflight: IN 
[  670.824391] sd 16:0:0:0: [sdc] tag#2 CDB: Read(16) 88 00 00 00 00 00 0b b1 d0 00 00 00 04 00 00 00
[  670.824396] xhci_hcd 0000:01:00.0: Cancel URB 00000000dacec9a6, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x123e0b0
[  670.839350] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 10 ep 10 with no TDs queued?
[  670.863954] scsi host16: uas_eh_device_reset_handler start
[  670.863965] xhci_hcd 0000:01:00.0: Cancel URB 00000000b5479938, dev 1.1.2, ep 0x87, stream_id 6 starting at offset 0x1238140
[  670.864475] xhci_hcd 0000:01:00.0: Cancel URB 0000000065be6261, dev 1.1.2, ep 0x87, stream_id 5 starting at offset 0x1236250
[  670.865057] xhci_hcd 0000:01:00.0: Cancel URB 00000000f292ee34, dev 1.1.2, ep 0x87, stream_id 4 starting at offset 0x12347a0
[  670.960178] usb 3-1.1.2: reset SuperSpeed Plus Gen 2x1 USB device number 7 using xhci_hcd
[  671.007562] scsi host16: uas_eh_device_reset_handler success
[ 1208.115301] xhci_hcd 0000:01:00.0: Cancel URB 00000000b06303bd, dev 1.1.1, ep 0x85, stream_id 3 starting at offset 0xd99694a0
[ 1208.280245] xhci_hcd 0000:01:00.0: Cancel URB 0000000098d971a6, dev 1.1.4, ep 0x85, stream_id 5 starting at offset 0x12c5370
[ 1208.319764] xhci_hcd 0000:01:00.0: Cancel URB 00000000254728ca, dev 1.1.2, ep 0x85, stream_id 2 starting at offset 0x124a1e0
[ 1808.217512] xhci_hcd 0000:01:00.0: Cancel URB 00000000e5c0481e, dev 1.1.1, ep 0x85, stream_id 3 starting at offset 0xd996a260
[ 1808.465583] xhci_hcd 0000:01:00.0: Cancel URB 0000000038d97298, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12bc070
[ 1808.513103] xhci_hcd 0000:01:00.0: Cancel URB 0000000077091b68, dev 1.1.2, ep 0x85, stream_id 3 starting at offset 0x1249550
[ 2408.314570] xhci_hcd 0000:01:00.0: Cancel URB 0000000013cfeb9e, dev 1.1.4, ep 0x85, stream_id 2 starting at offset 0x12be890
[ 2408.353399] xhci_hcd 0000:01:00.0: Cancel URB 000000000add27c3, dev 1.1.2, ep 0x85, stream_id 3 starting at offset 0x1248eb0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-05 15:01                               ` Marc SCHAEFER
@ 2024-09-05 15:06                                 ` Marc SCHAEFER
  2024-09-05 17:24                                   ` Marc SCHAEFER
  0 siblings, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-09-05 15:06 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Micha?? Pecio, linux-usb

On Thu, Sep 05, 2024 at 05:01:07PM +0200, Marc SCHAEFER wrote:
> [ 2408.314570] xhci_hcd 0000:01:00.0: Cancel URB 0000000013cfeb9e, dev 1.1.4, ep 0x85, stream_id 2 starting at offset 0x12be890
> [ 2408.353399] xhci_hcd 0000:01:00.0: Cancel URB 000000000add27c3, dev 1.1.2, ep 0x85, stream_id 3 starting at offset 0x1248eb0

[ 3008.111367] xhci_hcd 0000:01:00.0: Cancel URB 000000006fcf715a, dev 1.1.1, ep 0x85, stream_id 2 starting at offset 0xd996ba10
[ 3008.322520] xhci_hcd 0000:01:00.0: Cancel URB 000000005b67abcf, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x124d6e0
[ 3008.322786] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
[ 3008.322791] xhci_hcd 0000:01:00.0: MN: Set TR Dequeue Pointer Command: deq 000000000124d6f3 stream 1 slot 10 ep 11 flags C
[ 3008.326893] xhci_hcd 0000:01:00.0: Cancel URB 00000000ac6e4a01, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x124d6f0
[ 3008.415617] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
[ 3008.415633] xhci_hcd 0000:01:00.0: Looking for event-dma 000000000124d6e0 trb-start 000000000124d700 trb-end 000000000124def0 seg-start 000000000124d000 seg-end 000000000124dff0
[ 3008.415640] xhci_hcd 0000:01:00.0: MN: No TD found, fix halted ep
[ 3008.439200] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 4
[ 3008.439221] xhci_hcd 0000:01:00.0: Looking for event-dma 000000000124d6f0 trb-start 000000000124d700 trb-end 000000000124def0 seg-start 000000000124d000 seg-end 000000000124dff0
[ 3008.439225] xhci_hcd 0000:01:00.0: MN: No TD found, fix halted ep
[ 3027.526318] xhci_hcd 0000:01:00.0: Cancel URB 00000000bc2575b3, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x124d700
[ 3027.526352] xhci_hcd 0000:01:00.0: Cancel URB 00000000ca7402e9, dev 1.1.2, ep 0x85, stream_id 3 starting at offset 0x12490d0
[ 3027.557504] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 10 ep 10 with no TDs queued?
[ 3040.380154] sd 16:0:0:0: [sdc] tag#5 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD IN 
[ 3040.380166] sd 16:0:0:0: [sdc] tag#5 CDB: Read(16) 88 00 00 00 00 00 3b 13 98 00 00 00 04 00 00 00
[ 3040.380175] xhci_hcd 0000:01:00.0: Cancel URB 00000000a59c9666, dev 1.1.2, ep 0x85, stream_id 6 starting at offset 0x1242320
[ 3040.380784] sd 16:0:0:0: [sdc] tag#3 uas_eh_abort_handler 0 uas-tag 2 inflight: IN 
[ 3040.380793] sd 16:0:0:0: [sdc] tag#3 CDB: Read(16) 88 00 00 00 00 00 3b 13 6c 00 00 00 04 00 00 00
[ 3040.380799] xhci_hcd 0000:01:00.0: Cancel URB 00000000cb16251e, dev 1.1.2, ep 0x85, stream_id 2 starting at offset 0x124a2d0
[ 3040.388880] sd 16:0:0:0: [sdc] tag#1 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD IN 
[ 3040.388889] sd 16:0:0:0: [sdc] tag#1 CDB: Read(16) 88 00 00 00 00 00 3b 13 94 00 00 00 04 00 00 00
[ 3040.388895] xhci_hcd 0000:01:00.0: Cancel URB 000000002dd4d275, dev 1.1.2, ep 0x85, stream_id 5 starting at offset 0x1245b90
[ 3040.396532] sd 16:0:0:0: [sdc] tag#0 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD IN 
[ 3040.396541] sd 16:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 3b 13 90 00 00 00 04 00 00 00
[ 3040.396547] xhci_hcd 0000:01:00.0: Cancel URB 00000000262be495, dev 1.1.2, ep 0x85, stream_id 4 starting at offset 0x12479c0
[ 3040.411468] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 10 ep 10 with no TDs queued?
[ 3040.440138] scsi host16: uas_eh_device_reset_handler start
[ 3040.440148] xhci_hcd 0000:01:00.0: Cancel URB 00000000c9d6ca1e, dev 1.1.2, ep 0x87, stream_id 6 starting at offset 0x1254470
[ 3040.440567] xhci_hcd 0000:01:00.0: Cancel URB 00000000fc41ecc1, dev 1.1.2, ep 0x87, stream_id 5 starting at offset 0x1256010
[ 3040.441159] xhci_hcd 0000:01:00.0: Cancel URB 000000006fcf715a, dev 1.1.2, ep 0x87, stream_id 4 starting at offset 0x1257280
[ 3040.780444] usb 3-1.1.2: reset SuperSpeed Plus Gen 2x1 USB device number 7 using xhci_hcd
[ 3040.828021] scsi host16: uas_eh_device_reset_handler success

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-05 15:06                                 ` Marc SCHAEFER
@ 2024-09-05 17:24                                   ` Marc SCHAEFER
  2024-09-05 18:20                                     ` Marc SCHAEFER
  2024-09-09 15:24                                     ` Mathias Nyman
  0 siblings, 2 replies; 25+ messages in thread
From: Marc SCHAEFER @ 2024-09-05 17:24 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Micha?? Pecio, linux-usb

On Thu, Sep 05, 2024 at 05:06:15PM +0200, Marc SCHAEFER wrote:
> [ 3040.780444] usb 3-1.1.2: reset SuperSpeed Plus Gen 2x1 USB device number 7 using xhci_hcd
> [ 3040.828021] scsi host16: uas_eh_device_reset_handler success

[ 3608.147001] xhci_hcd 0000:01:00.0: Cancel URB 00000000cfca4963, dev 1.1.1, ep 0x85, stream_id 5 starting at offset 0xd99656b0
[ 3608.347015] xhci_hcd 0000:01:00.0: Cancel URB 000000008c57d9fc, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12bc300
[ 3608.388293] xhci_hcd 0000:01:00.0: Cancel URB 00000000960695fa, dev 1.1.2, ep 0x85, stream_id 4 starting at offset 0x1245780
[ 3608.405579] xhci_hcd 0000:01:00.0: Cancel URB 0000000040df0558, dev 1.1.2, ep 0x85, stream_id 5 starting at offset 0x1246b20
[ 4208.131724] xhci_hcd 0000:01:00.0: Cancel URB 00000000a37db473, dev 1.1.1, ep 0x85, stream_id 3 starting at offset 0xd99696c0
[ 4208.175736] xhci_hcd 0000:01:00.0: Cancel URB 00000000bc2575b3, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12bd960
[ 4208.187844] xhci_hcd 0000:01:00.0: Cancel URB 000000006450b3ee, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x123e010
[ 4808.165397] xhci_hcd 0000:01:00.0: Cancel URB 00000000bcd65caa, dev 1.1.1, ep 0x85, stream_id 6 starting at offset 0xd9964960
[ 4808.439706] xhci_hcd 0000:01:00.0: Cancel URB 000000000d48b0aa, dev 1.1.4, ep 0x85, stream_id 6 starting at offset 0x12c7ef0
[ 4808.439981] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
[ 4808.439984] xhci_hcd 0000:01:00.0: MN: Set TR Dequeue Pointer Command: deq 00000000012c7f02 stream 6 slot 12 ep 11 flags c
[ 4808.450328] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
[ 4808.450339] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000012c7ef0 trb-start 00000000012c7f00 trb-end 0000000000000000 seg-start 00000000012c7000 seg-end 00000000012c7ff0
[ 4808.450342] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000012c7ef0 trb-start 00000000012c6000 trb-end 00000000012c6700 seg-start 00000000012c6000 seg-end 00000000012c6ff0
[ 4808.450345] xhci_hcd 0000:01:00.0: MN: No TD found, fix halted ep
[ 4815.511438] sd 18:0:0:0: [sde] tag#2 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD 
[ 4815.511448] sd 18:0:0:0: [sde] tag#2 CDB: ATA command pass through(12)/Blank a1 06 20 da 00 00 4f c2 00 b0 00 00
[ 4827.561107] xhci_hcd 0000:01:00.0: Cancel URB 00000000f889834b, dev 1.1.4, ep 0x85, stream_id 6 starting at offset 0x12c7f00
[ 4827.561149] usb 3-1.1.4: stat urb: no pending cmd for uas-tag 2
[ 4838.551833] sd 18:0:0:0: [sde] tag#5 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD IN 
[ 4838.551844] sd 18:0:0:0: [sde] tag#5 CDB: Read(16) 88 00 00 00 00 00 5e 0e 68 00 00 00 04 00 00 00
[ 4838.551851] xhci_hcd 0000:01:00.0: Cancel URB 0000000060654a2b, dev 1.1.4, ep 0x85, stream_id 4 starting at offset 0x12c27d0
[ 4838.552637] sd 18:0:0:0: [sde] tag#3 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN 
[ 4838.552646] sd 18:0:0:0: [sde] tag#3 CDB: Read(16) 88 00 00 00 00 00 5e 0e 64 00 00 00 04 00 00 00
[ 4838.552653] xhci_hcd 0000:01:00.0: Cancel URB 00000000c9d6ca1e, dev 1.1.4, ep 0x85, stream_id 3 starting at offset 0x12c16c0
[ 4838.553490] sd 18:0:0:0: [sde] tag#1 uas_eh_abort_handler 0 uas-tag 1 inflight: IN 
[ 4838.553499] sd 18:0:0:0: [sde] tag#1 CDB: Read(16) 88 00 00 00 00 00 5e 0e 4c 00 00 00 04 00 00 00
[ 4838.553505] xhci_hcd 0000:01:00.0: Cancel URB 00000000606f5552, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12bc220
[ 4838.554297] sd 18:0:0:0: [sde] tag#0 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD IN 
[ 4838.554306] sd 18:0:0:0: [sde] tag#0 CDB: Read(16) 88 00 00 00 00 00 5e 0e 6c 00 00 00 04 00 00 00
[ 4838.554311] xhci_hcd 0000:01:00.0: Cancel URB 0000000073658d32, dev 1.1.4, ep 0x85, stream_id 5 starting at offset 0x12c5b70
[ 4838.571830] scsi host18: uas_eh_device_reset_handler start
[ 4838.571839] xhci_hcd 0000:01:00.0: Cancel URB 00000000a59c9666, dev 1.1.4, ep 0x87, stream_id 5 starting at offset 0x12b3730
[ 4838.572354] xhci_hcd 0000:01:00.0: Cancel URB 000000008f565be3, dev 1.1.4, ep 0x87, stream_id 4 starting at offset 0x12b21b0
[ 4838.572852] xhci_hcd 0000:01:00.0: Cancel URB 0000000030251b45, dev 1.1.4, ep 0x87, stream_id 3 starting at offset 0x12b07c0
[ 4838.911904] usb 3-1.1.4: reset SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
[ 4838.959751] scsi host18: uas_eh_device_reset_handler success
[ 4838.961714] xhci_hcd 0000:01:00.0: Cancel URB 0000000070aecd12, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x123e200
[ 5408.178027] xhci_hcd 0000:01:00.0: Cancel URB 000000005133b009, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12ca0a0
[ 5408.197159] xhci_hcd 0000:01:00.0: Cancel URB 0000000042d212a9, dev 1.1.2, ep 0x85, stream_id 4 starting at offset 0x12446f0
[ 6008.377030] xhci_hcd 0000:01:00.0: Cancel URB 00000000d926a669, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12cba60
[ 6008.383668] xhci_hcd 0000:01:00.0: Cancel URB 00000000d926a669, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x123f4f0
[ 6608.120797] xhci_hcd 0000:01:00.0: Cancel URB 00000000420fbfe4, dev 1.1.1, ep 0x85, stream_id 1 starting at offset 0xd996e6f0
[ 6608.346528] xhci_hcd 0000:01:00.0: Cancel URB 00000000b5001cab, dev 1.1.2, ep 0x85, stream_id 2 starting at offset 0x1240990
[ 7208.423485] xhci_hcd 0000:01:00.0: Cancel URB 0000000019740e95, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12cb460
[ 7808.207628] xhci_hcd 0000:01:00.0: Cancel URB 0000000003573fb8, dev 1.1.1, ep 0x85, stream_id 2 starting at offset 0xd996b960
[ 7808.225093] xhci_hcd 0000:01:00.0: Cancel URB 00000000bbf3ec9a, dev 1.1.1, ep 0x85, stream_id 3 starting at offset 0xd996a2e0
[ 7808.400889] xhci_hcd 0000:01:00.0: Cancel URB 0000000061d24d64, dev 1.1.4, ep 0x85, stream_id 2 starting at offset 0x12c8730
[ 8408.150859] xhci_hcd 0000:01:00.0: Cancel URB 00000000bbf3ec9a, dev 1.1.1, ep 0x85, stream_id 1 starting at offset 0xd996e550
[ 8408.261059] xhci_hcd 0000:01:00.0: Cancel URB 0000000011ae71fe, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12cb470
[ 8408.265920] xhci_hcd 0000:01:00.0: Cancel URB 0000000011ae71fe, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12cb480
[ 8408.375649] xhci_hcd 0000:01:00.0: Cancel URB 000000001205207d, dev 1.1.2, ep 0x85, stream_id 2 starting at offset 0x1241630
[ 9008.424016] xhci_hcd 0000:01:00.0: Cancel URB 0000000071f5017f, dev 1.1.4, ep 0x85, stream_id 5 starting at offset 0x12c27b0
[ 9008.482300] xhci_hcd 0000:01:00.0: Cancel URB 0000000052e28d9e, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x123f3e0
[ 9608.216059] xhci_hcd 0000:01:00.0: Cancel URB 0000000030c9f1cb, dev 1.1.1, ep 0x85, stream_id 1 starting at offset 0xd996e440
[ 9608.266058] xhci_hcd 0000:01:00.0: Cancel URB 00000000d926a669, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12ca1c0
[ 9608.442165] xhci_hcd 0000:01:00.0: Cancel URB 00000000cfde56f3, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x123ed90
[ 9608.460643] xhci_hcd 0000:01:00.0: Cancel URB 000000007ec69eb5, dev 1.1.2, ep 0x85, stream_id 2 starting at offset 0x1240520
[10208.257206] xhci_hcd 0000:01:00.0: Cancel URB 00000000757012df, dev 1.1.1, ep 0x85, stream_id 5 starting at offset 0xd9965870
[10208.279752] xhci_hcd 0000:01:00.0: Cancel URB 000000004ce38a09, dev 1.1.1, ep 0x85, stream_id 3 starting at offset 0xd996a830
[10208.454859] xhci_hcd 0000:01:00.0: Cancel URB 00000000911bf602, dev 1.1.4, ep 0x85, stream_id 4 starting at offset 0x12c45d0
[10208.457464] xhci_hcd 0000:01:00.0: Cancel URB 00000000b8b0856c, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x123fdb0
[10808.247403] xhci_hcd 0000:01:00.0: Cancel URB 00000000ac6e4a01, dev 1.1.1, ep 0x85, stream_id 5 starting at offset 0xd9966eb0
[10808.266162] xhci_hcd 0000:01:00.0: Cancel URB 000000005b67abcf, dev 1.1.1, ep 0x85, stream_id 1 starting at offset 0xd996e100
[10808.463915] xhci_hcd 0000:01:00.0: Cancel URB 00000000d1d4ead3, dev 1.1.4, ep 0x85, stream_id 3 starting at offset 0x12c7cb0
[11408.271740] xhci_hcd 0000:01:00.0: Cancel URB 00000000958cc608, dev 1.1.1, ep 0x85, stream_id 5 starting at offset 0xd9966e00
[11408.291484] xhci_hcd 0000:01:00.0: Cancel URB 00000000d4318848, dev 1.1.1, ep 0x85, stream_id 3 starting at offset 0xd996acf0
[11408.487625] xhci_hcd 0000:01:00.0: Cancel URB 000000005870ecd2, dev 1.1.4, ep 0x85, stream_id 3 starting at offset 0x12c7cd0
[11408.493209] xhci_hcd 0000:01:00.0: Cancel URB 00000000f35ec432, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12cb970
[11408.644558] xhci_hcd 0000:01:00.0: Cancel URB 00000000a8590814, dev 1.1.2, ep 0x85, stream_id 4 starting at offset 0x1244520
[11408.644806] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
[11408.644812] xhci_hcd 0000:01:00.0: MN: Set TR Dequeue Pointer Command: deq 0000000001244532 stream 4 slot 10 ep 11 flags c
[11408.752645] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
[11408.752658] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000001244520 trb-start 0000000001244530 trb-end 0000000001244d20 seg-start 0000000001244000 seg-end 0000000001244ff0
[11408.752663] xhci_hcd 0000:01:00.0: MN: No TD found, fix halted ep
[11427.863376] xhci_hcd 0000:01:00.0: Cancel URB 00000000653d8d4e, dev 1.1.2, ep 0x85, stream_id 4 starting at offset 0x1244530
[11427.863417] xhci_hcd 0000:01:00.0: Cancel URB 000000007e4211ac, dev 1.1.2, ep 0x85, stream_id 6 starting at offset 0x1249630
[11427.863517] xhci_hcd 0000:01:00.0: Move dq deferred: stream 6 URB 000000007e4211ac
[11427.887079] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 10 ep 10 with no TDs queued?
[11439.358616] sd 16:0:0:0: [sdc] tag#5 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN 
[11439.358626] sd 16:0:0:0: [sdc] tag#5 CDB: Read(16) 88 00 00 00 00 00 d7 14 0c 00 00 00 04 00 00 00
[11439.358634] xhci_hcd 0000:01:00.0: Cancel URB 00000000467cea42, dev 1.1.2, ep 0x85, stream_id 3 starting at offset 0x12437c0
[11439.359230] sd 16:0:0:0: [sdc] tag#4 uas_eh_abort_handler 0 uas-tag 5 inflight: IN 
[11439.359239] sd 16:0:0:0: [sdc] tag#4 CDB: Read(16) 88 00 00 00 00 00 d7 13 dc 00 00 00 04 00 00 00
[11439.359245] xhci_hcd 0000:01:00.0: Cancel URB 000000005f878ee4, dev 1.1.2, ep 0x85, stream_id 5 starting at offset 0x1246ef0
[11439.367317] sd 16:0:0:0: [sdc] tag#3 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN 
[11439.367326] sd 16:0:0:0: [sdc] tag#3 CDB: Read(16) 88 00 00 00 00 00 d7 14 08 00 00 00 04 00 00 00
[11439.367332] xhci_hcd 0000:01:00.0: Cancel URB 00000000ea5db08c, dev 1.1.2, ep 0x85, stream_id 2 starting at offset 0x1241370
[11439.374888] sd 16:0:0:0: [sdc] tag#1 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN 
[11439.374896] sd 16:0:0:0: [sdc] tag#1 CDB: Read(16) 88 00 00 00 00 00 d7 14 10 00 00 00 04 00 00 00
[11439.374901] xhci_hcd 0000:01:00.0: Cancel URB 00000000c4ea53a0, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x123f770
[11439.389862] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 10 ep 10 with no TDs queued?
[11439.414612] scsi host16: uas_eh_device_reset_handler start
[11439.414624] xhci_hcd 0000:01:00.0: Cancel URB 0000000076469c9b, dev 1.1.2, ep 0x87, stream_id 1 starting at offset 0x125e670
[11439.415117] xhci_hcd 0000:01:00.0: Cancel URB 000000005737d298, dev 1.1.2, ep 0x87, stream_id 3 starting at offset 0x1231a00
[11439.415660] xhci_hcd 0000:01:00.0: Cancel URB 0000000016ac95a1, dev 1.1.2, ep 0x87, stream_id 2 starting at offset 0x125ba20
[11439.506778] usb 3-1.1.2: reset SuperSpeed Plus Gen 2x1 USB device number 7 using xhci_hcd
[11439.554356] scsi host16: uas_eh_device_reset_handler success

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-05 17:24                                   ` Marc SCHAEFER
@ 2024-09-05 18:20                                     ` Marc SCHAEFER
  2024-09-09 15:24                                     ` Mathias Nyman
  1 sibling, 0 replies; 25+ messages in thread
From: Marc SCHAEFER @ 2024-09-05 18:20 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Micha?? Pecio, linux-usb

On Thu, Sep 05, 2024 at 07:24:09PM +0200, Marc SCHAEFER wrote:
> [11439.415660] xhci_hcd 0000:01:00.0: Cancel URB 0000000016ac95a1, dev 1.1.2, ep 0x87, stream_id 2 starting at offset 0x125ba20
> [11439.506778] usb 3-1.1.2: reset SuperSpeed Plus Gen 2x1 USB device number 7 using xhci_hcd
> [11439.554356] scsi host16: uas_eh_device_reset_handler success

will be the last for today, I need to switch that system off; tell me if you need more testing in the next days,
or if you want me to try another patch.

[12008.284174] xhci_hcd 0000:01:00.0: Cancel URB 0000000090b5176d, dev 1.1.1, ep 0x85, stream_id 4 starting at offset 0xd9968570
[12008.304987] xhci_hcd 0000:01:00.0: Cancel URB 000000002e91894e, dev 1.1.1, ep 0x85, stream_id 1 starting at offset 0xd996d3b0
[12008.455398] xhci_hcd 0000:01:00.0: Cancel URB 000000005dc3cf7a, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12cbbd0
[12008.655053] xhci_hcd 0000:01:00.0: Cancel URB 00000000bf6df69e, dev 1.1.2, ep 0x85, stream_id 4 starting at offset 0x12462c0
[12608.587299] xhci_hcd 0000:01:00.0: Cancel URB 000000000bba8481, dev 1.1.4, ep 0x85, stream_id 2 starting at offset 0x12c8160
[12608.710654] xhci_hcd 0000:01:00.0: Cancel URB 0000000081721183, dev 1.1.2, ep 0x85, stream_id 2 starting at offset 0x124b320
[12608.769588] xhci_hcd 0000:01:00.0: Cancel URB 000000003b126d1a, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x124d3f0
[13208.378251] xhci_hcd 0000:01:00.0: Cancel URB 00000000660f02e0, dev 1.1.1, ep 0x85, stream_id 1 starting at offset 0xd996eb90
[13208.510815] xhci_hcd 0000:01:00.0: Cancel URB 0000000041b8cbc4, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12caf70
[13208.520577] xhci_hcd 0000:01:00.0: Cancel URB 00000000631d6a7f, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x124dd80
[13808.312954] xhci_hcd 0000:01:00.0: Cancel URB 0000000002d4a85e, dev 1.1.1, ep 0x85, stream_id 5 starting at offset 0xd9966f90
[13808.338110] xhci_hcd 0000:01:00.0: Cancel URB 00000000ac0cd10f, dev 1.1.1, ep 0x85, stream_id 3 starting at offset 0xd99693f0
[13808.491390] xhci_hcd 0000:01:00.0: Cancel URB 0000000004e1b961, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12cabd0
[13808.494873] xhci_hcd 0000:01:00.0: Cancel URB 00000000bb1acc45, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12cabe0
[13808.575935] xhci_hcd 0000:01:00.0: Cancel URB 0000000030c9f1cb, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x124c1f0
[14408.348038] xhci_hcd 0000:01:00.0: Cancel URB 00000000660f02e0, dev 1.1.1, ep 0x85, stream_id 6 starting at offset 0xd9963ee0
[14408.405056] xhci_hcd 0000:01:00.0: Cancel URB 00000000fa1abcfb, dev 1.1.4, ep 0x85, stream_id 1 starting at offset 0x12cafd0
[14408.536603] xhci_hcd 0000:01:00.0: Cancel URB 00000000cba223fa, dev 1.1.2, ep 0x85, stream_id 4 starting at offset 0x1246a70
[14408.557641] xhci_hcd 0000:01:00.0: Cancel URB 00000000e7e45b0b, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x124db30


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-05 17:24                                   ` Marc SCHAEFER
  2024-09-05 18:20                                     ` Marc SCHAEFER
@ 2024-09-09 15:24                                     ` Mathias Nyman
  2024-09-09 16:21                                       ` Marc SCHAEFER
  1 sibling, 1 reply; 25+ messages in thread
From: Mathias Nyman @ 2024-09-09 15:24 UTC (permalink / raw)
  To: Marc SCHAEFER; +Cc: Micha?? Pecio, linux-usb

On 5.9.2024 20.24, Marc SCHAEFER wrote:
> On Thu, Sep 05, 2024 at 05:06:15PM +0200, Marc SCHAEFER wrote:
>> [ 3040.780444] usb 3-1.1.2: reset SuperSpeed Plus Gen 2x1 USB device number 7 using xhci_hcd
>> [ 3040.828021] scsi host16: uas_eh_device_reset_handler success
> 
> [ 4808.165397] xhci_hcd 0000:01:00.0: Cancel URB 00000000bcd65caa, dev 1.1.1, ep 0x85, stream_id 6 starting at offset 0xd9964960
> [ 4808.439706] xhci_hcd 0000:01:00.0: Cancel URB 000000000d48b0aa, dev 1.1.4, ep 0x85, stream_id 6 starting at offset 0x12c7ef0
> [ 4808.439981] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
> [ 4808.439984] xhci_hcd 0000:01:00.0: MN: Set TR Dequeue Pointer Command: deq 00000000012c7f02 stream 6 slot 12 ep 11 flags c
> [ 4808.450328] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
> [ 4808.450339] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000012c7ef0 trb-start 00000000012c7f00 trb-end 0000000000000000 seg-start 00000000012c7000 seg-end 00000000012c7ff0
> [ 4808.450342] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000012c7ef0 trb-start 00000000012c6000 trb-end 00000000012c6700 seg-start 00000000012c6000 seg-end 00000000012c6ff0
> [ 4808.450345] xhci_hcd 0000:01:00.0: MN: No TD found, fix halted ep

Thanks.

So the failed 'Set TR Deq' command is the start of the issue.

xHC host fails to move the past that cancelled transfer, but xhci driver gives
back the URB anyway. Once ring continues running it starts from that cancelled
transfer, triggering a babble error, pointing to that given back TD.

Still unclear why that 'Set TR Deq' command failed in the first place.

The 'Set TR Deq' command TRB looks fine. Stream ID is ok, (6) and Stream Context
type (SCT) which is bits 3:1 of dequeue 00000000012c7f02 is looks fine (=1, which
means  primary stream array with dequeue pointing to transfer ring)
  
Maybe stream is dropped, or something else is messing with endpoint context
MaxPSstream, or Linear Stream Address (LSA) fields before the 'Set TD Deq' command.
That would trigger stream ID issues like this.

Can I ask you to add some select traces on top of the patch?
They should show content of endpoint context:

as root:

mount -t debugfs none /sys/kernel/debug/tracing
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_handle_command/enable
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_handle_cmd_set_deq/enable
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_handle_cmd_set_deq_ep/enable
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_handle_cmd_config_ep/enable
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_configure_endpoint/enable
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/xhci_configure_endpoint_ctrl_ctx/enable
echo 1 >  /sys/kernel/debug/tracing/tracing_on
<reproduce issue>
Send content of /sys/kernel/debug/tracing/trace

Sample output:

<idle>-0       [013] d.h2.  1606.344913: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 0000000116f65921 stream 0 slot 4 ep 1 flags C
<idle>-0       [013] d.h2.  1606.344915: xhci_handle_cmd_set_deq: RS 00000 full-speed Ctx Entries 9 MEL 0 us Port# 6/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 4 State configured
<idle>-0       [013] d.h2.  1606.344917: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 0 interval 125 us max ESIT payload 0 CErr 3 Type Ctrl burst 0 maxp 64 deq 0000000116f65921 avg trb len 0
<idle>-0       [013] d.h2.  1606.376754: xhci_handle_command: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 4 ep 1 flags t:C

And then send me both the trace and dmesg of triggered case.

Thanks
Mathias

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-09 15:24                                     ` Mathias Nyman
@ 2024-09-09 16:21                                       ` Marc SCHAEFER
  2024-09-11 14:25                                         ` Mathias Nyman
  0 siblings, 1 reply; 25+ messages in thread
From: Marc SCHAEFER @ 2024-09-09 16:21 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Micha?? Pecio, linux-usb

Hello,

On Mon, Sep 09, 2024 at 06:24:23PM +0300, Mathias Nyman wrote:
> > [ 4808.165397] xhci_hcd 0000:01:00.0: Cancel URB 00000000bcd65caa, dev 1.1.1, ep 0x85, stream_id 6 starting at offset 0xd9964960
> > [ 4808.439706] xhci_hcd 0000:01:00.0: Cancel URB 000000000d48b0aa, dev 1.1.4, ep 0x85, stream_id 6 starting at offset 0x12c7ef0
> > [ 4808.439981] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
> > [ 4808.439984] xhci_hcd 0000:01:00.0: MN: Set TR Dequeue Pointer Command: deq 00000000012c7f02 stream 6 slot 12 ep 11 flags c
> > [ 4808.450328] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
> > [ 4808.450339] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000012c7ef0 trb-start 00000000012c7f00 trb-end 0000000000000000 seg-start 00000000012c7000 seg-end 00000000012c7ff0
> > [ 4808.450342] xhci_hcd 0000:01:00.0: Looking for event-dma 00000000012c7ef0 trb-start 00000000012c6000 trb-end 00000000012c6700 seg-start 00000000012c6000 seg-end 00000000012c6ff0
> > [ 4808.450345] xhci_hcd 0000:01:00.0: MN: No TD found, fix halted ep
> 
> Thanks.
> 
> So the failed 'Set TR Deq' command is the start of the issue.
> 
> xHC host fails to move the past that cancelled transfer, but xhci driver gives
> back the URB anyway. Once ring continues running it starts from that cancelled
> transfer, triggering a babble error, pointing to that given back TD.
> 
> Still unclear why that 'Set TR Deq' command failed in the first place.
> 
> The 'Set TR Deq' command TRB looks fine. Stream ID is ok, (6) and Stream Context
> type (SCT) which is bits 3:1 of dequeue 00000000012c7f02 is looks fine (=1, which
> means  primary stream array with dequeue pointing to transfer ring)
> Maybe stream is dropped, or something else is messing with endpoint context
> MaxPSstream, or Linear Stream Address (LSA) fields before the 'Set TD Deq' command.
> That would trigger stream ID issues like this.
> 
> Can I ask you to add some select traces on top of the patch?
> They should show content of endpoint context:
> 
> as root:

I did that, however, even when mounting to /sys/kernel/debug/tracing, most of
the pseudo-files were actually at /sys/kernel/tracing and I had to fix the
path of the commands.

> Send content of /sys/kernel/debug/tracing/trace

root@video:~# more /tmp/a1
# tracer: nop
#
# entries-in-buffer/entries-written: 49/49   #P:8
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
          <idle>-0       [005] d.h2.  1209.080113: xhci_handle_command: CMD: Stop Ring Command: slot 7 sp 0 ep 11 flags C
          <idle>-0       [005] d.h2.  1209.080300: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 00000000d99f6c12 stream 3 slot 7 ep 11 flags C
          <idle>-0       [005] d.h2.  1209.080302: xhci_handle_cmd_set_deq: RS 00011 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 7 State configured
          <idle>-0       [005] d.h2.  1209.080303: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933100 avg trb len 0
          <idle>-0       [005] d.h2.  1209.210387: xhci_handle_command: CMD: Stop Ring Command: slot 12 sp 0 ep 11 flags C
          <idle>-0       [005] d.h2.  1209.210558: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 0000000001296853 stream 4 slot 12 ep 11 flags C
          <idle>-0       [005] d.h2.  1209.210560: xhci_handle_cmd_set_deq: RS 00041 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 12 State configured
          <idle>-0       [005] d.h2.  1209.210561: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933a00 avg trb len 0
          <idle>-0       [005] d.h2.  1209.222205: xhci_handle_command: CMD: Stop Ring Command: slot 11 sp 0 ep 11 flags C
          <idle>-0       [005] d.h2.  1209.222374: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 0000000001251d63 stream 1 slot 11 ep 11 flags C
          <idle>-0       [005] d.h2.  1209.222375: xhci_handle_cmd_set_deq: RS 00031 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 11 State configured
          <idle>-0       [005] d.h2.  1209.222376: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933700 avg trb len 0
          <idle>-0       [005] d.H2.  1209.229176: xhci_handle_command: CMD: Stop Ring Command: slot 11 sp 0 ep 11 flags C
          <idle>-0       [005] d.H2.  1209.229344: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 0000000001251d73 stream 1 slot 11 ep 11 flags C
          <idle>-0       [005] d.H2.  1209.229345: xhci_handle_cmd_set_deq: RS 00031 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 11 State configured
          <idle>-0       [005] d.H2.  1209.229346: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933700 avg trb len 0
          <idle>-0       [005] d.h2.  1209.365774: xhci_handle_command: CMD: Stop Ring Command: slot 9 sp 0 ep 11 flags C
          <idle>-0       [005] d.h2.  1209.365941: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 0000000001218983 stream 4 slot 9 ep 11 flags C
          <idle>-0       [005] d.h2.  1209.365943: xhci_handle_cmd_set_deq: RS 00021 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 9 State configured
          <idle>-0       [005] d.h2.  1209.365943: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933400 avg trb len 0
          <idle>-0       [005] d.h2.  1209.393909: xhci_handle_command: CMD: Stop Ring Command: slot 9 sp 0 ep 11 flags C
          <idle>-0       [005] d.h2.  1209.394076: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 000000000121a1e2 stream 5 slot 9 ep 11 flags C
          <idle>-0       [005] d.h2.  1209.394077: xhci_handle_cmd_set_deq: RS 00021 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 9 State configured
          <idle>-0       [005] d.h2.  1209.394077: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933400 avg trb len 0
          <idle>-0       [005] d.h2.  1809.115095: xhci_handle_command: CMD: Stop Ring Command: slot 7 sp 0 ep 11 flags C
          <idle>-0       [005] d.h2.  1809.115264: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 00000000d99f2bc2 stream 1 slot 7 ep 11 flags C
          <idle>-0       [005] d.h2.  1809.115265: xhci_handle_cmd_set_deq: RS 00011 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 7 State configured
          <idle>-0       [005] d.h2.  1809.115266: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933100 avg trb len 0
          <idle>-0       [005] d.h2.  1809.128672: xhci_handle_command: CMD: Stop Ring Command: slot 7 sp 0 ep 11 flags C
          <idle>-0       [005] d.h2.  1809.128843: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 00000000d99f2bd2 stream 1 slot 7 ep 11 flags C
          <idle>-0       [005] d.h2.  1809.128845: xhci_handle_cmd_set_deq: RS 00011 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 7 State configured
          <idle>-0       [005] d.h2.  1809.128845: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933100 avg trb len 0
          <idle>-0       [005] d.h2.  1809.303711: xhci_handle_command: CMD: Stop Ring Command: slot 12 sp 0 ep 11 flags C
          <idle>-0       [005] d.h2.  1809.303917: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 0000000001296782 stream 4 slot 12 ep 11 flags C
          <idle>-0       [005] d.h2.  1809.303919: xhci_handle_cmd_set_deq: RS 00041 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 12 State configured
          <idle>-0       [005] d.h2.  1809.303920: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933a00 avg trb len 0
          <idle>-0       [005] d.h2.  1809.375752: xhci_handle_command: CMD: Stop Ring Command: slot 11 sp 0 ep 11 flags C
          <idle>-0       [005] d.h2.  1809.375937: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 000000000125b3a3 stream 6 slot 11 ep 11 flags C
          <idle>-0       [005] d.h2.  1809.375939: xhci_handle_cmd_set_deq: RS 00031 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 11 State configured
          <idle>-0       [005] d.h2.  1809.375940: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933700 avg trb len 0
          <idle>-0       [005] d.H2.  1809.497074: xhci_handle_command: CMD: Stop Ring Command: slot 9 sp 0 ep 11 flags C
          <idle>-0       [005] d.h2.  1809.497243: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 0000000001217d62 stream 4 slot 9 ep 11 flags C
          <idle>-0       [005] d.h2.  1809.497245: xhci_handle_cmd_set_deq: RS 00021 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 9 State configured
          <idle>-0       [005] d.h2.  1809.497245: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933400 avg trb len 0
          <idle>-0       [005] d.h2.  1809.524165: xhci_handle_command: CMD: Stop Ring Command: slot 9 sp 0 ep 11 flags C
          <idle>-0       [005] dNH2.  1809.524235: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 0000000001219f22 stream 5 slot 9 ep 11 flags C
          <idle>-0       [005] dNH2.  1809.524236: xhci_handle_cmd_set_deq: RS 00021 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 9 State configured
          <idle>-0       [005] dNH2.  1809.524237: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933400 avg trb len 0
    kworker/5:1H-150     [005] d.h2.  1809.533666: xhci_handle_command: CMD: Reset Endpoint Command: ctx 0000000000000000 slot 9 ep 11 flags t:C
root@video:~# 

> And then send me both the trace and dmesg of triggered case.

[ 1809.518429] xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd invalid because of stream ID configuration
[ 1809.518434] xhci_hcd 0000:01:00.0: MN: Set TR Dequeue Pointer Command: deq 0000000001219f22 stream 5 slot 9 ep 11 flags C
[ 1809.527627] xhci_hcd 0000:01:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 10 comp_code 3
[ 1809.527637] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000001219f10 trb-start 0000000001219f20 trb-end 0000000000000000 seg-start 0000000001219000 seg-end 0000000001219ff0
[ 1809.527641] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000001219f10 trb-start 000000000121a000 trb-end 000000000121a720 seg-start 000000000121a000 seg-end 000000000121aff0
[ 1809.527644] xhci_hcd 0000:01:00.0: MN: No TD found, fix halted ep
[ 1816.575134] sd 16:0:0:0: [sdc] tag#5 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD 
[ 1816.575151] sd 16:0:0:0: [sdc] tag#5 CDB: ATA command pass through(12)/Blank a1 06 20 da 00 00 4f c2 00 b0 00 00
[ 1828.638347] xhci_hcd 0000:01:00.0: Cancel URB 000000008009522e, dev 1.1.2, ep 0x85, stream_id 5 starting at offset 0x1219f20
[ 1828.638390] usb 3-1.1.2: stat urb: no pending cmd for uas-tag 6
[ 1828.731237] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 9 ep 10 with no TDs queued?
[ 1840.227417] sd 16:0:0:0: [sdc] tag#4 uas_eh_abort_handler 0 uas-tag 2 inflight: IN 
[ 1840.227426] sd 16:0:0:0: [sdc] tag#4 CDB: Read(16) 88 00 00 00 00 00 15 59 78 00 00 00 04 00 00 00
[ 1840.227433] xhci_hcd 0000:01:00.0: Cancel URB 0000000088ada5bf, dev 1.1.2, ep 0x85, stream_id 2 starting at offset 0x12134d0
[ 1840.228047] sd 16:0:0:0: [sdc] tag#2 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD IN 
[ 1840.228058] sd 16:0:0:0: [sdc] tag#2 CDB: Read(16) 88 00 00 00 00 00 15 59 98 00 00 00 04 00 00 00
[ 1840.228072] xhci_hcd 0000:01:00.0: Cancel URB 00000000498a4d69, dev 1.1.2, ep 0x85, stream_id 4 starting at offset 0x1218d70
[ 1840.236069] sd 16:0:0:0: [sdc] tag#1 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN 
[ 1840.236072] sd 16:0:0:0: [sdc] tag#1 CDB: Read(16) 88 00 00 00 00 00 15 59 90 00 00 00 04 00 00 00
[ 1840.236075] xhci_hcd 0000:01:00.0: Cancel URB 00000000ce78030e, dev 1.1.2, ep 0x85, stream_id 3 starting at offset 0x1216730
[ 1840.243820] sd 16:0:0:0: [sdc] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN 
[ 1840.243823] sd 16:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 15 59 94 00 00 00 04 00 00 00
[ 1840.243836] xhci_hcd 0000:01:00.0: Cancel URB 000000006af8d826, dev 1.1.2, ep 0x85, stream_id 1 starting at offset 0x1211590
[ 1840.258742] xhci_hcd 0000:01:00.0: WARN Event TRB for slot 9 ep 10 with no TDs queued?
[ 1840.283445] scsi host16: uas_eh_device_reset_handler start
[ 1840.283455] xhci_hcd 0000:01:00.0: Cancel URB 00000000ab683c42, dev 1.1.2, ep 0x87, stream_id 4 starting at offset 0x12064c0
[ 1840.283906] xhci_hcd 0000:01:00.0: Cancel URB 0000000084685914, dev 1.1.2, ep 0x87, stream_id 1 starting at offset 0xd9959d80
[ 1840.284415] xhci_hcd 0000:01:00.0: Cancel URB 0000000045b3d678, dev 1.1.2, ep 0x87, stream_id 3 starting at offset 0x1204ee0
[ 1840.375427] usb 3-1.1.2: reset SuperSpeed Plus Gen 2x1 USB device number 7 using xhci_hcd
[ 1840.423055] scsi host16: uas_eh_device_reset_handler success


Have a nice evening.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-09 16:21                                       ` Marc SCHAEFER
@ 2024-09-11 14:25                                         ` Mathias Nyman
  2024-09-12 15:22                                           ` Mathias Nyman
  0 siblings, 1 reply; 25+ messages in thread
From: Mathias Nyman @ 2024-09-11 14:25 UTC (permalink / raw)
  To: Marc SCHAEFER; +Cc: Micha?? Pecio, linux-usb

>> Can I ask you to add some select traces on top of the patch?
>> They should show content of endpoint context:
>>
>> as root:
> 
> I did that, however, even when mounting to /sys/kernel/debug/tracing, most of
> the pseudo-files were actually at /sys/kernel/tracing and I had to fix the
> path of the commands.

Sorry about that,

Ftrace control files used to be part of debugfs, located at
/sys/kernel/debug/tracing when mounting debugfs

> 
>            <idle>-0       [005] d.h2.  1809.524165: xhci_handle_command: CMD: Stop Ring Command: slot 9 sp 0 ep 11 flags C
>            <idle>-0       [005] dNH2.  1809.524235: xhci_handle_command: CMD: Set TR Dequeue Pointer Command: deq 0000000001219f22 stream 5 slot 9 ep 11 flags C
>            <idle>-0       [005] dNH2.  1809.524236: xhci_handle_cmd_set_deq: RS 00021 super-speed plus Ctx Entries 15 MEL 0 us Port# 1/0 [TT Slot 0 Port# 0 TTT 0 Intr 0] Addr 9 State configured
>            <idle>-0       [005] dNH2.  1809.524237: xhci_handle_cmd_set_deq_ep: State stopped mult 1 max P. Streams 3 LSA interval 125 us max ESIT payload 0 CErr 3 Type Bulk IN HIDburst 15 maxp 1024 deq 00000000d9933400 avg trb len 0

I can't find anything wrong in the endpoint context after the failed 'Set TR Deq' command.
MaxPStreams is 3, so endpoint supports 16 streams (2^(maxPstreams + 1)), stream ID was 5 is in range.
LSA flag is set so endpoint uses Linear Stream Addressing, which matches the SCT value of 1.

Only reason left why 'Set TR Deq' would fail is if the stream context TR deq pointer
field is zero, see xhci spec 4.12.2.1 'Stream Array Bounds Checking'

xhci driver doesn't yet have tracing for stream contexts, I started writing those but
didn't finish them yet.

Thanks
Mathias

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Strange issues with UAS URB cancellation
  2024-09-11 14:25                                         ` Mathias Nyman
@ 2024-09-12 15:22                                           ` Mathias Nyman
  0 siblings, 0 replies; 25+ messages in thread
From: Mathias Nyman @ 2024-09-12 15:22 UTC (permalink / raw)
  To: Marc SCHAEFER; +Cc: Micha?? Pecio, linux-usb

On 11.9.2024 17.25, Mathias Nyman wrote:

> xhci driver doesn't yet have tracing for stream contexts, I started writing those but
> didn't finish them yet.

A set of tracing and debugging patches for this is now available in my
fix_uas_babble_error branch

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git fix_uas_babble_error
https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=fix_uas_babble_error

Much appreciated if you have the time to try them out.
This time I think we should enable _all_ xhci traces.
The trace file might grow fast but it would be good to get traces starting
before the uas device is connected.

Updated steps:

mount -t tracefs none /sys/kernel/tracing/
echo 81920 > /sys/kernel/tracing/buffer_size_kb
echo 1 > /sys/kernel/tracing/events/xhci-hcd/enable
echo 1 > /sys/kernel/tracing/tracing_on
<connect uas device>
<trigger issue>
send content of /sys/kernel/tracing/trace and dmesg to me

Thanks
Mathias

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-09-12 15:20 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-23 19:18 Strange issues with USB device Marc SCHAEFER
2024-08-24  6:44 ` Michał Pecio
2024-08-24  6:51   ` Marc SCHAEFER
2024-08-24  8:44     ` Michał Pecio
2024-08-26  5:17       ` Marc SCHAEFER
2024-09-03  7:48         ` Strange issues with UAS URB cancellation Michał Pecio
2024-09-03 12:55           ` Marc SCHAEFER
2024-09-03 13:22             ` Michał Pecio
2024-09-03 13:50               ` Marc SCHAEFER
2024-09-03 13:52                 ` Marc SCHAEFER
2024-09-03 13:55                   ` Marc SCHAEFER
2024-09-03 15:45                     ` Michał Pecio
2024-09-03 19:40                       ` Marc SCHAEFER
2024-09-04 14:26                         ` Mathias Nyman
2024-09-04 16:36                           ` Marc SCHAEFER
2024-09-05 13:52                             ` Mathias Nyman
2024-09-05 15:01                               ` Marc SCHAEFER
2024-09-05 15:06                                 ` Marc SCHAEFER
2024-09-05 17:24                                   ` Marc SCHAEFER
2024-09-05 18:20                                     ` Marc SCHAEFER
2024-09-09 15:24                                     ` Mathias Nyman
2024-09-09 16:21                                       ` Marc SCHAEFER
2024-09-11 14:25                                         ` Mathias Nyman
2024-09-12 15:22                                           ` Mathias Nyman
2024-08-25 16:32   ` Strange issues with USB device Marc SCHAEFER

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).