linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
@ 2024-08-18 20:56 FPS
  2024-08-19 13:38 ` Mathias Nyman
  0 siblings, 1 reply; 23+ messages in thread
From: FPS @ 2024-08-18 20:56 UTC (permalink / raw)
  To: linux-usb

Hi!

I hope this mail is not off-topic for this list :)

I have recently acquired a mini PC equipped with an N97 CPU for use in
my electric guitar setup (I have good experiences with my previous Intel
Celeron J4125-based mini PC), but sadly I can't get it to work reliably
with ALSA/Jack using a USB audio class 2.0 sound card.

The system is running a decently setup rt-kernel:

Linux ogfx97 6.6.44-rt39 #1-NixOS SMP PREEMPT_RT Sat Aug  3 06:54:42 UTC
2024 x86_64 GNU/Linux

Cyclictest gives sufficiently nice results that give me hope that it
should be possible to use this system for realtime-audio. This is just a
short example run but it's quite representative:

[fps@ogfx97:~]$ sudo cyclictest -m -p 90 -S
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 2.07 1.36 0.74 2/644 9634

T: 0 ( 9615) P:90 I:1000 C:   8715 Min:      1 Act:    1 Avg:    2 Max:
     14
T: 1 ( 9616) P:90 I:1500 C:   5808 Min:      1 Act:    5 Avg:    2 Max:
     28
T: 2 ( 9617) P:90 I:2000 C:   4355 Min:      1 Act:    1 Avg:    2 Max:
     15
T: 3 ( 9618) P:90 I:2500 C:   3483 Min:      1 Act:    1 Avg:    3 Max:
     24

As usual I have elevated the xhci and ahci IRQ kernel threads to
SCHED_FIFO priority 90 and done some other shenanigans like idle=poll,
disabling C-states in the UEFI firmware, etc..

The xhci in this system is:

00:14.0 USB controller: Intel Corporation Alder Lake-N PCH USB 3.2 xHCI
Host Controller (prog-if 30 [XHCI])
         DeviceName: Onboard - Other
         Subsystem: Intel Corporation Device 7270
         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
         Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
 >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
         Latency: 0
         Interrupt: pin A routed to IRQ 124
         Region 0: Memory at 6001100000 (64-bit, non-prefetchable)
[size=64K]
         Capabilities: <access denied>
         Kernel driver in use: xhci_hcd
         Kernel modules: xhci_pci

Sadly there's some built-in junk on the USB besides my keyboard, mouse
and audio interface:

[fps@ogfx97:~]$ lsusb -tv
/:  Bus 001.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/12p, 480M
     ID 1d6b:0002 Linux Foundation 2.0 root hub
     |__ Port 001: Dev 004, If 0, Class=Human Interface Device,
Driver=usbhid, 1.5M
         ID 046d:c31c Logitech, Inc. Keyboard K120
     |__ Port 001: Dev 004, If 1, Class=Human Interface Device,
Driver=usbhid, 1.5M
         ID 046d:c31c Logitech, Inc. Keyboard K120
     |__ Port 002: Dev 005, If 0, Class=Human Interface Device,
Driver=usbhid, 12M
         ID 046d:c07d Logitech, Inc. G502 Mouse
     |__ Port 002: Dev 005, If 1, Class=Human Interface Device,
Driver=usbhid, 12M
         ID 046d:c07d Logitech, Inc. G502 Mouse
     |__ Port 004: Dev 014, If 0, Class=Audio, Driver=snd-usb-audio, 480M
         ID 1235:8202 Focusrite-Novation Focusrite Scarlett 2i2 2nd Gen
     |__ Port 004: Dev 014, If 1, Class=Audio, Driver=snd-usb-audio, 480M
         ID 1235:8202 Focusrite-Novation Focusrite Scarlett 2i2 2nd Gen
     |__ Port 004: Dev 014, If 2, Class=Audio, Driver=snd-usb-audio, 480M
         ID 1235:8202 Focusrite-Novation Focusrite Scarlett 2i2 2nd Gen
     |__ Port 004: Dev 014, If 3, Class=Vendor Specific Class,
Driver=[none], 480M
         ID 1235:8202 Focusrite-Novation Focusrite Scarlett 2i2 2nd Gen
     |__ Port 005: Dev 002, If 0, Class=Wireless, Driver=[none], 12M
         ID 0bda:c821 Realtek Semiconductor Corp.
     |__ Port 005: Dev 002, If 1, Class=Wireless, Driver=[none], 12M
         ID 0bda:c821 Realtek Semiconductor Corp.
     |__ Port 006: Dev 003, If 0, Class=Audio, Driver=snd-usb-audio, 12M
         ID 0573:1573 Zoran Co. Personal Media Division (Nogatech)
     |__ Port 006: Dev 003, If 1, Class=Audio, Driver=snd-usb-audio, 12M
         ID 0573:1573 Zoran Co. Personal Media Division (Nogatech)
     |__ Port 006: Dev 003, If 2, Class=Audio, Driver=snd-usb-audio, 12M
         ID 0573:1573 Zoran Co. Personal Media Division (Nogatech)
     |__ Port 006: Dev 003, If 3, Class=Human Interface Device,
Driver=[none], 12M
         ID 0573:1573 Zoran Co. Personal Media Division (Nogatech)
/:  Bus 002.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/4p, 10000M
     ID 1d6b:0003 Linux Foundation 3.0 root hub

OK, with that out of the way, what's the problem? Well, with an USB
audio class 2.0 device I would expect a maximum jitter of a microframe
(125 us) regardless of the buffer size and I have used this audio
interface successfully on different xHCI HCDs with buffer sizes as low
as 24 samples at 48000 Hz sampling rate (0.5 ms period duration).

But in this mini PC I cannot run such low period durations/buffer sizes
at all. Sporadically I see delays of up to 3 or 4 ms - which makes it
absolutely unusable for real-time audio stuff. It can't really be NMIs
from what I can tell - they would show up in the cyclictest results. So
I suspect the xHCI is doing something funky.

Here's another observation: When I run the audio interface with 2
periods of 48 frames at 48000 Hz sampling rate I get constant Xruns from
jack of about 0.004-0.008 ms. It seems that the xHCI is not willing to
deliver all microframes in a timely manner but somehow wants a minimum
time interval of 1 ms between them (I read the term interrupt moderation
somewhere on the web - but I don't know if the linux xhci driver does
something like that at all.)

How could I debug this further? All help appreciated :)

Kind regards,
FPS

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-18 20:56 Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller FPS
@ 2024-08-19 13:38 ` Mathias Nyman
  2024-08-19 20:12   ` FPS
  0 siblings, 1 reply; 23+ messages in thread
From: Mathias Nyman @ 2024-08-19 13:38 UTC (permalink / raw)
  To: FPS, linux-usb

On 18.8.2024 23.56, FPS wrote:
> Hi!
> 
> I hope this mail is not off-topic for this list :)
> 
> I have recently acquired a mini PC equipped with an N97 CPU for use in
> my electric guitar setup (I have good experiences with my previous Intel
> Celeron J4125-based mini PC), but sadly I can't get it to work reliably
> with ALSA/Jack using a USB audio class 2.0 sound card.
> 
> The system is running a decently setup rt-kernel:
> 
> Linux ogfx97 6.6.44-rt39 #1-NixOS SMP PREEMPT_RT Sat Aug  3 06:54:42 UTC
> 2024 x86_64 GNU/Linux
> 
> Cyclictest gives sufficiently nice results that give me hope that it
> should be possible to use this system for realtime-audio. This is just a
> short example run but it's quite representative:
> 
> [fps@ogfx97:~]$ sudo cyclictest -m -p 90 -S
> # /dev/cpu_dma_latency set to 0us
> policy: fifo: loadavg: 2.07 1.36 0.74 2/644 9634
> 
> T: 0 ( 9615) P:90 I:1000 C:   8715 Min:      1 Act:    1 Avg:    2 Max:
>      14
> T: 1 ( 9616) P:90 I:1500 C:   5808 Min:      1 Act:    5 Avg:    2 Max:
>      28
> T: 2 ( 9617) P:90 I:2000 C:   4355 Min:      1 Act:    1 Avg:    2 Max:
>      15
> T: 3 ( 9618) P:90 I:2500 C:   3483 Min:      1 Act:    1 Avg:    3 Max:
>      24
> 
> As usual I have elevated the xhci and ahci IRQ kernel threads to
> SCHED_FIFO priority 90 and done some other shenanigans like idle=poll,
> disabling C-states in the UEFI firmware, etc..
> 
> The xhci in this system is:
> 
> 00:14.0 USB controller: Intel Corporation Alder Lake-N PCH USB 3.2 xHCI
> Host Controller (prog-if 30 [XHCI])
>          DeviceName: Onboard - Other
>          Subsystem: Intel Corporation Device 7270
>          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
>          Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>  >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>          Latency: 0
>          Interrupt: pin A routed to IRQ 124
>          Region 0: Memory at 6001100000 (64-bit, non-prefetchable)
> [size=64K]
>          Capabilities: <access denied>
>          Kernel driver in use: xhci_hcd
>          Kernel modules: xhci_pci
> 
> Sadly there's some built-in junk on the USB besides my keyboard, mouse
> and audio interface:
> 
> [fps@ogfx97:~]$ lsusb -tv
> /:  Bus 001.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/12p, 480M
>      ID 1d6b:0002 Linux Foundation 2.0 root hub
>      |__ Port 001: Dev 004, If 0, Class=Human Interface Device,
> Driver=usbhid, 1.5M
>          ID 046d:c31c Logitech, Inc. Keyboard K120
>      |__ Port 001: Dev 004, If 1, Class=Human Interface Device,
> Driver=usbhid, 1.5M
>          ID 046d:c31c Logitech, Inc. Keyboard K120
>      |__ Port 002: Dev 005, If 0, Class=Human Interface Device,
> Driver=usbhid, 12M
>          ID 046d:c07d Logitech, Inc. G502 Mouse
>      |__ Port 002: Dev 005, If 1, Class=Human Interface Device,
> Driver=usbhid, 12M
>          ID 046d:c07d Logitech, Inc. G502 Mouse
>      |__ Port 004: Dev 014, If 0, Class=Audio, Driver=snd-usb-audio, 480M
>          ID 1235:8202 Focusrite-Novation Focusrite Scarlett 2i2 2nd Gen
>      |__ Port 004: Dev 014, If 1, Class=Audio, Driver=snd-usb-audio, 480M
>          ID 1235:8202 Focusrite-Novation Focusrite Scarlett 2i2 2nd Gen
>      |__ Port 004: Dev 014, If 2, Class=Audio, Driver=snd-usb-audio, 480M
>          ID 1235:8202 Focusrite-Novation Focusrite Scarlett 2i2 2nd Gen
>      |__ Port 004: Dev 014, If 3, Class=Vendor Specific Class,
> Driver=[none], 480M
>          ID 1235:8202 Focusrite-Novation Focusrite Scarlett 2i2 2nd Gen
>      |__ Port 005: Dev 002, If 0, Class=Wireless, Driver=[none], 12M
>          ID 0bda:c821 Realtek Semiconductor Corp.
>      |__ Port 005: Dev 002, If 1, Class=Wireless, Driver=[none], 12M
>          ID 0bda:c821 Realtek Semiconductor Corp.
>      |__ Port 006: Dev 003, If 0, Class=Audio, Driver=snd-usb-audio, 12M
>          ID 0573:1573 Zoran Co. Personal Media Division (Nogatech)
>      |__ Port 006: Dev 003, If 1, Class=Audio, Driver=snd-usb-audio, 12M
>          ID 0573:1573 Zoran Co. Personal Media Division (Nogatech)
>      |__ Port 006: Dev 003, If 2, Class=Audio, Driver=snd-usb-audio, 12M
>          ID 0573:1573 Zoran Co. Personal Media Division (Nogatech)
>      |__ Port 006: Dev 003, If 3, Class=Human Interface Device,
> Driver=[none], 12M
>          ID 0573:1573 Zoran Co. Personal Media Division (Nogatech)
> /:  Bus 002.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/4p, 10000M
>      ID 1d6b:0003 Linux Foundation 3.0 root hub
> 
> OK, with that out of the way, what's the problem? Well, with an USB
> audio class 2.0 device I would expect a maximum jitter of a microframe
> (125 us) regardless of the buffer size and I have used this audio
> interface successfully on different xHCI HCDs with buffer sizes as low
> as 24 samples at 48000 Hz sampling rate (0.5 ms period duration).
> 
> But in this mini PC I cannot run such low period durations/buffer sizes
> at all. Sporadically I see delays of up to 3 or 4 ms - which makes it
> absolutely unusable for real-time audio stuff. It can't really be NMIs
> from what I can tell - they would show up in the cyclictest results. So
> I suspect the xHCI is doing something funky.
> 
> Here's another observation: When I run the audio interface with 2
> periods of 48 frames at 48000 Hz sampling rate I get constant Xruns from
> jack of about 0.004-0.008 ms. It seems that the xHCI is not willing to
> deliver all microframes in a timely manner but somehow wants a minimum
> time interval of 1 ms between them (I read the term interrupt moderation
> somewhere on the web - but I don't know if the linux xhci driver does
> something like that at all.)

The default interrupt moderation is 1ms for xHC hardware, but the xhci
PCI driver should set it to 40 microseconds.

Interupt moderation value can be checked from debugfs:
# mount -t debugfs none /sys/kernel/debug/
# cat /sys/kernel/debug/usb/xhci/0000:00:14.0/reg-runtime
MFINDEX = 0x00002570
IR0_IMAN = 0x00000002
IR0_IMOD = 0x000000a0
IR0_ERSTSZ = 0x00000002
IR0_ERSTBA_LOW = 0x05069000
IR0_ERSTBA_HIGH = 0x00000001
IR0_ERDP_LOW = 0x05067b80
IR0_ERDP_HIGH = 0x00000001

IR0_IMOD shows the interrupt moderation value in 250 nanosecond steps.
i.e. 0xa0 is 160 * 250ns =>  40000ns

> 
> How could I debug this further? All help appreciated :)

Was looking at a similar issue which turned out to be wifi driver debug
blocking softirqs for a long time. This caused the the URBs that were given
back to audio class driver in a tasklet (softirq) context to be delayed.

This changed recently a bit. USB core now uses workqueues instead of tasklets,
but issue could be similar.

I'm not familiar with cyclictest. I was debugging using ftrace myslelf.
maybe ftrace preemptoff could show if something is blocking for a long time?

echo preemptoff > /sys/kernel/debug/tracing/current_tracer

xhci dynamic debug could show if xHC controller has any issues processing
TRBs on ring buffer in time. This debug should be limited to handle_tx_event()
as dynamic debug causes some delays itself.

If there are any issues on xHC side it should print underrun/overrun, or miss
service interval errors.

Thanks
Mathias


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-19 13:38 ` Mathias Nyman
@ 2024-08-19 20:12   ` FPS
  2024-08-20 11:01     ` Michał Pecio
  0 siblings, 1 reply; 23+ messages in thread
From: FPS @ 2024-08-19 20:12 UTC (permalink / raw)
  To: Mathias Nyman, linux-usb

On 8/19/24 15:38, Mathias Nyman wrote:
> On 18.8.2024 23.56, FPS wrote:
>> Hi!
>>
>> Here's another observation: When I run the audio interface with 2
>> periods of 48 frames at 48000 Hz sampling rate I get constant Xruns from
>> jack of about 0.004-0.008 ms. It seems that the xHCI is not willing to
>> deliver all microframes in a timely manner but somehow wants a minimum
>> time interval of 1 ms between them (I read the term interrupt moderation
>> somewhere on the web - but I don't know if the linux xhci driver does
>> something like that at all.)
>
> The default interrupt moderation is 1ms for xHC hardware, but the xhci
> PCI driver should set it to 40 microseconds.
>
> Interupt moderation value can be checked from debugfs:
> # mount -t debugfs none /sys/kernel/debug/
> # cat /sys/kernel/debug/usb/xhci/0000:00:14.0/reg-runtime
> MFINDEX = 0x00002570
> IR0_IMAN = 0x00000002
> IR0_IMOD = 0x000000a0
> IR0_ERSTSZ = 0x00000002
> IR0_ERSTBA_LOW = 0x05069000
> IR0_ERSTBA_HIGH = 0x00000001
> IR0_ERDP_LOW = 0x05067b80
> IR0_ERDP_HIGH = 0x00000001
>
> IR0_IMOD shows the interrupt moderation value in 250 nanosecond steps.
> i.e. 0xa0 is 160 * 250ns =>  40000ns

Hi! Thanks for your reply! I checked. It's 40 microseconds for this xhci:

# cat /sys/kernel/debug/usb/xhci/0000\:00\:14.0/reg-runtime
MFINDEX = 0x00001b55
IR0_IMAN = 0x00000002
IR0_IMOD = 0x000000a0
IR0_ERSTSZ = 0x00000001
IR0_ERSTBA_LOW = 0x02c00000
IR0_ERSTBA_HIGH = 0x00000001
IR0_ERDP_LOW = 0x02c01570
IR0_ERDP_HIGH = 0x00000001

>> How could I debug this further? All help appreciated :)
>
> Was looking at a similar issue which turned out to be wifi driver debug
> blocking softirqs for a long time. This caused the the URBs that were given
> back to audio class driver in a tasklet (softirq) context to be delayed.
>
> This changed recently a bit. USB core now uses workqueues instead of
> tasklets,
> but issue could be similar.
>
> I'm not familiar with cyclictest. I was debugging using ftrace myslelf.
> maybe ftrace preemptoff could show if something is blocking for a long
> time?
>
> echo preemptoff > /sys/kernel/debug/tracing/current_tracer

I will have to look into enabling this tracer on my kernel first. I'll
come back to it. Thanks for the tip!

> xhci dynamic debug could show if xHC controller has any issues processing
> TRBs on ring buffer in time. This debug should be limited to
> handle_tx_event()
> as dynamic debug causes some delays itself.
>
> If there are any issues on xHC side it should print underrun/overrun, or
> miss
> service interval errors.

OK, I read up a little on dynamic debug and performed:

[root@ogfx97:/sys/kernel/debug]# echo "func handle_tx_event +p" >
dynamic_debug/control

which then gives me this log output (snipped - it merrily continues on
like this) upon running jackd with a period size of 48 and 2 periods at
48000 Hz:

Aug 19 22:06:34.029680 ogfx97 kernel: xhci_hcd 0000:00:14.0: Stopped on
Transfer TRB for slot 18 ep 4
Aug 19 22:06:34.029990 ogfx97 kernel: xhci_hcd 0000:00:14.0: Stopped on
Transfer TRB for slot 18 ep 1
Aug 19 22:06:34.034685 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.035034 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.035272 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.035403 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.035545 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.035682 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.035821 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.035958 ogfx97 kernel: xhci_hcd 0000:00:14.0: underrun
event on endpoint
Aug 19 22:06:34.036080 ogfx97 kernel: xhci_hcd 0000:00:14.0: Underrun
Event for slot 18 ep 1 still with TDs queued?
Aug 19 22:06:34.036192 ogfx97 kernel: xhci_hcd 0000:00:14.0: WARN Event
TRB for slot 18 ep 1 with no TDs queued?
Aug 19 22:06:34.036309 ogfx97 kernel: xhci_hcd 0000:00:14.0: td_list is
empty while skip flag set. Clear skip flag for slot 18 ep 1.
Aug 19 22:06:34.036424 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 4, set skip flag
Aug 19 22:06:34.036539 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 4, set skip flag
Aug 19 22:06:34.036648 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 4, set skip flag
Aug 19 22:06:34.036794 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 4, set skip flag
Aug 19 22:06:34.036911 ogfx97 kernel: xhci_hcd 0000:00:14.0: Found td.
Clear skip flag for slot 18 ep 4.
Aug 19 22:06:34.042676 ogfx97 kernel: xhci_hcd 0000:00:14.0: Stopped on
Transfer TRB for slot 18 ep 4
Aug 19 22:06:34.042909 ogfx97 kernel: xhci_hcd 0000:00:14.0: Stopped on
Transfer TRB for slot 18 ep 1
Aug 19 22:06:34.048684 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.048927 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.049045 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.049158 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.049269 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.049379 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.049484 ogfx97 kernel: xhci_hcd 0000:00:14.0: Miss
service interval error for slot 18 ep 1, set skip flag
Aug 19 22:06:34.049595 ogfx97 kernel: xhci_hcd 0000:00:14.0: underrun
event on endpoint
Aug 19 22:06:34.049722 ogfx97 kernel: xhci_hcd 0000:00:14.0: Underrun
Event for slot 18 ep 1 still with TDs queued?
Aug 19 22:06:34.049833 ogfx97 kernel: xhci_hcd 0000:00:14.0: WARN Event
TRB for slot 18 ep 1 with no TDs queued?
Aug 19 22:06:34.049944 ogfx97 kernel: xhci_hcd 0000:00:14.0: td_list is
empty while skip flag set. Clear skip flag for slot 18 ep 1.

I have to note that these "WARN Event TRB for slot 18 ep 1 with no TDs
queued?" were there before enabling this dynamic debug feature, I just
forgot to mention them in my original mail.

So does this actually point to the xHCI?

And sorry for the newline-wrapped lines. I'll try and see if I can teach
my email client not to do that ;)

Kind regards,
FPS

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-19 20:12   ` FPS
@ 2024-08-20 11:01     ` Michał Pecio
  2024-08-20 21:04       ` FPS
  0 siblings, 1 reply; 23+ messages in thread
From: Michał Pecio @ 2024-08-20 11:01 UTC (permalink / raw)
  To: mista.tapas; +Cc: linux-usb, mathias.nyman

Hi,


I can offer a few quick suggestions:

1. When kernel bugs are suspected, try other kernels offered by your
distribution. See if there is any chance that -rt paches are causing
issues.

2. Does any of that go away when ALSA buffer size is increased or is it
always there on this machine?

3. When posting wall of text errors, start at the beginning because it
may offer clues about what originally went wrong ('dmesg -W' helps).

4. Playing a tiny file with 'aplay --period-size=48 --buffer-size=96'
is a simpler way to reproduce the problem and generates a shorter log.


> I have to note that these "WARN Event TRB for slot 18 ep 1 with no TDs
> queued?" were there before enabling this dynamic debug feature, I just
> forgot to mention them in my original mail.

This particular part is probably caused by our failure to properly
handle the preceding condition ("underrun event still with TDs queued").
I can't know for sure, but assuming no hardware bugs, it appears that a
new transfer descriptor is queued after the hardware reports a ring
underrun but before we actually process the report. While processing
the underrun we are surprised by this unexpected TD, then we see that
skip flag is set so we erronously report all TDs (most likely including
the new one) as failed to the audio driver. Meanwhile the hardware may
execute this transfer and report its completion later, at which point
we have already forgotten about it.

*Maybe* this creates enough chaos that some sort of infinite loop of
cascading errors is established as a result of one recoverable error.
Or maybe your problem is elsewhere and this bug is only a side effect.

Are you able to test kernel patches?


Mathias may have other ideas to try.

Regards,
Michal

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-20 11:01     ` Michał Pecio
@ 2024-08-20 21:04       ` FPS
  2024-08-21 13:02         ` Michał Pecio
  0 siblings, 1 reply; 23+ messages in thread
From: FPS @ 2024-08-20 21:04 UTC (permalink / raw)
  To: Michał Pecio; +Cc: linux-usb, mathias.nyman

On 8/20/24 13:01, Michał Pecio wrote:
> I can offer a few quick suggestions:
>
> 1. When kernel bugs are suspected, try other kernels offered by your
> distribution. See if there is any chance that -rt paches are causing
> issues.

I have tried other kernels, including 4.19.319, 5.15.163-rt78 and
6.6.45. They all show the same behaviour of the audio device not
operating correctly at all with settings like 48 frames, 2 periods.

I have another Alder Lake system with an N100 CPU which has a similar
xHCI and it also shows problem 1 (see my answer to your question 2
below) but not problem 2 (see below as well):

00:14.0 USB controller: Intel Corporation Alder Lake-N PCH USB 3.2 xHCI
Host Controller (prog-if 30 [XHCI])
	Subsystem: ASRock Incorporation Device 54ed
	Flags: bus master, medium devsel, latency 0, IRQ 126
	Memory at 6001100000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Power Management version 2
	Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
	Capabilities: [90] Vendor Specific Information: Len=14 <?>
	Capabilities: [b0] Vendor Specific Information: Len=00 <?>
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci

I guess it's the same chip but integrated somewhat differently.

In that system I also have a PCIE usb controller:

01:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host
Controller (rev 03) (prog-if 30 [XHCI])
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at 80a00000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [50] Power Management version 3
	Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
	Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
	Capabilities: [a0] Express Endpoint, IntMsgNum 0
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [150] Latency Tolerance Reporting
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci

And that one just works perfectly with full duplex operation even down
to buffer sizes of 12 frames / 2 buffers:

$ jackd -R -P 70 -d alsa -d hw:USB -n 2 -p 12

$ ./jack_wakeup -n 2000
min: 0.236729 ms; mean: 0.249978ms;  max: 0.258432 ms

0.25 ms would be perfect (2 USB microframes.)

So the -rt kernel is not to fault here per se.

(see note below on jack_wakeup)


> 2. Does any of that go away when ALSA buffer size is increased or is it
> always there on this machine?

Very good question! I suppose my initial report was a little unclear on
this. On this specific system there are two problems:

1. Certain combinations of buffer size and number of buffers do not work
reliably at all (i.e. xruns). These are basically all combinations of
buffer sizes < 128 and 2 buffers. I tested buffer sizes of 64 frames, 48
frames and 24 frames.

2. For those combinations that do seem to work (for example buffer size
64 with 3 buffers or buffer size of 128 and 2 buffers and up from there)
there are sporadic (about every 2-10 seconds or so) extra delays of
about 2 - 4 ms and they seem to be not randomly distributed at all but
rather always pretty close to full milliseconds.

Buffer sizes above 4 ms seem to work reliably with the light load I
tested but that is expected since the extra delay is then just masked by
the large buffers.

About problem 2: On the 4.19.319 kernel I tried problem 2 went away. It
also has not resurfaced after I rebooted into 6.6.44-rt39. I will do a
full power cycle and see if it resurfaces.

> 3. When posting wall of text errors, start at the beginning because it
> may offer clues about what originally went wrong ('dmesg -W' helps).

Sure!

> 4. Playing a tiny file with 'aplay --period-size=48 --buffer-size=96'
> is a simpler way to reproduce the problem and generates a shorter log.

Good point! But I played around some more and it seems that the problem
actually manifests in this precise way only if I actually do full duplex
audio processing. aplay would just use the playback direction, and also
it does not really do correct realtime scheduling.

If I just use the capture direction buffer sizes like 24 or 48 with 2
buffers appear to work and give me expected jitter. E.g for 48:

$ jackd -R -P 70 -d alsa -d hw:USB -n 2 -p 48 -C

$ ./src/jack_wakeup/jack_wakeup
min: 0.993217 ms; mean: 1.00002ms;  max: 1.00574 ms

Or for buffer size 24:

$ jackd -R -P 70 -d alsa -d hw:USB -n 2 -p 24 -C

$ ./src/jack_wakeup/jack_wakeup
min: 0.488788 ms; mean: 0.499921ms;  max: 0.510261 ms

For buffer size 12 things break though.

Note: jack_wakeup is just a small utility I wrote that measures the
interval between consecutive process callbacks which, in an unloaded
jack graph is useful to measure jitter in the system.

If you do not like the complexity that jack (jack1 in this case)
introduces I can probably cook up a small C program that just sets up
SCHED_FIFO, mlockall, etc, and does either simplex or full duplex
operations on the audio interface.

>> I have to note that these "WARN Event TRB for slot 18 ep 1 with no TDs
>> queued?" were there before enabling this dynamic debug feature, I just
>> forgot to mention them in my original mail.
>
> This particular part is probably caused by our failure to properly
> handle the preceding condition ("underrun event still with TDs queued").
> I can't know for sure, but assuming no hardware bugs, it appears that a
> new transfer descriptor is queued after the hardware reports a ring
> underrun but before we actually process the report. While processing
> the underrun we are surprised by this unexpected TD, then we see that
> skip flag is set so we erronously report all TDs (most likely including
> the new one) as failed to the audio driver. Meanwhile the hardware may
> execute this transfer and report its completion later, at which point
> we have already forgotten about it.
>
> *Maybe* this creates enough chaos that some sort of infinite loop of
> cascading errors is established as a result of one recoverable error.
> Or maybe your problem is elsewhere and this bug is only a side effect.
>
> Are you able to test kernel patches?

Yes, of course. It's been a couple of years since I did the whole
menuconfig, patch, rebuild, reboot, test - dance. I did this quite often
when Ingo posted early versions of the realtime preemption patches. E.g.

https://lkml.org/lkml/2004/11/21/184

Kind regards and thanks for your suggestions,
FPS

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-20 21:04       ` FPS
@ 2024-08-21 13:02         ` Michał Pecio
  2024-08-21 13:46           ` fps
  2024-08-21 14:15           ` Mathias Nyman
  0 siblings, 2 replies; 23+ messages in thread
From: Michał Pecio @ 2024-08-21 13:02 UTC (permalink / raw)
  To: FPS; +Cc: linux-usb, mathias.nyman

[-- Attachment #1: Type: text/plain, Size: 1835 bytes --]

The reason I suggested aplay is only because it terminates after a
finite time, reducing the amount of output produced.

Today I found that I have similar issues if I plug my audio dongle into
one particular Asmedia USB 3.1 controller. I quickly confirmed your
finding that aplay has different symptoms - audio is slowed down and
distorted, but no errors in dmesg. Same for Jack in playback only mode.
I need to use Jack in full duplex to reproduce your symptoms.

I automatically stop Jack with timeout to control dmesg pollution:
timeout -s KILL 0.1 jackd -d alsa -d hw:3 -p 48 -n 2


The root cause appears to be that there are those "missed service" and
"underrun" errors in the first place. One means that the controller
failed to execute a transfer in its scheduled (micro)frame for internal
reasons or that the transfer was queued too late by software, the other
means that the controller ran out of queued data to send.

I could believe that maybe we are scheduling transfers too late,
causing these problems, but no idea why it only happens on particular
"bad" host controller, while othres are OK even on the same machine.


In my case, I don't observe the sequence of "missed service" followed
by "underrun" and I don't see the subsequent "WARNs" either. However,
I produced a patch which should fix this problem.

I produced a second patch which improves handling of "missed service"
errors, by reporting them faster to the audio driver. My hope was that
*maybe* this would speed up recovery from such condition and reduce the
disruption created, but it frankly doesn't seem to make much difference.


If you would like to play with the patches, I recommend using the
distribution's kernel config (/lib/modules/`uname -r`/build/.config)
because doing 'make menuconfig' from scratch takes a while nowadays.

Regards,
Michal

[-- Attachment #2: 0001-xhci-don-t-skip-on-isoc-overrun-underrun.patch --]
[-- Type: text/x-patch, Size: 802 bytes --]

From 22be329f3276f98f71a485b6d8ed1063246f0efc Mon Sep 17 00:00:00 2001
From: Michal Pecio <michal.pecio@gmail.com>
Date: Wed, 21 Aug 2024 14:14:47 +0200
Subject: [PATCH 1/2] xhci: don't skip on isoc overrun / underrun

---
 drivers/usb/host/xhci-ring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index a7d343b78557..376bcbe7b6e9 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2978,6 +2978,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 cleanup:
 		handling_skipped_tds = ep->skip &&
 			trb_comp_code != COMP_MISSED_SERVICE_ERROR &&
+			trb_comp_code != COMP_RING_UNDERRUN &&
+			trb_comp_code != COMP_RING_OVERRUN &&
 			trb_comp_code != COMP_NO_PING_RESPONSE_ERROR;
 
 		/*
-- 
2.43.0


[-- Attachment #3: 0002-xhci-process-isoc-Missed-Service-faster-on-modern-ho.patch --]
[-- Type: text/x-patch, Size: 1824 bytes --]

From adf5386fa4eba9d12f51db74162760b852d90034 Mon Sep 17 00:00:00 2001
From: Michal Pecio <michal.pecio@gmail.com>
Date: Wed, 21 Aug 2024 14:15:30 +0200
Subject: [PATCH 2/2] xhci: process isoc Missed Service faster on modern hosts

---
 drivers/usb/host/xhci-ring.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 376bcbe7b6e9..7b3a466705ab 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2451,6 +2451,14 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		if (ep_trb != td->last_trb)
 			td->error_mid_td = true;
 		break;
+	case COMP_MISSED_SERVICE_ERROR:
+		/* seen in hosts conforming to xHCI revision 1.1 and later */
+		xhci_info(xhci, "Handling Missed Service Error\n");
+		frame->status = -EXDEV;
+		sum_trbs_for_length = true;
+		if (ep_trb != td->last_trb)
+			td->error_mid_td = true;
+		break;
 	case COMP_INCOMPATIBLE_DEVICE_ERROR:
 	case COMP_STALL_ERROR:
 		frame->status = -EPROTO;
@@ -2781,6 +2789,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		 * short transfer when process the ep_ring next time.
 		 */
 		ep->skip = true;
+		td_num = list_count_nodes(&ep_ring->td_list);
 		xhci_dbg(xhci,
 			 "Miss service interval error for slot %u ep %u, set skip flag\n",
 			 slot_id, ep_index);
@@ -2977,7 +2986,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 			process_bulk_intr_td(xhci, ep, ep_ring, td, ep_trb, event);
 cleanup:
 		handling_skipped_tds = ep->skip &&
-			trb_comp_code != COMP_MISSED_SERVICE_ERROR &&
+			(trb_comp_code != COMP_MISSED_SERVICE_ERROR || ep_trb_dma) &&
 			trb_comp_code != COMP_RING_UNDERRUN &&
 			trb_comp_code != COMP_RING_OVERRUN &&
 			trb_comp_code != COMP_NO_PING_RESPONSE_ERROR;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-21 13:02         ` Michał Pecio
@ 2024-08-21 13:46           ` fps
  2024-08-21 14:15           ` Mathias Nyman
  1 sibling, 0 replies; 23+ messages in thread
From: fps @ 2024-08-21 13:46 UTC (permalink / raw)
  To: Michał Pecio; +Cc: linux-usb, mathias.nyman

Hi again!

Thanks for the patches! I will try and see if I get them applied to my kernel and test them later tonight - I have a baby daughter at home, which might cause me to miss my service deadline ;)

FPS

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-21 13:02         ` Michał Pecio
  2024-08-21 13:46           ` fps
@ 2024-08-21 14:15           ` Mathias Nyman
  2024-08-21 16:49             ` Michał Pecio
  2024-08-23 11:43             ` FPS
  1 sibling, 2 replies; 23+ messages in thread
From: Mathias Nyman @ 2024-08-21 14:15 UTC (permalink / raw)
  To: Michał Pecio, FPS; +Cc: linux-usb

On 21.8.2024 16.02, Michał Pecio wrote:
> The reason I suggested aplay is only because it terminates after a
> finite time, reducing the amount of output produced.
> 
> Today I found that I have similar issues if I plug my audio dongle into
> one particular Asmedia USB 3.1 controller. I quickly confirmed your
> finding that aplay has different symptoms - audio is slowed down and
> distorted, but no errors in dmesg. Same for Jack in playback only mode.
> I need to use Jack in full duplex to reproduce your symptoms.
> 
> I automatically stop Jack with timeout to control dmesg pollution:
> timeout -s KILL 0.1 jackd -d alsa -d hw:3 -p 48 -n 2
> 
> 
> The root cause appears to be that there are those "missed service" and
> "underrun" errors in the first place. One means that the controller
> failed to execute a transfer in its scheduled (micro)frame for internal
> reasons or that the transfer was queued too late by software, the other
> means that the controller ran out of queued data to send.

My best guess is that these are caused by a full xhci event ring.

If event ring is full it would stop transfer rings, and we would see
missed service events. This matches the logs.

The events get handled and cleared in the interrupt handler, but
not every event generates an interrupt. for isoc transfers we intentionally
block a lot of event triggered interrupts.

The event ring size has doubled since 6.6.45 kernel, using a newer kernel
could help. This changed in 6.7 kernel:
28084d3fcc3c ("xhci: Use more than one Event Ring segment")

I also now see that there is a driver issue in handling "missed service" events

If we get a "missed service" event for the last TD in a URB, then that URB won't
be returned to audio driver until we get a _successful_ event that generates
interrupts at some point later in the future. This could explain the underrun/overrun
events.

If possible I'd like to see xhci traces of this issue.

steps:
mount -t debugfs none /sys/kernel/debug
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
< Reproduce issue >
Send content of /sys/kernel/debug/tracing/trace

Trace grows fast, so please copy the /sys/kernel/debug/tracing/trace file
somewhere as soon as issue is seen.

Thanks
Mathias

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-21 14:15           ` Mathias Nyman
@ 2024-08-21 16:49             ` Michał Pecio
  2024-08-23 11:43             ` FPS
  1 sibling, 0 replies; 23+ messages in thread
From: Michał Pecio @ 2024-08-21 16:49 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: FPS, linux-usb

[-- Attachment #1: Type: text/plain, Size: 1383 bytes --]

> My best guess is that these are caused by a full xhci event ring.
I'm not sure. I'm testing with a full-speed audio dongle which has two
endpoints with one packet per millisecond each. Doesn't sound like much.
There is nothing more on this bus, besides hubs (part of the PCIe card).

(Is it possible that the hubs are causing issues due to TT or whatnot?)

/:  Bus 008.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/2p, 480M
    |__ Port 001: Dev 002, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 002: Dev 004, If 0, Class=Audio, Driver=snd-usb-audio, 12M
        |__ Port 002: Dev 004, If 1, Class=Audio, Driver=snd-usb-audio, 12M
        |__ Port 002: Dev 004, If 2, Class=Audio, Driver=snd-usb-audio, 12M
        |__ Port 002: Dev 004, If 3, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 002: Dev 003, If 0, Class=Hub, Driver=hub/4p, 480M
/:  Bus 009.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 001: Dev 002, If 0, Class=Hub, Driver=hub/4p, 10000M
    |__ Port 002: Dev 003, If 0, Class=Hub, Driver=hub/4p, 10000M

I tried 6.11-rc4 and cherry-picking 28084d3fcc3c on 6.6. Neither works.


An archive with short traces and corresponding dmesg logs is attached.
One run on vanilla v6.11-rc4, one run with my patches (don't skip on
underrun, do skip immediately on missed service if ep_trb_dma != NULL).

Regards,
Michal

[-- Attachment #2: traces.tgz --]
[-- Type: application/x-compressed-tar, Size: 10469 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-21 14:15           ` Mathias Nyman
  2024-08-21 16:49             ` Michał Pecio
@ 2024-08-23 11:43             ` FPS
  2024-08-24 21:22               ` FPS
  2024-08-27 12:38               ` Mathias Nyman
  1 sibling, 2 replies; 23+ messages in thread
From: FPS @ 2024-08-23 11:43 UTC (permalink / raw)
  To: Mathias Nyman, Michał Pecio; +Cc: linux-usb

On Wed, Aug 21, 2024 at 05:15:00PM +0300, Mathias Nyman wrote:
 > My best guess is that these are caused by a full xhci event ring.
 >
 > If event ring is full it would stop transfer rings, and we would see
 > missed service events. This matches the logs.
 >
 > The events get handled and cleared in the interrupt handler, but
 > not every event generates an interrupt. for isoc transfers we
intentionally
 > block a lot of event triggered interrupts.

This has piqued my interest. Can you point me to the code in the kernel
source where this filtering happens?

 > The event ring size has doubled since 6.6.45 kernel, using a newer kernel
 > could help. This changed in 6.7 kernel:
 > 28084d3fcc3c ("xhci: Use more than one Event Ring segment")

I tried with a 6.10.6 kernel from archlinux and the symptoms didnt'
change. Right now I'm running 6.7.0.6.realtime1-7-rt from archlinux and
the symptoms are also the same.

 > If possible I'd like to see xhci traces of this issue.

Here is a curl'able trace (from running timeout -s KILL 0.1 jackd -R -P
70 -d alsa -d hw:USB -p 48 -n 2 (thanks to Michal for the suggestion)):

https://uni-bielefeld.sciebo.de/s/1ixBodEpmSg8M2k/download

Or should I include it inline (it's ca. 1M.)

Kind regards,
FPS



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-23 11:43             ` FPS
@ 2024-08-24 21:22               ` FPS
  2024-08-25  4:58                 ` Michał Pecio
  2024-08-27 12:37                 ` Mathias Nyman
  2024-08-27 12:38               ` Mathias Nyman
  1 sibling, 2 replies; 23+ messages in thread
From: FPS @ 2024-08-24 21:22 UTC (permalink / raw)
  To: Mathias Nyman, Michał Pecio; +Cc: linux-usb



On 8/23/24 1:43 PM, FPS wrote:
>  > If possible I'd like to see xhci traces of this issue.
>
> Here is a curl'able trace (from running timeout -s KILL 0.1 jackd -R -P
> 70 -d alsa -d hw:USB -p 48 -n 2 (thanks to Michal for the suggestion)):
>
> https://uni-bielefeld.sciebo.de/s/1ixBodEpmSg8M2k/download
>
> Or should I include it inline (it's ca. 1M.)

OK, that trace is possibly unnecessarily complex. Maybe it makes sense
to look at a more isolated issue first. I mentioned before that just
using the capture direction with jackd seems to be fine, but the
playback direction is broken. It is broken in a somewhat "weird" way.
Jack doesn't report XRuns, but playback is definitely broken (Sound
plays back as if there is a zero buffer inserted after every buffer and
with ca. half the speed.

I ran jackd like this:

timeout -s KILL 2 jackd -R -P 70 -d alsa -d hw:USB -p 48 -n 2 -P 2>&1 |
logger

This gave me (somewhat imprecise) timestamps for the jack output:

[119172.966606] arch97 fps[177753]: jackd 0.126.0
[119172.966616] arch97 fps[177753]: Copyright 2001-2009 Paul Davis,
Stephane Letz, Jack O'Quinn, Torben Hohn and others.
[119172.966619] arch97 fps[177753]: jackd comes with ABSOLUTELY NO WARRANTY
[119172.966621] arch97 fps[177753]: This is free software, and you are
welcome to redistribute it
[119172.966622] arch97 fps[177753]: under certain conditions; see the
file COPYING for details
[119172.966624] arch97 fps[177753]:
[119172.973620] arch97 fps[177753]: no message buffer overruns
[119172.975503] arch97 fps[177753]: JACK compiled with System V SHM support.
[119172.989889] arch97 fps[177753]: loading driver ..
[119172.990672] arch97 fps[177753]: creating alsa driver ...
hw:USB|-|48|2|48000|0|0|nomon|swmeter|-|32bit
[119172.991245] arch97 fps[177753]: configuring for 48000Hz, period = 48
frames (1.0 ms), buffer = 2 periods
[119172.991249] arch97 fps[177753]: ALSA: final selected sample format
for playback: 32bit integer little-endian
[119172.991252] arch97 fps[177753]: ALSA: use 2 periods for playback
[119174.533438] arch97 fps[177753]: delay of 1003.000 usecs exceeds
estimated spare time of 966.000; restart ...
[119174.533463] arch97 fps[177753]:
[119174.536202] arch97 fps[177753]: delay of 1051.000 usecs exceeds
estimated spare time of 966.000; restart ...
[119174.536222] arch97 fps[177753]:
[119174.539134] arch97 fps[177753]: delay of 1060.000 usecs exceeds
estimated spare time of 966.000; restart ...
[119174.539152] arch97 fps[177753]:
[119174.542173] arch97 fps[177753]: delay of 1049.000 usecs exceeds
estimated spare time of 966.000; restart ...
[119174.542192] arch97 fps[177753]:
[119174.545145] arch97 fps[177753]: delay of 1047.000 usecs exceeds
estimated spare time of 966.000; restart ...
[119174.545164] arch97 fps[177753]:
[119174.548423] arch97 fps[177753]: delay of 993.000 usecs exceeds
estimated spare time of 966.000; restart ...
[119174.548447] arch97 fps[177753]:

which continues for a while until jackd is killed.. The timestamps allow
us to correlate (again, imprecisely) what's in the trace (curl'able link):

https://uni-bielefeld.sciebo.de/s/5gXyqY0cqNrQZdE/download

It is quite peculiar. Jack seems to start up fine and is done
configuring it's "driver" at 119172.991252 and seems to run fine until
119174.533438 which is ca. 1.5 secs later where weird things start to
happen. A delay of 1.003 ms is reported which is just above the 1 ms
period duration of 48 samples @ 48000 Hz sampling rate. So the last
buffer would have been sent off to the soundcard about 1 ms before this
time. Now looking at the trace I noticed a little thing. In the part pf
the trace before the excessive delays start I see things like this:

  irq/125-xhci_hc-169     [003] ..s.3 119173.246016:
xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep1out
  irq/125-xhci_hc-169     [003] b...3 119173.247898: xhci_handle_event:
EVENT: TRB 000000010a3f14d0 status 'Success' len 0 slot 3 ep 2 type
'Transfer Event' flags e:c

or

  irq/125-xhci_hc-169     [003] ..s.3 119173.248024:
xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep1out
  irq/125-xhci_hc-169     [003] b...3 119173.248883: xhci_handle_event:
EVENT: TRB 000000010a3f1560 status 'Success' len 0 slot 3 ep 2 type
'Transfer Event' flags e:c

The time between ringing the doorbell and the next xhci_handle_event is
mostly below 2 ms. But lateron I see things like this:

  irq/125-xhci_hc-169     [003] ..s.3 119174.323020:
xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep1out
  irq/125-xhci_hc-169     [003] b...3 119174.324897: xhci_handle_event:
EVENT: TRB 000000010a3f1d30 status 'Success' len 0 slot 3 ep 2 type
'Transfer Event' flags e:C

Suddenly this time has jumped to almost 3 ms. I'm still getting my head
around the low level USB stuff but possibly this is a hint?

Kind regards,
FPS

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-24 21:22               ` FPS
@ 2024-08-25  4:58                 ` Michał Pecio
  2024-08-25  7:46                   ` fps
  2024-08-27 12:37                 ` Mathias Nyman
  1 sibling, 1 reply; 23+ messages in thread
From: Michał Pecio @ 2024-08-25  4:58 UTC (permalink / raw)
  To: FPS; +Cc: Mathias Nyman, linux-usb

I believe what happens is that in full duplex, either by request from
Jack or on its own, snd-usb-audio attempts to synchronize the playback
and recording streams by scheduling packets to move in specific USB
(micro)frames.

With playback only, the queued packets are only marked "ASAP" and the
hardware can cheat by delaying them to the next frame if either:
- it fails to do the transfer on time for internal reasons
- we fail to queue the transfer before its due time

See discussion in xHCI spec 4.10.3.1, 4.10.3.2.

Regarding capture, snd-usb-audio seems to always queue about 10 packets
or so. Latency depends only on the hardware returning them quickly and
no delays in the IRO-drivers-application pipeline. There is almost zero
risk that the URBs won't be resubmitted on time and the hardware will
face either an empty queue or an already stale packet.

That being said, I'm seeing Missed Service Errors on the recording side,
so it's possible that in my case the ASmedia controller is to blame...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-25  4:58                 ` Michał Pecio
@ 2024-08-25  7:46                   ` fps
  2024-08-25 15:15                     ` Michał Pecio
  2024-08-27 10:30                     ` FPS
  0 siblings, 2 replies; 23+ messages in thread
From: fps @ 2024-08-25  7:46 UTC (permalink / raw)
  To: Michał Pecio; +Cc: Mathias Nyman, linux-usb

On August 25, 2024 6:58:31 AM GMT+02:00, "Michał Pecio" <michal.pecio@gmail.com> wrote:
>I believe what happens is that in full duplex, either by request from
>Jack or on its own, snd-usb-audio attempts to synchronize the playback
>and recording streams by scheduling packets to move in specific USB
>(micro)frames.

Yes, at least in libasound2 there is a call snd_pcm_link

<https://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html#gac6c33091b049985baa6466e8fe93917e>

which jack uses in the full duplex case:

<https://github.com/jackaudio/jack1/blob/ab2e7363cacd0bf4b961c0466c13b0b4c1086ed9/drivers/alsa/alsa_driver.c#L2431>

I know too little about alsa internals and USB to follow through the libasound2 > snd_usb_audio > usb class driver > usb core > xhci chain to grasp the specific way this is achieved though :( 

Bugs could be lurking everywhere ;)

>With playback only, the queued packets are only marked "ASAP" and the
>hardware can cheat by delaying them to the next frame if either:
>- it fails to do the transfer on time for internal reasons
>- we fail to queue the transfer before its due time
>
>See discussion in xHCI spec 4.10.3.1, 4.10.3.2.

Thanks for the pointer.

>Regarding capture, snd-usb-audio seems to always queue about 10 packets
>or so. Latency depends only on the hardware returning them quickly and
>no delays in the IRO-drivers-application pipeline. There is almost zero
>risk that the URBs won't be resubmitted on time and the hardware will
>face either an empty queue or an already stale packet.
>
>That being said, I'm seeing Missed Service Errors on the recording side,
>so it's possible that in my case the ASmedia controller is to blame...

I will try and do a little more "comparative tracing" for the capture/playback/full duplex cases on 1. the intel xhci and 2. the renesas xhci which seems to work perfectly.

Kind regards,
FPS


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-25  7:46                   ` fps
@ 2024-08-25 15:15                     ` Michał Pecio
  2024-08-25 20:38                       ` FPS
  2024-08-27 10:30                     ` FPS
  1 sibling, 1 reply; 23+ messages in thread
From: Michał Pecio @ 2024-08-25 15:15 UTC (permalink / raw)
  To: fps; +Cc: Mathias Nyman, linux-usb

What is the last digit of your HCSPARAMS2 register?
Here's my three hosts, guess which is the one with problems.

grep HCSPARAMS2 /sys/kernel/debug/usb/xhci/*/reg-cap
/sys/kernel/debug/usb/xhci/0000:02:00.0/reg-cap:HCSPARAMS2 = 0xfc0000fa
/sys/kernel/debug/usb/xhci/0000:03:00.0/reg-cap:HCSPARAMS2 = 0xfc000031
/sys/kernel/debug/usb/xhci/0000:04:00.0/reg-cap:HCSPARAMS2 = 0x00000011

See xHCI section 5.3.4. If IST is high, you are probably out of luck.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-25 15:15                     ` Michał Pecio
@ 2024-08-25 20:38                       ` FPS
  0 siblings, 0 replies; 23+ messages in thread
From: FPS @ 2024-08-25 20:38 UTC (permalink / raw)
  To: Michał Pecio; +Cc: Mathias Nyman, linux-usb

On 8/25/24 5:15 PM, Michał Pecio wrote:
> What is the last digit of your HCSPARAMS2 register?
> Here's my three hosts, guess which is the one with problems.
>
> grep HCSPARAMS2 /sys/kernel/debug/usb/xhci/*/reg-cap
> /sys/kernel/debug/usb/xhci/0000:02:00.0/reg-cap:HCSPARAMS2 = 0xfc0000fa
> /sys/kernel/debug/usb/xhci/0000:03:00.0/reg-cap:HCSPARAMS2 = 0xfc000031
> /sys/kernel/debug/usb/xhci/0000:04:00.0/reg-cap:HCSPARAMS2 = 0x00000011

Well, the only one with IST[3] set is 0000:02:00.0 and not only that,
but 0x0A & 0x07 == 2 so it wants the TRBs at least 2 _frames_ == 2 ms
ahead of time. Or did I completely miss something here? :)

> See xHCI section 5.3.4. If IST is high, you are probably out of luck.

In my case (on the N97 system) it's a little different though:

# grep HCSPARAMS2 /sys/kernel/debug/usb/xhci/0000\:00\:14.0/reg-cap
HCSPARAMS2 = 0x14200054

== b00010 1 00001 0000000000000 0101 0100

IST[3] is low, and IST[2:0] == 4, so I would expect that (taking
paragraph 2 of xHCI section 4.14.2.1.4 into account) 1 ms would be
enough: 4 + 1 microframes == 675 us.

On my N100 system, the working Renesas controller has:

/sys/kernel/debug/usb/xhci/0000:01:00.0/reg-cap:HCSPARAMS2 = 0x24000011

1 (+1) microframes - not too shabby. And the Intel one:

/sys/kernel/debug/usb/xhci/0000:00:14.0/reg-cap:HCSPARAMS2 = 0x14200054

again the same as on my N97 system.

BTW: Should I still try your 2 patches from the earlier email?

Kind regards,
FPS

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-25  7:46                   ` fps
  2024-08-25 15:15                     ` Michał Pecio
@ 2024-08-27 10:30                     ` FPS
  2024-08-27 11:31                       ` Mathias Nyman
  1 sibling, 1 reply; 23+ messages in thread
From: FPS @ 2024-08-27 10:30 UTC (permalink / raw)
  To: Michał Pecio; +Cc: Mathias Nyman, linux-usb

On 8/25/24 9:46 AM, fps wrote:
> I will try and do a little more "comparative tracing" for the capture/playback/full duplex cases on 1. the intel xhci and 2. the renesas xhci which seems to work perfectly.

Hi again!

I captured a trace from the xHCI from the Renesas and the Intel
controllers and I noticed a couple of differences (this is for the
playback case only). I didn't find info on what precisely these fields
tell me.

I show here only partial traces from where it seems that regular
playback has started. I also added a couple of new lines since I haven't
yet found out how to stop thunderbird from line-wrapping :(

Intel:

           jackd-177754  [000] ....2 119172.988707:
xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep1out
  irq/125-xhci_hc-169     [003] b...3 119172.991808: xhci_handle_event:
EVENT: TRB 000000010a3f2000 status 'Success' len 0 slot 3 ep 2 type
'Transfer Event' flags e:c

  irq/125-xhci_hc-169     [003] b...3 119172.991811:
xhci_handle_transfer: ISOC: Buffer 000000011730a000 length 48 TD size 0
intr 0 type 'Isoch' flags B:i:I:c:s:i:e:C

  irq/125-xhci_hc-169     [003] b...3 119172.991813: xhci_inc_deq: ISOC
000000001e4e95c7: enq 0x000000010a3f20f0(0x000000010a3f2000) deq
0x000000010a3f2010(0x000000010a3f2000) segs 2 stream 0 bounce 200 cycle 1

  irq/125-xhci_hc-169     [003] b...3 119172.991814: xhci_inc_deq: EVENT
000000001a8ce64f: enq 0x00000001095d4000(0x00000001095d4000) deq
0x00000001095d40c0(0x00000001095d4000) segs 2 stream 0 bounce 0 cycle 0

Renesas:

           jackd-683428  [003] ....2 564371.947885:
xhci_ring_ep_doorbell: Ring doorbell for Slot 1 ep1out
  irq/127-xhci_hc-173     [002] b...3 564371.948914: xhci_handle_event:
EVENT: TRB 0000000115b2d000 status 'Success' len 0 slot 1 ep 2 type
'Transfer Event' flags e:C

  irq/127-xhci_hc-173     [002] b...3 564371.948916:
xhci_handle_transfer: ISOC: Buffer 0000000153a6a800 length 48 TD size 0
intr 0 type 'Isoch' flags B:i:I:c:s:i:e:C

  irq/127-xhci_hc-173     [002] b...3 564371.948920: xhci_inc_deq: ISOC
0000000059645549: enq 0x0000000115b2d0f0(0x0000000115b2d000) deq
0x0000000115b2d010(0x0000000115b2d000) segs 2 stream 0 bounce 200 cycle 1

  irq/127-xhci_hc-173     [002] b...3 564371.948921: xhci_inc_deq: EVENT
00000000ee8f0a5d: enq 0x000000010662e000(0x000000010662e000) deq
0x000000010662e470(0x000000010662e000) segs 1 stream 0 bounce 0 cycle 1

The first difference is that in the first line of the respective traces
the 'Transfer Event' flags have a capital 'C' for the Renesas and a
lower case 'c' for the Intel controller.

The second difference is that the fourth line has 'segs 2' for the
Intel, but 'segs 1' for the Renesas controller.

A third difference is that the fourth line ends with 'cycle 0' in the
Intel case and 'cycle 1' in the Renesas case.

I suppose these differences are mostly harmless?

Kind regards,
FPS

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-27 10:30                     ` FPS
@ 2024-08-27 11:31                       ` Mathias Nyman
  2024-08-27 12:18                         ` FPS
  0 siblings, 1 reply; 23+ messages in thread
From: Mathias Nyman @ 2024-08-27 11:31 UTC (permalink / raw)
  To: FPS, Michał Pecio; +Cc: linux-usb

On 27.8.2024 13.30, FPS wrote:
> On 8/25/24 9:46 AM, fps wrote:
>> I will try and do a little more "comparative tracing" for the capture/playback/full duplex cases on 1. the intel xhci and 2. the renesas xhci which seems to work perfectly.
> 
> Hi again!
> 
> I captured a trace from the xHCI from the Renesas and the Intel
> controllers and I noticed a couple of differences (this is for the
> playback case only). I didn't find info on what precisely these fields
> tell me.
> 
> I show here only partial traces from where it seems that regular
> playback has started. I also added a couple of new lines since I haven't
> yet found out how to stop thunderbird from line-wrapping :(
> 
> Intel:
> 
>            jackd-177754  [000] ....2 119172.988707:
> xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep1out
>   irq/125-xhci_hc-169     [003] b...3 119172.991808: xhci_handle_event:
> EVENT: TRB 000000010a3f2000 status 'Success' len 0 slot 3 ep 2 type
> 'Transfer Event' flags e:c
> 
>   irq/125-xhci_hc-169     [003] b...3 119172.991811:
> xhci_handle_transfer: ISOC: Buffer 000000011730a000 length 48 TD size 0
> intr 0 type 'Isoch' flags B:i:I:c:s:i:e:C
> 
>   irq/125-xhci_hc-169     [003] b...3 119172.991813: xhci_inc_deq: ISOC
> 000000001e4e95c7: enq 0x000000010a3f20f0(0x000000010a3f2000) deq
> 0x000000010a3f2010(0x000000010a3f2000) segs 2 stream 0 bounce 200 cycle 1
> 
>   irq/125-xhci_hc-169     [003] b...3 119172.991814: xhci_inc_deq: EVENT
> 000000001a8ce64f: enq 0x00000001095d4000(0x00000001095d4000) deq
> 0x00000001095d40c0(0x00000001095d4000) segs 2 stream 0 bounce 0 cycle 0
> 
> Renesas:
> 
>            jackd-683428  [003] ....2 564371.947885:
> xhci_ring_ep_doorbell: Ring doorbell for Slot 1 ep1out
>   irq/127-xhci_hc-173     [002] b...3 564371.948914: xhci_handle_event:
> EVENT: TRB 0000000115b2d000 status 'Success' len 0 slot 1 ep 2 type
> 'Transfer Event' flags e:C
> 
>   irq/127-xhci_hc-173     [002] b...3 564371.948916:
> xhci_handle_transfer: ISOC: Buffer 0000000153a6a800 length 48 TD size 0
> intr 0 type 'Isoch' flags B:i:I:c:s:i:e:C
> 
>   irq/127-xhci_hc-173     [002] b...3 564371.948920: xhci_inc_deq: ISOC
> 0000000059645549: enq 0x0000000115b2d0f0(0x0000000115b2d000) deq
> 0x0000000115b2d010(0x0000000115b2d000) segs 2 stream 0 bounce 200 cycle 1
> 
>   irq/127-xhci_hc-173     [002] b...3 564371.948921: xhci_inc_deq: EVENT
> 00000000ee8f0a5d: enq 0x000000010662e000(0x000000010662e000) deq
> 0x000000010662e470(0x000000010662e000) segs 1 stream 0 bounce 0 cycle 1
> 
> The first difference is that in the first line of the respective traces
> the 'Transfer Event' flags have a capital 'C' for the Renesas and a
> lower case 'c' for the Intel controller.
> 
> The second difference is that the fourth line has 'segs 2' for the
> Intel, but 'segs 1' for the Renesas controller.
> 
> A third difference is that the fourth line ends with 'cycle 0' in the
> Intel case and 'cycle 1' in the Renesas case.
> 
> I suppose these differences are mostly harmless?

Should be harmless, the cycle bit and capital 'C' changes each time the
ringbuffer wraps around.
This is how TRB consumers/producers keep track of where we are in the ring.

segs 1 vs 2 just tells that we have allocated 2 segments for *Intel host
event ringbuffer while only one for Renesas.

Thanks
Mathias


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-27 11:31                       ` Mathias Nyman
@ 2024-08-27 12:18                         ` FPS
  2024-08-27 13:49                           ` Mathias Nyman
  0 siblings, 1 reply; 23+ messages in thread
From: FPS @ 2024-08-27 12:18 UTC (permalink / raw)
  To: Mathias Nyman, Michał Pecio; +Cc: linux-usb



On 8/27/24 1:31 PM, Mathias Nyman wrote:
> Should be harmless, the cycle bit and capital 'C' changes each time the
> ringbuffer wraps around.
> This is how TRB consumers/producers keep track of where we are in the ring.
>
> segs 1 vs 2 just tells that we have allocated 2 segments for *Intel host
> event ringbuffer while only one for Renesas.

OK, thanks for that explanation. I uploaded the full Intel controller
trace here (curl'able link):

https://uni-bielefeld.sciebo.de/s/0O4XIzW529sKYQM/download

And here is the Renesas trace:

https://uni-bielefeld.sciebo.de/s/jB4qqFL0okPlYwN/download

Another difference which I find maybe more interesting then. If you
scroll down to where the steady state has been reached, e.g. timestamp
119173.008004 for the Intel trace and timestamp 564371.959089 for the
Renesas trace, then there are always 8 xhci_handle_transfer calls for
TDs of size 48 and 8 queue_trb calls between doorbell rings for the
Renesas controller, but for the Intel controller it looks different:
There are also always 8 xhci_queue_trb calls, but either 7 or 9
xhci_handle_transfer calls. This is quite odd, no?

Kind

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-24 21:22               ` FPS
  2024-08-25  4:58                 ` Michał Pecio
@ 2024-08-27 12:37                 ` Mathias Nyman
  2024-08-28  9:15                   ` FPS
  1 sibling, 1 reply; 23+ messages in thread
From: Mathias Nyman @ 2024-08-27 12:37 UTC (permalink / raw)
  To: FPS, Michał Pecio; +Cc: linux-usb

On 25.8.2024 0.22, FPS wrote:
> 
> 
> On 8/23/24 1:43 PM, FPS wrote:
>>  > If possible I'd like to see xhci traces of this issue.
>>
>> Here is a curl'able trace (from running timeout -s KILL 0.1 jackd -R -P
>> 70 -d alsa -d hw:USB -p 48 -n 2 (thanks to Michal for the suggestion)):
>>
>> https://uni-bielefeld.sciebo.de/s/1ixBodEpmSg8M2k/download
>>
>> Or should I include it inline (it's ca. 1M.)
> 
> OK, that trace is possibly unnecessarily complex. Maybe it makes sense
> to look at a more isolated issue first. I mentioned before that just
> using the capture direction with jackd seems to be fine, but the
> playback direction is broken. It is broken in a somewhat "weird" way.
> Jack doesn't report XRuns, but playback is definitely broken (Sound
> plays back as if there is a zero buffer inserted after every buffer and
> with ca. half the speed.

It's a bit complex, but it shows the missed service events started from
out transfers.

Quick background, audio class driver queues URBs. One URB containis one or several
TDs (transfer descriptors). One TD is consumed each interval, in this case each 125us.
In this case audio driver uses two URBs, usually containing 7 or 8 TDs, each transferring
48 bytes. TD (Transfer descriptor) equals one TRB (Transfer Block) in this particular case.


USB audio driver queues two URBs:
   185.943950: xhci_urb_enqueue: ep1out-isoc: urb 000000007faac5c4 pipe 33792 slot 3 length 0/384  8 x 48 bytes 0x10a3f5000 - 0x10a3f5070
   185.943966: xhci_urb_enqueue: ep1out-isoc: urb 0000000094a36c26 pipe 33792 slot 3 length 0/336  7 x 48 bytes 0x10a3f5080 - 0x10a3f50e0

xHCI gives back URBs once completed. About 1ms apart which is ok.
Odd that audio driver doesn't queue back the fist URB immediately after receiving it, this could cause missed service event.
   185.946834: xhci_urb_giveback: ep1out-isoc: urb 000000007faac5c4 pipe 33792 slot 3 length 384/384
   185.947705: xhci_urb_giveback: ep1out-isoc: urb 0000000094a36c26 pipe 33792 slot 3 length 336/336

Audio driver queues both URBs after it has processed their content.
very odd that the other URB only contains 1 TD (covering only 125us)
   185.947732: xhci_urb_enqueue: ep1out-isoc: urb 000000007faac5c4 pipe 33792 slot 3 length 0/48  1 x 48 bytes (0x10a3f50f0)
   185.947745: xhci_urb_enqueue: ep1out-isoc: urb 0000000094a36c26 pipe 33792 slot 3 length 0/336 7 x 48 bytes (0x10a3f5100 - 10a3f5160)

Audio driver cancels both URBs, Odd, why does it do that?
   185.949843: xhci_urb_dequeue: ep1out-isoc: urb 000000007faac5c4 pipe 33792 slot 3 length 0/48    CANCEL why?? (0x10a3f50f0)
   185.949848: xhci_urb_dequeue: ep1out-isoc: urb 0000000094a36c26 pipe 33792 slot 3 length 0/336   CANCEL why?? (0x10a3f5100 - 10a3f5160) turned to no-ops, deq at af5100

xhci gives back both cancelled URBs (empty)
   185.949902: xhci_urb_giveback: ep1out-isoc: urb 0000000094a36c26 pipe 33792 slot 3 length 0/336
   185.949908: xhci_urb_giveback: ep1out-isoc: urb 000000007faac5c4 pipe 33792 slot 3 length 0/48

Audio driver queues both URBs, almost at the same time
   185.950017: xhci_urb_enqueue: ep1out-isoc: urb 000000007faac5c4 pipe 33792 slot 3 length 0/384  8 x 48 bytes (10a3f5170 - 10a3f51e0)
   185.950031: xhci_urb_enqueue: ep1out-isoc: urb 0000000094a36c26 pipe 33792 slot 3 length 0/336  7 x 48 bytes (10a3f51f0 - 10a3f5250)

xhci driver successfully completes the first URB.
   185.952765: xhci_urb_giveback: ep1out-isoc: urb 000000007faac5c4 pipe 33792 slot 3 length 384/384

xhci reports missed Service error for the first TD (TRB) of the second URB (at buffer location 000000010a3f51f0)
   185.952876: xhci_handle_event: EVENT: TRB 000000010a3f51f0 status 'Missed Service Error' len 48 slot 3 ep 2 type 'Transfer Event' flags e:C

This is again odd.
The two last URBs were queued at almost the same time, and the TDs of the first URBs
completed successfully. I would understand a 'Missed service event' at an earlier stage,
for example at  0x10a3f50f0.

Maybe the missed service event TRB pointer isn't that exact.
Also, audio driver cancelling URBs right after queuing them does not help.

> 
> I ran jackd like this:
> 
> timeout -s KILL 2 jackd -R -P 70 -d alsa -d hw:USB -p 48 -n 2 -P 2>&1 |
> logger
> 
> This gave me (somewhat imprecise) timestamps for the jack output:
> 
> [119172.966606] arch97 fps[177753]: jackd 0.126.0
> [119172.966616] arch97 fps[177753]: Copyright 2001-2009 Paul Davis,
> Stephane Letz, Jack O'Quinn, Torben Hohn and others.
> [119172.966619] arch97 fps[177753]: jackd comes with ABSOLUTELY NO WARRANTY
> [119172.966621] arch97 fps[177753]: This is free software, and you are
> welcome to redistribute it
> [119172.966622] arch97 fps[177753]: under certain conditions; see the
> file COPYING for details
> [119172.966624] arch97 fps[177753]:
> [119172.973620] arch97 fps[177753]: no message buffer overruns
> [119172.975503] arch97 fps[177753]: JACK compiled with System V SHM support.
> [119172.989889] arch97 fps[177753]: loading driver ..
> [119172.990672] arch97 fps[177753]: creating alsa driver ...
> hw:USB|-|48|2|48000|0|0|nomon|swmeter|-|32bit
> [119172.991245] arch97 fps[177753]: configuring for 48000Hz, period = 48
> frames (1.0 ms), buffer = 2 periods
> [119172.991249] arch97 fps[177753]: ALSA: final selected sample format
> for playback: 32bit integer little-endian
> [119172.991252] arch97 fps[177753]: ALSA: use 2 periods for playback
> [119174.533438] arch97 fps[177753]: delay of 1003.000 usecs exceeds
> estimated spare time of 966.000; restart ...
> [119174.533463] arch97 fps[177753]:
> [119174.536202] arch97 fps[177753]: delay of 1051.000 usecs exceeds
> estimated spare time of 966.000; restart ...
> [119174.536222] arch97 fps[177753]:
> [119174.539134] arch97 fps[177753]: delay of 1060.000 usecs exceeds
> estimated spare time of 966.000; restart ...
> [119174.539152] arch97 fps[177753]:
> [119174.542173] arch97 fps[177753]: delay of 1049.000 usecs exceeds
> estimated spare time of 966.000; restart ...
> [119174.542192] arch97 fps[177753]:
> [119174.545145] arch97 fps[177753]: delay of 1047.000 usecs exceeds
> estimated spare time of 966.000; restart ...
> [119174.545164] arch97 fps[177753]:
> [119174.548423] arch97 fps[177753]: delay of 993.000 usecs exceeds
> estimated spare time of 966.000; restart ...
> [119174.548447] arch97 fps[177753]:
> 
> which continues for a while until jackd is killed.. The timestamps allow
> us to correlate (again, imprecisely) what's in the trace (curl'able link):
> 
> https://uni-bielefeld.sciebo.de/s/5gXyqY0cqNrQZdE/download
> 
> It is quite peculiar. Jack seems to start up fine and is done
> configuring it's "driver" at 119172.991252 and seems to run fine until
> 119174.533438 which is ca. 1.5 secs later where weird things start to
> happen. A delay of 1.003 ms is reported which is just above the 1 ms
> period duration of 48 samples @ 48000 Hz sampling rate. So the last
> buffer would have been sent off to the soundcard about 1 ms before this
> time. Now looking at the trace I noticed a little thing. In the part pf
> the trace before the excessive delays start I see things like this:
> 
>   irq/125-xhci_hc-169     [003] ..s.3 119173.246016:
> xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep1out
>   irq/125-xhci_hc-169     [003] b...3 119173.247898: xhci_handle_event:
> EVENT: TRB 000000010a3f14d0 status 'Success' len 0 slot 3 ep 2 type
> 'Transfer Event' flags e:c
> 
> or
> 
>   irq/125-xhci_hc-169     [003] ..s.3 119173.248024:
> xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep1out
>   irq/125-xhci_hc-169     [003] b...3 119173.248883: xhci_handle_event:
> EVENT: TRB 000000010a3f1560 status 'Success' len 0 slot 3 ep 2 type
> 'Transfer Event' flags e:c
> 
> The time between ringing the doorbell and the next xhci_handle_event is
> mostly below 2 ms. But lateron I see things like this:
> 
>   irq/125-xhci_hc-169     [003] ..s.3 119174.323020:
> xhci_ring_ep_doorbell: Ring doorbell for Slot 3 ep1out
>   irq/125-xhci_hc-169     [003] b...3 119174.324897: xhci_handle_event:
> EVENT: TRB 000000010a3f1d30 status 'Success' len 0 slot 3 ep 2 type
> 'Transfer Event' flags e:C
> 
> Suddenly this time has jumped to almost 3 ms. I'm still getting my head
> around the low level USB stuff but possibly this is a hint?

It's interesting to see that there are no 'Missed service events' or
buffer under/overruns even if host clearly doesn't handle one TD per 125us.

10 returned URBs from last log show additional delays.
grep "xhci_urb_giveback: ep1out-isoc" trace_playback.txt:
...
119174.201925: xhci_urb_giveback: ep1out-isoc: urb 000000005f2dc90a pipe 33792 slot 3 length 384/384 sgs 0/0 stream 0 flags 00000004
119174.202916: xhci_urb_giveback: ep1out-isoc: urb 0000000032ac090e pipe 33792 slot 3 length 384/384 sgs 0/0 stream 0 flags 00000004
119174.204924: xhci_urb_giveback: ep1out-isoc: urb 000000005f2dc90a pipe 33792 slot 3 length 384/384 sgs 0/0 stream 0 flags 00000004
119174.205913: xhci_urb_giveback: ep1out-isoc: urb 0000000032ac090e pipe 33792 slot 3 length 384/384 sgs 0/0 stream 0 flags 00000004
119174.207926: xhci_urb_giveback: ep1out-isoc: urb 000000005f2dc90a pipe 33792 slot 3 length 384/384 sgs 0/0 stream 0 flags 00000004
119174.208912: xhci_urb_giveback: ep1out-isoc: urb 0000000032ac090e pipe 33792 slot 3 length 384/384 sgs 0/0 stream 0 flags 00000004
119174.210923: xhci_urb_giveback: ep1out-isoc: urb 000000005f2dc90a pipe 33792 slot 3 length 384/384 sgs 0/0 stream 0 flags 00000004
119174.211916: xhci_urb_giveback: ep1out-isoc: urb 0000000032ac090e pipe 33792 slot 3 length 384/384 sgs 0/0 stream 0 flags 00000004
119174.213925: xhci_urb_giveback: ep1out-isoc: urb 000000005f2dc90a pipe 33792 slot 3 length 384/384 sgs 0/0 stream 0 flags 00000004
119174.214912: xhci_urb_giveback: ep1out-isoc: urb 0000000032ac090e pipe 33792 slot 3 length 384/384 sgs 0/0 stream 0 flags 00000004

URBs contain 8TDs with 125us interval, and should complete 1ms apart on average.
This won't be true for every URB as sometimes urb giveback might be blocked for
a short while, but it should average out. Next URB would then return a bit earlier.

In this case it doesn't. URB are permanently delayed.

We use "SIA" (Start Isoc ASAP) flag for Isoc transfers, this allows controller
to queue TD to be transferred "as soon as possible" without forcing it to a
specific microframe.  Could be that this flag in combination with queueing TDs
very close to the IST limit causes the delay but not triggering missed service
events, or under/overruns.

Adding more URBs could help, but causes a bit more latency.

Or maybe see if we can get audio driver to use 3 URBs with 6TDs each?
  
Thanks
Mathias


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-23 11:43             ` FPS
  2024-08-24 21:22               ` FPS
@ 2024-08-27 12:38               ` Mathias Nyman
  2024-08-27 20:49                 ` FPS
  1 sibling, 1 reply; 23+ messages in thread
From: Mathias Nyman @ 2024-08-27 12:38 UTC (permalink / raw)
  To: FPS, Michał Pecio; +Cc: linux-usb

On 23.8.2024 14.43, FPS wrote:
> On Wed, Aug 21, 2024 at 05:15:00PM +0300, Mathias Nyman wrote:
>  > My best guess is that these are caused by a full xhci event ring.
>  >
>  > If event ring is full it would stop transfer rings, and we would see
>  > missed service events. This matches the logs.
>  >
>  > The events get handled and cleared in the interrupt handler, but
>  > not every event generates an interrupt. for isoc transfers we
> intentionally
>  > block a lot of event triggered interrupts.
> 
> This has piqued my interest. Can you point me to the code in the kernel
> source where this filtering happens?

When a transfer completes the xHC hw will add an event on the event ring pointing
to this completed transfer block (TRB), and trigger an interrupt.
We can prevent xHC from generating the interrupt by setting a BEI (Block Event Interrupt)
flag in the Transfer Block (TRB)

To make sure every event triggers an interrupt just comment out:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 4ea2c3e072a9..de61a1018155 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -4108,8 +4108,8 @@ static int xhci_queue_isoc_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
                                 td->last_trb = ep_ring->enqueue;
                                 td->last_trb_seg = ep_ring->enq_seg;
                                 field |= TRB_IOC;
-                               if (trb_block_event_intr(xhci, num_tds, i, ir))
-                                       field |= TRB_BEI;
+//                             if (trb_block_event_intr(xhci, num_tds, i, ir))
+//                                     field |= TRB_BEI;

But I don't think this is the issue after looking at the traces.

Thanks
Mathias

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-27 12:18                         ` FPS
@ 2024-08-27 13:49                           ` Mathias Nyman
  0 siblings, 0 replies; 23+ messages in thread
From: Mathias Nyman @ 2024-08-27 13:49 UTC (permalink / raw)
  To: FPS, Michał Pecio; +Cc: linux-usb

On 27.8.2024 15.18, FPS wrote:
> 
> 
> On 8/27/24 1:31 PM, Mathias Nyman wrote:
>> Should be harmless, the cycle bit and capital 'C' changes each time the
>> ringbuffer wraps around.
>> This is how TRB consumers/producers keep track of where we are in the ring.
>>
>> segs 1 vs 2 just tells that we have allocated 2 segments for *Intel host
>> event ringbuffer while only one for Renesas.
> 
> OK, thanks for that explanation. I uploaded the full Intel controller
> trace here (curl'able link):
> 
> https://uni-bielefeld.sciebo.de/s/0O4XIzW529sKYQM/download
> 
> And here is the Renesas trace:
> 
> https://uni-bielefeld.sciebo.de/s/jB4qqFL0okPlYwN/download
> 
> Another difference which I find maybe more interesting then. If you
> scroll down to where the steady state has been reached, e.g. timestamp
> 119173.008004 for the Intel trace and timestamp 564371.959089 for the
> Renesas trace, then there are always 8 xhci_handle_transfer calls for
> TDs of size 48 and 8 queue_trb calls between doorbell rings for the
> Renesas controller, but for the Intel controller it looks different:
> There are also always 8 xhci_queue_trb calls, but either 7 or 9
> xhci_handle_transfer calls. This is quite odd, no?

The 9th event means that xHC hardware managed to handle one more isoc transfer
before the xhci interrupt handler was called and started handling events.

xhci interrupt handler handles all pending events on the event ring, but an
actual interrupt is generated only on last (8th) TD.

xHC completes 1st TD of URB X, writes event on event ring
   125us passes
xHC completes 2nd TD of URB X, writes event on event ring
   ... (same for all TDs 3-7)
xHC completes 8th TD (last in URB X), writes event on event ring, triggers interrupt.
   125us passes
xHC completes 1st TD of URB Y, writes event on event ring.
xhci interrupt handler is finally running, processing all 9 pending transfers,
it will give back URB X after processing its last (8th) event, but continue
processing the first TD of URB Y

So there is an at least 125us delay between xHC completing 8th TD, triggering
an interrupt and interrupt handler actually running, finishing processing events.

Thanks
Mathias

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-27 12:38               ` Mathias Nyman
@ 2024-08-27 20:49                 ` FPS
  0 siblings, 0 replies; 23+ messages in thread
From: FPS @ 2024-08-27 20:49 UTC (permalink / raw)
  To: Mathias Nyman, Michał Pecio; +Cc: linux-usb

On 8/27/24 2:38 PM, Mathias Nyman wrote:
> To make sure every event triggers an interrupt just comment out:
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 4ea2c3e072a9..de61a1018155 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -4108,8 +4108,8 @@ static int xhci_queue_isoc_tx(struct xhci_hcd 
> *xhci, gfp_t mem_flags,
>                                  td->last_trb = ep_ring->enqueue;
>                                  td->last_trb_seg = ep_ring->enq_seg;
>                                  field |= TRB_IOC;
> -                               if (trb_block_event_intr(xhci, num_tds, 
> i, ir))
> -                                       field |= TRB_BEI;
> +//                             if (trb_block_event_intr(xhci, num_tds, 
> i, ir))
> +//                                     field |= TRB_BEI;
> 
> But I don't think this is the issue after looking at the traces.

Yep, confirmed. Makes no difference at all it seems :) Much appreciated 
though - so thanks again!

Kind regards,
FPS

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller
  2024-08-27 12:37                 ` Mathias Nyman
@ 2024-08-28  9:15                   ` FPS
  0 siblings, 0 replies; 23+ messages in thread
From: FPS @ 2024-08-28  9:15 UTC (permalink / raw)
  To: Mathias Nyman, Michał Pecio; +Cc: linux-usb



On 8/27/24 2:37 PM, Mathias Nyman wrote:
> It's a bit complex, but it shows the missed service events started from
> out transfers.
>
> Quick background, audio class driver queues URBs. One URB containis one or several
> TDs (transfer descriptors). One TD is consumed each interval, in this case each 125us.
> In this case audio driver uses two URBs, usually containing 7 or 8 TDs, each transferring
> 48 bytes. TD (Transfer descriptor) equals one TRB (Transfer Block) in this particular case.

Thanks for the explanation!

> USB audio driver queues two URBs:
>   185.943950: xhci_urb_enqueue: ep1out-isoc: urb 000000007faac5c4 pipe 33792 slot 3 length 0/384  8 x 48 bytes 0x10a3f5000 - 0x10a3f5070
>   185.943966: xhci_urb_enqueue: ep1out-isoc: urb 0000000094a36c26 pipe 33792 slot 3 length 0/336  7 x 48 bytes 0x10a3f5080 - 0x10a3f50e0
>
> xHCI gives back URBs once completed. About 1ms apart which is ok.
> Odd that audio driver doesn't queue back the fist URB immediately after receiving it, this could cause missed service event.
>   185.946834: xhci_urb_giveback: ep1out-isoc: urb 000000007faac5c4 pipe 33792 slot 3 length 384/384
>   185.947705: xhci_urb_giveback: ep1out-isoc: urb 0000000094a36c26 pipe 33792 slot 3 length 336/336
>
> Audio driver queues both URBs after it has processed their content.
> very odd that the other URB only contains 1 TD (covering only 125us)
>   185.947732: xhci_urb_enqueue: ep1out-isoc: urb 000000007faac5c4 pipe 33792 slot 3 length 0/48  1 x 48 bytes (0x10a3f50f0)
>   185.947745: xhci_urb_enqueue: ep1out-isoc: urb 0000000094a36c26 pipe 33792 slot 3 length 0/336 7 x 48 bytes (0x10a3f5100 - 10a3f5160)
>
> Audio driver cancels both URBs, Odd, why does it do that?
>   185.949843: xhci_urb_dequeue: ep1out-isoc: urb 000000007faac5c4 pipe 33792 slot 3 length 0/48    CANCEL why?? (0x10a3f50f0)
>   185.949848: xhci_urb_dequeue: ep1out-isoc: urb 0000000094a36c26 pipe 33792 slot 3 length 0/336   CANCEL why?? (0x10a3f5100 - 10a3f5160) turned to no-ops, deq at af5100

I looked a bit at the snd_usb_audio driver for places where urb unlinking or
killing might happen:

[fps@arch97 usb]$ grep -n usb_kill_urb *.c -R
midi2.c:242:            usb_kill_urb(ep->urbs[i].urb);
midi.c:1496:                            usb_kill_urb(ep->out->urbs[j].urb);
midi.c:1507:                            usb_kill_urb(ep->in->urbs[j]);
midi.c:2364:                            usb_kill_urb(ep->in->urbs[j]);
mixer.c:3647:           usb_kill_urb(mixer->urb);
mixer.c:3649:           usb_kill_urb(mixer->rc_urb);
mixer.c:3658:   usb_kill_urb(mixer->urb);
mixer.c:3659:   usb_kill_urb(mixer->rc_urb);
[fps@arch97 usb]$ grep -n usb_unlink_urb *.c -R
endpoint.c:1053:                                usb_unlink_urb(u);

The only really possibly relevant file here is endpoint.c since we're only
dealing with pcm streams.

endpoint.c:1053 is part of

*
 * Stop active urbs
 *
 * This function moves the EP to STOPPING state if it's being RUNNING.
 */
static int stop_urbs(struct snd_usb_endpoint *ep, bool force, bool keep_pending)

But none of the call sites for this function look as if they would be called in
the middle of an active use of the endpoint. Weird.

[...]

> URBs contain 8TDs with 125us interval, and should complete 1ms apart on average.
> This won't be true for every URB as sometimes urb giveback might be blocked for
> a short while, but it should average out. Next URB would then return a bit earlier.
>
> In this case it doesn't. URB are permanently delayed.
>
> We use "SIA" (Start Isoc ASAP) flag for Isoc transfers, this allows controller
> to queue TD to be transferred "as soon as possible" without forcing it to a
> specific microframe.  Could be that this flag in combination with queueing TDs
> very close to the IST limit causes the delay but not triggering missed service
> events, or under/overruns.
>
> Adding more URBs could help, but causes a bit more latency.
>
> Or maybe see if we can get audio driver to use 3 URBs with 6TDs each?

I will play around with the code a little bit. I'll probably only break things
though :)

Thanks!
FPS

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-08-28  9:15 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-18 20:56 Misbehaving Alder Lake-N PCH USB 3.2 xHCI Host Controller FPS
2024-08-19 13:38 ` Mathias Nyman
2024-08-19 20:12   ` FPS
2024-08-20 11:01     ` Michał Pecio
2024-08-20 21:04       ` FPS
2024-08-21 13:02         ` Michał Pecio
2024-08-21 13:46           ` fps
2024-08-21 14:15           ` Mathias Nyman
2024-08-21 16:49             ` Michał Pecio
2024-08-23 11:43             ` FPS
2024-08-24 21:22               ` FPS
2024-08-25  4:58                 ` Michał Pecio
2024-08-25  7:46                   ` fps
2024-08-25 15:15                     ` Michał Pecio
2024-08-25 20:38                       ` FPS
2024-08-27 10:30                     ` FPS
2024-08-27 11:31                       ` Mathias Nyman
2024-08-27 12:18                         ` FPS
2024-08-27 13:49                           ` Mathias Nyman
2024-08-27 12:37                 ` Mathias Nyman
2024-08-28  9:15                   ` FPS
2024-08-27 12:38               ` Mathias Nyman
2024-08-27 20:49                 ` FPS

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).