public inbox for linux-wireless@vger.kernel.org
* ath11k: WCN6855: possible ring buffer corruption
@ 2024-04-16 15:40 Johan Hovold
  2025-02-28 12:28 ` Johan Hovold
  0 siblings, 1 reply; 5+ messages in thread
From: Johan Hovold @ 2024-04-16 15:40 UTC
  To: Kalle Valo, Jeff Johnson; +Cc: ath11k, linux-wireless, linux-kernel

Hi Kalle and Jeff,

Over the past year I've received occasional reports from users of the
Lenovo ThinkPad X13s (aarch64) that the wifi sometimes stops working.
When this happens the kernel log is filled with errors like:

[ 1164.962227] ath11k_warn: 222 callbacks suppressed
[ 1164.962238] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
[ 1164.962309] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
[ 1164.962994] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1476, expected 1484
[ 1164.963405] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1488
[ 1164.963701] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1480, expected 1484
[ 1164.963852] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1468, expected 1480
[ 1164.964491] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
[ 1164.964733] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1488, expected 1492
[ 1165.198329] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1488
[ 1165.198470] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1476
[ 1166.266513] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 2699 at byte 348 (1132 bytes left, 64788 expected)
[ 1166.542803] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 4270 at byte 348 (1128 bytes left, 63772 expected)
[ 1166.768238] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 0 at byte 376 (1112 bytes left, 11730 expected)
[ 1166.900152] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 3 at byte 790 (694 bytes left, 16256 expected)
[ 1168.499073] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 1 at byte 62 (1426 bytes left, 3089 expected)
[ 1168.818086] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 63063 at byte 1466 (10 bytes left, 50467 expected)
[ 1169.032885] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 0 at byte 364 (1120 bytes left, 12483 expected)
[ 1169.308546] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 3092 at byte 348 (1128 bytes left, 64780 expected)
[ 1169.563928] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 1 at byte 348 (1124 bytes left, 44062 expected)

which after a quick look at the driver seems to suggest that we may be
hitting some kind of ring buffer corruption.
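
For anyone collecting logs from affected machines, a minimal sketch of how
the two message types above could be tallied. The regular expressions are
derived only from the log excerpt quoted here, and the function name
`summarize` is hypothetical, not part of any existing tool:

```python
import re
from collections import Counter

# Patterns derived from the two message formats in the log excerpt above.
HTC_RE = re.compile(r"HTC Rx: insufficient length, got (\d+), expected (\d+)")
WMI_RE = re.compile(
    r"wmi tlv parse failure of tag (\d+) at byte (\d+) "
    r"\((\d+) bytes left, (\d+) expected\)"
)


def summarize(lines):
    """Count both error types and collect the HTC length shortfalls."""
    counts = Counter()
    shortfalls = []
    for line in lines:
        m = HTC_RE.search(line)
        if m:
            got, expected = map(int, m.groups())
            counts["htc"] += 1
            shortfalls.append(expected - got)
            continue
        if WMI_RE.search(line):
            counts["wmi"] += 1
    return counts, shortfalls


# Three lines copied from the excerpt above.
sample = [
    "[ 1164.962238] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492",
    "[ 1164.962309] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484",
    "[ 1166.266513] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 2699 at byte 348 (1132 bytes left, 64788 expected)",
]
counts, shortfalls = summarize(sample)
print(counts["htc"], counts["wmi"], shortfalls)
```

Feeding it the output of `dmesg` line by line should give a rough picture of
how often each error occurs and by how much the HTC lengths fall short.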

Rebinding the driver reportedly sometimes makes things work again, but
not always.
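
For reference, a rebind can be done through the generic sysfs `unbind` and
`bind` files of the PCI driver. The sketch below assumes the driver name
`ath11k_pci` and the device address `0006:01:00.0` as they appear in the log
above; the helper name `rebind` and its `sysfs_root` parameter are
hypothetical, and writing to the real files requires root:

```python
from pathlib import Path


def rebind(device, driver="ath11k_pci", sysfs_root="/sys"):
    """Unbind and then rebind a PCI device via the driver's sysfs files.

    `sysfs_root` is only a parameter so the helper can be exercised
    against a scratch directory instead of the real /sys.
    """
    drv = Path(sysfs_root) / "bus" / "pci" / "drivers" / driver
    for action in ("unbind", "bind"):
        # Each file takes the PCI address of the device to act on.
        (drv / action).write_text(device)


# Example (as root): rebind("0006:01:00.0")
```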

The issue has been confirmed with the 6.8 kernel and the latest firmware
WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37.

I've triggered this issue twice myself with 6.6 and .23 firmware, but
the reports date back to at least 6.2, likely with even older
firmware.

An unconfirmed hypothesis is that we may be hitting this more often when
enabling the GIC ITS so that the interrupt processing is spread out over
all cores (unlike when using the DWC controller's internal MSI
implementation). This change is now merged for 6.10.
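
To check whether interrupts really are spread over the cores on a given
machine, something like the following could be run on the contents of
/proc/interrupts. This is only a sketch: the function name `per_cpu_totals`
is hypothetical, and the sample table is made up for illustration (the row
name `example-wifi` is not what the driver actually registers):

```python
def per_cpu_totals(text, match):
    """Sum per-CPU interrupt counts over rows whose text contains `match`."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpus
    for line in lines[1:]:
        if match not in line:
            continue
        fields = line.split()[1:1 + ncpus]  # skip the "NNN:" label
        if len(fields) == ncpus and all(f.isdigit() for f in fields):
            for cpu, value in enumerate(fields):
                totals[cpu] += int(value)
    return totals


# Made-up sample in the layout of /proc/interrupts, for illustration only.
sample = """
           CPU0       CPU1       CPU2       CPU3
 201:        120         95        110        101   ITS-MSI 524288 Edge      example-wifi
 202:        400          0          0          0   ITS-MSI 524289 Edge      other
"""
totals = per_cpu_totals(sample, "example-wifi")
print(totals)
```

On a real system the text would come from reading /proc/interrupts, and
`match` would be whatever substring identifies the wifi interrupt rows there.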

Do you have any immediate theories about what could be causing this?
Does it look like a firmware or driver issue to you, for example? Is it
something you've seen before?

Note that I've previously reported this here:

	https://bugzilla.kernel.org/show_bug.cgi?id=218623

Johan


* Re: ath11k: WCN6855: possible ring buffer corruption
  2024-04-16 15:40 ath11k: WCN6855: possible ring buffer corruption Johan Hovold
@ 2025-02-28 12:28 ` Johan Hovold
  2025-02-28 17:07   ` Jeff Johnson
  0 siblings, 1 reply; 5+ messages in thread
From: Johan Hovold @ 2025-02-28 12:28 UTC
  To: Jeff Johnson; +Cc: ath11k, linux-wireless, linux-kernel, Kalle Valo

Hi Jeff,

The ath11k ring-buffer corruption issue is hurting some users of the
Lenovo ThinkPad X13s quite badly, so I promised to try to escalate this
with you and Qualcomm.

The chance of hitting the bug seems to depend on the AP/network, and it
also seems that my hypothesis was right: enabling the GIC ITS, which
increases parallelism by spreading interrupt handling over all cores,
does indeed make it easier to hit this.

The latter could indicate a driver bug, even if this could very well be a
firmware issue.

Have you had a chance to look into this yet? Can you tell from the logs
and reported symptoms whether this is a firmware bug or not?

On Tue, Apr 16, 2024 at 05:40:43PM +0200, Johan Hovold wrote:

> Over the past year I've received occasional reports from users of the
> Lenovo ThinkPad X13s (aarch64) that the wifi sometimes stops working.
> When this happens the kernel log is filled with errors like:
> 
> [ 1164.962227] ath11k_warn: 222 callbacks suppressed
> [ 1164.962238] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
> [ 1164.962309] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
> [ 1164.962994] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1476, expected 1484
> [ 1164.963405] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1488
> [ 1164.963701] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1480, expected 1484
> [ 1164.963852] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1468, expected 1480
> [ 1164.964491] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
> [ 1164.964733] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1488, expected 1492
> [ 1165.198329] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1488
> [ 1165.198470] ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1476
> [ 1166.266513] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 2699 at byte 348 (1132 bytes left, 64788 expected)
> [ 1166.542803] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 4270 at byte 348 (1128 bytes left, 63772 expected)
> [ 1166.768238] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 0 at byte 376 (1112 bytes left, 11730 expected)
> [ 1166.900152] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 3 at byte 790 (694 bytes left, 16256 expected)
> [ 1168.499073] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 1 at byte 62 (1426 bytes left, 3089 expected)
> [ 1168.818086] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 63063 at byte 1466 (10 bytes left, 50467 expected)
> [ 1169.032885] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 0 at byte 364 (1120 bytes left, 12483 expected)
> [ 1169.308546] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 3092 at byte 348 (1128 bytes left, 64780 expected)
> [ 1169.563928] ath11k_pci 0006:01:00.0: wmi tlv parse failure of tag 1 at byte 348 (1124 bytes left, 44062 expected)
> 
> which after a quick look at the driver seems to suggest that we may be
> hitting some kind of ring buffer corruption.
> 
> Rebinding the driver reportedly sometimes makes things work again, but
> not always.
> 
> The issue has been confirmed with the 6.8 kernel and the latest firmware
> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37.
> 
> I've triggered this issue twice myself with 6.6 and .23 firmware, but
> the reports date back to at least 6.2, likely with even older
> firmware.
> 
> An unconfirmed hypothesis is that we may be hitting this more often when
> enabling the GIC ITS so that the interrupt processing is spread out over
> all cores (unlike when using the DWC controller's internal MSI
> implementation). This change is now merged for 6.10.
> 
> Do you have any immediate theories about what could be causing this?
> Does it look like a firmware or driver issue to you, for example? Is it
> something you've seen before?
> 
> Note that I've previously reported this here:
> 
> 	https://bugzilla.kernel.org/show_bug.cgi?id=218623
 
Johan


* Re: ath11k: WCN6855: possible ring buffer corruption
  2025-02-28 12:28 ` Johan Hovold
@ 2025-02-28 17:07   ` Jeff Johnson
  2025-03-03  7:10     ` Johan Hovold
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Johnson @ 2025-02-28 17:07 UTC
  To: Johan Hovold, Jeff Johnson
  Cc: ath11k, linux-wireless, linux-kernel, Kalle Valo

On 2/28/2025 4:28 AM, Johan Hovold wrote:
> Hi Jeff,
> 
> The ath11k ring-buffer corruption issue is hurting some users of the
> Lenovo ThinkPad X13s quite badly, so I promised to try to escalate this
> with you and Qualcomm.

I've escalated this with the development team.

/jeff


* Re: ath11k: WCN6855: possible ring buffer corruption
  2025-02-28 17:07   ` Jeff Johnson
@ 2025-03-03  7:10     ` Johan Hovold
  2025-03-03 16:01       ` Jeff Johnson
  0 siblings, 1 reply; 5+ messages in thread
From: Johan Hovold @ 2025-03-03  7:10 UTC
  To: Jeff Johnson
  Cc: Jeff Johnson, ath11k, linux-wireless, linux-kernel, Kalle Valo

On Fri, Feb 28, 2025 at 09:07:32AM -0800, Jeff Johnson wrote:
> On 2/28/2025 4:28 AM, Johan Hovold wrote:

> > The ath11k ring-buffer corruption issue is hurting some users of the
> > Lenovo ThinkPad X13s quite badly, so I promised to try to escalate this
> > with you and Qualcomm.
> 
> I've escalated this with the development team.

Thanks, Jeff. Just let me know if you need any help with testing patches
or firmware updates. We have a couple of users who can reproduce this
very easily and who are also able to test patches.

Johan


* Re: ath11k: WCN6855: possible ring buffer corruption
  2025-03-03  7:10     ` Johan Hovold
@ 2025-03-03 16:01       ` Jeff Johnson
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Johnson @ 2025-03-03 16:01 UTC
  To: Johan Hovold
  Cc: Jeff Johnson, ath11k, linux-wireless, linux-kernel, Kalle Valo

On 3/2/2025 11:10 PM, Johan Hovold wrote:
> On Fri, Feb 28, 2025 at 09:07:32AM -0800, Jeff Johnson wrote:
>> On 2/28/2025 4:28 AM, Johan Hovold wrote:
> 
>>> The ath11k ring-buffer corruption issue is hurting some users of the
>>> Lenovo ThinkPad X13s quite badly, so I promised to try to escalate this
>>> with you and Qualcomm.
>>
>> I've escalated this with the development team.
> 
> Thanks, Jeff. Just let me know if you need any help with testing patches
> or firmware updates. We have a couple of users that can reproduce this
> very easily and that are also able to test patches.

There is a patch under development -- you should see it this week.

/jeff

