public inbox for linux-wireless@vger.kernel.org
 help / color / mirror / Atom feed
From: Johan Hovold <johan@kernel.org>
To: Miaoqing Pan <quic_miaoqing@quicinc.com>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>,
	ath11k@lists.infradead.org, linux-wireless@vger.kernel.org,
	linux-kernel@vger.kernel.org, johan+linaro@kernel.org
Subject: Re: [PATCH v2 ath-next 2/2] wifi: ath11k: fix HTC rx insufficient length
Date: Mon, 17 Mar 2025 14:04:52 +0100	[thread overview]
Message-ID: <Z9gd9Aw5Bug8IKSV@hovoldconsulting.com> (raw)
In-Reply-To: <ecfe850c-b263-4bee-b888-c34178e690fc@quicinc.com>

On Mon, Mar 17, 2025 at 01:52:15PM +0800, Miaoqing Pan wrote:
> On 3/14/2025 4:06 PM, Johan Hovold wrote:
> > On Fri, Mar 14, 2025 at 09:01:36AM +0800, Miaoqing Pan wrote:
 
> >> I think the hardware has already ensured synchronization between
> >> descriptor and head pointer, which isn't difficult to achieve. The issue
> >> is likely caused by something else and requires further debugging.
> > 
> > Yeah, but you still need memory barriers on the kernel side.
> > 
> > It could be that we are looking at two different causes for those
> > zero-length descriptors.
> > 
> > The error handling for that obviously needs to be fixed either way, but
> > I haven't heard anyone hitting the corruption with the memory barriers
> > in place on the X13s yet (even if we'd need some more time to test
> > this).

> After multiple and prolonged verifications, adding dma_rmb() did not 
> improve the issue at all. I think this Status Descriptor is updated by 
> hardware (Copy Engine) controlled by another system, not involving DMA 
> or out-of-order CPU access within the same system, so memory barriers do 
> not take effect.

Then it seems we are looking at two separate root causes for the
corruption as the memory barrier appears to be all that is needed to fix
the X13s issue.

A user who hit the corruption after 2 h without the fix has been running
over the weekend with the memory barrier without any problems. I'll ask
further users to test, but it certainly looks like it is working as
intended.

And the memory barrier is de-facto missing as the head pointer and
descriptor are accessed through (two separate) coherent mappings so
there are no ordering guarantees without explicit barriers.

Now obviously there are further issues in your system, which we should
make sure we understand before adding workarounds to the driver.

Do you have a pointer to the downstream kernel sources you are testing
with? Or even better, can you reproduce the issue with mainline after
adding the PCIe patches that were posted to the lists for these
platforms?

Apparently the descriptors can also be passed in non-coherent memory
for some devices (.alloc_cacheable_memory / HAL_SRNG_FLAGS_CACHED). That
implementation looks suspicious and could possibly result in similar
problems. Are you using .alloc_cacheable_memory in your setup?

Does it make any difference if you use a full rmb() barrier?

And after modifying ath11k_hal_ce_dst_status_get_length() so that it
does not clear the length, how many times you need to retry? Does it
always work on the second try?

Johan

  reply	other threads:[~2025-03-17 13:04 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-10  1:02 [PATCH v2 ath-next 0/2] wifi: ath11k: fix HTC rx insufficient length Miaoqing Pan
2025-03-10  1:02 ` [PATCH v2 ath-next 1/2] wifi: ath11k: add function to get next srng desc Miaoqing Pan
2025-03-10  1:02 ` [PATCH v2 ath-next 2/2] wifi: ath11k: fix HTC rx insufficient length Miaoqing Pan
2025-03-10 10:09   ` Johan Hovold
2025-03-11  8:29     ` Miaoqing Pan
2025-03-11 15:20       ` Jeff Johnson
2025-03-12  1:11         ` Miaoqing Pan
2025-03-12 16:43           ` Johan Hovold
2025-03-13  1:41             ` Miaoqing Pan
2025-03-13 15:57               ` Johan Hovold
2025-03-14  0:46                 ` Miaoqing Pan
2025-03-13 13:31             ` Miaoqing Pan
2025-03-13 16:14               ` Johan Hovold
2025-03-14  1:01                 ` Miaoqing Pan
2025-03-14  8:06                   ` Johan Hovold
2025-03-14  8:19                     ` Miaoqing Pan
2025-03-17  5:52                     ` Miaoqing Pan
2025-03-17 13:04                       ` Johan Hovold [this message]
2025-03-18  7:53                         ` Miaoqing Pan
2025-03-18 17:42                           ` Johan Hovold
2025-03-19  6:47                             ` Miaoqing Pan
2025-03-21  9:35                               ` Johan Hovold
2025-03-25  1:04                                 ` Miaoqing Pan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z9gd9Aw5Bug8IKSV@hovoldconsulting.com \
    --to=johan@kernel.org \
    --cc=ath11k@lists.infradead.org \
    --cc=jeff.johnson@oss.qualcomm.com \
    --cc=johan+linaro@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=quic_miaoqing@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox