public inbox for linux-wireless@vger.kernel.org
 help / color / mirror / Atom feed
From: Johan Hovold <johan@kernel.org>
To: Miaoqing Pan <quic_miaoqing@quicinc.com>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>,
	ath11k@lists.infradead.org, linux-wireless@vger.kernel.org,
	linux-kernel@vger.kernel.org, johan+linaro@kernel.org
Subject: Re: [PATCH v2 ath-next 2/2] wifi: ath11k: fix HTC rx insufficient length
Date: Tue, 18 Mar 2025 18:42:55 +0100	[thread overview]
Message-ID: <Z9mwn3GzpPPZSiTG@hovoldconsulting.com> (raw)
In-Reply-To: <f30cf771-a9cd-4d8f-8d10-1640afd33c23@quicinc.com>

On Tue, Mar 18, 2025 at 03:53:39PM +0800, Miaoqing Pan wrote:
> On 3/17/2025 9:04 PM, Johan Hovold wrote:

> > Then it seems we are looking at two separate root causes for the
> > corruption as the memory barrier appears to be all that is needed to fix
> > the X13s issue.
> > 
> > A user who hit the corruption after 2 h without the fix has been running
> > over the weekend with the memory barrier without any problems. I'll ask
> > further users to test, but it certainly looks like it is working as
> > intended.
> > 
> > And the memory barrier is de-facto missing as the head pointer and
> > descriptor are accessed through (two separate) coherent mappings so
> > there are no ordering guarantees without explicit barriers.
> 
> This situation should occur when there is only one descriptor in the 
> ring. If, as you mentioned, the CPU tries to load the descriptor first, 
> but the descriptor fetch fails before the HP load because the ring 
> returns empty, it won't trigger the current issue.

It could if the CPU observes the updates out of order due to the missing
barrier. The driver could be processing an earlier interrupt when the
new descriptor is added and head pointer updated. If for example the CPU
speculatively fetches the descriptor before the head pointer is updated,
then the descriptor length may be zero when the CPU sees the updated
head pointer.

This seems to be what is happening on the X13s since adding the memory
barrier makes the zero-length descriptors go away.
 
> The Copy Engine hardware module copies the metadata to the Status 
> Descriptor after the DMA is complete, then updates the HP to trigger an 
> interrupt. I think there might be some issues in this process, such as 
> the lack of a wmb instruction after the copy is complete, causing the HP 
> to be updated first.

Yeah, possibly. At least it seems there are more issues than the missing
barrier on the machines you test.
 
> > Now obviously there are further issues in your system, which we should
> > make sure we understand before adding workarounds to the driver.
> > 
> > Do you have a pointer to the downstream kernel sources you are testing
> > with? Or even better, can you reproduce the issue with mainline after
> > adding the PCIe patches that were posted to the lists for these
> > platforms?
> > 
> https://github.com/qualcomm-linux/meta-qcom-hwe/blob/scarthgap/recipes-kernel/linux/linux-qcom-base_6.6.bb

Thanks for the pointer. That's a lot of out-of-tree patches on top of
stable so not that easy to check the state of the resulting tree.

> > Does it make any difference if you use a full rmb() barrier?
> > 
> I've also tried rmb() and mb(), but they didn't work either.

Thanks for checking.

Just to be sure, you did add the barrier in the same place as my patch
(i.e. just before the descriptor read)?

> > And after modifying ath11k_hal_ce_dst_status_get_length() so that it
> > does not clear the length, how many times you need to retry? Does it
> > always work on the second try?
> 
> Yes, the test has been running continuously for over 48 hours, always 
> work on the second try, updated in patch v4.

Good, at least the descriptor-length-sometimes-never-updated issue is
solved.

Johan

  reply	other threads:[~2025-03-18 17:42 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-10  1:02 [PATCH v2 ath-next 0/2] wifi: ath11k: fix HTC rx insufficient length Miaoqing Pan
2025-03-10  1:02 ` [PATCH v2 ath-next 1/2] wifi: ath11k: add function to get next srng desc Miaoqing Pan
2025-03-10  1:02 ` [PATCH v2 ath-next 2/2] wifi: ath11k: fix HTC rx insufficient length Miaoqing Pan
2025-03-10 10:09   ` Johan Hovold
2025-03-11  8:29     ` Miaoqing Pan
2025-03-11 15:20       ` Jeff Johnson
2025-03-12  1:11         ` Miaoqing Pan
2025-03-12 16:43           ` Johan Hovold
2025-03-13  1:41             ` Miaoqing Pan
2025-03-13 15:57               ` Johan Hovold
2025-03-14  0:46                 ` Miaoqing Pan
2025-03-13 13:31             ` Miaoqing Pan
2025-03-13 16:14               ` Johan Hovold
2025-03-14  1:01                 ` Miaoqing Pan
2025-03-14  8:06                   ` Johan Hovold
2025-03-14  8:19                     ` Miaoqing Pan
2025-03-17  5:52                     ` Miaoqing Pan
2025-03-17 13:04                       ` Johan Hovold
2025-03-18  7:53                         ` Miaoqing Pan
2025-03-18 17:42                           ` Johan Hovold [this message]
2025-03-19  6:47                             ` Miaoqing Pan
2025-03-21  9:35                               ` Johan Hovold
2025-03-25  1:04                                 ` Miaoqing Pan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z9mwn3GzpPPZSiTG@hovoldconsulting.com \
    --to=johan@kernel.org \
    --cc=ath11k@lists.infradead.org \
    --cc=jeff.johnson@oss.qualcomm.com \
    --cc=johan+linaro@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=quic_miaoqing@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox