public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Remi Pommarel <repk@triplefau.lt>
To: Johan Hovold <johan@kernel.org>
Cc: Johan Hovold <johan+linaro@kernel.org>,
	Jeff Johnson <jjohnson@kernel.org>,
	Miaoqing Pan <quic_miaoqing@quicinc.com>,
	Steev Klimaszewski <steev@kali.org>,
	Clayton Craft <clayton@craftyguy.net>,
	Jens Glathe <jens.glathe@oldschoolsolutions.biz>,
	Nicolas Escande <nico.escande@gmail.com>,
	ath12k@lists.infradead.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH] wifi: ath12k: fix ring-buffer corruption
Date: Mon, 26 May 2025 14:58:51 +0200	[thread overview]
Message-ID: <aDRli6uIbnuQK3nN@pilgrim> (raw)
In-Reply-To: <aDRR5oYBU0Z-DaWr@hovoldconsulting.com>

On Mon, May 26, 2025 at 01:35:02PM +0200, Johan Hovold wrote:
> On Thu, May 22, 2025 at 05:11:21PM +0200, Remi Pommarel wrote:
> > On Fri, Mar 21, 2025 at 10:52:19AM +0100, Johan Hovold wrote:
> > > Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes
> > > breaks and the log fills up with errors like:
> > > 
> > >     ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
> > >     ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
> > > 
> > > which based on a quick look at the ath11k driver seemed to indicate some
> > > kind of ring-buffer corruption.
> > > 
> > > Miaoqing Pan tracked it down to the host seeing the updated destination
> > > ring head pointer before the updated descriptor, and the error handling
> > > for that in turn leaves the ring buffer in an inconsistent state.
> > > 
> > > While this has not yet been observed with ath12k, the ring-buffer
> > > implementation is very similar to the ath11k one and it suffers from the
> > > same bugs.
> 
> > > Note that the READ_ONCE() are only needed to avoid compiler mischief in
> > > case the ring-buffer helpers are ever inlined.
> 
> > > @@ -343,11 +343,10 @@ static int ath12k_ce_completed_recv_next(struct ath12k_ce_pipe *pipe,
> > >  		goto err;
> > >  	}
> > >  
> > > +	/* Make sure descriptor is read after the head pointer. */
> > > +	dma_rmb();
> > > +
> > 
> > That does not seem to be the only place descriptor is read just after
> > the head pointer, ath12k_dp_rx_process{,err,reo_status,wbm_err} seem to
> > also suffer the same sickness.
> 
> Indeed, I only started with the corruption issues that users were
> reporting (with ath11k) and was gonna follow up with further fixes once
> the initial ones were merged (and when I could find more time).
> 
> > Why not move the dma_rmb() in ath12k_hal_srng_access_begin() as below,
> > that would look to me as a good place to do it.
> > 
> > @@ -2133,6 +2133,9 @@ void ath12k_hal_srng_access_begin(struct
> > ath12k_base *ab, struct hal_srng *srng)
> >                         *(volatile u32 *)srng->u.src_ring.tp_addr;
> >         else
> >                 srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
> > +
> > +       /* Make sure descriptors are read after the head pointer. */
> > +       dma_rmb();
> >  }
> > 
> > This should ensure the issue does not happen anywhere not just for
> > ath12k_ce_recv_process_cb().
> 
> We only need the read barrier for dest rings so the barrier would go in
> the else branch, but I prefer keeping it in the caller so that it is
> more obvious when it is needed and so that we can skip the barrier when
> the ring is empty (e.g. as done above).

Thanks for taking time to clarify this.

Yes I messed up doing the patch by hand sorry, internally I test with
the dma_rmb() in the else part. I tend to prefer having it in
ath12k_hal_srng_access_begin() as caller does not have to take care of
the barrier itself. Which for me seems a little bit risky if further
refactoring (or adding other ring processing) is done in the future;
the barrier could easily be forgotten don't you think ?

> 
> I've gone through and reviewed the remaining call sites now and will
> send a follow-on fix for them.
> 
> > Note that ath12k_hal_srng_dst_get_next_entry() does not need a barrier
> > as it uses cached_hp from ath12k_hal_srng_access_begin().
> 
> Yeah, it's only needed before accessing the descriptor fields.
> 
> > > @@ -1962,7 +1962,7 @@ u32 ath12k_hal_ce_dst_status_get_length(struct hal_ce_srng_dst_status_desc *desc
> > >  {
> > >  	u32 len;
> > >  
> > > -	len = le32_get_bits(desc->flags, HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
> > > +	len = le32_get_bits(READ_ONCE(desc->flags), HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
> > >  	desc->flags &= ~cpu_to_le32(HAL_CE_DST_STATUS_DESC_FLAGS_LEN);
> > >  
> > >  	return len;
> > > @@ -2132,7 +2132,7 @@ void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng)
> > >  		srng->u.src_ring.cached_tp =
> > >  			*(volatile u32 *)srng->u.src_ring.tp_addr;
> > >  	else
> > > -		srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
> > > +		srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);
> > 
> > dma_rmb() acting also as a compiler barrier why the need for both
> > READ_ONCE() ?
> 
> Yeah, I was being overly cautious here and it should be fine with plain
> accesses when reading the descriptor after the barrier, but the memory
> model seems to require READ_ONCE() when fetching the head pointer.
> Currently, hp_addr is marked as volatile so READ_ONCE() could be
> dropped for that reason, but I'd rather keep it here explicitly (e.g. in
> case someone decides to drop the volatile).

Yes actually after more thinking, the READ_ONCE for fetching hp does make
sense and is also in the patch I am currently testing.

Also for source rings don't we need a dma_wmb()/WRITE_ONCE before
modifying the tail pointer (see ath12k_hal_srng_access_end()) for quite
the same reason (updates of the descriptor have to be visible before
write to tail pointer) ?

Thanks

-- 
Remi

  reply	other threads:[~2025-05-26 13:08 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-21  9:52 [PATCH] wifi: ath12k: fix ring-buffer corruption Johan Hovold
2025-05-07  0:17 ` Miaoqing Pan
2025-05-19 17:48 ` Jeff Johnson
2025-05-22 15:11 ` Remi Pommarel
2025-05-26 11:35   ` Johan Hovold
2025-05-26 12:58     ` Remi Pommarel [this message]
2025-05-28 15:20       ` Johan Hovold

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aDRli6uIbnuQK3nN@pilgrim \
    --to=repk@triplefau.lt \
    --cc=ath12k@lists.infradead.org \
    --cc=clayton@craftyguy.net \
    --cc=jens.glathe@oldschoolsolutions.biz \
    --cc=jjohnson@kernel.org \
    --cc=johan+linaro@kernel.org \
    --cc=johan@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nico.escande@gmail.com \
    --cc=quic_miaoqing@quicinc.com \
    --cc=stable@vger.kernel.org \
    --cc=steev@kali.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox