From: Johan Hovold <johan@kernel.org>
To: Miaoqing Pan <quic_miaoqing@quicinc.com>
Cc: quic_jjohnson@quicinc.com, ath11k@lists.infradead.org,
	linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
	johan+linaro@kernel.org
Subject: Re: [PATCH v2 ath-next 2/2] wifi: ath11k: fix HTC rx insufficient length
Date: Mon, 10 Mar 2025 11:09:52 +0100
Message-ID: <Z866cCj8SWyZjCoP@hovoldconsulting.com>
In-Reply-To: <20250310010217.3845141-3-quic_miaoqing@quicinc.com>

On Mon, Mar 10, 2025 at 09:02:17AM +0800, Miaoqing Pan wrote:
> A relatively unusual race condition occurs between host software
> and hardware, where the host sees the updated destination ring head
> pointer before the hardware updates the corresponding descriptor.
> When this situation occurs, the length of the descriptor returns 0.

I still think this description is too vague and it doesn't explain how
this race is even possible. It sounds like there's a bug somewhere in
the driver or firmware, but if this really is an indication the hardware
is broken as your reply here seems to suggest:

	https://lore.kernel.org/lkml/bc187777-588c-4fa0-ba8c-847e91c78d43@quicinc.com/

then that too should be highlighted in the commit message (e.g. by
describing this as "working around broken hardware").

> The current error handling method is to increment descriptor tail
> pointer by 1, but 'sw_index' is not updated, causing descriptor and
> skb to not correspond one-to-one, resulting in the following error:
>
> ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1488, expected 1492
> ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
>
> To address this problem, temporarily skip processing the current
> descriptor and handle it again next time. However, to prevent this
> descriptor from continuously returning 0, use skb cb to set a flag.
> If the length returns 0 again, this descriptor will be discarded.

The ath12k ring-buffer handling looks very similar. Do you need a
corresponding workaround in ath12k_ce_completed_recv_next()? Or are you
sure that this (hardware) bug only affects ath11k devices?

>  	*nbytes = ath11k_hal_ce_dst_status_get_length(desc);
> -	if (*nbytes == 0) {
> -		ret = -EIO;
> -		goto err;
> +	if (unlikely(*nbytes == 0)) {
> +		struct ath11k_skb_rxcb *rxcb =
> +			ATH11K_SKB_RXCB(pipe->dest_ring->skb[sw_index]);
> +
> +		/* A relatively unusual race condition occurs between host
> +		 * software and hardware, where the host sees the updated
> +		 * destination ring head pointer before the hardware updates
> +		 * the corresponding descriptor.
> +		 *
> +		 * Temporarily skip processing the current descriptor and handle
> +		 * it again next time. However, to prevent this descriptor from
> +		 * continuously returning 0, set 'is_desc_len0' flag. If the
> +		 * length returns 0 again, this descriptor will be discarded.
> +		 */
> +		if (!rxcb->is_desc_len0) {
> +			rxcb->is_desc_len0 = true;
> +			ret = -EIO;
> +			goto err;
> +		}
>  	}

I'm still waiting for feedback from one user that can reproduce the
ring-buffer corruption very easily, but another user mentioned seeing
multiple zero-length descriptor warnings over the weekend when running
with this patch:

	ath11k_pci 0006:01:00.0: rxed invalid length (nbytes 0, max 2048)

Are there ever any valid reasons for seeing a zero-length descriptor
(i.e. unrelated to the race at hand)? IIUC the warning would only be
printed when processing such descriptors a second time (i.e. when
is_desc_len0 is set).

Johan