From: Paolo Abeni <pabeni@redhat.com>
To: Ding Hui <dinghui1111@163.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Maxime Coquelin <mcoquelin.stm32@gmail.com>,
Alexandre Torgue <alexandre.torgue@foss.st.com>,
"Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>,
Maxime Chevallier <maxime.chevallier@bootlin.com>,
Ding Hui <dinghui@lixiang.com>,
"open list:STMMAC ETHERNET DRIVER" <netdev@vger.kernel.org>,
"moderated list:ARM/STM32 ARCHITECTURE"
<linux-stm32@st-md-mailman.stormreply.com>,
"moderated list:ARM/STM32 ARCHITECTURE"
<linux-arm-kernel@lists.infradead.org>,
open list <linux-kernel@vger.kernel.org>
Cc: xiasanbo@lixiang.com, yangchen11@lixiang.com, liuxuanjun@lixiang.com
Subject: Re: [PATCH v4] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers
Date: Tue, 30 Jun 2026 12:55:09 +0200 [thread overview]
Message-ID: <b42c9095-aa2a-4e2f-b65e-11adbd2dbec6@redhat.com> (raw)
In-Reply-To: <20260627122533.1165324-1-dinghui1111@163.com>
On 6/27/26 2:25 PM, Ding Hui wrote:
> From: Ding Hui <dinghui@lixiang.com>
>
> On suspend, stmmac_suspend() calls stmmac_disable_all_queues() which
> stops the RX NAPI, but the RX DMA engine may still be running for a
> short window before stmmac_stop_all_dma() takes effect. During that
> window the hardware can write incoming frames into the buffers pointed
> to by the RX descriptors and write back the descriptors (clearing the
> OWN bit and overwriting RDES0/1/2 with status/timestamp data). Because
> NAPI is already disabled, the driver never refills these descriptors,
> so the RX ring is left in a "consumed but not refilled" state with
> stale content in the descriptor buffer-address fields.
>
> On resume, stmmac_clear_descriptors() only re-arms the OWN bit and
> does not repopulate the RX buffer address fields. When the DMA is
> restarted it dereferences these stale addresses and triggers a fatal
> bus error (not kernel panic, just a Fatal Bus Error interrupt and
> RX DMA engine halts).
>
> Fix this by introducing stmmac_reinit_rx_descriptors(), called from
> stmmac_resume() immediately after stmmac_clear_descriptors(). The
> helper iterates every RX descriptor slot and re-programs its buffer
> address fields:
>
> - For normal (page_pool) queues: restore RDES0/1 from buf->addr and
> RDES2 from buf->sec_addr. The DMA mapping has remained valid across
> suspend/resume because no pages were freed. Slots left NULL by a
> prior GFP_ATOMIC failure in stmmac_rx_refill() before suspend
> are re-allocated here with GFP_KERNEL;
> -ENOMEM is returned and resume is aborted if allocation fails.
> The slots with null buffer are unacceptable, because they will
> cause a DMA suspend dead lock problem by the condition of
> Current Descriptor Pointer == Descriptor Tail Pointer.
>
> - For AF_XDP zero-copy queues: restore the DMA address from
> xsk_buff_xdp_get_dma(buf->xdp). Slots with no xdp buffer
> (e.g. TX-only socket, empty fill ring) attempt xsk_buff_alloc()
> first; on failure the descriptor is zeroed so the DMA engine skips
> the slot safely via an RBU event.
>
> - For chain mode: call stmmac_mode_init() to rebuild the des3 next-
> descriptor pointer chain, which hardware may have overwritten with
> a PTP timestamp value (as noted in chain_mode.c:refill_desc3()).
>
> After reprogramming all address fields, a final pass restores OWN=1
> on every valid slot. This is necessary because set_sec_addr and
> chain-mode init unconditionally overwrite des3 (clearing the OWN bit
> set by stmmac_clear_descriptors()), and must run after all address
> writes are complete.
>
> Also fix stmmac_init_rx_buffers() to actually use its gfp_t flags
> parameter instead of the hardcoded GFP_ATOMIC | __GFP_NOWARN.
>
> Signed-off-by: Ding Hui <dinghui@lixiang.com>
This looks like 'net' material, it should specify 'net' into the subj
prefix and include a suitable Fixes tag.
> ---
> Changes in v4:
> - Just add description for return value of 'stmmac_reinit_rx_descriptors'.
> - Link to v3:
> https://lore.kernel.org/netdev/20260604144557.3175399-1-dinghui1111@163.com/
>
> Changes in v3:
> - Re-allocate page_pool NULL slots (from prior GFP_ATOMIC failures)
> with GFP_KERNEL in stmmac_reinit_rx_descriptors(); return -ENOMEM and
> abort resume.
> - For XSK NULL slots, attempt xsk_buff_alloc() first; fall back to
> stmmac_clear_desc() only when allocation fails.
> - Add a re-arm loop at the end of stmmac_reinit_rx_descriptors() to
> restore OWN=1 on all valid slots, since set_sec_addr and
> chain-mode init both write des3 unconditionally.
> - stmmac_reinit_rx_descriptors() now returns int; stmmac_resume()
> checks the return value and propagates -ENOMEM with mutex/rtnl cleanup.
> - Fix stmmac_init_rx_buffers() to use its flags parameter instead of
> hardcoded GFP_ATOMIC | __GFP_NOWARN.
> (884d2b845477 ("net: stmmac: Add GFP_DMA32 for rx buffers if no 64
> capability"))
> - Run stmmac_reinit_rx_descriptors() after stmmac_clear_descriptors()
> so that stmmac_clear_desc() on XSK NULL slots overrides the OWN
> bit set by stmmac_clear_descriptors().
> - Update commit message.
> - Link to v2:
> https://lore.kernel.org/netdev/20260526022620.501229-1-dinghui1111@163.com/
>
> Changes in v2:
> - Introducing stmmac_reinit_rx_descriptors() to reinitializing rx
> buffers without any allocation.
> - Modify commit log.
> - Link to v1:
> https://lore.kernel.org/netdev/20260515053856.2310369-1-dinghui1111@163.com/
> ---
> .../net/ethernet/stmicro/stmmac/stmmac_main.c | 164 +++++++++++++++++-
> 1 file changed, 163 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 3591755ea30b..c82f3d5dbd43 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1660,7 +1660,7 @@ static int stmmac_init_rx_buffers(struct stmmac_priv *priv,
> {
> struct stmmac_rx_queue *rx_q = &dma_conf->rx_queue[queue];
> struct stmmac_rx_buffer *buf = &rx_q->buf_pool[i];
> - gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
> + gfp_t gfp = flags;
The above should go via a separate (net) patch.
>
> if (priv->dma_cap.host_dma_width <= 32)
> gfp |= GFP_DMA32;
> @@ -1693,6 +1693,148 @@ static int stmmac_init_rx_buffers(struct stmmac_priv *priv,
> return 0;
> }
>
> +/**
> + * stmmac_reinit_rx_descriptors - re-program RX descriptor buffer addresses
> + * after stmmac_clear_descriptors()
> + * @priv: driver private structure
> + * @dma_conf: structure holding the dma data
> + * @queue: RX queue index
> + *
> + * Description: Called in the resume path after stmmac_clear_descriptors()
> + * has re-armed the OWN bit on every descriptor. Walk buf_pool[] and
> + * re-program the buffer-address fields of every RX descriptor from the
> + * buffers that are already attached to the queue. Slots whose page was
> + * never allocated (GFP_ATOMIC failure before suspend) are re-allocated
> + * here with GFP_KERNEL; the resume path is in process context.
> + *
> + * Between suspend and resume the hardware may have written back status/
> + * length information into the descriptor address fields (RDESx are reused
> + * for status on completion for GMAC4/XGMAC), so the address fields must be
> + * repopulated before the DMA is restarted.
> + *
> + * For XSK slots that have no xdp buffer at suspend time (TX-only socket,
> + * empty fill ring for Rx), xsk_buff_alloc() is attempted but does not
> + * return an error on failure because we can't identify a real TX-only
> + * socket from an alloc error (same as stmmac_alloc_rx_buffers_zc() in
> + * __init_dma_rx_desc_rings); on failure the descriptor is zeroed so the DMA
> + * engine skips the slot safely.
> + *
> + * To avoid the DMA stall after resume in non-XSK mode, this function
> + * re-allocates pages for NULL slots using GFP_KERNEL (the resume path runs
> + * in process context). If allocation fails, -%ENOMEM is returned immediately
> + * and the resume is aborted; the caller should report the error.
> + *
> + * This helper must be called after stmmac_clear_descriptors() and before
> + * stmmac_hw_setup() in stmmac_resume() because we need to wipe the OWN bit
> + * set in stmmac_clear_descriptors() for NULL slots in XSK mode.
Please try to condense the above text in one or 2 paragraph.
> + *
> + * Returns: 0 on success, or a negative errno on allocation failure in
> + * non-XSK mode (e.g. -%ENOMEM).
> + */
> +static int stmmac_reinit_rx_descriptors(struct stmmac_priv *priv,
> + struct stmmac_dma_conf *dma_conf,
> + u32 queue)
> +{
> + struct stmmac_rx_queue *rx_q = &dma_conf->rx_queue[queue];
> + struct stmmac_rx_buffer *buf;
> + struct dma_desc *p;
> + int i;
> +
> + if (rx_q->xsk_pool) {
> + for (i = 0; i < dma_conf->dma_rx_size; i++) {
> + buf = &rx_q->buf_pool[i];
> + p = stmmac_get_rx_desc(priv, rx_q, i);
> +
> + /* The XSK pool may not be fully populated (e.g.
> + * xdpsock TX-only, empty fill ring). Try to refill
> + * from the pool; on failure zero the descriptor so the
> + * DMA engine skips this slot safely.
> + */
> + if (!buf->xdp) {
> + buf->xdp = xsk_buff_alloc(rx_q->xsk_pool);
> + if (!buf->xdp) {
> + stmmac_clear_desc(priv, p);
> + continue;
> + }
> + }
> +
> + stmmac_set_desc_addr(priv, p,
> + xsk_buff_xdp_get_dma(buf->xdp));
> + stmmac_set_desc_sec_addr(priv, p, 0, false);
> + }
> + } else {
> + for (i = 0; i < dma_conf->dma_rx_size; i++) {
> + buf = &rx_q->buf_pool[i];
> + p = stmmac_get_rx_desc(priv, rx_q, i);
> +
> + /* buf->page can be NULL when stmmac_rx_refill() hit a
> + * GFP_ATOMIC failure before suspend and left the slot
> + * without a buffer. The resume path runs in process
> + * context, so re-allocate with GFP_KERNEL. Allocation
> + * failure aborts the resume.
> + */
> + if (!buf->page) {
> + int err;
> +
> + err = stmmac_init_rx_buffers(priv, dma_conf, p,
> + i, GFP_KERNEL,
> + queue);
> + if (err)
> + return err;
> + /* stmmac_init_rx_buffers() already programmed
> + * the descriptor; skip the reprogramming below.
> + */
> + continue;
> + }
> +
> + stmmac_set_desc_addr(priv, p, buf->addr);
> + stmmac_set_desc_sec_addr(priv, p, buf->sec_addr,
> + priv->sph_active &&
> + buf->sec_page);
AFAICS stmmac_rx_refill() can error out after successfully allocating
buf->page, but leaving a NULL sec_page, I think you should try to
realloc even the latter.
Finally this chunk shares quite a bit of code with stmmac_rx_refill()
and stmmac_rx_refill_zc() it would be better try to factor out common
helpers.
/P
prev parent reply other threads:[~2026-06-30 10:55 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-27 12:25 [PATCH v4] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers Ding Hui
2026-06-30 10:55 ` Paolo Abeni [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b42c9095-aa2a-4e2f-b65e-11adbd2dbec6@redhat.com \
--to=pabeni@redhat.com \
--cc=alexandre.torgue@foss.st.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=dinghui1111@163.com \
--cc=dinghui@lixiang.com \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-stm32@st-md-mailman.stormreply.com \
--cc=liuxuanjun@lixiang.com \
--cc=maxime.chevallier@bootlin.com \
--cc=mcoquelin.stm32@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=rmk+kernel@armlinux.org.uk \
--cc=xiasanbo@lixiang.com \
--cc=yangchen11@lixiang.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox