* [PATCH net v5] net: stmmac: Prevent NULL deref when RX memory exhausted
@ 2026-04-15 2:39 Sam Edwards
2026-04-15 12:56 ` Russell King (Oracle)
0 siblings, 1 reply; 5+ messages in thread
From: Sam Edwards @ 2026-04-15 2:39 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Maxime Coquelin, Alexandre Torgue, Russell King (Oracle),
Maxime Chevallier, Ovidiu Panait, Vladimir Oltean, Baruch Siach,
Serge Semin, Giuseppe Cavallaro, netdev, linux-stm32,
linux-arm-kernel, linux-kernel, Sam Edwards, stable
The CPU receives frames from the MAC through conventional DMA: the CPU
allocates buffers for the MAC, then the MAC fills them and returns
ownership to the CPU. For each hardware RX queue, the CPU and MAC
coordinate through a shared ring array of DMA descriptors: one
descriptor per DMA buffer. Each descriptor includes the buffer's
physical address and a status flag ("OWN") indicating which side owns
the buffer: OWN=0 for CPU, OWN=1 for MAC. The CPU is only allowed to set
the flag and the MAC is only allowed to clear it, and both must move
through the ring in sequence: thus the ring is used for both
"submissions" and "completions."
In the stmmac driver, stmmac_rx() bookmarks its position in the ring
with the `cur_rx` index. The main receive loop in that function checks
for rx_descs[cur_rx].own=0, gives the corresponding buffer to the
network stack (NULLing the pointer), and increments `cur_rx` modulo the
ring size. After the loop exits, stmmac_rx_refill(), which bookmarks its
position with `dirty_rx`, allocates fresh buffers and rearms the
descriptors (setting OWN=1). If it fails any allocation, it simply stops
early (leaving OWN=0) and will retry where it left off when next called.
This means descriptors have a three-stage lifecycle (terms my own):
- `empty` (OWN=1, buffer valid)
- `full` (OWN=0, buffer valid and populated)
- `dirty` (OWN=0, buffer NULL)
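This lifecycle can be sketched as a small model (purely illustrative; the
struct, field, and function names here are hypothetical and do not match the
real stmmac descriptor layout):

```c
#include <stddef.h>

/* Hypothetical, simplified descriptor model for illustration only. */
struct rx_desc_model {
	int own;	/* 1 = owned by MAC, 0 = owned by CPU */
	void *buf;	/* DMA buffer pointer, NULL once handed to the stack */
};

enum rx_stage { RX_EMPTY, RX_FULL, RX_DIRTY };

/* Distinguishing `full` from `dirty` requires looking at the buffer
 * pointer as well: the OWN bit alone is 0 in both of those states. */
static enum rx_stage rx_desc_stage(const struct rx_desc_model *d)
{
	if (d->own)
		return RX_EMPTY;	/* still owned by the MAC */
	return d->buf ? RX_FULL : RX_DIRTY;
}
```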
But because stmmac_rx() only checks OWN, it confuses `full`/`dirty`. In
the past (see 'Fixes:'), there was a bug where the loop could cycle
`cur_rx` all the way back to the first descriptor it dirtied, resulting
in a NULL dereference when mistaken for `full`. The aforementioned
commit resolved that *specific* failure by capping the loop's iteration
limit at `dma_rx_size - 1`, but this is only a partial fix: if the
previous stmmac_rx_refill() didn't complete, then there are leftover
`dirty` descriptors that the loop might encounter without needing to
cycle fully around. The current code therefore panics (see 'Closes:')
when stmmac_rx_refill() is memory-starved long enough for `cur_rx` to
catch up to `dirty_rx`.
Fix this by further tightening the clamp from `dma_rx_size - 1` to
`dma_rx_size - stmmac_rx_dirty() - 1`, subtracting any remnant dirty
entries and limiting the loop so that `cur_rx` cannot catch back up to
`dirty_rx`. This carries no risk of arithmetic underflow: since the
maximum possible return value of stmmac_rx_dirty() is `dma_rx_size - 1`,
the worst the clamp can do is prevent the loop from running at all.
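The clamp arithmetic can be modeled as follows (illustrative helper, not the
driver code; `n_dirty` stands in for the return value of stmmac_rx_dirty(),
which is at most `ring_size - 1`):

```c
/* Model of the tightened iteration clamp; names are hypothetical. */
static unsigned int clamp_rx_limit(unsigned int ring_size,
				   unsigned int n_dirty, /* 0 .. ring_size - 1 */
				   unsigned int budget)
{
	/* Cannot underflow: n_dirty <= ring_size - 1, so max_iter >= 0. */
	unsigned int max_iter = ring_size - n_dirty - 1;

	return budget < max_iter ? budget : max_iter;
}
```

In the worst case (`n_dirty == ring_size - 1`) the clamp evaluates to 0 and
the receive loop simply does not run.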
Fixes: b6cb4541853c7 ("net: stmmac: avoid rx queue overrun")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221010
Cc: stable@vger.kernel.org
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
---
Hi list,
This is a single patch broken out of [1]. The second patch in that series,
which proactively refills the RX ring buffer when memory is low, still has some
unresolved feedback: it should use a timer to avoid nuisance polling while the
system is suffering OOM.
Further discussion makes me wonder whether that second patch should even be
threshold-triggered at all, or if it should be a handler for the RBU
("Receive Buffer Unavailable") interrupt instead.
So, while that patch is back at the drawing board, I am submitting this one
(which is higher-priority as it resolves a *panic*) separately.
Regards,
Sam
[1] https://lore.kernel.org/all/20260401041929.12392-1-CFSworks@gmail.com/
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 13d3cac056be..fc11f75f7dc0 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5609,7 +5609,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
dma_dir = page_pool_get_dma_dir(rx_q->page_pool);
bufsz = DIV_ROUND_UP(priv->dma_conf.dma_buf_sz, PAGE_SIZE) * PAGE_SIZE;
- limit = min(priv->dma_conf.dma_rx_size - 1, (unsigned int)limit);
+ limit = min(priv->dma_conf.dma_rx_size - stmmac_rx_dirty(priv, queue) - 1,
+ (unsigned int)limit);
if (netif_msg_rx_status(priv)) {
void *rx_head;
--
2.52.0
* Re: [PATCH net v5] net: stmmac: Prevent NULL deref when RX memory exhausted
From: Russell King (Oracle) @ 2026-04-15 12:56 UTC (permalink / raw)
To: Sam Edwards
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
linux-kernel, stable
On Tue, Apr 14, 2026 at 07:39:47PM -0700, Sam Edwards wrote:
> The CPU receives frames from the MAC through conventional DMA: the CPU
> allocates buffers for the MAC, then the MAC fills them and returns
> ownership to the CPU. For each hardware RX queue, the CPU and MAC
> coordinate through a shared ring array of DMA descriptors: one
> descriptor per DMA buffer. Each descriptor includes the buffer's
> physical address and a status flag ("OWN") indicating which side owns
> the buffer: OWN=0 for CPU, OWN=1 for MAC. The CPU is only allowed to set
> the flag and the MAC is only allowed to clear it, and both must move
> through the ring in sequence: thus the ring is used for both
> "submissions" and "completions."
>
> In the stmmac driver, stmmac_rx() bookmarks its position in the ring
> with the `cur_rx` index. The main receive loop in that function checks
> for rx_descs[cur_rx].own=0, gives the corresponding buffer to the
> network stack (NULLing the pointer), and increments `cur_rx` modulo the
> ring size. After the loop exits, stmmac_rx_refill(), which bookmarks its
> position with `dirty_rx`, allocates fresh buffers and rearms the
> descriptors (setting OWN=1). If it fails any allocation, it simply stops
> early (leaving OWN=0) and will retry where it left off when next called.
>
> This means descriptors have a three-stage lifecycle (terms my own):
> - `empty` (OWN=1, buffer valid)
> - `full` (OWN=0, buffer valid and populated)
> - `dirty` (OWN=0, buffer NULL)
>
> But because stmmac_rx() only checks OWN, it confuses `full`/`dirty`. In
> the past (see 'Fixes:'), there was a bug where the loop could cycle
> `cur_rx` all the way back to the first descriptor it dirtied, resulting
> in a NULL dereference when mistaken for `full`. The aforementioned
> commit resolved that *specific* failure by capping the loop's iteration
> limit at `dma_rx_size - 1`, but this is only a partial fix: if the
> previous stmmac_rx_refill() didn't complete, then there are leftover
> `dirty` descriptors that the loop might encounter without needing to
> cycle fully around. The current code therefore panics (see 'Closes:')
> when stmmac_rx_refill() is memory-starved long enough for `cur_rx` to
> catch up to `dirty_rx`.
>
> Fix this by further tightening the clamp from `dma_rx_size - 1` to
> `dma_rx_size - stmmac_rx_dirty() - 1`, subtracting any remnant dirty
> entries and limiting the loop so that `cur_rx` cannot catch back up to
> `dirty_rx`. This carries no risk of arithmetic underflow: since the
> maximum possible return value of stmmac_rx_dirty() is `dma_rx_size - 1`,
> the worst the clamp can do is prevent the loop from running at all.
>
> Fixes: b6cb4541853c7 ("net: stmmac: avoid rx queue overrun")
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221010
> Cc: stable@vger.kernel.org
> Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Locally, while debugging my issues, I used this to prevent cur_rx
catching up with dirty_rx:
status = stmmac_rx_status(priv, &priv->xstats, p);
/* check if managed by the DMA otherwise go ahead */
if (unlikely(status & dma_own))
break;
next_entry = STMMAC_NEXT_ENTRY(rx_q->cur_rx,
priv->dma_conf.dma_rx_size);
if (unlikely(next_entry == rx_q->dirty_rx))
break;
rx_q->cur_rx = next_entry;
If we care about the cost of reloading rx_q->dirty_rx on every
iteration, then I'd suggest that the cost we already incur reading and
writing rx_q->cur_rx is something that should be addressed, and
eliminating that would counter the cost of reading rx_q->dirty_rx. I
suspect, however, that the cost is minimal, as cur_rx and dirty_rx are
likely in the same cache line.
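For illustration, the guard above can be modeled like this (STMMAC_NEXT_ENTRY
is approximated here as a power-of-two modular increment; this is a sketch,
not the real macro):

```c
/* Modular ring-index increment, modeled for illustration; this sketch
 * assumes the ring size is a power of two. */
static unsigned int next_entry_mod(unsigned int cur, unsigned int size)
{
	return (cur + 1) & (size - 1);
}

/* Returns 1 if advancing cur would collide with dirty (i.e. the loop
 * should break before cur_rx catches up to dirty_rx), else 0. */
static int would_catch_dirty(unsigned int cur, unsigned int dirty,
			     unsigned int size)
{
	return next_entry_mod(cur, size) == dirty;
}
```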
It looks like any fix to stmmac_rx() will also need a corresponding
fix for stmmac_rx_zc().
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH net v5] net: stmmac: Prevent NULL deref when RX memory exhausted
From: Russell King (Oracle) @ 2026-04-15 16:28 UTC (permalink / raw)
To: Sam Edwards
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
linux-kernel, stable
On Wed, Apr 15, 2026 at 01:56:32PM +0100, Russell King (Oracle) wrote:
> On Tue, Apr 14, 2026 at 07:39:47PM -0700, Sam Edwards wrote:
> > The CPU receives frames from the MAC through conventional DMA: the CPU
> > allocates buffers for the MAC, then the MAC fills them and returns
> > ownership to the CPU. For each hardware RX queue, the CPU and MAC
> > coordinate through a shared ring array of DMA descriptors: one
> > descriptor per DMA buffer. Each descriptor includes the buffer's
> > physical address and a status flag ("OWN") indicating which side owns
> > the buffer: OWN=0 for CPU, OWN=1 for MAC. The CPU is only allowed to set
> > the flag and the MAC is only allowed to clear it, and both must move
> > through the ring in sequence: thus the ring is used for both
> > "submissions" and "completions."
> >
> > In the stmmac driver, stmmac_rx() bookmarks its position in the ring
> > with the `cur_rx` index. The main receive loop in that function checks
> > for rx_descs[cur_rx].own=0, gives the corresponding buffer to the
> > network stack (NULLing the pointer), and increments `cur_rx` modulo the
> > ring size. After the loop exits, stmmac_rx_refill(), which bookmarks its
> > position with `dirty_rx`, allocates fresh buffers and rearms the
> > descriptors (setting OWN=1). If it fails any allocation, it simply stops
> > early (leaving OWN=0) and will retry where it left off when next called.
> >
> > This means descriptors have a three-stage lifecycle (terms my own):
> > - `empty` (OWN=1, buffer valid)
> > - `full` (OWN=0, buffer valid and populated)
> > - `dirty` (OWN=0, buffer NULL)
> >
> > But because stmmac_rx() only checks OWN, it confuses `full`/`dirty`. In
> > the past (see 'Fixes:'), there was a bug where the loop could cycle
> > `cur_rx` all the way back to the first descriptor it dirtied, resulting
> > in a NULL dereference when mistaken for `full`. The aforementioned
> > commit resolved that *specific* failure by capping the loop's iteration
> > limit at `dma_rx_size - 1`, but this is only a partial fix: if the
> > previous stmmac_rx_refill() didn't complete, then there are leftover
> > `dirty` descriptors that the loop might encounter without needing to
> > cycle fully around. The current code therefore panics (see 'Closes:')
> > when stmmac_rx_refill() is memory-starved long enough for `cur_rx` to
> > catch up to `dirty_rx`.
> >
> > Fix this by further tightening the clamp from `dma_rx_size - 1` to
> > `dma_rx_size - stmmac_rx_dirty() - 1`, subtracting any remnant dirty
> > entries and limiting the loop so that `cur_rx` cannot catch back up to
> > `dirty_rx`. This carries no risk of arithmetic underflow: since the
> > maximum possible return value of stmmac_rx_dirty() is `dma_rx_size - 1`,
> > the worst the clamp can do is prevent the loop from running at all.
> >
> > Fixes: b6cb4541853c7 ("net: stmmac: avoid rx queue overrun")
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221010
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sam Edwards <CFSworks@gmail.com>
>
> Locally, while debugging my issues, I used this to prevent cur_rx
> catching up with dirty_rx:
>
> status = stmmac_rx_status(priv, &priv->xstats, p);
> /* check if managed by the DMA otherwise go ahead */
> if (unlikely(status & dma_own))
> break;
>
> next_entry = STMMAC_NEXT_ENTRY(rx_q->cur_rx,
> priv->dma_conf.dma_rx_size);
> if (unlikely(next_entry == rx_q->dirty_rx))
> break;
>
> rx_q->cur_rx = next_entry;
>
> If we care about the cost of reloading rx_q->dirty_rx on every
> iteration, then I'd suggest that the cost we already incur reading and
> writing rx_q->cur_rx is something that should be addressed, and
> eliminating that would counter the cost of reading rx_q->dirty_rx. I
> suspect, however, that the cost is minimal, as cur_rx and dirty_rx are
> likely in the same cache line.
>
> It looks like any fix to stmmac_rx() will also need a corresponding
> fix for stmmac_rx_zc().
I have some further information, but a new curveball has just been
chucked... and I've no idea what this will mean at this stage. Just
take it that I won't be responding for a while.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH net v5] net: stmmac: Prevent NULL deref when RX memory exhausted
From: Sam Edwards @ 2026-04-15 17:53 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
linux-kernel, stable
On Wed, Apr 15, 2026 at 9:28 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Wed, Apr 15, 2026 at 01:56:32PM +0100, Russell King (Oracle) wrote:
> > Locally, while debugging my issues, I used this to prevent cur_rx
> > catching up with dirty_rx:
> >
> > status = stmmac_rx_status(priv, &priv->xstats, p);
> > /* check if managed by the DMA otherwise go ahead */
> > if (unlikely(status & dma_own))
> > break;
> >
> > next_entry = STMMAC_NEXT_ENTRY(rx_q->cur_rx,
> > priv->dma_conf.dma_rx_size);
> > if (unlikely(next_entry == rx_q->dirty_rx))
> > break;
> >
> > rx_q->cur_rx = next_entry;
> >
> > If we care about the cost of reloading rx_q->dirty_rx on every
> > iteration, then I'd suggest that the cost we already incur reading and
> > writing rx_q->cur_rx is something that should be addressed, and
> > eliminating that would counter the cost of reading rx_q->dirty_rx. I
> > suspect, however, that the cost is minimal, as cur_tx and dirty_rx are
> > likely in the same cache line.
No, no, I like your approach better. :) It also removes the need for
the `limit` clamp at the top of the function, so later code can assume
limit==budget.
> > It looks like any fix to stmmac_rx() will also need a corresponding
> > fix for stmmac_rx_zc().
I agree that stmmac_rx_zc() is likely also broken (in a similar way,
but not similar enough to permit a "corresponding" fix), but I don't
agree that there's a dependency relationship here. This patch is
addressing #221010, which affects the generic/non-ZC codepath; I'm
afraid the ZC codepath warrants its own investigation.
> I have some further information, but a new curveball has just been
> chucked... and I've no idea what this will mean at this stage. Just
> take it that I won't be responding for a while.
I think I follow your meaning. Good luck getting it straightened out!
* Re: [PATCH net v5] net: stmmac: Prevent NULL deref when RX memory exhausted
From: Russell King (Oracle) @ 2026-04-15 19:58 UTC (permalink / raw)
To: Sam Edwards
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
linux-kernel, stable
On Wed, Apr 15, 2026 at 10:53:15AM -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 9:28 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Wed, Apr 15, 2026 at 01:56:32PM +0100, Russell King (Oracle) wrote:
> > > Locally, while debugging my issues, I used this to prevent cur_rx
> > > catching up with dirty_rx:
> > >
> > > status = stmmac_rx_status(priv, &priv->xstats, p);
> > > /* check if managed by the DMA otherwise go ahead */
> > > if (unlikely(status & dma_own))
> > > break;
> > >
> > > next_entry = STMMAC_NEXT_ENTRY(rx_q->cur_rx,
> > > priv->dma_conf.dma_rx_size);
> > > if (unlikely(next_entry == rx_q->dirty_rx))
> > > break;
> > >
> > > rx_q->cur_rx = next_entry;
> > >
> > > If we care about the cost of reloading rx_q->dirty_rx on every
> > > iteration, then I'd suggest that the cost we already incur reading and
> > > writing rx_q->cur_rx is something that should be addressed, and
> > > eliminating that would counter the cost of reading rx_q->dirty_rx. I
> > > suspect, however, that the cost is minimal, as cur_rx and dirty_rx are
> > > likely in the same cache line.
>
> No, no, I like your approach better. :) It also removes the need for
> the `limit` clamp at the top of the function, so later code can assume
> limit==budget.
>
> > > It looks like any fix to stmmac_rx() will also need a corresponding
> > > fix for stmmac_rx_zc().
>
> I agree that stmmac_rx_zc() is likely also broken (in a similar way,
> but not similar enough to permit a "corresponding" fix), but I don't
> agree that there's a dependency relationship here. This patch is
> addressing #221010, which affects the generic/non-ZC codepath; I'm
> afraid the ZC codepath warrants its own investigation.
The code structure is identical. The only difference is what happens
to the packets.
Both paths take the NAPI limit. Both paths process up to that limit of
descriptors. The state saving / restoring is similar. The read_again
label is the same, the condition after is the same.
The ZC path differs at this point in that it will attempt to refill
every 16 descriptors that have been processed.
Both paths then read the descriptor and check the ownership.
Both paths then increment cur_rx to point to the next entry around
the ring.
Both paths then get the following descriptor pointer and prefetch
it.
Both paths then get the extended status if we're using extended
descriptors.
Both paths then handle frame discard.
Both paths then jump back to read_again if this isn't the last
segment and we have an error.
Both paths then check for error.
... and so it goes on.
The ZC path to me looks like a copy-paste-and-tweak approach to
adding support. The difference seems to be centered only around
the handling of the data buffers in the descriptors. The overall
mechanism of processing the descriptors follows the same layout
in both functions.
> > I have some further information, but a new curveball has just been
> > chucked... and I've no idea what this will mean at this stage. Just
> > take it that I won't be responding for a while.
>
> I think I follow your meaning. Good luck getting it straightened out!
It looks like further curveballs have been thrown as a result,
destroying all "plans" for the next days/week. I have absolutely
no idea how much time or when I'll be able to look at anything
at the moment, so don't assume that because I find an opportunity
to send an email, everything is back to normal.
I'll also note that over the last two days I've written several
emails on this, spent many hours on them, only to discard them
as other ideas/research and maybe even the passage of time means
they're no longer appropriate to send.
Jakub: sorry, I just *can't* review stuff on netdev with everything
that is going on, not when .... can't complete this.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!