* [PATCH net v2 0/2] stmmac crash/stall fixes when under memory pressure @ 2026-03-19 18:40 Sam Edwards 2026-03-19 18:40 ` [PATCH net v2 1/2] net: stmmac: Prevent NULL deref when RX memory exhausted Sam Edwards 2026-03-19 18:40 ` [PATCH net v2 2/2] net: stmmac: Prevent indefinite RX stall on buffer exhaustion Sam Edwards 0 siblings, 2 replies; 5+ messages in thread From: Sam Edwards @ 2026-03-19 18:40 UTC (permalink / raw) To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: Russell King, Maxime Chevallier, Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin, netdev, linux-arm-kernel, linux-kernel, Sam Edwards Hi netdev, This is v2 of my series containing a pair of bugfixes for the stmmac driver's receive pipeline. These issues occur when stmmac_rx_refill() does not (fully) succeed, which happens more frequently when free memory is low. The first patch closes Bugzilla bug #221010 [1], where stmmac_rx() can circle around to a still-dirty descriptor (with a NULL buffer pointer), mistake it for a filled descriptor (due to OWN=0), and attempt to dereference the buffer. In testing that patch, I discovered a second issue: starvation of available RX buffers causes the NIC to stop sending interrupts; if the driver stops polling, it will wait indefinitely for an interrupt that will never come. (Note: the first patch makes this issue more prominent -- mostly because it lets the system survive long enough to exhibit it -- but doesn't *cause* it.) The second patch addresses that problem as well. Both patches are minimal, appropriate for stable, and designated to `net`. Regards, Sam [1] https://bugzilla.kernel.org/show_bug.cgi?id=221010 v2: - Completely rewrote the commit message of patch 1, now assuming the reader is generally familiar with DMA but wholly unfamiliar with the stmmac device (thanks Jakub!) - Added missing `Fixes:` to patch 2 - Moved patch 2's `int budget = limit;` decl per the reverse-xmas-tree rule - Dropped patch 3: this was a code improvement not appropriate for stable - Generated the series with --subject-prefix='PATCH net' v1: https://lore.kernel.org/netdev/20260316021009.262358-1-CFSworks@gmail.com/ Sam Edwards (2): net: stmmac: Prevent NULL deref when RX memory exhausted net: stmmac: Prevent indefinite RX stall on buffer exhaustion drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) -- 2.52.0 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH net v2 1/2] net: stmmac: Prevent NULL deref when RX memory exhausted 2026-03-19 18:40 [PATCH net v2 0/2] stmmac crash/stall fixes when under memory pressure Sam Edwards @ 2026-03-19 18:40 ` Sam Edwards 2026-03-19 21:59 ` Russell King (Oracle) 2026-03-19 18:40 ` [PATCH net v2 2/2] net: stmmac: Prevent indefinite RX stall on buffer exhaustion Sam Edwards 1 sibling, 1 reply; 5+ messages in thread From: Sam Edwards @ 2026-03-19 18:40 UTC (permalink / raw) To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: Russell King, Maxime Chevallier, Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin, netdev, linux-arm-kernel, linux-kernel, Sam Edwards, stable The CPU receives frames from the MAC through conventional DMA: the CPU allocates buffers for the MAC, then the MAC fills them and returns ownership to the CPU. For each hardware RX queue, the CPU and MAC coordinate through a shared ring array of DMA descriptors: one descriptor per DMA buffer. Each descriptor includes the buffer's physical address and a status flag ("OWN") indicating which side owns the buffer: OWN=0 for CPU, OWN=1 for MAC. The CPU is only allowed to set the flag and the MAC is only allowed to clear it, and both must move through the ring in sequence: thus the ring is used for both "submissions" and "completions." In the stmmac driver, stmmac_rx() bookmarks its position in the ring with the `cur_rx` index. The main receive loop in that function checks for rx_descs[cur_rx].own=0, gives the corresponding buffer to the network stack (NULLing the pointer), and increments `cur_rx` modulo the ring size. After the loop exits, stmmac_rx_refill(), which bookmarks its position with `dirty_rx`, allocates fresh buffers and rearms the descriptors (setting OWN=1). If it fails any allocation, it simply stops early (leaving OWN=0) and will retry where it left off when next called. This means descriptors have a three-stage lifecycle (terms my own): - `empty` (OWN=1, buffer valid) - `full` (OWN=0, buffer valid and populated) - `dirty` (OWN=0, buffer NULL) But because stmmac_rx() only checks OWN, it confuses `full`/`dirty`. In the past (see 'Fixes:'), there was a bug where the loop could cycle `cur_rx` all the way back to the first descriptor it dirtied, resulting in a NULL dereference when mistaken for `full`. The aforementioned commit resolved that *specific* failure by capping the loop's iteration limit at `dma_rx_size - 1`, but this is only a partial fix: if the previous stmmac_rx_refill() didn't complete, then there are leftover `dirty` descriptors that the loop might encounter without needing to cycle fully around. The current code therefore panics (see 'Closes:') when stmmac_rx_refill() is memory-starved long enough for `cur_rx` to catch up to `dirty_rx`. Fix this by further tightening the clamp from `dma_rx_size - 1` to `dma_rx_size - stmmac_rx_dirty() - 1`, subtracting any remnant dirty entries and limiting the loop so that `cur_rx` cannot catch back up to `dirty_rx`. This carries no risk of arithmetic underflow: since the maximum possible return value of stmmac_rx_dirty() is `dma_rx_size - 1`, the worst the clamp can do is prevent the loop from running at all. Fixes: b6cb4541853c7 ("net: stmmac: avoid rx queue overrun") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221010 Cc: stable@vger.kernel.org Signed-off-by: Sam Edwards <CFSworks@gmail.com> --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 6827c99bde8c..f98b070073c0 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -5609,7 +5609,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue) dma_dir = page_pool_get_dma_dir(rx_q->page_pool); bufsz = DIV_ROUND_UP(priv->dma_conf.dma_buf_sz, PAGE_SIZE) * PAGE_SIZE; - limit = min(priv->dma_conf.dma_rx_size - 1, (unsigned int)limit); + limit = min(priv->dma_conf.dma_rx_size - stmmac_rx_dirty(priv, queue) - 1, + (unsigned int)limit); if (netif_msg_rx_status(priv)) { void *rx_head; -- 2.52.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net v2 1/2] net: stmmac: Prevent NULL deref when RX memory exhausted 2026-03-19 18:40 ` [PATCH net v2 1/2] net: stmmac: Prevent NULL deref when RX memory exhausted Sam Edwards @ 2026-03-19 21:59 ` Russell King (Oracle) 2026-03-20 1:11 ` Sam Edwards 0 siblings, 1 reply; 5+ messages in thread From: Russell King (Oracle) @ 2026-03-19 21:59 UTC (permalink / raw) To: Sam Edwards Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Maxime Chevallier, Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin, netdev, linux-arm-kernel, linux-kernel, stable On Thu, Mar 19, 2026 at 11:40:30AM -0700, Sam Edwards wrote: > The CPU receives frames from the MAC through conventional DMA: the CPU > allocates buffers for the MAC, then the MAC fills them and returns > ownership to the CPU. For each hardware RX queue, the CPU and MAC > coordinate through a shared ring array of DMA descriptors: one > descriptor per DMA buffer. Each descriptor includes the buffer's > physical address and a status flag ("OWN") indicating which side owns > the buffer: OWN=0 for CPU, OWN=1 for MAC. The CPU is only allowed to set > the flag and the MAC is only allowed to clear it, and both must move > through the ring in sequence: thus the ring is used for both > "submissions" and "completions." > > In the stmmac driver, stmmac_rx() bookmarks its position in the ring > with the `cur_rx` index. The main receive loop in that function checks > for rx_descs[cur_rx].own=0, gives the corresponding buffer to the > network stack (NULLing the pointer), and increments `cur_rx` modulo the > ring size. After the loop exits, stmmac_rx_refill(), which bookmarks its > position with `dirty_rx`, allocates fresh buffers and rearms the > descriptors (setting OWN=1). If it fails any allocation, it simply stops > early (leaving OWN=0) and will retry where it left off when next called. > > This means descriptors have a three-stage lifecycle (terms my own): > - `empty` (OWN=1, buffer valid) > - `full` (OWN=0, buffer valid and populated) > - `dirty` (OWN=0, buffer NULL) > > But because stmmac_rx() only checks OWN, it confuses `full`/`dirty`. In > the past (see 'Fixes:'), there was a bug where the loop could cycle > `cur_rx` all the way back to the first descriptor it dirtied, resulting > in a NULL dereference when mistaken for `full`. The aforementioned > commit resolved that *specific* failure by capping the loop's iteration > limit at `dma_rx_size - 1`, but this is only a partial fix: if the > previous stmmac_rx_refill() didn't complete, then there are leftover > `dirty` descriptors that the loop might encounter without needing to > cycle fully around. The current code therefore panics (see 'Closes:') > when stmmac_rx_refill() is memory-starved long enough for `cur_rx` to > catch up to `dirty_rx`. > > Fix this by further tightening the clamp from `dma_rx_size - 1` to > `dma_rx_size - stmmac_rx_dirty() - 1`, subtracting any remnant dirty > entries and limiting the loop so that `cur_rx` cannot catch back up to > `dirty_rx`. This carries no risk of arithmetic underflow: since the > maximum possible return value of stmmac_rx_dirty() is `dma_rx_size - 1`, > the worst the clamp can do is prevent the loop from running at all. Isn't the limit equivalent to: space = CIRC_SPACE(rx_q->cur_rx, rx_q->dirty_rx, priv->dma_conf.dma_rx_size); limit = min_t(unsigned int, space, limit); (Think of the "full" and "empty" cases as "space" which can be consumed to provide entries for the refiller to action.) I think the same applies for patch 2 - when the above returns zero it means we have no entries in the ring that aren't due for refill. Have you checked whether stmmac_rx_zc() also buggy in this respect? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net v2 1/2] net: stmmac: Prevent NULL deref when RX memory exhausted 2026-03-19 21:59 ` Russell King (Oracle) @ 2026-03-20 1:11 ` Sam Edwards 0 siblings, 0 replies; 5+ messages in thread From: Sam Edwards @ 2026-03-20 1:11 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Maxime Chevallier, Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin, netdev, linux-arm-kernel, linux-kernel, stable On Thu, Mar 19, 2026 at 2:59 PM Russell King (Oracle) <linux@armlinux.org.uk> wrote: Hi Russell, > Isn't the limit equivalent to: > > space = CIRC_SPACE(rx_q->cur_rx, rx_q->dirty_rx, > priv->dma_conf.dma_rx_size); > limit = min_t(unsigned int, space, limit); > I had to think about the edge cases for a few minutes -- yes, you're right. However... > (Think of the "full" and "empty" cases as "space" which can be > consumed to provide entries for the refiller to action.) ...this seems like a double negative: We're looking at the ring from the refiller's perspective (i.e. considering "dirty" the meaningful state), and *then* looking at the "unused space" (i.e. "not dirty"). If I used it, I'd at least add a comment explaining that CIRC_SPACE is, counterintuitively, not measuring "space." After the bugs are fixed, I'll follow up with a net-next patch [1] that removes the `min` altogether and makes the loop check for 'dirty' directly, so my focus here is "minimal, obviously correct" and not necessarily "tidiest." Would it be acceptable if I simplify stmmac_rx_dirty() to use CIRC_CNT() then, instead of using CIRC_SPACE() now? > I think the same applies for patch 2 - when the above returns zero > it means we have no entries in the ring that aren't due for refill. I'm not completely satisfied with the `stmmac_rx_dirty() == dma_rx_size - 1` expression either. One might argue we should keep trying until dirty==0, because tolerating *any* dirtiness risks not having enough buffers for the next traffic burst, causing avoidable rx drops. But that's a discussion for patch 2. :) > Have you checked whether stmmac_rx_zc() also buggy in this respect? I skimmed it. I saw that it checks the return status of stmmac_rx_refill_zc() and on failure, returns the NAPI budget to keep polling -- which told me that the ZC author(s) at least considered these problems. Looking at it more thoroughly now, there are a few code smells around the unconditional `dirty = 0;`, and `dirty` being used as the "budget" for stmmac_rx_refill_zc() (why have a budget at all then?) so it looks like something's there, but it's a separate bug if so. Best, Sam [1] https://lore.kernel.org/netdev/20260316021009.262358-4-CFSworks@gmail.com/ ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH net v2 2/2] net: stmmac: Prevent indefinite RX stall on buffer exhaustion 2026-03-19 18:40 [PATCH net v2 0/2] stmmac crash/stall fixes when under memory pressure Sam Edwards 2026-03-19 18:40 ` [PATCH net v2 1/2] net: stmmac: Prevent NULL deref when RX memory exhausted Sam Edwards @ 2026-03-19 18:40 ` Sam Edwards 1 sibling, 0 replies; 5+ messages in thread From: Sam Edwards @ 2026-03-19 18:40 UTC (permalink / raw) To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: Russell King, Maxime Chevallier, Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin, netdev, linux-arm-kernel, linux-kernel, Sam Edwards, stable The stmmac driver handles interrupts in the usual NAPI way: an interrupt arrives, the NAPI instance is scheduled and interrupts are masked, and the actual work occurs in the NAPI polling function. Once no further work remains, interrupts are unmasked and the NAPI instance is put to sleep to await a future interrupt. In the receive case, the MAC only sends the interrupt when a DMA operation completes; thus the driver must make sure a usable RX DMA descriptor exists before expecting a future interrupt. The main receive loop in stmmac_rx() exits under one of 3 conditions: 1) It encounters a DMA descriptor with OWN=1, indicating that no further pending data exists. The MAC will use this descriptor for the next RX DMA operation, so the driver can expect a future interrupt. 2) It exhausts the NAPI budget. In this case, the driver doesn't know whether the MAC has any usable DMA descriptors. But when the driver consumes its full budget, that signals NAPI to keep polling, so the question is moot. 3) It runs out of (non-dirty) descriptors in the RX ring. In this case, the MAC will only have a usable descriptor if stmmac_rx_refill() succeeds (at least partially). Currently, stmmac_rx() lacks any check against scenario #3 and stmmac_rx_refill() failing: it will stop NAPI polling and unmask interrupts to await an interrupt that will never arrive, stalling the receive pipeline indefinitely. Fix this by checking if stmmac_rx_dirty() returns its maximal value, returning the full budget (which tells NAPI to keep polling) if so. Fixes: 47dd7a540b8a ("net: add support for STMicroelectronics Ethernet controllers.") Cc: stable@vger.kernel.org Signed-off-by: Sam Edwards <CFSworks@gmail.com> --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index f98b070073c0..05d3c548ce28 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -5604,6 +5604,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue) unsigned int desc_size; struct sk_buff *skb = NULL; struct stmmac_xdp_buff ctx; + int budget = limit; int xdp_status = 0; int bufsz; @@ -5870,6 +5871,12 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue) priv->xstats.rx_dropped += rx_dropped; priv->xstats.rx_errors += rx_errors; + /* If the RX queue is completely dirty, we can't expect a future + * interrupt; tell NAPI to keep polling. + */ + if (unlikely(stmmac_rx_dirty(priv, queue) == priv->dma_conf.dma_rx_size - 1)) + return budget; + return count; } -- 2.52.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-03-20 1:12 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-19 18:40 [PATCH net v2 0/2] stmmac crash/stall fixes when under memory pressure Sam Edwards 2026-03-19 18:40 ` [PATCH net v2 1/2] net: stmmac: Prevent NULL deref when RX memory exhausted Sam Edwards 2026-03-19 21:59 ` Russell King (Oracle) 2026-03-20 1:11 ` Sam Edwards 2026-03-19 18:40 ` [PATCH net v2 2/2] net: stmmac: Prevent indefinite RX stall on buffer exhaustion Sam Edwards
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox