[PATCH v2] net: lan966x: restore RX state on reload failure

BPF List
 help / color / mirror / Atom feed

* [PATCH v2] net: lan966x: restore RX state on reload failure
@ 2026-06-07 14:57 Guangshuo Li
  2026-06-07 15:12 ` sashiko-bot
  2026-06-10  2:40 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 3+ messages in thread
From: Guangshuo Li @ 2026-06-07 14:57 UTC (permalink / raw)
  To: Horatiu Vultur, UNGLinuxDriver, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, David Carlier, netdev, linux-kernel, bpf
  Cc: Guangshuo Li

lan966x_fdma_reload() backs up rx->page_pool and rx->fdma before
reallocating the RX resources for the new MTU. If the allocation fails,
the restore path puts these fields back before restarting RX.

However, the reload path also updates rx->page_order and rx->max_mtu
before calling lan966x_fdma_rx_alloc(). These fields are not restored on
failure, so RX can be restarted with the old pages, old FDMA state and
old page pool, but with the page geometry from the failed new MTU.

This can make the XDP path advertise a frame size derived from the new
page_order while the actual RX pages still come from the old allocation.
For example, after a failed reload to a jumbo MTU, xdp_init_buff() may be
called with a frame size larger than the restored RX pages.

lan966x_fdma_rx_alloc_page_pool() also registers the newly allocated page
pool with each port's XDP RXQ before fdma_alloc_coherent() is called. If
fdma_alloc_coherent() fails, the new page pool is destroyed, but the
rollback path does not restore the per-port XDP RXQ mem model
registration either.

Save and restore rx->page_order and rx->max_mtu, and restore the old page
pool registration for each port's XDP RXQ before RX is started again.
This keeps the restored RX state consistent after a failed reload.

Fixes: 59c3d55a946c ("net: lan966x: fix use-after-free and leak in lan966x_fdma_reload()")
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
---
v2:
  - Save and restore rx->page_order and rx->max_mtu in the reload rollback
    path.
  - Keep the XDP RXQ page-pool registration restore from v1.

 .../ethernet/microchip/lan966x/lan966x_fdma.c | 20 +++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index f8ce735a7fc0..8272ad085150 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -815,6 +815,7 @@ static int lan966x_fdma_reload(struct lan966x *lan966x, int new_mtu)
 	struct page *(*old_pages)[FDMA_RX_DCB_MAX_DBS];
 	struct page_pool *page_pool;
 	struct fdma fdma_rx_old;
+	int page_order, max_mtu;
 	int err, i, j;

 	old_pages = kmemdup(lan966x->rx.page, sizeof(lan966x->rx.page),
@@ -825,6 +826,8 @@ static int lan966x_fdma_reload(struct lan966x *lan966x, int new_mtu)
 	/* Store these for later to free them */
 	memcpy(&fdma_rx_old, &lan966x->rx.fdma, sizeof(struct fdma));
 	page_pool = lan966x->rx.page_pool;
+	page_order = lan966x->rx.page_order;
+	max_mtu = lan966x->rx.max_mtu;

 	napi_synchronize(&lan966x->napi);
 	napi_disable(&lan966x->napi);
@@ -854,7 +857,24 @@ static int lan966x_fdma_reload(struct lan966x *lan966x, int new_mtu)
 	return 0;
 restore:
 	lan966x->rx.page_pool = page_pool;
+	lan966x->rx.page_order = page_order;
+	lan966x->rx.max_mtu = max_mtu;
 	memcpy(&lan966x->rx.fdma, &fdma_rx_old, sizeof(struct fdma));
+	/*
+	 * lan966x_fdma_rx_alloc_page_pool() registered the new pool with
+	 * each port's XDP RXQ before the allocation failed. The new pool is
+	 * destroyed by lan966x_fdma_rx_alloc(), so restore the old pool's
+	 * registration before restarting RX.
+	 */
+	for (i = 0; i < lan966x->num_phys_ports; i++) {
+		if (!lan966x->ports[i])
+			continue;
+
+		xdp_rxq_info_unreg_mem_model(&lan966x->ports[i]->xdp_rxq);
+		xdp_rxq_info_reg_mem_model(&lan966x->ports[i]->xdp_rxq,
+					   MEM_TYPE_PAGE_POOL, page_pool);
+	}
+
 	lan966x_fdma_rx_start(&lan966x->rx);

 	lan966x_fdma_wakeup_netdev(lan966x);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] net: lan966x: restore RX state on reload failure
  2026-06-07 14:57 [PATCH v2] net: lan966x: restore RX state on reload failure Guangshuo Li
@ 2026-06-07 15:12 ` sashiko-bot
  2026-06-10  2:40 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: sashiko-bot @ 2026-06-07 15:12 UTC (permalink / raw)
  To: Guangshuo Li; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] The return value of xdp_rxq_info_reg_mem_model() is ignored, leaving XDP queues without a valid memory model if registration fails.

Pre-existing issues:
- [High] lan966x_fdma_rx_alloc() ignores the return value of fdma_dcbs_init(), leading to hardware DMA into uninitialized memory if allocation fails.
- [High] lan966x_fdma_reload() fails to update rx->fdma.db_size during MTU changes, causing frames to be truncated.
- [High] Snapshotting the RX ring state before disabling NAPI creates a race condition leading to Use-After-Free and memory leaks.
--

commit 8b6a1ed08092bd9d26dff38bd133b141c326226a
Author: Guangshuo Li <lgs201920130244@gmail.com>

net: lan966x: restore RX state on reload failure

This commit saves and restores rx->page_order, rx->max_mtu, and per-port
XDP RXQ mem model registrations during a failed RX MTU reload. It attempts
to keep the restored RX state consistent when lan966x_fdma_rx_alloc() fails.

> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
> index f8ce735a7fc09..8272ad0851508 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
> @@ -815,6 +815,7 @@ static int lan966x_fdma_reload(struct lan966x *lan966x, int new_mtu)
>  	struct page *(*old_pages)[FDMA_RX_DCB_MAX_DBS];
>  	struct page_pool *page_pool;
>  	struct fdma fdma_rx_old;
> +	int page_order, max_mtu;
>  	int err, i, j;
>  
>  	old_pages = kmemdup(lan966x->rx.page, sizeof(lan966x->rx.page),
> @@ -825,6 +826,8 @@ static int lan966x_fdma_reload(struct lan966x *lan966x, int new_mtu)
>  	/* Store these for later to free them */
>  	memcpy(&fdma_rx_old, &lan966x->rx.fdma, sizeof(struct fdma));
>  	page_pool = lan966x->rx.page_pool;
> +	page_order = lan966x->rx.page_order;
> +	max_mtu = lan966x->rx.max_mtu;
>  
>  	napi_synchronize(&lan966x->napi);
>  	napi_disable(&lan966x->napi);

[Severity: High]
This is a pre-existing issue, but does snapshotting the RX ring state before
disabling NAPI create a race condition?

If the NAPI poll loop runs concurrently on another CPU during this window, it
can process packets, pass some of the old pages to the network stack, and
allocate new pages into rx->page.

Later, the driver iterates over the stale old_pages snapshot and frees them
to the page pool. This could double-free the pages that were consumed by the
network stack. Additionally, any new pages allocated by NAPI after the
snapshot might be completely absent from old_pages and thus leaked.

[ ... ]

> @@ -854,7 +857,24 @@ static int lan966x_fdma_reload(struct lan966x *lan966x, int new_mtu)
>  	return 0;
>  restore:
>  	lan966x->rx.page_pool = page_pool;
> +	lan966x->rx.page_order = page_order;
> +	lan966x->rx.max_mtu = max_mtu;

[Severity: High]
This isn't a bug introduced by this patch, but does lan966x_fdma_reload() fail
to update rx->fdma.db_size during MTU changes?

When changing the MTU, the driver updates rx->page_order and rx->max_mtu, but
fails to update rx->fdma.db_size (which was initially calculated in
lan966x_fdma_init() as PAGE_SIZE << rx->page_order).

Consequently, the subsequent call to fdma_dcbs_init() configures the hardware
descriptors with the old db_size. If the user increases the MTU, the hardware
will still restrict DMA to the old buffer size, which could silently truncate
larger frames.

[Severity: High]
This is also a pre-existing issue, but does lan966x_fdma_rx_alloc() ignore the
return value of fdma_dcbs_init()?

drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:lan966x_fdma_rx_alloc() {
    ...
	fdma_dcbs_init(fdma, FDMA_DCB_INFO_DATAL(fdma->db_size),
		       FDMA_DCB_STATUS_INTR);

	return 0;
}

If page allocation fails within fdma_dcbs_init(), it aborts and returns
-ENOMEM. However, lan966x_fdma_rx_alloc() ignores this return value and
returns 0. The hardware RX ring is then started with uninitialized DMA
descriptors (containing physical address 0), which could cause silent memory
corruption when the hardware DMAs packets into address 0.

>  	memcpy(&lan966x->rx.fdma, &fdma_rx_old, sizeof(struct fdma));
> +	/*
> +	 * lan966x_fdma_rx_alloc_page_pool() registered the new pool with
> +	 * each port's XDP RXQ before the allocation failed. The new pool is
> +	 * destroyed by lan966x_fdma_rx_alloc(), so restore the old pool's
> +	 * registration before restarting RX.
> +	 */
> +	for (i = 0; i < lan966x->num_phys_ports; i++) {
> +		if (!lan966x->ports[i])
> +			continue;
> +
> +		xdp_rxq_info_unreg_mem_model(&lan966x->ports[i]->xdp_rxq);
> +		xdp_rxq_info_reg_mem_model(&lan966x->ports[i]->xdp_rxq,
> +					   MEM_TYPE_PAGE_POOL, page_pool);

[Severity: High]
Is it safe to ignore the return value of xdp_rxq_info_reg_mem_model() here?

If xdp_rxq_info_reg_mem_model() fails (e.g., due to -ENOMEM from kzalloc), the
XDP RX queue is left unregistered. When subsequent packets are processed, XDP
might treat the page_pool buffers as standard shared pages and call put_page()
instead of page_pool_put_page(), which could corrupt the page pool allocator.

This same issue also appears to be present in the forward path within
lan966x_fdma_rx_alloc_page_pool():

drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:lan966x_fdma_rx_alloc_page_pool() {
    ...
		xdp_rxq_info_unreg_mem_model(&port->xdp_rxq);
		xdp_rxq_info_reg_mem_model(&port->xdp_rxq, MEM_TYPE_PAGE_POOL,
					   rx->page_pool);
    ...
}

> +	}
> +
>  	lan966x_fdma_rx_start(&lan966x->rx);
>  
>  	lan966x_fdma_wakeup_netdev(lan966x);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260607145747.1494514-1-lgs201920130244@gmail.com?part=1

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] net: lan966x: restore RX state on reload failure
  2026-06-07 14:57 [PATCH v2] net: lan966x: restore RX state on reload failure Guangshuo Li
  2026-06-07 15:12 ` sashiko-bot
@ 2026-06-10  2:40 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-06-10  2:40 UTC (permalink / raw)
  To: Guangshuo Li
  Cc: horatiu.vultur, UNGLinuxDriver, andrew+netdev, davem, edumazet,
	kuba, pabeni, ast, daniel, hawk, john.fastabend, sdf, devnexen,
	netdev, linux-kernel, bpf

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sun,  7 Jun 2026 22:57:47 +0800 you wrote:
> lan966x_fdma_reload() backs up rx->page_pool and rx->fdma before
> reallocating the RX resources for the new MTU. If the allocation fails,
> the restore path puts these fields back before restarting RX.
> 
> However, the reload path also updates rx->page_order and rx->max_mtu
> before calling lan966x_fdma_rx_alloc(). These fields are not restored on
> failure, so RX can be restarted with the old pages, old FDMA state and
> old page pool, but with the page geometry from the failed new MTU.
> 
> [...]

Here is the summary with links:
  - [v2] net: lan966x: restore RX state on reload failure
    https://git.kernel.org/netdev/net-next/c/aa97f11a76e5

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-06-10  2:40 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-07 14:57 [PATCH v2] net: lan966x: restore RX state on reload failure Guangshuo Li
2026-06-07 15:12 ` sashiko-bot
2026-06-10  2:40 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox