[PATCH net v3 0/2] net: mana: Fix NULL dereferences during teardown after attach failure

Linux-HyperV List
 help / color / mirror / Atom feed

* [PATCH net v3 0/2] net: mana: Fix NULL dereferences during teardown after attach failure
@ 2026-05-25  8:08 Dipayaan Roy
  2026-05-25  8:08 ` [PATCH net v3 1/2] net: mana: Add NULL guards in teardown path to prevent panic on " Dipayaan Roy
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Dipayaan Roy @ 2026-05-25  8:08 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
	pavan.chebbi

When mana_attach() fails (e.g. during queue allocation), the error
cleanup frees apc->tx_qp and apc->rxqs and sets them to NULL. Multiple
subsequent teardown paths can then dereference these NULL pointers,
causing kernel panics.

Patch 1 adds NULL guards in the low-level teardown functions
(mana_fence_rqs, mana_destroy_vport, mana_dealloc_queues) so they are
safe to call regardless of queue initialization state. This covers all
callers: mana_remove(), mana_change_mtu() recovery, and internal error
paths in mana_alloc_queues().

Patch 2 adds an early exit in mana_detach() for already-detached ports,
making it safe for non-close callers. This allows the queue reset
handler to safely retry mana_attach() without redundant teardown.

Changes in v3:
  - Patch 1: Fixed commit message and fixes tag.
Changes in v2:
  - Patch 2: moved netif_device_present() check into mana_detach() as
    an early exit instead of using goto in the work handler 

Dipayaan Roy (2):
  net: mana: Add NULL guards in teardown path to prevent panic on attach
    failure
  net: mana: Skip redundant detach on already-detached port

 drivers/net/ethernet/microsoft/mana/mana_en.c | 76 ++++++++++++-------
 1 file changed, 47 insertions(+), 29 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net v3 1/2] net: mana: Add NULL guards in teardown path to prevent panic on attach failure
  2026-05-25  8:08 [PATCH net v3 0/2] net: mana: Fix NULL dereferences during teardown after attach failure Dipayaan Roy
@ 2026-05-25  8:08 ` Dipayaan Roy
  2026-05-30  0:47   ` sashiko-bot
  2026-05-25  8:08 ` [PATCH net v3 2/2] net: mana: Skip redundant detach on already-detached port Dipayaan Roy
  2026-05-28 23:40 ` [PATCH net v3 0/2] net: mana: Fix NULL dereferences during teardown after attach failure patchwork-bot+netdevbpf
  2 siblings, 1 reply; 8+ messages in thread
From: Dipayaan Roy @ 2026-05-25  8:08 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
	pavan.chebbi

When queue allocation fails partway through, the error cleanup frees
and NULLs apc->tx_qp and apc->rxqs. Multiple teardown paths such as
mana_remove(), mana_change_mtu() recovery, and internal error handling
in mana_alloc_queues() can subsequently call into functions that
dereference these pointers without NULL checks:

- mana_chn_setxdp() dereferences apc->rxqs[0], causing a NULL pointer
  dereference panic (CR2: 0000000000000000 at mana_chn_setxdp+0x26).
- mana_destroy_vport() iterates apc->rxqs without a NULL check.
- mana_fence_rqs() iterates apc->rxqs without a NULL check.
- mana_dealloc_queues() iterates apc->tx_qp without a NULL check.

Add NULL guards for apc->rxqs in mana_fence_rqs(),
mana_destroy_vport(), and before the mana_chn_setxdp() call. Add a
NULL guard for apc->tx_qp in mana_dealloc_queues() to skip TX queue
draining when TX queues were never allocated or already freed.

Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 70 +++++++++++--------
 1 file changed, 41 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 9afc786b297a..0582803907a8 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1727,6 +1727,9 @@ static void mana_fence_rqs(struct mana_port_context *apc)
 	struct mana_rxq *rxq;
 	int err;
 
+	if (!apc->rxqs)
+		return;
+
 	for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
 		rxq = apc->rxqs[rxq_idx];
 		err = mana_fence_rq(apc, rxq);
@@ -2858,13 +2861,16 @@ static void mana_destroy_vport(struct mana_port_context *apc)
 	struct mana_rxq *rxq;
 	u32 rxq_idx;
 
-	for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
-		rxq = apc->rxqs[rxq_idx];
-		if (!rxq)
-			continue;
+	if (apc->rxqs) {
 
-		mana_destroy_rxq(apc, rxq, true);
-		apc->rxqs[rxq_idx] = NULL;
+		for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
+			rxq = apc->rxqs[rxq_idx];
+			if (!rxq)
+				continue;
+
+			mana_destroy_rxq(apc, rxq, true);
+			apc->rxqs[rxq_idx] = NULL;
+		}
 	}
 
 	mana_destroy_txq(apc);
@@ -3269,7 +3275,8 @@ static int mana_dealloc_queues(struct net_device *ndev)
 	if (apc->port_is_up)
 		return -EINVAL;
 
-	mana_chn_setxdp(apc, NULL);
+	if (apc->rxqs)
+		mana_chn_setxdp(apc, NULL);
 
 	if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode)
 		mana_pf_deregister_filter(apc);
@@ -3287,33 +3294,38 @@ static int mana_dealloc_queues(struct net_device *ndev)
 	 * number of queues.
 	 */
 
-	for (i = 0; i < apc->num_queues; i++) {
-		txq = &apc->tx_qp[i].txq;
-		tsleep = 1000;
-		while (atomic_read(&txq->pending_sends) > 0 &&
-		       time_before(jiffies, timeout)) {
-			usleep_range(tsleep, tsleep + 1000);
-			tsleep <<= 1;
-		}
-		if (atomic_read(&txq->pending_sends)) {
-			err = pcie_flr(to_pci_dev(gd->gdma_context->dev));
-			if (err) {
-				netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
-					   err, atomic_read(&txq->pending_sends),
-					   txq->gdma_txq_id);
+	if (apc->tx_qp) {
+		for (i = 0; i < apc->num_queues; i++) {
+			txq = &apc->tx_qp[i].txq;
+			tsleep = 1000;
+			while (atomic_read(&txq->pending_sends) > 0 &&
+			       time_before(jiffies, timeout)) {
+				usleep_range(tsleep, tsleep + 1000);
+				tsleep <<= 1;
+			}
+			if (atomic_read(&txq->pending_sends)) {
+				err =
+				    pcie_flr(to_pci_dev(gd->gdma_context->dev));
+				if (err) {
+					netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
+						   err,
+					    atomic_read(&txq->pending_sends),
+					    txq->gdma_txq_id);
+				}
+				break;
 			}
-			break;
 		}
-	}
 
-	for (i = 0; i < apc->num_queues; i++) {
-		txq = &apc->tx_qp[i].txq;
-		while ((skb = skb_dequeue(&txq->pending_skbs))) {
-			mana_unmap_skb(skb, apc);
-			dev_kfree_skb_any(skb);
+		for (i = 0; i < apc->num_queues; i++) {
+			txq = &apc->tx_qp[i].txq;
+			while ((skb = skb_dequeue(&txq->pending_skbs))) {
+				mana_unmap_skb(skb, apc);
+				dev_kfree_skb_any(skb);
+			}
+			atomic_set(&txq->pending_sends, 0);
 		}
-		atomic_set(&txq->pending_sends, 0);
 	}
+
 	/* We're 100% sure the queues can no longer be woken up, because
 	 * we're sure now mana_poll_tx_cq() can't be running.
 	 */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net v3 2/2] net: mana: Skip redundant detach on already-detached port
  2026-05-25  8:08 [PATCH net v3 0/2] net: mana: Fix NULL dereferences during teardown after attach failure Dipayaan Roy
  2026-05-25  8:08 ` [PATCH net v3 1/2] net: mana: Add NULL guards in teardown path to prevent panic on " Dipayaan Roy
@ 2026-05-25  8:08 ` Dipayaan Roy
  2026-05-28  9:30   ` Paolo Abeni
  2026-05-30  0:47   ` sashiko-bot
  2026-05-28 23:40 ` [PATCH net v3 0/2] net: mana: Fix NULL dereferences during teardown after attach failure patchwork-bot+netdevbpf
  2 siblings, 2 replies; 8+ messages in thread
From: Dipayaan Roy @ 2026-05-25  8:08 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
	pavan.chebbi

When mana_per_port_queue_reset_work_handler() runs after a previous
detach succeeded but attach failed, the port is left in a detached
state with apc->tx_qp and apc->rxqs already freed. Calling
mana_detach() again unconditionally leads to NULL pointer dereferences
during queue teardown.

Add an early exit in mana_detach() when the port is already in
detached state (!netif_device_present) for non-close callers, making
it safe to call idempotently. This allows the queue reset handler and
other recovery paths to simply retry mana_attach() without redundant
teardown.

Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue resets per port.")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 0582803907a8..1e1ad2795c3c 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3350,6 +3350,12 @@ int mana_detach(struct net_device *ndev, bool from_close)
 
 	ASSERT_RTNL();
 
+	/* If already detached (indicates detach succeeded but attach failed
+	 * previously). Now skip mana detach and just retry mana_attach.
+	 */
+	if (!from_close && !netif_device_present(ndev))
+		return 0;
+
 	apc->port_st_save = apc->port_is_up;
 	apc->port_is_up = false;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 2/2] net: mana: Skip redundant detach on already-detached port
  2026-05-25  8:08 ` [PATCH net v3 2/2] net: mana: Skip redundant detach on already-detached port Dipayaan Roy
@ 2026-05-28  9:30   ` Paolo Abeni
  2026-05-28 21:16     ` Dipayaan Roy
  2026-05-30  0:47   ` sashiko-bot
  1 sibling, 1 reply; 8+ messages in thread
From: Paolo Abeni @ 2026-05-28  9:30 UTC (permalink / raw)
  To: Dipayaan Roy, kys, haiyangz, wei.liu, decui, andrew+netdev, davem,
	edumazet, kuba, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
	pavan.chebbi

On 5/25/26 10:08 AM, Dipayaan Roy wrote:
> When mana_per_port_queue_reset_work_handler() runs after a previous
> detach succeeded but attach failed, the port is left in a detached
> state with apc->tx_qp and apc->rxqs already freed. Calling
> mana_detach() again unconditionally leads to NULL pointer dereferences
> during queue teardown.
> 
> Add an early exit in mana_detach() when the port is already in
> detached state (!netif_device_present) for non-close callers, making
> it safe to call idempotently. This allows the queue reset handler and
> other recovery paths to simply retry mana_attach() without redundant
> teardown.
> 
> Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue resets per port.")
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
> ---
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 0582803907a8..1e1ad2795c3c 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -3350,6 +3350,12 @@ int mana_detach(struct net_device *ndev, bool from_close)
>  
>  	ASSERT_RTNL();
>  
> +	/* If already detached (indicates detach succeeded but attach failed
> +	 * previously). Now skip mana detach and just retry mana_attach.
> +	 */
> +	if (!from_close && !netif_device_present(ndev))
> +		return 0;
> +
>  	apc->port_st_save = apc->port_is_up;
>  	apc->port_is_up = false;

sashiko(gemini) notes the above can lead to different race:

---
Can this early return cause state machine corruption by bypassing the
updates
to apc->port_st_save?
Consider this sequence:
1. queue_reset_work runs, mana_detach() succeeds (apc->port_st_save = true,
   apc->port_is_up = false), but mana_attach() fails.
2. The admin brings the interface down (ip link set dev eth0 down), skipping
   mana_close() since apc->port_is_up is false.
3. The admin changes the MTU, triggering mana_change_mtu() which calls
   mana_detach() followed by mana_attach().
4. mana_detach() hits this new early return, preserving
   apc->port_st_save == true.
When mana_attach() runs, it sees apc->port_st_save == true and allocates
queues, setting apc->vport_use_count = 1 and apc->port_is_up = true, even
though the interface is administratively down.
If the admin then brings the interface up, mana_open() will unconditionally
call mana_alloc_queues(). That function calls mana_cfg_vport(), which will
return -EBUSY because apc->vport_use_count is already 1.
This leaves mana_open() failing and the interface down. Since the interface
is already down, trying to bring it down again is a no-op, meaning
mana_close() is never called to clean up the orphaned queues.
Does this sequence permanently brick the port until the driver is reloaded?
---

I think you need to be more restrictive in the early return check.

/P


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 2/2] net: mana: Skip redundant detach on already-detached port
  2026-05-28  9:30   ` Paolo Abeni
@ 2026-05-28 21:16     ` Dipayaan Roy
  0 siblings, 0 replies; 8+ messages in thread
From: Dipayaan Roy @ 2026-05-28 21:16 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, leon, longli, kotaranov, horms, shradhagupta, ssengar,
	ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
	pavan.chebbi

On Thu, May 28, 2026 at 11:30:39AM +0200, Paolo Abeni wrote:
> On 5/25/26 10:08 AM, Dipayaan Roy wrote:
> > When mana_per_port_queue_reset_work_handler() runs after a previous
> > detach succeeded but attach failed, the port is left in a detached
> > state with apc->tx_qp and apc->rxqs already freed. Calling
> > mana_detach() again unconditionally leads to NULL pointer dereferences
> > during queue teardown.
> > 
> > Add an early exit in mana_detach() when the port is already in
> > detached state (!netif_device_present) for non-close callers, making
> > it safe to call idempotently. This allows the queue reset handler and
> > other recovery paths to simply retry mana_attach() without redundant
> > teardown.
> > 
> > Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue resets per port.")
> > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> > Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
> > ---
> >  drivers/net/ethernet/microsoft/mana/mana_en.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index 0582803907a8..1e1ad2795c3c 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -3350,6 +3350,12 @@ int mana_detach(struct net_device *ndev, bool from_close)
> >  
> >  	ASSERT_RTNL();
> >  
> > +	/* If already detached (indicates detach succeeded but attach failed
> > +	 * previously). Now skip mana detach and just retry mana_attach.
> > +	 */
> > +	if (!from_close && !netif_device_present(ndev))
> > +		return 0;
> > +
> >  	apc->port_st_save = apc->port_is_up;
> >  	apc->port_is_up = false;
> 
> sashiko(gemini) notes the above can lead to different race:
> 
> ---
> Can this early return cause state machine corruption by bypassing the
> updates
> to apc->port_st_save?
> Consider this sequence:
> 1. queue_reset_work runs, mana_detach() succeeds (apc->port_st_save = true,
>    apc->port_is_up = false), but mana_attach() fails.
> 2. The admin brings the interface down (ip link set dev eth0 down), skipping
>    mana_close() since apc->port_is_up is false.
> 3. The admin changes the MTU, triggering mana_change_mtu() which calls
>    mana_detach() followed by mana_attach().
> 4. mana_detach() hits this new early return, preserving
>    apc->port_st_save == true.
> When mana_attach() runs, it sees apc->port_st_save == true and allocates
> queues, setting apc->vport_use_count = 1 and apc->port_is_up = true, even
> though the interface is administratively down.
> If the admin then brings the interface up, mana_open() will unconditionally
> call mana_alloc_queues(). That function calls mana_cfg_vport(), which will
> return -EBUSY because apc->vport_use_count is already 1.
> This leaves mana_open() failing and the interface down. Since the interface
> is already down, trying to bring it down again is a no-op, meaning
> mana_close() is never called to clean up the orphaned queues.
> Does this sequence permanently brick the port until the driver is reloaded?
> ---
> 
> I think you need to be more restrictive in the early return check.
> 
> /P
>
Hi Paolo,

Thank you for the comments,
I think the scenario pointed out by sashiko does not seems valid,
as it mentioned in step 2 and 3 admin changing MTU after bringing
interface down. This is becasue netif_set_mtu_ext() in dev.c checks
netif_device_present and returns -ENODEV before calling
ndo_change_mtu. So mana_change_mtu() is never reachable when the
device is in the !present state.

https://elixir.bootlin.com/linux/v7.0/source/net/core/dev.c#L9906 

Please let me know if the above check is good enough?

Regards
Dipayaan Roy

 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 0/2] net: mana: Fix NULL dereferences during teardown after attach failure
  2026-05-25  8:08 [PATCH net v3 0/2] net: mana: Fix NULL dereferences during teardown after attach failure Dipayaan Roy
  2026-05-25  8:08 ` [PATCH net v3 1/2] net: mana: Add NULL guards in teardown path to prevent panic on " Dipayaan Roy
  2026-05-25  8:08 ` [PATCH net v3 2/2] net: mana: Skip redundant detach on already-detached port Dipayaan Roy
@ 2026-05-28 23:40 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-28 23:40 UTC (permalink / raw)
  To: Dipayaan Roy
  Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
	pavan.chebbi

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 25 May 2026 01:08:23 -0700 you wrote:
> When mana_attach() fails (e.g. during queue allocation), the error
> cleanup frees apc->tx_qp and apc->rxqs and sets them to NULL. Multiple
> subsequent teardown paths can then dereference these NULL pointers,
> causing kernel panics.
> 
> Patch 1 adds NULL guards in the low-level teardown functions
> (mana_fence_rqs, mana_destroy_vport, mana_dealloc_queues) so they are
> safe to call regardless of queue initialization state. This covers all
> callers: mana_remove(), mana_change_mtu() recovery, and internal error
> paths in mana_alloc_queues().
> 
> [...]

Here is the summary with links:
  - [net,v3,1/2] net: mana: Add NULL guards in teardown path to prevent panic on attach failure
    https://git.kernel.org/netdev/net/c/17bfe0a8c014
  - [net,v3,2/2] net: mana: Skip redundant detach on already-detached port
    https://git.kernel.org/netdev/net/c/5b05aa36ee24

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 1/2] net: mana: Add NULL guards in teardown path to prevent panic on attach failure
  2026-05-25  8:08 ` [PATCH net v3 1/2] net: mana: Add NULL guards in teardown path to prevent panic on " Dipayaan Roy
@ 2026-05-30  0:47   ` sashiko-bot
  0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-05-30  0:47 UTC (permalink / raw)
  To: Dipayaan Roy; +Cc: bpf, linux-hyperv

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] mana_add_rx_queues() writes to a freed apc->rxqs pointer on success path after an error.
- [Medium] mana_destroy_txq() attempts to destroy uninitialized WQ objects.
--

commit f9a2821b4cd7ff0d58042e4bdeeb3d484c4946cf
Author: Dipayaan Roy <dipayanroy@linux.microsoft.com>

net: mana: Add NULL guards in teardown path to prevent panic on attach failure

This patch adds NULL guards for apc->rxqs and apc->tx_qp in multiple teardown
paths, such as mana_fence_rqs(), mana_destroy_vport(), and
mana_dealloc_queues(), to prevent NULL pointer dereference panics when queue
allocation fails.

> When queue allocation fails partway through, the error cleanup frees
> and NULLs apc->tx_qp and apc->rxqs. Multiple teardown paths such as
> mana_remove(), mana_change_mtu() recovery, and internal error handling
> in mana_alloc_queues() can subsequently call into functions that
> dereference these pointers without NULL checks:

[Severity: High]
This is a pre-existing issue, but could a subsequent interface bring-up
cause a panic due to the NULL apc->rxqs pointer?

If mana_attach() fails (e.g., during mana_change_mtu() recovery), the error
path calls mana_cleanup_port_context(apc), which frees apc->rxqs and sets it
to NULL, leaving the interface in a down state.

If the interface is subsequently brought back up, mana_open() is invoked.
mana_open() calls mana_alloc_queues() directly, skipping
mana_init_port_context() where apc->rxqs is normally allocated.

When mana_alloc_queues() reaches mana_add_rx_queues(), it executes:

mana_add_rx_queues()
    apc->rxqs[i] = rxq;

Would this write to the freed apc->rxqs pointer and cause a kernel panic?

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 9afc786b297a8..9e7e4bf526bf6 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c

[ ... ]

> @@ -2858,13 +2861,16 @@ static void mana_destroy_vport(struct mana_port_context *apc)
>  	struct mana_rxq *rxq;
>  	u32 rxq_idx;
>  
> -	for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> -		rxq = apc->rxqs[rxq_idx];
> -		if (!rxq)
> -			continue;
> +	if (apc->rxqs) {
>  
> -		mana_destroy_rxq(apc, rxq, true);
> -		apc->rxqs[rxq_idx] = NULL;
> +		for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> +			rxq = apc->rxqs[rxq_idx];
> +			if (!rxq)
> +				continue;
> +
> +			mana_destroy_rxq(apc, rxq, true);
> +			apc->rxqs[rxq_idx] = NULL;
> +		}
>  	}
>  
>  	mana_destroy_txq(apc);

[Severity: Medium]
This is a pre-existing issue, but does mana_destroy_txq() send invalid
teardown requests to the hardware if mana_create_txq() failed partway
through?

Unlike mana_destroy_rxq(), which explicitly checks if rxq->rxobj is valid
(rxq->rxobj != INVALID_MANA_HANDLE) before destroying it, mana_destroy_txq()
unconditionally loops over all num_queues and calls:

mana_destroy_txq()
    mana_destroy_wq_obj(apc, GDMA_SQ, apc->tx_qp[i].tx_object);

For queues that were not initialized, the tx_object handle is 0 because the
memory was zeroed by allocation, rather than being set to
INVALID_MANA_HANDLE.

Does this result in blindly sending a MANA_DESTROY_WQ_OBJ hardware request
with wq_obj_handle = 0 for the uninitialized queues?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260525081129.1230035-1-dipayanroy@linux.microsoft.com?part=1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 2/2] net: mana: Skip redundant detach on already-detached port
  2026-05-25  8:08 ` [PATCH net v3 2/2] net: mana: Skip redundant detach on already-detached port Dipayaan Roy
  2026-05-28  9:30   ` Paolo Abeni
@ 2026-05-30  0:47   ` sashiko-bot
  1 sibling, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-05-30  0:47 UTC (permalink / raw)
  To: Dipayaan Roy; +Cc: linux-hyperv, bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] State machine corruption in mana_detach leads to permanently bricked port.
- [Medium] The commit message falsely claims the patch fixes a NULL pointer dereference, indicating a misdiagnosis of the original bug.
--

commit 9693e5b39c52406711e627ad84a6eb7471551a72
Author: Dipayaan Roy <dipayanroy@linux.microsoft.com>

net: mana: Skip redundant detach on already-detached port

This commit adds an early exit to mana_detach() to prevent redundant teardown
when a port is already detached due to a previous failed attach.

>     When mana_per_port_queue_reset_work_handler() runs after a previous
>     detach succeeded but attach failed, the port is left in a detached
>     state with apc->tx_qp and apc->rxqs already freed. Calling
>     mana_detach() again unconditionally leads to NULL pointer dereferences
>     during queue teardown.

[Severity: Medium]
Is this description of the crash accurate? Looking at the unpatched code, if
mana_attach() fails, apc->port_is_up is left as false. If mana_detach() is
called again, apc->port_st_save is set to apc->port_is_up (which is false). 

Because mana_dealloc_queues() is guarded by if (apc->port_st_save), queue
teardown would be entirely skipped. Could the actual issue with the old code
be that apc->port_st_save was incorrectly cleared, preventing future
mana_attach() retries from allocating queues and recovering the port, rather
than a NULL pointer dereference?

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 9e7e4bf526bf6..c9b1df1ed1098 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -3350,6 +3350,12 @@ int mana_detach(struct net_device *ndev, bool from_close)
>  
>  	ASSERT_RTNL();
>  
> +	/* If already detached (indicates detach succeeded but attach failed
> +	 * previously). Now skip mana detach and just retry mana_attach.
> +	 */
> +	if (!from_close && !netif_device_present(ndev))
> +		return 0;
> +

[Severity: High]
Can this early return cause state machine corruption by bypassing the updates
to apc->port_st_save?

Consider this sequence:
1. queue_reset_work runs, mana_detach() succeeds (apc->port_st_save = true,
   apc->port_is_up = false), but mana_attach() fails.
2. The admin brings the interface down (ip link set dev eth0 down), skipping
   mana_close() since apc->port_is_up is false.
3. The admin changes the MTU, triggering mana_change_mtu() which calls
   mana_detach() followed by mana_attach().
4. mana_detach() hits this new early return, preserving
   apc->port_st_save == true.

When mana_attach() runs, it sees apc->port_st_save == true and allocates
queues, setting apc->vport_use_count = 1 and apc->port_is_up = true, even
though the interface is administratively down.

If the admin then brings the interface up, mana_open() will unconditionally
call mana_alloc_queues(). That function calls mana_cfg_vport(), which will
return -EBUSY because apc->vport_use_count is already 1.

This leaves mana_open() failing and the interface down. Since the interface
is already down, trying to bring it down again is a no-op, meaning
mana_close() is never called to clean up the orphaned queues.

Does this sequence permanently brick the port until the driver is reloaded?

>  	apc->port_st_save = apc->port_is_up;
>  	apc->port_is_up = false;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260525081129.1230035-1-dipayanroy@linux.microsoft.com?part=2

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-05-30  0:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-25  8:08 [PATCH net v3 0/2] net: mana: Fix NULL dereferences during teardown after attach failure Dipayaan Roy
2026-05-25  8:08 ` [PATCH net v3 1/2] net: mana: Add NULL guards in teardown path to prevent panic on " Dipayaan Roy
2026-05-30  0:47   ` sashiko-bot
2026-05-25  8:08 ` [PATCH net v3 2/2] net: mana: Skip redundant detach on already-detached port Dipayaan Roy
2026-05-28  9:30   ` Paolo Abeni
2026-05-28 21:16     ` Dipayaan Roy
2026-05-30  0:47   ` sashiko-bot
2026-05-28 23:40 ` [PATCH net v3 0/2] net: mana: Fix NULL dereferences during teardown after attach failure patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox