[PATCH net 0/2] net: mana: Fix NULL dereferences during teardown after attach failure.

BPF List
 help / color / mirror / Atom feed

* [PATCH net 0/2] net: mana: Fix NULL dereferences during teardown after attach failure.
@ 2026-05-18 19:43 Dipayaan Roy
  2026-05-18 19:43 ` [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on " Dipayaan Roy
  2026-05-18 19:43 ` [PATCH net 2/2] net: mana: Skip redundant detach in queue reset handler if already detached Dipayaan Roy
  0 siblings, 2 replies; 6+ messages in thread
From: Dipayaan Roy @ 2026-05-18 19:43 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov

When mana_attach() fails (e.g. during queue allocation), the error
cleanup frees apc->tx_qp and apc->rxqs and sets them to NULL. Multiple
subsequent teardown paths can then dereference these NULL pointers,
causing kernel panics.

Patch 1 adds NULL guards in the low-level teardown functions
(mana_fence_rqs, mana_destroy_vport, mana_dealloc_queues) so they are
safe to call regardless of queue initialization state. This covers all
callers: mana_remove(), mana_change_mtu() recovery, and internal error
paths in mana_alloc_queues().

Patch 2 addresses the queue reset work handler specifically, where an
unconditional mana_detach() on an already-detached port caused
redundant teardown. It checks netif_device_present() to skip the detach
and directly retry mana_attach().

Dipayaan Roy (2):
  net: mana: Add NULL guards in teardown path to prevent panic on attach
    failure
  net: mana: Skip redundant detach in queue reset handler if already
    detached

 drivers/net/ethernet/microsoft/mana/mana_en.c | 77 ++++++++++++-------
 1 file changed, 48 insertions(+), 29 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on attach failure
  2026-05-18 19:43 [PATCH net 0/2] net: mana: Fix NULL dereferences during teardown after attach failure Dipayaan Roy
@ 2026-05-18 19:43 ` Dipayaan Roy
  2026-05-18 21:49   ` Harshitha Ramamurthy
  2026-05-19 19:47   ` sashiko-bot
  2026-05-18 19:43 ` [PATCH net 2/2] net: mana: Skip redundant detach in queue reset handler if already detached Dipayaan Roy
  1 sibling, 2 replies; 6+ messages in thread
From: Dipayaan Roy @ 2026-05-18 19:43 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov

When queue allocation fails partway through, the error cleanup frees
and NULLs apc->tx_qp and apc->rxqs. Multiple teardown paths such as
mana_remove(), mana_change_mtu() recovery, and internal error handling
in mana_alloc_queues() can subsequently call into functions that
dereference these pointers without NULL checks:

- mana_chn_setxdp() dereferences apc->rxqs[0], causing a NULL pointer
  dereference panic (CR2: 0000000000000000 at mana_chn_setxdp+0x26).
- mana_destroy_vport() iterates apc->rxqs without a NULL check.
- mana_fence_rqs() iterates apc->rxqs without a NULL check.
- mana_dealloc_queues() iterates apc->tx_qp without a NULL check.

Add NULL guards for apc->rxqs in mana_fence_rqs(),
mana_destroy_vport(), and before the mana_chn_setxdp() call. Add a
NULL guard for apc->tx_qp in mana_dealloc_queues() to skip TX queue
draining when TX queues were never allocated or already freed.

Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 70 +++++++++++--------
 1 file changed, 41 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 9afc786b297a..0582803907a8 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1727,6 +1727,9 @@ static void mana_fence_rqs(struct mana_port_context *apc)
 	struct mana_rxq *rxq;
 	int err;
 
+	if (!apc->rxqs)
+		return;
+
 	for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
 		rxq = apc->rxqs[rxq_idx];
 		err = mana_fence_rq(apc, rxq);
@@ -2858,13 +2861,16 @@ static void mana_destroy_vport(struct mana_port_context *apc)
 	struct mana_rxq *rxq;
 	u32 rxq_idx;
 
-	for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
-		rxq = apc->rxqs[rxq_idx];
-		if (!rxq)
-			continue;
+	if (apc->rxqs) {
 
-		mana_destroy_rxq(apc, rxq, true);
-		apc->rxqs[rxq_idx] = NULL;
+		for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
+			rxq = apc->rxqs[rxq_idx];
+			if (!rxq)
+				continue;
+
+			mana_destroy_rxq(apc, rxq, true);
+			apc->rxqs[rxq_idx] = NULL;
+		}
 	}
 
 	mana_destroy_txq(apc);
@@ -3269,7 +3275,8 @@ static int mana_dealloc_queues(struct net_device *ndev)
 	if (apc->port_is_up)
 		return -EINVAL;
 
-	mana_chn_setxdp(apc, NULL);
+	if (apc->rxqs)
+		mana_chn_setxdp(apc, NULL);
 
 	if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode)
 		mana_pf_deregister_filter(apc);
@@ -3287,33 +3294,38 @@ static int mana_dealloc_queues(struct net_device *ndev)
 	 * number of queues.
 	 */
 
-	for (i = 0; i < apc->num_queues; i++) {
-		txq = &apc->tx_qp[i].txq;
-		tsleep = 1000;
-		while (atomic_read(&txq->pending_sends) > 0 &&
-		       time_before(jiffies, timeout)) {
-			usleep_range(tsleep, tsleep + 1000);
-			tsleep <<= 1;
-		}
-		if (atomic_read(&txq->pending_sends)) {
-			err = pcie_flr(to_pci_dev(gd->gdma_context->dev));
-			if (err) {
-				netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
-					   err, atomic_read(&txq->pending_sends),
-					   txq->gdma_txq_id);
+	if (apc->tx_qp) {
+		for (i = 0; i < apc->num_queues; i++) {
+			txq = &apc->tx_qp[i].txq;
+			tsleep = 1000;
+			while (atomic_read(&txq->pending_sends) > 0 &&
+			       time_before(jiffies, timeout)) {
+				usleep_range(tsleep, tsleep + 1000);
+				tsleep <<= 1;
+			}
+			if (atomic_read(&txq->pending_sends)) {
+				err =
+				    pcie_flr(to_pci_dev(gd->gdma_context->dev));
+				if (err) {
+					netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
+						   err,
+					    atomic_read(&txq->pending_sends),
+					    txq->gdma_txq_id);
+				}
+				break;
 			}
-			break;
 		}
-	}
 
-	for (i = 0; i < apc->num_queues; i++) {
-		txq = &apc->tx_qp[i].txq;
-		while ((skb = skb_dequeue(&txq->pending_skbs))) {
-			mana_unmap_skb(skb, apc);
-			dev_kfree_skb_any(skb);
+		for (i = 0; i < apc->num_queues; i++) {
+			txq = &apc->tx_qp[i].txq;
+			while ((skb = skb_dequeue(&txq->pending_skbs))) {
+				mana_unmap_skb(skb, apc);
+				dev_kfree_skb_any(skb);
+			}
+			atomic_set(&txq->pending_sends, 0);
 		}
-		atomic_set(&txq->pending_sends, 0);
 	}
+
 	/* We're 100% sure the queues can no longer be woken up, because
 	 * we're sure now mana_poll_tx_cq() can't be running.
 	 */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on attach failure
  2026-05-18 19:43 ` [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on " Dipayaan Roy
@ 2026-05-18 21:49   ` Harshitha Ramamurthy
  2026-05-19 19:47   ` sashiko-bot
  1 sibling, 0 replies; 6+ messages in thread
From: Harshitha Ramamurthy @ 2026-05-18 21:49 UTC (permalink / raw)
  To: Dipayaan Roy
  Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov

On Mon, May 18, 2026 at 12:52 PM Dipayaan Roy
<dipayanroy@linux.microsoft.com> wrote:
>
> When queue allocation fails partway through, the error cleanup frees
> and NULLs apc->tx_qp and apc->rxqs. Multiple teardown paths such as
> mana_remove(), mana_change_mtu() recovery, and internal error handling
> in mana_alloc_queues() can subsequently call into functions that
> dereference these pointers without NULL checks:
>
> - mana_chn_setxdp() dereferences apc->rxqs[0], causing a NULL pointer
>   dereference panic (CR2: 0000000000000000 at mana_chn_setxdp+0x26).
> - mana_destroy_vport() iterates apc->rxqs without a NULL check.
> - mana_fence_rqs() iterates apc->rxqs without a NULL check.
> - mana_dealloc_queues() iterates apc->tx_qp without a NULL check.
>
> Add NULL guards for apc->rxqs in mana_fence_rqs(),
> mana_destroy_vport(), and before the mana_chn_setxdp() call. Add a
> NULL guard for apc->tx_qp in mana_dealloc_queues() to skip TX queue
> draining when TX queues were never allocated or already freed.
>
> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
> ---
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 70 +++++++++++--------
>  1 file changed, 41 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 9afc786b297a..0582803907a8 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -1727,6 +1727,9 @@ static void mana_fence_rqs(struct mana_port_context *apc)
>         struct mana_rxq *rxq;
>         int err;
>
> +       if (!apc->rxqs)
> +               return;
> +
>         for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
>                 rxq = apc->rxqs[rxq_idx];
>                 err = mana_fence_rq(apc, rxq);
> @@ -2858,13 +2861,16 @@ static void mana_destroy_vport(struct mana_port_context *apc)
>         struct mana_rxq *rxq;
>         u32 rxq_idx;
>
> -       for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> -               rxq = apc->rxqs[rxq_idx];
> -               if (!rxq)
> -                       continue;
> +       if (apc->rxqs) {
>
> -               mana_destroy_rxq(apc, rxq, true);
> -               apc->rxqs[rxq_idx] = NULL;
> +               for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> +                       rxq = apc->rxqs[rxq_idx];
> +                       if (!rxq)
> +                               continue;
> +
> +                       mana_destroy_rxq(apc, rxq, true);
> +                       apc->rxqs[rxq_idx] = NULL;
> +               }
>         }
>
>         mana_destroy_txq(apc);
> @@ -3269,7 +3275,8 @@ static int mana_dealloc_queues(struct net_device *ndev)
>         if (apc->port_is_up)
>                 return -EINVAL;
>
> -       mana_chn_setxdp(apc, NULL);
> +       if (apc->rxqs)
> +               mana_chn_setxdp(apc, NULL);

Is this check required? mana_dealloc_queues() is only called by
mana_detach(). And mana_detach() calls `mana_dealloc_queues()` only if
the port state was previously up. apc->port_is_up seems to be set to
true only on successful queue allocation. Can apc->rxqs be NULL here?

>
>         if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode)
>                 mana_pf_deregister_filter(apc);
> @@ -3287,33 +3294,38 @@ static int mana_dealloc_queues(struct net_device *ndev)
>          * number of queues.
>          */
>
> -       for (i = 0; i < apc->num_queues; i++) {
> -               txq = &apc->tx_qp[i].txq;
> -               tsleep = 1000;
> -               while (atomic_read(&txq->pending_sends) > 0 &&
> -                      time_before(jiffies, timeout)) {
> -                       usleep_range(tsleep, tsleep + 1000);
> -                       tsleep <<= 1;
> -               }
> -               if (atomic_read(&txq->pending_sends)) {
> -                       err = pcie_flr(to_pci_dev(gd->gdma_context->dev));
> -                       if (err) {
> -                               netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
> -                                          err, atomic_read(&txq->pending_sends),
> -                                          txq->gdma_txq_id);
> +       if (apc->tx_qp) {

And same as above, is it possible for apc->tx_qp to be NULL here?

> +               for (i = 0; i < apc->num_queues; i++) {
> +                       txq = &apc->tx_qp[i].txq;
> +                       tsleep = 1000;
> +                       while (atomic_read(&txq->pending_sends) > 0 &&
> +                              time_before(jiffies, timeout)) {
> +                               usleep_range(tsleep, tsleep + 1000);
> +                               tsleep <<= 1;
> +                       }
> +                       if (atomic_read(&txq->pending_sends)) {
> +                               err =
> +                                   pcie_flr(to_pci_dev(gd->gdma_context->dev));
> +                               if (err) {
> +                                       netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
> +                                                  err,
> +                                           atomic_read(&txq->pending_sends),
> +                                           txq->gdma_txq_id);
> +                               }
> +                               break;
>                         }
> -                       break;
>                 }
> -       }
>
> -       for (i = 0; i < apc->num_queues; i++) {
> -               txq = &apc->tx_qp[i].txq;
> -               while ((skb = skb_dequeue(&txq->pending_skbs))) {
> -                       mana_unmap_skb(skb, apc);
> -                       dev_kfree_skb_any(skb);
> +               for (i = 0; i < apc->num_queues; i++) {
> +                       txq = &apc->tx_qp[i].txq;
> +                       while ((skb = skb_dequeue(&txq->pending_skbs))) {
> +                               mana_unmap_skb(skb, apc);
> +                               dev_kfree_skb_any(skb);
> +                       }
> +                       atomic_set(&txq->pending_sends, 0);
>                 }
> -               atomic_set(&txq->pending_sends, 0);
>         }
> +
>         /* We're 100% sure the queues can no longer be woken up, because
>          * we're sure now mana_poll_tx_cq() can't be running.
>          */
> --
> 2.43.0
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on attach failure
  2026-05-18 19:43 ` [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on " Dipayaan Roy
  2026-05-18 21:49   ` Harshitha Ramamurthy
@ 2026-05-19 19:47   ` sashiko-bot
  1 sibling, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-05-19 19:47 UTC (permalink / raw)
  To: Dipayaan Roy; +Cc: linux-hyperv, bpf

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Missing re-allocation of `apc->rxqs` during interface bring-up causes deterministic NULL pointer dereference
- [High] Uninitialized queue handles (0 and INVALID_MANA_HANDLE) are sent to the firmware during error teardown, corrupting hardware state
- [High] GDMA admin commands are issued immediately after a PCI Function Level Reset (FLR), causing severe cascading timeouts and hardware state violations
--

commit 2129283d37aabfe724d7509e34346eab93d302a2
Author: Dipayaan Roy <dipayanroy@linux.microsoft.com>

net: mana: Add NULL guards in teardown path to prevent panic on attach failure
    
When queue allocation fails partway through, the error cleanup frees
and NULLs apc->tx_qp and apc->rxqs. The patch adds NULL guards for these
pointers in teardown paths such as mana_destroy_vport(), mana_fence_rqs(),
and mana_dealloc_queues() to prevent NULL pointer dereferences.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 9afc786b297a8..9e7e4bf526bf6 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
[ ... ]
> @@ -2858,13 +2861,16 @@ static void mana_destroy_vport(struct mana_port_context *apc)
>  	struct mana_rxq *rxq;
>  	u32 rxq_idx;
>  
> -	for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> -		rxq = apc->rxqs[rxq_idx];
> -		if (!rxq)
> -			continue;
> +	if (apc->rxqs) {
>  
> -		mana_destroy_rxq(apc, rxq, true);
> -		apc->rxqs[rxq_idx] = NULL;
> +		for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> +			rxq = apc->rxqs[rxq_idx];
> +			if (!rxq)
> +				continue;
> +
> +			mana_destroy_rxq(apc, rxq, true);
> +			apc->rxqs[rxq_idx] = NULL;
> +		}
>  	}

If the teardown path frees apc->rxqs and sets it to NULL on attach failure,
how does the interface recover if the user subsequently brings it back up?

When mana_open() is invoked, it directly calls mana_alloc_queues(). Inside
mana_alloc_queues() -> mana_add_rx_queues(), the code does:

    apc->rxqs[i] = rxq;

If apc->rxqs was left as NULL during the teardown, will this cause a NULL
pointer dereference on the next interface bring-up?

[ ... ]
> @@ -3287,33 +3294,38 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  	 * number of queues.
>  	 */
>  
> -	for (i = 0; i < apc->num_queues; i++) {
> -		txq = &apc->tx_qp[i].txq;
> -		tsleep = 1000;
> -		while (atomic_read(&txq->pending_sends) > 0 &&
> -		       time_before(jiffies, timeout)) {
> -			usleep_range(tsleep, tsleep + 1000);
> -			tsleep <<= 1;
> -		}
> -		if (atomic_read(&txq->pending_sends)) {
> -			err = pcie_flr(to_pci_dev(gd->gdma_context->dev));
> -			if (err) {
> -				netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
> -					   err, atomic_read(&txq->pending_sends),
> -					   txq->gdma_txq_id);
> +	if (apc->tx_qp) {

In the error path where queue creation fails partway through, will
mana_destroy_txq() be called?

If mana_destroy_txq() unconditionally iterates over all apc->num_queues:

    for (i = 0; i < apc->num_queues; i++) {
        ...
        mana_destroy_wq_obj(apc, GDMA_SQ, apc->tx_qp[i].tx_object);
        ...

Will it pass 0 or INVALID_MANA_HANDLE to mana_destroy_wq_obj() for queues
that were never fully initialized?

Does mana_destroy_wq_obj() send a MANA_DESTROY_WQ_OBJ command to the hardware
with these uninitialized handles, and could handle 0 corrupt the firmware
state?

> +		for (i = 0; i < apc->num_queues; i++) {
> +			txq = &apc->tx_qp[i].txq;
> +			tsleep = 1000;
> +			while (atomic_read(&txq->pending_sends) > 0 &&
> +			       time_before(jiffies, timeout)) {
> +				usleep_range(tsleep, tsleep + 1000);
> +				tsleep <<= 1;
> +			}
> +			if (atomic_read(&txq->pending_sends)) {
> +				err =
> +				    pcie_flr(to_pci_dev(gd->gdma_context->dev));

If pcie_flr() is triggered here, the PCI Function Level Reset wipes the
internal GDMA state.

Further down in this function, mana_disable_vport_rx(), mana_fence_rqs(),
and mana_destroy_vport() are called. Will these functions attempt to send
commands to the now-destroyed GDMA admin queue via mana_send_request() and
wait for completions?

Could this lead to cascading timeouts since the hardware was just reset and
cannot process admin commands?

> +				if (err) {
> +					netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
> +						   err,
> +					    atomic_read(&txq->pending_sends),
> +					    txq->gdma_txq_id);
> +				}
> +				break;
>  			}
> -			break;
>  		}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260518194654.735580-1-dipayanroy@linux.microsoft.com?part=1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net 2/2] net: mana: Skip redundant detach in queue reset handler if already detached
  2026-05-18 19:43 [PATCH net 0/2] net: mana: Fix NULL dereferences during teardown after attach failure Dipayaan Roy
  2026-05-18 19:43 ` [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on " Dipayaan Roy
@ 2026-05-18 19:43 ` Dipayaan Roy
  2026-05-19 19:47   ` sashiko-bot
  1 sibling, 1 reply; 6+ messages in thread
From: Dipayaan Roy @ 2026-05-18 19:43 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov

When mana_per_port_queue_reset_work_handler() runs, it unconditionally
calls mana_detach() which attempts to tear down queues that are already
freed, leading to NULL pointer dereferences on apc->tx_qp and
apc->rxqs.

Check netif_device_present() in the reset handler and skip
mana_detach() when the port is already in detached state. This avoids
the redundant teardown and goes directly to mana_attach() to retry
bringing the port back up.

Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue resets per port.")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 0582803907a8..907efadf6fd6 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -316,12 +316,19 @@ static void mana_per_port_queue_reset_work_handler(struct work_struct *work)
 		goto out;
 	}
 
+	/* If already detached (indicates detach succeeded but attach failed
+	 * previously). Now skip mana detach and just retry mana_attach.
+	 */
+	if (!netif_device_present(ndev))
+		goto attach;
+
 	err = mana_detach(ndev, false);
 	if (err) {
 		netdev_err(ndev, "mana_detach failed: %d\n", err);
 		goto dealloc_pre_rxbufs;
 	}
 
+attach:
 	err = mana_attach(ndev);
 	if (err)
 		netdev_err(ndev, "mana_attach failed: %d\n", err);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net 2/2] net: mana: Skip redundant detach in queue reset handler if already detached
  2026-05-18 19:43 ` [PATCH net 2/2] net: mana: Skip redundant detach in queue reset handler if already detached Dipayaan Roy
@ 2026-05-19 19:47   ` sashiko-bot
  0 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-05-19 19:47 UTC (permalink / raw)
  To: Dipayaan Roy; +Cc: bpf, linux-hyperv

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] The patch correctly prevents `mana_detach()` from running on an already-detached device in the reset handler, but identical bugs remain in `mana_change_mtu()`, `mana_set_channels()`, `mana_set_ringparam()`, and `mana_bpf()`.
--

commit bac73a4812067e75feeec962b27ab9647e30ed9c
Author: Dipayaan Roy <dipayanroy@linux.microsoft.com>

net: mana: Skip redundant detach in queue reset handler if already detached

This commit adds a check for netif_device_present() in the queue reset
handler. If the device is already detached, it skips the redundant
mana_detach() call and proceeds directly to mana_attach().

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 9e7e4bf526bf6..740f3c34db451 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -316,12 +316,19 @@ static void mana_per_port_queue_reset_work_handler(struct work_struct *work)
[ ... ]
> +	/* If already detached (indicates detach succeeded but attach failed
> +	 * previously). Now skip mana detach and just retry mana_attach.
> +	 */
> +	if (!netif_device_present(ndev))
> +		goto attach;
> +
>  	err = mana_detach(ndev, false);

The fix here correctly prevents mana_detach() from running on an already
detached device. However, do identical issues remain in mana_change_mtu(),
mana_set_channels(), mana_set_ringparam(), and mana_bpf()?

If mana_attach() previously failed (for example, due to out of memory),
the port is left with apc->port_is_up = false, but it preserves
apc->port_st_save = true, and calls netif_device_detach(ndev).

If an admin tries to recover the port by retrying a configuration command
like changing the MTU or attaching XDP, these functions appear to
unconditionally call mana_detach(ndev, false) without the
!netif_device_present(ndev) check added here.

Inside mana_detach(), apc->port_st_save = apc->port_is_up executes. Since
port_is_up is already false, port_st_save is overwritten to false,
permanently erasing the intent that the port should be up.

The subsequent call to mana_attach() would then see port_st_save == false,
skip mana_alloc_queues(), set port_is_up = false, and successfully call
netif_device_attach(ndev).

Would this result in the configuration command returning 0 to userspace
while the port queues were never allocated, leaving the interface silently
broken until manually toggled down and up?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260518194654.735580-1-dipayanroy@linux.microsoft.com?part=2

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-19 19:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-18 19:43 [PATCH net 0/2] net: mana: Fix NULL dereferences during teardown after attach failure Dipayaan Roy
2026-05-18 19:43 ` [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on " Dipayaan Roy
2026-05-18 21:49   ` Harshitha Ramamurthy
2026-05-19 19:47   ` sashiko-bot
2026-05-18 19:43 ` [PATCH net 2/2] net: mana: Skip redundant detach in queue reset handler if already detached Dipayaan Roy
2026-05-19 19:47   ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox