* [PATCH AUTOSEL 6.18-6.12] net: mana: Drop TX skb on post_work_request failure and unmap resources
[not found] <20251209001610.611575-1-sashal@kernel.org>
@ 2025-12-09 0:15 ` Sasha Levin
0 siblings, 0 replies; only message in thread
From: Sasha Levin @ 2025-12-09 0:15 UTC (permalink / raw)
To: patches, stable
Cc: Aditya Garg, Haiyang Zhang, Jakub Kicinski, Sasha Levin, kys,
wei.liu, decui, longli, kotaranov, leon, shradhagupta, mlevitsk,
ernis, yury.norov, ssengar, dipayanroy, shirazsaleem,
linux-hyperv, linux-rdma
From: Aditya Garg <gargaditya@linux.microsoft.com>
[ Upstream commit 45120304e84171fd215c1b57b15b285446d15106 ]
Drop TX packets when posting the work request fails and ensure DMA
mappings are always cleaned up.
Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763464269-10431-3-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Looking at this commit, I need to analyze it for stable backporting
suitability.
## Commit Analysis
### What the commit does:
1. **In `mana_en.c`**: When `mana_gd_post_work_request()` fails during
TX:
- Previously: returned `NETDEV_TX_BUSY` without cleaning up DMA
mappings
- Now: calls `mana_unmap_skb()` to clean up DMA mappings, then drops
the packet properly
2. **In `gdma_main.c`**: Removes a noisy `dev_err` message for flow
control failure
3. **In `mana.h`**: Exports `mana_unmap_skb()` function for use across
files
### Bug Being Fixed
Looking at the original error path:
```c
if (err) {
(void)skb_dequeue_tail(&txq->pending_skbs);
netdev_warn(ndev, "Failed to post TX OOB: %d\n", err);
err = NETDEV_TX_BUSY;
goto tx_busy; // DMA mappings NOT unmapped!
}
```
The original code has **two bugs**:
1. **DMA resource leak**: When `mana_gd_post_work_request()` fails, the
DMA mappings set up earlier (via `mana_map_skb()`) are never freed
2. **Improper error handling**: Returns `NETDEV_TX_BUSY` which tells the
network stack to retry, but the skb was already dequeued from
`pending_skbs`, leading to inconsistent state
### Code Change Assessment
The fix is straightforward and surgical:
```c
if (err) {
(void)skb_dequeue_tail(&txq->pending_skbs);
mana_unmap_skb(skb, apc); // NEW: Clean up DMA mappings
netdev_warn(ndev, "Failed to post TX OOB: %d\n", err);
goto free_sgl_ptr; // Drop packet properly
}
```
### Stable Criteria Evaluation
| Criterion | Assessment |
|-----------|------------|
| Fixes real bug | ✅ Yes - DMA resource leak and improper error handling
|
| Obviously correct | ✅ Yes - unmapping DMA on error is clearly correct
|
| Small and contained | ✅ Yes - only affects error path in MANA driver |
| No new features | ✅ Yes - pure bug fix |
| User impact | ✅ Medium-high - affects Azure VM users; leaks can cause
long-term instability |
### Risk Assessment
- **Low risk**: Changes only affect the TX error path
- **Self-contained**: The `mana_unmap_skb()` function already exists,
just made non-static
- **No dependencies**: This commit is standalone
- **Well-tested**: Has "Reviewed-by" tag from Microsoft maintainer
### Missing Signals
- No `Cc: stable@vger.kernel.org` tag
- No `Fixes:` tag
However, this is clearly fixing a resource leak bug that has existed
since the error path was written. Resource leaks in network drivers are
exactly the kind of bugs stable trees should fix.
### User Impact
The MANA driver is used in Microsoft Azure VMs. DMA mapping leaks:
- Cause memory pressure over time
- Can lead to allocation failures under sustained TX errors
- May cause VM instability in production cloud environments
This is a production-relevant bug affecting real cloud users.
### Conclusion
This commit fixes a genuine resource leak bug in the MANA network
driver. The fix is:
- Small (adds one function call + declaration)
- Surgical (only affects error path)
- Obviously correct (unmapping DMA on error is mandatory)
- Low risk (doesn't touch normal TX path)
- Important for production Azure VM users
The lack of stable/Fixes tags is unfortunate but doesn't diminish the
clear bug-fix nature of this change.
**YES**
drivers/net/ethernet/microsoft/mana/gdma_main.c | 6 +-----
drivers/net/ethernet/microsoft/mana/mana_en.c | 7 +++----
include/net/mana/mana.h | 1 +
3 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 43f034e180c41..7d232ac250958 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1299,7 +1299,6 @@ int mana_gd_post_work_request(struct gdma_queue *wq,
struct gdma_posted_wqe_info *wqe_info)
{
u32 client_oob_size = wqe_req->inline_oob_size;
- struct gdma_context *gc;
u32 sgl_data_size;
u32 max_wqe_size;
u32 wqe_size;
@@ -1329,11 +1328,8 @@ int mana_gd_post_work_request(struct gdma_queue *wq,
if (wqe_size > max_wqe_size)
return -EINVAL;
- if (wq->monitor_avl_buf && wqe_size > mana_gd_wq_avail_space(wq)) {
- gc = wq->gdma_dev->gdma_context;
- dev_err(gc->dev, "unsuccessful flow control!\n");
+ if (wq->monitor_avl_buf && wqe_size > mana_gd_wq_avail_space(wq))
return -ENOSPC;
- }
if (wqe_info)
wqe_info->wqe_size_in_bu = wqe_size / GDMA_WQE_BU_SIZE;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 0142fd98392c2..6d37f39930453 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -438,9 +438,9 @@ netdev_tx_t mana_start_xmit(struct sk_buff *skb, struct net_device *ndev)
if (err) {
(void)skb_dequeue_tail(&txq->pending_skbs);
+ mana_unmap_skb(skb, apc);
netdev_warn(ndev, "Failed to post TX OOB: %d\n", err);
- err = NETDEV_TX_BUSY;
- goto tx_busy;
+ goto free_sgl_ptr;
}
err = NETDEV_TX_OK;
@@ -460,7 +460,6 @@ netdev_tx_t mana_start_xmit(struct sk_buff *skb, struct net_device *ndev)
tx_stats->bytes += len + ((num_gso_seg - 1) * gso_hs);
u64_stats_update_end(&tx_stats->syncp);
-tx_busy:
if (netif_tx_queue_stopped(net_txq) && mana_can_tx(gdma_sq)) {
netif_tx_wake_queue(net_txq);
apc->eth_stats.wake_queue++;
@@ -1606,7 +1605,7 @@ static int mana_move_wq_tail(struct gdma_queue *wq, u32 num_units)
return 0;
}
-static void mana_unmap_skb(struct sk_buff *skb, struct mana_port_context *apc)
+void mana_unmap_skb(struct sk_buff *skb, struct mana_port_context *apc)
{
struct mana_skb_head *ash = (struct mana_skb_head *)skb->head;
struct gdma_context *gc = apc->ac->gdma_dev->gdma_context;
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 0921485565c05..330e1bb088bb9 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -580,6 +580,7 @@ int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
void mana_query_phy_stats(struct mana_port_context *apc);
int mana_pre_alloc_rxbufs(struct mana_port_context *apc, int mtu, int num_queues);
void mana_pre_dealloc_rxbufs(struct mana_port_context *apc);
+void mana_unmap_skb(struct sk_buff *skb, struct mana_port_context *apc);
extern const struct ethtool_ops mana_ethtool_ops;
extern struct dentry *mana_debugfs_root;
--
2.51.0
^ permalink raw reply related [flat|nested] only message in thread