All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Aditya Garg <gargaditya@linux.microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Jakub Kicinski <kuba@kernel.org>, Sasha Levin <sashal@kernel.org>,
	kys@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
	longli@microsoft.com, kotaranov@microsoft.com, leon@kernel.org,
	shradhagupta@linux.microsoft.com, mlevitsk@redhat.com,
	ernis@linux.microsoft.com, yury.norov@gmail.com,
	ssengar@linux.microsoft.com, dipayanroy@linux.microsoft.com,
	shirazsaleem@microsoft.com, linux-hyperv@vger.kernel.org,
	linux-rdma@vger.kernel.org
Subject: [PATCH AUTOSEL 6.18-6.12] net: mana: Drop TX skb on post_work_request failure and unmap resources
Date: Mon,  8 Dec 2025 19:15:21 -0500	[thread overview]
Message-ID: <20251209001610.611575-29-sashal@kernel.org> (raw)
In-Reply-To: <20251209001610.611575-1-sashal@kernel.org>

From: Aditya Garg <gargaditya@linux.microsoft.com>

[ Upstream commit 45120304e84171fd215c1b57b15b285446d15106 ]

Drop TX packets when posting the work request fails and ensure DMA
mappings are always cleaned up.

Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763464269-10431-3-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Looking at this commit, I need to analyze it for stable backporting
suitability.

## Commit Analysis

### What the commit does:

1. **In `mana_en.c`**: When `mana_gd_post_work_request()` fails during
   TX:
   - Previously: returned `NETDEV_TX_BUSY` without cleaning up DMA
     mappings
   - Now: calls `mana_unmap_skb()` to clean up DMA mappings, then drops
     the packet properly

2. **In `gdma_main.c`**: Removes a noisy `dev_err` message for flow
   control failure

3. **In `mana.h`**: Exports `mana_unmap_skb()` function for use across
   files

### Bug Being Fixed

Looking at the original error path:
```c
if (err) {
    (void)skb_dequeue_tail(&txq->pending_skbs);
    netdev_warn(ndev, "Failed to post TX OOB: %d\n", err);
    err = NETDEV_TX_BUSY;
    goto tx_busy;  // DMA mappings NOT unmapped!
}
```

The original code has **two bugs**:
1. **DMA resource leak**: When `mana_gd_post_work_request()` fails, the
   DMA mappings set up earlier (via `mana_map_skb()`) are never freed
2. **Improper error handling**: Returns `NETDEV_TX_BUSY` which tells the
   network stack to retry, but the skb was already dequeued from
   `pending_skbs`, leading to inconsistent state

### Code Change Assessment

The fix is straightforward and surgical:
```c
if (err) {
    (void)skb_dequeue_tail(&txq->pending_skbs);
    mana_unmap_skb(skb, apc);  // NEW: Clean up DMA mappings
    netdev_warn(ndev, "Failed to post TX OOB: %d\n", err);
    goto free_sgl_ptr;  // Drop packet properly
}
```

### Stable Criteria Evaluation

| Criterion | Assessment |
|-----------|------------|
| Fixes real bug | ✅ Yes - DMA resource leak and improper error handling
|
| Obviously correct | ✅ Yes - unmapping DMA on error is clearly correct
|
| Small and contained | ✅ Yes - only affects error path in MANA driver |
| No new features | ✅ Yes - pure bug fix |
| User impact | ✅ Medium-high - affects Azure VM users; leaks can cause
long-term instability |

### Risk Assessment

- **Low risk**: Changes only affect the TX error path
- **Self-contained**: The `mana_unmap_skb()` function already exists,
  just made non-static
- **No dependencies**: This commit is standalone
- **Well-tested**: Has "Reviewed-by" tag from Microsoft maintainer

### Missing Signals

- No `Cc: stable@vger.kernel.org` tag
- No `Fixes:` tag

However, this is clearly fixing a resource leak bug that has existed
since the error path was written. Resource leaks in network drivers are
exactly the kind of bugs stable trees should fix.

### User Impact

The MANA driver is used in Microsoft Azure VMs. DMA mapping leaks:
- Cause memory pressure over time
- Can lead to allocation failures under sustained TX errors
- May cause VM instability in production cloud environments

This is a production-relevant bug affecting real cloud users.

### Conclusion

This commit fixes a genuine resource leak bug in the MANA network
driver. The fix is:
- Small (adds one function call + declaration)
- Surgical (only affects error path)
- Obviously correct (unmapping DMA on error is mandatory)
- Low risk (doesn't touch normal TX path)
- Important for production Azure VM users

The lack of stable/Fixes tags is unfortunate but doesn't diminish the
clear bug-fix nature of this change.

**YES**

 drivers/net/ethernet/microsoft/mana/gdma_main.c | 6 +-----
 drivers/net/ethernet/microsoft/mana/mana_en.c   | 7 +++----
 include/net/mana/mana.h                         | 1 +
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 43f034e180c41..7d232ac250958 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1299,7 +1299,6 @@ int mana_gd_post_work_request(struct gdma_queue *wq,
 			      struct gdma_posted_wqe_info *wqe_info)
 {
 	u32 client_oob_size = wqe_req->inline_oob_size;
-	struct gdma_context *gc;
 	u32 sgl_data_size;
 	u32 max_wqe_size;
 	u32 wqe_size;
@@ -1329,11 +1328,8 @@ int mana_gd_post_work_request(struct gdma_queue *wq,
 	if (wqe_size > max_wqe_size)
 		return -EINVAL;
 
-	if (wq->monitor_avl_buf && wqe_size > mana_gd_wq_avail_space(wq)) {
-		gc = wq->gdma_dev->gdma_context;
-		dev_err(gc->dev, "unsuccessful flow control!\n");
+	if (wq->monitor_avl_buf && wqe_size > mana_gd_wq_avail_space(wq))
 		return -ENOSPC;
-	}
 
 	if (wqe_info)
 		wqe_info->wqe_size_in_bu = wqe_size / GDMA_WQE_BU_SIZE;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 0142fd98392c2..6d37f39930453 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -438,9 +438,9 @@ netdev_tx_t mana_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 
 	if (err) {
 		(void)skb_dequeue_tail(&txq->pending_skbs);
+		mana_unmap_skb(skb, apc);
 		netdev_warn(ndev, "Failed to post TX OOB: %d\n", err);
-		err = NETDEV_TX_BUSY;
-		goto tx_busy;
+		goto free_sgl_ptr;
 	}
 
 	err = NETDEV_TX_OK;
@@ -460,7 +460,6 @@ netdev_tx_t mana_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	tx_stats->bytes += len + ((num_gso_seg - 1) * gso_hs);
 	u64_stats_update_end(&tx_stats->syncp);
 
-tx_busy:
 	if (netif_tx_queue_stopped(net_txq) && mana_can_tx(gdma_sq)) {
 		netif_tx_wake_queue(net_txq);
 		apc->eth_stats.wake_queue++;
@@ -1606,7 +1605,7 @@ static int mana_move_wq_tail(struct gdma_queue *wq, u32 num_units)
 	return 0;
 }
 
-static void mana_unmap_skb(struct sk_buff *skb, struct mana_port_context *apc)
+void mana_unmap_skb(struct sk_buff *skb, struct mana_port_context *apc)
 {
 	struct mana_skb_head *ash = (struct mana_skb_head *)skb->head;
 	struct gdma_context *gc = apc->ac->gdma_dev->gdma_context;
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 0921485565c05..330e1bb088bb9 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -580,6 +580,7 @@ int mana_set_bw_clamp(struct mana_port_context *apc, u32 speed,
 void mana_query_phy_stats(struct mana_port_context *apc);
 int mana_pre_alloc_rxbufs(struct mana_port_context *apc, int mtu, int num_queues);
 void mana_pre_dealloc_rxbufs(struct mana_port_context *apc);
+void mana_unmap_skb(struct sk_buff *skb, struct mana_port_context *apc);
 
 extern const struct ethtool_ops mana_ethtool_ops;
 extern struct dentry *mana_debugfs_root;
-- 
2.51.0


  parent reply	other threads:[~2025-12-09  0:17 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-09  0:14 [PATCH AUTOSEL 6.18-6.1] ksmbd: fix use-after-free in ksmbd_tree_connect_put under concurrency Sasha Levin
2025-12-09  0:14 ` [PATCH AUTOSEL 6.18-6.17] wifi: rtw89: use skb_dequeue() for queued ROC packets to prevent racing Sasha Levin
2025-12-09  0:14 ` [PATCH AUTOSEL 6.18-6.6] ipv6: clean up routes when manually removing address with a lifetime Sasha Levin
2025-12-09  0:14 ` [PATCH AUTOSEL 6.18-5.10] ext4: remove page offset calculation in ext4_block_zero_page_range() Sasha Levin
2025-12-09  0:14 ` [PATCH AUTOSEL 6.18-6.6] fs/ntfs3: fix KMSAN uninit-value in ni_create_attr_list Sasha Levin
2025-12-09  0:14 ` [PATCH AUTOSEL 6.18-6.6] btrfs: abort transaction on item count overflow in __push_leaf_left() Sasha Levin
2025-12-09  0:14 ` [PATCH AUTOSEL 6.18-6.1] smb/server: fix return value of smb2_ioctl() Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.1] gfs2: Fix use of bio_chain Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-5.10] Bluetooth: btusb: Add new VID/PID 13d3/3533 for RTL8821CE Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] wifi: mac80211: reset CRC valid after CSA Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] Bluetooth: btusb: Add new VID/PID 0x0489/0xE12F for RTL8852BE-VT Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-5.10] wifi: mt76: mmio_*_copy fix byte order and alignment Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-5.10] btrfs: scrub: always update btrfs_scrub_progress::last_physical Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] bpf: Skip bounds adjustment for conditional jumps on same scalar register Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] wifi: rtl8xxxu: Fix HT40 channel config for RTL8192CU, RTL8723AU Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] Bluetooth: btusb: MT7920: Add VID/PID 0489/e135 Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] Bluetooth: btusb: MT7922: Add VID/PID 0489/e170 Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] virtio_blk: NULL out vqs to avoid double free on failed resume Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.1] kbuild: Use objtree for module signing key path Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.17] btrfs: use kvcalloc for btrfs_bio::csum allocation Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] net: sched: Don't use WARN_ON_ONCE() for -ENOMEM in tcf_classify() Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-5.10] hfsplus: Verify inode mode when loading from disk Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.6] gfs2: fix remote evict for read-only filesystems Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-5.10] net: amd-xgbe: use EOPNOTSUPP instead of ENOTSUPP in xgbe_phy_mii_read_c45 Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-5.10] net: init shinfo->gso_segs from qdisc_pkt_len_init() Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.17] Bluetooth: btusb: add new custom firmwares Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-5.10] hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] cxgb4: Rename sched_class to avoid type clash Sasha Levin
2025-12-09  0:15 ` Sasha Levin [this message]
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-5.10] hfsplus: fix volume corruption issue for generic/070 Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.17] wifi: rtw89: rtw8852bu: Added dev id for ASUS AX57 NANO USB Wifi dongle Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-5.10] net: restore napi_consume_skb()'s NULL-handling Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-5.15] fs/ntfs3: Support timestamps prior to epoch Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.1] smb/server: fix return value of smb2_query_dir() Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.17] wifi: rtw88: Add BUFFALO WI-U3-866DHP to the USB ID list Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.6] Bluetooth: btusb: Add new VID/PID 2b89/6275 for RTL8761BUV Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] bpf: Disable file_alloc_security hook Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.1] wifi: rtw89: phy: fix out-of-bounds access in rtw89_phy_read_txpwr_limit() Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.6] ntfs: set dummy blocksize to read boot_block when mounting Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-5.10] hfsplus: fix volume corruption issue for generic/073 Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] wifi: mt76: mt792x: fix wifi init fail by setting MCU_RUNNING after CLC load Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.12] gfs2: Fix "gfs2: Switch to wait_event in gfs2_quotad" Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.6] ksmbd: vfs: fix race on m_flags in vfs_cache Sasha Levin
2025-12-09  0:15 ` [PATCH AUTOSEL 6.18-6.1] wifi: rtw89: flush TX queue before deleting key Sasha Levin
2025-12-09  0:15 ` [Intel-wired-lan] [PATCH AUTOSEL 6.18-6.12] ice: Allow 100M speed for E825C SGMII device Sasha Levin
2025-12-09  0:15   ` Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251209001610.611575-29-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=decui@microsoft.com \
    --cc=dipayanroy@linux.microsoft.com \
    --cc=ernis@linux.microsoft.com \
    --cc=gargaditya@linux.microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=kotaranov@microsoft.com \
    --cc=kuba@kernel.org \
    --cc=kys@microsoft.com \
    --cc=leon@kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=mlevitsk@redhat.com \
    --cc=patches@lists.linux.dev \
    --cc=shirazsaleem@microsoft.com \
    --cc=shradhagupta@linux.microsoft.com \
    --cc=ssengar@linux.microsoft.com \
    --cc=stable@vger.kernel.org \
    --cc=wei.liu@kernel.org \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.