* [PATCH net-next v11 0/2] net: mana: add ethtool private flag for full-page RX buffers
@ 2026-07-01 14:15 Dipayaan Roy
2026-07-01 14:15 ` [PATCH net-next v11 1/2] net: mana: refactor mana_get_strings() and mana_get_sset_count() to use switch Dipayaan Roy
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Dipayaan Roy @ 2026-07-01 14:15 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
pavan.chebbi
On some ARM64 platforms with 4K PAGE_SIZE, utilizing page_pool
fragments for allocation in the RX refill path (~2kB buffer per fragment)
causes 15-20% throughput regression under high connection counts
(>16 TCP streams at 180+ Gbps). Using full-page buffers on these
platforms shows no regression and restores line-rate performance.
This behavior is observed on a single platform; other platforms
perform better with page_pool fragments, indicating this is not a
page_pool issue but platform-specific.
This series adds an ethtool private flag "full-page-rx" to let the
user opt in to one RX buffer per page:
ethtool --set-priv-flags eth0 full-page-rx on
There is no behavioral change by default. The flag can be persisted
via udev rule for affected platforms.
This series depends on the following fixes now merged in net-next:
commit 17bfe0a8c014 ("net: mana: Add NULL guards in teardown path to prevent panic on attach failure")
commit 5b05aa36ee24 ("net: mana: Skip redundant detach on already-detached port")
Changes in v11:
- Rebased on net-next
Changes in v10:
- Rebased on net-next which now includes the prerequisite fixes.
- Recovery logic in mana_set_priv_flags() leverages the idempotent
mana_detach() from the merged fixes.
Changes in v9:
- Added correct tree.
Changes in v8:
- Fixed queue_reset_work recovery by restoring port_is_up before
scheduling reset so the handler can properly re-attach.
- Simplified "err && schedule_port_reset" to "schedule_port_reset".
Changes in v7:
- Rebased onto net-next.
- Retained private flag approach after David Wei's testing on
Grace (ARM64) confirmed that fragment mode outperforms
full-page mode on other platforms, validating this is a
single-platform workaround rather than a generic issue.
Changes in v6:
- Added missed maintainers.
Changes in v5:
- Split prep refactor into separate patch (patch 1/2)
Changes in v4:
- Dropping the smbios string parsing and add ethtool priv flag
to reconfigure the queues with full page rx buffers.
Changes in v3:
- changed u8* to char*
Changes in v2:
- separate reading string index and the string, remove inline.
Dipayaan Roy (2):
net: mana: refactor mana_get_strings() and mana_get_sset_count() to
use switch
net: mana: force full-page RX buffers via ethtool private flag
drivers/net/ethernet/microsoft/mana/mana_en.c | 22 ++-
.../ethernet/microsoft/mana/mana_ethtool.c | 178 +++++++++++++++---
include/net/mana/mana.h | 8 +
3 files changed, 177 insertions(+), 31 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH net-next v11 1/2] net: mana: refactor mana_get_strings() and mana_get_sset_count() to use switch
2026-07-01 14:15 [PATCH net-next v11 0/2] net: mana: add ethtool private flag for full-page RX buffers Dipayaan Roy
@ 2026-07-01 14:15 ` Dipayaan Roy
2026-07-01 14:15 ` [PATCH net-next v11 2/2] net: mana: force full-page RX buffers via ethtool private flag Dipayaan Roy
2026-07-01 14:45 ` [PATCH net-next v11 0/2] net: mana: add ethtool private flag for full-page RX buffers Maciej Fijalkowski
2 siblings, 0 replies; 5+ messages in thread
From: Dipayaan Roy @ 2026-07-01 14:15 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
pavan.chebbi
Refactor mana_get_strings() and mana_get_sset_count() from if/else to
switch statements in preparation for adding ethtool private flags
support which requires handling ETH_SS_PRIV_FLAGS.
No functional change.
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
---
.../ethernet/microsoft/mana/mana_ethtool.c | 75 ++++++++++++-------
1 file changed, 46 insertions(+), 29 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index 94e658d07a27..fa9c49592828 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -138,53 +138,70 @@ static int mana_get_sset_count(struct net_device *ndev, int stringset)
struct mana_port_context *apc = netdev_priv(ndev);
unsigned int num_queues = apc->num_queues;
- if (stringset != ETH_SS_STATS)
+ switch (stringset) {
+ case ETH_SS_STATS:
+ return ARRAY_SIZE(mana_eth_stats) +
+ ARRAY_SIZE(mana_phy_stats) +
+ ARRAY_SIZE(mana_hc_stats) +
+ num_queues * (MANA_STATS_RX_COUNT + MANA_STATS_TX_COUNT);
+ default:
return -EINVAL;
-
- return ARRAY_SIZE(mana_eth_stats) + ARRAY_SIZE(mana_phy_stats) + ARRAY_SIZE(mana_hc_stats) +
- num_queues * (MANA_STATS_RX_COUNT + MANA_STATS_TX_COUNT);
+ }
}
-static void mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
+static void mana_get_strings_stats(struct mana_port_context *apc, u8 **data)
{
- struct mana_port_context *apc = netdev_priv(ndev);
unsigned int num_queues = apc->num_queues;
int i, j;
- if (stringset != ETH_SS_STATS)
- return;
for (i = 0; i < ARRAY_SIZE(mana_eth_stats); i++)
- ethtool_puts(&data, mana_eth_stats[i].name);
+ ethtool_puts(data, mana_eth_stats[i].name);
for (i = 0; i < ARRAY_SIZE(mana_hc_stats); i++)
- ethtool_puts(&data, mana_hc_stats[i].name);
+ ethtool_puts(data, mana_hc_stats[i].name);
for (i = 0; i < ARRAY_SIZE(mana_phy_stats); i++)
- ethtool_puts(&data, mana_phy_stats[i].name);
+ ethtool_puts(data, mana_phy_stats[i].name);
for (i = 0; i < num_queues; i++) {
- ethtool_sprintf(&data, "rx_%d_packets", i);
- ethtool_sprintf(&data, "rx_%d_bytes", i);
- ethtool_sprintf(&data, "rx_%d_xdp_drop", i);
- ethtool_sprintf(&data, "rx_%d_xdp_tx", i);
- ethtool_sprintf(&data, "rx_%d_xdp_redirect", i);
- ethtool_sprintf(&data, "rx_%d_pkt_len0_err", i);
+ ethtool_sprintf(data, "rx_%d_packets", i);
+ ethtool_sprintf(data, "rx_%d_bytes", i);
+ ethtool_sprintf(data, "rx_%d_xdp_drop", i);
+ ethtool_sprintf(data, "rx_%d_xdp_tx", i);
+ ethtool_sprintf(data, "rx_%d_xdp_redirect", i);
+ ethtool_sprintf(data, "rx_%d_pkt_len0_err", i);
for (j = 0; j < MANA_RXCOMP_OOB_NUM_PPI - 1; j++)
- ethtool_sprintf(&data, "rx_%d_coalesced_cqe_%d", i, j + 2);
+ ethtool_sprintf(data,
+ "rx_%d_coalesced_cqe_%d",
+ i,
+ j + 2);
}
for (i = 0; i < num_queues; i++) {
- ethtool_sprintf(&data, "tx_%d_packets", i);
- ethtool_sprintf(&data, "tx_%d_bytes", i);
- ethtool_sprintf(&data, "tx_%d_xdp_xmit", i);
- ethtool_sprintf(&data, "tx_%d_tso_packets", i);
- ethtool_sprintf(&data, "tx_%d_tso_bytes", i);
- ethtool_sprintf(&data, "tx_%d_tso_inner_packets", i);
- ethtool_sprintf(&data, "tx_%d_tso_inner_bytes", i);
- ethtool_sprintf(&data, "tx_%d_long_pkt_fmt", i);
- ethtool_sprintf(&data, "tx_%d_short_pkt_fmt", i);
- ethtool_sprintf(&data, "tx_%d_csum_partial", i);
- ethtool_sprintf(&data, "tx_%d_mana_map_err", i);
+ ethtool_sprintf(data, "tx_%d_packets", i);
+ ethtool_sprintf(data, "tx_%d_bytes", i);
+ ethtool_sprintf(data, "tx_%d_xdp_xmit", i);
+ ethtool_sprintf(data, "tx_%d_tso_packets", i);
+ ethtool_sprintf(data, "tx_%d_tso_bytes", i);
+ ethtool_sprintf(data, "tx_%d_tso_inner_packets", i);
+ ethtool_sprintf(data, "tx_%d_tso_inner_bytes", i);
+ ethtool_sprintf(data, "tx_%d_long_pkt_fmt", i);
+ ethtool_sprintf(data, "tx_%d_short_pkt_fmt", i);
+ ethtool_sprintf(data, "tx_%d_csum_partial", i);
+ ethtool_sprintf(data, "tx_%d_mana_map_err", i);
+ }
+}
+
+static void mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
+{
+ struct mana_port_context *apc = netdev_priv(ndev);
+
+ switch (stringset) {
+ case ETH_SS_STATS:
+ mana_get_strings_stats(apc, &data);
+ break;
+ default:
+ break;
}
}
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH net-next v11 2/2] net: mana: force full-page RX buffers via ethtool private flag
2026-07-01 14:15 [PATCH net-next v11 0/2] net: mana: add ethtool private flag for full-page RX buffers Dipayaan Roy
2026-07-01 14:15 ` [PATCH net-next v11 1/2] net: mana: refactor mana_get_strings() and mana_get_sset_count() to use switch Dipayaan Roy
@ 2026-07-01 14:15 ` Dipayaan Roy
2026-07-02 14:18 ` sashiko-bot
2026-07-01 14:45 ` [PATCH net-next v11 0/2] net: mana: add ethtool private flag for full-page RX buffers Maciej Fijalkowski
2 siblings, 1 reply; 5+ messages in thread
From: Dipayaan Roy @ 2026-07-01 14:15 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
pavan.chebbi
On some ARM64 platforms with 4K PAGE_SIZE, page_pool fragment
allocation in the RX refill path can cause 15-20% throughput
regression under high connection counts (>16 TCP streams).
Add an ethtool private flag "full-page-rx" that allows the user to
force one RX buffer per page, bypassing the page_pool fragment path.
This restores line-rate (180+ Gbps) performance on affected platforms.
Usage:
ethtool --set-priv-flags eth0 full-page-rx on
There is no behavioral change by default. The flag must be explicitly
enabled by the user or udev rule.
The existing single-buffer-per-page logic for XDP and jumbo frames is
consolidated into a new helper mana_use_single_rxbuf_per_page() which
is now the single decision point for both the automatic and
user-controlled paths.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 22 +++-
.../ethernet/microsoft/mana/mana_ethtool.c | 103 ++++++++++++++++++
include/net/mana/mana.h | 8 ++
3 files changed, 131 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 26aef21c6c2c..4bd83c782ea3 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -755,6 +755,25 @@ static void *mana_get_rxbuf_pre(struct mana_rxq *rxq, dma_addr_t *da)
return va;
}
+static bool
+mana_use_single_rxbuf_per_page(struct mana_port_context *apc, u32 mtu)
+{
+ /* On some platforms with 4K PAGE_SIZE, page_pool fragment allocation
+ * in the RX refill path (~2kB buffer) can cause significant throughput
+ * regression under high connection counts. Allow user to force one RX
+ * buffer per page via ethtool private flag to bypass the fragment
+ * path.
+ */
+ if (apc->priv_flags & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF))
+ return true;
+
+ /* For xdp and jumbo frames make sure only one packet fits per page. */
+ if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 || mana_xdp_get(apc))
+ return true;
+
+ return false;
+}
+
/* Get RX buffer's data size, alloc size, XDP headroom based on MTU */
static void mana_get_rxbuf_cfg(struct mana_port_context *apc,
int mtu, u32 *datasize, u32 *alloc_size,
@@ -765,8 +784,7 @@ static void mana_get_rxbuf_cfg(struct mana_port_context *apc,
/* Calculate datasize first (consistent across all cases) */
*datasize = mtu + ETH_HLEN;
- /* For xdp and jumbo frames make sure only one packet fits per page */
- if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 || mana_xdp_get(apc)) {
+ if (mana_use_single_rxbuf_per_page(apc, mtu)) {
if (mana_xdp_get(apc)) {
*headroom = XDP_PACKET_HEADROOM;
*alloc_size = PAGE_SIZE;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index fa9c49592828..3c498a222965 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -133,6 +133,10 @@ static const struct mana_stats_desc mana_phy_stats[] = {
{ "hc_tc7_tx_pause_phy", offsetof(struct mana_ethtool_phy_stats, tx_pause_tc7_phy) },
};
+static const char mana_priv_flags[MANA_PRIV_FLAG_MAX][ETH_GSTRING_LEN] = {
+ [MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF] = "full-page-rx"
+};
+
static int mana_get_sset_count(struct net_device *ndev, int stringset)
{
struct mana_port_context *apc = netdev_priv(ndev);
@@ -144,6 +148,10 @@ static int mana_get_sset_count(struct net_device *ndev, int stringset)
ARRAY_SIZE(mana_phy_stats) +
ARRAY_SIZE(mana_hc_stats) +
num_queues * (MANA_STATS_RX_COUNT + MANA_STATS_TX_COUNT);
+
+ case ETH_SS_PRIV_FLAGS:
+ return MANA_PRIV_FLAG_MAX;
+
default:
return -EINVAL;
}
@@ -192,6 +200,14 @@ static void mana_get_strings_stats(struct mana_port_context *apc, u8 **data)
}
}
+static void mana_get_strings_priv_flags(u8 **data)
+{
+ int i;
+
+ for (i = 0; i < MANA_PRIV_FLAG_MAX; i++)
+ ethtool_puts(data, mana_priv_flags[i]);
+}
+
static void mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
{
struct mana_port_context *apc = netdev_priv(ndev);
@@ -200,6 +216,9 @@ static void mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
case ETH_SS_STATS:
mana_get_strings_stats(apc, &data);
break;
+ case ETH_SS_PRIV_FLAGS:
+ mana_get_strings_priv_flags(&data);
+ break;
default:
break;
}
@@ -611,6 +630,88 @@ static int mana_get_link_ksettings(struct net_device *ndev,
return 0;
}
+static u32 mana_get_priv_flags(struct net_device *ndev)
+{
+ struct mana_port_context *apc = netdev_priv(ndev);
+
+ return apc->priv_flags;
+}
+
+static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
+{
+ struct mana_port_context *apc = netdev_priv(ndev);
+ u32 changed = apc->priv_flags ^ priv_flags;
+ u32 old_priv_flags = apc->priv_flags;
+ bool schedule_port_reset = false;
+ int err = 0;
+
+ if (!changed)
+ return 0;
+
+ /* Reject unknown bits */
+ if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
+ return -EINVAL;
+
+ if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
+ apc->priv_flags = priv_flags;
+
+ if (!apc->port_is_up) {
+ /* Port is down, flag updated to apply on next up
+ * so just return.
+ */
+ return 0;
+ }
+
+ /* Pre-allocate buffers to prevent failure in mana_attach
+ * later
+ */
+ err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
+ if (err) {
+ netdev_err(ndev,
+ "Insufficient memory for new allocations\n");
+ apc->priv_flags = old_priv_flags;
+ return err;
+ }
+
+ err = mana_detach(ndev, false);
+ if (err) {
+ netdev_err(ndev, "mana_detach failed: %d\n", err);
+ apc->priv_flags = old_priv_flags;
+
+ /* Port is in an inconsistent state. Restore
+ * 'port_is_up' so that queue reset work handler
+ * can properly detach and re-attach.
+ */
+ apc->port_is_up = true;
+ schedule_port_reset = true;
+ goto out;
+ }
+
+ err = mana_attach(ndev);
+ if (err) {
+ netdev_err(ndev, "mana_attach failed: %d\n", err);
+ apc->priv_flags = old_priv_flags;
+
+ /* Restore 'port_is_up' so the reset work handler
+ * can properly detach/attach. Without this,
+ * the handler sees port_is_up=false and skips
+ * queue allocation, leaving the port dead.
+ */
+ apc->port_is_up = true;
+ schedule_port_reset = true;
+ }
+ }
+
+out:
+ mana_pre_dealloc_rxbufs(apc);
+
+ if (schedule_port_reset)
+ queue_work(apc->ac->per_port_queue_reset_wq,
+ &apc->queue_reset_work);
+
+ return err;
+}
+
const struct ethtool_ops mana_ethtool_ops = {
.supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES,
.op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
@@ -631,4 +732,6 @@ const struct ethtool_ops mana_ethtool_ops = {
.set_ringparam = mana_set_ringparam,
.get_link_ksettings = mana_get_link_ksettings,
.get_link = ethtool_op_get_link,
+ .get_priv_flags = mana_get_priv_flags,
+ .set_priv_flags = mana_set_priv_flags,
};
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 13c87baf018e..8dc496f05938 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -30,6 +30,12 @@ enum TRI_STATE {
TRI_STATE_TRUE = 1
};
+/* MANA ethtool private flag bit positions */
+enum mana_priv_flag_bits {
+ MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF = 0,
+ MANA_PRIV_FLAG_MAX,
+};
+
/* Number of entries for hardware indirection table must be in power of 2 */
#define MANA_INDIRECT_TABLE_MAX_SIZE 512
#define MANA_INDIRECT_TABLE_DEF_SIZE 64
@@ -532,6 +538,8 @@ struct mana_port_context {
u32 rxbpre_headroom;
u32 rxbpre_frag_count;
+ u32 priv_flags;
+
struct bpf_prog *bpf_prog;
/* Create num_queues EQs, SQs, SQ-CQs, RQs and RQ-CQs, respectively. */
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net-next v11 0/2] net: mana: add ethtool private flag for full-page RX buffers
2026-07-01 14:15 [PATCH net-next v11 0/2] net: mana: add ethtool private flag for full-page RX buffers Dipayaan Roy
2026-07-01 14:15 ` [PATCH net-next v11 1/2] net: mana: refactor mana_get_strings() and mana_get_sset_count() to use switch Dipayaan Roy
2026-07-01 14:15 ` [PATCH net-next v11 2/2] net: mana: force full-page RX buffers via ethtool private flag Dipayaan Roy
@ 2026-07-01 14:45 ` Maciej Fijalkowski
2 siblings, 0 replies; 5+ messages in thread
From: Maciej Fijalkowski @ 2026-07-01 14:45 UTC (permalink / raw)
To: Dipayaan Roy
Cc: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov,
pavan.chebbi
On Wed, Jul 01, 2026 at 07:15:44AM -0700, Dipayaan Roy wrote:
> On some ARM64 platforms with 4K PAGE_SIZE, utilizing page_pool
> fragments for allocation in the RX refill path (~2kB buffer per fragment)
> causes 15-20% throughput regression under high connection counts
> (>16 TCP streams at 180+ Gbps). Using full-page buffers on these
> platforms shows no regression and restores line-rate performance.
>
> This behavior is observed on a single platform; other platforms
> perform better with page_pool fragments, indicating this is not a
> page_pool issue but platform-specific.
>
> This series adds an ethtool private flag "full-page-rx" to let the
> user opt in to one RX buffer per page:
>
> ethtool --set-priv-flags eth0 full-page-rx on
>
> There is no behavioral change by default. The flag can be persisted
> via udev rule for affected platforms.
Were you able to track down what is the actual bottleneck on the 'broken'
platform? What is the performance of full-page approach on healthy
platforms? On changelog below you mention the frag approach 'outperforms'
the full-page one.
>
> This series depends on the following fixes now merged in net-next:
> commit 17bfe0a8c014 ("net: mana: Add NULL guards in teardown path to prevent panic on attach failure")
> commit 5b05aa36ee24 ("net: mana: Skip redundant detach on already-detached port")
>
> Changes in v11:
> - Rebased on net-next
> Changes in v10:
> - Rebased on net-next which now includes the prerequisite fixes.
> - Recovery logic in mana_set_priv_flags() leverages the idempotent
> mana_detach() from the merged fixes.
> Changes in v9:
> - Added correct tree.
> Changes in v8:
> - Fixed queue_reset_work recovery by restoring port_is_up before
> scheduling reset so the handler can properly re-attach.
> - Simplified "err && schedule_port_reset" to "schedule_port_reset".
> Changes in v7:
> - Rebased onto net-next.
> - Retained private flag approach after David Wei's testing on
> Grace (ARM64) confirmed that fragment mode outperforms
> full-page mode on other platforms, validating this is a
> single-platform workaround rather than a generic issue.
> Changes in v6:
> - Added missed maintainers.
> Changes in v5:
> - Split prep refactor into separate patch (patch 1/2)
> Changes in v4:
> - Dropping the smbios string parsing and add ethtool priv flag
> to reconfigure the queues with full page rx buffers.
> Changes in v3:
> - changed u8* to char*
> Changes in v2:
> - separate reading string index and the string, remove inline.
>
> Dipayaan Roy (2):
> net: mana: refactor mana_get_strings() and mana_get_sset_count() to
> use switch
> net: mana: force full-page RX buffers via ethtool private flag
>
> drivers/net/ethernet/microsoft/mana/mana_en.c | 22 ++-
> .../ethernet/microsoft/mana/mana_ethtool.c | 178 +++++++++++++++---
> include/net/mana/mana.h | 8 +
> 3 files changed, 177 insertions(+), 31 deletions(-)
>
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next v11 2/2] net: mana: force full-page RX buffers via ethtool private flag
2026-07-01 14:15 ` [PATCH net-next v11 2/2] net: mana: force full-page RX buffers via ethtool private flag Dipayaan Roy
@ 2026-07-02 14:18 ` sashiko-bot
0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-07-02 14:18 UTC (permalink / raw)
To: Dipayaan Roy; +Cc: bpf, linux-hyperv
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
New issues:
- [High] Forcing `port_is_up = true` in the `mana_attach()` error path leaves the port state inconsistent, leading to a double-teardown of hardware queues and a NULL pointer dereference in stats gathering.
- [High] The device restart sequence in `mana_set_priv_flags()` lacks `vport_mutex` synchronization and does not set `apc->channel_changing`, creating a race condition with concurrent RDMA operations.
Pre-existing issues:
- [High] Bulk pre-allocation of RX buffers uses GFP_ATOMIC and emergency memory reserves in a process context (ethtool), which can lead to memory depletion and spurious failures.
- [High] Pre-allocated RX buffers with `from_pool = false` are leaked when dropped in a queue configured with `frag_count > 1`.
--
commit 40ed2f3da9bc06c10116aa35575da8c740d52002
Author: Dipayaan Roy <dipayanroy@linux.microsoft.com>
net: mana: force full-page RX buffers via ethtool private flag
This patch adds an ethtool private flag "full-page-rx" to force one RX
buffer per page on ARM64 platforms. The single-buffer-per-page logic is
consolidated into a new helper function. The driver is detached and
reattached when the flag is modified to apply the new buffer configurations.
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index 3531abfdaf433..24f6ce63b031b 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
[ ... ]
> @@ -611,6 +630,88 @@ static int mana_get_link_ksettings(struct net_device *ndev,
> return 0;
> }
>
> +static u32 mana_get_priv_flags(struct net_device *ndev)
> +{
> + struct mana_port_context *apc = netdev_priv(ndev);
> +
> + return apc->priv_flags;
> +}
> +
> +static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> +{
> + struct mana_port_context *apc = netdev_priv(ndev);
> + u32 changed = apc->priv_flags ^ priv_flags;
> + u32 old_priv_flags = apc->priv_flags;
> + bool schedule_port_reset = false;
> + int err = 0;
> +
> + if (!changed)
> + return 0;
> +
> + /* Reject unknown bits */
> + if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
> + return -EINVAL;
> +
> + if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
> + apc->priv_flags = priv_flags;
> +
> + if (!apc->port_is_up) {
> + /* Port is down, flag updated to apply on next up
> + * so just return.
> + */
> + return 0;
> + }
> +
> + /* Pre-allocate buffers to prevent failure in mana_attach
> + * later
> + */
> + err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
[Severity: High]
This is a pre-existing issue, but does calling mana_pre_alloc_rxbufs() from
ethtool context cause memory allocation problems?
mana_pre_alloc_rxbufs() calls dev_alloc_pages() in a bulk loop, which uses
GFP_ATOMIC and emergency memory reserves. Since this is now triggered from
process context via ethtool, could this deplete system emergency reserves
intended for atomic receive paths, leading to spurious failures or network
drops under memory pressure?
[Severity: High]
This isn't a bug introduced by this patch, but does toggling this flag
dynamically create conditions that permanently leak pre-allocated buffers?
When a packet is dropped, mana_rx_skb() checks if from_pool is false (which
it is for pre-allocated buffers) and unconditionally saves the buffer to
rxq->xdp_save_va. However, mana_get_rxfrag() only reuses xdp_save_va if
rxq->frag_count == 1.
If this flag causes the queue to operate with frag_count > 1 after buffers
are pre-allocated (e.g. by disabling the flag), xdp_save_va is ignored.
Subsequent drops will trigger WARN_ON_ONCE(rxq->xdp_save_va) and permanently
leak the old buffer.
> + if (err) {
> + netdev_err(ndev,
> + "Insufficient memory for new allocations\n");
> + apc->priv_flags = old_priv_flags;
> + return err;
> + }
> +
> + err = mana_detach(ndev, false);
[Severity: High]
Does this device restart sequence lack vport_mutex synchronization with
concurrent RDMA operations?
In mana_set_channels(), detach and re-attach is protected by asserting
apc->channel_changing under vport_mutex. Here, mana_detach() tears down
the vport (dropping vport_use_count to 0) without setting
apc->channel_changing.
Can an RDMA process concurrently call mana_cfg_vport(), observe a zero use
count, and successfully claim the vport before mana_attach() blindly
attempts to configure the exact same vport, resulting in conflicting
hardware programming?
> + if (err) {
> + netdev_err(ndev, "mana_detach failed: %d\n", err);
> + apc->priv_flags = old_priv_flags;
> +
> + /* Port is in an inconsistent state. Restore
> + * 'port_is_up' so that queue reset work handler
> + * can properly detach and re-attach.
> + */
> + apc->port_is_up = true;
> + schedule_port_reset = true;
> + goto out;
> + }
> +
> + err = mana_attach(ndev);
> + if (err) {
> + netdev_err(ndev, "mana_attach failed: %d\n", err);
> + apc->priv_flags = old_priv_flags;
> +
> + /* Restore 'port_is_up' so the reset work handler
> + * can properly detach/attach. Without this,
> + * the handler sees port_is_up=false and skips
> + * queue allocation, leaving the port dead.
> + */
> + apc->port_is_up = true;
[Severity: High]
Does forcing port_is_up to true in this error path lead to a NULL pointer
dereference in stats gathering?
The internal failure path of mana_attach() calls
mana_cleanup_port_context(), which frees apc->rxqs and sets it to NULL.
By forcing apc->port_is_up = true here, if userspace queries stats via
ethtool -S or ip -s link, mana_get_stats64() and mana_get_ethtool_stats()
will bypass their safety checks and blindly dereference the NULL
apc->rxqs[q] array.
> + schedule_port_reset = true;
> + }
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260701141808.461554-1-dipayanroy@linux.microsoft.com?part=2
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-07-02 14:18 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-07-01 14:15 [PATCH net-next v11 0/2] net: mana: add ethtool private flag for full-page RX buffers Dipayaan Roy
2026-07-01 14:15 ` [PATCH net-next v11 1/2] net: mana: refactor mana_get_strings() and mana_get_sset_count() to use switch Dipayaan Roy
2026-07-01 14:15 ` [PATCH net-next v11 2/2] net: mana: force full-page RX buffers via ethtool private flag Dipayaan Roy
2026-07-02 14:18 ` sashiko-bot
2026-07-01 14:45 ` [PATCH net-next v11 0/2] net: mana: add ethtool private flag for full-page RX buffers Maciej Fijalkowski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox