* [PATCH v8 0/2] net: mana: add ethtool private flag for full-page RX buffers
@ 2026-05-08 11:46 Dipayaan Roy
2026-05-08 11:46 ` [PATCH v8 1/2] net: mana: refactor mana_get_strings() and mana_get_sset_count() to use switch Dipayaan Roy
2026-05-08 11:46 ` [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag Dipayaan Roy
0 siblings, 2 replies; 5+ messages in thread
From: Dipayaan Roy @ 2026-05-08 11:46 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov

On some ARM64 platforms with 4K PAGE_SIZE, using page_pool fragments
for allocation in the RX refill path (~2kB buffer per fragment) causes
a 15-20% throughput regression under high connection counts (>16 TCP
streams at 180+ Gbps). Using full-page buffers on these platforms
shows no regression and restores line-rate performance.

This behavior is observed on a single platform; other platforms
perform better with page_pool fragments, indicating this is not a
page_pool issue but a platform-specific one.

This series adds an ethtool private flag "full-page-rx" to let the
user opt in to one RX buffer per page:

  ethtool --set-priv-flags eth0 full-page-rx on

There is no behavioral change by default. The flag can be persisted
via a udev rule on affected platforms.
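
For illustration, a udev rule along these lines could apply the flag
whenever the interface appears (an untested sketch: the KERNEL match
and the ethtool path are assumptions to adapt per system):

  # /etc/udev/rules.d/99-mana-full-page-rx.rules (illustrative)
  # Enable the private flag when the affected NIC is registered;
  # %k expands to the kernel interface name.
  ACTION=="add", SUBSYSTEM=="net", KERNEL=="eth0", \
      RUN+="/usr/sbin/ethtool --set-priv-flags %k full-page-rx on"

The current state can be checked with "ethtool --show-priv-flags eth0".
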
Changes in v8:
- Fixed queue_reset_work recovery by restoring port_is_up before
scheduling reset so the handler can properly re-attach.
- Simplified "err && schedule_port_reset" to "schedule_port_reset".
Changes in v7:
- Rebased onto net-next.
- Retained private flag approach after David Wei's testing on
Grace (ARM64) confirmed that fragment mode outperforms
full-page mode on other platforms, validating this is a
single-platform workaround rather than a generic issue.
Changes in v6:
- Added missed maintainers.
Changes in v5:
- Split prep refactor into separate patch (patch 1/2)
Changes in v4:
- Dropped the SMBIOS string parsing; added an ethtool priv flag to
reconfigure the queues with full-page RX buffers.
Changes in v3:
- Changed u8* to char*.
Changes in v2:
- Separated reading the string index and the string; removed inline.
Dipayaan Roy (2):
net: mana: refactor mana_get_strings() and mana_get_sset_count() to
use switch
net: mana: force full-page RX buffers via ethtool private flag
drivers/net/ethernet/microsoft/mana/mana_en.c | 22 ++-
.../ethernet/microsoft/mana/mana_ethtool.c | 178 +++++++++++++++---
include/net/mana/mana.h | 8 +
3 files changed, 177 insertions(+), 31 deletions(-)
--
2.43.0

* [PATCH v8 1/2] net: mana: refactor mana_get_strings() and mana_get_sset_count() to use switch
  2026-05-08 11:46 [PATCH v8 0/2] net: mana: add ethtool private flag for full-page RX buffers Dipayaan Roy
@ 2026-05-08 11:46 ` Dipayaan Roy
  2026-05-08 11:46 ` [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag Dipayaan Roy
  1 sibling, 0 replies; 5+ messages in thread
From: Dipayaan Roy @ 2026-05-08 11:46 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov

Refactor mana_get_strings() and mana_get_sset_count() from if/else to
switch statements in preparation for adding ethtool private flags
support which requires handling ETH_SS_PRIV_FLAGS.

No functional change.

Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
---
 .../ethernet/microsoft/mana/mana_ethtool.c    | 75 ++++++++++++-------
 1 file changed, 46 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index 04350973e19e..7e79681634db 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -138,53 +138,70 @@ static int mana_get_sset_count(struct net_device *ndev, int stringset)
 	struct mana_port_context *apc = netdev_priv(ndev);
 	unsigned int num_queues = apc->num_queues;
 
-	if (stringset != ETH_SS_STATS)
+	switch (stringset) {
+	case ETH_SS_STATS:
+		return ARRAY_SIZE(mana_eth_stats) +
+		       ARRAY_SIZE(mana_phy_stats) +
+		       ARRAY_SIZE(mana_hc_stats) +
+		       num_queues * (MANA_STATS_RX_COUNT + MANA_STATS_TX_COUNT);
+	default:
 		return -EINVAL;
-
-	return ARRAY_SIZE(mana_eth_stats) + ARRAY_SIZE(mana_phy_stats) + ARRAY_SIZE(mana_hc_stats) +
-		num_queues * (MANA_STATS_RX_COUNT + MANA_STATS_TX_COUNT);
+	}
 }
 
-static void mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
+static void mana_get_strings_stats(struct mana_port_context *apc, u8 **data)
 {
-	struct mana_port_context *apc = netdev_priv(ndev);
 	unsigned int num_queues = apc->num_queues;
 	int i, j;
 
-	if (stringset != ETH_SS_STATS)
-		return;
-
 	for (i = 0; i < ARRAY_SIZE(mana_eth_stats); i++)
-		ethtool_puts(&data, mana_eth_stats[i].name);
+		ethtool_puts(data, mana_eth_stats[i].name);
 
 	for (i = 0; i < ARRAY_SIZE(mana_hc_stats); i++)
-		ethtool_puts(&data, mana_hc_stats[i].name);
+		ethtool_puts(data, mana_hc_stats[i].name);
 
 	for (i = 0; i < ARRAY_SIZE(mana_phy_stats); i++)
-		ethtool_puts(&data, mana_phy_stats[i].name);
+		ethtool_puts(data, mana_phy_stats[i].name);
 
 	for (i = 0; i < num_queues; i++) {
-		ethtool_sprintf(&data, "rx_%d_packets", i);
-		ethtool_sprintf(&data, "rx_%d_bytes", i);
-		ethtool_sprintf(&data, "rx_%d_xdp_drop", i);
-		ethtool_sprintf(&data, "rx_%d_xdp_tx", i);
-		ethtool_sprintf(&data, "rx_%d_xdp_redirect", i);
-		ethtool_sprintf(&data, "rx_%d_pkt_len0_err", i);
+		ethtool_sprintf(data, "rx_%d_packets", i);
+		ethtool_sprintf(data, "rx_%d_bytes", i);
+		ethtool_sprintf(data, "rx_%d_xdp_drop", i);
+		ethtool_sprintf(data, "rx_%d_xdp_tx", i);
+		ethtool_sprintf(data, "rx_%d_xdp_redirect", i);
+		ethtool_sprintf(data, "rx_%d_pkt_len0_err", i);
 
 		for (j = 0; j < MANA_RXCOMP_OOB_NUM_PPI - 1; j++)
-			ethtool_sprintf(&data, "rx_%d_coalesced_cqe_%d", i, j + 2);
+			ethtool_sprintf(data,
+					"rx_%d_coalesced_cqe_%d",
+					i,
+					j + 2);
 	}
 
 	for (i = 0; i < num_queues; i++) {
-		ethtool_sprintf(&data, "tx_%d_packets", i);
-		ethtool_sprintf(&data, "tx_%d_bytes", i);
-		ethtool_sprintf(&data, "tx_%d_xdp_xmit", i);
-		ethtool_sprintf(&data, "tx_%d_tso_packets", i);
-		ethtool_sprintf(&data, "tx_%d_tso_bytes", i);
-		ethtool_sprintf(&data, "tx_%d_tso_inner_packets", i);
-		ethtool_sprintf(&data, "tx_%d_tso_inner_bytes", i);
-		ethtool_sprintf(&data, "tx_%d_long_pkt_fmt", i);
-		ethtool_sprintf(&data, "tx_%d_short_pkt_fmt", i);
-		ethtool_sprintf(&data, "tx_%d_csum_partial", i);
-		ethtool_sprintf(&data, "tx_%d_mana_map_err", i);
+		ethtool_sprintf(data, "tx_%d_packets", i);
+		ethtool_sprintf(data, "tx_%d_bytes", i);
+		ethtool_sprintf(data, "tx_%d_xdp_xmit", i);
+		ethtool_sprintf(data, "tx_%d_tso_packets", i);
+		ethtool_sprintf(data, "tx_%d_tso_bytes", i);
+		ethtool_sprintf(data, "tx_%d_tso_inner_packets", i);
+		ethtool_sprintf(data, "tx_%d_tso_inner_bytes", i);
+		ethtool_sprintf(data, "tx_%d_long_pkt_fmt", i);
+		ethtool_sprintf(data, "tx_%d_short_pkt_fmt", i);
+		ethtool_sprintf(data, "tx_%d_csum_partial", i);
+		ethtool_sprintf(data, "tx_%d_mana_map_err", i);
+	}
+}
+
+static void mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
+{
+	struct mana_port_context *apc = netdev_priv(ndev);
+
+	switch (stringset) {
+	case ETH_SS_STATS:
+		mana_get_strings_stats(apc, &data);
+		break;
+	default:
+		break;
 	}
 }
-- 
2.43.0
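
[Background note on the &data -> data change above: ethtool_puts()
advances the caller's string-table cursor, so a helper that fills
strings must take the cursor by reference (u8 **). A simplified model
of the cursor behavior, not the kernel's exact implementation:

	/* Copy a name into the current fixed-size ETH_GSTRING_LEN slot,
	 * then advance the cursor so consecutive calls fill consecutive
	 * slots of the stringset table.
	 */
	static void example_puts(u8 **data, const char *str)
	{
		strscpy((char *)*data, str, ETH_GSTRING_LEN);
		*data += ETH_GSTRING_LEN;
	}

Keeping the cursor by reference lets a later helper, such as the
priv-flags one added in patch 2/2, continue filling the table where
the previous helper stopped.]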

* [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag
  2026-05-08 11:46 [PATCH v8 0/2] net: mana: add ethtool private flag for full-page RX buffers Dipayaan Roy
  2026-05-08 11:46 ` [PATCH v8 1/2] net: mana: refactor mana_get_strings() and mana_get_sset_count() to use switch Dipayaan Roy
@ 2026-05-08 11:46 ` Dipayaan Roy
  2026-05-08 21:16   ` sashiko-bot
  2026-05-12  2:21   ` Jakub Kicinski
  1 sibling, 2 replies; 5+ messages in thread
From: Dipayaan Roy @ 2026-05-08 11:46 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, andrew+netdev, davem, edumazet,
	kuba, pabeni, leon, longli, kotaranov, horms, shradhagupta,
	ssengar, ernis, shirazsaleem, linux-hyperv, netdev, linux-kernel,
	linux-rdma, stephen, jacob.e.keller, dipayanroy, leitao, kees,
	john.fastabend, hawk, bpf, daniel, ast, sdf, yury.norov

On some ARM64 platforms with 4K PAGE_SIZE, page_pool fragment
allocation in the RX refill path can cause a 15-20% throughput
regression under high connection counts (>16 TCP streams).

Add an ethtool private flag "full-page-rx" that allows the user to
force one RX buffer per page, bypassing the page_pool fragment path.
This restores line-rate (180+ Gbps) performance on affected platforms.

Usage:

  ethtool --set-priv-flags eth0 full-page-rx on

There is no behavioral change by default. The flag must be explicitly
enabled by the user or a udev rule.

The existing single-buffer-per-page logic for XDP and jumbo frames is
consolidated into a new helper, mana_use_single_rxbuf_per_page(),
which is now the single decision point for both the automatic and
user-controlled paths.

Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/mana_en.c |  22 +++-
 .../ethernet/microsoft/mana/mana_ethtool.c    | 103 ++++++++++++++++++
 include/net/mana/mana.h                       |   8 ++
 3 files changed, 131 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 462a457e7d53..c4bc8bf19d75 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -744,6 +744,25 @@ static void *mana_get_rxbuf_pre(struct mana_rxq *rxq, dma_addr_t *da)
 	return va;
 }
 
+static bool
+mana_use_single_rxbuf_per_page(struct mana_port_context *apc, u32 mtu)
+{
+	/* On some platforms with 4K PAGE_SIZE, page_pool fragment allocation
+	 * in the RX refill path (~2kB buffer) can cause significant throughput
+	 * regression under high connection counts. Allow user to force one RX
+	 * buffer per page via ethtool private flag to bypass the fragment
+	 * path.
+	 */
+	if (apc->priv_flags & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF))
+		return true;
+
+	/* For xdp and jumbo frames make sure only one packet fits per page.
+	 */
+	if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 || mana_xdp_get(apc))
+		return true;
+
+	return false;
+}
+
 /* Get RX buffer's data size, alloc size, XDP headroom based on MTU */
 static void mana_get_rxbuf_cfg(struct mana_port_context *apc,
 			       int mtu, u32 *datasize, u32 *alloc_size,
@@ -754,8 +773,7 @@ static void mana_get_rxbuf_cfg(struct mana_port_context *apc,
 	/* Calculate datasize first (consistent across all cases) */
 	*datasize = mtu + ETH_HLEN;
 
-	/* For xdp and jumbo frames make sure only one packet fits per page */
-	if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 || mana_xdp_get(apc)) {
+	if (mana_use_single_rxbuf_per_page(apc, mtu)) {
 		if (mana_xdp_get(apc)) {
 			*headroom = XDP_PACKET_HEADROOM;
 			*alloc_size = PAGE_SIZE;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
index 7e79681634db..f22bbb325948 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
@@ -133,6 +133,10 @@ static const struct mana_stats_desc mana_phy_stats[] = {
 	{ "hc_tc7_tx_pause_phy", offsetof(struct mana_ethtool_phy_stats, tx_pause_tc7_phy) },
 };
 
+static const char mana_priv_flags[MANA_PRIV_FLAG_MAX][ETH_GSTRING_LEN] = {
+	[MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF] = "full-page-rx"
+};
+
 static int mana_get_sset_count(struct net_device *ndev, int stringset)
 {
 	struct mana_port_context *apc = netdev_priv(ndev);
@@ -144,6 +148,10 @@ static int mana_get_sset_count(struct net_device *ndev, int stringset)
 		       ARRAY_SIZE(mana_phy_stats) +
 		       ARRAY_SIZE(mana_hc_stats) +
 		       num_queues * (MANA_STATS_RX_COUNT + MANA_STATS_TX_COUNT);
+
+	case ETH_SS_PRIV_FLAGS:
+		return MANA_PRIV_FLAG_MAX;
+
 	default:
 		return -EINVAL;
 	}
@@ -192,6 +200,14 @@ static void mana_get_strings_stats(struct mana_port_context *apc, u8 **data)
 	}
 }
 
+static void mana_get_strings_priv_flags(u8 **data)
+{
+	int i;
+
+	for (i = 0; i < MANA_PRIV_FLAG_MAX; i++)
+		ethtool_puts(data, mana_priv_flags[i]);
+}
+
 static void mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
 {
 	struct mana_port_context *apc = netdev_priv(ndev);
@@ -200,6 +216,9 @@ static void mana_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
 	case ETH_SS_STATS:
 		mana_get_strings_stats(apc, &data);
 		break;
+	case ETH_SS_PRIV_FLAGS:
+		mana_get_strings_priv_flags(&data);
+		break;
 	default:
 		break;
 	}
@@ -590,6 +609,88 @@ static int mana_get_link_ksettings(struct net_device *ndev,
 	return 0;
 }
 
+static u32 mana_get_priv_flags(struct net_device *ndev)
+{
+	struct mana_port_context *apc = netdev_priv(ndev);
+
+	return apc->priv_flags;
+}
+
+static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
+{
+	struct mana_port_context *apc = netdev_priv(ndev);
+	u32 changed = apc->priv_flags ^ priv_flags;
+	u32 old_priv_flags = apc->priv_flags;
+	bool schedule_port_reset = false;
+	int err = 0;
+
+	if (!changed)
+		return 0;
+
+	/* Reject unknown bits */
+	if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
+		return -EINVAL;
+
+	if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
+		apc->priv_flags = priv_flags;
+
+		if (!apc->port_is_up) {
+			/* Port is down, flag updated to apply on next up
+			 * so just return.
+			 */
+			return 0;
+		}
+
+		/* Pre-allocate buffers to prevent failure in mana_attach
+		 * later
+		 */
+		err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
+		if (err) {
+			netdev_err(ndev,
+				   "Insufficient memory for new allocations\n");
+			apc->priv_flags = old_priv_flags;
+			return err;
+		}
+
+		err = mana_detach(ndev, false);
+		if (err) {
+			netdev_err(ndev, "mana_detach failed: %d\n", err);
+			apc->priv_flags = old_priv_flags;
+
+			/* Port is in an inconsistent state. Restore
+			 * 'port_is_up' so that queue reset work handler
+			 * can properly detach and re-attach.
+			 */
+			apc->port_is_up = true;
+			schedule_port_reset = true;
+			goto out;
+		}
+
+		err = mana_attach(ndev);
+		if (err) {
+			netdev_err(ndev, "mana_attach failed: %d\n", err);
+			apc->priv_flags = old_priv_flags;
+
+			/* Restore 'port_is_up' so the reset work handler
+			 * can properly detach/attach. Without this,
+			 * the handler sees port_is_up=false and skips
+			 * queue allocation, leaving the port dead.
+			 */
+			apc->port_is_up = true;
+			schedule_port_reset = true;
+		}
+	}
+
+out:
+	mana_pre_dealloc_rxbufs(apc);
+
+	if (schedule_port_reset)
+		queue_work(apc->ac->per_port_queue_reset_wq,
+			   &apc->queue_reset_work);
+
+	return err;
+}
+
 const struct ethtool_ops mana_ethtool_ops = {
 	.supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES,
 	.get_ethtool_stats = mana_get_ethtool_stats,
@@ -608,4 +709,6 @@ const struct ethtool_ops mana_ethtool_ops = {
 	.set_ringparam = mana_set_ringparam,
 	.get_link_ksettings = mana_get_link_ksettings,
 	.get_link = ethtool_op_get_link,
+	.get_priv_flags = mana_get_priv_flags,
+	.set_priv_flags = mana_set_priv_flags,
 };
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index aa90a858c8e3..1d44a78da520 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -30,6 +30,12 @@ enum TRI_STATE {
 	TRI_STATE_TRUE = 1
 };
 
+/* MANA ethtool private flag bit positions */
+enum mana_priv_flag_bits {
+	MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF = 0,
+	MANA_PRIV_FLAG_MAX,
+};
+
 /* Number of entries for hardware indirection table must be in power of 2 */
 #define MANA_INDIRECT_TABLE_MAX_SIZE 512
 #define MANA_INDIRECT_TABLE_DEF_SIZE 64
@@ -531,6 +537,8 @@ struct mana_port_context {
 	u32 rxbpre_headroom;
 	u32 rxbpre_frag_count;
 
+	u32 priv_flags;
+
 	struct bpf_prog *bpf_prog;
 
 	/* Create num_queues EQs, SQs, SQ-CQs, RQs and RQ-CQs, respectively. */
-- 
2.43.0

* Re: [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag
  2026-05-08 11:46 ` [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag Dipayaan Roy
@ 2026-05-08 21:16   ` sashiko-bot
  2026-05-12  2:21   ` Jakub Kicinski
  1 sibling, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-05-08 21:16 UTC (permalink / raw)
  To: Dipayaan Roy; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 3 potential
issue(s) to consider:

- [Critical] Faking `apc->port_is_up = true` on `mana_attach()` failure
  causes a guaranteed NULL pointer dereference in the asynchronous
  reset worker.
- [High] Toggling the 'full-page-rx' flag OFF leaks memory when initial
  packets are dropped because pre-allocated buffers are saved for reuse
  but never consumed by the fragment refill path.
- [High] `mana_pre_alloc_rxbufs` dangerously allocates huge arrays with
  `kmalloc` and thousands of pages with `GFP_ATOMIC` in process
  context, leading to allocation failures under load.

--
commit 57f161d214770cf8af214936230e6e5eeac378bc
Author: Dipayaan Roy <dipayanroy@linux.microsoft.com>

    net: mana: force full-page RX buffers via ethtool private flag

    On some ARM64 platforms with 4K PAGE_SIZE, page_pool fragment
    allocation in the RX refill path can cause 15-20% throughput
    regression under high connection counts. This adds an ethtool
    private flag "full-page-rx" to force one RX buffer per page.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index a28ca461c135f..dfd222de11d09 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c

[ ... ]

> @@ -590,6 +609,88 @@ static int mana_get_link_ksettings(struct net_device *ndev,
>  	return 0;
>  }
>
> +static u32 mana_get_priv_flags(struct net_device *ndev)
> +{

[ ... ]

> +static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> +{
> +	struct mana_port_context *apc = netdev_priv(ndev);
> +	u32 changed = apc->priv_flags ^ priv_flags;
> +	u32 old_priv_flags = apc->priv_flags;
> +	bool schedule_port_reset = false;
> +	int err = 0;
> +
> +	if (!changed)
> +		return 0;
> +
> +	/* Reject unknown bits */
> +	if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
> +		return -EINVAL;
> +
> +	if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
> +		apc->priv_flags = priv_flags;
> +
> +		if (!apc->port_is_up) {
> +			/* Port is down, flag updated to apply on next up
> +			 * so just return.
> +			 */
> +			return 0;
> +		}
> +
> +		/* Pre-allocate buffers to prevent failure in mana_attach
> +		 * later
> +		 */
> +		err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);

Since mana_set_priv_flags() runs in process context under rtnl_lock,
does calling mana_pre_alloc_rxbufs() risk memory exhaustion and
fragmentation failures? mana_pre_alloc_rxbufs() loops num_rxb times
using dev_alloc_pages(), which allocates using GFP_ATOMIC. Could
mass-allocating thousands of atomic pages exhaust atomic memory
reserves? It also uses kmalloc_array() for rxbufs_pre and das_pre,
which might require megabytes of physically contiguous memory. Would
it be safer to use kvmalloc_array() and alloc_pages(GFP_KERNEL) here
instead?

Additionally, if the full-page-rx flag is toggled OFF, does this leak
memory when initial packets are dropped? When the flag is toggled,
mana_pre_alloc_rxbufs() pre-allocates the RX ring and the buffers are
injected with from_pool = false. If the flag is toggled OFF, the
standard MTU configuration sets rxq->frag_count > 1. If any initial
packets are dropped (e.g. truncated packets), mana_rx_skb() executes
the drop path and saves the buffer for reuse via
rxq->xdp_save_va = buf_va. However, mana_get_rxfrag() only reuses
rxq->xdp_save_va if rxq->frag_count == 1. For frag_count > 1, it
allocates new fragments from the page pool and ignores the saved
buffer. If multiple pre-allocated packets are dropped, could
mana_rx_skb() repeatedly overwrite rxq->xdp_save_va, triggering a
WARN_ON_ONCE and leaking the pages?

> +		if (err) {
> +			netdev_err(ndev,
> +				   "Insufficient memory for new allocations\n");
> +			apc->priv_flags = old_priv_flags;
> +			return err;
> +		}
> +
> +		err = mana_detach(ndev, false);

[ ... ]

> +		err = mana_attach(ndev);
> +		if (err) {
> +			netdev_err(ndev, "mana_attach failed: %d\n", err);
> +			apc->priv_flags = old_priv_flags;
> +
> +			/* Restore 'port_is_up' so the reset work handler
> +			 * can properly detach/attach. Without this,
> +			 * the handler sees port_is_up=false and skips
> +			 * queue allocation, leaving the port dead.
> +			 */
> +			apc->port_is_up = true;
> +			schedule_port_reset = true;

Does setting apc->port_is_up = true on mana_attach() failure cause a
NULL pointer dereference in the async reset worker? If mana_attach()
fails, its error path calls mana_cleanup_port_context(), which
ultimately frees apc->tx_qp and sets it to NULL. When the scheduled
mana_per_port_queue_reset_work_handler executes, it calls
mana_detach(). Because we faked apc->port_is_up = true, mana_detach()
invokes mana_dealloc_queues(), which blindly iterates over
apc->num_queues and dereferences apc->tx_qp[i].txq. Since apc->tx_qp
is NULL, will this result in a kernel panic?

> +		}
> +	}
> +
> +out:
> +	mana_pre_dealloc_rxbufs(apc);
> +
> +	if (schedule_port_reset)
> +		queue_work(apc->ac->per_port_queue_reset_wq,
> +			   &apc->queue_reset_work);
> +
> +	return err;
> +}

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260508115100.488506-1-dipayanroy@linux.microsoft.com?part=2
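
[To make the allocation point above concrete, here is a minimal sketch
of the process-context pattern the review suggests: kvmalloc_array()
for the pointer array plus GFP_KERNEL page allocations. The struct and
function names are invented for the example, not the driver's code:

	struct example_prealloc {
		void **bufs;
		u32 nbufs;
	};

	static int example_pre_alloc(struct example_prealloc *p, u32 num_rxb)
	{
		u32 i;

		/* kvmalloc_array() falls back to vmalloc for large arrays,
		 * avoiding a high-order contiguous kmalloc.
		 */
		p->bufs = kvmalloc_array(num_rxb, sizeof(*p->bufs), GFP_KERNEL);
		if (!p->bufs)
			return -ENOMEM;

		for (i = 0; i < num_rxb; i++) {
			/* GFP_KERNEL may sleep and reclaim instead of
			 * draining the atomic reserves GFP_ATOMIC uses.
			 */
			struct page *page = alloc_pages(GFP_KERNEL, 0);

			if (!page)
				goto err;
			p->bufs[i] = page_to_virt(page);
		}
		p->nbufs = num_rxb;
		return 0;

	err:
		while (i--)
			put_page(virt_to_page(p->bufs[i]));
		kvfree(p->bufs);
		p->bufs = NULL;
		return -ENOMEM;
	}
]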

* Re: [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag
  2026-05-08 11:46 ` [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag Dipayaan Roy
  2026-05-08 21:16   ` sashiko-bot
@ 2026-05-12  2:21   ` Jakub Kicinski
  1 sibling, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2026-05-12 2:21 UTC (permalink / raw)
  To: dipayanroy
  Cc: Jakub Kicinski, kys, haiyangz, wei.liu, decui, andrew+netdev,
	davem, edumazet, pabeni, leon, longli, kotaranov, horms,
	shradhagupta, ssengar, ernis, shirazsaleem, linux-hyperv, netdev,
	linux-kernel, linux-rdma, stephen, jacob.e.keller, dipayanroy,
	leitao, kees, john.fastabend, hawk, bpf, daniel, ast, sdf,
	yury.norov

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

---

net: mana: force full-page RX buffers via ethtool private flag

This patch adds an ethtool private flag "full-page-rx" that forces one
RX buffer per page, bypassing the page_pool fragment path, to address
a 15-20% throughput regression seen on some 4K PAGE_SIZE ARM64
platforms under high connection counts. Toggling the flag triggers
mana_detach() + mana_attach() to apply the new RX buffer layout.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index a28ca461c135..dfd222de11d0 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c

[ ... ]

> @@ -590,6 +609,88 @@ static int mana_get_link_ksettings(struct net_device *ndev,
>  	return 0;
>  }
>

[ ... ]

> +static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> +{
> +	struct mana_port_context *apc = netdev_priv(ndev);
> +	u32 changed = apc->priv_flags ^ priv_flags;
> +	u32 old_priv_flags = apc->priv_flags;
> +	bool schedule_port_reset = false;
> +	int err = 0;
> +
> +	if (!changed)
> +		return 0;
> +
> +	/* Reject unknown bits */
> +	if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
> +		return -EINVAL;
> +
> +	if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
> +		apc->priv_flags = priv_flags;
> +
> +		if (!apc->port_is_up) {
> +			/* Port is down, flag updated to apply on next up
> +			 * so just return.
> +			 */
> +			return 0;
> +		}
> +
> +		/* Pre-allocate buffers to prevent failure in mana_attach
> +		 * later
> +		 */
> +		err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
> +		if (err) {
> +			netdev_err(ndev,
> +				   "Insufficient memory for new allocations\n");
> +			apc->priv_flags = old_priv_flags;
> +			return err;
> +		}
> +
> +		err = mana_detach(ndev, false);
> +		if (err) {
> +			netdev_err(ndev, "mana_detach failed: %d\n", err);
> +			apc->priv_flags = old_priv_flags;
> +
> +			/* Port is in an inconsistent state. Restore
> +			 * 'port_is_up' so that queue reset work handler
> +			 * can properly detach and re-attach.
> +			 */
> +			apc->port_is_up = true;
> +			schedule_port_reset = true;
> +			goto out;
> +		}
> +
> +		err = mana_attach(ndev);
> +		if (err) {
> +			netdev_err(ndev, "mana_attach failed: %d\n", err);
> +			apc->priv_flags = old_priv_flags;
> +
> +			/* Restore 'port_is_up' so the reset work handler
> +			 * can properly detach/attach. Without this,
> +			 * the handler sees port_is_up=false and skips
> +			 * queue allocation, leaving the port dead.
> +			 */
> +			apc->port_is_up = true;
> +			schedule_port_reset = true;
> +		}
> +	}
> +
> +out:
> +	mana_pre_dealloc_rxbufs(apc);
> +
> +	if (schedule_port_reset)
> +		queue_work(apc->ac->per_port_queue_reset_wq,
> +			   &apc->queue_reset_work);
> +
> +	return err;
> +}

[High] Can this failure path oops inside the reset work handler?

When mana_attach() fails after mana_detach() has already torn the
vport down, apc->tx_qp has been set to NULL by mana_destroy_txq() and
apc->rxqs has been cleared by mana_cleanup_port_context(). This code
then forces apc->port_is_up back to true and queues
apc->queue_reset_work:

	apc->port_is_up = true;
	schedule_port_reset = true;

When mana_per_port_queue_reset_work_handler runs it calls
mana_detach(ndev, false), which latches port_st_save = port_is_up =
true and, because port_st_save is true, invokes mana_dealloc_queues().
The TX drain loop there does:

	txq = &apc->tx_qp[i].txq;
	atomic_read(&txq->pending_sends);

with no NULL check on apc->tx_qp, and mana_destroy_vport further down
iterates apc->rxqs[rxq_idx] while rxqs is NULL.

Would that produce a NULL pointer dereference in the workqueue worker
whenever mana_attach() fails here (for example on an allocation
failure in mana_init_port_context or mana_create_txq, or a transient
HW command failure in mana_query_vport_cfg / mana_cfg_vport)?

The mana_detach() failure path above (goto out) reaches
mana_dealloc_queues() through the same chain and looks to have the
same exposure.

For comparison, mana_change_mtu() handles a mana_attach() failure by
returning the error without scheduling a reset. Would a similar
treatment here avoid the asynchronous oops, or is there a reason the
reset must be scheduled in this specific failure case?

--
pw-bot: cr
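
[Sketch of the mana_change_mtu()-style alternative raised above --
hypothetical, not a tested patch. The tail of mana_set_priv_flags()
would report the error and leave the port down instead of faking link
state for the reset worker:

	err = mana_attach(ndev);
	if (err) {
		netdev_err(ndev, "mana_attach failed: %d\n", err);
		apc->priv_flags = old_priv_flags;
		/* Leave port_is_up false: the port stays down until the
		 * user retries, rather than scheduling reset work that
		 * would dereference the freed tx_qp/rxqs arrays.
		 */
	}
	return err;
]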