From: Jakub Kicinski <kuba@kernel.org>
To: dipayanroy@linux.microsoft.com
Cc: Jakub Kicinski <kuba@kernel.org>,
kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, andrew+netdev@lunn.ch, davem@davemloft.net,
edumazet@google.com, pabeni@redhat.com, leon@kernel.org,
longli@microsoft.com, kotaranov@microsoft.com, horms@kernel.org,
shradhagupta@linux.microsoft.com, ssengar@linux.microsoft.com,
ernis@linux.microsoft.com, shirazsaleem@microsoft.com,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
stephen@networkplumber.org, jacob.e.keller@intel.com,
dipayanroy@microsoft.com, leitao@debian.org, kees@kernel.org,
john.fastabend@gmail.com, hawk@kernel.org, bpf@vger.kernel.org,
daniel@iogearbox.net, ast@kernel.org, sdf@fomichev.me,
yury.norov@gmail.com
Subject: Re: [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag
Date: Mon, 11 May 2026 19:21:33 -0700
Message-ID: <20260512022133.856196-1-kuba@kernel.org>
In-Reply-To: <20260508115100.488506-3-dipayanroy@linux.microsoft.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mana: force full-page RX buffers via ethtool private flag

This patch adds an ethtool private flag "full-page-rx" that forces one
RX buffer per page, bypassing the page_pool fragment path, to address
a 15-20% throughput regression seen on some 4K PAGE_SIZE ARM64
platforms under high connection counts. Toggling the flag triggers
mana_detach() + mana_attach() to apply the new RX buffer layout.
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index a28ca461c135..dfd222de11d0 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
[ ... ]
> @@ -590,6 +609,88 @@ static int mana_get_link_ksettings(struct net_device *ndev,
> return 0;
> }
>
[ ... ]
> +static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> +{
> +	struct mana_port_context *apc = netdev_priv(ndev);
> +	u32 changed = apc->priv_flags ^ priv_flags;
> +	u32 old_priv_flags = apc->priv_flags;
> +	bool schedule_port_reset = false;
> +	int err = 0;
> +
> +	if (!changed)
> +		return 0;
> +
> +	/* Reject unknown bits */
> +	if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
> +		return -EINVAL;
> +
> +	if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
> +		apc->priv_flags = priv_flags;
> +
> +		if (!apc->port_is_up) {
> +			/* Port is down, flag updated to apply on next up
> +			 * so just return.
> +			 */
> +			return 0;
> +		}
> +
> +		/* Pre-allocate buffers to prevent failure in mana_attach
> +		 * later
> +		 */
> +		err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
> +		if (err) {
> +			netdev_err(ndev,
> +				   "Insufficient memory for new allocations\n");
> +			apc->priv_flags = old_priv_flags;
> +			return err;
> +		}
> +
> +		err = mana_detach(ndev, false);
> +		if (err) {
> +			netdev_err(ndev, "mana_detach failed: %d\n", err);
> +			apc->priv_flags = old_priv_flags;
> +
> +			/* Port is in an inconsistent state. Restore
> +			 * 'port_is_up' so that queue reset work handler
> +			 * can properly detach and re-attach.
> +			 */
> +			apc->port_is_up = true;
> +			schedule_port_reset = true;
> +			goto out;
> +		}
> +
> +		err = mana_attach(ndev);
> +		if (err) {
> +			netdev_err(ndev, "mana_attach failed: %d\n", err);
> +			apc->priv_flags = old_priv_flags;
> +
> +			/* Restore 'port_is_up' so the reset work handler
> +			 * can properly detach/attach. Without this,
> +			 * the handler sees port_is_up=false and skips
> +			 * queue allocation, leaving the port dead.
> +			 */
> +			apc->port_is_up = true;
> +			schedule_port_reset = true;
> +		}
> +	}
> +
> +out:
> +	mana_pre_dealloc_rxbufs(apc);
> +
> +	if (schedule_port_reset)
> +		queue_work(apc->ac->per_port_queue_reset_wq,
> +			   &apc->queue_reset_work);
> +
> +	return err;
> +}

[High]
Can this failure path oops inside the reset work handler?

When mana_attach() fails after mana_detach() has already torn the vport
down, apc->tx_qp has been set to NULL by mana_destroy_txq() and apc->rxqs
has been cleared by mana_cleanup_port_context(). This code then forces
apc->port_is_up back to true and queues apc->queue_reset_work:

	apc->port_is_up = true;
	schedule_port_reset = true;

When mana_per_port_queue_reset_work_handler runs it calls
mana_detach(ndev, false), which latches port_st_save = port_is_up = true
and, because port_st_save is true, invokes mana_dealloc_queues(). The TX
drain loop there does:

	txq = &apc->tx_qp[i].txq;
	atomic_read(&txq->pending_sends);

with no NULL check on apc->tx_qp, and mana_destroy_vport further down
iterates apc->rxqs[rxq_idx] while rxqs is NULL. Would that produce a
NULL pointer dereference in the workqueue worker whenever mana_attach()
fails here (for example on an allocation failure in
mana_init_port_context or mana_create_txq, or a transient HW command
failure in mana_query_vport_cfg / mana_cfg_vport)?

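The sequence described above can be replayed in a small userspace model.
This is only a sketch: the model_* names and structures are hypothetical
stand-ins rather than driver code, and the drain loop's unconditional
access is replaced by a check that reports whether tx_qp would have been
dereferenced while NULL, so the model can run without faulting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the driver structures; not the real layout. */
struct model_txq {
	int pending_sends;
};

struct model_port {
	struct model_txq *tx_qp;	/* NULL once the queues are torn down */
	unsigned int num_queues;
	bool port_is_up;
};

/* Models a successful detach: queues freed, pointer cleared, port down. */
static void model_detach(struct model_port *p)
{
	free(p->tx_qp);
	p->tx_qp = NULL;
	p->port_is_up = false;
}

/* Models an attach that fails before any queue has been recreated. */
static int model_attach_fail(struct model_port *p)
{
	(void)p;
	return -1;
}

/*
 * Models the drain loop reached from the reset handler. Returns true if
 * the loop would index tx_qp while it is NULL. When port_is_up is false
 * the dealloc path is skipped, so nothing is dereferenced.
 */
static bool model_reset_would_oops(const struct model_port *p)
{
	if (!p->port_is_up)
		return false;
	return p->tx_qp == NULL && p->num_queues > 0;
}

/* Replays the flag-change failure path; force_up mirrors the patch. */
static bool model_flag_change_failure(bool force_up)
{
	struct model_port p = { .num_queues = 4, .port_is_up = true };

	p.tx_qp = calloc(p.num_queues, sizeof(*p.tx_qp));
	assert(p.tx_qp);

	model_detach(&p);
	if (model_attach_fail(&p) && force_up)
		p.port_is_up = true;	/* done just before queue_work() */

	return model_reset_would_oops(&p);
}
```

In this model, model_flag_change_failure(true) reports the hazardous
state and model_flag_change_failure(false) does not, which matches the
concern that forcing port_is_up is what steers the handler into the
dealloc path.
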
The mana_detach() failure path above (goto out) reaches
mana_dealloc_queues() through the same chain and looks to have the same
exposure.

For comparison, mana_change_mtu() handles a mana_attach() failure by
returning the error without scheduling a reset. Would a similar
treatment here avoid the asynchronous oops, or is there a reason the
reset must be scheduled in this specific failure case?
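
One possible shape for that alternative, again as a userspace sketch
rather than a proposed patch: the sketch_* names, the stub callbacks and
the decision to treat a detach failure the same way as an attach failure
are all assumptions made for illustration. The flag is rolled back and
the error returned; port_is_up is left as detach set it and no reset
work is queued.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified port state; not the driver's real structure. */
struct sketch_port {
	unsigned int priv_flags;
	bool port_is_up;
};

typedef int (*sketch_op)(struct sketch_port *p);

/* Stub: detach succeeds and leaves the port down. */
static int sketch_detach_ok(struct sketch_port *p)
{
	p->port_is_up = false;
	return 0;
}

/* Stub: attach fails as if an allocation had failed. */
static int sketch_attach_fail(struct sketch_port *p)
{
	(void)p;
	return -ENOMEM;
}

/* Apply new flags; on failure roll back the flags and return the error. */
static int sketch_set_flags(struct sketch_port *p, unsigned int new_flags,
			    sketch_op detach, sketch_op attach)
{
	unsigned int old_flags = p->priv_flags;
	int err;

	if (old_flags == new_flags)
		return 0;

	p->priv_flags = new_flags;
	if (!p->port_is_up)
		return 0;	/* takes effect on the next open */

	err = detach(p);
	if (!err)
		err = attach(p);
	if (err)
		p->priv_flags = old_flags;	/* port_is_up is not forced */

	return err;
}

/* Runs the attach-failure case and checks the resulting state. */
static int sketch_demo_attach_failure(void)
{
	struct sketch_port p = { .priv_flags = 0, .port_is_up = true };
	int err = sketch_set_flags(&p, 1u, sketch_detach_ok,
				   sketch_attach_fail);

	assert(p.priv_flags == 0);	/* flag rolled back */
	assert(!p.port_is_up);		/* port stays down, no reset queued */
	return err;
}
```

Whether leaving the port down until the next open is acceptable for this
flag is a question for the driver authors; the sketch only shows that
the error can be propagated without re-entering the teardown path.
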
--
pw-bot: cr