From: Jakub Kicinski <kuba@kernel.org>
To: dipayanroy@linux.microsoft.com
Cc: Jakub Kicinski <kuba@kernel.org>,
kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, andrew+netdev@lunn.ch, davem@davemloft.net,
edumazet@google.com, pabeni@redhat.com, leon@kernel.org,
longli@microsoft.com, kotaranov@microsoft.com, horms@kernel.org,
shradhagupta@linux.microsoft.com, ssengar@linux.microsoft.com,
ernis@linux.microsoft.com, shirazsaleem@microsoft.com,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
stephen@networkplumber.org, jacob.e.keller@intel.com,
dipayanroy@microsoft.com, leitao@debian.org, kees@kernel.org,
john.fastabend@gmail.com, hawk@kernel.org, bpf@vger.kernel.org,
daniel@iogearbox.net, ast@kernel.org, sdf@fomichev.me,
yury.norov@gmail.com
Subject: Re: [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag
Date: Mon, 11 May 2026 19:21:33 -0700 [thread overview]
Message-ID: <20260512022133.856196-1-kuba@kernel.org> (raw)
In-Reply-To: <20260508115100.488506-3-dipayanroy@linux.microsoft.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mana: force full-page RX buffers via ethtool private flag
This patch adds an ethtool private flag "full-page-rx" that forces one
RX buffer per page, bypassing the page_pool fragment path, to address
a 15-20% throughput regression seen on some 4K PAGE_SIZE ARM64
platforms under high connection counts. Toggling the flag triggers
mana_detach() + mana_attach() to apply the new RX buffer layout.
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index a28ca461c135..dfd222de11d0 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
[ ... ]
> @@ -590,6 +609,88 @@ static int mana_get_link_ksettings(struct net_device *ndev,
> return 0;
> }
>
[ ... ]
> +static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> +{
> + struct mana_port_context *apc = netdev_priv(ndev);
> + u32 changed = apc->priv_flags ^ priv_flags;
> + u32 old_priv_flags = apc->priv_flags;
> + bool schedule_port_reset = false;
> + int err = 0;
> +
> + if (!changed)
> + return 0;
> +
> + /* Reject unknown bits */
> + if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
> + return -EINVAL;
> +
> + if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
> + apc->priv_flags = priv_flags;
> +
> + if (!apc->port_is_up) {
> + /* Port is down, flag updated to apply on next up
> + * so just return.
> + */
> + return 0;
> + }
> +
> + /* Pre-allocate buffers to prevent failure in mana_attach
> + * later
> + */
> + err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
> + if (err) {
> + netdev_err(ndev,
> + "Insufficient memory for new allocations\n");
> + apc->priv_flags = old_priv_flags;
> + return err;
> + }
> +
> + err = mana_detach(ndev, false);
> + if (err) {
> + netdev_err(ndev, "mana_detach failed: %d\n", err);
> + apc->priv_flags = old_priv_flags;
> +
> + /* Port is in an inconsistent state. Restore
> + * 'port_is_up' so that queue reset work handler
> + * can properly detach and re-attach.
> + */
> + apc->port_is_up = true;
> + schedule_port_reset = true;
> + goto out;
> + }
> +
> + err = mana_attach(ndev);
> + if (err) {
> + netdev_err(ndev, "mana_attach failed: %d\n", err);
> + apc->priv_flags = old_priv_flags;
> +
> + /* Restore 'port_is_up' so the reset work handler
> + * can properly detach/attach. Without this,
> + * the handler sees port_is_up=false and skips
> + * queue allocation, leaving the port dead.
> + */
> + apc->port_is_up = true;
> + schedule_port_reset = true;
> + }
> + }
> +
> +out:
> + mana_pre_dealloc_rxbufs(apc);
> +
> + if (schedule_port_reset)
> + queue_work(apc->ac->per_port_queue_reset_wq,
> + &apc->queue_reset_work);
> +
> + return err;
> +}
[High]
Can this failure path oops inside the reset work handler?
When mana_attach() fails after mana_detach() has already torn the vport
down, apc->tx_qp has been set to NULL by mana_destroy_txq() and apc->rxqs
has been cleared by mana_cleanup_port_context(). This code then forces
apc->port_is_up back to true and queues apc->queue_reset_work:
apc->port_is_up = true;
schedule_port_reset = true;
When mana_per_port_queue_reset_work_handler() runs, it calls
mana_detach(ndev, false), which latches port_st_save = port_is_up = true
and, because port_st_save is true, invokes mana_dealloc_queues(). The TX
drain loop there does:
txq = &apc->tx_qp[i].txq;
atomic_read(&txq->pending_sends);
with no NULL check on apc->tx_qp, and mana_destroy_vport further down
iterates apc->rxqs[rxq_idx] while rxqs is NULL. Would that produce a
NULL pointer dereference in the workqueue worker whenever mana_attach()
fails here (for example on an allocation failure in
mana_init_port_context or mana_create_txq, or a transient HW command
failure in mana_query_vport_cfg / mana_cfg_vport)?
The mana_detach() failure path above (the goto out) reaches
mana_dealloc_queues() through the same chain and appears to have the
same exposure.
For comparison, mana_change_mtu() handles a mana_attach() failure by
returning the error without scheduling a reset. Would a similar
treatment here avoid the asynchronous oops, or is there a reason the
reset must be scheduled in this specific failure case?
--
pw-bot: cr
Thread overview: 4+ messages
2026-05-08 11:46 [PATCH v8 0/2] net: mana: add ethtool private flag for full-page RX buffers Dipayaan Roy
2026-05-08 11:46 ` [PATCH v8 1/2] net: mana: refactor mana_get_strings() and mana_get_sset_count() to use switch Dipayaan Roy
2026-05-08 11:46 ` [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag Dipayaan Roy
2026-05-12 2:21 ` Jakub Kicinski [this message]