From: sashiko-bot@kernel.org
To: "Dipayaan Roy" <dipayanroy@linux.microsoft.com>
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag
Date: Fri, 08 May 2026 21:16:31 +0000
Message-ID: <20260508211632.052D5C2BCB0@smtp.kernel.org>
In-Reply-To: <20260508115100.488506-3-dipayanroy@linux.microsoft.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Faking `apc->port_is_up = true` on `mana_attach()` failure causes a guaranteed NULL pointer dereference in the asynchronous reset worker.
- [High] Toggling the 'full-page-rx' flag OFF leaks memory when initial packets are dropped because pre-allocated buffers are saved for reuse but never consumed by the fragment refill path.
- [High] `mana_pre_alloc_rxbufs` dangerously allocates huge arrays with `kmalloc` and thousands of pages with `GFP_ATOMIC` in process context, leading to allocation failures under load.
--

commit 57f161d214770cf8af214936230e6e5eeac378bc
Author: Dipayaan Roy <dipayanroy@linux.microsoft.com>

net: mana: force full-page RX buffers via ethtool private flag
    
On some ARM64 platforms with 4K PAGE_SIZE, page_pool fragment
allocation in the RX refill path can cause 15-20% throughput
regression under high connection counts. This adds an ethtool private
flag "full-page-rx" to force one RX buffer per page.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index a28ca461c135f..dfd222de11d09 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
[ ... ]
> @@ -590,6 +609,88 @@ static int mana_get_link_ksettings(struct net_device *ndev,
>  	return 0;
>  }
>  
> +static u32 mana_get_priv_flags(struct net_device *ndev)
> +{
[ ... ]
> +static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> +{
> +	struct mana_port_context *apc = netdev_priv(ndev);
> +	u32 changed = apc->priv_flags ^ priv_flags;
> +	u32 old_priv_flags = apc->priv_flags;
> +	bool schedule_port_reset = false;
> +	int err = 0;
> +
> +	if (!changed)
> +		return 0;
> +
> +	/* Reject unknown bits */
> +	if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
> +		return -EINVAL;
> +
> +	if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
> +		apc->priv_flags = priv_flags;
> +
> +		if (!apc->port_is_up) {
> +			/* Port is down, flag updated to apply on next up
> +			 * so just return.
> +			 */
> +			return 0;
> +		}
> +
> +		/* Pre-allocate buffers to prevent failure in mana_attach
> +		 * later
> +		 */
> +		err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);

Since mana_set_priv_flags() runs in process context under rtnl_lock, does
calling mana_pre_alloc_rxbufs() here risk allocation failures under memory
pressure or fragmentation?

mana_pre_alloc_rxbufs() loops num_rxb times using dev_alloc_pages(), which
allocates using GFP_ATOMIC. Could mass-allocating thousands of atomic pages
exhaust atomic memory reserves?

It also uses kmalloc_array() for rxbufs_pre and das_pre, which might require
megabytes of physically contiguous memory. Would it be safer to use
kvmalloc_array() and alloc_pages(GFP_KERNEL) here instead?
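
To put a rough number on the contiguous-memory concern, here is a small
userspace sketch of the arithmetic. It is not driver code. The helper names,
the 4K page size, and the example figures in the note below (64 queues, 8192
buffers per queue, 8-byte elements) are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumed 4K pages, as in the commit message */

/* Bytes needed for an array of n elements, or 0 if the product overflows. */
static size_t array_bytes(size_t n, size_t elem_size)
{
	if (elem_size != 0 && n > SIZE_MAX / elem_size)
		return 0;
	return n * elem_size;
}

/* Smallest order such that (MODEL_PAGE_SIZE << order) covers `bytes`. */
static unsigned int contiguous_order(size_t bytes)
{
	unsigned int order = 0;
	size_t span = MODEL_PAGE_SIZE;

	while (span < bytes) {
		span <<= 1;
		order++;
	}
	return order;
}
```

Under those assumed figures, array_bytes(64 * 8192, 8) is 4 MiB and
contiguous_order() of that size is 10, i.e. 1024 physically contiguous 4K
pages for one array. A kvmalloc_array() call can fall back to vmalloc when
such a contiguous block is not available.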

Additionally, if the full-page-rx flag is toggled OFF, does this leak memory
when initial packets are dropped?

When the flag is toggled, mana_pre_alloc_rxbufs() pre-allocates the RX ring
and the buffers are injected with from_pool = false. If the flag is toggled
OFF, the standard MTU configuration sets rxq->frag_count > 1.

If any initial packets are dropped (e.g. truncated packets), mana_rx_skb()
executes the drop path and saves the buffer for reuse via
rxq->xdp_save_va = buf_va.

However, mana_get_rxfrag() only reuses rxq->xdp_save_va if
rxq->frag_count == 1. For frag_count > 1, it allocates new fragments from the
page pool and ignores the saved buffer. If multiple pre-allocated packets are
dropped, could mana_rx_skb() repeatedly overwrite rxq->xdp_save_va, triggering
a WARN_ON_ONCE and leaking the pages?
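
The sequence above can be modelled in a few lines of userspace C. This is a
simplified sketch of the behaviour as described in this review, not the
driver's code: the model_* names are hypothetical, the single save slot stands
in for rxq->xdp_save_va, and the drop/refill ordering is simplified.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DROPS 16

/* Hypothetical, simplified stand-in for the RX queue state discussed above. */
struct model_rxq {
	unsigned int frag_count;  /* 1 = full-page buffers, >1 = fragments */
	void *xdp_save_va;        /* single-slot "saved for reuse" buffer */
	unsigned int overwritten; /* saved buffers lost to a later save */
};

/* Drop path: stash the buffer for reuse. */
static void model_drop(struct model_rxq *rxq, void *buf_va)
{
	if (rxq->xdp_save_va)
		rxq->overwritten++; /* where a WARN_ON_ONCE would fire */
	rxq->xdp_save_va = buf_va;
}

/* Refill path: the saved buffer is only consumed when frag_count == 1. */
static void *model_refill(struct model_rxq *rxq)
{
	void *va = NULL;

	if (rxq->frag_count == 1 && rxq->xdp_save_va) {
		va = rxq->xdp_save_va;
		rxq->xdp_save_va = NULL;
	}
	return va; /* NULL models "take a fresh fragment from the page pool" */
}

/* Drop `drops` pre-allocated buffers in a row and count the lost ones. */
static unsigned int model_lost_buffers(unsigned int frag_count,
				       unsigned int drops)
{
	static char bufs[MODEL_MAX_DROPS];
	struct model_rxq rxq = { .frag_count = frag_count };
	unsigned int i;

	assert(drops <= MODEL_MAX_DROPS);
	for (i = 0; i < drops; i++) {
		model_drop(&rxq, &bufs[i]);
		(void)model_refill(&rxq);
	}
	return rxq.overwritten;
}
```

In this model, four consecutive drops with frag_count == 1 lose nothing,
because each saved buffer is consumed by the next refill. With frag_count == 2
the slot is never drained, so three of the four saved buffers are overwritten
and the fourth stays parked in the slot.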

> +		if (err) {
> +			netdev_err(ndev,
> +				   "Insufficient memory for new allocations\n");
> +			apc->priv_flags = old_priv_flags;
> +			return err;
> +		}
> +
> +		err = mana_detach(ndev, false);
[ ... ]
> +		err = mana_attach(ndev);
> +		if (err) {
> +			netdev_err(ndev, "mana_attach failed: %d\n", err);
> +			apc->priv_flags = old_priv_flags;
> +
> +			/* Restore 'port_is_up' so the reset work handler
> +			 * can properly detach/attach. Without this,
> +			 * the handler sees port_is_up=false and skips
> +			 * queue allocation, leaving the port dead.
> +			 */
> +			apc->port_is_up = true;
> +			schedule_port_reset = true;

Does setting apc->port_is_up = true on mana_attach() failure cause a NULL
pointer dereference in the async reset worker?

If mana_attach() fails, its error path calls mana_cleanup_port_context(),
which ultimately frees apc->tx_qp and sets it to NULL.

When the scheduled mana_per_port_queue_reset_work_handler executes, it calls
mana_detach(). Because we faked apc->port_is_up = true, mana_detach() invokes
mana_dealloc_queues(), which blindly iterates over apc->num_queues and
dereferences apc->tx_qp[i].txq.

Since apc->tx_qp is NULL, will this result in a kernel panic?
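
The state the reset worker would see can be sketched in userspace C as
follows. This models only the control flow described above. The model_* types
and functions are hypothetical, and the NULL check in model_dealloc_queues()
is an illustrative guard added so the hazard is observable without crashing.
It is not a claim about what the driver does or about the right fix.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_txq { int id; };
struct model_tx_qp { struct model_txq txq; };

struct model_port {
	bool port_is_up;
	unsigned int num_queues;
	struct model_tx_qp *tx_qp; /* NULL once the attach error path cleans up */
};

/* Returns queues visited, or -1 when tx_qp is NULL (hypothetical guard). */
static int model_dealloc_queues(struct model_port *apc)
{
	unsigned int i;
	int visited = 0;

	if (!apc->tx_qp)
		return -1; /* without this, the loop below dereferences NULL */

	for (i = 0; i < apc->num_queues; i++) {
		apc->tx_qp[i].txq.id = -1; /* stands in for per-queue teardown */
		visited++;
	}
	return visited;
}

/* Detach only tears down queues when the port is marked up. */
static int model_detach(struct model_port *apc)
{
	if (!apc->port_is_up)
		return 0;
	return model_dealloc_queues(apc);
}

/* Port state after a failed attach, as described in this review. */
static int model_reset_after_failed_attach(unsigned int num_queues)
{
	struct model_port apc = {
		.port_is_up = true, /* forced by the error path in the patch */
		.num_queues = num_queues,
		.tx_qp = NULL,      /* freed by the cleanup, per the review */
	};

	return model_detach(&apc);
}
```

Here model_reset_after_failed_attach() returns -1: the forced port_is_up lets
detach reach the queue loop while tx_qp is NULL. With port_is_up left false,
model_detach() would return 0 and never touch the queues.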

> +		}
> +	}
> +
> +out:
> +	mana_pre_dealloc_rxbufs(apc);
> +
> +	if (schedule_port_reset)
> +		queue_work(apc->ac->per_port_queue_reset_wq,
> +			   &apc->queue_reset_work);
> +
> +	return err;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260508115100.488506-1-dipayanroy@linux.microsoft.com?part=2

Thread overview: 9+ messages
2026-05-08 11:46 [PATCH v8 0/2] net: mana: add ethtool private flag for full-page RX buffers Dipayaan Roy
2026-05-08 11:46 ` [PATCH v8 1/2] net: mana: refactor mana_get_strings() and mana_get_sset_count() to use switch Dipayaan Roy
2026-05-08 11:46 ` [PATCH v8 2/2] net: mana: force full-page RX buffers via ethtool private flag Dipayaan Roy
2026-05-08 21:16   ` sashiko-bot [this message]
2026-05-12  2:21   ` Jakub Kicinski
2026-05-18 21:33     ` Dipayaan Roy
2026-05-29  2:55       ` Dipayaan Roy
2026-05-29 17:57         ` Jakub Kicinski
2026-06-02 20:42           ` Dipayaan Roy
