Intel-Wired-Lan Archive on lore.kernel.org
From: Simon Horman <horms@kernel.org>
To: aleksander.lobakin@intel.com
Cc: 'Simon Horman' <horms@kernel.org>,
	intel-wired-lan@lists.osuosl.org, anthony.l.nguyen@intel.com,
	przemyslaw.kitszel@intel.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, kohei@enjuk.jp, jacob.e.keller@intel.com,
	aleksandr.loktionov@intel.com,
	nxne.cnse.osdt.itp.upstreaming@intel.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [Intel-wired-lan] [PATCH iwl-next v5 4/5] ice: implement Rx queue management ops
Date: Fri,  8 May 2026 15:08:00 +0100	[thread overview]
Message-ID: <20260508140758.1200568-3-horms@kernel.org> (raw)
In-Reply-To: <20260505152923.1040589-5-aleksander.lobakin@intel.com>

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
ice: implement Rx queue management ops

This commit wires the ice driver into the netdev queue management ops
framework, reusing the existing per-queue disable/reconfig/enable paths
originally added for XSk. With it, ice reportedly passes the io_uring
zcrx selftests.

> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> index 1add82d894bb..4e0b8895c303 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.c
> +++ b/drivers/net/ethernet/intel/ice/ice_base.c

[ ... ]

> @@ -1577,3 +1593,81 @@ int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
>
>  	return fail;
>  }
> +
> +/**
> + * ice_qp_ena - Enables a queue pair
> + * @vsi: VSI of interest
> + * @q_idx: ring index in array
> + *
> + * Returns 0 on success, negative on failure.
> + */
> +int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
> +{
> +	return __ice_qp_ena(vsi, q_idx, ice_vsi_cfg_single_rxq(vsi, q_idx));
> +}
> +
> +static int ice_queue_start(struct net_device *dev,
> +			   struct netdev_queue_config *qcfg,
> +			   void *per_queue_mem, int idx)
> +{
> +	const struct ice_netdev_priv *priv = netdev_priv(dev);
> +	struct ice_rx_ring *real = priv->vsi->rx_rings[idx];
> +	struct ice_rx_ring *new = per_queue_mem;
> +	struct napi_struct *napi;
> +	int ret;
> +
> +	real->pp = new->pp;
> +	real->rx_fqes = new->rx_fqes;
> +	real->hdr_fqes = new->hdr_fqes;
> +	real->hdr_pp = new->hdr_pp;
> +
> +	real->hdr_truesize = new->hdr_truesize;
> +	real->truesize = new->truesize;
> +	real->rx_hdr_len = new->rx_hdr_len;
> +	real->rx_buf_len = new->rx_buf_len;
> +
> +	memcpy(&real->xdp_rxq, &new->xdp_rxq, sizeof(new->xdp_rxq));
> +
> +	ret = ice_setup_rx_ctx(real);
> +	if (ret)
> +		return ret;

Simon says: I believe that the issue flagged immediately below is
            a false positive; the cleanup logic unwinds things correctly.

            However, the two other issues flagged further down do seem
            to be of concern.

ice_queue_start() eagerly copies new->pp, new->rx_fqes, new->hdr_pp,
new->hdr_fqes and the entire new->xdp_rxq into real before ice_setup_rx_ctx()
has succeeded.

If ice_setup_rx_ctx() returns an error here, the function bails out with
real already pointing at new_mem's page pools and fqes arrays.

The core netdev_rx_queue_reconfig() rollback then invokes
ndo_queue_start(dev, qcfg_old, old_mem, rxq_idx), which repeats the same
eager overwrite using old_mem and calls ice_setup_rx_ctx() again.  If the
second ice_setup_rx_ctx() also fails, the core proceeds to
ndo_queue_mem_free(dev, old_mem), and ice_queue_mem_free() will tear down
the page pools whose pointers real->pp / real->hdr_pp are still aliasing.

Would it be safer to stash the existing real->pp / real->rx_fqes /
real->hdr_pp / real->hdr_fqes / real->xdp_rxq on entry and restore them if
ice_setup_rx_ctx() fails, so that a failed rollback cannot leave real with
dangling pointers to a pool that ice_queue_mem_free() is about to destroy?

> +
> +	napi = &real->q_vector->napi;
> +
> +	page_pool_enable_direct_recycling(real->pp, napi);
> +	if (real->hdr_pp)
> +		page_pool_enable_direct_recycling(real->hdr_pp, napi);
> +
> +	ret = ice_alloc_rx_bufs(real, ICE_DESC_UNUSED(real));
> +
> +	return __ice_qp_ena(priv->vsi, idx, ret);

ice_alloc_rx_bufs() is declared bool and returns !!cleaned_count, so it
yields 0 on full success and 1 on partial/failed allocation:

bool ice_alloc_rx_bufs(struct ice_rx_ring *rx_ring, unsigned int cleaned_count)
{
    ...
    return !!cleaned_count;
}

That bool is then passed straight into __ice_qp_ena()'s fail parameter,
which only overwrites fail via 'if (!fail) fail = err;', so a value of 1
propagates unchanged to ice_queue_start()'s return value.

netdev_rx_queue_reconfig() treats any non-zero return as failure and
propagates it back to userspace (netlink, ethtool, io_uring zcrx install),
where the callers expect a negative errno.

Before this patch, ice_qp_ena() obtained its fail value from
ice_vsi_cfg_single_rxq(), which returns a signed errno.

Should this be converted to a proper negative errno before being handed
to __ice_qp_ena(), for example:

	ret = ice_alloc_rx_bufs(real, ICE_DESC_UNUSED(real)) ? -ENOMEM : 0;

> +}
> +
> +static int ice_queue_stop(struct net_device *dev, void *per_queue_mem,
> +			  int idx)
> +{
> +	const struct ice_netdev_priv *priv = netdev_priv(dev);
> +	struct ice_rx_ring *real = priv->vsi->rx_rings[idx];
> +	int ret;
> +
> +	ret = __ice_qp_dis(priv->vsi, idx);
> +	if (ret)
> +		return ret;
> +
> +	page_pool_disable_direct_recycling(real->pp);
> +	if (real->hdr_pp)
> +		page_pool_disable_direct_recycling(real->hdr_pp);
> +
> +	ice_zero_rx_ring(real);
> +	memcpy(per_queue_mem, real, sizeof(*real));
> +
> +	return 0;
> +}

Can this leak inflight page pool buffers?

ice_zero_rx_ring(real) is called before the memcpy to per_queue_mem, and
ice_zero_rx_ring() resets both indices:

void ice_zero_rx_ring(struct ice_rx_ring *rx_ring)
{
    ...
    rx_ring->next_to_clean = 0;
    rx_ring->next_to_use = 0;
}

So per_queue_mem captures a ring where next_to_clean == next_to_use == 0.

The core then invokes ndo_queue_mem_free(dev, old_mem), and the recycle
loop in ice_queue_mem_free() is guarded by exactly those two indices:

void ice_queue_mem_free(struct net_device *dev, void *per_queue_mem)
{
    ...
    for (u32 i = rx_ring->next_to_clean; i != rx_ring->next_to_use; ) {
        libeth_rx_recycle_slow(rx_ring->rx_fqes[i].netmem);

        if (rx_ring->hdr_pp)
            libeth_rx_recycle_slow(rx_ring->hdr_fqes[i].netmem);

        if (unlikely(++i == rx_ring->count))
            i = 0;
    }
    ...
    ice_rxq_pp_destroy(rx_ring);
}

With 0 != 0 false on entry, the loop never runs, so any buffers that were
in rx_fqes[old_ntc..old_ntu) (and hdr_fqes[] when header split is on) are
never returned via libeth_rx_recycle_slow().  ice_rxq_pp_destroy() then
kvfree()s the fqes arrays and calls page_pool_destroy() with outstanding
inflight references that are no longer tracked anywhere.

The new ice_clean_rx_ring() wrapper keeps the original ordering:

void ice_clean_rx_ring(struct ice_rx_ring *rx_ring)
{
    ice_queue_mem_free(rx_ring->netdev, rx_ring);
    ice_zero_rx_ring(rx_ring);
}

Should ice_queue_stop() follow the same ordering and perform the memcpy
before ice_zero_rx_ring(real), so the recycle loop in ice_queue_mem_free()
sees the real ntc/ntu values?

> +
> +const struct netdev_queue_mgmt_ops ice_queue_mgmt_ops = {
> +	.ndo_queue_mem_alloc	= ice_queue_mem_alloc,
> +	.ndo_queue_mem_free	= ice_queue_mem_free,
> +	.ndo_queue_mem_size	= sizeof(struct ice_rx_ring),
> +	.ndo_queue_start	= ice_queue_start,
> +	.ndo_queue_stop		= ice_queue_stop,
> +};

[ ... ]


Thread overview: 15+ messages
2026-05-05 15:29 [Intel-wired-lan] [PATCH iwl-next v5 0/5] ice: add support for devmem/io_uring Rx and Tx Alexander Lobakin
2026-05-05 15:29 ` [Intel-wired-lan] [PATCH iwl-next v5 1/5] libeth: pass Rx queue index to PP when creating a fill queue Alexander Lobakin
2026-05-05 15:29 ` [Intel-wired-lan] [PATCH iwl-next v5 2/5] libeth: handle creating pools with unreadable buffers Alexander Lobakin
2026-05-05 15:29 ` [Intel-wired-lan] [PATCH iwl-next v5 3/5] ice: migrate to netdev ops lock Alexander Lobakin
2026-05-08 13:28   ` Simon Horman
2026-05-08 13:57     ` Alexander Lobakin
2026-05-10  8:57       ` Simon Horman
2026-05-05 15:29 ` [Intel-wired-lan] [PATCH iwl-next v5 4/5] ice: implement Rx queue management ops Alexander Lobakin
2026-05-08 14:08   ` Simon Horman [this message]
2026-05-08 14:26     ` Simon Horman
2026-05-08 15:07     ` Alexander Lobakin
2026-05-08 14:25   ` Simon Horman
2026-05-05 15:29 ` [Intel-wired-lan] [PATCH iwl-next v5 5/5] ice: add support for transmitting unreadable frags Alexander Lobakin
2026-05-08 15:42   ` Simon Horman
2026-05-08 12:06 ` [Intel-wired-lan] [PATCH iwl-next v5 0/5] ice: add support for devmem/io_uring Rx and Tx Alexander Lobakin
