From: Simon Horman <horms@kernel.org>
To: aleksander.lobakin@intel.com
Cc: 'Simon Horman' <horms@kernel.org>,
intel-wired-lan@lists.osuosl.org, anthony.l.nguyen@intel.com,
przemyslaw.kitszel@intel.com, andrew+netdev@lunn.ch,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, kohei@enjuk.jp, jacob.e.keller@intel.com,
aleksandr.loktionov@intel.com,
nxne.cnse.osdt.itp.upstreaming@intel.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [Intel-wired-lan] [PATCH iwl-next v5 4/5] ice: implement Rx queue management ops
Date: Fri, 8 May 2026 15:08:00 +0100
Message-ID: <20260508140758.1200568-3-horms@kernel.org>
In-Reply-To: <20260505152923.1040589-5-aleksander.lobakin@intel.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
ice: implement Rx queue management ops
This commit wires the ice driver into the netdev queue management ops
framework, reusing the existing per-queue disable/reconfig/enable paths
originally added for XSk. With it, ice reportedly passes the io_uring
zcrx selftests.
> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> index 1add82d894bb..4e0b8895c303 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.c
> +++ b/drivers/net/ethernet/intel/ice/ice_base.c
[ ... ]
> @@ -1577,3 +1593,81 @@ int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
>
> return fail;
> }
> +
> +/**
> + * ice_qp_ena - Enables a queue pair
> + * @vsi: VSI of interest
> + * @q_idx: ring index in array
> + *
> + * Returns 0 on success, negative on failure.
> + */
> +int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
> +{
> + return __ice_qp_ena(vsi, q_idx, ice_vsi_cfg_single_rxq(vsi, q_idx));
> +}
> +
> +static int ice_queue_start(struct net_device *dev,
> + struct netdev_queue_config *qcfg,
> + void *per_queue_mem, int idx)
> +{
> + const struct ice_netdev_priv *priv = netdev_priv(dev);
> + struct ice_rx_ring *real = priv->vsi->rx_rings[idx];
> + struct ice_rx_ring *new = per_queue_mem;
> + struct napi_struct *napi;
> + int ret;
> +
> + real->pp = new->pp;
> + real->rx_fqes = new->rx_fqes;
> + real->hdr_fqes = new->hdr_fqes;
> + real->hdr_pp = new->hdr_pp;
> +
> + real->hdr_truesize = new->hdr_truesize;
> + real->truesize = new->truesize;
> + real->rx_hdr_len = new->rx_hdr_len;
> + real->rx_buf_len = new->rx_buf_len;
> +
> + memcpy(&real->xdp_rxq, &new->xdp_rxq, sizeof(new->xdp_rxq));
> +
> + ret = ice_setup_rx_ctx(real);
> + if (ret)
> + return ret;
Simon says: I believe that the issue flagged immediately below is
a false positive; the cleanup logic unwinds things correctly.
However, the two other issues flagged further down do seem to be
of concern.
ice_queue_start() eagerly copies new->pp, new->rx_fqes, new->hdr_pp,
new->hdr_fqes and the entire new->xdp_rxq into real before ice_setup_rx_ctx()
has succeeded.
If ice_setup_rx_ctx() returns an error here, the function bails out with
real already pointing at new_mem's page pools and fqes arrays.
The core netdev_rx_queue_reconfig() rollback then invokes
ndo_queue_start(dev, qcfg_old, old_mem, rxq_idx), which repeats the same
eager overwrite using old_mem and calls ice_setup_rx_ctx() again. If the
second ice_setup_rx_ctx() also fails, the core proceeds to
ndo_queue_mem_free(dev, old_mem), and ice_queue_mem_free() will tear down
the page pools whose pointers real->pp / real->hdr_pp are still aliasing.
Would it be safer to stash the existing real->pp / real->rx_fqes /
real->hdr_pp / real->hdr_fqes / real->xdp_rxq on entry and restore them if
ice_setup_rx_ctx() fails, so that a failed rollback cannot leave real with
dangling pointers to a pool that ice_queue_mem_free() is about to destroy?
> +
> + napi = &real->q_vector->napi;
> +
> + page_pool_enable_direct_recycling(real->pp, napi);
> + if (real->hdr_pp)
> + page_pool_enable_direct_recycling(real->hdr_pp, napi);
> +
> + ret = ice_alloc_rx_bufs(real, ICE_DESC_UNUSED(real));
> +
> + return __ice_qp_ena(priv->vsi, idx, ret);
ice_alloc_rx_bufs() returns bool, specifically !!cleaned_count, so it
yields 0 on full success and 1 on a partial or failed allocation:
bool ice_alloc_rx_bufs(struct ice_rx_ring *rx_ring, unsigned int cleaned_count)
{
...
return !!cleaned_count;
}
That bool is then passed straight into __ice_qp_ena()'s fail parameter,
which only overwrites fail via 'if (!fail) fail = err;', so a value of 1
propagates unchanged to ice_queue_start()'s return value.
netdev_rx_queue_reconfig() treats any non-zero return as failure and
propagates it back to userspace (netlink, ethtool, io_uring zcrx install),
where the callers expect a negative errno.
Before this patch, ice_qp_ena() obtained its fail value from
ice_vsi_cfg_single_rxq(), which returns a signed errno.
Should this be converted to a proper negative errno before being handed
to __ice_qp_ena(), for example:
ret = ice_alloc_rx_bufs(real, ICE_DESC_UNUSED(real)) ? -ENOMEM : 0;
> +}
> +
> +static int ice_queue_stop(struct net_device *dev, void *per_queue_mem,
> + int idx)
> +{
> + const struct ice_netdev_priv *priv = netdev_priv(dev);
> + struct ice_rx_ring *real = priv->vsi->rx_rings[idx];
> + int ret;
> +
> + ret = __ice_qp_dis(priv->vsi, idx);
> + if (ret)
> + return ret;
> +
> + page_pool_disable_direct_recycling(real->pp);
> + if (real->hdr_pp)
> + page_pool_disable_direct_recycling(real->hdr_pp);
> +
> + ice_zero_rx_ring(real);
> + memcpy(per_queue_mem, real, sizeof(*real));
> +
> + return 0;
> +}
Can this leak inflight page pool buffers?
ice_zero_rx_ring(real) is called before the memcpy to per_queue_mem, and
ice_zero_rx_ring() resets both indices:
void ice_zero_rx_ring(struct ice_rx_ring *rx_ring)
{
...
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;
}
So per_queue_mem captures a ring where next_to_clean == next_to_use == 0.
The core then invokes ndo_queue_mem_free(dev, old_mem), and the recycle
loop in ice_queue_mem_free() is guarded by exactly those two indices:
void ice_queue_mem_free(struct net_device *dev, void *per_queue_mem)
{
...
for (u32 i = rx_ring->next_to_clean; i != rx_ring->next_to_use; ) {
libeth_rx_recycle_slow(rx_ring->rx_fqes[i].netmem);
if (rx_ring->hdr_pp)
libeth_rx_recycle_slow(rx_ring->hdr_fqes[i].netmem);
if (unlikely(++i == rx_ring->count))
i = 0;
}
...
ice_rxq_pp_destroy(rx_ring);
}
With 0 != 0 false on entry, the loop never runs, so any buffers that were
in rx_fqes[old_ntc..old_ntu) (and hdr_fqes[] when header split is on) are
never returned via libeth_rx_recycle_slow(). ice_rxq_pp_destroy() then
kvfree()s the fqes arrays and calls page_pool_destroy() with outstanding
inflight references that are no longer tracked anywhere.
The new ice_clean_rx_ring() wrapper keeps the original ordering:
void ice_clean_rx_ring(struct ice_rx_ring *rx_ring)
{
ice_queue_mem_free(rx_ring->netdev, rx_ring);
ice_zero_rx_ring(rx_ring);
}
Should ice_queue_stop() follow the same ordering and perform the memcpy
before ice_zero_rx_ring(real), so the recycle loop in ice_queue_mem_free()
sees the real ntc/ntu values?
> +
> +const struct netdev_queue_mgmt_ops ice_queue_mgmt_ops = {
> + .ndo_queue_mem_alloc = ice_queue_mem_alloc,
> + .ndo_queue_mem_free = ice_queue_mem_free,
> + .ndo_queue_mem_size = sizeof(struct ice_rx_ring),
> + .ndo_queue_start = ice_queue_start,
> + .ndo_queue_stop = ice_queue_stop,
> +};
[ ... ]