From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: "Björn Töpel" <bjorn@kernel.org>
Cc: Stanislav Fomichev <stfomichev@gmail.com>,
	<netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
	<magnus.karlsson@intel.com>, <kuba@kernel.org>,
	<pabeni@redhat.com>, <horms@kernel.org>,
	<larysa.zaremba@intel.com>, <aleksander.lobakin@intel.com>
Subject: Re: [PATCH net 1/6] xsk: respect tailroom for ZC setups
Date: Tue, 17 Mar 2026 12:08:36 +0100
Message-ID: <abk2NBesEnygstnP@boxer>
In-Reply-To: <878qbqrexu.fsf@all.your.base.are.belong.to.us>

On Tue, Mar 17, 2026 at 10:19:25AM +0100, Björn Töpel wrote:
> Stanislav Fomichev <stfomichev@gmail.com> writes:
> 
> > On 03/16, Maciej Fijalkowski wrote:
> >> Multi-buffer XDP stores information about frags in skb_shared_info that
> >> sits at the tailroom of a packet. The storage space is reserved via
> >> xdp_data_hard_end():
> >> 
> >> 	((xdp)->data_hard_start + (xdp)->frame_sz -	\
> >> 	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> >> 
> >> and then we refer to it via the helper below:
> >> 
> >> static inline struct skb_shared_info *
> >> xdp_get_shared_info_from_buff(const struct xdp_buff *xdp)
> >> {
> >>         return (struct skb_shared_info *)xdp_data_hard_end(xdp);
> >> }
> >> 
> >> Currently we do not respect this tailroom space in multi-buffer AF_XDP
> >> ZC scenario. To address this, introduce xsk_pool_get_tailroom() and use
> >> it within xsk_pool_get_rx_frame_size() which is used in ZC drivers to
> >> configure length of HW Rx buffer.
> >> 
> >> xsk_pool_get_tailroom() is only reserving necessary space when pool is
> >> zc and underlying netdev supports zc multi-buffer. Since this function
> >> relies on pool->umem->zc setting, set it before ndo_bpf during zc
> >> configuration, so that driver that actually calls
> >> xsk_pool_get_rx_frame_size() inside ndo_bpf will get correct tailroom
> >> value.
> >> 
> >> Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
> >> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> >> ---
> >>  include/net/xdp_sock_drv.h | 21 ++++++++++++++++++++-
> >>  net/xdp/xsk_buff_pool.c    |  3 ++-
> >>  2 files changed, 22 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
> >> index 6b9ebae2dc95..13b2aae00737 100644
> >> --- a/include/net/xdp_sock_drv.h
> >> +++ b/include/net/xdp_sock_drv.h
> >> @@ -41,6 +41,19 @@ static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool)
> >>  	return XDP_PACKET_HEADROOM + pool->headroom;
> >>  }
> >>  
> >> +static inline u32 xsk_pool_get_tailroom(struct xsk_buff_pool *pool)
> >> +{
> >> +	struct xdp_umem *umem = pool->umem;
> >> +
> >> +	/* Reserve tailroom only for zero-copy pools that opted into
> >> +	 * multi-buffer. The reserved area is used for skb_shared_info,
> >> +	 * matching the XDP core's xdp_data_hard_end() layout.
> >> +	 */
> >> +	if (umem->zc && (umem->flags & XDP_UMEM_SG_FLAG))
> >> +		return SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >> +	return 0;
> >> +}
> >> +
> >>  static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool)
> >>  {
> >>  	return pool->chunk_size;
> >> @@ -48,7 +61,8 @@ static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool)
> >>  
> >>  static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool)
> >>  {
> >> -	return xsk_pool_get_chunk_size(pool) - xsk_pool_get_headroom(pool);
> >> +	return xsk_pool_get_chunk_size(pool) - xsk_pool_get_headroom(pool) -
> >> +	       xsk_pool_get_tailroom(pool);
> >>  }
> >>  
> >>  static inline u32 xsk_pool_get_rx_frag_step(struct xsk_buff_pool *pool)
> >> @@ -332,6 +346,11 @@ static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool)
> >>  	return 0;
> >>  }
> >
> > [..]
> >  
> >> +static inline u32 xsk_pool_get_tailroom(struct xsk_buff_pool *pool)
> >> +{
> >> +	return 0;
> >> +}
> >
> > Not sure it's needed? xsk_pool_get_tailroom is only used by
> > CONFIG_XDP_SOCKETS' version of xsk_pool_get_rx_frame_size.

To Stan: right, we could probably live without this.
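
To illustrate why the extra stub looks redundant, here is a minimal
userspace sketch, not the kernel code. All sketch_* names and
SKETCH_XDP_SOCKETS are hypothetical stand-ins, and the 320-byte tailroom
is an assumed value for SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
that may differ per architecture and config. The enabled branch subtracts
headroom and tailroom from the chunk size; the disabled branch returns 0
without ever calling the tailroom helper.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_XDP_SOCKETS; remove to model a disabled build. */
#define SKETCH_XDP_SOCKETS 1

/* Illustrative values, assumed for this sketch only. */
#define SKETCH_PACKET_HEADROOM 256u /* same value as XDP_PACKET_HEADROOM */
#define SKETCH_SHINFO_ALIGNED  320u /* assumed aligned size of skb_shared_info */

struct sketch_pool {
	unsigned int chunk_size;
	unsigned int headroom;   /* user-requested headroom */
	int zc_multi_buffer;     /* non-zero: ZC pool that opted into multi-buffer */
};

#ifdef SKETCH_XDP_SOCKETS
static unsigned int sketch_get_headroom(const struct sketch_pool *pool)
{
	return SKETCH_PACKET_HEADROOM + pool->headroom;
}

/* Reserve tailroom only for ZC multi-buffer pools. */
static unsigned int sketch_get_tailroom(const struct sketch_pool *pool)
{
	return pool->zc_multi_buffer ? SKETCH_SHINFO_ALIGNED : 0;
}

static unsigned int sketch_get_rx_frame_size(const struct sketch_pool *pool)
{
	return pool->chunk_size - sketch_get_headroom(pool) -
	       sketch_get_tailroom(pool);
}
#else
/* Disabled build: the only caller of the tailroom helper is gone, so no
 * tailroom stub is required here.
 */
static unsigned int sketch_get_rx_frame_size(const struct sketch_pool *pool)
{
	(void)pool;
	return 0;
}
#endif
```

With a 4096-byte chunk and no user headroom, the sketch yields 3520 bytes
for a ZC multi-buffer pool and 3840 bytes otherwise, under the assumed
sizes above.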

> >
> >>  static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool)
> >>  {
> >>  	return 0;
> >> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> >> index 37b7a68b89b3..2cfc19e363e3 100644
> >> --- a/net/xdp/xsk_buff_pool.c
> >> +++ b/net/xdp/xsk_buff_pool.c
> >> @@ -213,6 +213,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
> >>  	bpf.command = XDP_SETUP_XSK_POOL;
> >>  	bpf.xsk.pool = pool;
> >>  	bpf.xsk.queue_id = queue_id;
> >> +	pool->umem->zc = true;
> >>  
> >>  	netdev_ops_assert_locked(netdev);
> >>  	err = netdev->netdev_ops->ndo_bpf(netdev, &bpf);
> >> @@ -224,13 +225,13 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
> >>  		err = -EINVAL;
> >>  		goto err_unreg_xsk;
> >>  	}
> >> -	pool->umem->zc = true;
> >>  	pool->xdp_zc_max_segs = netdev->xdp_zc_max_segs;
> >>  	return 0;
> >>  
> >>  err_unreg_xsk:
> >>  	xp_disable_drv_zc(pool);
> >>  err_unreg_pool:
> >> +	pool->umem->zc = false;
> >>  	if (!force_zc)
> >>  		err = 0; /* fallback to copy mode */
> >>  	if (err) {
> >
> > I'm not super familiar with the shared umem patch, but is it safe to
> > unconditionally set pool->umem->zc = false here? xp_assign_dev_shared
> > looks at this umem->zc flag. Presumably other places do as well on
> > teardown?
> 
> Good catch!
> 
> I can elaborate a bit; the zero-copy property of umem is shared between
> all users (sockets) of that umem. IOW, all sockets sharing a umem
> inherit whatever the first socket negotiated.
> 
> So, we could get into something like:
> 
> 1. Socket A binds queue 0, ndo_bpf OK (umem->zc = true)
> 2. Socket B binds queue 1 via xp_assign_dev_shared()
>    reads umem->zc == true, so flags = XDP_ZEROCOPY
>    xp_assign_dev() sets umem->zc = true
>    ndo_bpf() NOK for queue 1 -> error path: umem->zc = false (oops)
> 3. Socket A is still active on queue 0 in ZC mode, but umem->zc is now
>    false
> 
> ...and we'll have a bunch of checks on umem->zc that now see incorrect
> state.
> 
> It follows that the zc flag shouldn't be toggled on a shared resource
> without checking whether other consumers exist. I think a per-pool zc
> flag or something similar is needed here. :-(

Larysa suggested using the pool->dev pointer instead of touching umem->zc;
let me see if that will be sufficient. Besides, this is currently broken,
as you guys are saying!
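
For reference, the sequence Björn describes can be modelled in a small
userspace sketch. This is not the kernel code: the sketch_* types and
functions are hypothetical, and driver_ok stands in for the result of
ndo_bpf(). The first function mirrors what this patch does to the shared
flag; the second shows one possible alternative with per-pool state, which
is only an assumption about how a fix could look.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_umem {
	bool zc;                 /* shared by every pool using this umem */
};

struct sketch_pool {
	struct sketch_umem *umem;
	bool zc;                 /* hypothetical per-pool flag */
};

/* Models the patched flow: set the shared flag before the driver call and
 * clear it unconditionally on failure.
 */
static int sketch_assign_shared_flag(struct sketch_pool *pool, bool driver_ok)
{
	pool->umem->zc = true;
	if (!driver_ok) {
		pool->umem->zc = false; /* clobbers state other sockets rely on */
		return -1;
	}
	return 0;
}

/* Models a per-pool alternative: only this pool's flag is rolled back, and
 * the shared flag is set only after the driver call succeeded.
 */
static int sketch_assign_per_pool_flag(struct sketch_pool *pool, bool driver_ok)
{
	pool->zc = true;
	if (!driver_ok) {
		pool->zc = false;
		return -1;
	}
	pool->umem->zc = true;
	return 0;
}
```

Binding pool A successfully and then failing on pool B leaves the shared
flag false in the first variant, even though A is still running in ZC
mode; in the second variant A's state and the shared flag are untouched.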

> 
> 
> Björn

Thread overview: 17+ messages
2026-03-16 17:45 [PATCH net 0/6] xsk: tailroom reservation and MTU validation Maciej Fijalkowski
2026-03-16 17:45 ` [PATCH net 1/6] xsk: respect tailroom for ZC setups Maciej Fijalkowski
2026-03-16 22:53   ` Stanislav Fomichev
2026-03-17  9:19     ` Björn Töpel
2026-03-17 11:08       ` Maciej Fijalkowski [this message]
2026-03-16 17:45 ` [PATCH net 2/6] ice: do not round up result of dbuff calculation for xsk pool Maciej Fijalkowski
2026-03-17  9:21   ` Björn Töpel
2026-03-16 17:45 ` [PATCH net 3/6] i40e: " Maciej Fijalkowski
2026-03-17  9:21   ` Björn Töpel
2026-03-16 17:45 ` [PATCH net 4/6] xsk: validate MTU against usable frame size on bind Maciej Fijalkowski
2026-03-17  9:30   ` Björn Töpel
2026-03-18 16:46   ` Alexander Lobakin
2026-03-16 17:45 ` [PATCH net 5/6] selftests: bpf: fix pkt grow tests Maciej Fijalkowski
2026-03-17  9:27   ` Björn Töpel
2026-03-17 10:57     ` Maciej Fijalkowski
2026-03-17 12:13       ` Björn Töpel
2026-03-16 17:45 ` [PATCH net 6/6] selftests: bpf: have a separate variable for drop test Maciej Fijalkowski
