public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: "Björn Töpel" <bjorn@kernel.org>
Cc: <netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
	<magnus.karlsson@intel.com>, <stfomichev@gmail.com>,
	<kuba@kernel.org>, <pabeni@redhat.com>, <horms@kernel.org>,
	<larysa.zaremba@intel.com>, <aleksander.lobakin@intel.com>
Subject: Re: [PATCH v2 net 5/8] xsk: validate MTU against usable frame size on bind
Date: Fri, 20 Mar 2026 16:51:16 +0100	[thread overview]
Message-ID: <ab1s9J5yGxCbqB28@boxer> (raw)
In-Reply-To: <87qzpeopzr.fsf@all.your.base.are.belong.to.us>

On Fri, Mar 20, 2026 at 09:38:00AM +0100, Björn Töpel wrote:
> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
> 
> > AF_XDP bind currently accepts zero-copy pool configurations without
> > verifying that the device MTU fits into the usable frame space provided
> > by the UMEM chunk.
> >
> > This becomes a problem since we started to respect tailroom which is
> > subtracted from chunk_size (among with headroom). 2k chunk size might
> > not provide enough space for standard 1500 MTU, so let us catch such
> > settings at bind time.
> >
> > This prevents creating an already-invalid setup and complements the
> > MTU change restriction for devices with an attached XSK pool.
> >
> > Currently xp_assign_dev_shared() is missing XDP_USE_SG being propagated
> > to flags so set it in order to preserve mtu check that is supposed to be
> > done only when no multi-buffer setup is in picture.
> >
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> 
> Fixes tag?

My reasoning was that since this came up due to respecting tailroom I went
with no fixes tag, but missing mbuf flag setting for shared umem was an
existing bug, so we could pick something now.

> 
> This got me thinking; Nothing in the xsk core prevents MTU from being
> changes while xsk is runing? Some drivers do! Not for this patch, but
> maybe xsk should listen to MTU notifiers?

Nice, I had exact patch locally, attaching at the bottom [0].
However I withdrew it as this would yield a rework on xskxceiver - test
suite actually changes MTU while keeping XSK resources alive.

We can get back to this idea but I didn't want to stir up the pot too much
in this set.

> 
> Seems like we're in a bind-time TOCTOU gap...
> 
> > ---
> >  net/xdp/xsk_buff_pool.c | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> > index 37b7a68b89b3..e9377b05118b 100644
> > --- a/net/xdp/xsk_buff_pool.c
> > +++ b/net/xdp/xsk_buff_pool.c
> > @@ -157,6 +157,7 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
> >  int xp_assign_dev(struct xsk_buff_pool *pool,
> >  		  struct net_device *netdev, u16 queue_id, u16 flags)
> >  {
> > +	bool mbuf = flags & XDP_USE_SG;
> >  	bool force_zc, force_copy;
> >  	struct netdev_bpf bpf;
> >  	int err = 0;
> > @@ -178,7 +179,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
> >  	if (err)
> >  		return err;
> >  
> > -	if (flags & XDP_USE_SG)
> > +	if (mbuf)
> >  		pool->umem->flags |= XDP_UMEM_SG_FLAG;
> >  
> >  	if (flags & XDP_USE_NEED_WAKEUP)
> > @@ -200,10 +201,18 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
> >  		goto err_unreg_pool;
> >  	}
> >  
> > -	if (netdev->xdp_zc_max_segs == 1 && (flags & XDP_USE_SG)) {
> > +	if (netdev->xdp_zc_max_segs == 1 && mbuf) {
> >  		err = -EOPNOTSUPP;
> >  		goto err_unreg_pool;
> >  	}
> > +#define ETH_PAD_LEN (ETH_HLEN + 2 * VLAN_HLEN  + ETH_FCS_LEN)
> 
> Yuk! Move this somewhere else.
> 
> > +	if (!mbuf) {
> > +		if (READ_ONCE(netdev->mtu) +  ETH_PAD_LEN >
> 
> I think READ_ONCE sends wrong signal to readers. We're in an
> ASSERT_RTNL() region.

Good point!

> 
> > +		    xsk_pool_get_rx_frame_size(pool)) {
> > +			err = -EINVAL;
> > +			goto err_unreg_pool;
> > +		}
> > +	}
> >  
> >  	if (dev_get_min_mp_channel_count(netdev)) {
> >  		err = -EBUSY;
> > @@ -247,6 +256,9 @@ int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_sock *umem_xs,
> >  	struct xdp_umem *umem = umem_xs->umem;
> >  
> >  	flags = umem->zc ? XDP_ZEROCOPY : XDP_COPY;
> > +	if (umem->flags & XDP_UMEM_SG_FLAG)
> > +		flags |= XDP_USE_SG;
> > +
> >  	if (umem_xs->pool->uses_need_wakeup)
> >  		flags |= XDP_USE_NEED_WAKEUP;
> >  
> > -- 
> > 2.43.0

[0]:
Subject: [PATCH net 4/5] xsk: forbid MTU changes while an XSK pool is attached

AF_XDP pool setup depends on the netdev configuration that is in effect
at bind time.

In particular, the usable frame size seen by zero-copy drivers is
derived from the UMEM chunk layout. Changing MTU after a pool has been
attached can invalidate that setup and leave the device and pool with
incompatible packet geometry.

Reject NETDEV_PRECHANGEMTU when an XSK pool is attached to the device.
This keeps the policy in the AF_XDP code, avoids touching individual
ndo_change_mtu() implementations, and stops the MTU change before the
driver callback is reached.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 net/xdp/xsk.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index e6530996053b..73cbd5774031 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1790,6 +1790,18 @@ static int xsk_mmap(struct file *file, struct socket *sock,
 	return remap_vmalloc_range(vma, q->ring, 0);
 }
 
+static bool xsk_dev_has_pool(struct net_device *dev)
+{
+	u32 i, n;
+
+	n = max_t(u32, dev->real_num_rx_queues, dev->real_num_tx_queues);
+	for (i = 0; i < n; i++)
+		if (xsk_get_pool_from_qid(dev, i))
+			return true;
+
+	return false;
+}
+
 static int xsk_notifier(struct notifier_block *this,
 			unsigned long msg, void *ptr)
 {
@@ -1798,6 +1810,11 @@ static int xsk_notifier(struct notifier_block *this,
 	struct sock *sk;
 
 	switch (msg) {
+	case NETDEV_PRECHANGEMTU:
+		if (xsk_dev_has_pool(dev))
+			return notifier_from_errno(-EBUSY);
+		break;
+
 	case NETDEV_UNREGISTER:
 		mutex_lock(&net->xdp.lock);
 		sk_for_each(sk, &net->xdp.list) {
-- 
2.43.0


  reply	other threads:[~2026-03-20 15:51 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-19 17:55 [PATCH v2 net 0/8] xsk: tailroom reservation and MTU validation Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 1/8] xsk: tighten UMEM headroom validation to account for tailroom and min frame Maciej Fijalkowski
2026-03-20  8:13   ` Björn Töpel
2026-03-20 15:01   ` Stanislav Fomichev
2026-03-19 17:55 ` [PATCH v2 net 2/8] xsk: respect tailroom for ZC setups Maciej Fijalkowski
2026-03-20  8:14   ` Björn Töpel
2026-03-20 15:01   ` Stanislav Fomichev
2026-03-19 17:55 ` [PATCH v2 net 3/8] ice: do not round up result of dbuf calculation for xsk pool Maciej Fijalkowski
2026-03-20  8:18   ` Björn Töpel
2026-03-20 15:57     ` Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 4/8] i40e: do not round up result of dbuff " Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 5/8] xsk: validate MTU against usable frame size on bind Maciej Fijalkowski
2026-03-20  8:38   ` Björn Töpel
2026-03-20 15:51     ` Maciej Fijalkowski [this message]
2026-03-21 12:21       ` Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 6/8] selftests: bpf: fix pkt grow tests Maciej Fijalkowski
2026-03-20  8:40   ` Björn Töpel
2026-03-19 17:55 ` [PATCH v2 net 7/8] selftests: bpf: have a separate variable for drop test Maciej Fijalkowski
2026-03-20  8:41   ` Björn Töpel
2026-03-19 17:55 ` [PATCH v2 net 8/8] selftests: bpf: adjust rx_dropped xskxceiver's test to respect tailroom Maciej Fijalkowski
2026-03-20  8:42   ` Björn Töpel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ab1s9J5yGxCbqB28@boxer \
    --to=maciej.fijalkowski@intel.com \
    --cc=aleksander.lobakin@intel.com \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=larysa.zaremba@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=stfomichev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox