From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: "Björn Töpel" <bjorn@kernel.org>
Cc: <netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
<magnus.karlsson@intel.com>, <stfomichev@gmail.com>,
<kuba@kernel.org>, <pabeni@redhat.com>, <horms@kernel.org>,
<larysa.zaremba@intel.com>, <aleksander.lobakin@intel.com>
Subject: Re: [PATCH v2 net 5/8] xsk: validate MTU against usable frame size on bind
Date: Fri, 20 Mar 2026 16:51:16 +0100 [thread overview]
Message-ID: <ab1s9J5yGxCbqB28@boxer> (raw)
In-Reply-To: <87qzpeopzr.fsf@all.your.base.are.belong.to.us>
On Fri, Mar 20, 2026 at 09:38:00AM +0100, Björn Töpel wrote:
> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
>
> > AF_XDP bind currently accepts zero-copy pool configurations without
> > verifying that the device MTU fits into the usable frame space provided
> > by the UMEM chunk.
> >
> > This becomes a problem since we started to respect tailroom which is
> > subtracted from chunk_size (among with headroom). 2k chunk size might
> > not provide enough space for standard 1500 MTU, so let us catch such
> > settings at bind time.
> >
> > This prevents creating an already-invalid setup and complements the
> > MTU change restriction for devices with an attached XSK pool.
> >
> > Currently xp_assign_dev_shared() is missing XDP_USE_SG being propagated
> > to flags so set it in order to preserve mtu check that is supposed to be
> > done only when no multi-buffer setup is in picture.
> >
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
>
> Fixes tag?
My reasoning was that since this came up due to respecting tailroom I went
with no fixes tag, but missing mbuf flag setting for shared umem was an
existing bug, so we could pick something now.
>
> This got me thinking; Nothing in the xsk core prevents MTU from being
> changes while xsk is runing? Some drivers do! Not for this patch, but
> maybe xsk should listen to MTU notifiers?
Nice, I had exact patch locally, attaching at the bottom [0].
However I withdrew it as this would yield a rework on xskxceiver - test
suite actually changes MTU while keeping XSK resources alive.
We can get back to this idea but I didn't want to stir up the pot too much
in this set.
>
> Seems like we're in a bind-time TOCTOU gap...
>
> > ---
> > net/xdp/xsk_buff_pool.c | 16 ++++++++++++++--
> > 1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> > index 37b7a68b89b3..e9377b05118b 100644
> > --- a/net/xdp/xsk_buff_pool.c
> > +++ b/net/xdp/xsk_buff_pool.c
> > @@ -157,6 +157,7 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
> > int xp_assign_dev(struct xsk_buff_pool *pool,
> > struct net_device *netdev, u16 queue_id, u16 flags)
> > {
> > + bool mbuf = flags & XDP_USE_SG;
> > bool force_zc, force_copy;
> > struct netdev_bpf bpf;
> > int err = 0;
> > @@ -178,7 +179,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
> > if (err)
> > return err;
> >
> > - if (flags & XDP_USE_SG)
> > + if (mbuf)
> > pool->umem->flags |= XDP_UMEM_SG_FLAG;
> >
> > if (flags & XDP_USE_NEED_WAKEUP)
> > @@ -200,10 +201,18 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
> > goto err_unreg_pool;
> > }
> >
> > - if (netdev->xdp_zc_max_segs == 1 && (flags & XDP_USE_SG)) {
> > + if (netdev->xdp_zc_max_segs == 1 && mbuf) {
> > err = -EOPNOTSUPP;
> > goto err_unreg_pool;
> > }
> > +#define ETH_PAD_LEN (ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)
>
> Yuk! Move this somewhere else.
>
> > + if (!mbuf) {
> > + if (READ_ONCE(netdev->mtu) + ETH_PAD_LEN >
>
> I think READ_ONCE sends wrong signal to readers. We're in an
> ASSERT_RTNL() region.
Good point!
>
> > + xsk_pool_get_rx_frame_size(pool)) {
> > + err = -EINVAL;
> > + goto err_unreg_pool;
> > + }
> > + }
> >
> > if (dev_get_min_mp_channel_count(netdev)) {
> > err = -EBUSY;
> > @@ -247,6 +256,9 @@ int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_sock *umem_xs,
> > struct xdp_umem *umem = umem_xs->umem;
> >
> > flags = umem->zc ? XDP_ZEROCOPY : XDP_COPY;
> > + if (umem->flags & XDP_UMEM_SG_FLAG)
> > + flags |= XDP_USE_SG;
> > +
> > if (umem_xs->pool->uses_need_wakeup)
> > flags |= XDP_USE_NEED_WAKEUP;
> >
> > --
> > 2.43.0
[0]:
Subject: [PATCH net 4/5] xsk: forbid MTU changes while an XSK pool is attached
AF_XDP pool setup depends on the netdev configuration that is in effect
at bind time.
In particular, the usable frame size seen by zero-copy drivers is
derived from the UMEM chunk layout. Changing MTU after a pool has been
attached can invalidate that setup and leave the device and pool with
incompatible packet geometry.
Reject NETDEV_PRECHANGEMTU when an XSK pool is attached to the device.
This keeps the policy in the AF_XDP code, avoids touching individual
ndo_change_mtu() implementations, and stops the MTU change before the
driver callback is reached.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
net/xdp/xsk.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index e6530996053b..73cbd5774031 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1790,6 +1790,18 @@ static int xsk_mmap(struct file *file, struct socket *sock,
return remap_vmalloc_range(vma, q->ring, 0);
}
+static bool xsk_dev_has_pool(struct net_device *dev)
+{
+ u32 i, n;
+
+ n = max_t(u32, dev->real_num_rx_queues, dev->real_num_tx_queues);
+ for (i = 0; i < n; i++)
+ if (xsk_get_pool_from_qid(dev, i))
+ return true;
+
+ return false;
+}
+
static int xsk_notifier(struct notifier_block *this,
unsigned long msg, void *ptr)
{
@@ -1798,6 +1810,11 @@ static int xsk_notifier(struct notifier_block *this,
struct sock *sk;
switch (msg) {
+ case NETDEV_PRECHANGEMTU:
+ if (xsk_dev_has_pool(dev))
+ return notifier_from_errno(-EBUSY);
+ break;
+
case NETDEV_UNREGISTER:
mutex_lock(&net->xdp.lock);
sk_for_each(sk, &net->xdp.list) {
--
2.43.0
next prev parent reply other threads:[~2026-03-20 15:51 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-19 17:55 [PATCH v2 net 0/8] xsk: tailroom reservation and MTU validation Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 1/8] xsk: tighten UMEM headroom validation to account for tailroom and min frame Maciej Fijalkowski
2026-03-20 8:13 ` Björn Töpel
2026-03-20 15:01 ` Stanislav Fomichev
2026-03-19 17:55 ` [PATCH v2 net 2/8] xsk: respect tailroom for ZC setups Maciej Fijalkowski
2026-03-20 8:14 ` Björn Töpel
2026-03-20 15:01 ` Stanislav Fomichev
2026-03-19 17:55 ` [PATCH v2 net 3/8] ice: do not round up result of dbuf calculation for xsk pool Maciej Fijalkowski
2026-03-20 8:18 ` Björn Töpel
2026-03-20 15:57 ` Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 4/8] i40e: do not round up result of dbuff " Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 5/8] xsk: validate MTU against usable frame size on bind Maciej Fijalkowski
2026-03-20 8:38 ` Björn Töpel
2026-03-20 15:51 ` Maciej Fijalkowski [this message]
2026-03-21 12:21 ` Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 6/8] selftests: bpf: fix pkt grow tests Maciej Fijalkowski
2026-03-20 8:40 ` Björn Töpel
2026-03-19 17:55 ` [PATCH v2 net 7/8] selftests: bpf: have a separate variable for drop test Maciej Fijalkowski
2026-03-20 8:41 ` Björn Töpel
2026-03-19 17:55 ` [PATCH v2 net 8/8] selftests: bpf: adjust rx_dropped xskxceiver's test to respect tailroom Maciej Fijalkowski
2026-03-20 8:42 ` Björn Töpel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ab1s9J5yGxCbqB28@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=aleksander.lobakin@intel.com \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=larysa.zaremba@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stfomichev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.