From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: "Björn Töpel" <bjorn@kernel.org>
Cc: <netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
<magnus.karlsson@intel.com>, <stfomichev@gmail.com>,
<kuba@kernel.org>, <pabeni@redhat.com>, <horms@kernel.org>,
<larysa.zaremba@intel.com>, <aleksander.lobakin@intel.com>
Subject: Re: [PATCH v2 net 5/8] xsk: validate MTU against usable frame size on bind
Date: Fri, 20 Mar 2026 16:51:16 +0100 [thread overview]
Message-ID: <ab1s9J5yGxCbqB28@boxer> (raw)
In-Reply-To: <87qzpeopzr.fsf@all.your.base.are.belong.to.us>
On Fri, Mar 20, 2026 at 09:38:00AM +0100, Björn Töpel wrote:
> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
>
> > AF_XDP bind currently accepts zero-copy pool configurations without
> > verifying that the device MTU fits into the usable frame space provided
> > by the UMEM chunk.
> >
> > This becomes a problem since we started to respect tailroom which is
> > subtracted from chunk_size (among with headroom). 2k chunk size might
> > not provide enough space for standard 1500 MTU, so let us catch such
> > settings at bind time.
> >
> > This prevents creating an already-invalid setup and complements the
> > MTU change restriction for devices with an attached XSK pool.
> >
> > Currently xp_assign_dev_shared() is missing XDP_USE_SG being propagated
> > to flags so set it in order to preserve mtu check that is supposed to be
> > done only when no multi-buffer setup is in picture.
> >
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
>
> Fixes tag?
My reasoning was that since this came up due to respecting tailroom I went
with no fixes tag, but missing mbuf flag setting for shared umem was an
existing bug, so we could pick something now.
>
> This got me thinking; Nothing in the xsk core prevents MTU from being
> changes while xsk is runing? Some drivers do! Not for this patch, but
> maybe xsk should listen to MTU notifiers?
Nice, I had exact patch locally, attaching at the bottom [0].
However I withdrew it as this would yield a rework on xskxceiver - test
suite actually changes MTU while keeping XSK resources alive.
We can get back to this idea but I didn't want to stir up the pot too much
in this set.
>
> Seems like we're in a bind-time TOCTOU gap...
>
> > ---
> > net/xdp/xsk_buff_pool.c | 16 ++++++++++++++--
> > 1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> > index 37b7a68b89b3..e9377b05118b 100644
> > --- a/net/xdp/xsk_buff_pool.c
> > +++ b/net/xdp/xsk_buff_pool.c
> > @@ -157,6 +157,7 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
> > int xp_assign_dev(struct xsk_buff_pool *pool,
> > struct net_device *netdev, u16 queue_id, u16 flags)
> > {
> > + bool mbuf = flags & XDP_USE_SG;
> > bool force_zc, force_copy;
> > struct netdev_bpf bpf;
> > int err = 0;
> > @@ -178,7 +179,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
> > if (err)
> > return err;
> >
> > - if (flags & XDP_USE_SG)
> > + if (mbuf)
> > pool->umem->flags |= XDP_UMEM_SG_FLAG;
> >
> > if (flags & XDP_USE_NEED_WAKEUP)
> > @@ -200,10 +201,18 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
> > goto err_unreg_pool;
> > }
> >
> > - if (netdev->xdp_zc_max_segs == 1 && (flags & XDP_USE_SG)) {
> > + if (netdev->xdp_zc_max_segs == 1 && mbuf) {
> > err = -EOPNOTSUPP;
> > goto err_unreg_pool;
> > }
> > +#define ETH_PAD_LEN (ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)
>
> Yuk! Move this somewhere else.
>
> > + if (!mbuf) {
> > + if (READ_ONCE(netdev->mtu) + ETH_PAD_LEN >
>
> I think READ_ONCE sends wrong signal to readers. We're in an
> ASSERT_RTNL() region.
Good point!
>
> > + xsk_pool_get_rx_frame_size(pool)) {
> > + err = -EINVAL;
> > + goto err_unreg_pool;
> > + }
> > + }
> >
> > if (dev_get_min_mp_channel_count(netdev)) {
> > err = -EBUSY;
> > @@ -247,6 +256,9 @@ int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_sock *umem_xs,
> > struct xdp_umem *umem = umem_xs->umem;
> >
> > flags = umem->zc ? XDP_ZEROCOPY : XDP_COPY;
> > + if (umem->flags & XDP_UMEM_SG_FLAG)
> > + flags |= XDP_USE_SG;
> > +
> > if (umem_xs->pool->uses_need_wakeup)
> > flags |= XDP_USE_NEED_WAKEUP;
> >
> > --
> > 2.43.0
[0]:
Subject: [PATCH net 4/5] xsk: forbid MTU changes while an XSK pool is attached
AF_XDP pool setup depends on the netdev configuration that is in effect
at bind time.
In particular, the usable frame size seen by zero-copy drivers is
derived from the UMEM chunk layout. Changing MTU after a pool has been
attached can invalidate that setup and leave the device and pool with
incompatible packet geometry.
Reject NETDEV_PRECHANGEMTU when an XSK pool is attached to the device.
This keeps the policy in the AF_XDP code, avoids touching individual
ndo_change_mtu() implementations, and stops the MTU change before the
driver callback is reached.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
net/xdp/xsk.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index e6530996053b..73cbd5774031 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1790,6 +1790,18 @@ static int xsk_mmap(struct file *file, struct socket *sock,
return remap_vmalloc_range(vma, q->ring, 0);
}
+static bool xsk_dev_has_pool(struct net_device *dev)
+{
+ u32 i, n;
+
+ n = max_t(u32, dev->real_num_rx_queues, dev->real_num_tx_queues);
+ for (i = 0; i < n; i++)
+ if (xsk_get_pool_from_qid(dev, i))
+ return true;
+
+ return false;
+}
+
static int xsk_notifier(struct notifier_block *this,
unsigned long msg, void *ptr)
{
@@ -1798,6 +1810,11 @@ static int xsk_notifier(struct notifier_block *this,
struct sock *sk;
switch (msg) {
+ case NETDEV_PRECHANGEMTU:
+ if (xsk_dev_has_pool(dev))
+ return notifier_from_errno(-EBUSY);
+ break;
+
case NETDEV_UNREGISTER:
mutex_lock(&net->xdp.lock);
sk_for_each(sk, &net->xdp.list) {
--
2.43.0
next prev parent reply other threads:[~2026-03-20 15:51 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-19 17:55 [PATCH v2 net 0/8] xsk: tailroom reservation and MTU validation Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 1/8] xsk: tighten UMEM headroom validation to account for tailroom and min frame Maciej Fijalkowski
2026-03-20 8:13 ` Björn Töpel
2026-03-20 15:01 ` Stanislav Fomichev
2026-03-19 17:55 ` [PATCH v2 net 2/8] xsk: respect tailroom for ZC setups Maciej Fijalkowski
2026-03-20 8:14 ` Björn Töpel
2026-03-20 15:01 ` Stanislav Fomichev
2026-03-19 17:55 ` [PATCH v2 net 3/8] ice: do not round up result of dbuf calculation for xsk pool Maciej Fijalkowski
2026-03-20 8:18 ` Björn Töpel
2026-03-20 15:57 ` Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 4/8] i40e: do not round up result of dbuff " Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 5/8] xsk: validate MTU against usable frame size on bind Maciej Fijalkowski
2026-03-20 8:38 ` Björn Töpel
2026-03-20 15:51 ` Maciej Fijalkowski [this message]
2026-03-21 12:21 ` Maciej Fijalkowski
2026-03-19 17:55 ` [PATCH v2 net 6/8] selftests: bpf: fix pkt grow tests Maciej Fijalkowski
2026-03-20 8:40 ` Björn Töpel
2026-03-19 17:55 ` [PATCH v2 net 7/8] selftests: bpf: have a separate variable for drop test Maciej Fijalkowski
2026-03-20 8:41 ` Björn Töpel
2026-03-19 17:55 ` [PATCH v2 net 8/8] selftests: bpf: adjust rx_dropped xskxceiver's test to respect tailroom Maciej Fijalkowski
2026-03-20 8:42 ` Björn Töpel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ab1s9J5yGxCbqB28@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=aleksander.lobakin@intel.com \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=larysa.zaremba@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stfomichev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox