From: Maoyi Xie <maoyixie.tju@gmail.com>
To: "David S . Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>,
David Ahern <dsahern@kernel.org>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
Willem de Bruijn <willemb@google.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org, Maoyi Xie <maoyi.xie@ntu.edu.sg>
Subject: [PATCH net v8 0/2] ipv6: flowlabel: per-netns budget for unprivileged callers
Date: Wed, 6 May 2026 16:24:14 +0800
Message-ID: <20260506082416.2259567-1-maoyixie.tju@gmail.com>
From: Maoyi Xie <maoyi.xie@ntu.edu.sg>

This series fixes the cross-tenant DoS in net/ipv6/ip6_flowlabel.c.

v1 through v6 were single-patch postings, each in its own thread.
v6 review pointed out that the existing fl_size read in
mem_check() and the corresponding write in fl_intern() are not in
the same critical section. v7 split the work into 2 patches.

Patch 1/2 is a prerequisite. It moves spin_lock_bh(&ip6_fl_lock)
and the matching unlock from fl_intern() into its only caller
ipv6_flowlabel_get(), so the mem_check() call runs under the same
critical section as the fl_intern() insert. With all writers and
the read of fl_size under the lock, fl_size is converted from
atomic_t to plain int. This is independent of the per-netns
budget. It also makes 2/2 backportable without conflicts.
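
For readers who want the shape of 1/2 without opening the diff, here
is a minimal userspace sketch of "check and insert under one lock".
It is not the patch code. The model_* names, the fl_lock_held flag
and the simplified "table full" test are assumptions made for
illustration; a pthread mutex stands in for ip6_fl_lock and assert()
stands in for lockdep_assert_held().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define FL_MAX_SIZE 4096	/* pre-series value quoted in this letter */

/* Models ip6_fl_lock; the real lock is a BH spinlock, not a mutex. */
static pthread_mutex_t fl_lock = PTHREAD_MUTEX_INITIALIZER;
static bool fl_lock_held;	/* hypothetical stand-in for lockdep state */
static int fl_size;		/* plain int: only touched under fl_lock */

/* Read side. Simplified: only the "table full" condition is modelled. */
static int model_mem_check(void)
{
	assert(fl_lock_held);	/* models lockdep_assert_held() */
	return fl_size >= FL_MAX_SIZE ? -ENOBUFS : 0;
}

/* Write side. The caller already holds the lock. */
static void model_fl_intern(void)
{
	assert(fl_lock_held);
	fl_size++;
}

/* The caller owns the lock, so check and insert are one critical section. */
int model_flowlabel_get(void)
{
	int err;

	pthread_mutex_lock(&fl_lock);
	fl_lock_held = true;
	err = model_mem_check();
	if (!err)
		model_fl_intern();
	fl_lock_held = false;
	pthread_mutex_unlock(&fl_lock);
	return err;
}

/* Locked read of the counter, for inspection from outside. */
int model_fl_size(void)
{
	int v;

	pthread_mutex_lock(&fl_lock);
	v = fl_size;
	pthread_mutex_unlock(&fl_lock);
	return v;
}
```

Because no other thread can change fl_size between the check and the
increment, the counter needs no atomic type in this model.
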
Patch 2/2 is the v6 patch, rebased on 1/2, with these changes:

  - flowlabel_count is a plain int rather than atomic_t, since the
    previous patch put all writers and readers under ip6_fl_lock.
  - In ip6_fl_gc(), fl_free() is now placed below the fl_size
    and flowlabel_count decrements, removing the v6 cache of
    fl->fl_net.
  - In ip6_fl_purge(), fl_free() stays in its original position.
    The function argument net is used for flowlabel_count.
  - mem_check() uses spaces around the / operator on all four
    expressions, addressing the checkpatch note in v6 review.

Numeric budget (preserved from v6):

  pre-patch:
    global non-CAP_NET_ADMIN budget = FL_MAX_SIZE - FL_MAX_SIZE / 4
                                    = 4096 - 1024 = 3072
    per-actor reach                 = 3072

  post-patch:
    FL_MAX_SIZE doubled to 8192
    global non-CAP_NET_ADMIN budget = 8192 - 2048 = 6144
    per-netns ceiling               = 6144 / 2 = 3072
    per-actor reach                 = 3072 (preserved)

CAP_NET_ADMIN against init_user_ns still bypasses both caps.
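
The arithmetic above can be checked at compile time. The sketch
below only restates the numbers in this letter. The macro and
function names are hypothetical and do not appear in the patches;
the per-netns expression follows the unpriv_user_limit formula
quoted in the v5 note.

```c
#include <assert.h>	/* static_assert (C11) */

enum {
	FL_MAX_SIZE_PRE  = 4096,	/* before the series */
	FL_MAX_SIZE_POST = 8192,	/* after 2/2 doubles it */
};

/* Global budget for callers without CAP_NET_ADMIN: max minus one quarter. */
#define UNPRIV_GLOBAL(max)	((max) - (max) / 4)
/* Per-netns ceiling: half of the global unprivileged budget. */
#define UNPRIV_PER_NETNS(max)	(UNPRIV_GLOBAL(max) / 2)

static_assert(UNPRIV_GLOBAL(FL_MAX_SIZE_PRE) == 3072,
	      "pre-patch global unprivileged budget");
static_assert(UNPRIV_GLOBAL(FL_MAX_SIZE_POST) == 6144,
	      "post-patch global unprivileged budget");
static_assert(UNPRIV_PER_NETNS(FL_MAX_SIZE_POST) == 3072,
	      "post-patch per-netns ceiling equals the old per-actor reach");

/* Runtime helpers wrapping the same expressions. */
int unpriv_global_budget(int fl_max_size)
{
	return UNPRIV_GLOBAL(fl_max_size);
}

int unpriv_per_netns_ceiling(int fl_max_size)
{
	return UNPRIV_PER_NETNS(fl_max_size);
}
```

Doubling FL_MAX_SIZE is what lets the per-netns ceiling be half of
the global budget while a single netns still reaches 3072.
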
Reproducer (KASAN VM, 4 cores, qemu): unprivileged netns A holds
3072 flowlabels via 100 procs. Fresh unprivileged netns B then
allocates 32 flowlabels (the FL_MAX_PER_SOCK ceiling for one
socket), the same as a clean baseline. Without the per-netns
ceiling, netns A could push fl_size past FL_MAX_SIZE - FL_MAX_SIZE
/ 4 and netns B would see allocations denied.
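
For orientation, this is roughly how one socket requests flow labels
from userspace. It is a sketch, not the reproducer used above: the
function name, the MAX_ATTEMPTS cap and the fd00::1 destination are
assumptions chosen for illustration, and it assumes glibc headers on
Linux. It uses the IPV6_FLOWLABEL_MGR socket option with
struct in6_flowlabel_req from <linux/in6.h>.

```c
#include <netinet/in.h>		/* first, so the libc definitions win */
#include <linux/in6.h>		/* struct in6_flowlabel_req, IPV6_FL_* */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_ATTEMPTS 64		/* illustrative cap, above FL_MAX_PER_SOCK */

/*
 * Request new flow labels on one UDP socket until the kernel refuses
 * or MAX_ATTEMPTS is reached. Returns the number obtained, or -1 if
 * no IPv6 socket could be opened.
 */
int count_flowlabels_one_socket(void)
{
	int fd = socket(AF_INET6, SOCK_DGRAM, 0);
	int got = 0;

	if (fd < 0)
		return -1;

	for (int i = 0; i < MAX_ATTEMPTS; i++) {
		struct in6_flowlabel_req req;

		memset(&req, 0, sizeof(req));
		req.flr_action = IPV6_FL_A_GET;
		req.flr_flags = IPV6_FL_F_CREATE;
		req.flr_share = IPV6_FL_S_EXCL;
		/* flr_label stays 0, so the kernel picks a label. */
		/* The destination must not be :: or v4-mapped. */
		req.flr_dst.s6_addr[0] = 0xfd;
		req.flr_dst.s6_addr[15] = 0x01;

		if (setsockopt(fd, IPPROTO_IPV6, IPV6_FLOWLABEL_MGR,
			       &req, sizeof(req)) < 0)
			break;	/* e.g. ENOBUFS once a limit is hit */
		got++;
	}

	close(fd);
	return got;
}
```

Run in a fresh unprivileged netns, a loop like this is one way to
observe whether a single socket still reaches the per-socket ceiling
while another netns holds its full budget.
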

v8:
  - 1/2: replaced the "Caller must hold ip6_fl_lock" comment in
    fl_intern() with lockdep_assert_held(&ip6_fl_lock), matching
    the runtime check already used in mem_check(), per Willem's
    review.
  - 1/2: added Fixes: 1da177e4c3f4 trailer to match 2/2, per
    Willem's review.
  - Carried forward Reviewed-by: Willem de Bruijn on both
    patches.
  - No code change beyond the lockdep_assert_held swap.

v7:
  - 2-patch series: 1/2 (lock prep) and 2/2 (v6 rebased on 1/2).
  - 2/2: flowlabel_count int, fl_free() reorder removed in
    ip6_fl_purge(), checkpatch / spacing in mem_check() fixed.

v6: rebased onto current net (resolves the conflict on
    include/net/netns/ipv6.h that v5 hit). fl_free() restored
    to its pre-series position, with fl->fl_net cached locally
    in ip6_fl_gc().

v5: replaced the per-netns ceiling FL_MAX_SIZE/8 with the
    computed unpriv_user_limit = (FL_MAX_SIZE - FL_MAX_SIZE/4)/2,
    which evaluates to 3072.

v4: addressed Willem's v3 review on netdev. Dropped the
    flowlabel_has_excl cacheline argument in favour of "fills
    the existing 4-byte hole after ipmr_seq".

v3: addressed Willem's review on the private security@ thread.
    Merged FL_MAX_SIZE doubling, dropped test data, moved
    flowlabel_count near ipmr_seq, inlined fl->fl_net in
    ip6_fl_gc().

v2: per-netns counter + cap, sent to security@ as a 2-patch
    series.

v1: fix-shape sketch in original disclosure.

Maoyi Xie (2):
  ipv6: flowlabel: take ip6_fl_lock across mem_check and fl_intern
  ipv6: flowlabel: enforce per-netns limit for unprivileged callers

 include/net/netns/ipv6.h |  1 +
 net/ipv6/ip6_flowlabel.c | 46 +++++++++++++++++++++++++++-------------
 2 files changed, 32 insertions(+), 15 deletions(-)

base-commit: ebb639024ebd47a13a511cce6ae630c15e4b3126
--
2.34.1