From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Maoyi Xie <maoyixie.tju@gmail.com>,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
edumazet@google.com
Cc: dsahern@kernel.org, kuznet@ms2.inr.ac.ru, willemb@google.com,
willemdebruijn.kernel@gmail.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH net v6] ipv6: flowlabel: enforce per-netns limit for unprivileged callers
Date: Sun, 03 May 2026 16:43:37 -0400
Message-ID: <willemdebruijn.kernel.3269daabfa48e@gmail.com>
In-Reply-To: <20260502150918.4171847-1-maoyi.xie@ntu.edu.sg>
Maoyi Xie wrote:
> fl_size, fl_ht and ip6_fl_lock in net/ipv6/ip6_flowlabel.c are file
> scope and shared across netns. mem_check() reads fl_size to decide
> whether to deny non-CAP_NET_ADMIN callers; capable() runs against
> init_user_ns, so an unprivileged user in any non-init userns can
> push fl_size past FL_MAX_SIZE - FL_MAX_SIZE/4 and starve every
> other unprivileged userns on the host.
>
> Add struct netns_ipv6::flowlabel_count, bumped and decremented next
> to fl_size in fl_intern, ip6_fl_gc and ip6_fl_purge. The new field
> is placed in the existing 4-byte hole after ipmr_seq, so struct
> netns_ipv6 stays the same size on 64-bit builds.
>
> Bump FL_MAX_SIZE from 4096 to 8192. It has been 4096 since the file
> was added; machines and connection counts have grown.
>
> mem_check() folds an extra per-netns ceiling into the existing
> non-CAP_NET_ADMIN conditional. The ceiling is half of the total
> budget that unprivileged callers have ever been able to use, i.e.
> (FL_MAX_SIZE - FL_MAX_SIZE/4) / 2 = 3072 entries. With FL_MAX_SIZE
> doubled, this preserves the original per-user reach (~3K, what an
> unprivileged caller could already obtain before this change) while
> forcing an attacker to spread allocations across at least two
> netns to exhaust the global non-CAP_NET_ADMIN budget.
>
> CAP_NET_ADMIN against init_user_ns still bypasses both caps.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Suggested-by: Willem de Bruijn <willemb@google.com>
> Cc: stable@vger.kernel.org # v5.15+
> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
> ---
> v6 (this submission, addressing v5 review by Willem):
> - Rebased onto current net (resolves the conflict on
> include/net/netns/ipv6.h that v5 hit. ipmr_seq is now
> atomic_t but remains 4 bytes, so flowlabel_count still
> fills the 4-byte hole after it).
> - Restored fl_free() to its original position in both
> ip6_fl_gc() and ip6_fl_purge(). v5 had moved fl_free()
> after the new atomic_dec() to avoid the use-after-free
> on fl->fl_net. v6 instead caches fl->fl_net into a
> local before fl_free() in ip6_fl_gc(), and uses the
> net argument already in scope in ip6_fl_purge().
> v5: replaced the per-netns ceiling FL_MAX_SIZE/8 with the
> computed unpriv_user_limit = (FL_MAX_SIZE - FL_MAX_SIZE/4)/2,
> which evaluates to 3072. v4's FL_MAX_SIZE/8 = 1024 would
> have reduced the per-user budget below the ~3K an
> unprivileged caller could already obtain before any of
> this work, defeating the reason FL_MAX_SIZE was doubled
> in the first place.
> v4: addressed Willem's v3 review on netdev. Dropped the
> flowlabel_has_excl cacheline argument in favour of "fills
> the existing 4-byte hole after ipmr_seq", and reordered
> atomic_dec(&...flowlabel_count) to sit immediately after
> atomic_dec(&fl_size) in ip6_fl_gc and ip6_fl_purge.
> v3: addressed Willem's review on the private security@ thread.
> Merged FL_MAX_SIZE doubling, dropped test data, moved
> flowlabel_count near ipmr_seq, inlined fl->fl_net in
> ip6_fl_gc.
> v2: per-netns counter + cap, sent to security@ as a 2-patch
> series.
> v1: fix-shape sketch in original disclosure.
>
> include/net/netns/ipv6.h | 1 +
> net/ipv6/ip6_flowlabel.c | 14 ++++++++++++--
> 2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> index 499e42881..ef698f5fa 100644
> --- a/include/net/netns/ipv6.h
> +++ b/include/net/netns/ipv6.h
> @@ -119,6 +119,7 @@ struct netns_ipv6 {
> struct fib_notifier_ops *notifier_ops;
> struct fib_notifier_ops *ip6mr_notifier_ops;
> atomic_t ipmr_seq;
> + atomic_t flowlabel_count;
> struct {
> struct hlist_head head;
> spinlock_t lock;
> diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
> index c92f98c6f..28e43718d 100644
> --- a/net/ipv6/ip6_flowlabel.c
> +++ b/net/ipv6/ip6_flowlabel.c
> @@ -36,7 +36,7 @@
> /* FL hash table */
>
> #define FL_MAX_PER_SOCK 32
> -#define FL_MAX_SIZE 4096
> +#define FL_MAX_SIZE 8192
> #define FL_HASH_MASK 255
> #define FL_HASH(l) (ntohl(l)&FL_HASH_MASK)
>
> @@ -161,9 +161,12 @@ static void ip6_fl_gc(struct timer_list *unused)
> fl->expires = ttd;
> ttd = fl->expires;
> if (time_after_eq(now, ttd)) {
> + struct net *net = fl->fl_net;
> +
> *flp = fl->next;
> fl_free(fl);
> atomic_dec(&fl_size);
> + atomic_dec(&net->ipv6.flowlabel_count);

If resubmitting, moving fl_free() here, after both atomic_dec() calls,
makes sense. Only the move in the second case, ip6_fl_purge(), was
entirely unnecessary.

> continue;
> }
> if (!sched || time_before(ttd, sched))
> @@ -197,6 +200,7 @@ static void __net_exit ip6_fl_purge(struct net *net)
> *flp = fl->next;
> fl_free(fl);
> atomic_dec(&fl_size);
> + atomic_dec(&net->ipv6.flowlabel_count);
> continue;
> }
> flp = &fl->next;
> @@ -245,6 +249,7 @@ static struct ip6_flowlabel *fl_intern(struct net *net,
> fl->next = fl_ht[FL_HASH(fl->label)];
> rcu_assign_pointer(fl_ht[FL_HASH(fl->label)], fl);
> atomic_inc(&fl_size);
> + atomic_inc(&net->ipv6.flowlabel_count);
> spin_unlock_bh(&ip6_fl_lock);
> rcu_read_unlock();
> return NULL;
> @@ -464,6 +469,9 @@ fl_create(struct net *net, struct sock *sk, struct in6_flowlabel_req *freq,
>
> static int mem_check(struct sock *sk)
> {
> + const int unpriv_total_limit = FL_MAX_SIZE - (FL_MAX_SIZE / 4);
> + const int unpriv_user_limit = unpriv_total_limit / 2;
> + struct net *net = sock_net(sk);
> int room = FL_MAX_SIZE - atomic_read(&fl_size);
> struct ipv6_fl_socklist *sfl;
> int count = 0;
> @@ -478,7 +486,9 @@ static int mem_check(struct sock *sk)
>
> if (room <= 0 ||
> ((count >= FL_MAX_PER_SOCK ||
> - (count > 0 && room < FL_MAX_SIZE/2) || room < FL_MAX_SIZE/4) &&
> + (count > 0 && room < FL_MAX_SIZE/2) ||
> + room < FL_MAX_SIZE/4 ||

And here, make checkpatch happy by adding spaces around the division
operator, i.e. FL_MAX_SIZE / 4.

> + atomic_read(&net->ipv6.flowlabel_count) >= unpriv_user_limit) &&
> !capable(CAP_NET_ADMIN)))
> return -ENOBUFS;
>
> --
> 2.34.1
>