Date: Fri, 01 May 2026 10:09:44 -0400
From: Willem de Bruijn
To: Maoyi Xie, netdev@vger.kernel.org
Cc: willemdebruijn.kernel@gmail.com, willemb@google.com, edumazet@google.com,
 pabeni@redhat.com, kuba@kernel.org, davem@davemloft.net, dsahern@kernel.org,
 kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org, stable@vger.kernel.org
In-Reply-To: <20260501074130.3532402-1-maoyi.xie@ntu.edu.sg>
References: <20260501074130.3532402-1-maoyi.xie@ntu.edu.sg>
Subject: Re: [PATCH net v4] ipv6: flowlabel: enforce per-netns limit for unprivileged callers

Maoyi Xie wrote:
> fl_size, fl_ht and ip6_fl_lock in net/ipv6/ip6_flowlabel.c are file
> scope and shared across netns. mem_check() reads fl_size to decide
> whether to deny non-CAP_NET_ADMIN callers; capable() runs against
> init_user_ns, so an unprivileged user in any non-init userns can
> push fl_size past FL_MAX_SIZE - FL_MAX_SIZE/4 and starve every
> other unprivileged userns on the host.

So previously a single unprivileged user could get 4K - 1K == 3K
entries. Now it can only get 1K entries even after doubling
FL_MAX_SIZE.

The goal of doubling that was to avoid reducing the per-user limit.
With the expanded limit, unprivileged users collectively can fill 6K
entries.
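
The numbers above can be checked with a small userspace sketch. This is
not kernel code: the helper names are made up for illustration, and only
the 4096/8192 values and the /4 and /8 fractions are taken from the
patch.

```c
#include <assert.h>
#include <stdio.h>

/* FL_MAX_SIZE before and after the patch (values from the diff). */
#define FL_MAX_SIZE_OLD 4096
#define FL_MAX_SIZE_NEW 8192

/*
 * Hypothetical helper: entries that unprivileged callers can fill
 * before the "room < FL_MAX_SIZE/4" test starts denying them.
 */
static int unpriv_total(int fl_max_size)
{
	assert(fl_max_size > 0);
	return fl_max_size - fl_max_size / 4;
}

/* Hypothetical helper: the per-netns ceiling added in v4, FL_MAX_SIZE/8. */
static int v4_netns_ceiling(int fl_max_size)
{
	assert(fl_max_size > 0);
	return fl_max_size / 8;
}

/* Print the three figures discussed in the reply. */
static void print_limits(void)
{
	printf("old unprivileged total: %d\n", unpriv_total(FL_MAX_SIZE_OLD));
	printf("new unprivileged total: %d\n", unpriv_total(FL_MAX_SIZE_NEW));
	printf("v4 per-netns ceiling:   %d\n", v4_netns_ceiling(FL_MAX_SIZE_NEW));
}
```

Calling print_limits() from a main() should report 3072, 6144 and 1024,
i.e. the 3K, 6K and 1K figures.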
Should the check become that each individual user can only fill half
of this? That keeps the original limit:

    const int unpriv_total_limit = FL_MAX_SIZE - (FL_MAX_SIZE / 4);
    const int unpriv_user_limit = unpriv_total_limit / 2;

 	if (room <= 0 ||
 	    ((count >= FL_MAX_PER_SOCK ||
-	      (count > 0 && room < FL_MAX_SIZE/2) || room < FL_MAX_SIZE/4) &&
+	      (count > 0 && room < FL_MAX_SIZE/2) ||
+	      room < FL_MAX_SIZE/4 ||
+	      atomic_read(&net->ipv6.flowlabel_count) >= unpriv_user_limit) &&
 	     !capable(CAP_NET_ADMIN)))

Sorry for not catching this sooner.

>
> Add struct netns_ipv6::flowlabel_count, bumped and decremented next
> to fl_size in fl_intern, ip6_fl_gc and ip6_fl_purge. The new field
> is placed in the existing 4-byte hole after ipmr_seq, so struct
> netns_ipv6 stays the same size on 64-bit builds.
>
> mem_check() folds an extra FL_MAX_SIZE/8 ceiling into the existing
> non-CAP_NET_ADMIN conditional.
>
> Bump FL_MAX_SIZE from 4096 to 8192. It has been 4096 since the file
> was added; machines and connection counts have grown. The new
> per-netns ceiling is then 1024 flowlabels, half of FL_MAX_SIZE/4.
>
> CAP_NET_ADMIN against init_user_ns still bypasses both caps.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Suggested-by: Willem de Bruijn
> Cc: stable@vger.kernel.org # v5.15+
> Signed-off-by: Maoyi Xie
> ---
> v4 (this submission, addressing v3 review by Willem):
>  - rephrased the flowlabel_count placement note: dropped the
>    flowlabel_has_excl cacheline argument; replaced with the
>    simpler "fills the existing 4-byte hole after ipmr_seq" fact.
>  - reordered atomic_dec(&...flowlabel_count) to sit immediately
>    after atomic_dec(&fl_size) in ip6_fl_gc and ip6_fl_purge so
>    the pairing is visually obvious. Both decs now happen before
>    fl_free(fl) since fl_free invalidates fl->fl_net. fl_intern
>    was already in this order.
> v3: addressed Willem's review on the private security@ thread;
>     merged FL_MAX_SIZE doubling, dropped test data, moved
>     flowlabel_count near ipmr_seq, inlined fl->fl_net in ip6_fl_gc.
> v2: per-netns counter + cap, sent to security@ as a 2-patch series.
> v1: fix-shape sketch in original disclosure.
>
>  include/net/netns/ipv6.h |  1 +
>  net/ipv6/ip6_flowlabel.c | 14 ++++++++++----
>  2 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> index 34bdb1308..329482373 100644
> --- a/include/net/netns/ipv6.h
> +++ b/include/net/netns/ipv6.h
> @@ -119,6 +119,7 @@ struct netns_ipv6 {
>  	struct fib_notifier_ops *notifier_ops;
>  	struct fib_notifier_ops *ip6mr_notifier_ops;
>  	unsigned int ipmr_seq; /* protected by rtnl_mutex */
> +	atomic_t flowlabel_count;
>  	struct {
>  		struct hlist_head head;
>  		spinlock_t lock;
> diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
> index c92f98c6f..360109cad 100644
> --- a/net/ipv6/ip6_flowlabel.c
> +++ b/net/ipv6/ip6_flowlabel.c
> @@ -36,7 +36,7 @@
>  /* FL hash table */
>
>  #define FL_MAX_PER_SOCK 32
> -#define FL_MAX_SIZE 4096
> +#define FL_MAX_SIZE 8192
>  #define FL_HASH_MASK 255
>  #define FL_HASH(l) (ntohl(l)&FL_HASH_MASK)
>
> @@ -162,8 +162,9 @@ static void ip6_fl_gc(struct timer_list *unused)
>  			ttd = fl->expires;
>  			if (time_after_eq(now, ttd)) {
>  				*flp = fl->next;
> -				fl_free(fl);
>  				atomic_dec(&fl_size);
> +				atomic_dec(&fl->fl_net->ipv6.flowlabel_count);
> +				fl_free(fl);
>  				continue;
>  			}
>  			if (!sched || time_before(ttd, sched))
> @@ -195,8 +196,9 @@ static void __net_exit ip6_fl_purge(struct net *net)
>  			if (net_eq(fl->fl_net, net) &&
>  			    atomic_read(&fl->users) == 0) {
>  				*flp = fl->next;
> -				fl_free(fl);
>  				atomic_dec(&fl_size);
> +				atomic_dec(&net->ipv6.flowlabel_count);
> +				fl_free(fl);
>  				continue;
>  			}
>  			flp = &fl->next;
> @@ -245,6 +247,7 @@ static struct ip6_flowlabel *fl_intern(struct net *net,
>  	fl->next = fl_ht[FL_HASH(fl->label)];
>  	rcu_assign_pointer(fl_ht[FL_HASH(fl->label)], fl);
>  	atomic_inc(&fl_size);
> +	atomic_inc(&net->ipv6.flowlabel_count);
>  	spin_unlock_bh(&ip6_fl_lock);
>  	rcu_read_unlock();
>  	return NULL;
> @@ -464,6 +467,7 @@ fl_create(struct net *net, struct sock *sk, struct in6_flowlabel_req *freq,
>
>  static int mem_check(struct sock *sk)
>  {
> +	struct net *net = sock_net(sk);
>  	int room = FL_MAX_SIZE - atomic_read(&fl_size);
>  	struct ipv6_fl_socklist *sfl;
>  	int count = 0;
> @@ -478,7 +482,9 @@ static int mem_check(struct sock *sk)
>
>  	if (room <= 0 ||
>  	    ((count >= FL_MAX_PER_SOCK ||
> -	      (count > 0 && room < FL_MAX_SIZE/2) || room < FL_MAX_SIZE/4) &&
> +	      (count > 0 && room < FL_MAX_SIZE/2) ||
> +	      room < FL_MAX_SIZE/4 ||
> +	      atomic_read(&net->ipv6.flowlabel_count) >= FL_MAX_SIZE/8) &&
>  	     !capable(CAP_NET_ADMIN)))
>  		return -ENOBUFS;
>
> --
> 2.34.1
>
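
To compare the FL_MAX_SIZE/8 ceiling in the quoted patch with the
unpriv_total_limit / 2 alternative, the mem_check() condition can be
modelled in userspace. This is only a sketch under assumptions:
mem_check_denies() and the two limit helpers are hypothetical names, the
counters are plain ints instead of atomic_t, and net_admin stands in for
the result of capable(CAP_NET_ADMIN).

```c
#include <assert.h>
#include <stdbool.h>

#define FL_MAX_PER_SOCK 32
#define FL_MAX_SIZE 8192	/* value after the patch */

/* Per-netns ceiling as posted in v4. */
static int netns_limit_v4(void)
{
	return FL_MAX_SIZE / 8;
}

/* Alternative ceiling: half of what unprivileged callers can fill in total. */
static int netns_limit_suggested(void)
{
	const int unpriv_total_limit = FL_MAX_SIZE - (FL_MAX_SIZE / 4);

	return unpriv_total_limit / 2;
}

/*
 * Hypothetical model of the mem_check() condition.
 *   fl_size:     global number of flowlabels
 *   netns_count: flowlabel_count of the caller's netns
 *   sock_count:  labels already held by the socket ("count" in the patch)
 *   net_admin:   stand-in for capable(CAP_NET_ADMIN)
 *   netns_limit: per-netns ceiling under test
 * Returns true where the kernel code would return -ENOBUFS.
 */
static bool mem_check_denies(int fl_size, int netns_count, int sock_count,
			     bool net_admin, int netns_limit)
{
	int room;

	assert(fl_size >= 0 && netns_count >= 0 && sock_count >= 0);
	room = FL_MAX_SIZE - fl_size;

	return room <= 0 ||
	       ((sock_count >= FL_MAX_PER_SOCK ||
		 (sock_count > 0 && room < FL_MAX_SIZE / 2) ||
		 room < FL_MAX_SIZE / 4 ||
		 netns_count >= netns_limit) &&
		!net_admin);
}
```

In this model a netns holding 2000 labels is refused under
netns_limit_v4() but still admitted under netns_limit_suggested(), which
only starts refusing at 3072 labels, the same figure a single
unprivileged user could reach before the patch.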