netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/1] net: socket: centralize netns refcounting
@ 2025-05-16 22:09 Enzo Matsumiya
  2025-05-16 22:09 ` [RFC PATCH 1/1] net: socket: hint netns refcounting through @kern arg Enzo Matsumiya
  2025-05-17  2:51 ` [RFC PATCH 0/1] net: socket: centralize netns refcounting Kuniyuki Iwashima
  0 siblings, 2 replies; 3+ messages in thread
From: Enzo Matsumiya @ 2025-05-16 22:09 UTC (permalink / raw)
  To: netdev
  Cc: Enzo Matsumiya, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn,
	linux-kernel

Hi,

I came up with this patch to centralize netns refcounting on kernel sockets,
because `sk_net_refcnt = !kern' is not enough anymore.

The idea is simply to remove the responsibility of a module outside of net/
to have to deal with sockets internals (cf. sk_net_refcnt_upgrade()).

It adds an anonymous enum (just for named values) SOCK_NETNS_REFCNT_* that
can be passed to __sock_create() and sk_alloc() through the @kern arg.
(this was much easier and shorter than e.g. adding another arg)

A sock_create_netns() wrapper is added, for callers who need such refcounting
(e.g. current callers of sk_net_refcnt_upgrade()).

And then, the core change is quite simple in sk_alloc() -- sk_net_refcnt is
set only if it's a user socket, or
(@kern == SOCK_NETNS_REFCNT_KERN_ANY && @net != inet_net).

I have the patches that modifies current users of sk_net_refcnt_upgrade() to
create their sockets with sock_create_netns(), if anyone wants to test or
this gets merged.

I could confirm this works only on cifs, though, by using Kuniyuki's reproducer
in [0], which is quite reliable.  Unfortunately, I don't know enough about the
other modules and/or how to trigger this same bug on those, but I'll be happy
to test it if I can get instructions.


Cheers,

Enzo


[0] - https://lore.kernel.org/linux-cifs/20241031175709.20111-1-kuniyu@amazon.com/

Enzo Matsumiya (1):
  net: socket: hint netns refcounting through @kern arg

 include/linux/net.h | 15 +++++++++++++++
 net/core/sock.c     | 10 ++++++----
 net/socket.c        | 27 +++++++++++++++++++++++++--
 3 files changed, 46 insertions(+), 6 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [RFC PATCH 1/1] net: socket: hint netns refcounting through @kern arg
  2025-05-16 22:09 [RFC PATCH 0/1] net: socket: centralize netns refcounting Enzo Matsumiya
@ 2025-05-16 22:09 ` Enzo Matsumiya
  2025-05-17  2:51 ` [RFC PATCH 0/1] net: socket: centralize netns refcounting Kuniyuki Iwashima
  1 sibling, 0 replies; 3+ messages in thread
From: Enzo Matsumiya @ 2025-05-16 22:09 UTC (permalink / raw)
  To: netdev
  Cc: Enzo Matsumiya, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn,
	linux-kernel

Some modules require the netns a socket resides in to be properly
refcounted to avoid UAF (e.g. netns is gone before socket, socket goes
away before TCP timers kicking in).

Such refcounting is done based on sk->sk_net_refcnt, which, in turn, is
set based on @kern arg -- kernel sockets are not netns refcounted by
default.

In order to deal with that, modules are allocating a kernel socket, and,
right after, calling sk_net_refcnt_upgrade() (which sets sk->sk_net_refcnt
to 1 and do the proper setup for the netns refcounter).

This patch aims to centralize this behaviour on sk_alloc() by changing
the @kern arg to accept newly added SOCK_NETNS_REFCNT_* values.

Practically it only adds a third value which means "kernel socket with
netns refcounting".

To maintain compatibility with the previous boolean behaviour
(@kern/!@kern), SOCK_NETNS_REFCNT_USER is 0 and
SOCK_NETNS_REFCNT_KERN_* > 0.

Also add a sock_create_netns() wrapper.  Callers that need a kernel
socket with netns refcounting may use this wrapper.

Follow up patch will update callers of sk_net_refcnt_upgrade()
to use this new option instead.

Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
---
 include/linux/net.h | 15 +++++++++++++++
 net/core/sock.c     | 10 ++++++----
 net/socket.c        | 27 +++++++++++++++++++++++++--
 3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 0ff950eecc6b..bf5e2e68cee5 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -247,6 +247,20 @@ enum {
 	SOCK_WAKE_URG,
 };
 
+/*
+ * sock netns refcounting modes:
+ *
+ * @SOCK_NETNS_REFCNT_USER: user sockets always hold a netns reference
+ * @SOCK_NETNS_REFCNT_KERN_NONE: kernel socket will not hold active netns reference
+ * @SOCK_NETNS_REFCNT_KERN_ANY: kernel socket will hold active reference for any netns
+ *				(but init_net)
+ */
+enum {
+	SOCK_NETNS_REFCNT_USER,
+	SOCK_NETNS_REFCNT_KERN_NONE,
+	SOCK_NETNS_REFCNT_KERN_ANY,
+};
+
 int sock_wake_async(struct socket_wq *sk_wq, int how, int band);
 int sock_register(const struct net_proto_family *fam);
 void sock_unregister(int family);
@@ -255,6 +269,7 @@ int __sock_create(struct net *net, int family, int type, int proto,
 		  struct socket **res, int kern);
 int sock_create(int family, int type, int proto, struct socket **res);
 int sock_create_kern(struct net *net, int family, int type, int proto, struct socket **res);
+int sock_create_netns(struct net *net, int family, int type, int protocol, struct socket **res);
 int sock_create_lite(int family, int type, int proto, struct socket **res);
 struct socket *sock_alloc(void);
 void sock_release(struct socket *sock);
diff --git a/net/core/sock.c b/net/core/sock.c
index e54449c9ab0b..1b987d47e4d8 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2244,7 +2244,7 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
  *	@family: protocol family
  *	@priority: for allocation (%GFP_KERNEL, %GFP_ATOMIC, etc)
  *	@prot: struct proto associated with this new sock instance
- *	@kern: is this to be a kernel socket?
+ *	@kern: hint for netns refcounting (%SOCK_NETNS_REFCNT_USER, ...)
  */
 struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		      struct proto *prot, int kern)
@@ -2259,13 +2259,15 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		 * why we need sk_prot_creator -acme
 		 */
 		sk->sk_prot = sk->sk_prot_creator = prot;
-		sk->sk_kern_sock = kern;
+		sk->sk_kern_sock = !!kern;
 		sock_lock_init(sk);
-		sk->sk_net_refcnt = kern ? 0 : 1;
-		if (likely(sk->sk_net_refcnt)) {
+		if (likely(kern == SOCK_NETNS_REFCNT_USER) ||
+		    (kern == SOCK_NETNS_REFCNT_KERN_ANY && !net_eq(net, &init_net))) {
+			sk->sk_net_refcnt = 1;
 			get_net_track(net, &sk->ns_tracker, priority);
 			sock_inuse_add(net, 1);
 		} else {
+			sk->sk_net_refcnt = 0;
 			net_passive_inc(net);
 			__netns_tracker_alloc(net, &sk->ns_tracker,
 					      false, priority);
diff --git a/net/socket.c b/net/socket.c
index 9a0e720f0859..9cce213b3fc2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1459,11 +1459,12 @@ EXPORT_SYMBOL(sock_wake_async);
  *	@type: communication type (SOCK_STREAM, ...)
  *	@protocol: protocol (0, ...)
  *	@res: new socket
- *	@kern: boolean for kernel space sockets
+ *	@kern: hint for netns refcounting (%SOCK_NETNS_REFCNT_USER, ...)
  *
  *	Creates a new socket and assigns it to @res, passing through LSM.
  *	Returns 0 or an error. On failure @res is set to %NULL. @kern must
- *	be set to true if the socket resides in kernel space.
+ *	be set to %SOCK_NETNS_REFCNT_* -- handled as boolean in most places,
+ *	effectively handled only in sk_alloc().
  *	This function internally uses GFP_KERNEL.
  */
 
@@ -1609,6 +1610,7 @@ EXPORT_SYMBOL(sock_create);
  *	@res: new socket
  *
  *	A wrapper around __sock_create().
+ *	Created socket will not hold a reference on @net.
  *	Returns 0 or an error. This function internally uses GFP_KERNEL.
  */
 
@@ -1618,6 +1620,27 @@ int sock_create_kern(struct net *net, int family, int type, int protocol, struct
 }
 EXPORT_SYMBOL(sock_create_kern);
 
+/**
+ *	sock_create_netns - creates a socket (kernel space)
+ *	@net: net namespace
+ *	@family: protocol family (AF_INET, ...)
+ *	@type: communication type (SOCK_STREAM, ...)
+ *	@protocol: protocol (0, ...)
+ *	@res: new socket
+ *
+ *	A wrapper around __sock_create().
+ *	If @net == %init_net (checked in sk_alloc), created socket will
+ *	not hold a reference on @net (i.e. same as sock_create_kern).
+ *	Otherwise, created socket will hold a reference on @net.
+ *	Returns 0 or an error. This function internally uses GFP_KERNEL.
+ */
+
+int sock_create_netns(struct net *net, int family, int type, int protocol, struct socket **res)
+{
+	return __sock_create(net, family, type, protocol, res, SOCK_NETNS_REFCNT_KERN_ANY);
+}
+EXPORT_SYMBOL(sock_create_netns);
+
 static struct socket *__sys_socket_create(int family, int type, int protocol)
 {
 	struct socket *sock;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC PATCH 0/1] net: socket: centralize netns refcounting
  2025-05-16 22:09 [RFC PATCH 0/1] net: socket: centralize netns refcounting Enzo Matsumiya
  2025-05-16 22:09 ` [RFC PATCH 1/1] net: socket: hint netns refcounting through @kern arg Enzo Matsumiya
@ 2025-05-17  2:51 ` Kuniyuki Iwashima
  1 sibling, 0 replies; 3+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-17  2:51 UTC (permalink / raw)
  To: ematsumiya
  Cc: davem, edumazet, horms, kuba, kuniyu, linux-kernel, netdev,
	pabeni, willemb

From: Enzo Matsumiya <ematsumiya@suse.de>
Date: Fri, 16 May 2025 19:09:18 -0300
> Hi,
> 
> I came up with this patch to centralize netns refcounting on kernel sockets,
> because `sk_net_refcnt = !kern' is not enough anymore.
> 
> The idea is simply to remove the responsibility of a module outside of net/
> to have to deal with sockets internals (cf. sk_net_refcnt_upgrade()).
> 
> It adds an anonymous enum (just for named values) SOCK_NETNS_REFCNT_* that
> can be passed to __sock_create() and sk_alloc() through the @kern arg.
> (this was much easier and shorter than e.g. adding another arg)
> 
> A sock_create_netns() wrapper is added, for callers who need such refcounting
> (e.g. current callers of sk_net_refcnt_upgrade()).
> 
> And then, the core change is quite simple in sk_alloc() -- sk_net_refcnt is
> set only if it's a user socket, or
> (@kern == SOCK_NETNS_REFCNT_KERN_ANY && @net != inet_net).
> 
> I have the patches that modifies current users of sk_net_refcnt_upgrade() to
> create their sockets with sock_create_netns(), if anyone wants to test or
> this gets merged.
> 
> I could confirm this works only on cifs, though, by using Kuniyuki's reproducer
> in [0], which is quite reliable.  Unfortunately, I don't know enough about the
> other modules and/or how to trigger this same bug on those, but I'll be happy
> to test it if I can get instructions.
> 

I was waiting for b013b817f32f to land on net-next to respin
this series [0] as propised in [1].

[0]: https://lore.kernel.org/netdev/20241213092152.14057-1-kuniyu@amazon.com/#t
[1]: https://lore.kernel.org/lkml/20250409190031.38942-1-kuniyu@amazon.com/

So, let me respin it.

FWIW, I posted the most identical patch before the series.
https://lore.kernel.org/netdev/20240227011041.97375-4-kuniyu@amazon.com/

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-05-17  2:52 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-16 22:09 [RFC PATCH 0/1] net: socket: centralize netns refcounting Enzo Matsumiya
2025-05-16 22:09 ` [RFC PATCH 1/1] net: socket: hint netns refcounting through @kern arg Enzo Matsumiya
2025-05-17  2:51 ` [RFC PATCH 0/1] net: socket: centralize netns refcounting Kuniyuki Iwashima

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).