* [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends.
@ 2024-12-13  9:21 Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 01/15] socket: Un-export __sock_create() Kuniyuki Iwashima
                   ` (14 more replies)
  0 siblings, 15 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

There are a bunch of weird usages of sock_create() and friends due
to poor documentation.

  1) some subsystems use __sock_create(), but all of them can be
     replaced with sock_create_kern()

  2) some subsystems use sock_create(), but most of the sockets are
     neither tied to userspace processes nor exposed via file
     descriptors, yet are (most likely unintentionally) exposed to
     some BPF hooks (infiniband, ISDN, NVMe over TCP, iscsi, Xen PV
     call, ocfs2, smbd)

  3) some subsystems use sock_create_kern() and convert the sockets
     to hold netns refcnt (cifs, mptcp, rds, smc, and sunrpc)

The primary goal is to sort out such confusion and provide enough
documentation for future developers to choose an appropriate API.

Regarding 3), we introduce a new API, sock_create_net(), that holds
a netns refcnt for a kernel socket.  This removes the need for the
socket conversion, which exists to avoid use-after-free triggered by
TCP kernel sockets after commit 26abe14379f8 ("net: Modify sk_alloc
to not reference count the netns of kernel sockets.").
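
To illustrate why the netns refcnt matters, here is a minimal
userspace sketch, not kernel code.  Every toy_* name is hypothetical
and only models the lifetime rule described above: a socket that does
not hold a reference can be left pointing at a netns that has already
been torn down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; NOT kernel code.  All toy_* names are hypothetical. */
struct toy_net {
	int refcnt;	/* number of references keeping the netns alive */
	bool alive;	/* cleared when the last reference is dropped */
};

struct toy_sock {
	struct toy_net *net;
	bool net_refcnt;	/* true if this socket holds a netns reference */
};

static void toy_net_init(struct toy_net *net)
{
	net->refcnt = 1;	/* reference owned by the creator of the netns */
	net->alive = true;
}

static void toy_net_put(struct toy_net *net)
{
	assert(net->refcnt > 0);
	if (--net->refcnt == 0)
		net->alive = false;	/* models netns teardown */
}

static void toy_sk_alloc(struct toy_sock *sk, struct toy_net *net,
			 bool hold_net)
{
	sk->net = net;
	sk->net_refcnt = hold_net;
	if (hold_net)
		net->refcnt++;
}

static void toy_sk_free(struct toy_sock *sk)
{
	if (sk->net_refcnt)
		toy_net_put(sk->net);
	sk->net = NULL;
}

/* Does the netns stay alive after its creator drops its reference
 * while the socket still exists?
 */
static bool toy_net_survives_creator_put(bool hold_net)
{
	struct toy_net net;
	struct toy_sock sk;
	bool survived;

	toy_net_init(&net);
	toy_sk_alloc(&sk, &net, hold_net);
	toy_net_put(&net);	/* creator drops its reference */
	survived = net.alive;	/* would sk.net still be safe to use? */
	toy_sk_free(&sk);

	return survived;
}
```

In this model, toy_net_survives_creator_put(true) keeps the netns
alive until the socket is freed, while the hold_net=false case leaves
the socket with a dead netns unless the caller frees the socket first.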

Finally, we rename sock_create() and sock_create_kern() to
sock_create_user() and sock_create_net_noref(), respectively.
This intentionally breaks out-of-tree drivers to give the owners
a chance to choose an appropriate API.

Throughout the series, we follow the definition below:

  userspace socket:
    * created by sock_create_user()
    * holds the reference count of the network namespace
    * directly linked to a file descriptor
      * currently, all sockets created by sane sock_create() users
        are tied to a userspace process and exposed via file descriptors
    * accessed via a file descriptor (and some BPF hooks except
      for BPF LSM)

  kernel socket:
    * created by sock_create_net() or sock_create_net_noref()
      * the former holds the refcnt of netns, but the latter doesn't
    * not directly exposed to userspace via a file descriptor or BPF,
      except for BPF LSM

Note that __sock_create(kern=1) skips some LSMs (SELinux, AppArmor)
but not all; BPF LSM can enforce security regardless of the argument.
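
The definitions above can be summarised as a small lookup table.  This
is a userspace sketch, not kernel code, and the toy_* names are
hypothetical.  The (kern, hold_net) pairs for sock_create_user() and
sock_create_net_noref() follow what patch 02 passes to __sock_create();
the pair for sock_create_net() is inferred from the description above
and is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace table; NOT kernel code.  toy_* names are hypothetical. */
enum toy_api {
	TOY_SOCK_CREATE_USER,		/* formerly sock_create() */
	TOY_SOCK_CREATE_NET,		/* new in this series */
	TOY_SOCK_CREATE_NET_NOREF,	/* formerly sock_create_kern() */
	TOY_API_MAX,
};

struct toy_flags {
	bool kern;	/* kernel socket, not exposed via fd */
	bool hold_net;	/* socket holds a netns refcnt */
};

static struct toy_flags toy_api_flags(enum toy_api api)
{
	static const struct toy_flags table[TOY_API_MAX] = {
		[TOY_SOCK_CREATE_USER]      = { .kern = false, .hold_net = true  },
		[TOY_SOCK_CREATE_NET]       = { .kern = true,  .hold_net = true  },
		[TOY_SOCK_CREATE_NET_NOREF] = { .kern = true,  .hold_net = false },
	};

	assert(api < TOY_API_MAX);
	return table[api];
}
```

Only the noref variant leaves the lifetime of the netns to the caller,
which is why its new name spells that out.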

Since this refactoring is huge, there may be a concern that the
series could make future backports difficult.  However, the
socket() / accept() / sk_alloc() paths are unlikely to have many
bugs or backports.  For example, net/socket.c has few backports,
and only 631083143315 touches __sock_create() in 6.1 and 6.6.

  $ for v in 6.12 6.6 6.1 5.15 5.10 5.4; \
  do \
    echo "$v : $(git log --oneline stable/linux-$v.y...v$v -- net/socket.c | wc -l)"; \
  done
  6.12 : 0
  6.6 : 7
  6.1 : 13
  5.15 : 8
  5.10 : 13
  5.4 : 13


Changes:
  v3:
    * Drop /proc/net/sockstat patch
    * Add a patch to make sock_inuse_add() static

  v2: https://lore.kernel.org/netdev/20241210073829.62520-1-kuniyu@amazon.com/
    * Patch 8
      * Fix build error for PF_IUCV
    * Patch 12
      * Collect Acked-by from MPTCP/RDS maintainers

  v1: https://lore.kernel.org/netdev/20241206075504.24153-1-kuniyu@amazon.com/


Kuniyuki Iwashima (15):
  socket: Un-export __sock_create().
  socket: Pass hold_net flag to __sock_create().
  smc: Pass kern to smc_sock_alloc().
  socket: Pass hold_net to struct net_proto_family.create().
  ppp: Pass hold_net to struct pppox_proto.create().
  nfc: Pass hold_net to struct nfc_protocol.create().
  socket: Add hold_net flag to struct proto_accept_arg.
  socket: Pass hold_net to sk_alloc().
  socket: Respect hold_net in sk_alloc().
  socket: Introduce sock_create_net().
  socket: Remove kernel socket conversion.
  socket: Move sock_inuse_add() to sock.c.
  socket: Use sock_create_net() instead of sock_create().
  socket: Rename sock_create() to sock_create_user().
  socket: Rename sock_create_kern() to sock_create_net_noref().

 crypto/af_alg.c                               |   7 +-
 drivers/block/drbd/drbd_receiver.c            |  12 +-
 drivers/infiniband/hw/erdma/erdma_cm.c        |   6 +-
 drivers/infiniband/sw/rxe/rxe_qp.c            |   2 +-
 drivers/infiniband/sw/siw/siw_cm.c            |   6 +-
 drivers/isdn/mISDN/l1oip_core.c               |   3 +-
 drivers/isdn/mISDN/socket.c                   |  17 +-
 drivers/net/ppp/pppoe.c                       |   5 +-
 drivers/net/ppp/pppox.c                       |   4 +-
 drivers/net/ppp/pptp.c                        |   5 +-
 drivers/net/tap.c                             |   2 +-
 drivers/net/tun.c                             |   2 +-
 drivers/nvme/host/tcp.c                       |   5 +-
 drivers/nvme/target/tcp.c                     |   5 +-
 drivers/soc/qcom/qmi_interface.c              |   4 +-
 drivers/target/iscsi/iscsi_target_login.c     |   7 +-
 drivers/xen/pvcalls-back.c                    |   7 +-
 drivers/xen/pvcalls-front.c                   |   3 +-
 fs/afs/rxrpc.c                                |   3 +-
 fs/dlm/lowcomms.c                             |   8 +-
 fs/ocfs2/cluster/tcp.c                        |  10 +-
 fs/smb/client/connect.c                       |  13 +-
 fs/smb/server/transport_tcp.c                 |   7 +-
 include/linux/if_pppox.h                      |   3 +-
 include/linux/net.h                           |  11 +-
 include/net/bluetooth/bluetooth.h             |   3 +-
 include/net/llc_conn.h                        |   2 +-
 include/net/sctp/structs.h                    |   2 +-
 include/net/sock.h                            |  12 +-
 io_uring/net.c                                |   2 +
 net/9p/trans_fd.c                             |   8 +-
 net/appletalk/ddp.c                           |   4 +-
 net/atm/common.c                              |   5 +-
 net/atm/common.h                              |   3 +-
 net/atm/pvc.c                                 |   4 +-
 net/atm/svc.c                                 |   8 +-
 net/ax25/af_ax25.c                            |   7 +-
 net/bluetooth/af_bluetooth.c                  |   9 +-
 net/bluetooth/bnep/sock.c                     |   5 +-
 net/bluetooth/cmtp/sock.c                     |   4 +-
 net/bluetooth/hci_sock.c                      |   4 +-
 net/bluetooth/hidp/sock.c                     |   5 +-
 net/bluetooth/iso.c                           |  11 +-
 net/bluetooth/l2cap_sock.c                    |  14 +-
 net/bluetooth/rfcomm/core.c                   |   3 +-
 net/bluetooth/rfcomm/sock.c                   |  12 +-
 net/bluetooth/sco.c                           |  11 +-
 net/bpf/test_run.c                            |   2 +-
 net/caif/caif_socket.c                        |   4 +-
 net/can/af_can.c                              |   4 +-
 net/ceph/messenger.c                          |   6 +-
 net/core/sock.c                               |  19 ++-
 net/handshake/handshake-test.c                |  33 ++--
 net/ieee802154/socket.c                       |   4 +-
 net/ipv4/af_inet.c                            |   7 +-
 net/ipv4/udp_tunnel_core.c                    |   2 +-
 net/ipv6/af_inet6.c                           |   4 +-
 net/ipv6/ip6_udp_tunnel.c                     |   4 +-
 net/iucv/af_iucv.c                            |  13 +-
 net/kcm/kcmsock.c                             |   6 +-
 net/key/af_key.c                              |   4 +-
 net/l2tp/l2tp_core.c                          |   8 +-
 net/l2tp/l2tp_ppp.c                           |   6 +-
 net/llc/af_llc.c                              |   6 +-
 net/llc/llc_conn.c                            |  11 +-
 net/mctp/af_mctp.c                            |   4 +-
 net/mctp/test/route-test.c                    |   6 +-
 net/mptcp/pm_netlink.c                        |   4 +-
 net/mptcp/subflow.c                           |  12 +-
 net/netfilter/ipvs/ip_vs_sync.c               |   8 +-
 net/netlink/af_netlink.c                      |  11 +-
 net/netrom/af_netrom.c                        |   7 +-
 net/nfc/af_nfc.c                              |   5 +-
 net/nfc/llcp.h                                |   3 +-
 net/nfc/llcp_core.c                           |   3 +-
 net/nfc/llcp_sock.c                           |  10 +-
 net/nfc/nfc.h                                 |   3 +-
 net/nfc/rawsock.c                             |   5 +-
 net/packet/af_packet.c                        |   4 +-
 net/phonet/af_phonet.c                        |   4 +-
 net/phonet/pep.c                              |   2 +-
 net/qrtr/af_qrtr.c                            |   4 +-
 net/qrtr/ns.c                                 |   6 +-
 net/rds/af_rds.c                              |   4 +-
 net/rds/tcp.c                                 |  14 --
 net/rds/tcp_connect.c                         |  21 ++-
 net/rds/tcp_listen.c                          |  17 +-
 net/rose/af_rose.c                            |  11 +-
 net/rxrpc/af_rxrpc.c                          |   4 +-
 net/rxrpc/rxperf.c                            |   4 +-
 net/sctp/ipv6.c                               |   7 +-
 net/sctp/protocol.c                           |   7 +-
 net/sctp/socket.c                             |   6 +-
 net/smc/af_smc.c                              |  38 ++---
 net/smc/smc_inet.c                            |   2 +-
 net/socket.c                                  | 145 +++++++++++++-----
 net/sunrpc/clnt.c                             |   4 +-
 net/sunrpc/svcsock.c                          |  12 +-
 net/sunrpc/xprtsock.c                         |  16 +-
 net/tipc/socket.c                             |   8 +-
 net/tipc/topsrv.c                             |   4 +-
 net/unix/af_unix.c                            |  17 +-
 net/vmw_vsock/af_vsock.c                      |  10 +-
 net/wireless/nl80211.c                        |   4 +-
 net/x25/af_x25.c                              |  13 +-
 net/xdp/xsk.c                                 |   4 +-
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |   4 +-
 107 files changed, 512 insertions(+), 403 deletions(-)

-- 
2.39.5 (Apple Git-154)



* [PATCH v3 net-next 01/15] socket: Un-export __sock_create().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 02/15] socket: Pass hold_net flag to __sock_create() Kuniyuki Iwashima
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Since commit eeb1bd5c40ed ("net: Add a struct net parameter to
sock_create_kern"), we no longer need to export __sock_create()
and can replace all non-core users with sock_create_kern().

Let's convert them and un-export __sock_create().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 fs/smb/client/connect.c        |  4 ++--
 include/linux/net.h            |  2 --
 net/9p/trans_fd.c              |  8 ++++----
 net/handshake/handshake-test.c | 32 ++++++++++++++------------------
 net/socket.c                   |  3 +--
 net/sunrpc/clnt.c              |  4 ++--
 net/sunrpc/svcsock.c           |  2 +-
 net/sunrpc/xprtsock.c          |  6 +++---
 net/wireless/nl80211.c         |  4 ++--
 9 files changed, 29 insertions(+), 36 deletions(-)

diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 2372538a1211..c36c1b4ffe6e 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3133,8 +3133,8 @@ generic_ip_connect(struct TCP_Server_Info *server)
 		struct net *net = cifs_net_ns(server);
 		struct sock *sk;
 
-		rc = __sock_create(net, sfamily, SOCK_STREAM,
-				   IPPROTO_TCP, &server->ssocket, 1);
+		rc = sock_create_kern(net, sfamily, SOCK_STREAM,
+				      IPPROTO_TCP, &server->ssocket);
 		if (rc < 0) {
 			cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
 			return rc;
diff --git a/include/linux/net.h b/include/linux/net.h
index b75bc534c1b3..68ac97e301be 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -251,8 +251,6 @@ int sock_wake_async(struct socket_wq *sk_wq, int how, int band);
 int sock_register(const struct net_proto_family *fam);
 void sock_unregister(int family);
 bool sock_is_registered(int family);
-int __sock_create(struct net *net, int family, int type, int proto,
-		  struct socket **res, int kern);
 int sock_create(int family, int type, int proto, struct socket **res);
 int sock_create_kern(struct net *net, int family, int type, int proto, struct socket **res);
 int sock_create_lite(int family, int type, int proto, struct socket **res);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 196060dc6138..83f81da24727 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -1011,8 +1011,8 @@ p9_fd_create_tcp(struct p9_client *client, const char *addr, char *args)
 	sin_server.sin_family = AF_INET;
 	sin_server.sin_addr.s_addr = in_aton(addr);
 	sin_server.sin_port = htons(opts.port);
-	err = __sock_create(current->nsproxy->net_ns, PF_INET,
-			    SOCK_STREAM, IPPROTO_TCP, &csocket, 1);
+	err = sock_create_kern(current->nsproxy->net_ns, PF_INET,
+			       SOCK_STREAM, IPPROTO_TCP, &csocket);
 	if (err) {
 		pr_err("%s (%d): problem creating socket\n",
 		       __func__, task_pid_nr(current));
@@ -1062,8 +1062,8 @@ p9_fd_create_unix(struct p9_client *client, const char *addr, char *args)
 
 	sun_server.sun_family = PF_UNIX;
 	strcpy(sun_server.sun_path, addr);
-	err = __sock_create(current->nsproxy->net_ns, PF_UNIX,
-			    SOCK_STREAM, 0, &csocket, 1);
+	err = sock_create_kern(current->nsproxy->net_ns, PF_UNIX,
+			       SOCK_STREAM, 0, &csocket);
 	if (err < 0) {
 		pr_err("%s (%d): problem creating socket\n",
 		       __func__, task_pid_nr(current));
diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
index 55442b2f518a..4f300504f3e5 100644
--- a/net/handshake/handshake-test.c
+++ b/net/handshake/handshake-test.c
@@ -143,14 +143,18 @@ static void handshake_req_alloc_case(struct kunit *test)
 	kfree(result);
 }
 
+static int handshake_sock_create(struct socket **sock)
+{
+	return sock_create_kern(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, sock);
+}
+
 static void handshake_req_submit_test1(struct kunit *test)
 {
 	struct socket *sock;
 	int err, result;
 
 	/* Arrange */
-	err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
-			    &sock, 1);
+	err = handshake_sock_create(&sock);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
 	/* Act */
@@ -190,8 +194,7 @@ static void handshake_req_submit_test3(struct kunit *test)
 	req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, req);
 
-	err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
-			    &sock, 1);
+	err = handshake_sock_create(&sock);
 	KUNIT_ASSERT_EQ(test, err, 0);
 	sock->file = NULL;
 
@@ -216,8 +219,7 @@ static void handshake_req_submit_test4(struct kunit *test)
 	req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, req);
 
-	err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
-			    &sock, 1);
+	err = handshake_sock_create(&sock);
 	KUNIT_ASSERT_EQ(test, err, 0);
 	filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
@@ -251,8 +253,7 @@ static void handshake_req_submit_test5(struct kunit *test)
 	req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, req);
 
-	err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
-			    &sock, 1);
+	err = handshake_sock_create(&sock);
 	KUNIT_ASSERT_EQ(test, err, 0);
 	filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
@@ -289,8 +290,7 @@ static void handshake_req_submit_test6(struct kunit *test)
 	req2 = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, req2);
 
-	err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
-			    &sock, 1);
+	err = handshake_sock_create(&sock);
 	KUNIT_ASSERT_EQ(test, err, 0);
 	filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
@@ -321,8 +321,7 @@ static void handshake_req_cancel_test1(struct kunit *test)
 	req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, req);
 
-	err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
-			    &sock, 1);
+	err = handshake_sock_create(&sock);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
 	filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
@@ -357,8 +356,7 @@ static void handshake_req_cancel_test2(struct kunit *test)
 	req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, req);
 
-	err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
-			    &sock, 1);
+	err = handshake_sock_create(&sock);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
 	filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
@@ -399,8 +397,7 @@ static void handshake_req_cancel_test3(struct kunit *test)
 	req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, req);
 
-	err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
-			    &sock, 1);
+	err = handshake_sock_create(&sock);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
 	filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
@@ -457,8 +454,7 @@ static void handshake_req_destroy_test1(struct kunit *test)
 	req = handshake_req_alloc(&handshake_req_alloc_proto_destroy, GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, req);
 
-	err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
-			    &sock, 1);
+	err = handshake_sock_create(&sock);
 	KUNIT_ASSERT_EQ(test, err, 0);
 
 	filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
diff --git a/net/socket.c b/net/socket.c
index 9a117248f18f..433f346ffc64 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1484,7 +1484,7 @@ EXPORT_SYMBOL(sock_wake_async);
  *	This function internally uses GFP_KERNEL.
  */
 
-int __sock_create(struct net *net, int family, int type, int protocol,
+static int __sock_create(struct net *net, int family, int type, int protocol,
 			 struct socket **res, int kern)
 {
 	int err;
@@ -1598,7 +1598,6 @@ int __sock_create(struct net *net, int family, int type, int protocol,
 	rcu_read_unlock();
 	goto out_sock_release;
 }
-EXPORT_SYMBOL(__sock_create);
 
 /**
  *	sock_create - creates a socket
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 0090162ee8c3..37935082d799 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1450,8 +1450,8 @@ static int rpc_sockname(struct net *net, struct sockaddr *sap, size_t salen,
 	struct socket *sock;
 	int err;
 
-	err = __sock_create(net, sap->sa_family,
-				SOCK_DGRAM, IPPROTO_UDP, &sock, 1);
+	err = sock_create_kern(net, sap->sa_family,
+			       SOCK_DGRAM, IPPROTO_UDP, &sock);
 	if (err < 0) {
 		dprintk("RPC:       can't create UDP socket (%d)\n", err);
 		goto out;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 95397677673b..9583bad3d150 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1526,7 +1526,7 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
 		return ERR_PTR(-EINVAL);
 	}
 
-	error = __sock_create(net, family, type, protocol, &sock, 1);
+	error = sock_create_kern(net, family, type, protocol, &sock);
 	if (error < 0)
 		return ERR_PTR(error);
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index c60936d8cef7..feb1768e8a57 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1924,7 +1924,7 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
 	struct socket *sock;
 	int err;
 
-	err = __sock_create(xprt->xprt_net, family, type, protocol, &sock, 1);
+	err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
 	if (err < 0) {
 		dprintk("RPC:       can't create %d transport socket (%d).\n",
 				protocol, -err);
@@ -2003,8 +2003,8 @@ static int xs_local_setup_socket(struct sock_xprt *transport)
 	struct socket *sock;
 	int status;
 
-	status = __sock_create(xprt->xprt_net, AF_LOCAL,
-					SOCK_STREAM, 0, &sock, 1);
+	status = sock_create_kern(xprt->xprt_net, AF_LOCAL,
+				  SOCK_STREAM, 0, &sock);
 	if (status < 0) {
 		dprintk("RPC:       can't create AF_LOCAL "
 			"transport socket (%d).\n", -status);
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index dd84fc54fb9b..27c58fd260e0 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -13689,8 +13689,8 @@ static int nl80211_parse_wowlan_tcp(struct cfg80211_registered_device *rdev,
 	port = nla_get_u16_default(tb[NL80211_WOWLAN_TCP_SRC_PORT], 0);
 #ifdef CONFIG_INET
 	/* allocate a socket and port for it and use it */
-	err = __sock_create(wiphy_net(&rdev->wiphy), PF_INET, SOCK_STREAM,
-			    IPPROTO_TCP, &cfg->sock, 1);
+	err = sock_create_kern(wiphy_net(&rdev->wiphy), PF_INET, SOCK_STREAM,
+			       IPPROTO_TCP, &cfg->sock);
 	if (err) {
 		kfree(cfg);
 		return err;
-- 
2.39.5 (Apple Git-154)



* [PATCH v3 net-next 02/15] socket: Pass hold_net flag to __sock_create().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 01/15] socket: Un-export __sock_create() Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 03/15] smc: Pass kern to smc_sock_alloc() Kuniyuki Iwashima
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will introduce a new API to create a kernel socket with netns
refcnt held.

As a prep, let's add a new hold_net argument to __sock_create().

Note that we still do not pass it down to pf->create() for ease
of review; otherwise, this change would be buried in a huge diff.

Another option would be to override the kern parameter, which is
int, but I chose to change parameters for the following two reasons:

  1) Compilers allow us to efficiently make sure that all paths pass
     the parameters down to sk_alloc() as is.

  2) The parameter change breaks out-of-tree drivers, allowing the
     owners to choose an appropriate API.

Regarding 1), there actually was a weird path in smc_ulp_init()
that will be fixed up in the following patch.

While at it, the kernel-doc is fixed up to render the DESCRIPTION
part correctly.

  scripts/kernel-doc -man net/socket.c | scripts/split-man.pl /tmp/man
  man /tmp/man/__sock_create.9

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/socket.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 433f346ffc64..e5b4e0d34132 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1470,22 +1470,28 @@ int sock_wake_async(struct socket_wq *wq, int how, int band)
 EXPORT_SYMBOL(sock_wake_async);
 
 /**
- *	__sock_create - creates a socket
- *	@net: net namespace
- *	@family: protocol family (AF_INET, ...)
- *	@type: communication type (SOCK_STREAM, ...)
- *	@protocol: protocol (0, ...)
- *	@res: new socket
- *	@kern: boolean for kernel space sockets
+ * __sock_create - creates a socket
  *
- *	Creates a new socket and assigns it to @res, passing through LSM.
- *	Returns 0 or an error. On failure @res is set to %NULL. @kern must
- *	be set to true if the socket resides in kernel space.
- *	This function internally uses GFP_KERNEL.
+ * @net: net namespace
+ * @family: protocol family (AF_INET, ...)
+ * @type: communication type (SOCK_STREAM, ...)
+ * @protocol: protocol (0, ...)
+ * @res: new socket
+ * @kern: boolean for kernel space sockets
+ * @hold_net: boolean for netns refcnt
+ *
+ * Creates a new socket and assigns it to @res, passing through LSM.
+ *
+ * @kern must be set to true if userspace cannot touch it via a file
+ * descriptor nor BPF hooks except for LSM.  If @hold_net is false,
+ * the caller must ensure that the socket is always freed before @net.
+ *
+ * Context: Process context. This function internally uses GFP_KERNEL.
+ * Return: 0 or an error. On failure @res is set to %NULL.
  */
 
 static int __sock_create(struct net *net, int family, int type, int protocol,
-			 struct socket **res, int kern)
+			 struct socket **res, bool kern, bool hold_net)
 {
 	int err;
 	struct socket *sock;
@@ -1612,7 +1618,8 @@ static int __sock_create(struct net *net, int family, int type, int protocol,
 
 int sock_create(int family, int type, int protocol, struct socket **res)
 {
-	return __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0);
+	return __sock_create(current->nsproxy->net_ns, family, type, protocol,
+			     res, false, true);
 }
 EXPORT_SYMBOL(sock_create);
 
@@ -1628,9 +1635,10 @@ EXPORT_SYMBOL(sock_create);
  *	Returns 0 or an error. This function internally uses GFP_KERNEL.
  */
 
-int sock_create_kern(struct net *net, int family, int type, int protocol, struct socket **res)
+int sock_create_kern(struct net *net, int family, int type, int protocol,
+		     struct socket **res)
 {
-	return __sock_create(net, family, type, protocol, res, 1);
+	return __sock_create(net, family, type, protocol, res, true, false);
 }
 EXPORT_SYMBOL(sock_create_kern);
 
-- 
2.39.5 (Apple Git-154)



* [PATCH v3 net-next 03/15] smc: Pass kern to smc_sock_alloc().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 01/15] socket: Un-export __sock_create() Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 02/15] socket: Pass hold_net flag to __sock_create() Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13 13:46   ` Wenjia Zhang
  2024-12-13  9:21 ` [PATCH v3 net-next 04/15] socket: Pass hold_net to struct net_proto_family.create() Kuniyuki Iwashima
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

AF_SMC was introduced in commit ac7138746e14 ("smc: establish
new socket family").

Since then, smc_create() has ignored the kern argument and called
smc_sock_alloc(), which calls sk_alloc() with hard-coded arguments.

  sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, 0);

This means sock_create_kern(AF_SMC) always creates a userspace
socket.

Later, commit d7cd421da9da ("net/smc: Introduce TCP ULP support")
added another confusing call site.

smc_ulp_init() calls __smc_create() with kern=1, but again,
smc_sock_alloc() allocates a userspace socket by calling
sk_alloc() with kern=0.

To fix up the weird paths, let's pass kern down to smc_sock_alloc()
and sk_alloc().

This commit does not introduce a functional change because we have
no in-tree users calling sock_create_kern(AF_SMC) and we change
kern from 1 to 0 in smc_ulp_init().
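
The no-functional-change argument for smc_ulp_init() can be checked
with a tiny userspace model, not kernel code.  The toy_* functions are
hypothetical and return only the kern value that would reach
sk_alloc() before and after this patch.

```c
#include <assert.h>

/* Toy userspace model; NOT kernel code.  Each function returns the
 * kern value that would finally be passed to sk_alloc().
 */
static int toy_smc_sock_alloc_before(int kern)
{
	(void)kern;	/* the old helper never saw kern ... */
	return 0;	/* ... and hard-coded 0 for sk_alloc() */
}

static int toy_smc_sock_alloc_after(int kern)
{
	return kern;	/* now passed through as is */
}

/* smc_ulp_init() used to request kern=1, which was silently dropped. */
static int toy_smc_ulp_init_before(void)
{
	return toy_smc_sock_alloc_before(1);
}

/* After the patch, smc_ulp_init() requests kern=0 explicitly. */
static int toy_smc_ulp_init_after(void)
{
	return toy_smc_sock_alloc_after(0);
}

static void toy_check_ulp_unchanged(void)
{
	assert(toy_smc_ulp_init_before() == toy_smc_ulp_init_after());
}
```

Both paths end up with kern=0 in this model; only a caller that
passes kern=1 through smc_create() would observe a difference, and
there is no such in-tree caller.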

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/smc/af_smc.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 19ebff1c2579..b52bee98a3eb 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -387,13 +387,13 @@ void smc_sk_init(struct net *net, struct sock *sk, int protocol)
 }
 
 static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
-				   int protocol)
+				   int protocol, int kern)
 {
 	struct proto *prot;
 	struct sock *sk;
 
 	prot = (protocol == SMCPROTO_SMC6) ? &smc_proto6 : &smc_proto;
-	sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, 0);
+	sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, kern);
 	if (!sk)
 		return NULL;
 
@@ -1715,7 +1715,7 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
 	int rc = -EINVAL;
 
 	release_sock(lsk);
-	new_sk = smc_sock_alloc(sock_net(lsk), NULL, lsk->sk_protocol);
+	new_sk = smc_sock_alloc(sock_net(lsk), NULL, lsk->sk_protocol, 0);
 	if (!new_sk) {
 		rc = -ENOMEM;
 		lsk->sk_err = ENOMEM;
@@ -3349,7 +3349,7 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol,
 	rc = -ENOBUFS;
 	sock->ops = &smc_sock_ops;
 	sock->state = SS_UNCONNECTED;
-	sk = smc_sock_alloc(net, sock, protocol);
+	sk = smc_sock_alloc(net, sock, protocol, kern);
 	if (!sk)
 		goto out;
 
@@ -3408,7 +3408,7 @@ static int smc_ulp_init(struct sock *sk)
 
 	smcsock->type = SOCK_STREAM;
 	__module_get(THIS_MODULE); /* tried in __tcp_ulp_find_autoload */
-	ret = __smc_create(net, smcsock, protocol, 1, tcp);
+	ret = __smc_create(net, smcsock, protocol, 0, tcp);
 	if (ret) {
 		sock_release(smcsock); /* module_put() which ops won't be NULL */
 		return ret;
-- 
2.39.5 (Apple Git-154)



* [PATCH v3 net-next 04/15] socket: Pass hold_net to struct net_proto_family.create().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (2 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 03/15] smc: Pass kern to smc_sock_alloc() Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13 13:46   ` Wenjia Zhang
  2024-12-17 10:24   ` Paolo Abeni
  2024-12-13  9:21 ` [PATCH v3 net-next 05/15] ppp: Pass hold_net to struct pppox_proto.create() Kuniyuki Iwashima
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will introduce a new API to create a kernel socket with netns refcnt
held.  Then, sk_alloc() needs the hold_net flag passed to __sock_create().

Let's pass it down to net_proto_family.create() and functions that call
sk_alloc().

While at it, we convert the kern flag to boolean.

Note that we still need to pass hold_net to struct pppox_proto.create()
and struct nfc_protocol.create() before passing hold_net to sk_alloc().

Also, we use !kern as hold_net in the accept() paths.  We will add the
hold_net flag to struct proto_accept_arg later.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 crypto/af_alg.c                   |  2 +-
 drivers/isdn/mISDN/socket.c       | 13 ++++++++-----
 drivers/net/ppp/pppox.c           |  2 +-
 include/linux/net.h               |  2 +-
 include/net/bluetooth/bluetooth.h |  3 ++-
 include/net/llc_conn.h            |  2 +-
 net/appletalk/ddp.c               |  2 +-
 net/atm/common.c                  |  3 ++-
 net/atm/common.h                  |  3 ++-
 net/atm/pvc.c                     |  4 ++--
 net/atm/svc.c                     |  8 ++++----
 net/ax25/af_ax25.c                |  2 +-
 net/bluetooth/af_bluetooth.c      |  7 ++++---
 net/bluetooth/bnep/sock.c         |  5 +++--
 net/bluetooth/cmtp/sock.c         |  2 +-
 net/bluetooth/hci_sock.c          |  4 ++--
 net/bluetooth/hidp/sock.c         |  5 +++--
 net/bluetooth/iso.c               | 11 ++++++-----
 net/bluetooth/l2cap_sock.c        | 14 ++++++++------
 net/bluetooth/rfcomm/sock.c       | 12 +++++++-----
 net/bluetooth/sco.c               | 11 ++++++-----
 net/caif/caif_socket.c            |  2 +-
 net/can/af_can.c                  |  2 +-
 net/ieee802154/socket.c           |  2 +-
 net/ipv4/af_inet.c                |  2 +-
 net/ipv6/af_inet6.c               |  2 +-
 net/iucv/af_iucv.c                | 11 ++++++-----
 net/kcm/kcmsock.c                 |  2 +-
 net/key/af_key.c                  |  2 +-
 net/llc/af_llc.c                  |  6 ++++--
 net/llc/llc_conn.c                |  9 ++++++---
 net/mctp/af_mctp.c                |  2 +-
 net/netlink/af_netlink.c          |  8 ++++----
 net/netrom/af_netrom.c            |  2 +-
 net/nfc/af_nfc.c                  |  2 +-
 net/packet/af_packet.c            |  2 +-
 net/phonet/af_phonet.c            |  2 +-
 net/qrtr/af_qrtr.c                |  2 +-
 net/rds/af_rds.c                  |  2 +-
 net/rose/af_rose.c                |  2 +-
 net/rxrpc/af_rxrpc.c              |  2 +-
 net/smc/af_smc.c                  | 15 ++++++++-------
 net/socket.c                      |  2 +-
 net/tipc/socket.c                 |  6 ++++--
 net/unix/af_unix.c                |  9 +++++----
 net/vmw_vsock/af_vsock.c          |  8 ++++----
 net/x25/af_x25.c                  | 13 ++++++++-----
 net/xdp/xsk.c                     |  2 +-
 48 files changed, 133 insertions(+), 105 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 0da7c1ac778a..e60032b94d97 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -503,7 +503,7 @@ static void alg_sock_destruct(struct sock *sk)
 }
 
 static int alg_create(struct net *net, struct socket *sock, int protocol,
-		      int kern)
+		      bool kern, bool hold_net)
 {
 	struct sock *sk;
 	int err;
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index b215b28cad7b..54157c24ccb9 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -590,7 +590,8 @@ static const struct proto_ops data_sock_ops = {
 };
 
 static int
-data_sock_create(struct net *net, struct socket *sock, int protocol, int kern)
+data_sock_create(struct net *net, struct socket *sock, int protocol,
+		 bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -746,7 +747,8 @@ static const struct proto_ops base_sock_ops = {
 
 
 static int
-base_sock_create(struct net *net, struct socket *sock, int protocol, int kern)
+base_sock_create(struct net *net, struct socket *sock, int protocol,
+		 bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -771,13 +773,14 @@ base_sock_create(struct net *net, struct socket *sock, int protocol, int kern)
 }
 
 static int
-mISDN_sock_create(struct net *net, struct socket *sock, int proto, int kern)
+mISDN_sock_create(struct net *net, struct socket *sock, int proto,
+		  bool kern, bool hold_net)
 {
 	int err = -EPROTONOSUPPORT;
 
 	switch (proto) {
 	case ISDN_P_BASE:
-		err = base_sock_create(net, sock, proto, kern);
+		err = base_sock_create(net, sock, proto, kern, hold_net);
 		break;
 	case ISDN_P_TE_S0:
 	case ISDN_P_NT_S0:
@@ -791,7 +794,7 @@ mISDN_sock_create(struct net *net, struct socket *sock, int proto, int kern)
 	case ISDN_P_B_L2DTMF:
 	case ISDN_P_B_L2DSP:
 	case ISDN_P_B_L2DSPHDLC:
-		err = data_sock_create(net, sock, proto, kern);
+		err = data_sock_create(net, sock, proto, kern, hold_net);
 		break;
 	default:
 		return err;
diff --git a/drivers/net/ppp/pppox.c b/drivers/net/ppp/pppox.c
index 08364f10a43f..53b3f790d1f5 100644
--- a/drivers/net/ppp/pppox.c
+++ b/drivers/net/ppp/pppox.c
@@ -112,7 +112,7 @@ EXPORT_SYMBOL(pppox_compat_ioctl);
 #endif
 
 static int pppox_create(struct net *net, struct socket *sock, int protocol,
-			int kern)
+			bool kern, bool hold_net)
 {
 	int rc = -EPROTOTYPE;
 
diff --git a/include/linux/net.h b/include/linux/net.h
index 68ac97e301be..c2a35a102ee2 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -233,7 +233,7 @@ struct proto_ops {
 struct net_proto_family {
 	int		family;
 	int		(*create)(struct net *net, struct socket *sock,
-				  int protocol, int kern);
+				  int protocol, bool kern, bool hold_net);
 	struct module	*owner;
 };
 
diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index 435250c72d56..58afa3fd08af 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -406,7 +406,8 @@ void bt_sock_link(struct bt_sock_list *l, struct sock *s);
 void bt_sock_unlink(struct bt_sock_list *l, struct sock *s);
 bool bt_sock_linked(struct bt_sock_list *l, struct sock *s);
 struct sock *bt_sock_alloc(struct net *net, struct socket *sock,
-			   struct proto *prot, int proto, gfp_t prio, int kern);
+			   struct proto *prot, int proto, gfp_t prio,
+			   bool kern, bool hold_net);
 int  bt_sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 		     int flags);
 int  bt_sock_stream_recvmsg(struct socket *sock, struct msghdr *msg,
diff --git a/include/net/llc_conn.h b/include/net/llc_conn.h
index 374411b3066c..7d8b928a5ff6 100644
--- a/include/net/llc_conn.h
+++ b/include/net/llc_conn.h
@@ -97,7 +97,7 @@ static __inline__ char llc_backlog_type(struct sk_buff *skb)
 }
 
 struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority,
-			  struct proto *prot, int kern);
+			  struct proto *prot, bool kern, bool hold_net);
 void llc_sk_stop_all_timers(struct sock *sk, bool sync);
 void llc_sk_free(struct sock *sk);
 
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index b068651984fe..9bd361ccf5f4 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1030,7 +1030,7 @@ static struct proto ddp_proto = {
  * set the state.
  */
 static int atalk_create(struct net *net, struct socket *sock, int protocol,
-			int kern)
+			bool kern, bool hold_net)
 {
 	struct sock *sk;
 	int rc = -ESOCKTNOSUPPORT;
diff --git a/net/atm/common.c b/net/atm/common.c
index 9b75699992ff..c1e05b0c0b4b 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -137,7 +137,8 @@ static struct proto vcc_proto = {
 	.release_cb = vcc_release_cb,
 };
 
-int vcc_create(struct net *net, struct socket *sock, int protocol, int family, int kern)
+int vcc_create(struct net *net, struct socket *sock, int protocol, int family,
+	       bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct atm_vcc *vcc;
diff --git a/net/atm/common.h b/net/atm/common.h
index a1e56e8de698..410419873eb6 100644
--- a/net/atm/common.h
+++ b/net/atm/common.h
@@ -11,7 +11,8 @@
 #include <linux/poll.h> /* for poll_table */
 
 
-int vcc_create(struct net *net, struct socket *sock, int protocol, int family, int kern);
+int vcc_create(struct net *net, struct socket *sock, int protocol, int family,
+	       bool kern, bool hold_net);
 int vcc_release(struct socket *sock);
 int vcc_connect(struct socket *sock, int itf, short vpi, int vci);
 int vcc_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
diff --git a/net/atm/pvc.c b/net/atm/pvc.c
index 66d9a9bd5896..6238c1809481 100644
--- a/net/atm/pvc.c
+++ b/net/atm/pvc.c
@@ -130,13 +130,13 @@ static const struct proto_ops pvc_proto_ops = {
 
 
 static int pvc_create(struct net *net, struct socket *sock, int protocol,
-		      int kern)
+		      bool kern, bool hold_net)
 {
 	if (net != &init_net)
 		return -EAFNOSUPPORT;
 
 	sock->ops = &pvc_proto_ops;
-	return vcc_create(net, sock, protocol, PF_ATMPVC, kern);
+	return vcc_create(net, sock, protocol, PF_ATMPVC, kern, hold_net);
 }
 
 static const struct net_proto_family pvc_family_ops = {
diff --git a/net/atm/svc.c b/net/atm/svc.c
index f8137ae693b0..9795294f4c1e 100644
--- a/net/atm/svc.c
+++ b/net/atm/svc.c
@@ -34,7 +34,7 @@
 #endif
 
 static int svc_create(struct net *net, struct socket *sock, int protocol,
-		      int kern);
+		      bool kern, bool hold_net);
 
 /*
  * Note: since all this is still nicely synchronized with the signaling demon,
@@ -336,7 +336,7 @@ static int svc_accept(struct socket *sock, struct socket *newsock,
 
 	lock_sock(sk);
 
-	error = svc_create(sock_net(sk), newsock, 0, arg->kern);
+	error = svc_create(sock_net(sk), newsock, 0, arg->kern, !arg->kern);
 	if (error)
 		goto out;
 
@@ -658,7 +658,7 @@ static const struct proto_ops svc_proto_ops = {
 
 
 static int svc_create(struct net *net, struct socket *sock, int protocol,
-		      int kern)
+		      bool kern, bool hold_net)
 {
 	int error;
 
@@ -666,7 +666,7 @@ static int svc_create(struct net *net, struct socket *sock, int protocol,
 		return -EAFNOSUPPORT;
 
 	sock->ops = &svc_proto_ops;
-	error = vcc_create(net, sock, protocol, AF_ATMSVC, kern);
+	error = vcc_create(net, sock, protocol, AF_ATMSVC, kern, hold_net);
 	if (error)
 		return error;
 	ATM_SD(sock)->local.sas_family = AF_ATMSVC;
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index d6f9fae06a9d..6c68b5e5b11c 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -830,7 +830,7 @@ static struct proto ax25_proto = {
 };
 
 static int ax25_create(struct net *net, struct socket *sock, int protocol,
-		       int kern)
+		       bool kern, bool hold_net)
 {
 	struct sock *sk;
 	ax25_cb *ax25;
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 0b4d0a8bd361..7c24a6f87281 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -111,7 +111,7 @@ void bt_sock_unregister(int proto)
 EXPORT_SYMBOL(bt_sock_unregister);
 
 static int bt_sock_create(struct net *net, struct socket *sock, int proto,
-			  int kern)
+			  bool kern, bool hold_net)
 {
 	int err;
 
@@ -129,7 +129,7 @@ static int bt_sock_create(struct net *net, struct socket *sock, int proto,
 	read_lock(&bt_proto_lock);
 
 	if (bt_proto[proto] && try_module_get(bt_proto[proto]->owner)) {
-		err = bt_proto[proto]->create(net, sock, proto, kern);
+		err = bt_proto[proto]->create(net, sock, proto, kern, hold_net);
 		if (!err)
 			bt_sock_reclassify_lock(sock->sk, proto);
 		module_put(bt_proto[proto]->owner);
@@ -141,7 +141,8 @@ static int bt_sock_create(struct net *net, struct socket *sock, int proto,
 }
 
 struct sock *bt_sock_alloc(struct net *net, struct socket *sock,
-			   struct proto *prot, int proto, gfp_t prio, int kern)
+			   struct proto *prot, int proto, gfp_t prio,
+			   bool kern, bool hold_net)
 {
 	struct sock *sk;
 
diff --git a/net/bluetooth/bnep/sock.c b/net/bluetooth/bnep/sock.c
index 00d47bcf4d7d..d845cdb0e48b 100644
--- a/net/bluetooth/bnep/sock.c
+++ b/net/bluetooth/bnep/sock.c
@@ -196,7 +196,7 @@ static struct proto bnep_proto = {
 };
 
 static int bnep_sock_create(struct net *net, struct socket *sock, int protocol,
-			    int kern)
+			    bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -205,7 +205,8 @@ static int bnep_sock_create(struct net *net, struct socket *sock, int protocol,
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
 
-	sk = bt_sock_alloc(net, sock, &bnep_proto, protocol, GFP_ATOMIC, kern);
+	sk = bt_sock_alloc(net, sock, &bnep_proto, protocol, GFP_ATOMIC,
+			   kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c
index 96d49d9fae96..2ea9da9fe1d5 100644
--- a/net/bluetooth/cmtp/sock.c
+++ b/net/bluetooth/cmtp/sock.c
@@ -198,7 +198,7 @@ static struct proto cmtp_proto = {
 };
 
 static int cmtp_sock_create(struct net *net, struct socket *sock, int protocol,
-			    int kern)
+			    bool kern, bool hold_net)
 {
 	struct sock *sk;
 
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index 022b86797acd..4c51d7ee8a3e 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -2188,7 +2188,7 @@ static struct proto hci_sk_proto = {
 };
 
 static int hci_sock_create(struct net *net, struct socket *sock, int protocol,
-			   int kern)
+			   bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -2200,7 +2200,7 @@ static int hci_sock_create(struct net *net, struct socket *sock, int protocol,
 	sock->ops = &hci_sock_ops;
 
 	sk = bt_sock_alloc(net, sock, &hci_sk_proto, protocol, GFP_ATOMIC,
-			   kern);
+			   kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/bluetooth/hidp/sock.c b/net/bluetooth/hidp/sock.c
index c93aaeb3a3fa..0ebe94f39906 100644
--- a/net/bluetooth/hidp/sock.c
+++ b/net/bluetooth/hidp/sock.c
@@ -247,7 +247,7 @@ static struct proto hidp_proto = {
 };
 
 static int hidp_sock_create(struct net *net, struct socket *sock, int protocol,
-			    int kern)
+			    bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -256,7 +256,8 @@ static int hidp_sock_create(struct net *net, struct socket *sock, int protocol,
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
 
-	sk = bt_sock_alloc(net, sock, &hidp_proto, protocol, GFP_ATOMIC, kern);
+	sk = bt_sock_alloc(net, sock, &hidp_proto, protocol, GFP_ATOMIC,
+			   kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c
index 43d0ebe11100..9f3529fbadf4 100644
--- a/net/bluetooth/iso.c
+++ b/net/bluetooth/iso.c
@@ -874,11 +874,12 @@ static struct bt_iso_qos default_qos = {
 };
 
 static struct sock *iso_sock_alloc(struct net *net, struct socket *sock,
-				   int proto, gfp_t prio, int kern)
+				   int proto, gfp_t prio,
+				   bool kern, bool hold_net)
 {
 	struct sock *sk;
 
-	sk = bt_sock_alloc(net, sock, &iso_proto, proto, prio, kern);
+	sk = bt_sock_alloc(net, sock, &iso_proto, proto, prio, kern, hold_net);
 	if (!sk)
 		return NULL;
 
@@ -896,7 +897,7 @@ static struct sock *iso_sock_alloc(struct net *net, struct socket *sock,
 }
 
 static int iso_sock_create(struct net *net, struct socket *sock, int protocol,
-			   int kern)
+			   bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -909,7 +910,7 @@ static int iso_sock_create(struct net *net, struct socket *sock, int protocol,
 
 	sock->ops = &iso_sock_ops;
 
-	sk = iso_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern);
+	sk = iso_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
@@ -1911,7 +1912,7 @@ static void iso_conn_ready(struct iso_conn *conn)
 		lock_sock(parent);
 
 		sk = iso_sock_alloc(sock_net(parent), NULL,
-				    BTPROTO_ISO, GFP_ATOMIC, 0);
+				    BTPROTO_ISO, GFP_ATOMIC, false, true);
 		if (!sk) {
 			release_sock(parent);
 			return;
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 3d2553dcdb1b..04fe3c622210 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -45,7 +45,8 @@ static struct bt_sock_list l2cap_sk_list = {
 static const struct proto_ops l2cap_sock_ops;
 static void l2cap_sock_init(struct sock *sk, struct sock *parent);
 static struct sock *l2cap_sock_alloc(struct net *net, struct socket *sock,
-				     int proto, gfp_t prio, int kern);
+				     int proto, gfp_t prio,
+				     bool kern, bool hold_net);
 static void l2cap_sock_cleanup_listen(struct sock *parent);
 
 bool l2cap_is_socket(struct socket *sock)
@@ -1468,7 +1469,7 @@ static struct l2cap_chan *l2cap_sock_new_connection_cb(struct l2cap_chan *chan)
 	}
 
 	sk = l2cap_sock_alloc(sock_net(parent), NULL, BTPROTO_L2CAP,
-			      GFP_ATOMIC, 0);
+			      GFP_ATOMIC, false, true);
 	if (!sk) {
 		release_sock(parent);
 		return NULL;
@@ -1871,12 +1872,13 @@ static struct proto l2cap_proto = {
 };
 
 static struct sock *l2cap_sock_alloc(struct net *net, struct socket *sock,
-				     int proto, gfp_t prio, int kern)
+				     int proto, gfp_t prio,
+				     bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct l2cap_chan *chan;
 
-	sk = bt_sock_alloc(net, sock, &l2cap_proto, proto, prio, kern);
+	sk = bt_sock_alloc(net, sock, &l2cap_proto, proto, prio, kern, hold_net);
 	if (!sk)
 		return NULL;
 
@@ -1900,7 +1902,7 @@ static struct sock *l2cap_sock_alloc(struct net *net, struct socket *sock,
 }
 
 static int l2cap_sock_create(struct net *net, struct socket *sock, int protocol,
-			     int kern)
+			     bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -1917,7 +1919,7 @@ static int l2cap_sock_create(struct net *net, struct socket *sock, int protocol,
 
 	sock->ops = &l2cap_sock_ops;
 
-	sk = l2cap_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern);
+	sk = l2cap_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 913402806fa0..b96046914a63 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -269,7 +269,8 @@ static struct proto rfcomm_proto = {
 };
 
 static struct sock *rfcomm_sock_alloc(struct net *net, struct socket *sock,
-				      int proto, gfp_t prio, int kern)
+				      int proto, gfp_t prio,
+				      bool kern, bool hold_net)
 {
 	struct rfcomm_dlc *d;
 	struct sock *sk;
@@ -278,7 +279,7 @@ static struct sock *rfcomm_sock_alloc(struct net *net, struct socket *sock,
 	if (!d)
 		return NULL;
 
-	sk = bt_sock_alloc(net, sock, &rfcomm_proto, proto, prio, kern);
+	sk = bt_sock_alloc(net, sock, &rfcomm_proto, proto, prio, kern, hold_net);
 	if (!sk) {
 		rfcomm_dlc_free(d);
 		return NULL;
@@ -303,7 +304,7 @@ static struct sock *rfcomm_sock_alloc(struct net *net, struct socket *sock,
 }
 
 static int rfcomm_sock_create(struct net *net, struct socket *sock,
-			      int protocol, int kern)
+			      int protocol, bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -316,7 +317,7 @@ static int rfcomm_sock_create(struct net *net, struct socket *sock,
 
 	sock->ops = &rfcomm_sock_ops;
 
-	sk = rfcomm_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern);
+	sk = rfcomm_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
@@ -952,7 +953,8 @@ int rfcomm_connect_ind(struct rfcomm_session *s, u8 channel, struct rfcomm_dlc *
 		goto done;
 	}
 
-	sk = rfcomm_sock_alloc(sock_net(parent), NULL, BTPROTO_RFCOMM, GFP_ATOMIC, 0);
+	sk = rfcomm_sock_alloc(sock_net(parent), NULL, BTPROTO_RFCOMM, GFP_ATOMIC,
+			       false, true);
 	if (!sk)
 		goto done;
 
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index aa7bfe26cb40..a1865df18d59 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -545,11 +545,12 @@ static struct proto sco_proto = {
 };
 
 static struct sock *sco_sock_alloc(struct net *net, struct socket *sock,
-				   int proto, gfp_t prio, int kern)
+				   int proto, gfp_t prio,
+				   bool kern, bool hold_net)
 {
 	struct sock *sk;
 
-	sk = bt_sock_alloc(net, sock, &sco_proto, proto, prio, kern);
+	sk = bt_sock_alloc(net, sock, &sco_proto, proto, prio, kern, hold_net);
 	if (!sk)
 		return NULL;
 
@@ -567,7 +568,7 @@ static struct sock *sco_sock_alloc(struct net *net, struct socket *sock,
 }
 
 static int sco_sock_create(struct net *net, struct socket *sock, int protocol,
-			   int kern)
+			   bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -580,7 +581,7 @@ static int sco_sock_create(struct net *net, struct socket *sock, int protocol,
 
 	sock->ops = &sco_sock_ops;
 
-	sk = sco_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern);
+	sk = sco_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
@@ -1341,7 +1342,7 @@ static void sco_conn_ready(struct sco_conn *conn)
 		lock_sock(parent);
 
 		sk = sco_sock_alloc(sock_net(parent), NULL,
-				    BTPROTO_SCO, GFP_ATOMIC, 0);
+				    BTPROTO_SCO, GFP_ATOMIC, false, true);
 		if (!sk) {
 			release_sock(parent);
 			sco_conn_unlock(conn);
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 039dfbd367c9..6eef0e83f442 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -1015,7 +1015,7 @@ static void caif_sock_destructor(struct sock *sk)
 }
 
 static int caif_create(struct net *net, struct socket *sock, int protocol,
-		       int kern)
+		       bool kern, bool hold_net)
 {
 	struct sock *sk = NULL;
 	struct caifsock *cf_sk = NULL;
diff --git a/net/can/af_can.c b/net/can/af_can.c
index 01f3fbb3b67d..c4094ccc9978 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -112,7 +112,7 @@ static inline void can_put_proto(const struct can_proto *cp)
 }
 
 static int can_create(struct net *net, struct socket *sock, int protocol,
-		      int kern)
+		      bool kern, bool hold_net)
 {
 	struct sock *sk;
 	const struct can_proto *cp;
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index 18d267921bb5..0dd1a8829c42 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -999,7 +999,7 @@ static void ieee802154_sock_destruct(struct sock *sk)
  * set the state.
  */
 static int ieee802154_create(struct net *net, struct socket *sock,
-			     int protocol, int kern)
+			     int protocol, bool kern, bool hold_net)
 {
 	struct sock *sk;
 	int rc;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8095e82de808..7313ec410fb5 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -250,7 +250,7 @@ EXPORT_SYMBOL(inet_listen);
  */
 
 static int inet_create(struct net *net, struct socket *sock, int protocol,
-		       int kern)
+		       bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct inet_protosw *answer;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index f60ec8b0f8ea..8f951e5e58ab 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -118,7 +118,7 @@ void inet6_sock_destruct(struct sock *sk)
 EXPORT_SYMBOL_GPL(inet6_sock_destruct);
 
 static int inet6_create(struct net *net, struct socket *sock, int protocol,
-			int kern)
+			bool kern, bool hold_net)
 {
 	struct inet_sock *inet;
 	struct ipv6_pinfo *np;
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 7929df08d4e0..b7bbd4947855 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -446,7 +446,8 @@ static void iucv_sock_init(struct sock *sk, struct sock *parent)
 	}
 }
 
-static struct sock *iucv_sock_alloc(struct socket *sock, int proto, gfp_t prio, int kern)
+static struct sock *iucv_sock_alloc(struct socket *sock, int proto, gfp_t prio,
+				    bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct iucv_sock *iucv;
@@ -1632,7 +1633,7 @@ static int iucv_callback_connreq(struct iucv_path *path,
 	}
 
 	/* Create the new socket */
-	nsk = iucv_sock_alloc(NULL, sk->sk_protocol, GFP_ATOMIC, 0);
+	nsk = iucv_sock_alloc(NULL, sk->sk_protocol, GFP_ATOMIC, false, true);
 	if (!nsk) {
 		err = pr_iucv->path_sever(path, user_data);
 		iucv_path_free(path);
@@ -1854,7 +1855,7 @@ static int afiucv_hs_callback_syn(struct sock *sk, struct sk_buff *skb)
 		goto out;
 	}
 
-	nsk = iucv_sock_alloc(NULL, sk->sk_protocol, GFP_ATOMIC, 0);
+	nsk = iucv_sock_alloc(NULL, sk->sk_protocol, GFP_ATOMIC, false, true);
 	bh_lock_sock(sk);
 	if ((sk->sk_state != IUCV_LISTEN) ||
 	    sk_acceptq_is_full(sk) ||
@@ -2229,7 +2230,7 @@ static const struct proto_ops iucv_sock_ops = {
 };
 
 static int iucv_sock_create(struct net *net, struct socket *sock, int protocol,
-			    int kern)
+			    bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -2248,7 +2249,7 @@ static int iucv_sock_create(struct net *net, struct socket *sock, int protocol,
 		return -ESOCKTNOSUPPORT;
 	}
 
-	sk = iucv_sock_alloc(sock, protocol, GFP_KERNEL, kern);
+	sk = iucv_sock_alloc(sock, protocol, GFP_KERNEL, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 24aec295a51c..50925046a392 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1778,7 +1778,7 @@ static const struct proto_ops kcm_seqpacket_ops = {
 
 /* Create proto operation for kcm sockets */
 static int kcm_create(struct net *net, struct socket *sock,
-		      int protocol, int kern)
+		      int protocol, bool kern, bool hold_net)
 {
 	struct kcm_net *knet = net_generic(net, kcm_net_id);
 	struct sock *sk;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index c56bb4f451e6..1c35b1cfb1c5 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -136,7 +136,7 @@ static struct proto key_proto = {
 };
 
 static int pfkey_create(struct net *net, struct socket *sock, int protocol,
-			int kern)
+			bool kern, bool hold_net)
 {
 	struct netns_pfkey *net_pfkey = net_generic(net, pfkey_net_id);
 	struct sock *sk;
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 0259cde394ba..5d865f4a5cb4 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -163,13 +163,14 @@ static struct proto llc_proto = {
  *	@sock: Socket to initialize and attach allocated sk to.
  *	@protocol: Unused.
  *	@kern: on behalf of kernel or userspace
+ *	@hold_net: hold netns refcnt or not
  *
  *	Allocate and initialize a new llc_ui socket, validate the user wants a
  *	socket type we have available.
  *	Returns 0 upon success, negative upon failure.
  */
 static int llc_ui_create(struct net *net, struct socket *sock, int protocol,
-			 int kern)
+			 bool kern, bool hold_net)
 {
 	struct sock *sk;
 	int rc = -ESOCKTNOSUPPORT;
@@ -182,7 +183,8 @@ static int llc_ui_create(struct net *net, struct socket *sock, int protocol,
 
 	if (likely(sock->type == SOCK_DGRAM || sock->type == SOCK_STREAM)) {
 		rc = -ENOMEM;
-		sk = llc_sk_alloc(net, PF_LLC, GFP_KERNEL, &llc_proto, kern);
+		sk = llc_sk_alloc(net, PF_LLC, GFP_KERNEL, &llc_proto,
+				  kern, hold_net);
 		if (sk) {
 			rc = 0;
 			llc_ui_sk_init(sock, sk);
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index afc6974eafda..75b2e21bfd2b 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -761,10 +761,11 @@ static struct sock *llc_create_incoming_sock(struct sock *sk,
 					     struct llc_addr *saddr,
 					     struct llc_addr *daddr)
 {
-	struct sock *newsk = llc_sk_alloc(sock_net(sk), sk->sk_family, GFP_ATOMIC,
-					  sk->sk_prot, 0);
 	struct llc_sock *newllc, *llc = llc_sk(sk);
+	struct sock *newsk;
 
+	newsk = llc_sk_alloc(sock_net(sk), sk->sk_family, GFP_ATOMIC,
+			     sk->sk_prot, false, true);
 	if (!newsk)
 		goto out;
 	newllc = llc_sk(newsk);
@@ -923,11 +924,13 @@ static void llc_sk_init(struct sock *sk)
  *	@priority: for allocation (%GFP_KERNEL, %GFP_ATOMIC, etc)
  *	@prot: struct proto associated with this new sock instance
  *	@kern: is this to be a kernel socket?
+ *	@hold_net: hold netns refcnt or not
  *
  *	Allocates a LLC sock and initializes it. Returns the new LLC sock
  *	or %NULL if there's no memory available for one
  */
-struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority, struct proto *prot, int kern)
+struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority,
+			  struct proto *prot, bool kern, bool hold_net)
 {
 	struct sock *sk = sk_alloc(net, family, priority, prot, kern);
 
diff --git a/net/mctp/af_mctp.c b/net/mctp/af_mctp.c
index f6de136008f6..17821c976213 100644
--- a/net/mctp/af_mctp.c
+++ b/net/mctp/af_mctp.c
@@ -682,7 +682,7 @@ static struct proto mctp_proto = {
 };
 
 static int mctp_pf_create(struct net *net, struct socket *sock,
-			  int protocol, int kern)
+			  int protocol, bool kern, bool hold_net)
 {
 	const struct proto_ops *ops;
 	struct proto *proto;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index f4e7b5e4bb59..ddc51cb89c5b 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -619,7 +619,7 @@ static struct proto netlink_proto = {
 };
 
 static int __netlink_create(struct net *net, struct socket *sock,
-			    int protocol, int kern)
+			    int protocol, bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct netlink_sock *nlk;
@@ -645,7 +645,7 @@ static int __netlink_create(struct net *net, struct socket *sock,
 }
 
 static int netlink_create(struct net *net, struct socket *sock, int protocol,
-			  int kern)
+			  bool kern, bool hold_net)
 {
 	struct module *module = NULL;
 	struct netlink_sock *nlk;
@@ -684,7 +684,7 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol,
 	if (err < 0)
 		goto out;
 
-	err = __netlink_create(net, sock, protocol, kern);
+	err = __netlink_create(net, sock, protocol, kern, hold_net);
 	if (err < 0)
 		goto out_module;
 
@@ -2012,7 +2012,7 @@ __netlink_kernel_create(struct net *net, int unit, struct module *module,
 	if (sock_create_lite(PF_NETLINK, SOCK_DGRAM, unit, &sock))
 		return NULL;
 
-	if (__netlink_create(net, sock, unit, 1) < 0)
+	if (__netlink_create(net, sock, unit, true, false) < 0)
 		goto out_sock_release_nosk;
 
 	sk = sock->sk;
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 6ee148f0e6d0..483f78951a19 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -424,7 +424,7 @@ static struct proto nr_proto = {
 };
 
 static int nr_create(struct net *net, struct socket *sock, int protocol,
-		     int kern)
+		     bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct nr_sock *nr;
diff --git a/net/nfc/af_nfc.c b/net/nfc/af_nfc.c
index dda323e0a473..4fb1c86fcc81 100644
--- a/net/nfc/af_nfc.c
+++ b/net/nfc/af_nfc.c
@@ -16,7 +16,7 @@ static DEFINE_RWLOCK(proto_tab_lock);
 static const struct nfc_protocol *proto_tab[NFC_SOCKPROTO_MAX];
 
 static int nfc_sock_create(struct net *net, struct socket *sock, int proto,
-			   int kern)
+			   bool kern, bool hold_net)
 {
 	int rc = -EPROTONOSUPPORT;
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 886c0dd47b66..5a25dac333b0 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3398,7 +3398,7 @@ static struct proto packet_proto = {
  */
 
 static int packet_create(struct net *net, struct socket *sock, int protocol,
-			 int kern)
+			 bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct packet_sock *po;
diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
index a27efa4faa4e..4bdbc93c74fb 100644
--- a/net/phonet/af_phonet.c
+++ b/net/phonet/af_phonet.c
@@ -48,7 +48,7 @@ static inline void phonet_proto_put(const struct phonet_protocol *pp)
 /* protocol family functions */
 
 static int pn_socket_create(struct net *net, struct socket *sock, int protocol,
-			    int kern)
+			    bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct pn_sock *pn;
diff --git a/net/qrtr/af_qrtr.c b/net/qrtr/af_qrtr.c
index 00c51cf693f3..c05711f79a37 100644
--- a/net/qrtr/af_qrtr.c
+++ b/net/qrtr/af_qrtr.c
@@ -1258,7 +1258,7 @@ static struct proto qrtr_proto = {
 };
 
 static int qrtr_create(struct net *net, struct socket *sock,
-		       int protocol, int kern)
+		       int protocol, bool kern, bool hold_net)
 {
 	struct qrtr_sock *ipc;
 	struct sock *sk;
diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 8435a20968ef..3e1bb40485ad 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -695,7 +695,7 @@ static int __rds_create(struct socket *sock, struct sock *sk, int protocol)
 }
 
 static int rds_create(struct net *net, struct socket *sock, int protocol,
-		      int kern)
+		      bool kern, bool hold_net)
 {
 	struct sock *sk;
 
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 59050caab65c..1c175c92aa42 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -544,7 +544,7 @@ static struct proto rose_proto = {
 };
 
 static int rose_create(struct net *net, struct socket *sock, int protocol,
-		       int kern)
+		       bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct rose_sock *rose;
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 86873399f7d5..f2374f65b1c0 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -811,7 +811,7 @@ static __poll_t rxrpc_poll(struct file *file, struct socket *sock,
  * create an RxRPC socket
  */
 static int rxrpc_create(struct net *net, struct socket *sock, int protocol,
-			int kern)
+			bool kern, bool hold_net)
 {
 	struct rxrpc_net *rxnet;
 	struct rxrpc_sock *rx;
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index b52bee98a3eb..2535b922f760 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -387,7 +387,7 @@ void smc_sk_init(struct net *net, struct sock *sk, int protocol)
 }
 
 static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
-				   int protocol, int kern)
+				   int protocol, bool kern, bool hold_net)
 {
 	struct proto *prot;
 	struct sock *sk;
@@ -1715,7 +1715,8 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
 	int rc = -EINVAL;
 
 	release_sock(lsk);
-	new_sk = smc_sock_alloc(sock_net(lsk), NULL, lsk->sk_protocol, 0);
+	new_sk = smc_sock_alloc(sock_net(lsk), NULL, lsk->sk_protocol,
+				false, true);
 	if (!new_sk) {
 		rc = -ENOMEM;
 		lsk->sk_err = ENOMEM;
@@ -3331,7 +3332,7 @@ int smc_create_clcsk(struct net *net, struct sock *sk, int family)
 }
 
 static int __smc_create(struct net *net, struct socket *sock, int protocol,
-			int kern, struct socket *clcsock)
+			bool kern, bool hold_net, struct socket *clcsock)
 {
 	int family = (protocol == SMCPROTO_SMC6) ? PF_INET6 : PF_INET;
 	struct smc_sock *smc;
@@ -3349,7 +3350,7 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol,
 	rc = -ENOBUFS;
 	sock->ops = &smc_sock_ops;
 	sock->state = SS_UNCONNECTED;
-	sk = smc_sock_alloc(net, sock, protocol, kern);
+	sk = smc_sock_alloc(net, sock, protocol, kern, hold_net);
 	if (!sk)
 		goto out;
 
@@ -3371,9 +3372,9 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol,
 }
 
 static int smc_create(struct net *net, struct socket *sock, int protocol,
-		      int kern)
+		      bool kern, bool hold_net)
 {
-	return __smc_create(net, sock, protocol, kern, NULL);
+	return __smc_create(net, sock, protocol, kern, hold_net, NULL);
 }
 
 static const struct net_proto_family smc_sock_family_ops = {
@@ -3408,7 +3409,7 @@ static int smc_ulp_init(struct sock *sk)
 
 	smcsock->type = SOCK_STREAM;
 	__module_get(THIS_MODULE); /* tried in __tcp_ulp_find_autoload */
-	ret = __smc_create(net, smcsock, protocol, 0, tcp);
+	ret = __smc_create(net, smcsock, protocol, false, true, tcp);
 	if (ret) {
 		sock_release(smcsock); /* module_put() which ops won't be NULL */
 		return ret;
diff --git a/net/socket.c b/net/socket.c
index e5b4e0d34132..d1b4dadd67e4 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1561,7 +1561,7 @@ static int __sock_create(struct net *net, int family, int type, int protocol,
 	/* Now protected by module ref count */
 	rcu_read_unlock();
 
-	err = pf->create(net, sock, protocol, kern);
+	err = pf->create(net, sock, protocol, kern, hold_net);
 	if (err < 0) {
 		/* ->create should release the allocated sock->sk object on error
 		 * and make sure sock->sk is set to NULL to avoid use-after-free
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 65dcbb54f55d..4ee0bd1043e1 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -449,6 +449,7 @@ static int tipc_sk_sock_err(struct socket *sock, long *timeout)
  * @sock: pre-allocated socket structure
  * @protocol: protocol indicator (must be 0)
  * @kern: caused by kernel or by userspace?
+ * @hold_net: hold netns refcnt or not
  *
  * This routine creates additional data structures used by the TIPC socket,
  * initializes them, and links them together.
@@ -456,7 +457,7 @@ static int tipc_sk_sock_err(struct socket *sock, long *timeout)
  * Return: 0 on success, errno otherwise
  */
 static int tipc_sk_create(struct net *net, struct socket *sock,
-			  int protocol, int kern)
+			  int protocol, bool kern, bool hold_net)
 {
 	const struct proto_ops *ops;
 	struct sock *sk;
@@ -2735,7 +2736,8 @@ static int tipc_accept(struct socket *sock, struct socket *new_sock,
 
 	buf = skb_peek(&sk->sk_receive_queue);
 
-	res = tipc_sk_create(sock_net(sock->sk), new_sock, 0, arg->kern);
+	res = tipc_sk_create(sock_net(sock->sk), new_sock, 0,
+			     arg->kern, !arg->kern);
 	if (res)
 		goto exit;
 	security_sk_clone(sock->sk, new_sock->sk);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 6b1762300443..393be726004c 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1006,7 +1006,8 @@ struct proto unix_stream_proto = {
 #endif
 };
 
-static struct sock *unix_create1(struct net *net, struct socket *sock, int kern, int type)
+static struct sock *unix_create1(struct net *net, struct socket *sock, int type,
+				 bool kern, bool hold_net)
 {
 	struct unix_sock *u;
 	struct sock *sk;
@@ -1061,7 +1062,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern,
 }
 
 static int unix_create(struct net *net, struct socket *sock, int protocol,
-		       int kern)
+		       bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -1091,7 +1092,7 @@ static int unix_create(struct net *net, struct socket *sock, int protocol,
 		return -ESOCKTNOSUPPORT;
 	}
 
-	sk = unix_create1(net, sock, kern, sock->type);
+	sk = unix_create1(net, sock, sock->type, kern, hold_net);
 	if (IS_ERR(sk))
 		return PTR_ERR(sk);
 
@@ -1568,7 +1569,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	 */
 
 	/* create new sock for complete connection */
-	newsk = unix_create1(net, NULL, 0, sock->type);
+	newsk = unix_create1(net, NULL, sock->type, false, true);
 	if (IS_ERR(newsk)) {
 		err = PTR_ERR(newsk);
 		newsk = NULL;
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 5cf8109f672a..f2ce92cd57c4 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -732,7 +732,7 @@ static struct sock *__vsock_create(struct net *net,
 				   struct sock *parent,
 				   gfp_t priority,
 				   unsigned short type,
-				   int kern)
+				   bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct vsock_sock *psk;
@@ -864,7 +864,7 @@ static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 struct sock *vsock_create_connected(struct sock *parent)
 {
 	return __vsock_create(sock_net(parent), NULL, parent, GFP_KERNEL,
-			      parent->sk_type, 0);
+			      parent->sk_type, false, true);
 }
 EXPORT_SYMBOL_GPL(vsock_create_connected);
 
@@ -2399,7 +2399,7 @@ static const struct proto_ops vsock_seqpacket_ops = {
 };
 
 static int vsock_create(struct net *net, struct socket *sock,
-			int protocol, int kern)
+			int protocol, bool kern, bool hold_net)
 {
 	struct vsock_sock *vsk;
 	struct sock *sk;
@@ -2427,7 +2427,7 @@ static int vsock_create(struct net *net, struct socket *sock,
 
 	sock->state = SS_UNCONNECTED;
 
-	sk = __vsock_create(net, sock, NULL, GFP_KERNEL, 0, kern);
+	sk = __vsock_create(net, sock, NULL, GFP_KERNEL, 0, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 8dda4178497c..0b6c22b979e7 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -505,11 +505,12 @@ static struct proto x25_proto = {
 	.obj_size = sizeof(struct x25_sock),
 };
 
-static struct sock *x25_alloc_socket(struct net *net, int kern)
+static struct sock *x25_alloc_socket(struct net *net, bool kern, bool hold_net)
 {
 	struct x25_sock *x25;
-	struct sock *sk = sk_alloc(net, AF_X25, GFP_ATOMIC, &x25_proto, kern);
+	struct sock *sk;
 
+	sk = sk_alloc(net, AF_X25, GFP_ATOMIC, &x25_proto, kern);
 	if (!sk)
 		goto out;
 
@@ -525,7 +526,7 @@ static struct sock *x25_alloc_socket(struct net *net, int kern)
 }
 
 static int x25_create(struct net *net, struct socket *sock, int protocol,
-		      int kern)
+		      bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct x25_sock *x25;
@@ -543,7 +544,8 @@ static int x25_create(struct net *net, struct socket *sock, int protocol,
 		goto out;
 
 	rc = -ENOMEM;
-	if ((sk = x25_alloc_socket(net, kern)) == NULL)
+	sk = x25_alloc_socket(net, kern, hold_net);
+	if (!sk)
 		goto out;
 
 	x25 = x25_sk(sk);
@@ -592,7 +594,8 @@ static struct sock *x25_make_new(struct sock *osk)
 	if (osk->sk_type != SOCK_SEQPACKET)
 		goto out;
 
-	if ((sk = x25_alloc_socket(sock_net(osk), 0)) == NULL)
+	sk = x25_alloc_socket(sock_net(osk), false, true);
+	if (!sk)
 		goto out;
 
 	x25 = x25_sk(sk);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 3fa70286c846..5763ef355c73 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1688,7 +1688,7 @@ static void xsk_destruct(struct sock *sk)
 }
 
 static int xsk_create(struct net *net, struct socket *sock, int protocol,
-		      int kern)
+		      bool kern, bool hold_net)
 {
 	struct xdp_sock *xs;
 	struct sock *sk;
-- 
2.39.5 (Apple Git-154)


* [PATCH v3 net-next 05/15] ppp: Pass hold_net to struct pppox_proto.create().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (3 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 04/15] socket: Pass hold_net to struct net_proto_family.create() Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 06/15] nfc: Pass hold_net to struct nfc_protocol.create() Kuniyuki Iwashima
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will introduce a new API to create a kernel socket with netns refcnt
held.  Then, sk_alloc() needs the hold_net flag passed to pppox_create().

Let's pass it down to struct pppox_proto.create().

While at it, we convert the kern flag to boolean.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
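For readers skimming the thread, the pass-down can be pictured with a
small userspace model.  This is only a sketch: every name prefixed with
mock_ is a hypothetical stand-in, not a kernel symbol, and the net/sock
arguments are dropped for brevity.  The only thing taken from the patch
is the shape of the create() callback, which now receives both kern and
hold_net and gets them forwarded unchanged by the dispatcher.

```c
/* Userspace model of the dispatch in pppox_create(); NOT kernel code.
 * Everything prefixed mock_ is a hypothetical stand-in.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_PX_MAX_PROTO 2

struct mock_sock {
	bool kern;
	bool hold_net;
};

/* Same shape as the new pppox_proto.create(), minus net/sock. */
struct mock_pppox_proto {
	int (*create)(struct mock_sock *sk, bool kern, bool hold_net);
};

/* A per-protocol handler: it only records the flags it was given. */
static int mock_pppoe_create(struct mock_sock *sk, bool kern, bool hold_net)
{
	sk->kern = kern;
	sk->hold_net = hold_net;
	return 0;
}

static const struct mock_pppox_proto mock_pppoe_proto = {
	.create = mock_pppoe_create,
};

/* Only slot 0 is registered; the other slots stay NULL. */
static const struct mock_pppox_proto *mock_protos[MOCK_PX_MAX_PROTO + 1] = {
	[0] = &mock_pppoe_proto,
};

/* Dispatcher: forwards both flags unchanged to the protocol handler. */
static int mock_pppox_create(struct mock_sock *sk, int protocol,
			     bool kern, bool hold_net)
{
	if (protocol < 0 || protocol > MOCK_PX_MAX_PROTO)
		return -1;
	if (!mock_protos[protocol])
		return -1;

	assert(mock_protos[protocol]->create != NULL);
	return mock_protos[protocol]->create(sk, kern, hold_net);
}
```

Calling mock_pppox_create(&sk, 0, true, false) leaves sk.kern set and
sk.hold_net clear, which is the combination a kernel socket without a
netns reference would use in this model.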
 drivers/net/ppp/pppoe.c  | 3 ++-
 drivers/net/ppp/pppox.c  | 2 +-
 drivers/net/ppp/pptp.c   | 3 ++-
 include/linux/if_pppox.h | 3 ++-
 net/l2tp/l2tp_ppp.c      | 3 ++-
 5 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 2ea4f4890d23..90995f8a08a3 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -533,7 +533,8 @@ static struct proto pppoe_sk_proto __read_mostly = {
  * Initialize a new struct sock.
  *
  **********************************************************************/
-static int pppoe_create(struct net *net, struct socket *sock, int kern)
+static int pppoe_create(struct net *net, struct socket *sock,
+			bool kern, bool hold_net)
 {
 	struct sock *sk;
 
diff --git a/drivers/net/ppp/pppox.c b/drivers/net/ppp/pppox.c
index 53b3f790d1f5..823b1facac6f 100644
--- a/drivers/net/ppp/pppox.c
+++ b/drivers/net/ppp/pppox.c
@@ -126,7 +126,7 @@ static int pppox_create(struct net *net, struct socket *sock, int protocol,
 	    !try_module_get(pppox_protos[protocol]->owner))
 		goto out;
 
-	rc = pppox_protos[protocol]->create(net, sock, kern);
+	rc = pppox_protos[protocol]->create(net, sock, kern, hold_net);
 
 	module_put(pppox_protos[protocol]->owner);
 out:
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 689687bd2574..7bfb5c227c40 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -538,7 +538,8 @@ static void pptp_sock_destruct(struct sock *sk)
 	dst_release(rcu_dereference_protected(sk->sk_dst_cache, 1));
 }
 
-static int pptp_create(struct net *net, struct socket *sock, int kern)
+static int pptp_create(struct net *net, struct socket *sock,
+		       bool kern, bool hold_net)
 {
 	int error = -ENOMEM;
 	struct sock *sk;
diff --git a/include/linux/if_pppox.h b/include/linux/if_pppox.h
index ff3beda1312c..a38047e308fd 100644
--- a/include/linux/if_pppox.h
+++ b/include/linux/if_pppox.h
@@ -68,7 +68,8 @@ static inline struct sock *sk_pppox(struct pppox_sock *po)
 struct module;
 
 struct pppox_proto {
-	int		(*create)(struct net *net, struct socket *sock, int kern);
+	int		(*create)(struct net *net, struct socket *sock,
+				  bool kern, bool hold_net);
 	int		(*ioctl)(struct socket *sock, unsigned int cmd,
 				 unsigned long arg);
 	struct module	*owner;
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 53baf2dd5d5d..bab3c7b943db 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -477,7 +477,8 @@ static int pppol2tp_backlog_recv(struct sock *sk, struct sk_buff *skb)
 
 /* socket() handler. Initialize a new struct sock.
  */
-static int pppol2tp_create(struct net *net, struct socket *sock, int kern)
+static int pppol2tp_create(struct net *net, struct socket *sock,
+			   bool kern, bool hold_net)
 {
 	int error = -ENOMEM;
 	struct sock *sk;
-- 
2.39.5 (Apple Git-154)


* [PATCH v3 net-next 06/15] nfc: Pass hold_net to struct nfc_protocol.create().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (4 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 05/15] ppp: Pass hold_net to struct pppox_proto.create() Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 07/15] socket: Add hold_net flag to struct proto_accept_arg Kuniyuki Iwashima
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will introduce a new API to create a kernel socket with netns refcnt
held.  Then, sk_alloc() needs the hold_net flag passed to nfc_sock_create().

Let's pass it down to struct nfc_protocol.create() and functions that call
sk_alloc().

While at it, we convert the kern flag to boolean.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/nfc/af_nfc.c    | 3 ++-
 net/nfc/llcp.h      | 3 ++-
 net/nfc/llcp_core.c | 3 ++-
 net/nfc/llcp_sock.c | 8 +++++---
 net/nfc/nfc.h       | 3 ++-
 net/nfc/rawsock.c   | 3 ++-
 6 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/net/nfc/af_nfc.c b/net/nfc/af_nfc.c
index 4fb1c86fcc81..6cdeeccd15bc 100644
--- a/net/nfc/af_nfc.c
+++ b/net/nfc/af_nfc.c
@@ -28,7 +28,8 @@ static int nfc_sock_create(struct net *net, struct socket *sock, int proto,
 
 	read_lock(&proto_tab_lock);
 	if (proto_tab[proto] &&	try_module_get(proto_tab[proto]->owner)) {
-		rc = proto_tab[proto]->create(net, sock, proto_tab[proto], kern);
+		rc = proto_tab[proto]->create(net, sock, proto_tab[proto],
+					      kern, hold_net);
 		module_put(proto_tab[proto]->owner);
 	}
 	read_unlock(&proto_tab_lock);
diff --git a/net/nfc/llcp.h b/net/nfc/llcp.h
index d8345ed57c95..b9d539358e65 100644
--- a/net/nfc/llcp.h
+++ b/net/nfc/llcp.h
@@ -211,7 +211,8 @@ void nfc_llcp_send_to_raw_sock(struct nfc_llcp_local *local,
 			       struct sk_buff *skb, u8 direction);
 
 /* Sock API */
-struct sock *nfc_llcp_sock_alloc(struct socket *sock, int type, gfp_t gfp, int kern);
+struct sock *nfc_llcp_sock_alloc(struct socket *sock, int type, gfp_t gfp,
+				 bool kern, bool hold_net);
 void nfc_llcp_sock_free(struct nfc_llcp_sock *sock);
 void nfc_llcp_accept_unlink(struct sock *sk);
 void nfc_llcp_accept_enqueue(struct sock *parent, struct sock *sk);
diff --git a/net/nfc/llcp_core.c b/net/nfc/llcp_core.c
index 18be13fb9b75..96d8df013bda 100644
--- a/net/nfc/llcp_core.c
+++ b/net/nfc/llcp_core.c
@@ -965,7 +965,8 @@ static void nfc_llcp_recv_connect(struct nfc_llcp_local *local,
 		sock->ssap = ssap;
 	}
 
-	new_sk = nfc_llcp_sock_alloc(NULL, parent->sk_type, GFP_ATOMIC, 0);
+	new_sk = nfc_llcp_sock_alloc(NULL, parent->sk_type, GFP_ATOMIC,
+				     false, true);
 	if (new_sk == NULL) {
 		reason = LLCP_DM_REJ;
 		release_sock(&sock->sk);
diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 57a2f97004e1..14f592becce0 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -971,7 +971,8 @@ static void llcp_sock_destruct(struct sock *sk)
 	}
 }
 
-struct sock *nfc_llcp_sock_alloc(struct socket *sock, int type, gfp_t gfp, int kern)
+struct sock *nfc_llcp_sock_alloc(struct socket *sock, int type, gfp_t gfp,
+				 bool kern, bool hold_net)
 {
 	struct sock *sk;
 	struct nfc_llcp_sock *llcp_sock;
@@ -1022,7 +1023,8 @@ void nfc_llcp_sock_free(struct nfc_llcp_sock *sock)
 }
 
 static int llcp_sock_create(struct net *net, struct socket *sock,
-			    const struct nfc_protocol *nfc_proto, int kern)
+			    const struct nfc_protocol *nfc_proto,
+			    bool kern, bool hold_net)
 {
 	struct sock *sk;
 
@@ -1041,7 +1043,7 @@ static int llcp_sock_create(struct net *net, struct socket *sock,
 		sock->ops = &llcp_sock_ops;
 	}
 
-	sk = nfc_llcp_sock_alloc(sock, sock->type, GFP_ATOMIC, kern);
+	sk = nfc_llcp_sock_alloc(sock, sock->type, GFP_ATOMIC, kern, hold_net);
 	if (sk == NULL)
 		return -ENOMEM;
 
diff --git a/net/nfc/nfc.h b/net/nfc/nfc.h
index 0b1e6466f4fb..6dac305a32d3 100644
--- a/net/nfc/nfc.h
+++ b/net/nfc/nfc.h
@@ -21,7 +21,8 @@ struct nfc_protocol {
 	struct proto *proto;
 	struct module *owner;
 	int (*create)(struct net *net, struct socket *sock,
-		      const struct nfc_protocol *nfc_proto, int kern);
+		      const struct nfc_protocol *nfc_proto,
+		      bool kern, bool hold_net);
 };
 
 struct nfc_rawsock {
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index 5125392bb68e..4485b1ccb1c7 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -321,7 +321,8 @@ static void rawsock_destruct(struct sock *sk)
 }
 
 static int rawsock_create(struct net *net, struct socket *sock,
-			  const struct nfc_protocol *nfc_proto, int kern)
+			  const struct nfc_protocol *nfc_proto,
+			  bool kern, bool hold_net)
 {
 	struct sock *sk;
 
-- 
2.39.5 (Apple Git-154)


* [PATCH v3 net-next 07/15] socket: Add hold_net flag to struct proto_accept_arg.
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (5 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 06/15] nfc: Pass hold_net to struct nfc_protocol.create() Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 08/15] socket: Pass hold_net to sk_alloc() Kuniyuki Iwashima
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will introduce a new API to create a kernel socket with netns refcnt
held.  Then, sk_alloc() needs the hold_net flag passed from the accept()
paths.

Let's add a new hold_net flag to struct proto_accept_arg and pass it
down before sk_alloc().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
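The flag pairings used by the accept() callers in this patch can be
modelled in userspace as follows.  This is a sketch under assumptions:
the mock_ names are hypothetical, and the refcount behaviour reflects
the intent stated in the cover letter (hold_net means the socket holds
a netns reference), not anything this patch implements by itself, since
hold_net is only plumbed through here.

```c
/* Userspace model of the accept() flag pairings; NOT kernel code.
 * Everything prefixed mock_ is a hypothetical stand-in.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_net {
	int refcnt;
};

/* Mirrors only the fields of struct proto_accept_arg discussed here. */
struct mock_accept_arg {
	int flags;
	bool kern;
	bool hold_net;
};

struct mock_sock {
	struct mock_net *net;
	bool kern;
	bool holds_net;
};

/* Pairing used for userspace accept(): not kern, netns ref held. */
static struct mock_accept_arg mock_user_accept_arg(int flags)
{
	return (struct mock_accept_arg){
		.flags = flags,
		.kern = false,
		.hold_net = true,
	};
}

/* Pairing used by kernel_accept(): kern, no netns ref. */
static struct mock_accept_arg mock_kernel_accept_arg(int flags)
{
	return (struct mock_accept_arg){
		.flags = flags,
		.kern = true,
		.hold_net = false,
	};
}

/* Assumed behaviour: take a netns reference only when hold_net is set. */
static void mock_accept_init(struct mock_sock *newsk, struct mock_net *net,
			     const struct mock_accept_arg *arg)
{
	assert(net->refcnt > 0);

	newsk->net = net;
	newsk->kern = arg->kern;
	newsk->holds_net = arg->hold_net;
	if (arg->hold_net)
		net->refcnt++;
}

/* Drop the reference only if this socket took one. */
static void mock_sock_release(struct mock_sock *sk)
{
	if (sk->holds_net)
		sk->net->refcnt--;
	sk->net = NULL;
	sk->holds_net = false;
}
```

Keeping kern and hold_net as separate fields, rather than deriving one
from the other, is what would later allow a third pairing (kern with
hold_net set) for kernel sockets that need to pin the netns.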
 drivers/xen/pvcalls-back.c | 1 +
 fs/ocfs2/cluster/tcp.c     | 2 ++
 include/net/sctp/structs.h | 2 +-
 include/net/sock.h         | 1 +
 io_uring/net.c             | 2 ++
 net/atm/svc.c              | 2 +-
 net/rds/tcp_listen.c       | 1 +
 net/sctp/ipv6.c            | 7 ++++---
 net/sctp/protocol.c        | 7 ++++---
 net/sctp/socket.c          | 2 +-
 net/socket.c               | 6 +++++-
 net/tipc/socket.c          | 2 +-
 12 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index fd7ed65e0197..f0f8b4862983 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -520,6 +520,7 @@ static void __pvcalls_back_accept(struct work_struct *work)
 	struct proto_accept_arg arg = {
 		.flags = O_NONBLOCK,
 		.kern = true,
+		.hold_net = false,
 	};
 	struct sock_mapping *map;
 	struct pvcalls_ioworker *iow;
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index 2b8fa3e782fb..6ef03a02d19b 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -1786,6 +1786,8 @@ static int o2net_accept_one(struct socket *sock, int *more)
 	struct o2net_sock_container *sc = NULL;
 	struct proto_accept_arg arg = {
 		.flags = O_NONBLOCK,
+		.kern = false,
+		.hold_net = true,
 	};
 	struct o2net_node *nn;
 	unsigned int nofs_flag;
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 31248cfdfb23..ae2729ab2040 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -502,7 +502,7 @@ struct sctp_pf {
 	int  (*supported_addrs)(const struct sctp_sock *, __be16 *);
 	struct sock *(*create_accept_sk) (struct sock *sk,
 					  struct sctp_association *asoc,
-					  bool kern);
+					  struct proto_accept_arg *arg);
 	int (*addr_to_user)(struct sctp_sock *sk, union sctp_addr *addr);
 	void (*to_sk_saddr)(union sctp_addr *, struct sock *sk);
 	void (*to_sk_daddr)(union sctp_addr *, struct sock *sk);
diff --git a/include/net/sock.h b/include/net/sock.h
index 7464e9f9f47c..9963dccec2f8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1214,6 +1214,7 @@ struct proto_accept_arg {
 	int err;
 	int is_empty;
 	bool kern;
+	bool hold_net;
 };
 
 /* Networking protocol blocks we attach to sockets.
diff --git a/io_uring/net.c b/io_uring/net.c
index df1f7dc6f1c8..93418208b37d 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1559,6 +1559,8 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 	bool fixed = !!accept->file_slot;
 	struct proto_accept_arg arg = {
 		.flags = force_nonblock ? O_NONBLOCK : 0,
+		.kern = false,
+		.hold_net = true,
 	};
 	struct file *file;
 	unsigned cflags;
diff --git a/net/atm/svc.c b/net/atm/svc.c
index 9795294f4c1e..a23699acb3fd 100644
--- a/net/atm/svc.c
+++ b/net/atm/svc.c
@@ -336,7 +336,7 @@ static int svc_accept(struct socket *sock, struct socket *newsock,
 
 	lock_sock(sk);
 
-	error = svc_create(sock_net(sk), newsock, 0, arg->kern, !arg->kern);
+	error = svc_create(sock_net(sk), newsock, 0, arg->kern, arg->hold_net);
 	if (error)
 		goto out;
 
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index d89bd8d0c354..69aaf03ab93e 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -108,6 +108,7 @@ int rds_tcp_accept_one(struct socket *sock)
 	struct proto_accept_arg arg = {
 		.flags = O_NONBLOCK,
 		.kern = true,
+		.hold_net = false,
 	};
 #if !IS_ENABLED(CONFIG_IPV6)
 	struct in6_addr saddr, daddr;
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index a9ed2ccab1bd..2c4e4dd79246 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -777,13 +777,14 @@ static enum sctp_scope sctp_v6_scope(union sctp_addr *addr)
 /* Create and initialize a new sk for the socket to be returned by accept(). */
 static struct sock *sctp_v6_create_accept_sk(struct sock *sk,
 					     struct sctp_association *asoc,
-					     bool kern)
+					     struct proto_accept_arg *arg)
 {
-	struct sock *newsk;
 	struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
 	struct sctp6_sock *newsctp6sk;
+	struct sock *newsk;
 
-	newsk = sk_alloc(sock_net(sk), PF_INET6, GFP_KERNEL, sk->sk_prot, kern);
+	newsk = sk_alloc(sock_net(sk), PF_INET6, GFP_KERNEL, sk->sk_prot,
+			 arg->kern);
 	if (!newsk)
 		goto out;
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 8b9a1b96695e..7b2ae3df171a 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -581,12 +581,13 @@ static int sctp_v4_is_ce(const struct sk_buff *skb)
 /* Create and initialize a new sk for the socket returned by accept(). */
 static struct sock *sctp_v4_create_accept_sk(struct sock *sk,
 					     struct sctp_association *asoc,
-					     bool kern)
+					     struct proto_accept_arg *arg)
 {
-	struct sock *newsk = sk_alloc(sock_net(sk), PF_INET, GFP_KERNEL,
-			sk->sk_prot, kern);
 	struct inet_sock *newinet;
+	struct sock *newsk;
 
+	newsk = sk_alloc(sock_net(sk), PF_INET, GFP_KERNEL, sk->sk_prot,
+			 arg->kern);
 	if (!newsk)
 		goto out;
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 36ee34f483d7..a1add0b7fd9f 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4887,7 +4887,7 @@ static struct sock *sctp_accept(struct sock *sk, struct proto_accept_arg *arg)
 	 */
 	asoc = list_entry(ep->asocs.next, struct sctp_association, asocs);
 
-	newsk = sp->pf->create_accept_sk(sk, asoc, arg->kern);
+	newsk = sp->pf->create_accept_sk(sk, asoc, arg);
 	if (!newsk) {
 		error = -ENOMEM;
 		goto out;
diff --git a/net/socket.c b/net/socket.c
index d1b4dadd67e4..a8796d7f06be 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1971,7 +1971,10 @@ struct file *do_accept(struct file *file, struct proto_accept_arg *arg,
 static int __sys_accept4_file(struct file *file, struct sockaddr __user *upeer_sockaddr,
 			      int __user *upeer_addrlen, int flags)
 {
-	struct proto_accept_arg arg = { };
+	struct proto_accept_arg arg = {
+		.kern = false,
+		.hold_net = true,
+	};
 	struct file *newfile;
 	int newfd;
 
@@ -3586,6 +3589,7 @@ int kernel_accept(struct socket *sock, struct socket **newsock, int flags)
 	struct proto_accept_arg arg = {
 		.flags = flags,
 		.kern = true,
+		.hold_net = false,
 	};
 	int err;
 
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 4ee0bd1043e1..26566ff1d4c7 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2737,7 +2737,7 @@ static int tipc_accept(struct socket *sock, struct socket *new_sock,
 	buf = skb_peek(&sk->sk_receive_queue);
 
 	res = tipc_sk_create(sock_net(sock->sk), new_sock, 0,
-			     arg->kern, !arg->kern);
+			     arg->kern, arg->hold_net);
 	if (res)
 		goto exit;
 	security_sk_clone(sock->sk, new_sock->sk);
-- 
2.39.5 (Apple Git-154)


* [PATCH v3 net-next 08/15] socket: Pass hold_net to sk_alloc().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (6 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 07/15] socket: Add hold_net flag to struct proto_accept_arg Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13 13:45   ` Wenjia Zhang
  2024-12-13  9:21 ` [PATCH v3 net-next 09/15] socket: Respect hold_net in sk_alloc() Kuniyuki Iwashima
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will introduce a new API to create a kernel socket with netns refcnt
held.  Then, sk_alloc() needs the hold_net flag passed to __sock_create().

Let's pass it to sk_alloc().

hold_net is not used yet; the next patch makes sk_alloc() act on it,
which keeps both patches easy to review.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
v2:
  * Fix build error in iucv_sock_alloc()
---
 crypto/af_alg.c              | 5 +++--
 drivers/isdn/mISDN/socket.c  | 4 ++--
 drivers/net/ppp/pppoe.c      | 2 +-
 drivers/net/ppp/pptp.c       | 2 +-
 drivers/net/tap.c            | 2 +-
 drivers/net/tun.c            | 2 +-
 drivers/xen/pvcalls-front.c  | 3 ++-
 include/net/sock.h           | 2 +-
 net/appletalk/ddp.c          | 2 +-
 net/atm/common.c             | 2 +-
 net/ax25/af_ax25.c           | 5 +++--
 net/bluetooth/af_bluetooth.c | 2 +-
 net/bluetooth/cmtp/sock.c    | 2 +-
 net/bpf/test_run.c           | 2 +-
 net/caif/caif_socket.c       | 2 +-
 net/can/af_can.c             | 2 +-
 net/core/sock.c              | 3 ++-
 net/ieee802154/socket.c      | 2 +-
 net/ipv4/af_inet.c           | 2 +-
 net/ipv6/af_inet6.c          | 2 +-
 net/iucv/af_iucv.c           | 2 +-
 net/kcm/kcmsock.c            | 4 ++--
 net/key/af_key.c             | 2 +-
 net/l2tp/l2tp_ppp.c          | 3 ++-
 net/llc/llc_conn.c           | 2 +-
 net/mctp/af_mctp.c           | 2 +-
 net/netlink/af_netlink.c     | 3 ++-
 net/netrom/af_netrom.c       | 5 +++--
 net/nfc/llcp_sock.c          | 2 +-
 net/nfc/rawsock.c            | 2 +-
 net/packet/af_packet.c       | 2 +-
 net/phonet/af_phonet.c       | 2 +-
 net/phonet/pep.c             | 2 +-
 net/qrtr/af_qrtr.c           | 2 +-
 net/rds/af_rds.c             | 2 +-
 net/rose/af_rose.c           | 9 +++++----
 net/rxrpc/af_rxrpc.c         | 2 +-
 net/sctp/ipv6.c              | 2 +-
 net/sctp/protocol.c          | 2 +-
 net/smc/af_smc.c             | 2 +-
 net/tipc/socket.c            | 2 +-
 net/unix/af_unix.c           | 8 +++++---
 net/vmw_vsock/af_vsock.c     | 2 +-
 net/x25/af_x25.c             | 2 +-
 net/xdp/xsk.c                | 2 +-
 45 files changed, 65 insertions(+), 55 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index e60032b94d97..bef4f0c8dac8 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -423,7 +423,8 @@ int af_alg_accept(struct sock *sk, struct socket *newsock,
 	if (!type)
 		goto unlock;
 
-	sk2 = sk_alloc(sock_net(sk), PF_ALG, GFP_KERNEL, &alg_proto, arg->kern);
+	sk2 = sk_alloc(sock_net(sk), PF_ALG, GFP_KERNEL, &alg_proto,
+		       arg->kern, arg->hold_net);
 	err = -ENOMEM;
 	if (!sk2)
 		goto unlock;
@@ -514,7 +515,7 @@ static int alg_create(struct net *net, struct socket *sock, int protocol,
 		return -EPROTONOSUPPORT;
 
 	err = -ENOMEM;
-	sk = sk_alloc(net, PF_ALG, GFP_KERNEL, &alg_proto, kern);
+	sk = sk_alloc(net, PF_ALG, GFP_KERNEL, &alg_proto, kern, hold_net);
 	if (!sk)
 		goto out;
 
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 54157c24ccb9..2d2404cf5649 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -598,7 +598,7 @@ data_sock_create(struct net *net, struct socket *sock, int protocol,
 	if (sock->type != SOCK_DGRAM)
 		return -ESOCKTNOSUPPORT;
 
-	sk = sk_alloc(net, PF_ISDN, GFP_KERNEL, &mISDN_proto, kern);
+	sk = sk_alloc(net, PF_ISDN, GFP_KERNEL, &mISDN_proto, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
@@ -757,7 +757,7 @@ base_sock_create(struct net *net, struct socket *sock, int protocol,
 	if (!capable(CAP_NET_RAW))
 		return -EPERM;
 
-	sk = sk_alloc(net, PF_ISDN, GFP_KERNEL, &mISDN_proto, kern);
+	sk = sk_alloc(net, PF_ISDN, GFP_KERNEL, &mISDN_proto, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 90995f8a08a3..6606aa4374e9 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -538,7 +538,7 @@ static int pppoe_create(struct net *net, struct socket *sock,
 {
 	struct sock *sk;
 
-	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppoe_sk_proto, kern);
+	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppoe_sk_proto, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 7bfb5c227c40..4c41e07ec497 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -546,7 +546,7 @@ static int pptp_create(struct net *net, struct socket *sock,
 	struct pppox_sock *po;
 	struct pptp_opt *opt;
 
-	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pptp_sk_proto, kern);
+	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pptp_sk_proto, kern, hold_net);
 	if (!sk)
 		goto out;
 
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 5aa41d5f7765..7bce097e96a5 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -522,7 +522,7 @@ static int tap_open(struct inode *inode, struct file *file)
 
 	err = -ENOMEM;
 	q = (struct tap_queue *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
-					     &tap_proto, 0);
+					 &tap_proto, false, true);
 	if (!q)
 		goto err;
 	if (ptr_ring_init(&q->ring, tap->dev->tx_queue_len, GFP_KERNEL)) {
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8e94df88392c..13bbee8d0a4b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3481,7 +3481,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 	struct tun_file *tfile;
 
 	tfile = (struct tun_file *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
-					    &tun_proto, 0);
+					    &tun_proto, false, true);
 	if (!tfile)
 		return -ENOMEM;
 	if (ptr_ring_init(&tfile->tx_ring, 0, GFP_KERNEL)) {
diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index b72ee9379d77..a2308d24e67d 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -882,7 +882,8 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
 
 received:
 	map2->sock = newsock;
-	newsock->sk = sk_alloc(sock_net(sock->sk), PF_INET, GFP_KERNEL, &pvcalls_proto, false);
+	newsock->sk = sk_alloc(sock_net(sock->sk), PF_INET, GFP_KERNEL, &pvcalls_proto,
+			       false, true);
 	if (!newsock->sk) {
 		bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
 		map->passive.inflight_req_id = PVCALLS_INVALID_ID;
diff --git a/include/net/sock.h b/include/net/sock.h
index 9963dccec2f8..8de415fefe3b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1743,7 +1743,7 @@ static inline bool sock_allow_reclassification(const struct sock *csk)
 }
 
 struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
-		      struct proto *prot, int kern);
+		      struct proto *prot, bool kern, bool hold_net);
 void sk_free(struct sock *sk);
 void sk_destruct(struct sock *sk);
 struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 9bd361ccf5f4..3eab462100e0 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1050,7 +1050,7 @@ static int atalk_create(struct net *net, struct socket *sock, int protocol,
 		goto out;
 
 	rc = -ENOMEM;
-	sk = sk_alloc(net, PF_APPLETALK, GFP_KERNEL, &ddp_proto, kern);
+	sk = sk_alloc(net, PF_APPLETALK, GFP_KERNEL, &ddp_proto, kern, hold_net);
 	if (!sk)
 		goto out;
 	rc = 0;
diff --git a/net/atm/common.c b/net/atm/common.c
index c1e05b0c0b4b..2cf074c3e8a5 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -146,7 +146,7 @@ int vcc_create(struct net *net, struct socket *sock, int protocol, int family,
 	sock->sk = NULL;
 	if (sock->type == SOCK_STREAM)
 		return -EINVAL;
-	sk = sk_alloc(net, family, GFP_KERNEL, &vcc_proto, kern);
+	sk = sk_alloc(net, family, GFP_KERNEL, &vcc_proto, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 	sock_init_data(sock, sk);
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 6c68b5e5b11c..6f572c0b3f59 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -890,7 +890,7 @@ static int ax25_create(struct net *net, struct socket *sock, int protocol,
 		return -ESOCKTNOSUPPORT;
 	}
 
-	sk = sk_alloc(net, PF_AX25, GFP_ATOMIC, &ax25_proto, kern);
+	sk = sk_alloc(net, PF_AX25, GFP_ATOMIC, &ax25_proto, kern, hold_net);
 	if (sk == NULL)
 		return -ENOMEM;
 
@@ -916,7 +916,8 @@ struct sock *ax25_make_new(struct sock *osk, struct ax25_dev *ax25_dev)
 	struct sock *sk;
 	ax25_cb *ax25, *oax25;
 
-	sk = sk_alloc(sock_net(osk), PF_AX25, GFP_ATOMIC, osk->sk_prot, 0);
+	sk = sk_alloc(sock_net(osk), PF_AX25, GFP_ATOMIC, osk->sk_prot,
+		      false, true);
 	if (sk == NULL)
 		return NULL;
 
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 7c24a6f87281..6c89fa2d9ccd 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -146,7 +146,7 @@ struct sock *bt_sock_alloc(struct net *net, struct socket *sock,
 {
 	struct sock *sk;
 
-	sk = sk_alloc(net, PF_BLUETOOTH, prio, prot, kern);
+	sk = sk_alloc(net, PF_BLUETOOTH, prio, prot, kern, hold_net);
 	if (!sk)
 		return NULL;
 
diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c
index 2ea9da9fe1d5..6e9138748317 100644
--- a/net/bluetooth/cmtp/sock.c
+++ b/net/bluetooth/cmtp/sock.c
@@ -207,7 +207,7 @@ static int cmtp_sock_create(struct net *net, struct socket *sock, int protocol,
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
 
-	sk = sk_alloc(net, PF_BLUETOOTH, GFP_ATOMIC, &cmtp_proto, kern);
+	sk = sk_alloc(net, PF_BLUETOOTH, GFP_ATOMIC, &cmtp_proto, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 9ae2a7f1738b..f663f760bcb8 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -1024,7 +1024,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 		break;
 	}
 
-	sk = sk_alloc(net, AF_UNSPEC, GFP_USER, &bpf_dummy_proto, 1);
+	sk = sk_alloc(net, AF_UNSPEC, GFP_USER, &bpf_dummy_proto, true, false);
 	if (!sk) {
 		kfree(data);
 		kfree(ctx);
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 6eef0e83f442..60fa870cfe97 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -1048,7 +1048,7 @@ static int caif_create(struct net *net, struct socket *sock, int protocol,
 	 * is really not used at all in the net/core or socket.c but the
 	 * initialization makes sure that sock->state is not uninitialized.
 	 */
-	sk = sk_alloc(net, PF_CAIF, GFP_KERNEL, &prot, kern);
+	sk = sk_alloc(net, PF_CAIF, GFP_KERNEL, &prot, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/can/af_can.c b/net/can/af_can.c
index c4094ccc9978..cecdc8b7420c 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -155,7 +155,7 @@ static int can_create(struct net *net, struct socket *sock, int protocol,
 
 	sock->ops = cp->ops;
 
-	sk = sk_alloc(net, PF_CAN, GFP_KERNEL, cp->prot, kern);
+	sk = sk_alloc(net, PF_CAN, GFP_KERNEL, cp->prot, kern, hold_net);
 	if (!sk) {
 		err = -ENOMEM;
 		goto errout;
diff --git a/net/core/sock.c b/net/core/sock.c
index 74729d20cd00..8546d97cc6ec 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2209,9 +2209,10 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
  *	@priority: for allocation (%GFP_KERNEL, %GFP_ATOMIC, etc)
  *	@prot: struct proto associated with this new sock instance
  *	@kern: is this to be a kernel socket?
+ *	@hold_net: hold netns refcnt or not
  */
 struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
-		      struct proto *prot, int kern)
+		      struct proto *prot, bool kern, bool hold_net)
 {
 	struct sock *sk;
 
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index 0dd1a8829c42..6144338c420d 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -1027,7 +1027,7 @@ static int ieee802154_create(struct net *net, struct socket *sock,
 	}
 
 	rc = -ENOMEM;
-	sk = sk_alloc(net, PF_IEEE802154, GFP_KERNEL, proto, kern);
+	sk = sk_alloc(net, PF_IEEE802154, GFP_KERNEL, proto, kern, hold_net);
 	if (!sk)
 		goto out;
 	rc = 0;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7313ec410fb5..d22bb0d3ddc1 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -323,7 +323,7 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
 	WARN_ON(!answer_prot->slab);
 
 	err = -ENOMEM;
-	sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot, kern);
+	sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot, kern, hold_net);
 	if (!sk)
 		goto out;
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 8f951e5e58ab..c30fa8de7451 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -190,7 +190,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 	WARN_ON(!answer_prot->slab);
 
 	err = -ENOBUFS;
-	sk = sk_alloc(net, PF_INET6, GFP_KERNEL, answer_prot, kern);
+	sk = sk_alloc(net, PF_INET6, GFP_KERNEL, answer_prot, kern, hold_net);
 	if (!sk)
 		goto out;
 
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index b7bbd4947855..76ecc64ec60c 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -452,7 +452,7 @@ static struct sock *iucv_sock_alloc(struct socket *sock, int proto, gfp_t prio,
 	struct sock *sk;
 	struct iucv_sock *iucv;
 
-	sk = sk_alloc(&init_net, PF_IUCV, prio, &iucv_proto, kern);
+	sk = sk_alloc(&init_net, PF_IUCV, prio, &iucv_proto, kern, hold_net);
 	if (!sk)
 		return NULL;
 	iucv = iucv_sk(sk);
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 50925046a392..8c791d1272cc 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1517,7 +1517,7 @@ static struct file *kcm_clone(struct socket *osock)
 	__module_get(newsock->ops->owner);
 
 	newsk = sk_alloc(sock_net(osock->sk), PF_KCM, GFP_KERNEL,
-			 &kcm_proto, false);
+			 &kcm_proto, false, true);
 	if (!newsk) {
 		sock_release(newsock);
 		return ERR_PTR(-ENOMEM);
@@ -1798,7 +1798,7 @@ static int kcm_create(struct net *net, struct socket *sock,
 	if (protocol != KCMPROTO_CONNECTED)
 		return -EPROTONOSUPPORT;
 
-	sk = sk_alloc(net, PF_KCM, GFP_KERNEL, &kcm_proto, kern);
+	sk = sk_alloc(net, PF_KCM, GFP_KERNEL, &kcm_proto, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 1c35b1cfb1c5..765cc86d7923 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -149,7 +149,7 @@ static int pfkey_create(struct net *net, struct socket *sock, int protocol,
 	if (protocol != PF_KEY_V2)
 		return -EPROTONOSUPPORT;
 
-	sk = sk_alloc(net, PF_KEY, GFP_KERNEL, &key_proto, kern);
+	sk = sk_alloc(net, PF_KEY, GFP_KERNEL, &key_proto, kern, hold_net);
 	if (sk == NULL)
 		return -ENOMEM;
 
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index bab3c7b943db..5bd99d5ca128 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -483,7 +483,8 @@ static int pppol2tp_create(struct net *net, struct socket *sock,
 	int error = -ENOMEM;
 	struct sock *sk;
 
-	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppol2tp_sk_proto, kern);
+	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppol2tp_sk_proto,
+		      kern, hold_net);
 	if (!sk)
 		goto out;
 
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index 75b2e21bfd2b..ba0ed49b3085 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -932,7 +932,7 @@ static void llc_sk_init(struct sock *sk)
 struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority,
 			  struct proto *prot, bool kern, bool hold_net)
 {
-	struct sock *sk = sk_alloc(net, family, priority, prot, kern);
+	struct sock *sk = sk_alloc(net, family, priority, prot, kern, hold_net);
 
 	if (!sk)
 		goto out;
diff --git a/net/mctp/af_mctp.c b/net/mctp/af_mctp.c
index 17821c976213..5de6bc967271 100644
--- a/net/mctp/af_mctp.c
+++ b/net/mctp/af_mctp.c
@@ -702,7 +702,7 @@ static int mctp_pf_create(struct net *net, struct socket *sock,
 	sock->state = SS_UNCONNECTED;
 	sock->ops = ops;
 
-	sk = sk_alloc(net, PF_MCTP, GFP_KERNEL, proto, kern);
+	sk = sk_alloc(net, PF_MCTP, GFP_KERNEL, proto, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ddc51cb89c5b..273f3e43938a 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -626,7 +626,8 @@ static int __netlink_create(struct net *net, struct socket *sock,
 
 	sock->ops = &netlink_ops;
 
-	sk = sk_alloc(net, PF_NETLINK, GFP_KERNEL, &netlink_proto, kern);
+	sk = sk_alloc(net, PF_NETLINK, GFP_KERNEL, &netlink_proto,
+		      kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 483f78951a19..0803ca64385d 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -435,7 +435,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
 	if (sock->type != SOCK_SEQPACKET || protocol != 0)
 		return -ESOCKTNOSUPPORT;
 
-	sk = sk_alloc(net, PF_NETROM, GFP_ATOMIC, &nr_proto, kern);
+	sk = sk_alloc(net, PF_NETROM, GFP_ATOMIC, &nr_proto, kern, hold_net);
 	if (sk  == NULL)
 		return -ENOMEM;
 
@@ -478,7 +478,8 @@ static struct sock *nr_make_new(struct sock *osk)
 	if (osk->sk_type != SOCK_SEQPACKET)
 		return NULL;
 
-	sk = sk_alloc(sock_net(osk), PF_NETROM, GFP_ATOMIC, osk->sk_prot, 0);
+	sk = sk_alloc(sock_net(osk), PF_NETROM, GFP_ATOMIC, osk->sk_prot,
+		      false, true);
 	if (sk == NULL)
 		return NULL;
 
diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 14f592becce0..80c427c32a91 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -977,7 +977,7 @@ struct sock *nfc_llcp_sock_alloc(struct socket *sock, int type, gfp_t gfp,
 	struct sock *sk;
 	struct nfc_llcp_sock *llcp_sock;
 
-	sk = sk_alloc(&init_net, PF_NFC, gfp, &llcp_sock_proto, kern);
+	sk = sk_alloc(&init_net, PF_NFC, gfp, &llcp_sock_proto, kern, hold_net);
 	if (!sk)
 		return NULL;
 
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index 4485b1ccb1c7..f2443d274065 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -339,7 +339,7 @@ static int rawsock_create(struct net *net, struct socket *sock,
 		sock->ops = &rawsock_ops;
 	}
 
-	sk = sk_alloc(net, PF_NFC, GFP_ATOMIC, nfc_proto->proto, kern);
+	sk = sk_alloc(net, PF_NFC, GFP_ATOMIC, nfc_proto->proto, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 5a25dac333b0..2d1cab4839cd 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3414,7 +3414,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	sock->state = SS_UNCONNECTED;
 
 	err = -ENOBUFS;
-	sk = sk_alloc(net, PF_PACKET, GFP_KERNEL, &packet_proto, kern);
+	sk = sk_alloc(net, PF_PACKET, GFP_KERNEL, &packet_proto, kern, hold_net);
 	if (sk == NULL)
 		goto out;
 
diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
index 4bdbc93c74fb..dc2e03edd65d 100644
--- a/net/phonet/af_phonet.c
+++ b/net/phonet/af_phonet.c
@@ -84,7 +84,7 @@ static int pn_socket_create(struct net *net, struct socket *sock, int protocol,
 		goto out;
 	}
 
-	sk = sk_alloc(net, PF_PHONET, GFP_KERNEL, pnp->prot, kern);
+	sk = sk_alloc(net, PF_PHONET, GFP_KERNEL, pnp->prot, kern, hold_net);
 	if (sk == NULL) {
 		err = -ENOMEM;
 		goto out;
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 53a858478e22..9b6e83b92f6f 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -836,7 +836,7 @@ static struct sock *pep_sock_accept(struct sock *sk,
 
 	/* Create a new to-be-accepted sock */
 	newsk = sk_alloc(sock_net(sk), PF_PHONET, GFP_KERNEL, sk->sk_prot,
-			 arg->kern);
+			 arg->kern, arg->hold_net);
 	if (!newsk) {
 		pep_reject_conn(sk, skb, PN_PIPE_ERR_OVERLOAD, GFP_KERNEL);
 		err = -ENOBUFS;
diff --git a/net/qrtr/af_qrtr.c b/net/qrtr/af_qrtr.c
index c05711f79a37..05a3b00fddf8 100644
--- a/net/qrtr/af_qrtr.c
+++ b/net/qrtr/af_qrtr.c
@@ -1266,7 +1266,7 @@ static int qrtr_create(struct net *net, struct socket *sock,
 	if (sock->type != SOCK_DGRAM)
 		return -EPROTOTYPE;
 
-	sk = sk_alloc(net, AF_QIPCRTR, GFP_KERNEL, &qrtr_proto, kern);
+	sk = sk_alloc(net, AF_QIPCRTR, GFP_KERNEL, &qrtr_proto, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 3e1bb40485ad..a0999d9ee5ae 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -702,7 +702,7 @@ static int rds_create(struct net *net, struct socket *sock, int protocol,
 	if (sock->type != SOCK_SEQPACKET || protocol)
 		return -ESOCKTNOSUPPORT;
 
-	sk = sk_alloc(net, AF_RDS, GFP_KERNEL, &rds_proto, kern);
+	sk = sk_alloc(net, AF_RDS, GFP_KERNEL, &rds_proto, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 1c175c92aa42..6aeaa526382a 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -555,8 +555,8 @@ static int rose_create(struct net *net, struct socket *sock, int protocol,
 	if (sock->type != SOCK_SEQPACKET || protocol != 0)
 		return -ESOCKTNOSUPPORT;
 
-	sk = sk_alloc(net, PF_ROSE, GFP_ATOMIC, &rose_proto, kern);
-	if (sk == NULL)
+	sk = sk_alloc(net, PF_ROSE, GFP_ATOMIC, &rose_proto, kern, hold_net);
+	if (!sk)
 		return -ENOMEM;
 
 	rose = rose_sk(sk);
@@ -594,8 +594,9 @@ static struct sock *rose_make_new(struct sock *osk)
 	if (osk->sk_type != SOCK_SEQPACKET)
 		return NULL;
 
-	sk = sk_alloc(sock_net(osk), PF_ROSE, GFP_ATOMIC, &rose_proto, 0);
-	if (sk == NULL)
+	sk = sk_alloc(sock_net(osk), PF_ROSE, GFP_ATOMIC, &rose_proto,
+		      false, true);
+	if (!sk)
 		return NULL;
 
 	rose = rose_sk(sk);
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index f2374f65b1c0..7e7e1163c476 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -830,7 +830,7 @@ static int rxrpc_create(struct net *net, struct socket *sock, int protocol,
 	sock->ops = &rxrpc_rpc_ops;
 	sock->state = SS_UNCONNECTED;
 
-	sk = sk_alloc(net, PF_RXRPC, GFP_KERNEL, &rxrpc_proto, kern);
+	sk = sk_alloc(net, PF_RXRPC, GFP_KERNEL, &rxrpc_proto, kern, hold_net);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 2c4e4dd79246..5e62c77a6f47 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -784,7 +784,7 @@ static struct sock *sctp_v6_create_accept_sk(struct sock *sk,
 	struct sock *newsk;
 
 	newsk = sk_alloc(sock_net(sk), PF_INET6, GFP_KERNEL, sk->sk_prot,
-			 arg->kern);
+			 arg->kern, arg->hold_net);
 	if (!newsk)
 		goto out;
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 7b2ae3df171a..73ee2ca9ff31 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -587,7 +587,7 @@ static struct sock *sctp_v4_create_accept_sk(struct sock *sk,
 	struct sock *newsk;
 
 	newsk = sk_alloc(sock_net(sk), PF_INET, GFP_KERNEL, sk->sk_prot,
-			 arg->kern);
+			 arg->kern, arg->hold_net);
 	if (!newsk)
 		goto out;
 
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 2535b922f760..6e93f188a908 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -393,7 +393,7 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
 	struct sock *sk;
 
 	prot = (protocol == SMCPROTO_SMC6) ? &smc_proto6 : &smc_proto;
-	sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, kern);
+	sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, kern, hold_net);
 	if (!sk)
 		return NULL;
 
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 26566ff1d4c7..aba5b139c7d9 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -484,7 +484,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
 	}
 
 	/* Allocate socket's protocol area */
-	sk = sk_alloc(net, AF_TIPC, GFP_KERNEL, &tipc_proto, kern);
+	sk = sk_alloc(net, AF_TIPC, GFP_KERNEL, &tipc_proto, kern, hold_net);
 	if (sk == NULL)
 		return -ENOMEM;
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 393be726004c..136f4b1d05da 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1020,9 +1020,11 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int type,
 	}
 
 	if (type == SOCK_STREAM)
-		sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_stream_proto, kern);
-	else /*dgram and  seqpacket */
-		sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_dgram_proto, kern);
+		sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_stream_proto,
+			      kern, hold_net);
+	else /* dgram and seqpacket */
+		sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_dgram_proto,
+			      kern, hold_net);
 
 	if (!sk) {
 		err = -ENOMEM;
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index f2ce92cd57c4..10aa09e1a291 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -738,7 +738,7 @@ static struct sock *__vsock_create(struct net *net,
 	struct vsock_sock *psk;
 	struct vsock_sock *vsk;
 
-	sk = sk_alloc(net, AF_VSOCK, priority, &vsock_proto, kern);
+	sk = sk_alloc(net, AF_VSOCK, priority, &vsock_proto, kern, hold_net);
 	if (!sk)
 		return NULL;
 
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 0b6c22b979e7..3619982cbb32 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -510,7 +510,7 @@ static struct sock *x25_alloc_socket(struct net *net, bool kern, bool hold_net)
 	struct x25_sock *x25;
 	struct sock *sk;
 
-	sk = sk_alloc(net, AF_X25, GFP_ATOMIC, &x25_proto, kern);
+	sk = sk_alloc(net, AF_X25, GFP_ATOMIC, &x25_proto, kern, hold_net);
 	if (!sk)
 		goto out;
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5763ef355c73..a93b600c6583 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1703,7 +1703,7 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol,
 
 	sock->state = SS_UNCONNECTED;
 
-	sk = sk_alloc(net, PF_XDP, GFP_KERNEL, &xsk_proto, kern);
+	sk = sk_alloc(net, PF_XDP, GFP_KERNEL, &xsk_proto, kern, hold_net);
 	if (!sk)
 		return -ENOBUFS;
 
-- 
2.39.5 (Apple Git-154)



* [PATCH v3 net-next 09/15] socket: Respect hold_net in sk_alloc().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (7 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 08/15] socket: Pass hold_net to sk_alloc() Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 10/15] socket: Introduce sock_create_net() Kuniyuki Iwashima
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

We will introduce a new API to create a kernel socket with netns
refcnt held.

sk->sk_net_refcnt was set to 0 when kern was 1 in sk_alloc().

Now we have the hold_net flag in sk_alloc().

Let's assign it to sk->sk_net_refcnt and add an assertion to catch
the only illegal combination, !kern && !hold_net.

No functional change is introduced for now because currently
hold_net == !kern.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
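For illustration only (not part of the patch), here is a rough userspace
model of the flag handling described above.  Every model_* name is made
up for this sketch, and the warning counter merely stands in for
DEBUG_NET_WARN_ON_ONCE(), which would fire at most once in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; every model_* name is hypothetical. */
struct model_net {
	int refcnt;	/* stands in for the netns reference count */
	int warnings;	/* times the debug condition was hit */
};

struct model_sock {
	bool kern_sock;		/* models sk->sk_kern_sock */
	bool net_refcnt;	/* models sk->sk_net_refcnt */
};

/* Old rule from before this patch: kernel sockets never hold a ref. */
bool model_legacy_net_refcnt(bool kern)
{
	return kern ? false : true;
}

/* New rule: the caller decides through hold_net. */
void model_sk_alloc(struct model_net *net, struct model_sock *sk,
		    bool kern, bool hold_net)
{
	/* Models DEBUG_NET_WARN_ON_ONCE(!kern && !hold_net). */
	if (!kern && !hold_net)
		net->warnings++;

	sk->kern_sock = kern;
	sk->net_refcnt = hold_net;
	if (sk->net_refcnt)
		net->refcnt++;
}

/* With hold_net == !kern, both rules agree and nothing warns. */
void model_check_no_functional_change(void)
{
	for (int kern = 0; kern <= 1; kern++) {
		struct model_net net = { 0, 0 };
		struct model_sock sk;

		model_sk_alloc(&net, &sk, kern, !kern);
		assert(sk.net_refcnt == model_legacy_net_refcnt(kern));
		assert(net.warnings == 0);
	}
}
```

Of the four (kern, hold_net) combinations, only (false, false) trips the
check in this model; (true, true) is the case the upcoming API is meant
to use.
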
 net/core/sock.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 8546d97cc6ec..11aa6d8c0cdd 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2224,9 +2224,12 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		 * why we need sk_prot_creator -acme
 		 */
 		sk->sk_prot = sk->sk_prot_creator = prot;
+
+		DEBUG_NET_WARN_ON_ONCE(!kern && !hold_net);
 		sk->sk_kern_sock = kern;
 		sock_lock_init(sk);
-		sk->sk_net_refcnt = kern ? 0 : 1;
+
+		sk->sk_net_refcnt = hold_net;
 		if (likely(sk->sk_net_refcnt)) {
 			get_net_track(net, &sk->ns_tracker, priority);
 			sock_inuse_add(net, 1);
-- 
2.39.5 (Apple Git-154)



* [PATCH v3 net-next 10/15] socket: Introduce sock_create_net().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (8 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 09/15] socket: Respect hold_net in sk_alloc() Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion Kuniyuki Iwashima
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Let's add a new API to create a kernel socket with netns refcnt held.

We will remove the ugly kernel socket conversion in the next patch.

DEBUG_NET_WARN_ON_ONCE() is there to catch any path that calls
sock_create_net() from a __net_init function, which would leak the netns.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
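For illustration only (not part of the patch), a userspace sketch of why
a refcounted socket created from a __net_init function would pin the
netns.  It assumes that the exit handlers, which would release such a
socket, run only after the netns refcount drops to zero, and that
net_initialized() becomes true only once the init functions have
finished.  All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; every model_* name is hypothetical. */
struct model_net {
	int refcnt;		/* stands in for the netns reference count */
	bool initialized;	/* stands in for net_initialized(net) */
	bool exited;		/* true once the modelled exit handlers ran */
	int warnings;		/* times the debug condition was hit */
};

/* Models allocating a socket that may hold a netns reference. */
void model_sock_alloc(struct model_net *net, bool hold_net)
{
	/* Models DEBUG_NET_WARN_ON_ONCE(hold_net && !net_initialized(net)). */
	if (hold_net && !net->initialized)
		net->warnings++;
	if (hold_net)
		net->refcnt++;
}

/* Models netns setup whose init function creates one per-netns socket. */
void model_setup_net(struct model_net *net, bool hold_net)
{
	net->refcnt = 1;	/* reference owned by the creator */
	net->initialized = false;
	net->exited = false;
	net->warnings = 0;

	model_sock_alloc(net, hold_net);	/* the "__net_init" step */
	net->initialized = true;
}

/* Exit handlers, which would release the socket, run only at zero. */
void model_put_net(struct model_net *net)
{
	assert(net->refcnt > 0);
	if (--net->refcnt == 0)
		net->exited = true;
}
```

In this model, when the init step passes hold_net == true, the creator's
final model_put_net() leaves the count at 1, so the exit step never runs.
That is the leak the new check is meant to flag.
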
 include/linux/net.h |  2 ++
 net/core/sock.c     |  1 +
 net/socket.c        | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 35 insertions(+)

diff --git a/include/linux/net.h b/include/linux/net.h
index c2a35a102ee2..758c99af6cf4 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -252,6 +252,8 @@ int sock_register(const struct net_proto_family *fam);
 void sock_unregister(int family);
 bool sock_is_registered(int family);
 int sock_create(int family, int type, int proto, struct socket **res);
+int sock_create_net(struct net *net, int family, int type, int proto,
+		    struct socket **res);
 int sock_create_kern(struct net *net, int family, int type, int proto, struct socket **res);
 int sock_create_lite(int family, int type, int proto, struct socket **res);
 struct socket *sock_alloc(void);
diff --git a/net/core/sock.c b/net/core/sock.c
index 11aa6d8c0cdd..9fb57afe6848 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2229,6 +2229,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sk->sk_kern_sock = kern;
 		sock_lock_init(sk);
 
+		DEBUG_NET_WARN_ON_ONCE(hold_net && !net_initialized(net));
 		sk->sk_net_refcnt = hold_net;
 		if (likely(sk->sk_net_refcnt)) {
 			get_net_track(net, &sk->ns_tracker, priority);
diff --git a/net/socket.c b/net/socket.c
index a8796d7f06be..00ece8401b17 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1623,6 +1623,38 @@ int sock_create(int family, int type, int protocol, struct socket **res)
 }
 EXPORT_SYMBOL(sock_create);
 
+/**
+ * sock_create_net - creates a socket for kernel space
+ *
+ * @net: net namespace
+ * @family: protocol family (AF_INET, ...)
+ * @type: communication type (SOCK_STREAM, ...)
+ * @protocol: protocol (0, ...)
+ * @res: new socket
+ *
+ * Creates a new socket and assigns it to @res, passing through LSM.
+ *
+ * The socket is for kernel space and should not be exposed to
+ * userspace via a file descriptor nor BPF hooks except for LSM
+ * (see inet_create(), inet_release(), etc).
+ *
+ * The socket holds a reference count of @net so that the caller does
+ * not need to care about @net's lifetime.
+ *
+ * This MUST NOT be called from the __net_init path and @net MUST be
+ * alive as of calling sock_create_net().
+ *
+ * Context: Process context. This function internally uses GFP_KERNEL.
+ * Return: 0 or an error.
+ */
+
+int sock_create_net(struct net *net, int family, int type, int protocol,
+		    struct socket **res)
+{
+	return __sock_create(net, family, type, protocol, res, true, true);
+}
+EXPORT_SYMBOL(sock_create_net);
+
 /**
  *	sock_create_kern - creates a socket (kernel space)
  *	@net: net namespace
-- 
2.39.5 (Apple Git-154)



* [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion.
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (9 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 10/15] socket: Introduce sock_create_net() Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13 13:45   ` Wenjia Zhang
                     ` (2 more replies)
  2024-12-13  9:21 ` [PATCH v3 net-next 12/15] socket: Move sock_inuse_add() to sock.c Kuniyuki Iwashima
                   ` (3 subsequent siblings)
  14 siblings, 3 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, Matthieu Baerts,
	Allison Henderson, Steve French, Wenjia Zhang, Jan Karcher,
	Chuck Lever, Jeff Layton

Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference count
the netns of kernel sockets."), TCP kernel sockets have caused many UAFs.

We have converted such sockets to hold netns refcnt, and we have the
same pattern in cifs, mptcp, rds, smc, and sunrpc.

Let's drop the conversion and use sock_create_net() instead.

The changes for cifs, mptcp, and smc are straightforward.

For rds, we need to move maybe_get_net() before sock_create_net() and
sock->ops->accept().

For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
call sock_create_kern() for others.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Acked-by: Allison Henderson <allison.henderson@oracle.com>
---
v3: Add missing mutex_unlock in rds_tcp_conn_path_connect().
v2: Collect Acked-by from MPTCP and RDS maintainers

Cc: Steve French <sfrench@samba.org>
Cc: Wenjia Zhang <wenjia@linux.ibm.com>
Cc: Jan Karcher <jaka@linux.ibm.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
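
For illustration only (not part of the patch), a userspace sketch of the
ordering used for rds: pin the netns with maybe_get_net() first, create
the refcounted socket, then drop the temporary reference.  My reading is
that this is needed because sock_create_net() requires @net to be alive
at call time; treat that as an assumption.  All model_* names are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only; every model_* name is hypothetical. */
struct model_net {
	int refcnt;	/* 0 means the netns is already being dismantled */
};

/* Models maybe_get_net(): take a reference unless the count is zero. */
bool model_maybe_get_net(struct model_net *net)
{
	if (net->refcnt == 0)
		return false;
	net->refcnt++;
	return true;
}

/* Models put_net(): drop one reference. */
void model_put_net(struct model_net *net)
{
	assert(net->refcnt > 0);
	net->refcnt--;
}

/* Models creating a socket that holds its own netns reference. */
void model_sock_create_net(struct model_net *net)
{
	assert(net->refcnt > 0);	/* the netns must still be alive */
	net->refcnt++;
}

/* Models the connect path: temporary ref, create socket, drop the ref. */
int model_rds_connect(struct model_net *net)
{
	if (!model_maybe_get_net(net))
		return -EINVAL;

	model_sock_create_net(net);
	model_put_net(net);
	return 0;
}
```

On a live netns, the only reference left over after the call is the one
owned by the new socket; on a dying netns, the call fails without
touching the count.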
---
 fs/smb/client/connect.c | 13 ++-----------
 net/mptcp/subflow.c     | 10 +---------
 net/rds/tcp.c           | 14 --------------
 net/rds/tcp_connect.c   | 21 +++++++++++++++------
 net/rds/tcp_listen.c    | 14 ++++++++++++--
 net/smc/af_smc.c        | 21 ++-------------------
 net/sunrpc/svcsock.c    | 12 ++++++------
 net/sunrpc/xprtsock.c   | 12 ++++--------
 8 files changed, 42 insertions(+), 75 deletions(-)

diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index c36c1b4ffe6e..7a67b86c0423 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3130,22 +3130,13 @@ generic_ip_connect(struct TCP_Server_Info *server)
 	if (server->ssocket) {
 		socket = server->ssocket;
 	} else {
-		struct net *net = cifs_net_ns(server);
-		struct sock *sk;
-
-		rc = sock_create_kern(net, sfamily, SOCK_STREAM,
-				      IPPROTO_TCP, &server->ssocket);
+		rc = sock_create_net(cifs_net_ns(server), sfamily, SOCK_STREAM,
+				     IPPROTO_TCP, &server->ssocket);
 		if (rc < 0) {
 			cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
 			return rc;
 		}
 
-		sk = server->ssocket->sk;
-		__netns_tracker_free(net, &sk->ns_tracker, false);
-		sk->sk_net_refcnt = 1;
-		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
-		sock_inuse_add(net, 1);
-
 		/* BB other socket options to set KEEPALIVE, NODELAY? */
 		cifs_dbg(FYI, "Socket created\n");
 		socket = server->ssocket;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index fd021cf8286e..e7e8972bdfca 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1755,7 +1755,7 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
 	if (unlikely(!sk->sk_socket))
 		return -EINVAL;
 
-	err = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
+	err = sock_create_net(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
 	if (err)
 		return err;
 
@@ -1768,14 +1768,6 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
 	/* the newly created socket has to be in the same cgroup as its parent */
 	mptcp_attach_cgroup(sk, sf->sk);
 
-	/* kernel sockets do not by default acquire net ref, but TCP timer
-	 * needs it.
-	 * Update ns_tracker to current stack trace and refcounted tracker.
-	 */
-	__netns_tracker_free(net, &sf->sk->ns_tracker, false);
-	sf->sk->sk_net_refcnt = 1;
-	get_net_track(net, &sf->sk->ns_tracker, GFP_KERNEL);
-	sock_inuse_add(net, 1);
 	err = tcp_set_ulp(sf->sk, "mptcp");
 	if (err)
 		goto err_free;
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 351ac1747224..4509900476f7 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -494,21 +494,7 @@ bool rds_tcp_tune(struct socket *sock)
 
 	tcp_sock_set_nodelay(sock->sk);
 	lock_sock(sk);
-	/* TCP timer functions might access net namespace even after
-	 * a process which created this net namespace terminated.
-	 */
-	if (!sk->sk_net_refcnt) {
-		if (!maybe_get_net(net)) {
-			release_sock(sk);
-			return false;
-		}
-		/* Update ns_tracker to current stack trace and refcounted tracker */
-		__netns_tracker_free(net, &sk->ns_tracker, false);
 
-		sk->sk_net_refcnt = 1;
-		netns_tracker_alloc(net, &sk->ns_tracker, GFP_KERNEL);
-		sock_inuse_add(net, 1);
-	}
 	rtn = net_generic(net, rds_tcp_netid);
 	if (rtn->sndbuf_size > 0) {
 		sk->sk_sndbuf = rtn->sndbuf_size;
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index a0046e99d6df..c9449780f952 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -93,6 +93,7 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
 	struct sockaddr_in6 sin6;
 	struct sockaddr_in sin;
 	struct sockaddr *addr;
+	struct net *net;
 	int addrlen;
 	bool isv6;
 	int ret;
@@ -107,20 +108,28 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
 
 	mutex_lock(&tc->t_conn_path_lock);
 
+	net = rds_conn_net(conn);
+
 	if (rds_conn_path_up(cp)) {
-		mutex_unlock(&tc->t_conn_path_lock);
-		return 0;
+		ret = 0;
+		goto out;
 	}
+
+	if (!maybe_get_net(net)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
 	if (ipv6_addr_v4mapped(&conn->c_laddr)) {
-		ret = sock_create_kern(rds_conn_net(conn), PF_INET,
-				       SOCK_STREAM, IPPROTO_TCP, &sock);
+		ret = sock_create_net(net, PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
 		isv6 = false;
 	} else {
-		ret = sock_create_kern(rds_conn_net(conn), PF_INET6,
-				       SOCK_STREAM, IPPROTO_TCP, &sock);
+		ret = sock_create_net(net, PF_INET6, SOCK_STREAM, IPPROTO_TCP, &sock);
 		isv6 = true;
 	}
 
+	put_net(net);
+
 	if (ret < 0)
 		goto out;
 
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index 69aaf03ab93e..440ac9057148 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -101,6 +101,7 @@ int rds_tcp_accept_one(struct socket *sock)
 	struct rds_connection *conn;
 	int ret;
 	struct inet_sock *inet;
+	struct net *net;
 	struct rds_tcp_connection *rs_tcp = NULL;
 	int conn_state;
 	struct rds_conn_path *cp;
@@ -108,7 +109,7 @@ int rds_tcp_accept_one(struct socket *sock)
 	struct proto_accept_arg arg = {
 		.flags = O_NONBLOCK,
 		.kern = true,
-		.hold_net = false,
+		.hold_net = true,
 	};
 #if !IS_ENABLED(CONFIG_IPV6)
 	struct in6_addr saddr, daddr;
@@ -118,13 +119,22 @@ int rds_tcp_accept_one(struct socket *sock)
 	if (!sock) /* module unload or netns delete in progress */
 		return -ENETUNREACH;
 
+	net = sock_net(sock->sk);
+
+	if (!maybe_get_net(net))
+		return -EINVAL;
+
 	ret = sock_create_lite(sock->sk->sk_family,
 			       sock->sk->sk_type, sock->sk->sk_protocol,
 			       &new_sock);
-	if (ret)
+	if (ret) {
+		put_net(net);
 		goto out;
+	}
 
 	ret = sock->ops->accept(sock, new_sock, &arg);
+	put_net(net);
+
 	if (ret < 0)
 		goto out;
 
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 6e93f188a908..7b0de80b3aca 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -3310,25 +3310,8 @@ static const struct proto_ops smc_sock_ops = {
 
 int smc_create_clcsk(struct net *net, struct sock *sk, int family)
 {
-	struct smc_sock *smc = smc_sk(sk);
-	int rc;
-
-	rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
-			      &smc->clcsock);
-	if (rc)
-		return rc;
-
-	/* smc_clcsock_release() does not wait smc->clcsock->sk's
-	 * destruction;  its sk_state might not be TCP_CLOSE after
-	 * smc->sk is close()d, and TCP timers can be fired later,
-	 * which need net ref.
-	 */
-	sk = smc->clcsock->sk;
-	__netns_tracker_free(net, &sk->ns_tracker, false);
-	sk->sk_net_refcnt = 1;
-	get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
-	sock_inuse_add(net, 1);
-	return 0;
+	return sock_create_net(net, family, SOCK_STREAM, IPPROTO_TCP,
+			       &smc_sk(sk)->clcsock);
 }
 
 static int __smc_create(struct net *net, struct socket *sock, int protocol,
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 9583bad3d150..cde5765f6f81 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1526,7 +1526,10 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
 		return ERR_PTR(-EINVAL);
 	}
 
-	error = sock_create_kern(net, family, type, protocol, &sock);
+	if (protocol == IPPROTO_TCP)
+		error = sock_create_net(net, family, type, protocol, &sock);
+	else
+		error = sock_create_kern(net, family, type, protocol, &sock);
 	if (error < 0)
 		return ERR_PTR(error);
 
@@ -1551,11 +1554,8 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
 	newlen = error;
 
 	if (protocol == IPPROTO_TCP) {
-		__netns_tracker_free(net, &sock->sk->ns_tracker, false);
-		sock->sk->sk_net_refcnt = 1;
-		get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL);
-		sock_inuse_add(net, 1);
-		if ((error = kernel_listen(sock, 64)) < 0)
+		error = kernel_listen(sock, 64);
+		if (error < 0)
 			goto bummer;
 	}
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index feb1768e8a57..f3e139c30442 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1924,7 +1924,10 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
 	struct socket *sock;
 	int err;
 
-	err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
+	if (protocol == IPPROTO_TCP)
+		err = sock_create_net(xprt->xprt_net, family, type, protocol, &sock);
+	else
+		err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
 	if (err < 0) {
 		dprintk("RPC:       can't create %d transport socket (%d).\n",
 				protocol, -err);
@@ -1941,13 +1944,6 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
 		goto out;
 	}
 
-	if (protocol == IPPROTO_TCP) {
-		__netns_tracker_free(xprt->xprt_net, &sock->sk->ns_tracker, false);
-		sock->sk->sk_net_refcnt = 1;
-		get_net_track(xprt->xprt_net, &sock->sk->ns_tracker, GFP_KERNEL);
-		sock_inuse_add(xprt->xprt_net, 1);
-	}
-
 	filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
 	if (IS_ERR(filp))
 		return ERR_CAST(filp);
-- 
2.39.5 (Apple Git-154)


* [PATCH v3 net-next 12/15] socket: Move sock_inuse_add() to sock.c.
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (10 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 13/15] socket: Use sock_create_net() instead of sock_create() Kuniyuki Iwashima
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

sock_inuse_add() is used only in net/core/sock.c.

Let's move sock_inuse_add() there.

This is mostly a revert of commit d477eb900484 ("net: make
sock_inuse_add() available").

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/sock.h |  9 ---------
 net/core/sock.c    | 10 ++++++++++
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 8de415fefe3b..845efc3c5568 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1433,11 +1433,6 @@ static inline void sock_prot_inuse_add(const struct net *net,
 	this_cpu_add(net->core.prot_inuse->val[prot->inuse_idx], val);
 }
 
-static inline void sock_inuse_add(const struct net *net, int val)
-{
-	this_cpu_add(net->core.prot_inuse->all, val);
-}
-
 int sock_prot_inuse_get(struct net *net, struct proto *proto);
 int sock_inuse_get(struct net *net);
 #else
@@ -1445,10 +1440,6 @@ static inline void sock_prot_inuse_add(const struct net *net,
 				       const struct proto *prot, int val)
 {
 }
-
-static inline void sock_inuse_add(const struct net *net, int val)
-{
-}
 #endif
 
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 9fb57afe6848..78c812205e35 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -153,6 +153,7 @@
 static DEFINE_MUTEX(proto_list_mutex);
 static LIST_HEAD(proto_list);
 
+static void sock_inuse_add(const struct net *net, int val);
 static void sock_def_write_space_wfree(struct sock *sk);
 static void sock_def_write_space(struct sock *sk);
 
@@ -3885,6 +3886,11 @@ int sock_prot_inuse_get(struct net *net, struct proto *prot)
 }
 EXPORT_SYMBOL_GPL(sock_prot_inuse_get);
 
+static void sock_inuse_add(const struct net *net, int val)
+{
+	this_cpu_add(net->core.prot_inuse->all, val);
+}
+
 int sock_inuse_get(struct net *net)
 {
 	int cpu, res = 0;
@@ -3944,6 +3950,10 @@ static void release_proto_idx(struct proto *prot)
 		clear_bit(prot->inuse_idx, proto_inuse_idx);
 }
 #else
+static void sock_inuse_add(const struct net *net, int val)
+{
+}
+
 static inline int assign_proto_idx(struct proto *prot)
 {
 	return 0;
-- 
2.39.5 (Apple Git-154)


* [PATCH v3 net-next 13/15] socket: Use sock_create_net() instead of sock_create().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (11 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 12/15] socket: Move sock_inuse_add() to sock.c Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 14/15] socket: Rename sock_create() to sock_create_user() Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 15/15] socket: Rename sock_create_kern() to sock_create_net_noref() Kuniyuki Iwashima
  14 siblings, 0 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

sock_create() is a poorly chosen name, and callers rarely consider what
it is actually for.

In fact, with the sole exception of sctp_do_peeloff(), none of the sockets
created by drivers and filesystems are tied to userspace processes or
exposed via file descriptors.

Let's use sock_create_net() for such in-kernel use cases.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 drivers/infiniband/hw/erdma/erdma_cm.c    | 6 ++++--
 drivers/infiniband/sw/siw/siw_cm.c        | 6 ++++--
 drivers/isdn/mISDN/l1oip_core.c           | 3 ++-
 drivers/nvme/host/tcp.c                   | 5 +++--
 drivers/nvme/target/tcp.c                 | 5 +++--
 drivers/target/iscsi/iscsi_target_login.c | 7 ++++---
 drivers/xen/pvcalls-back.c                | 6 ++++--
 fs/ocfs2/cluster/tcp.c                    | 8 +++++---
 fs/smb/server/transport_tcp.c             | 7 ++++---
 9 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/hw/erdma/erdma_cm.c b/drivers/infiniband/hw/erdma/erdma_cm.c
index 771059a8eb7d..5014237127cb 100644
--- a/drivers/infiniband/hw/erdma/erdma_cm.c
+++ b/drivers/infiniband/hw/erdma/erdma_cm.c
@@ -1023,7 +1023,8 @@ int erdma_connect(struct iw_cm_id *id, struct iw_cm_conn_param *params)
 		return -ENOENT;
 	erdma_qp_get(qp);
 
-	ret = sock_create(AF_INET, SOCK_STREAM, IPPROTO_TCP, &s);
+	ret = sock_create_net(current->nsproxy->net_ns,
+			      AF_INET, SOCK_STREAM, IPPROTO_TCP, &s);
 	if (ret < 0)
 		goto error_put_qp;
 
@@ -1299,7 +1300,8 @@ int erdma_create_listen(struct iw_cm_id *id, int backlog)
 	if (addr_family != AF_INET)
 		return -EAFNOSUPPORT;
 
-	ret = sock_create(addr_family, SOCK_STREAM, IPPROTO_TCP, &s);
+	ret = sock_create_net(current->nsproxy->net_ns,
+			      addr_family, SOCK_STREAM, IPPROTO_TCP, &s);
 	if (ret < 0)
 		return ret;
 
diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index 86323918a570..4c8f0e3ec0ce 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -1391,7 +1391,8 @@ int siw_connect(struct iw_cm_id *id, struct iw_cm_conn_param *params)
 	siw_dbg_qp(qp, "pd_len %d, laddr %pISp, raddr %pISp\n", pd_len, laddr,
 		   raddr);
 
-	rv = sock_create(v4 ? AF_INET : AF_INET6, SOCK_STREAM, IPPROTO_TCP, &s);
+	rv = sock_create_net(current->nsproxy->net_ns,
+			     v4 ? AF_INET : AF_INET6, SOCK_STREAM, IPPROTO_TCP, &s);
 	if (rv < 0)
 		goto error;
 
@@ -1766,7 +1767,8 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
 	if (addr_family != AF_INET && addr_family != AF_INET6)
 		return -EAFNOSUPPORT;
 
-	rv = sock_create(addr_family, SOCK_STREAM, IPPROTO_TCP, &s);
+	rv = sock_create_net(current->nsproxy->net_ns,
+			     addr_family, SOCK_STREAM, IPPROTO_TCP, &s);
 	if (rv < 0)
 		return rv;
 
diff --git a/drivers/isdn/mISDN/l1oip_core.c b/drivers/isdn/mISDN/l1oip_core.c
index a5ad88a960d0..04733bcc8d07 100644
--- a/drivers/isdn/mISDN/l1oip_core.c
+++ b/drivers/isdn/mISDN/l1oip_core.c
@@ -659,7 +659,8 @@ l1oip_socket_thread(void *data)
 	allow_signal(SIGTERM);
 
 	/* create socket */
-	if (sock_create(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &socket)) {
+	if (sock_create_net(current->nsproxy->net_ns,
+			    PF_INET, SOCK_DGRAM, IPPROTO_UDP, &socket)) {
 		printk(KERN_ERR "%s: Failed to create socket.\n", __func__);
 		ret = -EIO;
 		goto fail;
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 28c76a3e1bd2..32af5fe66882 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1682,8 +1682,9 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 		queue->cmnd_capsule_len = sizeof(struct nvme_command) +
 						NVME_TCP_ADMIN_CCSZ;
 
-	ret = sock_create(ctrl->addr.ss_family, SOCK_STREAM,
-			IPPROTO_TCP, &queue->sock);
+	ret = sock_create_net(current->nsproxy->net_ns,
+			      ctrl->addr.ss_family, SOCK_STREAM,
+			      IPPROTO_TCP, &queue->sock);
 	if (ret) {
 		dev_err(nctrl->device,
 			"failed to create socket: %d\n", ret);
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 7c51c2a8c109..dff8c56d1783 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -2042,8 +2042,9 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 	if (port->nport->inline_data_size < 0)
 		port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
 
-	ret = sock_create(port->addr.ss_family, SOCK_STREAM,
-				IPPROTO_TCP, &port->sock);
+	ret = sock_create_net(current->nsproxy->net_ns,
+			      port->addr.ss_family, SOCK_STREAM,
+			      IPPROTO_TCP, &port->sock);
 	if (ret) {
 		pr_err("failed to create a socket\n");
 		goto err_port;
diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index 90b870f234f0..b7135b6d96d7 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -837,10 +837,11 @@ int iscsit_setup_np(
 		return -EINVAL;
 	}
 
-	ret = sock_create(sockaddr->ss_family, np->np_sock_type,
-			np->np_ip_proto, &sock);
+	ret = sock_create_net(current->nsproxy->net_ns,
+			      sockaddr->ss_family, np->np_sock_type,
+			      np->np_ip_proto, &sock);
 	if (ret < 0) {
-		pr_err("sock_create() failed.\n");
+		pr_err("sock_create_net() failed.\n");
 		return ret;
 	}
 	np->np_socket = sock;
diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index f0f8b4862983..83b6bfff5cd4 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -406,7 +406,8 @@ static int pvcalls_back_connect(struct xenbus_device *dev,
 	    sa->sa_family != AF_INET)
 		goto out;
 
-	ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
+	ret = sock_create_net(current->nsproxy->net_ns,
+			      AF_INET, SOCK_STREAM, 0, &sock);
 	if (ret < 0)
 		goto out;
 	ret = inet_stream_connect(sock, sa, req->u.connect.len, 0);
@@ -647,7 +648,8 @@ static int pvcalls_back_bind(struct xenbus_device *dev,
 		goto out;
 	}
 
-	ret = sock_create(AF_INET, SOCK_STREAM, 0, &map->sock);
+	ret = sock_create_net(current->nsproxy->net_ns,
+			      AF_INET, SOCK_STREAM, 0, &map->sock);
 	if (ret < 0)
 		goto out;
 
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index 6ef03a02d19b..9e0571ec3ca1 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -1558,7 +1558,7 @@ static void o2net_start_connect(struct work_struct *work)
 	unsigned int nofs_flag;
 
 	/*
-	 * sock_create allocates the sock with GFP_KERNEL. We must
+	 * sock_create_net() allocates the sock with GFP_KERNEL. We must
 	 * prevent the filesystem from being reentered by memory reclaim.
 	 */
 	nofs_flag = memalloc_nofs_save();
@@ -1600,7 +1600,8 @@ static void o2net_start_connect(struct work_struct *work)
 		goto out;
 	}
 
-	ret = sock_create(PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
+	ret = sock_create_net(current->nsproxy->net_ns,
+			      PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
 	if (ret < 0) {
 		mlog(0, "can't create socket: %d\n", ret);
 		goto out;
@@ -1986,7 +1987,8 @@ static int o2net_open_listening_sock(__be32 addr, __be16 port)
 		.sin_port = port,
 	};
 
-	ret = sock_create(PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
+	ret = sock_create_net(current->nsproxy->net_ns,
+			      PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
 	if (ret < 0) {
 		printk(KERN_ERR "o2net: Error %d while creating socket\n", ret);
 		goto out;
diff --git a/fs/smb/server/transport_tcp.c b/fs/smb/server/transport_tcp.c
index 0d9007285e30..ada40b0502a1 100644
--- a/fs/smb/server/transport_tcp.c
+++ b/fs/smb/server/transport_tcp.c
@@ -423,18 +423,19 @@ static void tcp_destroy_socket(struct socket *ksmbd_socket)
  */
 static int create_socket(struct interface *iface)
 {
+	struct net *net = current->nsproxy->net_ns;
 	int ret;
 	struct sockaddr_in6 sin6;
 	struct sockaddr_in sin;
 	struct socket *ksmbd_socket;
 	bool ipv4 = false;
 
-	ret = sock_create(PF_INET6, SOCK_STREAM, IPPROTO_TCP, &ksmbd_socket);
+	ret = sock_create_net(net, PF_INET6, SOCK_STREAM, IPPROTO_TCP, &ksmbd_socket);
 	if (ret) {
 		if (ret != -EAFNOSUPPORT)
 			pr_err("Can't create socket for ipv6, fallback to ipv4: %d\n", ret);
-		ret = sock_create(PF_INET, SOCK_STREAM, IPPROTO_TCP,
-				  &ksmbd_socket);
+		ret = sock_create_net(net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
+				      &ksmbd_socket);
 		if (ret) {
 			pr_err("Can't create socket for ipv4: %d\n", ret);
 			goto out_clear;
-- 
2.39.5 (Apple Git-154)


* [PATCH v3 net-next 14/15] socket: Rename sock_create() to sock_create_user().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (12 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 13/15] socket: Use sock_create_net() instead of sock_create() Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13  9:21 ` [PATCH v3 net-next 15/15] socket: Rename sock_create_kern() to sock_create_net_noref() Kuniyuki Iwashima
  14 siblings, 0 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

sock_create() is a bad name and was used in incorrect places.

Let's rename it to sock_create_user() and add detailed documentation
to catch future developers' attention.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/linux/net.h |  2 +-
 net/sctp/socket.c   |  2 +-
 net/socket.c        | 33 +++++++++++++++++++++------------
 3 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 758c99af6cf4..1ba4abb18863 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -251,7 +251,7 @@ int sock_wake_async(struct socket_wq *sk_wq, int how, int band);
 int sock_register(const struct net_proto_family *fam);
 void sock_unregister(int family);
 bool sock_is_registered(int family);
-int sock_create(int family, int type, int proto, struct socket **res);
+int sock_create_user(int family, int type, int proto, struct socket **res);
 int sock_create_net(struct net *net, int family, int type, int proto,
 		    struct socket **res);
 int sock_create_kern(struct net *net, int family, int type, int proto, struct socket **res);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index a1add0b7fd9f..e49904f08559 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -5647,7 +5647,7 @@ int sctp_do_peeloff(struct sock *sk, sctp_assoc_t id, struct socket **sockp)
 		return -EINVAL;
 
 	/* Create a new socket.  */
-	err = sock_create(sk->sk_family, SOCK_SEQPACKET, IPPROTO_SCTP, &sock);
+	err = sock_create_user(sk->sk_family, SOCK_SEQPACKET, IPPROTO_SCTP, &sock);
 	if (err < 0)
 		return err;
 
diff --git a/net/socket.c b/net/socket.c
index 00ece8401b17..992de3dd94b8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1606,22 +1606,31 @@ static int __sock_create(struct net *net, int family, int type, int protocol,
 }
 
 /**
- *	sock_create - creates a socket
- *	@family: protocol family (AF_INET, ...)
- *	@type: communication type (SOCK_STREAM, ...)
- *	@protocol: protocol (0, ...)
- *	@res: new socket
+ * sock_create_user - creates a socket for userspace
  *
- *	A wrapper around __sock_create().
- *	Returns 0 or an error. This function internally uses GFP_KERNEL.
+ * @family: protocol family (AF_INET, ...)
+ * @type: communication type (SOCK_STREAM, ...)
+ * @protocol: protocol (0, ...)
+ * @res: new socket
+ *
+ * Creates a new socket and assigns it to @res, passing through LSM.
+ *
+ * The socket is for userspace and should be exposed via a file
+ * descriptor and BPF hooks (see inet_create(), inet_release(), etc).
+ *
+ * The number of sockets is available in the first line of
+ * /proc/net/sockstat.
+ *
+ * Context: Process context. This function internally uses GFP_KERNEL.
+ * Return: 0 or an error.
  */
 
-int sock_create(int family, int type, int protocol, struct socket **res)
+int sock_create_user(int family, int type, int protocol, struct socket **res)
 {
 	return __sock_create(current->nsproxy->net_ns, family, type, protocol,
 			     res, false, true);
 }
-EXPORT_SYMBOL(sock_create);
+EXPORT_SYMBOL(sock_create_user);
 
 /**
  * sock_create_net - creates a socket for kernel space
@@ -1689,7 +1698,7 @@ static struct socket *__sys_socket_create(int family, int type, int protocol)
 		return ERR_PTR(-EINVAL);
 	type &= SOCK_TYPE_MASK;
 
-	retval = sock_create(family, type, protocol, &sock);
+	retval = sock_create_user(family, type, protocol, &sock);
 	if (retval < 0)
 		return ERR_PTR(retval);
 
@@ -1799,11 +1808,11 @@ int __sys_socketpair(int family, int type, int protocol, int __user *usockvec)
 	 * supports the socketpair call.
 	 */
 
-	err = sock_create(family, type, protocol, &sock1);
+	err = sock_create_user(family, type, protocol, &sock1);
 	if (unlikely(err < 0))
 		goto out;
 
-	err = sock_create(family, type, protocol, &sock2);
+	err = sock_create_user(family, type, protocol, &sock2);
 	if (unlikely(err < 0)) {
 		sock_release(sock1);
 		goto out;
-- 
2.39.5 (Apple Git-154)


* [PATCH v3 net-next 15/15] socket: Rename sock_create_kern() to sock_create_net_noref().
  2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
                   ` (13 preceding siblings ...)
  2024-12-13  9:21 ` [PATCH v3 net-next 14/15] socket: Rename sock_create() to sock_create_user() Kuniyuki Iwashima
@ 2024-12-13  9:21 ` Kuniyuki Iwashima
  2024-12-13 13:46   ` Wenjia Zhang
  2024-12-17 10:32   ` Paolo Abeni
  14 siblings, 2 replies; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13  9:21 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

sock_create_kern() is quite a bad name, and the non-netdev folks tend
to use it without taking care of the netns lifetime.

Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference count
the netns of kernel sockets."), TCP sockets created by sock_create_kern()
have caused many use-after-free bugs.

Let's rename sock_create_kern() to sock_create_net_noref() and add
detailed documentation so that we no longer introduce the same issue in
the future.
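
For readers less familiar with the netns lifetime problem, here is a
rough userspace sketch of the difference between the two APIs.  It is
only a toy model: every toy_* name is hypothetical and none of this is
kernel code.  It assumes the netns is dismantled when its refcount
drops to zero and that a late TCP timer dereferences the socket's netns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct net and struct sock; NOT the kernel types. */
struct toy_net {
	int refcnt;	/* models the netns reference count */
	bool alive;	/* false once the last reference is dropped */
};

struct toy_sock {
	struct toy_net *net;
	bool net_refcnt;	/* models sk->sk_net_refcnt */
};

static void toy_get_net(struct toy_net *net)
{
	net->refcnt++;
}

static void toy_put_net(struct toy_net *net)
{
	if (--net->refcnt == 0)
		net->alive = false;	/* the netns is dismantled here */
}

/* Models sock_create_net(): the socket itself pins the netns. */
static struct toy_sock toy_sock_create_net(struct toy_net *net)
{
	struct toy_sock sk = { .net = net, .net_refcnt = true };

	toy_get_net(net);
	return sk;
}

/* Models sock_create_net_noref(): the caller must keep the netns alive. */
static struct toy_sock toy_sock_create_net_noref(struct toy_net *net)
{
	struct toy_sock sk = { .net = net, .net_refcnt = false };

	return sk;
}

static void toy_sock_release(struct toy_sock *sk)
{
	if (sk->net_refcnt)
		toy_put_net(sk->net);
	sk->net = NULL;
}

/* Would a timer firing now on this socket still see a live netns? */
static bool toy_timer_is_safe(const struct toy_sock *sk)
{
	return sk->net != NULL && sk->net->alive;
}

/* Scenario: the last process leaves the netns while the socket lingers. */
static bool toy_timer_safe_after_netns_exit(bool hold_ref)
{
	struct toy_net net = { .refcnt = 1, .alive = true };
	struct toy_sock sk = hold_ref ? toy_sock_create_net(&net)
				      : toy_sock_create_net_noref(&net);
	bool safe;

	toy_put_net(&net);	/* the owning process drops its reference */
	safe = toy_timer_is_safe(&sk);
	toy_sock_release(&sk);
	return safe;
}
```

In this model, toy_timer_safe_after_netns_exit(true) reports a live
netns because the socket holds its own reference, while passing false
reports a dead one, which roughly corresponds to the use-after-free
that a noref TCP socket can hit unless its owner guarantees that the
netns outlives the socket.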

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 drivers/block/drbd/drbd_receiver.c            | 12 +++----
 drivers/infiniband/sw/rxe/rxe_qp.c            |  2 +-
 drivers/soc/qcom/qmi_interface.c              |  4 +--
 fs/afs/rxrpc.c                                |  3 +-
 fs/dlm/lowcomms.c                             |  8 ++---
 include/linux/net.h                           |  3 +-
 net/9p/trans_fd.c                             |  8 ++---
 net/bluetooth/rfcomm/core.c                   |  3 +-
 net/ceph/messenger.c                          |  6 ++--
 net/handshake/handshake-test.c                |  3 +-
 net/ipv4/af_inet.c                            |  3 +-
 net/ipv4/udp_tunnel_core.c                    |  2 +-
 net/ipv6/ip6_udp_tunnel.c                     |  4 +--
 net/l2tp/l2tp_core.c                          |  8 ++---
 net/mctp/test/route-test.c                    |  6 ++--
 net/mptcp/pm_netlink.c                        |  4 +--
 net/mptcp/subflow.c                           |  2 +-
 net/netfilter/ipvs/ip_vs_sync.c               |  8 ++---
 net/qrtr/ns.c                                 |  6 ++--
 net/rds/tcp_listen.c                          |  4 +--
 net/rxrpc/rxperf.c                            |  4 +--
 net/sctp/socket.c                             |  2 +-
 net/smc/smc_inet.c                            |  2 +-
 net/socket.c                                  | 35 +++++++++++++------
 net/sunrpc/clnt.c                             |  4 +--
 net/sunrpc/svcsock.c                          |  2 +-
 net/sunrpc/xprtsock.c                         |  6 ++--
 net/tipc/topsrv.c                             |  4 +--
 net/wireless/nl80211.c                        |  4 +--
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  4 +--
 30 files changed, 92 insertions(+), 74 deletions(-)

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 0c9f54197768..39be44e5db8a 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -618,9 +618,9 @@ static struct socket *drbd_try_connect(struct drbd_connection *connection)
 	peer_addr_len = min_t(int, connection->peer_addr_len, sizeof(src_in6));
 	memcpy(&peer_in6, &connection->peer_addr, peer_addr_len);
 
-	what = "sock_create_kern";
-	err = sock_create_kern(&init_net, ((struct sockaddr *)&src_in6)->sa_family,
-			       SOCK_STREAM, IPPROTO_TCP, &sock);
+	what = "sock_create_net_noref";
+	err = sock_create_net_noref(&init_net, ((struct sockaddr *)&src_in6)->sa_family,
+				    SOCK_STREAM, IPPROTO_TCP, &sock);
 	if (err < 0) {
 		sock = NULL;
 		goto out;
@@ -713,9 +713,9 @@ static int prepare_listen_socket(struct drbd_connection *connection, struct acce
 	my_addr_len = min_t(int, connection->my_addr_len, sizeof(struct sockaddr_in6));
 	memcpy(&my_addr, &connection->my_addr, my_addr_len);
 
-	what = "sock_create_kern";
-	err = sock_create_kern(&init_net, ((struct sockaddr *)&my_addr)->sa_family,
-			       SOCK_STREAM, IPPROTO_TCP, &s_listen);
+	what = "sock_create_net_noref";
+	err = sock_create_net_noref(&init_net, ((struct sockaddr *)&my_addr)->sa_family,
+				    SOCK_STREAM, IPPROTO_TCP, &s_listen);
 	if (err) {
 		s_listen = NULL;
 		goto out;
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 91d329e90308..250673cf6cbf 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -241,7 +241,7 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
 	/* if we don't finish qp create make sure queue is valid */
 	skb_queue_head_init(&qp->req_pkts);
 
-	err = sock_create_kern(&init_net, AF_INET, SOCK_DGRAM, 0, &qp->sk);
+	err = sock_create_net_noref(&init_net, AF_INET, SOCK_DGRAM, 0, &qp->sk);
 	if (err < 0)
 		return err;
 	qp->sk->sk->sk_user_data = (void *)(uintptr_t)qp->elem.index;
diff --git a/drivers/soc/qcom/qmi_interface.c b/drivers/soc/qcom/qmi_interface.c
index bc6d6379d8b1..eb5a64f6fd6f 100644
--- a/drivers/soc/qcom/qmi_interface.c
+++ b/drivers/soc/qcom/qmi_interface.c
@@ -588,8 +588,8 @@ static struct socket *qmi_sock_create(struct qmi_handle *qmi,
 	struct socket *sock;
 	int ret;
 
-	ret = sock_create_kern(&init_net, AF_QIPCRTR, SOCK_DGRAM,
-			       PF_QIPCRTR, &sock);
+	ret = sock_create_net_noref(&init_net, AF_QIPCRTR, SOCK_DGRAM,
+				    PF_QIPCRTR, &sock);
 	if (ret < 0)
 		return ERR_PTR(ret);
 
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 9f2a3bb56ec6..7443fe801894 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -44,7 +44,8 @@ int afs_open_socket(struct afs_net *net)
 
 	_enter("");
 
-	ret = sock_create_kern(net->net, AF_RXRPC, SOCK_DGRAM, PF_INET6, &socket);
+	ret = sock_create_net_noref(net->net, AF_RXRPC, SOCK_DGRAM, PF_INET6,
+				    &socket);
 	if (ret < 0)
 		goto error_1;
 
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index df40c3fd1070..b0450aff4cd4 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1579,8 +1579,8 @@ static int dlm_connect(struct connection *con)
 	}
 
 	/* Create a socket to communicate with */
-	result = sock_create_kern(&init_net, dlm_local_addr[0].ss_family,
-				  SOCK_STREAM, dlm_proto_ops->proto, &sock);
+	result = sock_create_net_noref(&init_net, dlm_local_addr[0].ss_family,
+				       SOCK_STREAM, dlm_proto_ops->proto, &sock);
 	if (result < 0)
 		return result;
 
@@ -1760,8 +1760,8 @@ static int dlm_listen_for_all(void)
 	if (result < 0)
 		return result;
 
-	result = sock_create_kern(&init_net, dlm_local_addr[0].ss_family,
-				  SOCK_STREAM, dlm_proto_ops->proto, &sock);
+	result = sock_create_net_noref(&init_net, dlm_local_addr[0].ss_family,
+				       SOCK_STREAM, dlm_proto_ops->proto, &sock);
 	if (result < 0) {
 		log_print("Can't create comms socket: %d", result);
 		return result;
diff --git a/include/linux/net.h b/include/linux/net.h
index 1ba4abb18863..582faf2fdd08 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -254,7 +254,8 @@ bool sock_is_registered(int family);
 int sock_create_user(int family, int type, int proto, struct socket **res);
 int sock_create_net(struct net *net, int family, int type, int proto,
 		    struct socket **res);
-int sock_create_kern(struct net *net, int family, int type, int proto, struct socket **res);
+int sock_create_net_noref(struct net *net, int family, int type, int proto,
+			  struct socket **res);
 int sock_create_lite(int family, int type, int proto, struct socket **res);
 struct socket *sock_alloc(void);
 void sock_release(struct socket *sock);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 83f81da24727..ae014999040f 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -1011,8 +1011,8 @@ p9_fd_create_tcp(struct p9_client *client, const char *addr, char *args)
 	sin_server.sin_family = AF_INET;
 	sin_server.sin_addr.s_addr = in_aton(addr);
 	sin_server.sin_port = htons(opts.port);
-	err = sock_create_kern(current->nsproxy->net_ns, PF_INET,
-			       SOCK_STREAM, IPPROTO_TCP, &csocket);
+	err = sock_create_net_noref(current->nsproxy->net_ns, PF_INET,
+				    SOCK_STREAM, IPPROTO_TCP, &csocket);
 	if (err) {
 		pr_err("%s (%d): problem creating socket\n",
 		       __func__, task_pid_nr(current));
@@ -1062,8 +1062,8 @@ p9_fd_create_unix(struct p9_client *client, const char *addr, char *args)
 
 	sun_server.sun_family = PF_UNIX;
 	strcpy(sun_server.sun_path, addr);
-	err = sock_create_kern(current->nsproxy->net_ns, PF_UNIX,
-			       SOCK_STREAM, 0, &csocket);
+	err = sock_create_net_noref(current->nsproxy->net_ns, PF_UNIX,
+				    SOCK_STREAM, 0, &csocket);
 	if (err < 0) {
 		pr_err("%s (%d): problem creating socket\n",
 		       __func__, task_pid_nr(current));
diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index 4c56ca5a216c..6204514667b6 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -200,7 +200,8 @@ static int rfcomm_l2sock_create(struct socket **sock)
 
 	BT_DBG("");
 
-	err = sock_create_kern(&init_net, PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_L2CAP, sock);
+	err = sock_create_net_noref(&init_net, PF_BLUETOOTH, SOCK_SEQPACKET,
+				    BTPROTO_L2CAP, sock);
 	if (!err) {
 		struct sock *sk = (*sock)->sk;
 		sk->sk_data_ready   = rfcomm_l2data_ready;
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index d1b5705dc0c6..cb6a1532ff9f 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -442,10 +442,10 @@ int ceph_tcp_connect(struct ceph_connection *con)
 	     ceph_pr_addr(&con->peer_addr));
 	BUG_ON(con->sock);
 
-	/* sock_create_kern() allocates with GFP_KERNEL */
+	/* sock_create_net_noref() allocates with GFP_KERNEL */
 	noio_flag = memalloc_noio_save();
-	ret = sock_create_kern(read_pnet(&con->msgr->net), ss.ss_family,
-			       SOCK_STREAM, IPPROTO_TCP, &sock);
+	ret = sock_create_net_noref(read_pnet(&con->msgr->net), ss.ss_family,
+				    SOCK_STREAM, IPPROTO_TCP, &sock);
 	memalloc_noio_restore(noio_flag);
 	if (ret)
 		return ret;
diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
index 4f300504f3e5..54793f9e4d30 100644
--- a/net/handshake/handshake-test.c
+++ b/net/handshake/handshake-test.c
@@ -145,7 +145,8 @@ static void handshake_req_alloc_case(struct kunit *test)
 
 static int handshake_sock_create(struct socket **sock)
 {
-	return sock_create_kern(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, sock);
+	return sock_create_net_noref(&init_net, PF_INET, SOCK_STREAM,
+				     IPPROTO_TCP, sock);
 }
 
 static void handshake_req_submit_test1(struct kunit *test)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d22bb0d3ddc1..03c3854f382a 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1644,8 +1644,9 @@ int inet_ctl_sock_create(struct sock **sk, unsigned short family,
 			 struct net *net)
 {
 	struct socket *sock;
-	int rc = sock_create_kern(net, family, type, protocol, &sock);
+	int rc;
 
+	rc = sock_create_net_noref(net, family, type, protocol, &sock);
 	if (rc == 0) {
 		*sk = sock->sk;
 		(*sk)->sk_allocation = GFP_ATOMIC;
diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index 619a53eb672d..e8e079ebca36 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -15,7 +15,7 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg,
 	struct socket *sock = NULL;
 	struct sockaddr_in udp_addr;
 
-	err = sock_create_kern(net, AF_INET, SOCK_DGRAM, 0, &sock);
+	err = sock_create_net_noref(net, AF_INET, SOCK_DGRAM, 0, &sock);
 	if (err < 0)
 		goto error;
 
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index c99053189ea8..65d859c7d9c4 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -18,10 +18,10 @@ int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
 		     struct socket **sockp)
 {
 	struct sockaddr_in6 udp6_addr = {};
-	int err;
 	struct socket *sock = NULL;
+	int err;
 
-	err = sock_create_kern(net, AF_INET6, SOCK_DGRAM, 0, &sock);
+	err = sock_create_net_noref(net, AF_INET6, SOCK_DGRAM, 0, &sock);
 	if (err < 0)
 		goto error;
 
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 369a2f2e459c..e43534185f45 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1494,8 +1494,8 @@ static int l2tp_tunnel_sock_create(struct net *net,
 		if (cfg->local_ip6 && cfg->peer_ip6) {
 			struct sockaddr_l2tpip6 ip6_addr = {0};
 
-			err = sock_create_kern(net, AF_INET6, SOCK_DGRAM,
-					       IPPROTO_L2TP, &sock);
+			err = sock_create_net_noref(net, AF_INET6, SOCK_DGRAM,
+						    IPPROTO_L2TP, &sock);
 			if (err < 0)
 				goto out;
 
@@ -1522,8 +1522,8 @@ static int l2tp_tunnel_sock_create(struct net *net,
 		{
 			struct sockaddr_l2tpip ip_addr = {0};
 
-			err = sock_create_kern(net, AF_INET, SOCK_DGRAM,
-					       IPPROTO_L2TP, &sock);
+			err = sock_create_net_noref(net, AF_INET, SOCK_DGRAM,
+						    IPPROTO_L2TP, &sock);
 			if (err < 0)
 				goto out;
 
diff --git a/net/mctp/test/route-test.c b/net/mctp/test/route-test.c
index 8551dab1d1e6..f1b2cf0c8b48 100644
--- a/net/mctp/test/route-test.c
+++ b/net/mctp/test/route-test.c
@@ -310,7 +310,7 @@ static void __mctp_route_test_init(struct kunit *test,
 	rt = mctp_test_create_route(&init_net, dev->mdev, 8, 68);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, rt);
 
-	rc = sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
+	rc = sock_create_net_noref(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
 	KUNIT_ASSERT_EQ(test, rc, 0);
 
 	addr.smctp_family = AF_MCTP;
@@ -568,7 +568,7 @@ static void mctp_test_route_input_sk_keys(struct kunit *test)
 	rt = mctp_test_create_route(&init_net, dev->mdev, 8, 68);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, rt);
 
-	rc = sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
+	rc = sock_create_net_noref(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
 	KUNIT_ASSERT_EQ(test, rc, 0);
 
 	msk = container_of(sock->sk, struct mctp_sock, sk);
@@ -994,7 +994,7 @@ static void mctp_test_route_output_key_create(struct kunit *test)
 	rt = mctp_test_create_route(&init_net, dev->mdev, dst, 68);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, rt);
 
-	rc = sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
+	rc = sock_create_net_noref(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
 	KUNIT_ASSERT_EQ(test, rc, 0);
 
 	dev->mdev->addrs = kmalloc(sizeof(u8), GFP_KERNEL);
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 7a0f7998376a..3dc40a364fb2 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1083,8 +1083,8 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
 	int backlog = 1024;
 	int err;
 
-	err = sock_create_kern(sock_net(sk), entry->addr.family,
-			       SOCK_STREAM, IPPROTO_MPTCP, &entry->lsk);
+	err = sock_create_net_noref(sock_net(sk), entry->addr.family,
+				    SOCK_STREAM, IPPROTO_MPTCP, &entry->lsk);
 	if (err)
 		return err;
 
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index e7e8972bdfca..7162873a232a 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1953,7 +1953,7 @@ static int subflow_ulp_init(struct sock *sk)
 	int err = 0;
 
 	/* disallow attaching ULP to a socket unless it has been
-	 * created with sock_create_kern()
+	 * created with sock_create_net()
 	 */
 	if (!sk->sk_kern_sock) {
 		err = -EOPNOTSUPP;
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 3402675bf521..e97cd30f196a 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1470,8 +1470,8 @@ static int make_send_sock(struct netns_ipvs *ipvs, int id,
 	int result, salen;
 
 	/* First create a socket */
-	result = sock_create_kern(ipvs->net, ipvs->mcfg.mcast_af, SOCK_DGRAM,
-				  IPPROTO_UDP, &sock);
+	result = sock_create_net_noref(ipvs->net, ipvs->mcfg.mcast_af, SOCK_DGRAM,
+				       IPPROTO_UDP, &sock);
 	if (result < 0) {
 		pr_err("Error during creation of socket; terminating\n");
 		goto error;
@@ -1527,8 +1527,8 @@ static int make_receive_sock(struct netns_ipvs *ipvs, int id,
 	int result, salen;
 
 	/* First create a socket */
-	result = sock_create_kern(ipvs->net, ipvs->bcfg.mcast_af, SOCK_DGRAM,
-				  IPPROTO_UDP, &sock);
+	result = sock_create_net_noref(ipvs->net, ipvs->bcfg.mcast_af, SOCK_DGRAM,
+				       IPPROTO_UDP, &sock);
 	if (result < 0) {
 		pr_err("Error during creation of socket; terminating\n");
 		goto error;
diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
index 3de9350cbf30..2f8f347150c0 100644
--- a/net/qrtr/ns.c
+++ b/net/qrtr/ns.c
@@ -692,8 +692,8 @@ int qrtr_ns_init(void)
 	INIT_LIST_HEAD(&qrtr_ns.lookups);
 	INIT_WORK(&qrtr_ns.work, qrtr_ns_worker);
 
-	ret = sock_create_kern(&init_net, AF_QIPCRTR, SOCK_DGRAM,
-			       PF_QIPCRTR, &qrtr_ns.sock);
+	ret = sock_create_net_noref(&init_net, AF_QIPCRTR, SOCK_DGRAM,
+				    PF_QIPCRTR, &qrtr_ns.sock);
 	if (ret < 0)
 		return ret;
 
@@ -735,7 +735,7 @@ int qrtr_ns_init(void)
 	 *  qrtr module is inserted successfully.
 	 *
 	 * However, the reference count is increased twice in
-	 * sock_create_kern(): one is to increase the reference count of owner
+	 * sock_create_net_noref(): one is to increase the reference count of owner
 	 * of qrtr socket's proto_ops struct; another is to increment the
 	 * reference count of owner of qrtr proto struct. Therefore, we must
 	 * decrement the module reference count twice to ensure that it keeps
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index 440ac9057148..202afd77b532 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -289,8 +289,8 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6)
 	int addr_len;
 	int ret;
 
-	ret = sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM,
-			       IPPROTO_TCP, &sock);
+	ret = sock_create_net_noref(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM,
+				    IPPROTO_TCP, &sock);
 	if (ret < 0) {
 		rdsdebug("could not create %s listener socket: %d\n",
 			 isv6 ? "IPv6" : "IPv4", ret);
diff --git a/net/rxrpc/rxperf.c b/net/rxrpc/rxperf.c
index 7ef93407be83..1c784d449a6b 100644
--- a/net/rxrpc/rxperf.c
+++ b/net/rxrpc/rxperf.c
@@ -182,8 +182,8 @@ static int rxperf_open_socket(void)
 	struct socket *socket;
 	int ret;
 
-	ret = sock_create_kern(&init_net, AF_RXRPC, SOCK_DGRAM, PF_INET6,
-			       &socket);
+	ret = sock_create_net_noref(&init_net, AF_RXRPC, SOCK_DGRAM, PF_INET6,
+				    &socket);
 	if (ret < 0)
 		goto error_1;
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index e49904f08559..fb8ed0290a4a 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1328,7 +1328,7 @@ static int __sctp_setsockopt_connectx(struct sock *sk, struct sockaddr *kaddrs,
 		return err;
 
 	/* in-kernel sockets don't generally have a file allocated to them
-	 * if all they do is call sock_create_kern().
+	 * if all they do is call sock_create_net_noref().
 	 */
 	if (sk->sk_socket->file)
 		flags = sk->sk_socket->file->f_flags;
diff --git a/net/smc/smc_inet.c b/net/smc/smc_inet.c
index a944e7dcb8b9..dbd76070e05e 100644
--- a/net/smc/smc_inet.c
+++ b/net/smc/smc_inet.c
@@ -111,7 +111,7 @@ static struct inet_protosw smc_inet6_protosw = {
 static unsigned int smc_sync_mss(struct sock *sk, u32 pmtu)
 {
 	/* No need pass it through to clcsock, mss can always be set by
-	 * sock_create_kern or smc_setsockopt.
+	 * sock_create_net or smc_setsockopt.
 	 */
 	return 0;
 }
diff --git a/net/socket.c b/net/socket.c
index 992de3dd94b8..8f45d17e52c3 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1665,23 +1665,36 @@ int sock_create_net(struct net *net, int family, int type, int protocol,
 EXPORT_SYMBOL(sock_create_net);
 
 /**
- *	sock_create_kern - creates a socket (kernel space)
- *	@net: net namespace
- *	@family: protocol family (AF_INET, ...)
- *	@type: communication type (SOCK_STREAM, ...)
- *	@protocol: protocol (0, ...)
- *	@res: new socket
+ * sock_create_net_noref - creates a socket for kernel space
+ *
+ * @net: net namespace
+ * @family: protocol family (AF_INET, ...)
+ * @type: communication type (SOCK_STREAM, ...)
+ * @protocol: protocol (0, ...)
+ * @res: new socket
  *
- *	A wrapper around __sock_create().
- *	Returns 0 or an error. This function internally uses GFP_KERNEL.
+ * Creates a new socket and assigns it to @res, passing through LSM.
+ *
+ * The socket is for kernel space and should not be exposed to
+ * userspace via a file descriptor nor BPF hooks except for LSM
+ * (see inet_create(), inet_release(), etc).
+ *
+ * The socket DOES NOT hold a reference count of @net to allow it to
+ * be removed; the caller MUST ensure that the socket is always freed
+ * before @net.
+ *
+ * @net MUST be alive as of calling sock_create_net_noref().
+ *
+ * Context: Process context. This function internally uses GFP_KERNEL.
+ * Return: 0 or an error.
  */
 
-int sock_create_kern(struct net *net, int family, int type, int protocol,
-		     struct socket **res)
+int sock_create_net_noref(struct net *net, int family, int type, int protocol,
+			  struct socket **res)
 {
 	return __sock_create(net, family, type, protocol, res, true, false);
 }
-EXPORT_SYMBOL(sock_create_kern);
+EXPORT_SYMBOL(sock_create_net_noref);
 
 static struct socket *__sys_socket_create(int family, int type, int protocol)
 {
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 37935082d799..4e8723403e07 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1450,8 +1450,8 @@ static int rpc_sockname(struct net *net, struct sockaddr *sap, size_t salen,
 	struct socket *sock;
 	int err;
 
-	err = sock_create_kern(net, sap->sa_family,
-			       SOCK_DGRAM, IPPROTO_UDP, &sock);
+	err = sock_create_net_noref(net, sap->sa_family,
+				    SOCK_DGRAM, IPPROTO_UDP, &sock);
 	if (err < 0) {
 		dprintk("RPC:       can't create UDP socket (%d)\n", err);
 		goto out;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index cde5765f6f81..e20465c20b16 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1529,7 +1529,7 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
 	if (protocol == IPPROTO_TCP)
 		error = sock_create_net(net, family, type, protocol, &sock);
 	else
-		error = sock_create_kern(net, family, type, protocol, &sock);
+		error = sock_create_net_noref(net, family, type, protocol, &sock);
 	if (error < 0)
 		return ERR_PTR(error);
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index f3e139c30442..e793914d48f6 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1927,7 +1927,7 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
 	if (protocol == IPPROTO_TCP)
 		err = sock_create_net(xprt->xprt_net, family, type, protocol, &sock);
 	else
-		err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
+		err = sock_create_net_noref(xprt->xprt_net, family, type, protocol, &sock);
 	if (err < 0) {
 		dprintk("RPC:       can't create %d transport socket (%d).\n",
 				protocol, -err);
@@ -1999,8 +1999,8 @@ static int xs_local_setup_socket(struct sock_xprt *transport)
 	struct socket *sock;
 	int status;
 
-	status = sock_create_kern(xprt->xprt_net, AF_LOCAL,
-				  SOCK_STREAM, 0, &sock);
+	status = sock_create_net_noref(xprt->xprt_net, AF_LOCAL,
+				       SOCK_STREAM, 0, &sock);
 	if (status < 0) {
 		dprintk("RPC:       can't create AF_LOCAL "
 			"transport socket (%d).\n", -status);
diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c
index 8ee0c07d00e9..2e03391c1bd1 100644
--- a/net/tipc/topsrv.c
+++ b/net/tipc/topsrv.c
@@ -515,7 +515,7 @@ static int tipc_topsrv_create_listener(struct tipc_topsrv *srv)
 	struct sock *sk;
 	int rc;
 
-	rc = sock_create_kern(srv->net, AF_TIPC, SOCK_SEQPACKET, 0, &lsock);
+	rc = sock_create_net_noref(srv->net, AF_TIPC, SOCK_SEQPACKET, 0, &lsock);
 	if (rc < 0)
 		return rc;
 
@@ -553,7 +553,7 @@ static int tipc_topsrv_create_listener(struct tipc_topsrv *srv)
 	 * after TIPC module is inserted successfully.
 	 *
 	 * However, the reference count is ever increased twice in
-	 * sock_create_kern(): one is to increase the reference count of owner
+	 * sock_create_net_noref(): one is to increase the reference count of owner
 	 * of TIPC socket's proto_ops struct; another is to increment the
 	 * reference count of owner of TIPC proto struct. Therefore, we must
 	 * decrement the module reference count twice to ensure that it keeps
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 27c58fd260e0..fef671d39d5e 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -13689,8 +13689,8 @@ static int nl80211_parse_wowlan_tcp(struct cfg80211_registered_device *rdev,
 	port = nla_get_u16_default(tb[NL80211_WOWLAN_TCP_SRC_PORT], 0);
 #ifdef CONFIG_INET
 	/* allocate a socket and port for it and use it */
-	err = sock_create_kern(wiphy_net(&rdev->wiphy), PF_INET, SOCK_STREAM,
-			       IPPROTO_TCP, &cfg->sock);
+	err = sock_create_net_noref(wiphy_net(&rdev->wiphy), PF_INET, SOCK_STREAM,
+				    IPPROTO_TCP, &cfg->sock);
 	if (err) {
 		kfree(cfg);
 		return err;
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index cc9dde507aba..b6e78e9d3280 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -804,8 +804,8 @@ __bpf_kfunc int bpf_kfunc_init_sock(struct init_sock_args *args)
 		goto out;
 	}
 
-	err = sock_create_kern(current->nsproxy->net_ns, args->af, args->type,
-			       proto, &sock);
+	err = sock_create_net_noref(current->nsproxy->net_ns, args->af, args->type,
+				    proto, &sock);
 
 	if (!err)
 		/* Set timeout for call to kernel_connect() to prevent it from hanging,
-- 
2.39.5 (Apple Git-154)
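
As a reading aid for the sock_create_net_noref() kernel-doc above, here
is a minimal userspace sketch of the lifetime rule it states: a socket
created without a netns reference does not keep @net alive, while one
created with hold_net does. Everything here (struct toy_net, struct
toy_sock, and the toy_* helpers) is hypothetical and only models the
counting; it is not kernel code and does not reflect how __sock_create()
is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins, NOT the kernel's struct net / struct sock. */
struct toy_net {
	int refcnt;		/* models the netns reference count */
};

struct toy_sock {
	struct toy_net *net;
	bool net_refcnt;	/* models sk->sk_net_refcnt */
};

/* hold_net == true models sock_create_net(),
 * hold_net == false models sock_create_net_noref().
 */
struct toy_sock *toy_sock_create(struct toy_net *net, bool hold_net)
{
	struct toy_sock *sk = malloc(sizeof(*sk));

	if (!sk)
		return NULL;

	sk->net = net;
	sk->net_refcnt = hold_net;
	if (hold_net)
		net->refcnt++;	/* the socket pins the namespace */

	return sk;
}

void toy_sock_release(struct toy_sock *sk)
{
	if (sk->net_refcnt)
		sk->net->refcnt--;	/* drop the pin taken at creation */
	free(sk);
}

/* The namespace may go away only once nobody holds a reference. */
bool toy_net_can_be_freed(const struct toy_net *net)
{
	return net->refcnt == 0;
}
```

In this model, a namespace whose creator has dropped its reference stays
pinned for as long as a hold_net socket exists, whereas a noref socket
leaves the count untouched, so the caller has to release it before the
namespace is torn down.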


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion.
  2024-12-13  9:21 ` [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion Kuniyuki Iwashima
@ 2024-12-13 13:45   ` Wenjia Zhang
  2024-12-13 13:54     ` Kuniyuki Iwashima
  2024-12-13 14:15   ` Chuck Lever
  2024-12-13 23:29   ` Allison Henderson
  2 siblings, 1 reply; 27+ messages in thread
From: Wenjia Zhang @ 2024-12-13 13:45 UTC (permalink / raw)
  To: Kuniyuki Iwashima, alibuda, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: Kuniyuki Iwashima, netdev, Matthieu Baerts, Allison Henderson,
	Steve French, Jan Karcher, Chuck Lever, Jeff Layton



On 13.12.24 10:21, Kuniyuki Iwashima wrote:
> Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference count
> the netns of kernel sockets."), TCP kernel sockets have caused many UAFs.
> 
> We have converted such sockets to hold netns refcnt, and we have the
> same pattern in cifs, mptcp, rds, smc, and sunrpc.
> 
> Let's drop the conversion and use sock_create_net() instead.
> 
> The changes for cifs, mptcp, and smc are straightforward.
> 
> For rds, we need to move maybe_get_net() before sock_create_net() and
> sock->ops->accept().
> 
> For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> call sock_create_kern() for others.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> Acked-by: Allison Henderson <allison.henderson@oracle.com>
> ---
> v3: Add missing mutex_unlock in rds_tcp_conn_path_connect().
> v2: Collect Acked-by from MPTCP and RDS maintainers
> 
> Cc: Steve French <sfrench@samba.org>
> Cc: Wenjia Zhang <wenjia@linux.ibm.com>
> Cc: Jan Karcher <jaka@linux.ibm.com>
> Cc: Chuck Lever <chuck.lever@oracle.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/smb/client/connect.c | 13 ++-----------
>   net/mptcp/subflow.c     | 10 +---------
>   net/rds/tcp.c           | 14 --------------
>   net/rds/tcp_connect.c   | 21 +++++++++++++++------
>   net/rds/tcp_listen.c    | 14 ++++++++++++--
>   net/smc/af_smc.c        | 21 ++-------------------
>   net/sunrpc/svcsock.c    | 12 ++++++------
>   net/sunrpc/xprtsock.c   | 12 ++++--------
>   8 files changed, 42 insertions(+), 75 deletions(-)
> 
> diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
> index c36c1b4ffe6e..7a67b86c0423 100644
> --- a/fs/smb/client/connect.c
> +++ b/fs/smb/client/connect.c
> @@ -3130,22 +3130,13 @@ generic_ip_connect(struct TCP_Server_Info *server)
>   	if (server->ssocket) {
>   		socket = server->ssocket;
>   	} else {
> -		struct net *net = cifs_net_ns(server);
> -		struct sock *sk;
> -
> -		rc = sock_create_kern(net, sfamily, SOCK_STREAM,
> -				      IPPROTO_TCP, &server->ssocket);
> +		rc = sock_create_net(cifs_net_ns(server), sfamily, SOCK_STREAM,
> +				     IPPROTO_TCP, &server->ssocket);
>   		if (rc < 0) {
>   			cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
>   			return rc;
>   		}
>   
> -		sk = server->ssocket->sk;
> -		__netns_tracker_free(net, &sk->ns_tracker, false);
> -		sk->sk_net_refcnt = 1;
> -		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(net, 1);
> -
>   		/* BB other socket options to set KEEPALIVE, NODELAY? */
>   		cifs_dbg(FYI, "Socket created\n");
>   		socket = server->ssocket;
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index fd021cf8286e..e7e8972bdfca 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1755,7 +1755,7 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
>   	if (unlikely(!sk->sk_socket))
>   		return -EINVAL;
>   
> -	err = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
> +	err = sock_create_net(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
>   	if (err)
>   		return err;
>   
> @@ -1768,14 +1768,6 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
>   	/* the newly created socket has to be in the same cgroup as its parent */
>   	mptcp_attach_cgroup(sk, sf->sk);
>   
> -	/* kernel sockets do not by default acquire net ref, but TCP timer
> -	 * needs it.
> -	 * Update ns_tracker to current stack trace and refcounted tracker.
> -	 */
> -	__netns_tracker_free(net, &sf->sk->ns_tracker, false);
> -	sf->sk->sk_net_refcnt = 1;
> -	get_net_track(net, &sf->sk->ns_tracker, GFP_KERNEL);
> -	sock_inuse_add(net, 1);
>   	err = tcp_set_ulp(sf->sk, "mptcp");
>   	if (err)
>   		goto err_free;
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 351ac1747224..4509900476f7 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -494,21 +494,7 @@ bool rds_tcp_tune(struct socket *sock)
>   
>   	tcp_sock_set_nodelay(sock->sk);
>   	lock_sock(sk);
> -	/* TCP timer functions might access net namespace even after
> -	 * a process which created this net namespace terminated.
> -	 */
> -	if (!sk->sk_net_refcnt) {
> -		if (!maybe_get_net(net)) {
> -			release_sock(sk);
> -			return false;
> -		}
> -		/* Update ns_tracker to current stack trace and refcounted tracker */
> -		__netns_tracker_free(net, &sk->ns_tracker, false);
>   
> -		sk->sk_net_refcnt = 1;
> -		netns_tracker_alloc(net, &sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(net, 1);
> -	}
>   	rtn = net_generic(net, rds_tcp_netid);
>   	if (rtn->sndbuf_size > 0) {
>   		sk->sk_sndbuf = rtn->sndbuf_size;
> diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
> index a0046e99d6df..c9449780f952 100644
> --- a/net/rds/tcp_connect.c
> +++ b/net/rds/tcp_connect.c
> @@ -93,6 +93,7 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
>   	struct sockaddr_in6 sin6;
>   	struct sockaddr_in sin;
>   	struct sockaddr *addr;
> +	struct net *net;
>   	int addrlen;
>   	bool isv6;
>   	int ret;
> @@ -107,20 +108,28 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
>   
>   	mutex_lock(&tc->t_conn_path_lock);
>   
> +	net = rds_conn_net(conn);
> +
>   	if (rds_conn_path_up(cp)) {
> -		mutex_unlock(&tc->t_conn_path_lock);
> -		return 0;
> +		ret = 0;
> +		goto out;
>   	}
> +
> +	if (!maybe_get_net(net)) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
>   	if (ipv6_addr_v4mapped(&conn->c_laddr)) {
> -		ret = sock_create_kern(rds_conn_net(conn), PF_INET,
> -				       SOCK_STREAM, IPPROTO_TCP, &sock);
> +		ret = sock_create_net(net, PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
>   		isv6 = false;
>   	} else {
> -		ret = sock_create_kern(rds_conn_net(conn), PF_INET6,
> -				       SOCK_STREAM, IPPROTO_TCP, &sock);
> +		ret = sock_create_net(net, PF_INET6, SOCK_STREAM, IPPROTO_TCP, &sock);
>   		isv6 = true;
>   	}
>   
> +	put_net(net);
> +
>   	if (ret < 0)
>   		goto out;
>   
> diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
> index 69aaf03ab93e..440ac9057148 100644
> --- a/net/rds/tcp_listen.c
> +++ b/net/rds/tcp_listen.c
> @@ -101,6 +101,7 @@ int rds_tcp_accept_one(struct socket *sock)
>   	struct rds_connection *conn;
>   	int ret;
>   	struct inet_sock *inet;
> +	struct net *net;
>   	struct rds_tcp_connection *rs_tcp = NULL;
>   	int conn_state;
>   	struct rds_conn_path *cp;
> @@ -108,7 +109,7 @@ int rds_tcp_accept_one(struct socket *sock)
>   	struct proto_accept_arg arg = {
>   		.flags = O_NONBLOCK,
>   		.kern = true,
> -		.hold_net = false,
> +		.hold_net = true,
>   	};
>   #if !IS_ENABLED(CONFIG_IPV6)
>   	struct in6_addr saddr, daddr;
> @@ -118,13 +119,22 @@ int rds_tcp_accept_one(struct socket *sock)
>   	if (!sock) /* module unload or netns delete in progress */
>   		return -ENETUNREACH;
>   
> +	net = sock_net(sock->sk);
> +
> +	if (!maybe_get_net(net))
> +		return -EINVAL;
> +
>   	ret = sock_create_lite(sock->sk->sk_family,
>   			       sock->sk->sk_type, sock->sk->sk_protocol,
>   			       &new_sock);
> -	if (ret)
> +	if (ret) {
> +		put_net(net);
>   		goto out;
> +	}
>   
>   	ret = sock->ops->accept(sock, new_sock, &arg);
> +	put_net(net);
> +
>   	if (ret < 0)
>   		goto out;
>   
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 6e93f188a908..7b0de80b3aca 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -3310,25 +3310,8 @@ static const struct proto_ops smc_sock_ops = {
>   
>   int smc_create_clcsk(struct net *net, struct sock *sk, int family)
>   {
> -	struct smc_sock *smc = smc_sk(sk);
> -	int rc;
> -
> -	rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
> -			      &smc->clcsock);
> -	if (rc)
> -		return rc;
> -
> -	/* smc_clcsock_release() does not wait smc->clcsock->sk's
> -	 * destruction;  its sk_state might not be TCP_CLOSE after
> -	 * smc->sk is close()d, and TCP timers can be fired later,
> -	 * which need net ref.
> -	 */
> -	sk = smc->clcsock->sk;
> -	__netns_tracker_free(net, &sk->ns_tracker, false);
> -	sk->sk_net_refcnt = 1;
> -	get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> -	sock_inuse_add(net, 1);
I don't think this line should be removed. Otherwise, the purpose here,
managing the per-namespace statistics under network namespace isolation,
would be lost.
@D. Wythe, could you please check it again? Maybe you have a good test
for this case.

> -	return 0;
> +	return sock_create_net(net, family, SOCK_STREAM, IPPROTO_TCP,
> +			       &smc_sk(sk)->clcsock);
>   }
>   
>   static int __smc_create(struct net *net, struct socket *sock, int protocol,
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 9583bad3d150..cde5765f6f81 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1526,7 +1526,10 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
>   		return ERR_PTR(-EINVAL);
>   	}
>   
> -	error = sock_create_kern(net, family, type, protocol, &sock);
> +	if (protocol == IPPROTO_TCP)
> +		error = sock_create_net(net, family, type, protocol, &sock);
> +	else
> +		error = sock_create_kern(net, family, type, protocol, &sock);
>   	if (error < 0)
>   		return ERR_PTR(error);
>   
> @@ -1551,11 +1554,8 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
>   	newlen = error;
>   
>   	if (protocol == IPPROTO_TCP) {
> -		__netns_tracker_free(net, &sock->sk->ns_tracker, false);
> -		sock->sk->sk_net_refcnt = 1;
> -		get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(net, 1);
> -		if ((error = kernel_listen(sock, 64)) < 0)
> +		error = kernel_listen(sock, 64);
> +		if (error < 0)
>   			goto bummer;
>   	}
>   
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index feb1768e8a57..f3e139c30442 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1924,7 +1924,10 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
>   	struct socket *sock;
>   	int err;
>   
> -	err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
> +	if (protocol == IPPROTO_TCP)
> +		err = sock_create_net(xprt->xprt_net, family, type, protocol, &sock);
> +	else
> +		err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
>   	if (err < 0) {
>   		dprintk("RPC:       can't create %d transport socket (%d).\n",
>   				protocol, -err);
> @@ -1941,13 +1944,6 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
>   		goto out;
>   	}
>   
> -	if (protocol == IPPROTO_TCP) {
> -		__netns_tracker_free(xprt->xprt_net, &sock->sk->ns_tracker, false);
> -		sock->sk->sk_net_refcnt = 1;
> -		get_net_track(xprt->xprt_net, &sock->sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(xprt->xprt_net, 1);
> -	}
> -
>   	filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
>   	if (IS_ERR(filp))
>   		return ERR_CAST(filp);
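
The inline question about sock_inuse_add() can be framed with a small
userspace model. It compares the removed pattern (create without a
reference, then convert) with creation that holds the reference up front.
All names here (struct model_net, struct model_sk, the model_* helpers)
are hypothetical. The sketch ASSUMES that sock_create_net() performs the
per-netns in-use accounting at creation time; this excerpt does not show
that, and it is exactly what the comment above asks to have confirmed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins, NOT kernel structures. */
struct model_net {
	int refcnt;		/* models the netns reference count */
	int sockets_inuse;	/* models the per-netns in-use statistic */
};

struct model_sk {
	struct model_net *net;
	bool net_refcnt;	/* models sk->sk_net_refcnt */
};

/* Old step 1: kernel socket created without a netns reference. */
void model_create_noref(struct model_sk *sk, struct model_net *net)
{
	sk->net = net;
	sk->net_refcnt = false;
}

/* Old step 2: the open-coded conversion that the patch removes
 * (sk_net_refcnt = 1, take the netns reference, sock_inuse_add(net, 1)).
 */
void model_convert_to_refcounted(struct model_sk *sk)
{
	sk->net_refcnt = true;
	sk->net->refcnt++;
	sk->net->sockets_inuse++;
}

/* New: one call that is ASSUMED to do both at creation time. */
void model_create_hold(struct model_sk *sk, struct model_net *net)
{
	sk->net = net;
	sk->net_refcnt = true;
	net->refcnt++;
	net->sockets_inuse++;
}

/* Both paths should leave the namespace with identical accounting. */
bool model_same_accounting(const struct model_net *a,
			   const struct model_net *b)
{
	return a->refcnt == b->refcnt &&
	       a->sockets_inuse == b->sockets_inuse;
}
```

If the assumption holds, the two paths are indistinguishable in this
model; if sock_create_net() did not account the socket, the per-netns
statistic would differ by one, which is the regression the review
comment is worried about.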


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 net-next 08/15] socket: Pass hold_net to sk_alloc().
  2024-12-13  9:21 ` [PATCH v3 net-next 08/15] socket: Pass hold_net to sk_alloc() Kuniyuki Iwashima
@ 2024-12-13 13:45   ` Wenjia Zhang
  0 siblings, 0 replies; 27+ messages in thread
From: Wenjia Zhang @ 2024-12-13 13:45 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: Kuniyuki Iwashima, netdev



On 13.12.24 10:21, Kuniyuki Iwashima wrote:
> We will introduce a new API to create a kernel socket with netns refcnt
> held.  Then, sk_alloc() needs the hold_net flag passed to __sock_create().
> 
> Let's pass it to sk_alloc().
> 
> The actual use of hold_net will be in the next patch to make its review
> easy.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
> v2:
>    * Fix build error in iucv_sock_alloc()
> ---
>   crypto/af_alg.c              | 5 +++--
>   drivers/isdn/mISDN/socket.c  | 4 ++--
>   drivers/net/ppp/pppoe.c      | 2 +-
>   drivers/net/ppp/pptp.c       | 2 +-
>   drivers/net/tap.c            | 2 +-
>   drivers/net/tun.c            | 2 +-
>   drivers/xen/pvcalls-front.c  | 3 ++-
>   include/net/sock.h           | 2 +-
>   net/appletalk/ddp.c          | 2 +-
>   net/atm/common.c             | 2 +-
>   net/ax25/af_ax25.c           | 5 +++--
>   net/bluetooth/af_bluetooth.c | 2 +-
>   net/bluetooth/cmtp/sock.c    | 2 +-
>   net/bpf/test_run.c           | 2 +-
>   net/caif/caif_socket.c       | 2 +-
>   net/can/af_can.c             | 2 +-
>   net/core/sock.c              | 3 ++-
>   net/ieee802154/socket.c      | 2 +-
>   net/ipv4/af_inet.c           | 2 +-
>   net/ipv6/af_inet6.c          | 2 +-
>   net/iucv/af_iucv.c           | 2 +-
>   net/kcm/kcmsock.c            | 4 ++--
>   net/key/af_key.c             | 2 +-
>   net/l2tp/l2tp_ppp.c          | 3 ++-
>   net/llc/llc_conn.c           | 2 +-
>   net/mctp/af_mctp.c           | 2 +-
>   net/netlink/af_netlink.c     | 3 ++-
>   net/netrom/af_netrom.c       | 5 +++--
>   net/nfc/llcp_sock.c          | 2 +-
>   net/nfc/rawsock.c            | 2 +-
>   net/packet/af_packet.c       | 2 +-
>   net/phonet/af_phonet.c       | 2 +-
>   net/phonet/pep.c             | 2 +-
>   net/qrtr/af_qrtr.c           | 2 +-
>   net/rds/af_rds.c             | 2 +-
>   net/rose/af_rose.c           | 9 +++++----
>   net/rxrpc/af_rxrpc.c         | 2 +-
>   net/sctp/ipv6.c              | 2 +-
>   net/sctp/protocol.c          | 2 +-
>   net/smc/af_smc.c             | 2 +-
>   net/tipc/socket.c            | 2 +-
>   net/unix/af_unix.c           | 8 +++++---
>   net/vmw_vsock/af_vsock.c     | 2 +-
>   net/x25/af_x25.c             | 2 +-
>   net/xdp/xsk.c                | 2 +-
>   45 files changed, 65 insertions(+), 55 deletions(-)
> 
> diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> index e60032b94d97..bef4f0c8dac8 100644
> --- a/crypto/af_alg.c
> +++ b/crypto/af_alg.c
> @@ -423,7 +423,8 @@ int af_alg_accept(struct sock *sk, struct socket *newsock,
>   	if (!type)
>   		goto unlock;
>   
> -	sk2 = sk_alloc(sock_net(sk), PF_ALG, GFP_KERNEL, &alg_proto, arg->kern);
> +	sk2 = sk_alloc(sock_net(sk), PF_ALG, GFP_KERNEL, &alg_proto,
> +		       arg->kern, arg->hold_net);
>   	err = -ENOMEM;
>   	if (!sk2)
>   		goto unlock;
> @@ -514,7 +515,7 @@ static int alg_create(struct net *net, struct socket *sock, int protocol,
>   		return -EPROTONOSUPPORT;
>   
>   	err = -ENOMEM;
> -	sk = sk_alloc(net, PF_ALG, GFP_KERNEL, &alg_proto, kern);
> +	sk = sk_alloc(net, PF_ALG, GFP_KERNEL, &alg_proto, kern, hold_net);
>   	if (!sk)
>   		goto out;
>   
> diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
> index 54157c24ccb9..2d2404cf5649 100644
> --- a/drivers/isdn/mISDN/socket.c
> +++ b/drivers/isdn/mISDN/socket.c
> @@ -598,7 +598,7 @@ data_sock_create(struct net *net, struct socket *sock, int protocol,
>   	if (sock->type != SOCK_DGRAM)
>   		return -ESOCKTNOSUPPORT;
>   
> -	sk = sk_alloc(net, PF_ISDN, GFP_KERNEL, &mISDN_proto, kern);
> +	sk = sk_alloc(net, PF_ISDN, GFP_KERNEL, &mISDN_proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> @@ -757,7 +757,7 @@ base_sock_create(struct net *net, struct socket *sock, int protocol,
>   	if (!capable(CAP_NET_RAW))
>   		return -EPERM;
>   
> -	sk = sk_alloc(net, PF_ISDN, GFP_KERNEL, &mISDN_proto, kern);
> +	sk = sk_alloc(net, PF_ISDN, GFP_KERNEL, &mISDN_proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> index 90995f8a08a3..6606aa4374e9 100644
> --- a/drivers/net/ppp/pppoe.c
> +++ b/drivers/net/ppp/pppoe.c
> @@ -538,7 +538,7 @@ static int pppoe_create(struct net *net, struct socket *sock,
>   {
>   	struct sock *sk;
>   
> -	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppoe_sk_proto, kern);
> +	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppoe_sk_proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
> index 7bfb5c227c40..4c41e07ec497 100644
> --- a/drivers/net/ppp/pptp.c
> +++ b/drivers/net/ppp/pptp.c
> @@ -546,7 +546,7 @@ static int pptp_create(struct net *net, struct socket *sock,
>   	struct pppox_sock *po;
>   	struct pptp_opt *opt;
>   
> -	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pptp_sk_proto, kern);
> +	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pptp_sk_proto, kern, hold_net);
>   	if (!sk)
>   		goto out;
>   
> diff --git a/drivers/net/tap.c b/drivers/net/tap.c
> index 5aa41d5f7765..7bce097e96a5 100644
> --- a/drivers/net/tap.c
> +++ b/drivers/net/tap.c
> @@ -522,7 +522,7 @@ static int tap_open(struct inode *inode, struct file *file)
>   
>   	err = -ENOMEM;
>   	q = (struct tap_queue *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
> -					     &tap_proto, 0);
> +					 &tap_proto, false, true);
>   	if (!q)
>   		goto err;
>   	if (ptr_ring_init(&q->ring, tap->dev->tx_queue_len, GFP_KERNEL)) {
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 8e94df88392c..13bbee8d0a4b 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -3481,7 +3481,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
>   	struct tun_file *tfile;
>   
>   	tfile = (struct tun_file *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
> -					    &tun_proto, 0);
> +					    &tun_proto, false, true);
>   	if (!tfile)
>   		return -ENOMEM;
>   	if (ptr_ring_init(&tfile->tx_ring, 0, GFP_KERNEL)) {
> diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> index b72ee9379d77..a2308d24e67d 100644
> --- a/drivers/xen/pvcalls-front.c
> +++ b/drivers/xen/pvcalls-front.c
> @@ -882,7 +882,8 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
>   
>   received:
>   	map2->sock = newsock;
> -	newsock->sk = sk_alloc(sock_net(sock->sk), PF_INET, GFP_KERNEL, &pvcalls_proto, false);
> +	newsock->sk = sk_alloc(sock_net(sock->sk), PF_INET, GFP_KERNEL, &pvcalls_proto,
> +			       false, true);
>   	if (!newsock->sk) {
>   		bedata->rsp[req_id].req_id = PVCALLS_INVALID_ID;
>   		map->passive.inflight_req_id = PVCALLS_INVALID_ID;
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 9963dccec2f8..8de415fefe3b 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1743,7 +1743,7 @@ static inline bool sock_allow_reclassification(const struct sock *csk)
>   }
>   
>   struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
> -		      struct proto *prot, int kern);
> +		      struct proto *prot, bool kern, bool hold_net);
>   void sk_free(struct sock *sk);
>   void sk_destruct(struct sock *sk);
>   struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority);
> diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
> index 9bd361ccf5f4..3eab462100e0 100644
> --- a/net/appletalk/ddp.c
> +++ b/net/appletalk/ddp.c
> @@ -1050,7 +1050,7 @@ static int atalk_create(struct net *net, struct socket *sock, int protocol,
>   		goto out;
>   
>   	rc = -ENOMEM;
> -	sk = sk_alloc(net, PF_APPLETALK, GFP_KERNEL, &ddp_proto, kern);
> +	sk = sk_alloc(net, PF_APPLETALK, GFP_KERNEL, &ddp_proto, kern, hold_net);
>   	if (!sk)
>   		goto out;
>   	rc = 0;
> diff --git a/net/atm/common.c b/net/atm/common.c
> index c1e05b0c0b4b..2cf074c3e8a5 100644
> --- a/net/atm/common.c
> +++ b/net/atm/common.c
> @@ -146,7 +146,7 @@ int vcc_create(struct net *net, struct socket *sock, int protocol, int family,
>   	sock->sk = NULL;
>   	if (sock->type == SOCK_STREAM)
>   		return -EINVAL;
> -	sk = sk_alloc(net, family, GFP_KERNEL, &vcc_proto, kern);
> +	sk = sk_alloc(net, family, GFP_KERNEL, &vcc_proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   	sock_init_data(sock, sk);
> diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
> index 6c68b5e5b11c..6f572c0b3f59 100644
> --- a/net/ax25/af_ax25.c
> +++ b/net/ax25/af_ax25.c
> @@ -890,7 +890,7 @@ static int ax25_create(struct net *net, struct socket *sock, int protocol,
>   		return -ESOCKTNOSUPPORT;
>   	}
>   
> -	sk = sk_alloc(net, PF_AX25, GFP_ATOMIC, &ax25_proto, kern);
> +	sk = sk_alloc(net, PF_AX25, GFP_ATOMIC, &ax25_proto, kern, hold_net);
>   	if (sk == NULL)
>   		return -ENOMEM;
>   
> @@ -916,7 +916,8 @@ struct sock *ax25_make_new(struct sock *osk, struct ax25_dev *ax25_dev)
>   	struct sock *sk;
>   	ax25_cb *ax25, *oax25;
>   
> -	sk = sk_alloc(sock_net(osk), PF_AX25, GFP_ATOMIC, osk->sk_prot, 0);
> +	sk = sk_alloc(sock_net(osk), PF_AX25, GFP_ATOMIC, osk->sk_prot,
> +		      false, true);
>   	if (sk == NULL)
>   		return NULL;
>   
> diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
> index 7c24a6f87281..6c89fa2d9ccd 100644
> --- a/net/bluetooth/af_bluetooth.c
> +++ b/net/bluetooth/af_bluetooth.c
> @@ -146,7 +146,7 @@ struct sock *bt_sock_alloc(struct net *net, struct socket *sock,
>   {
>   	struct sock *sk;
>   
> -	sk = sk_alloc(net, PF_BLUETOOTH, prio, prot, kern);
> +	sk = sk_alloc(net, PF_BLUETOOTH, prio, prot, kern, hold_net);
>   	if (!sk)
>   		return NULL;
>   
> diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c
> index 2ea9da9fe1d5..6e9138748317 100644
> --- a/net/bluetooth/cmtp/sock.c
> +++ b/net/bluetooth/cmtp/sock.c
> @@ -207,7 +207,7 @@ static int cmtp_sock_create(struct net *net, struct socket *sock, int protocol,
>   	if (sock->type != SOCK_RAW)
>   		return -ESOCKTNOSUPPORT;
>   
> -	sk = sk_alloc(net, PF_BLUETOOTH, GFP_ATOMIC, &cmtp_proto, kern);
> +	sk = sk_alloc(net, PF_BLUETOOTH, GFP_ATOMIC, &cmtp_proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 9ae2a7f1738b..f663f760bcb8 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -1024,7 +1024,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>   		break;
>   	}
>   
> -	sk = sk_alloc(net, AF_UNSPEC, GFP_USER, &bpf_dummy_proto, 1);
> +	sk = sk_alloc(net, AF_UNSPEC, GFP_USER, &bpf_dummy_proto, true, false);
>   	if (!sk) {
>   		kfree(data);
>   		kfree(ctx);
> diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
> index 6eef0e83f442..60fa870cfe97 100644
> --- a/net/caif/caif_socket.c
> +++ b/net/caif/caif_socket.c
> @@ -1048,7 +1048,7 @@ static int caif_create(struct net *net, struct socket *sock, int protocol,
>   	 * is really not used at all in the net/core or socket.c but the
>   	 * initialization makes sure that sock->state is not uninitialized.
>   	 */
> -	sk = sk_alloc(net, PF_CAIF, GFP_KERNEL, &prot, kern);
> +	sk = sk_alloc(net, PF_CAIF, GFP_KERNEL, &prot, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> index c4094ccc9978..cecdc8b7420c 100644
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -155,7 +155,7 @@ static int can_create(struct net *net, struct socket *sock, int protocol,
>   
>   	sock->ops = cp->ops;
>   
> -	sk = sk_alloc(net, PF_CAN, GFP_KERNEL, cp->prot, kern);
> +	sk = sk_alloc(net, PF_CAN, GFP_KERNEL, cp->prot, kern, hold_net);
>   	if (!sk) {
>   		err = -ENOMEM;
>   		goto errout;
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 74729d20cd00..8546d97cc6ec 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2209,9 +2209,10 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
>    *	@priority: for allocation (%GFP_KERNEL, %GFP_ATOMIC, etc)
>    *	@prot: struct proto associated with this new sock instance
>    *	@kern: is this to be a kernel socket?
> + *	@hold_net: hold netns refcnt or not
>    */
>   struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
> -		      struct proto *prot, int kern)
> +		      struct proto *prot, bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
> index 0dd1a8829c42..6144338c420d 100644
> --- a/net/ieee802154/socket.c
> +++ b/net/ieee802154/socket.c
> @@ -1027,7 +1027,7 @@ static int ieee802154_create(struct net *net, struct socket *sock,
>   	}
>   
>   	rc = -ENOMEM;
> -	sk = sk_alloc(net, PF_IEEE802154, GFP_KERNEL, proto, kern);
> +	sk = sk_alloc(net, PF_IEEE802154, GFP_KERNEL, proto, kern, hold_net);
>   	if (!sk)
>   		goto out;
>   	rc = 0;
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 7313ec410fb5..d22bb0d3ddc1 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -323,7 +323,7 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
>   	WARN_ON(!answer_prot->slab);
>   
>   	err = -ENOMEM;
> -	sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot, kern);
> +	sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot, kern, hold_net);
>   	if (!sk)
>   		goto out;
>   
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 8f951e5e58ab..c30fa8de7451 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -190,7 +190,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
>   	WARN_ON(!answer_prot->slab);
>   
>   	err = -ENOBUFS;
> -	sk = sk_alloc(net, PF_INET6, GFP_KERNEL, answer_prot, kern);
> +	sk = sk_alloc(net, PF_INET6, GFP_KERNEL, answer_prot, kern, hold_net);
>   	if (!sk)
>   		goto out;
>   
> diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
> index b7bbd4947855..76ecc64ec60c 100644
> --- a/net/iucv/af_iucv.c
> +++ b/net/iucv/af_iucv.c
> @@ -452,7 +452,7 @@ static struct sock *iucv_sock_alloc(struct socket *sock, int proto, gfp_t prio,
>   	struct sock *sk;
>   	struct iucv_sock *iucv;
>   
> -	sk = sk_alloc(&init_net, PF_IUCV, prio, &iucv_proto, kern);
> +	sk = sk_alloc(&init_net, PF_IUCV, prio, &iucv_proto, kern, hold_net);
>   	if (!sk)
>   		return NULL;
>   	iucv = iucv_sk(sk);
> diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
> index 50925046a392..8c791d1272cc 100644
> --- a/net/kcm/kcmsock.c
> +++ b/net/kcm/kcmsock.c
> @@ -1517,7 +1517,7 @@ static struct file *kcm_clone(struct socket *osock)
>   	__module_get(newsock->ops->owner);
>   
>   	newsk = sk_alloc(sock_net(osock->sk), PF_KCM, GFP_KERNEL,
> -			 &kcm_proto, false);
> +			 &kcm_proto, false, true);
>   	if (!newsk) {
>   		sock_release(newsock);
>   		return ERR_PTR(-ENOMEM);
> @@ -1798,7 +1798,7 @@ static int kcm_create(struct net *net, struct socket *sock,
>   	if (protocol != KCMPROTO_CONNECTED)
>   		return -EPROTONOSUPPORT;
>   
> -	sk = sk_alloc(net, PF_KCM, GFP_KERNEL, &kcm_proto, kern);
> +	sk = sk_alloc(net, PF_KCM, GFP_KERNEL, &kcm_proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/key/af_key.c b/net/key/af_key.c
> index 1c35b1cfb1c5..765cc86d7923 100644
> --- a/net/key/af_key.c
> +++ b/net/key/af_key.c
> @@ -149,7 +149,7 @@ static int pfkey_create(struct net *net, struct socket *sock, int protocol,
>   	if (protocol != PF_KEY_V2)
>   		return -EPROTONOSUPPORT;
>   
> -	sk = sk_alloc(net, PF_KEY, GFP_KERNEL, &key_proto, kern);
> +	sk = sk_alloc(net, PF_KEY, GFP_KERNEL, &key_proto, kern, hold_net);
>   	if (sk == NULL)
>   		return -ENOMEM;
>   
> diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
> index bab3c7b943db..5bd99d5ca128 100644
> --- a/net/l2tp/l2tp_ppp.c
> +++ b/net/l2tp/l2tp_ppp.c
> @@ -483,7 +483,8 @@ static int pppol2tp_create(struct net *net, struct socket *sock,
>   	int error = -ENOMEM;
>   	struct sock *sk;
>   
> -	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppol2tp_sk_proto, kern);
> +	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppol2tp_sk_proto,
> +		      kern, hold_net);
>   	if (!sk)
>   		goto out;
>   
> diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
> index 75b2e21bfd2b..ba0ed49b3085 100644
> --- a/net/llc/llc_conn.c
> +++ b/net/llc/llc_conn.c
> @@ -932,7 +932,7 @@ static void llc_sk_init(struct sock *sk)
>   struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority,
>   			  struct proto *prot, bool kern, bool hold_net)
>   {
> -	struct sock *sk = sk_alloc(net, family, priority, prot, kern);
> +	struct sock *sk = sk_alloc(net, family, priority, prot, kern, hold_net);
>   
>   	if (!sk)
>   		goto out;
> diff --git a/net/mctp/af_mctp.c b/net/mctp/af_mctp.c
> index 17821c976213..5de6bc967271 100644
> --- a/net/mctp/af_mctp.c
> +++ b/net/mctp/af_mctp.c
> @@ -702,7 +702,7 @@ static int mctp_pf_create(struct net *net, struct socket *sock,
>   	sock->state = SS_UNCONNECTED;
>   	sock->ops = ops;
>   
> -	sk = sk_alloc(net, PF_MCTP, GFP_KERNEL, proto, kern);
> +	sk = sk_alloc(net, PF_MCTP, GFP_KERNEL, proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index ddc51cb89c5b..273f3e43938a 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -626,7 +626,8 @@ static int __netlink_create(struct net *net, struct socket *sock,
>   
>   	sock->ops = &netlink_ops;
>   
> -	sk = sk_alloc(net, PF_NETLINK, GFP_KERNEL, &netlink_proto, kern);
> +	sk = sk_alloc(net, PF_NETLINK, GFP_KERNEL, &netlink_proto,
> +		      kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
> index 483f78951a19..0803ca64385d 100644
> --- a/net/netrom/af_netrom.c
> +++ b/net/netrom/af_netrom.c
> @@ -435,7 +435,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
>   	if (sock->type != SOCK_SEQPACKET || protocol != 0)
>   		return -ESOCKTNOSUPPORT;
>   
> -	sk = sk_alloc(net, PF_NETROM, GFP_ATOMIC, &nr_proto, kern);
> +	sk = sk_alloc(net, PF_NETROM, GFP_ATOMIC, &nr_proto, kern, hold_net);
>   	if (sk  == NULL)
>   		return -ENOMEM;
>   
> @@ -478,7 +478,8 @@ static struct sock *nr_make_new(struct sock *osk)
>   	if (osk->sk_type != SOCK_SEQPACKET)
>   		return NULL;
>   
> -	sk = sk_alloc(sock_net(osk), PF_NETROM, GFP_ATOMIC, osk->sk_prot, 0);
> +	sk = sk_alloc(sock_net(osk), PF_NETROM, GFP_ATOMIC, osk->sk_prot,
> +		      false, true);
>   	if (sk == NULL)
>   		return NULL;
>   
> diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
> index 14f592becce0..80c427c32a91 100644
> --- a/net/nfc/llcp_sock.c
> +++ b/net/nfc/llcp_sock.c
> @@ -977,7 +977,7 @@ struct sock *nfc_llcp_sock_alloc(struct socket *sock, int type, gfp_t gfp,
>   	struct sock *sk;
>   	struct nfc_llcp_sock *llcp_sock;
>   
> -	sk = sk_alloc(&init_net, PF_NFC, gfp, &llcp_sock_proto, kern);
> +	sk = sk_alloc(&init_net, PF_NFC, gfp, &llcp_sock_proto, kern, hold_net);
>   	if (!sk)
>   		return NULL;
>   
> diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
> index 4485b1ccb1c7..f2443d274065 100644
> --- a/net/nfc/rawsock.c
> +++ b/net/nfc/rawsock.c
> @@ -339,7 +339,7 @@ static int rawsock_create(struct net *net, struct socket *sock,
>   		sock->ops = &rawsock_ops;
>   	}
>   
> -	sk = sk_alloc(net, PF_NFC, GFP_ATOMIC, nfc_proto->proto, kern);
> +	sk = sk_alloc(net, PF_NFC, GFP_ATOMIC, nfc_proto->proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 5a25dac333b0..2d1cab4839cd 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -3414,7 +3414,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
>   	sock->state = SS_UNCONNECTED;
>   
>   	err = -ENOBUFS;
> -	sk = sk_alloc(net, PF_PACKET, GFP_KERNEL, &packet_proto, kern);
> +	sk = sk_alloc(net, PF_PACKET, GFP_KERNEL, &packet_proto, kern, hold_net);
>   	if (sk == NULL)
>   		goto out;
>   
> diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
> index 4bdbc93c74fb..dc2e03edd65d 100644
> --- a/net/phonet/af_phonet.c
> +++ b/net/phonet/af_phonet.c
> @@ -84,7 +84,7 @@ static int pn_socket_create(struct net *net, struct socket *sock, int protocol,
>   		goto out;
>   	}
>   
> -	sk = sk_alloc(net, PF_PHONET, GFP_KERNEL, pnp->prot, kern);
> +	sk = sk_alloc(net, PF_PHONET, GFP_KERNEL, pnp->prot, kern, hold_net);
>   	if (sk == NULL) {
>   		err = -ENOMEM;
>   		goto out;
> diff --git a/net/phonet/pep.c b/net/phonet/pep.c
> index 53a858478e22..9b6e83b92f6f 100644
> --- a/net/phonet/pep.c
> +++ b/net/phonet/pep.c
> @@ -836,7 +836,7 @@ static struct sock *pep_sock_accept(struct sock *sk,
>   
>   	/* Create a new to-be-accepted sock */
>   	newsk = sk_alloc(sock_net(sk), PF_PHONET, GFP_KERNEL, sk->sk_prot,
> -			 arg->kern);
> +			 arg->kern, arg->hold_net);
>   	if (!newsk) {
>   		pep_reject_conn(sk, skb, PN_PIPE_ERR_OVERLOAD, GFP_KERNEL);
>   		err = -ENOBUFS;
> diff --git a/net/qrtr/af_qrtr.c b/net/qrtr/af_qrtr.c
> index c05711f79a37..05a3b00fddf8 100644
> --- a/net/qrtr/af_qrtr.c
> +++ b/net/qrtr/af_qrtr.c
> @@ -1266,7 +1266,7 @@ static int qrtr_create(struct net *net, struct socket *sock,
>   	if (sock->type != SOCK_DGRAM)
>   		return -EPROTOTYPE;
>   
> -	sk = sk_alloc(net, AF_QIPCRTR, GFP_KERNEL, &qrtr_proto, kern);
> +	sk = sk_alloc(net, AF_QIPCRTR, GFP_KERNEL, &qrtr_proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
> index 3e1bb40485ad..a0999d9ee5ae 100644
> --- a/net/rds/af_rds.c
> +++ b/net/rds/af_rds.c
> @@ -702,7 +702,7 @@ static int rds_create(struct net *net, struct socket *sock, int protocol,
>   	if (sock->type != SOCK_SEQPACKET || protocol)
>   		return -ESOCKTNOSUPPORT;
>   
> -	sk = sk_alloc(net, AF_RDS, GFP_KERNEL, &rds_proto, kern);
> +	sk = sk_alloc(net, AF_RDS, GFP_KERNEL, &rds_proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
> index 1c175c92aa42..6aeaa526382a 100644
> --- a/net/rose/af_rose.c
> +++ b/net/rose/af_rose.c
> @@ -555,8 +555,8 @@ static int rose_create(struct net *net, struct socket *sock, int protocol,
>   	if (sock->type != SOCK_SEQPACKET || protocol != 0)
>   		return -ESOCKTNOSUPPORT;
>   
> -	sk = sk_alloc(net, PF_ROSE, GFP_ATOMIC, &rose_proto, kern);
> -	if (sk == NULL)
> +	sk = sk_alloc(net, PF_ROSE, GFP_ATOMIC, &rose_proto, kern, hold_net);
> +	if (!sk)
>   		return -ENOMEM;
>   
>   	rose = rose_sk(sk);
> @@ -594,8 +594,9 @@ static struct sock *rose_make_new(struct sock *osk)
>   	if (osk->sk_type != SOCK_SEQPACKET)
>   		return NULL;
>   
> -	sk = sk_alloc(sock_net(osk), PF_ROSE, GFP_ATOMIC, &rose_proto, 0);
> -	if (sk == NULL)
> +	sk = sk_alloc(sock_net(osk), PF_ROSE, GFP_ATOMIC, &rose_proto,
> +		      false, true);
> +	if (!sk)
>   		return NULL;
>   
>   	rose = rose_sk(sk);
> diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
> index f2374f65b1c0..7e7e1163c476 100644
> --- a/net/rxrpc/af_rxrpc.c
> +++ b/net/rxrpc/af_rxrpc.c
> @@ -830,7 +830,7 @@ static int rxrpc_create(struct net *net, struct socket *sock, int protocol,
>   	sock->ops = &rxrpc_rpc_ops;
>   	sock->state = SS_UNCONNECTED;
>   
> -	sk = sk_alloc(net, PF_RXRPC, GFP_KERNEL, &rxrpc_proto, kern);
> +	sk = sk_alloc(net, PF_RXRPC, GFP_KERNEL, &rxrpc_proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index 2c4e4dd79246..5e62c77a6f47 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -784,7 +784,7 @@ static struct sock *sctp_v6_create_accept_sk(struct sock *sk,
>   	struct sock *newsk;
>   
>   	newsk = sk_alloc(sock_net(sk), PF_INET6, GFP_KERNEL, sk->sk_prot,
> -			 arg->kern);
> +			 arg->kern, arg->hold_net);
>   	if (!newsk)
>   		goto out;
>   
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index 7b2ae3df171a..73ee2ca9ff31 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -587,7 +587,7 @@ static struct sock *sctp_v4_create_accept_sk(struct sock *sk,
>   	struct sock *newsk;
>   
>   	newsk = sk_alloc(sock_net(sk), PF_INET, GFP_KERNEL, sk->sk_prot,
> -			 arg->kern);
> +			 arg->kern, arg->hold_net);
>   	if (!newsk)
>   		goto out;
>   
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 2535b922f760..6e93f188a908 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -393,7 +393,7 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
>   	struct sock *sk;
>   
>   	prot = (protocol == SMCPROTO_SMC6) ? &smc_proto6 : &smc_proto;
> -	sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, kern);
> +	sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, kern, hold_net);
>   	if (!sk)
>   		return NULL;
>   
Only for the smc part:
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 26566ff1d4c7..aba5b139c7d9 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -484,7 +484,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
>   	}
>   
>   	/* Allocate socket's protocol area */
> -	sk = sk_alloc(net, AF_TIPC, GFP_KERNEL, &tipc_proto, kern);
> +	sk = sk_alloc(net, AF_TIPC, GFP_KERNEL, &tipc_proto, kern, hold_net);
>   	if (sk == NULL)
>   		return -ENOMEM;
>   
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 393be726004c..136f4b1d05da 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1020,9 +1020,11 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int type,
>   	}
>   
>   	if (type == SOCK_STREAM)
> -		sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_stream_proto, kern);
> -	else /*dgram and  seqpacket */
> -		sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_dgram_proto, kern);
> +		sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_stream_proto,
> +			      kern, hold_net);
> +	else /* dgram and seqpacket */
> +		sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_dgram_proto,
> +			      kern, hold_net);
>   
>   	if (!sk) {
>   		err = -ENOMEM;
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index f2ce92cd57c4..10aa09e1a291 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -738,7 +738,7 @@ static struct sock *__vsock_create(struct net *net,
>   	struct vsock_sock *psk;
>   	struct vsock_sock *vsk;
>   
> -	sk = sk_alloc(net, AF_VSOCK, priority, &vsock_proto, kern);
> +	sk = sk_alloc(net, AF_VSOCK, priority, &vsock_proto, kern, hold_net);
>   	if (!sk)
>   		return NULL;
>   
> diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
> index 0b6c22b979e7..3619982cbb32 100644
> --- a/net/x25/af_x25.c
> +++ b/net/x25/af_x25.c
> @@ -510,7 +510,7 @@ static struct sock *x25_alloc_socket(struct net *net, bool kern, bool hold_net)
>   	struct x25_sock *x25;
>   	struct sock *sk;
>   
> -	sk = sk_alloc(net, AF_X25, GFP_ATOMIC, &x25_proto, kern);
> +	sk = sk_alloc(net, AF_X25, GFP_ATOMIC, &x25_proto, kern, hold_net);
>   	if (!sk)
>   		goto out;
>   
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 5763ef355c73..a93b600c6583 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -1703,7 +1703,7 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol,
>   
>   	sock->state = SS_UNCONNECTED;
>   
> -	sk = sk_alloc(net, PF_XDP, GFP_KERNEL, &xsk_proto, kern);
> +	sk = sk_alloc(net, PF_XDP, GFP_KERNEL, &xsk_proto, kern, hold_net);
>   	if (!sk)
>   		return -ENOBUFS;
>   
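
For readers skimming the hunks above, the effect of the two booleans now passed to sk_alloc() can be sketched in plain userspace C. This is only a toy model, not kernel code: every model_* name is hypothetical, the namespace is reduced to a bare counter, and the real sk_alloc() body is not part of the quoted hunk, so the assumption here is simply that hold_net decides whether the socket pins the netns, independently of kern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net: only a reference count. */
struct model_net {
	int refcnt;
};

/* Hypothetical stand-in for struct sock. */
struct model_sock {
	struct model_net *net;
	bool kern;		/* created on behalf of the kernel? */
	bool holds_net_ref;	/* did allocation take a netns reference? */
};

/*
 * Assumed semantics: @kern and @hold_net are independent. Only
 * @hold_net decides whether the namespace refcount is bumped.
 */
struct model_sock *model_sk_alloc(struct model_net *net, bool kern,
				  bool hold_net)
{
	struct model_sock *sk = calloc(1, sizeof(*sk));

	if (!sk)
		return NULL;

	sk->net = net;
	sk->kern = kern;
	sk->holds_net_ref = hold_net;
	if (hold_net)
		net->refcnt++;

	return sk;
}

/* Drop the namespace reference only if this socket took one. */
void model_sk_free(struct model_sock *sk)
{
	if (sk->holds_net_ref) {
		assert(sk->net->refcnt > 0);
		sk->net->refcnt--;
	}
	free(sk);
}
```

In this model, the (false, true) pair used by the tap and tun hunks pins the namespace until the socket is freed, while the (true, false) pair used in net/bpf/test_run.c leaves the count untouched, so the caller has to guarantee that the namespace outlives the socket.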



* Re: [PATCH v3 net-next 15/15] socket: Rename sock_create_kern() to sock_create_net_noref().
  2024-12-13  9:21 ` [PATCH v3 net-next 15/15] socket: Rename sock_create_kern() to sock_create_net_noref() Kuniyuki Iwashima
@ 2024-12-13 13:46   ` Wenjia Zhang
  2024-12-17 10:32   ` Paolo Abeni
  1 sibling, 0 replies; 27+ messages in thread
From: Wenjia Zhang @ 2024-12-13 13:46 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: Kuniyuki Iwashima, netdev



On 13.12.24 10:21, Kuniyuki Iwashima wrote:
> sock_create_kern() is quite a bad name, and the non-netdev folks tend
> to use it without taking care of the netns lifetime.
> 
> Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference count
> the netns of kernel sockets."), TCP sockets created by sock_create_kern()
> have caused many use-after-free bugs.
> 
> Let's rename sock_create_kern() to sock_create_net_noref() and add thorough
> documentation so that we no longer introduce the same issue in the future.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>   drivers/block/drbd/drbd_receiver.c            | 12 +++----
>   drivers/infiniband/sw/rxe/rxe_qp.c            |  2 +-
>   drivers/soc/qcom/qmi_interface.c              |  4 +--
>   fs/afs/rxrpc.c                                |  3 +-
>   fs/dlm/lowcomms.c                             |  8 ++---
>   include/linux/net.h                           |  3 +-
>   net/9p/trans_fd.c                             |  8 ++---
>   net/bluetooth/rfcomm/core.c                   |  3 +-
>   net/ceph/messenger.c                          |  6 ++--
>   net/handshake/handshake-test.c                |  3 +-
>   net/ipv4/af_inet.c                            |  3 +-
>   net/ipv4/udp_tunnel_core.c                    |  2 +-
>   net/ipv6/ip6_udp_tunnel.c                     |  4 +--
>   net/l2tp/l2tp_core.c                          |  8 ++---
>   net/mctp/test/route-test.c                    |  6 ++--
>   net/mptcp/pm_netlink.c                        |  4 +--
>   net/mptcp/subflow.c                           |  2 +-
>   net/netfilter/ipvs/ip_vs_sync.c               |  8 ++---
>   net/qrtr/ns.c                                 |  6 ++--
>   net/rds/tcp_listen.c                          |  4 +--
>   net/rxrpc/rxperf.c                            |  4 +--
>   net/sctp/socket.c                             |  2 +-
>   net/smc/smc_inet.c                            |  2 +-
>   net/socket.c                                  | 35 +++++++++++++------
>   net/sunrpc/clnt.c                             |  4 +--
>   net/sunrpc/svcsock.c                          |  2 +-
>   net/sunrpc/xprtsock.c                         |  6 ++--
>   net/tipc/topsrv.c                             |  4 +--
>   net/wireless/nl80211.c                        |  4 +--
>   .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  4 +--
>   30 files changed, 92 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
> index 0c9f54197768..39be44e5db8a 100644
> --- a/drivers/block/drbd/drbd_receiver.c
> +++ b/drivers/block/drbd/drbd_receiver.c
> @@ -618,9 +618,9 @@ static struct socket *drbd_try_connect(struct drbd_connection *connection)
>   	peer_addr_len = min_t(int, connection->peer_addr_len, sizeof(src_in6));
>   	memcpy(&peer_in6, &connection->peer_addr, peer_addr_len);
>   
> -	what = "sock_create_kern";
> -	err = sock_create_kern(&init_net, ((struct sockaddr *)&src_in6)->sa_family,
> -			       SOCK_STREAM, IPPROTO_TCP, &sock);
> +	what = "sock_create_net_noref";
> +	err = sock_create_net_noref(&init_net, ((struct sockaddr *)&src_in6)->sa_family,
> +				    SOCK_STREAM, IPPROTO_TCP, &sock);
>   	if (err < 0) {
>   		sock = NULL;
>   		goto out;
> @@ -713,9 +713,9 @@ static int prepare_listen_socket(struct drbd_connection *connection, struct acce
>   	my_addr_len = min_t(int, connection->my_addr_len, sizeof(struct sockaddr_in6));
>   	memcpy(&my_addr, &connection->my_addr, my_addr_len);
>   
> -	what = "sock_create_kern";
> -	err = sock_create_kern(&init_net, ((struct sockaddr *)&my_addr)->sa_family,
> -			       SOCK_STREAM, IPPROTO_TCP, &s_listen);
> +	what = "sock_create_net_noref";
> +	err = sock_create_net_noref(&init_net, ((struct sockaddr *)&my_addr)->sa_family,
> +				    SOCK_STREAM, IPPROTO_TCP, &s_listen);
>   	if (err) {
>   		s_listen = NULL;
>   		goto out;
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index 91d329e90308..250673cf6cbf 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -241,7 +241,7 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
>   	/* if we don't finish qp create make sure queue is valid */
>   	skb_queue_head_init(&qp->req_pkts);
>   
> -	err = sock_create_kern(&init_net, AF_INET, SOCK_DGRAM, 0, &qp->sk);
> +	err = sock_create_net_noref(&init_net, AF_INET, SOCK_DGRAM, 0, &qp->sk);
>   	if (err < 0)
>   		return err;
>   	qp->sk->sk->sk_user_data = (void *)(uintptr_t)qp->elem.index;
> diff --git a/drivers/soc/qcom/qmi_interface.c b/drivers/soc/qcom/qmi_interface.c
> index bc6d6379d8b1..eb5a64f6fd6f 100644
> --- a/drivers/soc/qcom/qmi_interface.c
> +++ b/drivers/soc/qcom/qmi_interface.c
> @@ -588,8 +588,8 @@ static struct socket *qmi_sock_create(struct qmi_handle *qmi,
>   	struct socket *sock;
>   	int ret;
>   
> -	ret = sock_create_kern(&init_net, AF_QIPCRTR, SOCK_DGRAM,
> -			       PF_QIPCRTR, &sock);
> +	ret = sock_create_net_noref(&init_net, AF_QIPCRTR, SOCK_DGRAM,
> +				    PF_QIPCRTR, &sock);
>   	if (ret < 0)
>   		return ERR_PTR(ret);
>   
> diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
> index 9f2a3bb56ec6..7443fe801894 100644
> --- a/fs/afs/rxrpc.c
> +++ b/fs/afs/rxrpc.c
> @@ -44,7 +44,8 @@ int afs_open_socket(struct afs_net *net)
>   
>   	_enter("");
>   
> -	ret = sock_create_kern(net->net, AF_RXRPC, SOCK_DGRAM, PF_INET6, &socket);
> +	ret = sock_create_net_noref(net->net, AF_RXRPC, SOCK_DGRAM, PF_INET6,
> +				    &socket);
>   	if (ret < 0)
>   		goto error_1;
>   
> diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> index df40c3fd1070..b0450aff4cd4 100644
> --- a/fs/dlm/lowcomms.c
> +++ b/fs/dlm/lowcomms.c
> @@ -1579,8 +1579,8 @@ static int dlm_connect(struct connection *con)
>   	}
>   
>   	/* Create a socket to communicate with */
> -	result = sock_create_kern(&init_net, dlm_local_addr[0].ss_family,
> -				  SOCK_STREAM, dlm_proto_ops->proto, &sock);
> +	result = sock_create_net_noref(&init_net, dlm_local_addr[0].ss_family,
> +				       SOCK_STREAM, dlm_proto_ops->proto, &sock);
>   	if (result < 0)
>   		return result;
>   
> @@ -1760,8 +1760,8 @@ static int dlm_listen_for_all(void)
>   	if (result < 0)
>   		return result;
>   
> -	result = sock_create_kern(&init_net, dlm_local_addr[0].ss_family,
> -				  SOCK_STREAM, dlm_proto_ops->proto, &sock);
> +	result = sock_create_net_noref(&init_net, dlm_local_addr[0].ss_family,
> +				       SOCK_STREAM, dlm_proto_ops->proto, &sock);
>   	if (result < 0) {
>   		log_print("Can't create comms socket: %d", result);
>   		return result;
> diff --git a/include/linux/net.h b/include/linux/net.h
> index 1ba4abb18863..582faf2fdd08 100644
> --- a/include/linux/net.h
> +++ b/include/linux/net.h
> @@ -254,7 +254,8 @@ bool sock_is_registered(int family);
>   int sock_create_user(int family, int type, int proto, struct socket **res);
>   int sock_create_net(struct net *net, int family, int type, int proto,
>   		    struct socket **res);
> -int sock_create_kern(struct net *net, int family, int type, int proto, struct socket **res);
> +int sock_create_net_noref(struct net *net, int family, int type, int proto,
> +			  struct socket **res);
>   int sock_create_lite(int family, int type, int proto, struct socket **res);
>   struct socket *sock_alloc(void);
>   void sock_release(struct socket *sock);
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 83f81da24727..ae014999040f 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -1011,8 +1011,8 @@ p9_fd_create_tcp(struct p9_client *client, const char *addr, char *args)
>   	sin_server.sin_family = AF_INET;
>   	sin_server.sin_addr.s_addr = in_aton(addr);
>   	sin_server.sin_port = htons(opts.port);
> -	err = sock_create_kern(current->nsproxy->net_ns, PF_INET,
> -			       SOCK_STREAM, IPPROTO_TCP, &csocket);
> +	err = sock_create_net_noref(current->nsproxy->net_ns, PF_INET,
> +				    SOCK_STREAM, IPPROTO_TCP, &csocket);
>   	if (err) {
>   		pr_err("%s (%d): problem creating socket\n",
>   		       __func__, task_pid_nr(current));
> @@ -1062,8 +1062,8 @@ p9_fd_create_unix(struct p9_client *client, const char *addr, char *args)
>   
>   	sun_server.sun_family = PF_UNIX;
>   	strcpy(sun_server.sun_path, addr);
> -	err = sock_create_kern(current->nsproxy->net_ns, PF_UNIX,
> -			       SOCK_STREAM, 0, &csocket);
> +	err = sock_create_net_noref(current->nsproxy->net_ns, PF_UNIX,
> +				    SOCK_STREAM, 0, &csocket);
>   	if (err < 0) {
>   		pr_err("%s (%d): problem creating socket\n",
>   		       __func__, task_pid_nr(current));
> diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
> index 4c56ca5a216c..6204514667b6 100644
> --- a/net/bluetooth/rfcomm/core.c
> +++ b/net/bluetooth/rfcomm/core.c
> @@ -200,7 +200,8 @@ static int rfcomm_l2sock_create(struct socket **sock)
>   
>   	BT_DBG("");
>   
> -	err = sock_create_kern(&init_net, PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_L2CAP, sock);
> +	err = sock_create_net_noref(&init_net, PF_BLUETOOTH, SOCK_SEQPACKET,
> +				    BTPROTO_L2CAP, sock);
>   	if (!err) {
>   		struct sock *sk = (*sock)->sk;
>   		sk->sk_data_ready   = rfcomm_l2data_ready;
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index d1b5705dc0c6..cb6a1532ff9f 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -442,10 +442,10 @@ int ceph_tcp_connect(struct ceph_connection *con)
>   	     ceph_pr_addr(&con->peer_addr));
>   	BUG_ON(con->sock);
>   
> -	/* sock_create_kern() allocates with GFP_KERNEL */
> +	/* sock_create_net_noref() allocates with GFP_KERNEL */
>   	noio_flag = memalloc_noio_save();
> -	ret = sock_create_kern(read_pnet(&con->msgr->net), ss.ss_family,
> -			       SOCK_STREAM, IPPROTO_TCP, &sock);
> +	ret = sock_create_net_noref(read_pnet(&con->msgr->net), ss.ss_family,
> +				    SOCK_STREAM, IPPROTO_TCP, &sock);
>   	memalloc_noio_restore(noio_flag);
>   	if (ret)
>   		return ret;
> diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
> index 4f300504f3e5..54793f9e4d30 100644
> --- a/net/handshake/handshake-test.c
> +++ b/net/handshake/handshake-test.c
> @@ -145,7 +145,8 @@ static void handshake_req_alloc_case(struct kunit *test)
>   
>   static int handshake_sock_create(struct socket **sock)
>   {
> -	return sock_create_kern(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, sock);
> +	return sock_create_net_noref(&init_net, PF_INET, SOCK_STREAM,
> +				     IPPROTO_TCP, sock);
>   }
>   
>   static void handshake_req_submit_test1(struct kunit *test)
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index d22bb0d3ddc1..03c3854f382a 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1644,8 +1644,9 @@ int inet_ctl_sock_create(struct sock **sk, unsigned short family,
>   			 struct net *net)
>   {
>   	struct socket *sock;
> -	int rc = sock_create_kern(net, family, type, protocol, &sock);
> +	int rc;
>   
> +	rc = sock_create_net_noref(net, family, type, protocol, &sock);
>   	if (rc == 0) {
>   		*sk = sock->sk;
>   		(*sk)->sk_allocation = GFP_ATOMIC;
> diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
> index 619a53eb672d..e8e079ebca36 100644
> --- a/net/ipv4/udp_tunnel_core.c
> +++ b/net/ipv4/udp_tunnel_core.c
> @@ -15,7 +15,7 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg,
>   	struct socket *sock = NULL;
>   	struct sockaddr_in udp_addr;
>   
> -	err = sock_create_kern(net, AF_INET, SOCK_DGRAM, 0, &sock);
> +	err = sock_create_net_noref(net, AF_INET, SOCK_DGRAM, 0, &sock);
>   	if (err < 0)
>   		goto error;
>   
> diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
> index c99053189ea8..65d859c7d9c4 100644
> --- a/net/ipv6/ip6_udp_tunnel.c
> +++ b/net/ipv6/ip6_udp_tunnel.c
> @@ -18,10 +18,10 @@ int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
>   		     struct socket **sockp)
>   {
>   	struct sockaddr_in6 udp6_addr = {};
> -	int err;
>   	struct socket *sock = NULL;
> +	int err;
>   
> -	err = sock_create_kern(net, AF_INET6, SOCK_DGRAM, 0, &sock);
> +	err = sock_create_net_noref(net, AF_INET6, SOCK_DGRAM, 0, &sock);
>   	if (err < 0)
>   		goto error;
>   
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 369a2f2e459c..e43534185f45 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1494,8 +1494,8 @@ static int l2tp_tunnel_sock_create(struct net *net,
>   		if (cfg->local_ip6 && cfg->peer_ip6) {
>   			struct sockaddr_l2tpip6 ip6_addr = {0};
>   
> -			err = sock_create_kern(net, AF_INET6, SOCK_DGRAM,
> -					       IPPROTO_L2TP, &sock);
> +			err = sock_create_net_noref(net, AF_INET6, SOCK_DGRAM,
> +						    IPPROTO_L2TP, &sock);
>   			if (err < 0)
>   				goto out;
>   
> @@ -1522,8 +1522,8 @@ static int l2tp_tunnel_sock_create(struct net *net,
>   		{
>   			struct sockaddr_l2tpip ip_addr = {0};
>   
> -			err = sock_create_kern(net, AF_INET, SOCK_DGRAM,
> -					       IPPROTO_L2TP, &sock);
> +			err = sock_create_net_noref(net, AF_INET, SOCK_DGRAM,
> +						    IPPROTO_L2TP, &sock);
>   			if (err < 0)
>   				goto out;
>   
> diff --git a/net/mctp/test/route-test.c b/net/mctp/test/route-test.c
> index 8551dab1d1e6..f1b2cf0c8b48 100644
> --- a/net/mctp/test/route-test.c
> +++ b/net/mctp/test/route-test.c
> @@ -310,7 +310,7 @@ static void __mctp_route_test_init(struct kunit *test,
>   	rt = mctp_test_create_route(&init_net, dev->mdev, 8, 68);
>   	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, rt);
>   
> -	rc = sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
> +	rc = sock_create_net_noref(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
>   	KUNIT_ASSERT_EQ(test, rc, 0);
>   
>   	addr.smctp_family = AF_MCTP;
> @@ -568,7 +568,7 @@ static void mctp_test_route_input_sk_keys(struct kunit *test)
>   	rt = mctp_test_create_route(&init_net, dev->mdev, 8, 68);
>   	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, rt);
>   
> -	rc = sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
> +	rc = sock_create_net_noref(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
>   	KUNIT_ASSERT_EQ(test, rc, 0);
>   
>   	msk = container_of(sock->sk, struct mctp_sock, sk);
> @@ -994,7 +994,7 @@ static void mctp_test_route_output_key_create(struct kunit *test)
>   	rt = mctp_test_create_route(&init_net, dev->mdev, dst, 68);
>   	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, rt);
>   
> -	rc = sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
> +	rc = sock_create_net_noref(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
>   	KUNIT_ASSERT_EQ(test, rc, 0);
>   
>   	dev->mdev->addrs = kmalloc(sizeof(u8), GFP_KERNEL);
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index 7a0f7998376a..3dc40a364fb2 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -1083,8 +1083,8 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
>   	int backlog = 1024;
>   	int err;
>   
> -	err = sock_create_kern(sock_net(sk), entry->addr.family,
> -			       SOCK_STREAM, IPPROTO_MPTCP, &entry->lsk);
> +	err = sock_create_net_noref(sock_net(sk), entry->addr.family,
> +				    SOCK_STREAM, IPPROTO_MPTCP, &entry->lsk);
>   	if (err)
>   		return err;
>   
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index e7e8972bdfca..7162873a232a 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1953,7 +1953,7 @@ static int subflow_ulp_init(struct sock *sk)
>   	int err = 0;
>   
>   	/* disallow attaching ULP to a socket unless it has been
> -	 * created with sock_create_kern()
> +	 * created with sock_create_net()
>   	 */
>   	if (!sk->sk_kern_sock) {
>   		err = -EOPNOTSUPP;
> diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
> index 3402675bf521..e97cd30f196a 100644
> --- a/net/netfilter/ipvs/ip_vs_sync.c
> +++ b/net/netfilter/ipvs/ip_vs_sync.c
> @@ -1470,8 +1470,8 @@ static int make_send_sock(struct netns_ipvs *ipvs, int id,
>   	int result, salen;
>   
>   	/* First create a socket */
> -	result = sock_create_kern(ipvs->net, ipvs->mcfg.mcast_af, SOCK_DGRAM,
> -				  IPPROTO_UDP, &sock);
> +	result = sock_create_net_noref(ipvs->net, ipvs->mcfg.mcast_af, SOCK_DGRAM,
> +				       IPPROTO_UDP, &sock);
>   	if (result < 0) {
>   		pr_err("Error during creation of socket; terminating\n");
>   		goto error;
> @@ -1527,8 +1527,8 @@ static int make_receive_sock(struct netns_ipvs *ipvs, int id,
>   	int result, salen;
>   
>   	/* First create a socket */
> -	result = sock_create_kern(ipvs->net, ipvs->bcfg.mcast_af, SOCK_DGRAM,
> -				  IPPROTO_UDP, &sock);
> +	result = sock_create_net_noref(ipvs->net, ipvs->bcfg.mcast_af, SOCK_DGRAM,
> +				       IPPROTO_UDP, &sock);
>   	if (result < 0) {
>   		pr_err("Error during creation of socket; terminating\n");
>   		goto error;
> diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
> index 3de9350cbf30..2f8f347150c0 100644
> --- a/net/qrtr/ns.c
> +++ b/net/qrtr/ns.c
> @@ -692,8 +692,8 @@ int qrtr_ns_init(void)
>   	INIT_LIST_HEAD(&qrtr_ns.lookups);
>   	INIT_WORK(&qrtr_ns.work, qrtr_ns_worker);
>   
> -	ret = sock_create_kern(&init_net, AF_QIPCRTR, SOCK_DGRAM,
> -			       PF_QIPCRTR, &qrtr_ns.sock);
> +	ret = sock_create_net_noref(&init_net, AF_QIPCRTR, SOCK_DGRAM,
> +				    PF_QIPCRTR, &qrtr_ns.sock);
>   	if (ret < 0)
>   		return ret;
>   
> @@ -735,7 +735,7 @@ int qrtr_ns_init(void)
>   	 *  qrtr module is inserted successfully.
>   	 *
>   	 * However, the reference count is increased twice in
> -	 * sock_create_kern(): one is to increase the reference count of owner
> +	 * sock_create_net_noref(): one is to increase the reference count of owner
>   	 * of qrtr socket's proto_ops struct; another is to increment the
>   	 * reference count of owner of qrtr proto struct. Therefore, we must
>   	 * decrement the module reference count twice to ensure that it keeps
> diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
> index 440ac9057148..202afd77b532 100644
> --- a/net/rds/tcp_listen.c
> +++ b/net/rds/tcp_listen.c
> @@ -289,8 +289,8 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6)
>   	int addr_len;
>   	int ret;
>   
> -	ret = sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM,
> -			       IPPROTO_TCP, &sock);
> +	ret = sock_create_net_noref(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM,
> +				    IPPROTO_TCP, &sock);
>   	if (ret < 0) {
>   		rdsdebug("could not create %s listener socket: %d\n",
>   			 isv6 ? "IPv6" : "IPv4", ret);
> diff --git a/net/rxrpc/rxperf.c b/net/rxrpc/rxperf.c
> index 7ef93407be83..1c784d449a6b 100644
> --- a/net/rxrpc/rxperf.c
> +++ b/net/rxrpc/rxperf.c
> @@ -182,8 +182,8 @@ static int rxperf_open_socket(void)
>   	struct socket *socket;
>   	int ret;
>   
> -	ret = sock_create_kern(&init_net, AF_RXRPC, SOCK_DGRAM, PF_INET6,
> -			       &socket);
> +	ret = sock_create_net_noref(&init_net, AF_RXRPC, SOCK_DGRAM, PF_INET6,
> +				    &socket);
>   	if (ret < 0)
>   		goto error_1;
>   
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index e49904f08559..fb8ed0290a4a 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1328,7 +1328,7 @@ static int __sctp_setsockopt_connectx(struct sock *sk, struct sockaddr *kaddrs,
>   		return err;
>   
>   	/* in-kernel sockets don't generally have a file allocated to them
> -	 * if all they do is call sock_create_kern().
> +	 * if all they do is call sock_create_net_noref().
>   	 */
>   	if (sk->sk_socket->file)
>   		flags = sk->sk_socket->file->f_flags;
> diff --git a/net/smc/smc_inet.c b/net/smc/smc_inet.c
> index a944e7dcb8b9..dbd76070e05e 100644
> --- a/net/smc/smc_inet.c
> +++ b/net/smc/smc_inet.c
> @@ -111,7 +111,7 @@ static struct inet_protosw smc_inet6_protosw = {
>   static unsigned int smc_sync_mss(struct sock *sk, u32 pmtu)
>   {
>   	/* No need pass it through to clcsock, mss can always be set by
> -	 * sock_create_kern or smc_setsockopt.
> +	 * sock_create_net or smc_setsockopt.
>   	 */
>   	return 0;
>   }

Only for the smc part:
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>

> diff --git a/net/socket.c b/net/socket.c
> index 992de3dd94b8..8f45d17e52c3 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -1665,23 +1665,36 @@ int sock_create_net(struct net *net, int family, int type, int protocol,
>   EXPORT_SYMBOL(sock_create_net);
>   
>   /**
> - *	sock_create_kern - creates a socket (kernel space)
> - *	@net: net namespace
> - *	@family: protocol family (AF_INET, ...)
> - *	@type: communication type (SOCK_STREAM, ...)
> - *	@protocol: protocol (0, ...)
> - *	@res: new socket
> + * sock_create_net_noref - creates a socket for kernel space
> + *
> + * @net: net namespace
> + * @family: protocol family (AF_INET, ...)
> + * @type: communication type (SOCK_STREAM, ...)
> + * @protocol: protocol (0, ...)
> + * @res: new socket
>    *
> - *	A wrapper around __sock_create().
> - *	Returns 0 or an error. This function internally uses GFP_KERNEL.
> + * Creates a new socket and assigns it to @res, passing through LSM.
> + *
> + * The socket is for kernel space and should not be exposed to
> + * userspace via a file descriptor nor BPF hooks except for LSM
> + * (see inet_create(), inet_release(), etc).
> + *
> + * The socket DOES NOT hold a reference count of @net to allow it to
> + * be removed; the caller MUST ensure that the socket is always freed
> + * before @net.
> + *
> + * @net MUST be alive as of calling sock_create_net_noref().
> + *
> + * Context: Process context. This function internally uses GFP_KERNEL.
> + * Return: 0 or an error.
>    */
>   
> -int sock_create_kern(struct net *net, int family, int type, int protocol,
> -		     struct socket **res)
> +int sock_create_net_noref(struct net *net, int family, int type, int protocol,
> +			  struct socket **res)
>   {
>   	return __sock_create(net, family, type, protocol, res, true, false);
>   }
> -EXPORT_SYMBOL(sock_create_kern);
> +EXPORT_SYMBOL(sock_create_net_noref);
>   
>   static struct socket *__sys_socket_create(int family, int type, int protocol)
>   {
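
For readers following the lifetime rule in the kernel-doc above, a small
userspace toy model may help. It is only a sketch: toy_net, toy_sock and
the toy_* helpers are invented for illustration and are not kernel APIs.
The counter models only references taken by sockets, not those held by
processes. It shows that a "noref" socket never pins its namespace, so
the caller has to free it first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins; NOT the kernel's struct net / struct socket. */
struct toy_net {
	int refcnt;	/* references taken by sockets that pin this netns */
};

struct toy_sock {
	struct toy_net *net;
	bool holds_net;	/* true: pins net; false: only borrows the pointer */
};

/* Models the hold_net decision only; everything else is omitted. */
static struct toy_sock *toy_sock_create(struct toy_net *net, bool hold_net)
{
	struct toy_sock *sk = malloc(sizeof(*sk));

	if (!sk)
		return NULL;
	sk->net = net;
	sk->holds_net = hold_net;
	if (hold_net)
		net->refcnt++;
	return sk;
}

/* Drops the netns reference only if this socket took one. */
static void toy_sock_release(struct toy_sock *sk)
{
	if (sk->holds_net) {
		assert(sk->net->refcnt > 0);
		sk->net->refcnt--;
	}
	free(sk);
}

/* In this model a namespace may be dismantled once nothing pins it. */
static bool toy_net_can_be_freed(const struct toy_net *net)
{
	return net->refcnt == 0;
}
```

In this model, toy_sock_create(net, false) plays the role of
sock_create_net_noref() and toy_sock_create(net, true) the role of
sock_create_net(). With the former, toy_net_can_be_freed() stays true
while the socket still points at the namespace, which is why the caller
must release such a socket before the namespace goes away.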
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 37935082d799..4e8723403e07 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1450,8 +1450,8 @@ static int rpc_sockname(struct net *net, struct sockaddr *sap, size_t salen,
>   	struct socket *sock;
>   	int err;
>   
> -	err = sock_create_kern(net, sap->sa_family,
> -			       SOCK_DGRAM, IPPROTO_UDP, &sock);
> +	err = sock_create_net_noref(net, sap->sa_family,
> +				    SOCK_DGRAM, IPPROTO_UDP, &sock);
>   	if (err < 0) {
>   		dprintk("RPC:       can't create UDP socket (%d)\n", err);
>   		goto out;
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index cde5765f6f81..e20465c20b16 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1529,7 +1529,7 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
>   	if (protocol == IPPROTO_TCP)
>   		error = sock_create_net(net, family, type, protocol, &sock);
>   	else
> -		error = sock_create_kern(net, family, type, protocol, &sock);
> +		error = sock_create_net_noref(net, family, type, protocol, &sock);
>   	if (error < 0)
>   		return ERR_PTR(error);
>   
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index f3e139c30442..e793914d48f6 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1927,7 +1927,7 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
>   	if (protocol == IPPROTO_TCP)
>   		err = sock_create_net(xprt->xprt_net, family, type, protocol, &sock);
>   	else
> -		err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
> +		err = sock_create_net_noref(xprt->xprt_net, family, type, protocol, &sock);
>   	if (err < 0) {
>   		dprintk("RPC:       can't create %d transport socket (%d).\n",
>   				protocol, -err);
> @@ -1999,8 +1999,8 @@ static int xs_local_setup_socket(struct sock_xprt *transport)
>   	struct socket *sock;
>   	int status;
>   
> -	status = sock_create_kern(xprt->xprt_net, AF_LOCAL,
> -				  SOCK_STREAM, 0, &sock);
> +	status = sock_create_net_noref(xprt->xprt_net, AF_LOCAL,
> +				       SOCK_STREAM, 0, &sock);
>   	if (status < 0) {
>   		dprintk("RPC:       can't create AF_LOCAL "
>   			"transport socket (%d).\n", -status);
> diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c
> index 8ee0c07d00e9..2e03391c1bd1 100644
> --- a/net/tipc/topsrv.c
> +++ b/net/tipc/topsrv.c
> @@ -515,7 +515,7 @@ static int tipc_topsrv_create_listener(struct tipc_topsrv *srv)
>   	struct sock *sk;
>   	int rc;
>   
> -	rc = sock_create_kern(srv->net, AF_TIPC, SOCK_SEQPACKET, 0, &lsock);
> +	rc = sock_create_net_noref(srv->net, AF_TIPC, SOCK_SEQPACKET, 0, &lsock);
>   	if (rc < 0)
>   		return rc;
>   
> @@ -553,7 +553,7 @@ static int tipc_topsrv_create_listener(struct tipc_topsrv *srv)
>   	 * after TIPC module is inserted successfully.
>   	 *
>   	 * However, the reference count is ever increased twice in
> -	 * sock_create_kern(): one is to increase the reference count of owner
> +	 * sock_create_net_noref(): one is to increase the reference count of owner
>   	 * of TIPC socket's proto_ops struct; another is to increment the
>   	 * reference count of owner of TIPC proto struct. Therefore, we must
>   	 * decrement the module reference count twice to ensure that it keeps
> diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
> index 27c58fd260e0..fef671d39d5e 100644
> --- a/net/wireless/nl80211.c
> +++ b/net/wireless/nl80211.c
> @@ -13689,8 +13689,8 @@ static int nl80211_parse_wowlan_tcp(struct cfg80211_registered_device *rdev,
>   	port = nla_get_u16_default(tb[NL80211_WOWLAN_TCP_SRC_PORT], 0);
>   #ifdef CONFIG_INET
>   	/* allocate a socket and port for it and use it */
> -	err = sock_create_kern(wiphy_net(&rdev->wiphy), PF_INET, SOCK_STREAM,
> -			       IPPROTO_TCP, &cfg->sock);
> +	err = sock_create_net_noref(wiphy_net(&rdev->wiphy), PF_INET, SOCK_STREAM,
> +				    IPPROTO_TCP, &cfg->sock);
>   	if (err) {
>   		kfree(cfg);
>   		return err;
> diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> index cc9dde507aba..b6e78e9d3280 100644
> --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> @@ -804,8 +804,8 @@ __bpf_kfunc int bpf_kfunc_init_sock(struct init_sock_args *args)
>   		goto out;
>   	}
>   
> -	err = sock_create_kern(current->nsproxy->net_ns, args->af, args->type,
> -			       proto, &sock);
> +	err = sock_create_net_noref(current->nsproxy->net_ns, args->af, args->type,
> +				    proto, &sock);
>   
>   	if (!err)
>   		/* Set timeout for call to kernel_connect() to prevent it from hanging,



* Re: [PATCH v3 net-next 04/15] socket: Pass hold_net to struct net_proto_family.create().
  2024-12-13  9:21 ` [PATCH v3 net-next 04/15] socket: Pass hold_net to struct net_proto_family.create() Kuniyuki Iwashima
@ 2024-12-13 13:46   ` Wenjia Zhang
  2024-12-17 10:24   ` Paolo Abeni
  1 sibling, 0 replies; 27+ messages in thread
From: Wenjia Zhang @ 2024-12-13 13:46 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: Kuniyuki Iwashima, netdev



On 13.12.24 10:21, Kuniyuki Iwashima wrote:
> We will introduce a new API to create a kernel socket with netns refcnt
> held.  Then, sk_alloc() needs the hold_net flag passed to __sock_create().
> 
> Let's pass it down to net_proto_family.create() and functions that call
> sk_alloc().
> 
> While at it, we convert the kern flag to boolean.
> 
> Note that we still need to pass hold_net to struct pppox_proto.create()
> and struct nfc_protocol.create() before passing hold_net to sk_alloc().
> 
> Also, we use !kern as hold_net in the accept() paths.  We will add the
> hold_net flag to struct proto_accept_arg later.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
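
As a rough illustration of the plumbing described in the commit message,
here is a userspace sketch. The toy_* names are hypothetical and the
types are heavily simplified. It shows a create() hook that takes both
flags, and an accept()-style caller that has no hold_net of its own yet
and so passes !kern.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-socket state. */
struct toy_sk {
	bool kern;	/* created on behalf of the kernel */
	bool hold_net;	/* socket takes a netns reference */
};

/* Shape of a family's create() hook after the change: two bool flags. */
typedef int (*toy_create_fn)(struct toy_sk *sk, int protocol,
			     bool kern, bool hold_net);

/* Records the flags it was given; a real hook would hand them on. */
static int toy_create(struct toy_sk *sk, int protocol,
		      bool kern, bool hold_net)
{
	(void)protocol;
	sk->kern = kern;
	sk->hold_net = hold_net;
	return 0;
}

/* accept()-style path: no hold_net argument yet, so derive it from kern. */
static int toy_accept(toy_create_fn create, struct toy_sk *newsk, bool kern)
{
	return create(newsk, 0, kern, !kern);
}
```

With this shape, a socket accepted for userspace (kern == false) ends up
with hold_net == true, and one accepted for the kernel does not, which
matches the interim "!kern as hold_net" rule until the accept argument
carries its own flag.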
> ---
>   crypto/af_alg.c                   |  2 +-
>   drivers/isdn/mISDN/socket.c       | 13 ++++++++-----
>   drivers/net/ppp/pppox.c           |  2 +-
>   include/linux/net.h               |  2 +-
>   include/net/bluetooth/bluetooth.h |  3 ++-
>   include/net/llc_conn.h            |  2 +-
>   net/appletalk/ddp.c               |  2 +-
>   net/atm/common.c                  |  3 ++-
>   net/atm/common.h                  |  3 ++-
>   net/atm/pvc.c                     |  4 ++--
>   net/atm/svc.c                     |  8 ++++----
>   net/ax25/af_ax25.c                |  2 +-
>   net/bluetooth/af_bluetooth.c      |  7 ++++---
>   net/bluetooth/bnep/sock.c         |  5 +++--
>   net/bluetooth/cmtp/sock.c         |  2 +-
>   net/bluetooth/hci_sock.c          |  4 ++--
>   net/bluetooth/hidp/sock.c         |  5 +++--
>   net/bluetooth/iso.c               | 11 ++++++-----
>   net/bluetooth/l2cap_sock.c        | 14 ++++++++------
>   net/bluetooth/rfcomm/sock.c       | 12 +++++++-----
>   net/bluetooth/sco.c               | 11 ++++++-----
>   net/caif/caif_socket.c            |  2 +-
>   net/can/af_can.c                  |  2 +-
>   net/ieee802154/socket.c           |  2 +-
>   net/ipv4/af_inet.c                |  2 +-
>   net/ipv6/af_inet6.c               |  2 +-
>   net/iucv/af_iucv.c                | 11 ++++++-----
>   net/kcm/kcmsock.c                 |  2 +-
>   net/key/af_key.c                  |  2 +-
>   net/llc/af_llc.c                  |  6 ++++--
>   net/llc/llc_conn.c                |  9 ++++++---
>   net/mctp/af_mctp.c                |  2 +-
>   net/netlink/af_netlink.c          |  8 ++++----
>   net/netrom/af_netrom.c            |  2 +-
>   net/nfc/af_nfc.c                  |  2 +-
>   net/packet/af_packet.c            |  2 +-
>   net/phonet/af_phonet.c            |  2 +-
>   net/qrtr/af_qrtr.c                |  2 +-
>   net/rds/af_rds.c                  |  2 +-
>   net/rose/af_rose.c                |  2 +-
>   net/rxrpc/af_rxrpc.c              |  2 +-
>   net/smc/af_smc.c                  | 15 ++++++++-------
>   net/socket.c                      |  2 +-
>   net/tipc/socket.c                 |  6 ++++--
>   net/unix/af_unix.c                |  9 +++++----
>   net/vmw_vsock/af_vsock.c          |  8 ++++----
>   net/x25/af_x25.c                  | 13 ++++++++-----
>   net/xdp/xsk.c                     |  2 +-
>   48 files changed, 133 insertions(+), 105 deletions(-)
> 
> diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> index 0da7c1ac778a..e60032b94d97 100644
> --- a/crypto/af_alg.c
> +++ b/crypto/af_alg.c
> @@ -503,7 +503,7 @@ static void alg_sock_destruct(struct sock *sk)
>   }
>   
>   static int alg_create(struct net *net, struct socket *sock, int protocol,
> -		      int kern)
> +		      bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	int err;
> diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
> index b215b28cad7b..54157c24ccb9 100644
> --- a/drivers/isdn/mISDN/socket.c
> +++ b/drivers/isdn/mISDN/socket.c
> @@ -590,7 +590,8 @@ static const struct proto_ops data_sock_ops = {
>   };
>   
>   static int
> -data_sock_create(struct net *net, struct socket *sock, int protocol, int kern)
> +data_sock_create(struct net *net, struct socket *sock, int protocol,
> +		 bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> @@ -746,7 +747,8 @@ static const struct proto_ops base_sock_ops = {
>   
>   
>   static int
> -base_sock_create(struct net *net, struct socket *sock, int protocol, int kern)
> +base_sock_create(struct net *net, struct socket *sock, int protocol,
> +		 bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> @@ -771,13 +773,14 @@ base_sock_create(struct net *net, struct socket *sock, int protocol, int kern)
>   }
>   
>   static int
> -mISDN_sock_create(struct net *net, struct socket *sock, int proto, int kern)
> +mISDN_sock_create(struct net *net, struct socket *sock, int proto,
> +		  bool kern, bool hold_net)
>   {
>   	int err = -EPROTONOSUPPORT;
>   
>   	switch (proto) {
>   	case ISDN_P_BASE:
> -		err = base_sock_create(net, sock, proto, kern);
> +		err = base_sock_create(net, sock, proto, kern, hold_net);
>   		break;
>   	case ISDN_P_TE_S0:
>   	case ISDN_P_NT_S0:
> @@ -791,7 +794,7 @@ mISDN_sock_create(struct net *net, struct socket *sock, int proto, int kern)
>   	case ISDN_P_B_L2DTMF:
>   	case ISDN_P_B_L2DSP:
>   	case ISDN_P_B_L2DSPHDLC:
> -		err = data_sock_create(net, sock, proto, kern);
> +		err = data_sock_create(net, sock, proto, kern, hold_net);
>   		break;
>   	default:
>   		return err;
> diff --git a/drivers/net/ppp/pppox.c b/drivers/net/ppp/pppox.c
> index 08364f10a43f..53b3f790d1f5 100644
> --- a/drivers/net/ppp/pppox.c
> +++ b/drivers/net/ppp/pppox.c
> @@ -112,7 +112,7 @@ EXPORT_SYMBOL(pppox_compat_ioctl);
>   #endif
>   
>   static int pppox_create(struct net *net, struct socket *sock, int protocol,
> -			int kern)
> +			bool kern, bool hold_net)
>   {
>   	int rc = -EPROTOTYPE;
>   
> diff --git a/include/linux/net.h b/include/linux/net.h
> index 68ac97e301be..c2a35a102ee2 100644
> --- a/include/linux/net.h
> +++ b/include/linux/net.h
> @@ -233,7 +233,7 @@ struct proto_ops {
>   struct net_proto_family {
>   	int		family;
>   	int		(*create)(struct net *net, struct socket *sock,
> -				  int protocol, int kern);
> +				  int protocol, bool kern, bool hold_net);
>   	struct module	*owner;
>   };
>   
> diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
> index 435250c72d56..58afa3fd08af 100644
> --- a/include/net/bluetooth/bluetooth.h
> +++ b/include/net/bluetooth/bluetooth.h
> @@ -406,7 +406,8 @@ void bt_sock_link(struct bt_sock_list *l, struct sock *s);
>   void bt_sock_unlink(struct bt_sock_list *l, struct sock *s);
>   bool bt_sock_linked(struct bt_sock_list *l, struct sock *s);
>   struct sock *bt_sock_alloc(struct net *net, struct socket *sock,
> -			   struct proto *prot, int proto, gfp_t prio, int kern);
> +			   struct proto *prot, int proto, gfp_t prio,
> +			   bool kern, bool hold_net);
>   int  bt_sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>   		     int flags);
>   int  bt_sock_stream_recvmsg(struct socket *sock, struct msghdr *msg,
> diff --git a/include/net/llc_conn.h b/include/net/llc_conn.h
> index 374411b3066c..7d8b928a5ff6 100644
> --- a/include/net/llc_conn.h
> +++ b/include/net/llc_conn.h
> @@ -97,7 +97,7 @@ static __inline__ char llc_backlog_type(struct sk_buff *skb)
>   }
>   
>   struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority,
> -			  struct proto *prot, int kern);
> +			  struct proto *prot, bool kern, bool hold_net);
>   void llc_sk_stop_all_timers(struct sock *sk, bool sync);
>   void llc_sk_free(struct sock *sk);
>   
> diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
> index b068651984fe..9bd361ccf5f4 100644
> --- a/net/appletalk/ddp.c
> +++ b/net/appletalk/ddp.c
> @@ -1030,7 +1030,7 @@ static struct proto ddp_proto = {
>    * set the state.
>    */
>   static int atalk_create(struct net *net, struct socket *sock, int protocol,
> -			int kern)
> +			bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	int rc = -ESOCKTNOSUPPORT;
> diff --git a/net/atm/common.c b/net/atm/common.c
> index 9b75699992ff..c1e05b0c0b4b 100644
> --- a/net/atm/common.c
> +++ b/net/atm/common.c
> @@ -137,7 +137,8 @@ static struct proto vcc_proto = {
>   	.release_cb = vcc_release_cb,
>   };
>   
> -int vcc_create(struct net *net, struct socket *sock, int protocol, int family, int kern)
> +int vcc_create(struct net *net, struct socket *sock, int protocol, int family,
> +	       bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	struct atm_vcc *vcc;
> diff --git a/net/atm/common.h b/net/atm/common.h
> index a1e56e8de698..410419873eb6 100644
> --- a/net/atm/common.h
> +++ b/net/atm/common.h
> @@ -11,7 +11,8 @@
>   #include <linux/poll.h> /* for poll_table */
>   
>   
> -int vcc_create(struct net *net, struct socket *sock, int protocol, int family, int kern);
> +int vcc_create(struct net *net, struct socket *sock, int protocol, int family,
> +	       bool kern, bool hold_net);
>   int vcc_release(struct socket *sock);
>   int vcc_connect(struct socket *sock, int itf, short vpi, int vci);
>   int vcc_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
> diff --git a/net/atm/pvc.c b/net/atm/pvc.c
> index 66d9a9bd5896..6238c1809481 100644
> --- a/net/atm/pvc.c
> +++ b/net/atm/pvc.c
> @@ -130,13 +130,13 @@ static const struct proto_ops pvc_proto_ops = {
>   
>   
>   static int pvc_create(struct net *net, struct socket *sock, int protocol,
> -		      int kern)
> +		      bool kern, bool hold_net)
>   {
>   	if (net != &init_net)
>   		return -EAFNOSUPPORT;
>   
>   	sock->ops = &pvc_proto_ops;
> -	return vcc_create(net, sock, protocol, PF_ATMPVC, kern);
> +	return vcc_create(net, sock, protocol, PF_ATMPVC, kern, hold_net);
>   }
>   
>   static const struct net_proto_family pvc_family_ops = {
> diff --git a/net/atm/svc.c b/net/atm/svc.c
> index f8137ae693b0..9795294f4c1e 100644
> --- a/net/atm/svc.c
> +++ b/net/atm/svc.c
> @@ -34,7 +34,7 @@
>   #endif
>   
>   static int svc_create(struct net *net, struct socket *sock, int protocol,
> -		      int kern);
> +		      bool kern, bool hold_net);
>   
>   /*
>    * Note: since all this is still nicely synchronized with the signaling demon,
> @@ -336,7 +336,7 @@ static int svc_accept(struct socket *sock, struct socket *newsock,
>   
>   	lock_sock(sk);
>   
> -	error = svc_create(sock_net(sk), newsock, 0, arg->kern);
> +	error = svc_create(sock_net(sk), newsock, 0, arg->kern, !arg->kern);
>   	if (error)
>   		goto out;
>   
> @@ -658,7 +658,7 @@ static const struct proto_ops svc_proto_ops = {
>   
>   
>   static int svc_create(struct net *net, struct socket *sock, int protocol,
> -		      int kern)
> +		      bool kern, bool hold_net)
>   {
>   	int error;
>   
> @@ -666,7 +666,7 @@ static int svc_create(struct net *net, struct socket *sock, int protocol,
>   		return -EAFNOSUPPORT;
>   
>   	sock->ops = &svc_proto_ops;
> -	error = vcc_create(net, sock, protocol, AF_ATMSVC, kern);
> +	error = vcc_create(net, sock, protocol, AF_ATMSVC, kern, hold_net);
>   	if (error)
>   		return error;
>   	ATM_SD(sock)->local.sas_family = AF_ATMSVC;
> diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
> index d6f9fae06a9d..6c68b5e5b11c 100644
> --- a/net/ax25/af_ax25.c
> +++ b/net/ax25/af_ax25.c
> @@ -830,7 +830,7 @@ static struct proto ax25_proto = {
>   };
>   
>   static int ax25_create(struct net *net, struct socket *sock, int protocol,
> -		       int kern)
> +		       bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	ax25_cb *ax25;
> diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
> index 0b4d0a8bd361..7c24a6f87281 100644
> --- a/net/bluetooth/af_bluetooth.c
> +++ b/net/bluetooth/af_bluetooth.c
> @@ -111,7 +111,7 @@ void bt_sock_unregister(int proto)
>   EXPORT_SYMBOL(bt_sock_unregister);
>   
>   static int bt_sock_create(struct net *net, struct socket *sock, int proto,
> -			  int kern)
> +			  bool kern, bool hold_net)
>   {
>   	int err;
>   
> @@ -129,7 +129,7 @@ static int bt_sock_create(struct net *net, struct socket *sock, int proto,
>   	read_lock(&bt_proto_lock);
>   
>   	if (bt_proto[proto] && try_module_get(bt_proto[proto]->owner)) {
> -		err = bt_proto[proto]->create(net, sock, proto, kern);
> +		err = bt_proto[proto]->create(net, sock, proto, kern, hold_net);
>   		if (!err)
>   			bt_sock_reclassify_lock(sock->sk, proto);
>   		module_put(bt_proto[proto]->owner);
> @@ -141,7 +141,8 @@ static int bt_sock_create(struct net *net, struct socket *sock, int proto,
>   }
>   
>   struct sock *bt_sock_alloc(struct net *net, struct socket *sock,
> -			   struct proto *prot, int proto, gfp_t prio, int kern)
> +			   struct proto *prot, int proto, gfp_t prio,
> +			   bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> diff --git a/net/bluetooth/bnep/sock.c b/net/bluetooth/bnep/sock.c
> index 00d47bcf4d7d..d845cdb0e48b 100644
> --- a/net/bluetooth/bnep/sock.c
> +++ b/net/bluetooth/bnep/sock.c
> @@ -196,7 +196,7 @@ static struct proto bnep_proto = {
>   };
>   
>   static int bnep_sock_create(struct net *net, struct socket *sock, int protocol,
> -			    int kern)
> +			    bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> @@ -205,7 +205,8 @@ static int bnep_sock_create(struct net *net, struct socket *sock, int protocol,
>   	if (sock->type != SOCK_RAW)
>   		return -ESOCKTNOSUPPORT;
>   
> -	sk = bt_sock_alloc(net, sock, &bnep_proto, protocol, GFP_ATOMIC, kern);
> +	sk = bt_sock_alloc(net, sock, &bnep_proto, protocol, GFP_ATOMIC,
> +			   kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c
> index 96d49d9fae96..2ea9da9fe1d5 100644
> --- a/net/bluetooth/cmtp/sock.c
> +++ b/net/bluetooth/cmtp/sock.c
> @@ -198,7 +198,7 @@ static struct proto cmtp_proto = {
>   };
>   
>   static int cmtp_sock_create(struct net *net, struct socket *sock, int protocol,
> -			    int kern)
> +			    bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index 022b86797acd..4c51d7ee8a3e 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -2188,7 +2188,7 @@ static struct proto hci_sk_proto = {
>   };
>   
>   static int hci_sock_create(struct net *net, struct socket *sock, int protocol,
> -			   int kern)
> +			   bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> @@ -2200,7 +2200,7 @@ static int hci_sock_create(struct net *net, struct socket *sock, int protocol,
>   	sock->ops = &hci_sock_ops;
>   
>   	sk = bt_sock_alloc(net, sock, &hci_sk_proto, protocol, GFP_ATOMIC,
> -			   kern);
> +			   kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/bluetooth/hidp/sock.c b/net/bluetooth/hidp/sock.c
> index c93aaeb3a3fa..0ebe94f39906 100644
> --- a/net/bluetooth/hidp/sock.c
> +++ b/net/bluetooth/hidp/sock.c
> @@ -247,7 +247,7 @@ static struct proto hidp_proto = {
>   };
>   
>   static int hidp_sock_create(struct net *net, struct socket *sock, int protocol,
> -			    int kern)
> +			    bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> @@ -256,7 +256,8 @@ static int hidp_sock_create(struct net *net, struct socket *sock, int protocol,
>   	if (sock->type != SOCK_RAW)
>   		return -ESOCKTNOSUPPORT;
>   
> -	sk = bt_sock_alloc(net, sock, &hidp_proto, protocol, GFP_ATOMIC, kern);
> +	sk = bt_sock_alloc(net, sock, &hidp_proto, protocol, GFP_ATOMIC,
> +			   kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c
> index 43d0ebe11100..9f3529fbadf4 100644
> --- a/net/bluetooth/iso.c
> +++ b/net/bluetooth/iso.c
> @@ -874,11 +874,12 @@ static struct bt_iso_qos default_qos = {
>   };
>   
>   static struct sock *iso_sock_alloc(struct net *net, struct socket *sock,
> -				   int proto, gfp_t prio, int kern)
> +				   int proto, gfp_t prio,
> +				   bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> -	sk = bt_sock_alloc(net, sock, &iso_proto, proto, prio, kern);
> +	sk = bt_sock_alloc(net, sock, &iso_proto, proto, prio, kern, hold_net);
>   	if (!sk)
>   		return NULL;
>   
> @@ -896,7 +897,7 @@ static struct sock *iso_sock_alloc(struct net *net, struct socket *sock,
>   }
>   
>   static int iso_sock_create(struct net *net, struct socket *sock, int protocol,
> -			   int kern)
> +			   bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> @@ -909,7 +910,7 @@ static int iso_sock_create(struct net *net, struct socket *sock, int protocol,
>   
>   	sock->ops = &iso_sock_ops;
>   
> -	sk = iso_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern);
> +	sk = iso_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> @@ -1911,7 +1912,7 @@ static void iso_conn_ready(struct iso_conn *conn)
>   		lock_sock(parent);
>   
>   		sk = iso_sock_alloc(sock_net(parent), NULL,
> -				    BTPROTO_ISO, GFP_ATOMIC, 0);
> +				    BTPROTO_ISO, GFP_ATOMIC, false, true);
>   		if (!sk) {
>   			release_sock(parent);
>   			return;
> diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
> index 3d2553dcdb1b..04fe3c622210 100644
> --- a/net/bluetooth/l2cap_sock.c
> +++ b/net/bluetooth/l2cap_sock.c
> @@ -45,7 +45,8 @@ static struct bt_sock_list l2cap_sk_list = {
>   static const struct proto_ops l2cap_sock_ops;
>   static void l2cap_sock_init(struct sock *sk, struct sock *parent);
>   static struct sock *l2cap_sock_alloc(struct net *net, struct socket *sock,
> -				     int proto, gfp_t prio, int kern);
> +				     int proto, gfp_t prio,
> +				     bool kern, bool hold_net);
>   static void l2cap_sock_cleanup_listen(struct sock *parent);
>   
>   bool l2cap_is_socket(struct socket *sock)
> @@ -1468,7 +1469,7 @@ static struct l2cap_chan *l2cap_sock_new_connection_cb(struct l2cap_chan *chan)
>   	}
>   
>   	sk = l2cap_sock_alloc(sock_net(parent), NULL, BTPROTO_L2CAP,
> -			      GFP_ATOMIC, 0);
> +			      GFP_ATOMIC, false, true);
>   	if (!sk) {
>   		release_sock(parent);
>   		return NULL;
> @@ -1871,12 +1872,13 @@ static struct proto l2cap_proto = {
>   };
>   
>   static struct sock *l2cap_sock_alloc(struct net *net, struct socket *sock,
> -				     int proto, gfp_t prio, int kern)
> +				     int proto, gfp_t prio,
> +				     bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	struct l2cap_chan *chan;
>   
> -	sk = bt_sock_alloc(net, sock, &l2cap_proto, proto, prio, kern);
> +	sk = bt_sock_alloc(net, sock, &l2cap_proto, proto, prio, kern, hold_net);
>   	if (!sk)
>   		return NULL;
>   
> @@ -1900,7 +1902,7 @@ static struct sock *l2cap_sock_alloc(struct net *net, struct socket *sock,
>   }
>   
>   static int l2cap_sock_create(struct net *net, struct socket *sock, int protocol,
> -			     int kern)
> +			     bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> @@ -1917,7 +1919,7 @@ static int l2cap_sock_create(struct net *net, struct socket *sock, int protocol,
>   
>   	sock->ops = &l2cap_sock_ops;
>   
> -	sk = l2cap_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern);
> +	sk = l2cap_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
> index 913402806fa0..b96046914a63 100644
> --- a/net/bluetooth/rfcomm/sock.c
> +++ b/net/bluetooth/rfcomm/sock.c
> @@ -269,7 +269,8 @@ static struct proto rfcomm_proto = {
>   };
>   
>   static struct sock *rfcomm_sock_alloc(struct net *net, struct socket *sock,
> -				      int proto, gfp_t prio, int kern)
> +				      int proto, gfp_t prio,
> +				      bool kern, bool hold_net)
>   {
>   	struct rfcomm_dlc *d;
>   	struct sock *sk;
> @@ -278,7 +279,7 @@ static struct sock *rfcomm_sock_alloc(struct net *net, struct socket *sock,
>   	if (!d)
>   		return NULL;
>   
> -	sk = bt_sock_alloc(net, sock, &rfcomm_proto, proto, prio, kern);
> +	sk = bt_sock_alloc(net, sock, &rfcomm_proto, proto, prio, kern, hold_net);
>   	if (!sk) {
>   		rfcomm_dlc_free(d);
>   		return NULL;
> @@ -303,7 +304,7 @@ static struct sock *rfcomm_sock_alloc(struct net *net, struct socket *sock,
>   }
>   
>   static int rfcomm_sock_create(struct net *net, struct socket *sock,
> -			      int protocol, int kern)
> +			      int protocol, bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> @@ -316,7 +317,7 @@ static int rfcomm_sock_create(struct net *net, struct socket *sock,
>   
>   	sock->ops = &rfcomm_sock_ops;
>   
> -	sk = rfcomm_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern);
> +	sk = rfcomm_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> @@ -952,7 +953,8 @@ int rfcomm_connect_ind(struct rfcomm_session *s, u8 channel, struct rfcomm_dlc *
>   		goto done;
>   	}
>   
> -	sk = rfcomm_sock_alloc(sock_net(parent), NULL, BTPROTO_RFCOMM, GFP_ATOMIC, 0);
> +	sk = rfcomm_sock_alloc(sock_net(parent), NULL, BTPROTO_RFCOMM, GFP_ATOMIC,
> +			       false, true);
>   	if (!sk)
>   		goto done;
>   
> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
> index aa7bfe26cb40..a1865df18d59 100644
> --- a/net/bluetooth/sco.c
> +++ b/net/bluetooth/sco.c
> @@ -545,11 +545,12 @@ static struct proto sco_proto = {
>   };
>   
>   static struct sock *sco_sock_alloc(struct net *net, struct socket *sock,
> -				   int proto, gfp_t prio, int kern)
> +				   int proto, gfp_t prio,
> +				   bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> -	sk = bt_sock_alloc(net, sock, &sco_proto, proto, prio, kern);
> +	sk = bt_sock_alloc(net, sock, &sco_proto, proto, prio, kern, hold_net);
>   	if (!sk)
>   		return NULL;
>   
> @@ -567,7 +568,7 @@ static struct sock *sco_sock_alloc(struct net *net, struct socket *sock,
>   }
>   
>   static int sco_sock_create(struct net *net, struct socket *sock, int protocol,
> -			   int kern)
> +			   bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> @@ -580,7 +581,7 @@ static int sco_sock_create(struct net *net, struct socket *sock, int protocol,
>   
>   	sock->ops = &sco_sock_ops;
>   
> -	sk = sco_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern);
> +	sk = sco_sock_alloc(net, sock, protocol, GFP_ATOMIC, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> @@ -1341,7 +1342,7 @@ static void sco_conn_ready(struct sco_conn *conn)
>   		lock_sock(parent);
>   
>   		sk = sco_sock_alloc(sock_net(parent), NULL,
> -				    BTPROTO_SCO, GFP_ATOMIC, 0);
> +				    BTPROTO_SCO, GFP_ATOMIC, false, true);
>   		if (!sk) {
>   			release_sock(parent);
>   			sco_conn_unlock(conn);
> diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
> index 039dfbd367c9..6eef0e83f442 100644
> --- a/net/caif/caif_socket.c
> +++ b/net/caif/caif_socket.c
> @@ -1015,7 +1015,7 @@ static void caif_sock_destructor(struct sock *sk)
>   }
>   
>   static int caif_create(struct net *net, struct socket *sock, int protocol,
> -		       int kern)
> +		       bool kern, bool hold_net)
>   {
>   	struct sock *sk = NULL;
>   	struct caifsock *cf_sk = NULL;
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> index 01f3fbb3b67d..c4094ccc9978 100644
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -112,7 +112,7 @@ static inline void can_put_proto(const struct can_proto *cp)
>   }
>   
>   static int can_create(struct net *net, struct socket *sock, int protocol,
> -		      int kern)
> +		      bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	const struct can_proto *cp;
> diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
> index 18d267921bb5..0dd1a8829c42 100644
> --- a/net/ieee802154/socket.c
> +++ b/net/ieee802154/socket.c
> @@ -999,7 +999,7 @@ static void ieee802154_sock_destruct(struct sock *sk)
>    * set the state.
>    */
>   static int ieee802154_create(struct net *net, struct socket *sock,
> -			     int protocol, int kern)
> +			     int protocol, bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	int rc;
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 8095e82de808..7313ec410fb5 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -250,7 +250,7 @@ EXPORT_SYMBOL(inet_listen);
>    */
>   
>   static int inet_create(struct net *net, struct socket *sock, int protocol,
> -		       int kern)
> +		       bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	struct inet_protosw *answer;
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index f60ec8b0f8ea..8f951e5e58ab 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -118,7 +118,7 @@ void inet6_sock_destruct(struct sock *sk)
>   EXPORT_SYMBOL_GPL(inet6_sock_destruct);
>   
>   static int inet6_create(struct net *net, struct socket *sock, int protocol,
> -			int kern)
> +			bool kern, bool hold_net)
>   {
>   	struct inet_sock *inet;
>   	struct ipv6_pinfo *np;
> diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
> index 7929df08d4e0..b7bbd4947855 100644
> --- a/net/iucv/af_iucv.c
> +++ b/net/iucv/af_iucv.c
> @@ -446,7 +446,8 @@ static void iucv_sock_init(struct sock *sk, struct sock *parent)
>   	}
>   }
>   
> -static struct sock *iucv_sock_alloc(struct socket *sock, int proto, gfp_t prio, int kern)
> +static struct sock *iucv_sock_alloc(struct socket *sock, int proto, gfp_t prio,
> +				    bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	struct iucv_sock *iucv;
> @@ -1632,7 +1633,7 @@ static int iucv_callback_connreq(struct iucv_path *path,
>   	}
>   
>   	/* Create the new socket */
> -	nsk = iucv_sock_alloc(NULL, sk->sk_protocol, GFP_ATOMIC, 0);
> +	nsk = iucv_sock_alloc(NULL, sk->sk_protocol, GFP_ATOMIC, false, true);
>   	if (!nsk) {
>   		err = pr_iucv->path_sever(path, user_data);
>   		iucv_path_free(path);
> @@ -1854,7 +1855,7 @@ static int afiucv_hs_callback_syn(struct sock *sk, struct sk_buff *skb)
>   		goto out;
>   	}
>   
> -	nsk = iucv_sock_alloc(NULL, sk->sk_protocol, GFP_ATOMIC, 0);
> +	nsk = iucv_sock_alloc(NULL, sk->sk_protocol, GFP_ATOMIC, false, true);
>   	bh_lock_sock(sk);
>   	if ((sk->sk_state != IUCV_LISTEN) ||
>   	    sk_acceptq_is_full(sk) ||
> @@ -2229,7 +2230,7 @@ static const struct proto_ops iucv_sock_ops = {
>   };
>   
>   static int iucv_sock_create(struct net *net, struct socket *sock, int protocol,
> -			    int kern)
> +			    bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> @@ -2248,7 +2249,7 @@ static int iucv_sock_create(struct net *net, struct socket *sock, int protocol,
>   		return -ESOCKTNOSUPPORT;
>   	}
>   
> -	sk = iucv_sock_alloc(sock, protocol, GFP_KERNEL, kern);
> +	sk = iucv_sock_alloc(sock, protocol, GFP_KERNEL, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
> index 24aec295a51c..50925046a392 100644
> --- a/net/kcm/kcmsock.c
> +++ b/net/kcm/kcmsock.c
> @@ -1778,7 +1778,7 @@ static const struct proto_ops kcm_seqpacket_ops = {
>   
>   /* Create proto operation for kcm sockets */
>   static int kcm_create(struct net *net, struct socket *sock,
> -		      int protocol, int kern)
> +		      int protocol, bool kern, bool hold_net)
>   {
>   	struct kcm_net *knet = net_generic(net, kcm_net_id);
>   	struct sock *sk;
> diff --git a/net/key/af_key.c b/net/key/af_key.c
> index c56bb4f451e6..1c35b1cfb1c5 100644
> --- a/net/key/af_key.c
> +++ b/net/key/af_key.c
> @@ -136,7 +136,7 @@ static struct proto key_proto = {
>   };
>   
>   static int pfkey_create(struct net *net, struct socket *sock, int protocol,
> -			int kern)
> +			bool kern, bool hold_net)
>   {
>   	struct netns_pfkey *net_pfkey = net_generic(net, pfkey_net_id);
>   	struct sock *sk;
> diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
> index 0259cde394ba..5d865f4a5cb4 100644
> --- a/net/llc/af_llc.c
> +++ b/net/llc/af_llc.c
> @@ -163,13 +163,14 @@ static struct proto llc_proto = {
>    *	@sock: Socket to initialize and attach allocated sk to.
>    *	@protocol: Unused.
>    *	@kern: on behalf of kernel or userspace
> + *	@hold_net: hold netns refcnt or not
>    *
>    *	Allocate and initialize a new llc_ui socket, validate the user wants a
>    *	socket type we have available.
>    *	Returns 0 upon success, negative upon failure.
>    */
>   static int llc_ui_create(struct net *net, struct socket *sock, int protocol,
> -			 int kern)
> +			 bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	int rc = -ESOCKTNOSUPPORT;
> @@ -182,7 +183,8 @@ static int llc_ui_create(struct net *net, struct socket *sock, int protocol,
>   
>   	if (likely(sock->type == SOCK_DGRAM || sock->type == SOCK_STREAM)) {
>   		rc = -ENOMEM;
> -		sk = llc_sk_alloc(net, PF_LLC, GFP_KERNEL, &llc_proto, kern);
> +		sk = llc_sk_alloc(net, PF_LLC, GFP_KERNEL, &llc_proto,
> +				  kern, hold_net);
>   		if (sk) {
>   			rc = 0;
>   			llc_ui_sk_init(sock, sk);
> diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
> index afc6974eafda..75b2e21bfd2b 100644
> --- a/net/llc/llc_conn.c
> +++ b/net/llc/llc_conn.c
> @@ -761,10 +761,11 @@ static struct sock *llc_create_incoming_sock(struct sock *sk,
>   					     struct llc_addr *saddr,
>   					     struct llc_addr *daddr)
>   {
> -	struct sock *newsk = llc_sk_alloc(sock_net(sk), sk->sk_family, GFP_ATOMIC,
> -					  sk->sk_prot, 0);
>   	struct llc_sock *newllc, *llc = llc_sk(sk);
> +	struct sock *newsk;
>   
> +	newsk = llc_sk_alloc(sock_net(sk), sk->sk_family, GFP_ATOMIC,
> +			     sk->sk_prot, false, true);
>   	if (!newsk)
>   		goto out;
>   	newllc = llc_sk(newsk);
> @@ -923,11 +924,13 @@ static void llc_sk_init(struct sock *sk)
>    *	@priority: for allocation (%GFP_KERNEL, %GFP_ATOMIC, etc)
>    *	@prot: struct proto associated with this new sock instance
>    *	@kern: is this to be a kernel socket?
> + *	@hold_net: hold netns refcnt or not
>    *
>    *	Allocates a LLC sock and initializes it. Returns the new LLC sock
>    *	or %NULL if there's no memory available for one
>    */
> -struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority, struct proto *prot, int kern)
> +struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority,
> +			  struct proto *prot, bool kern, bool hold_net)
>   {
>   	struct sock *sk = sk_alloc(net, family, priority, prot, kern);
>   
> diff --git a/net/mctp/af_mctp.c b/net/mctp/af_mctp.c
> index f6de136008f6..17821c976213 100644
> --- a/net/mctp/af_mctp.c
> +++ b/net/mctp/af_mctp.c
> @@ -682,7 +682,7 @@ static struct proto mctp_proto = {
>   };
>   
>   static int mctp_pf_create(struct net *net, struct socket *sock,
> -			  int protocol, int kern)
> +			  int protocol, bool kern, bool hold_net)
>   {
>   	const struct proto_ops *ops;
>   	struct proto *proto;
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index f4e7b5e4bb59..ddc51cb89c5b 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -619,7 +619,7 @@ static struct proto netlink_proto = {
>   };
>   
>   static int __netlink_create(struct net *net, struct socket *sock,
> -			    int protocol, int kern)
> +			    int protocol, bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	struct netlink_sock *nlk;
> @@ -645,7 +645,7 @@ static int __netlink_create(struct net *net, struct socket *sock,
>   }
>   
>   static int netlink_create(struct net *net, struct socket *sock, int protocol,
> -			  int kern)
> +			  bool kern, bool hold_net)
>   {
>   	struct module *module = NULL;
>   	struct netlink_sock *nlk;
> @@ -684,7 +684,7 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol,
>   	if (err < 0)
>   		goto out;
>   
> -	err = __netlink_create(net, sock, protocol, kern);
> +	err = __netlink_create(net, sock, protocol, kern, hold_net);
>   	if (err < 0)
>   		goto out_module;
>   
> @@ -2012,7 +2012,7 @@ __netlink_kernel_create(struct net *net, int unit, struct module *module,
>   	if (sock_create_lite(PF_NETLINK, SOCK_DGRAM, unit, &sock))
>   		return NULL;
>   
> -	if (__netlink_create(net, sock, unit, 1) < 0)
> +	if (__netlink_create(net, sock, unit, true, false) < 0)
>   		goto out_sock_release_nosk;
>   
>   	sk = sock->sk;
> diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
> index 6ee148f0e6d0..483f78951a19 100644
> --- a/net/netrom/af_netrom.c
> +++ b/net/netrom/af_netrom.c
> @@ -424,7 +424,7 @@ static struct proto nr_proto = {
>   };
>   
>   static int nr_create(struct net *net, struct socket *sock, int protocol,
> -		     int kern)
> +		     bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	struct nr_sock *nr;
> diff --git a/net/nfc/af_nfc.c b/net/nfc/af_nfc.c
> index dda323e0a473..4fb1c86fcc81 100644
> --- a/net/nfc/af_nfc.c
> +++ b/net/nfc/af_nfc.c
> @@ -16,7 +16,7 @@ static DEFINE_RWLOCK(proto_tab_lock);
>   static const struct nfc_protocol *proto_tab[NFC_SOCKPROTO_MAX];
>   
>   static int nfc_sock_create(struct net *net, struct socket *sock, int proto,
> -			   int kern)
> +			   bool kern, bool hold_net)
>   {
>   	int rc = -EPROTONOSUPPORT;
>   
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 886c0dd47b66..5a25dac333b0 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -3398,7 +3398,7 @@ static struct proto packet_proto = {
>    */
>   
>   static int packet_create(struct net *net, struct socket *sock, int protocol,
> -			 int kern)
> +			 bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	struct packet_sock *po;
> diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
> index a27efa4faa4e..4bdbc93c74fb 100644
> --- a/net/phonet/af_phonet.c
> +++ b/net/phonet/af_phonet.c
> @@ -48,7 +48,7 @@ static inline void phonet_proto_put(const struct phonet_protocol *pp)
>   /* protocol family functions */
>   
>   static int pn_socket_create(struct net *net, struct socket *sock, int protocol,
> -			    int kern)
> +			    bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	struct pn_sock *pn;
> diff --git a/net/qrtr/af_qrtr.c b/net/qrtr/af_qrtr.c
> index 00c51cf693f3..c05711f79a37 100644
> --- a/net/qrtr/af_qrtr.c
> +++ b/net/qrtr/af_qrtr.c
> @@ -1258,7 +1258,7 @@ static struct proto qrtr_proto = {
>   };
>   
>   static int qrtr_create(struct net *net, struct socket *sock,
> -		       int protocol, int kern)
> +		       int protocol, bool kern, bool hold_net)
>   {
>   	struct qrtr_sock *ipc;
>   	struct sock *sk;
> diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
> index 8435a20968ef..3e1bb40485ad 100644
> --- a/net/rds/af_rds.c
> +++ b/net/rds/af_rds.c
> @@ -695,7 +695,7 @@ static int __rds_create(struct socket *sock, struct sock *sk, int protocol)
>   }
>   
>   static int rds_create(struct net *net, struct socket *sock, int protocol,
> -		      int kern)
> +		      bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
> index 59050caab65c..1c175c92aa42 100644
> --- a/net/rose/af_rose.c
> +++ b/net/rose/af_rose.c
> @@ -544,7 +544,7 @@ static struct proto rose_proto = {
>   };
>   
>   static int rose_create(struct net *net, struct socket *sock, int protocol,
> -		       int kern)
> +		       bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	struct rose_sock *rose;
> diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
> index 86873399f7d5..f2374f65b1c0 100644
> --- a/net/rxrpc/af_rxrpc.c
> +++ b/net/rxrpc/af_rxrpc.c
> @@ -811,7 +811,7 @@ static __poll_t rxrpc_poll(struct file *file, struct socket *sock,
>    * create an RxRPC socket
>    */
>   static int rxrpc_create(struct net *net, struct socket *sock, int protocol,
> -			int kern)
> +			bool kern, bool hold_net)
>   {
>   	struct rxrpc_net *rxnet;
>   	struct rxrpc_sock *rx;
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index b52bee98a3eb..2535b922f760 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -387,7 +387,7 @@ void smc_sk_init(struct net *net, struct sock *sk, int protocol)
>   }
>   
>   static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
> -				   int protocol, int kern)
> +				   int protocol, bool kern, bool hold_net)
>   {
>   	struct proto *prot;
>   	struct sock *sk;
> @@ -1715,7 +1715,8 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
>   	int rc = -EINVAL;
>   
>   	release_sock(lsk);
> -	new_sk = smc_sock_alloc(sock_net(lsk), NULL, lsk->sk_protocol, 0);
> +	new_sk = smc_sock_alloc(sock_net(lsk), NULL, lsk->sk_protocol,
> +				false, true);
>   	if (!new_sk) {
>   		rc = -ENOMEM;
>   		lsk->sk_err = ENOMEM;
> @@ -3331,7 +3332,7 @@ int smc_create_clcsk(struct net *net, struct sock *sk, int family)
>   }
>   
>   static int __smc_create(struct net *net, struct socket *sock, int protocol,
> -			int kern, struct socket *clcsock)
> +			bool kern, bool hold_net, struct socket *clcsock)
>   {
>   	int family = (protocol == SMCPROTO_SMC6) ? PF_INET6 : PF_INET;
>   	struct smc_sock *smc;
> @@ -3349,7 +3350,7 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol,
>   	rc = -ENOBUFS;
>   	sock->ops = &smc_sock_ops;
>   	sock->state = SS_UNCONNECTED;
> -	sk = smc_sock_alloc(net, sock, protocol, kern);
> +	sk = smc_sock_alloc(net, sock, protocol, kern, hold_net);
>   	if (!sk)
>   		goto out;
>   
> @@ -3371,9 +3372,9 @@ static int __smc_create(struct net *net, struct socket *sock, int protocol,
>   }
>   
>   static int smc_create(struct net *net, struct socket *sock, int protocol,
> -		      int kern)
> +		      bool kern, bool hold_net)
>   {
> -	return __smc_create(net, sock, protocol, kern, NULL);
> +	return __smc_create(net, sock, protocol, kern, hold_net, NULL);
>   }
>   
>   static const struct net_proto_family smc_sock_family_ops = {
> @@ -3408,7 +3409,7 @@ static int smc_ulp_init(struct sock *sk)
>   
>   	smcsock->type = SOCK_STREAM;
>   	__module_get(THIS_MODULE); /* tried in __tcp_ulp_find_autoload */
> -	ret = __smc_create(net, smcsock, protocol, 0, tcp);
> +	ret = __smc_create(net, smcsock, protocol, false, true, tcp);
>   	if (ret) {
>   		sock_release(smcsock); /* module_put() which ops won't be NULL */
>   		return ret;

Only for the smc part:
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>

> diff --git a/net/socket.c b/net/socket.c
> index e5b4e0d34132..d1b4dadd67e4 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -1561,7 +1561,7 @@ static int __sock_create(struct net *net, int family, int type, int protocol,
>   	/* Now protected by module ref count */
>   	rcu_read_unlock();
>   
> -	err = pf->create(net, sock, protocol, kern);
> +	err = pf->create(net, sock, protocol, kern, hold_net);
>   	if (err < 0) {
>   		/* ->create should release the allocated sock->sk object on error
>   		 * and make sure sock->sk is set to NULL to avoid use-after-free
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 65dcbb54f55d..4ee0bd1043e1 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -449,6 +449,7 @@ static int tipc_sk_sock_err(struct socket *sock, long *timeout)
>    * @sock: pre-allocated socket structure
>    * @protocol: protocol indicator (must be 0)
>    * @kern: caused by kernel or by userspace?
> + * @hold_net: hold netns refcnt or not
>    *
>    * This routine creates additional data structures used by the TIPC socket,
>    * initializes them, and links them together.
> @@ -456,7 +457,7 @@ static int tipc_sk_sock_err(struct socket *sock, long *timeout)
>    * Return: 0 on success, errno otherwise
>    */
>   static int tipc_sk_create(struct net *net, struct socket *sock,
> -			  int protocol, int kern)
> +			  int protocol, bool kern, bool hold_net)
>   {
>   	const struct proto_ops *ops;
>   	struct sock *sk;
> @@ -2735,7 +2736,8 @@ static int tipc_accept(struct socket *sock, struct socket *new_sock,
>   
>   	buf = skb_peek(&sk->sk_receive_queue);
>   
> -	res = tipc_sk_create(sock_net(sock->sk), new_sock, 0, arg->kern);
> +	res = tipc_sk_create(sock_net(sock->sk), new_sock, 0,
> +			     arg->kern, !arg->kern);
>   	if (res)
>   		goto exit;
>   	security_sk_clone(sock->sk, new_sock->sk);
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 6b1762300443..393be726004c 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1006,7 +1006,8 @@ struct proto unix_stream_proto = {
>   #endif
>   };
>   
> -static struct sock *unix_create1(struct net *net, struct socket *sock, int kern, int type)
> +static struct sock *unix_create1(struct net *net, struct socket *sock, int type,
> +				 bool kern, bool hold_net)
>   {
>   	struct unix_sock *u;
>   	struct sock *sk;
> @@ -1061,7 +1062,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern,
>   }
>   
>   static int unix_create(struct net *net, struct socket *sock, int protocol,
> -		       int kern)
> +		       bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   
> @@ -1091,7 +1092,7 @@ static int unix_create(struct net *net, struct socket *sock, int protocol,
>   		return -ESOCKTNOSUPPORT;
>   	}
>   
> -	sk = unix_create1(net, sock, kern, sock->type);
> +	sk = unix_create1(net, sock, sock->type, kern, hold_net);
>   	if (IS_ERR(sk))
>   		return PTR_ERR(sk);
>   
> @@ -1568,7 +1569,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
>   	 */
>   
>   	/* create new sock for complete connection */
> -	newsk = unix_create1(net, NULL, 0, sock->type);
> +	newsk = unix_create1(net, NULL, sock->type, false, true);
>   	if (IS_ERR(newsk)) {
>   		err = PTR_ERR(newsk);
>   		newsk = NULL;
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 5cf8109f672a..f2ce92cd57c4 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -732,7 +732,7 @@ static struct sock *__vsock_create(struct net *net,
>   				   struct sock *parent,
>   				   gfp_t priority,
>   				   unsigned short type,
> -				   int kern)
> +				   bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	struct vsock_sock *psk;
> @@ -864,7 +864,7 @@ static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>   struct sock *vsock_create_connected(struct sock *parent)
>   {
>   	return __vsock_create(sock_net(parent), NULL, parent, GFP_KERNEL,
> -			      parent->sk_type, 0);
> +			      parent->sk_type, false, true);
>   }
>   EXPORT_SYMBOL_GPL(vsock_create_connected);
>   
> @@ -2399,7 +2399,7 @@ static const struct proto_ops vsock_seqpacket_ops = {
>   };
>   
>   static int vsock_create(struct net *net, struct socket *sock,
> -			int protocol, int kern)
> +			int protocol, bool kern, bool hold_net)
>   {
>   	struct vsock_sock *vsk;
>   	struct sock *sk;
> @@ -2427,7 +2427,7 @@ static int vsock_create(struct net *net, struct socket *sock,
>   
>   	sock->state = SS_UNCONNECTED;
>   
> -	sk = __vsock_create(net, sock, NULL, GFP_KERNEL, 0, kern);
> +	sk = __vsock_create(net, sock, NULL, GFP_KERNEL, 0, kern, hold_net);
>   	if (!sk)
>   		return -ENOMEM;
>   
> diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
> index 8dda4178497c..0b6c22b979e7 100644
> --- a/net/x25/af_x25.c
> +++ b/net/x25/af_x25.c
> @@ -505,11 +505,12 @@ static struct proto x25_proto = {
>   	.obj_size = sizeof(struct x25_sock),
>   };
>   
> -static struct sock *x25_alloc_socket(struct net *net, int kern)
> +static struct sock *x25_alloc_socket(struct net *net, bool kern, bool hold_net)
>   {
>   	struct x25_sock *x25;
> -	struct sock *sk = sk_alloc(net, AF_X25, GFP_ATOMIC, &x25_proto, kern);
> +	struct sock *sk;
>   
> +	sk = sk_alloc(net, AF_X25, GFP_ATOMIC, &x25_proto, kern);
>   	if (!sk)
>   		goto out;
>   
> @@ -525,7 +526,7 @@ static struct sock *x25_alloc_socket(struct net *net, int kern)
>   }
>   
>   static int x25_create(struct net *net, struct socket *sock, int protocol,
> -		      int kern)
> +		      bool kern, bool hold_net)
>   {
>   	struct sock *sk;
>   	struct x25_sock *x25;
> @@ -543,7 +544,8 @@ static int x25_create(struct net *net, struct socket *sock, int protocol,
>   		goto out;
>   
>   	rc = -ENOMEM;
> -	if ((sk = x25_alloc_socket(net, kern)) == NULL)
> +	sk = x25_alloc_socket(net, kern, hold_net);
> +	if (!sk)
>   		goto out;
>   
>   	x25 = x25_sk(sk);
> @@ -592,7 +594,8 @@ static struct sock *x25_make_new(struct sock *osk)
>   	if (osk->sk_type != SOCK_SEQPACKET)
>   		goto out;
>   
> -	if ((sk = x25_alloc_socket(sock_net(osk), 0)) == NULL)
> +	sk = x25_alloc_socket(sock_net(osk), false, true);
> +	if (!sk)
>   		goto out;
>   
>   	x25 = x25_sk(sk);
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 3fa70286c846..5763ef355c73 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -1688,7 +1688,7 @@ static void xsk_destruct(struct sock *sk)
>   }
>   
>   static int xsk_create(struct net *net, struct socket *sock, int protocol,
> -		      int kern)
> +		      bool kern, bool hold_net)
>   {
>   	struct xdp_sock *xs;
>   	struct sock *sk;



* Re: [PATCH v3 net-next 03/15] smc: Pass kern to smc_sock_alloc().
  2024-12-13  9:21 ` [PATCH v3 net-next 03/15] smc: Pass kern to smc_sock_alloc() Kuniyuki Iwashima
@ 2024-12-13 13:46   ` Wenjia Zhang
  0 siblings, 0 replies; 27+ messages in thread
From: Wenjia Zhang @ 2024-12-13 13:46 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: Kuniyuki Iwashima, netdev



On 13.12.24 10:21, Kuniyuki Iwashima wrote:
> AF_SMC was introduced in commit ac7138746e14 ("smc: establish
> new socket family").
> 
> Since then, smc_create() ignores the kern argument and calls
> smc_sock_alloc(), which calls sk_alloc() with hard-coded arguments.
> 
>    sk = sk_alloc(net, PF_SMC, GFP_KERNEL, prot, 0);
> 
> This means sock_create_kern(AF_SMC) always creates a userspace
> socket.
> 
> Later, commit d7cd421da9da ("net/smc: Introduce TCP ULP support")
> added another confusing call site.
> 
> smc_ulp_init() calls __smc_create() with kern=1, but again,
> smc_sock_alloc() allocates a userspace socket by calling
> sk_alloc() with kern=0.
> 
> To fix up the weird paths, let's pass kern down to smc_sock_alloc()
> and sk_alloc().
> 
> This commit does not introduce functional change because we have
> no in-tree users calling sock_create_kern(AF_SMC) and we change
> kern from 1 to 0 in smc_ulp_init().
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>   net/smc/af_smc.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 

Ok, thank you for the detailed description, LGTM!

Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
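
To make the "no functional change" argument in the quoted commit message concrete, here is a minimal userspace sketch in C. It is not kernel code: every model_* name is hypothetical, and the stand-in for sk_alloc() only reports which kern value it was handed. It assumes, as the commit message states, that the old smc_sock_alloc() hard-coded kern=0 and that smc_ulp_init() changes its argument from 1 to 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for sk_alloc(): report the kern value it received. */
static bool model_sk_alloc_kern(bool kern)
{
	return kern;
}

/* Before the patch: the caller's kern is dropped and 0 is hard-coded. */
static bool model_smc_sock_alloc_old(bool kern)
{
	(void)kern;
	return model_sk_alloc_kern(false);
}

/* After the patch: kern is passed down unchanged. */
static bool model_smc_sock_alloc_new(bool kern)
{
	return model_sk_alloc_kern(kern);
}

/* smc_ulp_init() path: kern=1 before the patch, kern=0 after it. */
static bool model_ulp_path_old(void)
{
	return model_smc_sock_alloc_old(true);
}

static bool model_ulp_path_new(void)
{
	return model_smc_sock_alloc_new(false);
}

/* Both ULP paths end up handing kern=0 to the sk_alloc() stand-in. */
static void model_check_no_functional_change(void)
{
	assert(model_ulp_path_old() == model_ulp_path_new());
}
```

In this model the only behavioural difference is for a caller that passes kern=1 directly, which corresponds to sock_create_kern(AF_SMC); the commit message says there are no such in-tree users.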



* Re: [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion.
  2024-12-13 13:45   ` Wenjia Zhang
@ 2024-12-13 13:54     ` Kuniyuki Iwashima
  2024-12-13 15:15       ` Wenjia Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Kuniyuki Iwashima @ 2024-12-13 13:54 UTC (permalink / raw)
  To: wenjia
  Cc: alibuda, allison.henderson, chuck.lever, davem, edumazet, horms,
	jaka, jlayton, kuba, kuni1840, kuniyu, matttbe, netdev, pabeni,
	sfrench

From: Wenjia Zhang <wenjia@linux.ibm.com>
Date: Fri, 13 Dec 2024 14:45:20 +0100
> > diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> > index 6e93f188a908..7b0de80b3aca 100644
> > --- a/net/smc/af_smc.c
> > +++ b/net/smc/af_smc.c
> > @@ -3310,25 +3310,8 @@ static const struct proto_ops smc_sock_ops = {
> >   
> >   int smc_create_clcsk(struct net *net, struct sock *sk, int family)
> >   {
> > -	struct smc_sock *smc = smc_sk(sk);
> > -	int rc;
> > -
> > -	rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
> > -			      &smc->clcsock);
> > -	if (rc)
> > -		return rc;
> > -
> > -	/* smc_clcsock_release() does not wait smc->clcsock->sk's
> > -	 * destruction;  its sk_state might not be TCP_CLOSE after
> > -	 * smc->sk is close()d, and TCP timers can be fired later,
> > -	 * which need net ref.
> > -	 */
> > -	sk = smc->clcsock->sk;
> > -	__netns_tracker_free(net, &sk->ns_tracker, false);
> > -	sk->sk_net_refcnt = 1;
> > -	get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> > -	sock_inuse_add(net, 1);
> > I don't think this line should be removed. Otherwise, the purpose here, to
> > manage the per-namespace statistics in the case of network namespace
> > isolation, would be lost.

Now it's counted in sk_alloc().

sock_create_net() below passes hold_net=true to sk_alloc(), and if
sk->sk_net_refcnt (== hold_net) is true, sock_inuse_add() is
called there.

See patches 9 and 10:
https://lore.kernel.org/netdev/20241213092152.14057-10-kuniyu@amazon.com/
https://lore.kernel.org/netdev/20241213092152.14057-11-kuniyu@amazon.com/
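
As a rough illustration of the accounting described above, here is a
minimal userspace model (not kernel code). All model_* names are made up
for this sketch; the only things taken from the thread are the hold_net
flag, the sk_net_refcnt field it is stored in, and the fact that the
per-netns in-use counter is bumped in the allocation path when that flag
is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-ins; none of these are the kernel's real types. */
struct model_net {
	int refcnt;      /* models the netns reference count */
	int sock_inuse;  /* models the per-netns "sockets in use" counter */
};

struct model_sock {
	struct model_net *net;
	bool net_refcnt; /* models sk->sk_net_refcnt (== hold_net) */
};

/* Models sk_alloc(): take the netns ref and bump the counter only if asked. */
static struct model_sock *model_sk_alloc(struct model_net *net, bool hold_net)
{
	struct model_sock *sk = calloc(1, sizeof(*sk));

	if (!sk)
		return NULL;

	sk->net = net;
	sk->net_refcnt = hold_net;
	if (sk->net_refcnt) {
		net->refcnt++;
		net->sock_inuse++;
	}
	return sk;
}

/* Models socket destruction: undo exactly what allocation did. */
static void model_sk_free(struct model_sock *sk)
{
	if (sk->net_refcnt) {
		assert(sk->net->refcnt > 0);
		sk->net->sock_inuse--;
		sk->net->refcnt--;
	}
	free(sk);
}
```

With the flag handled at allocation time, a caller no longer has to patch
up the counter after the socket is created, which is why the explicit
sock_inuse_add() line can go away in the hunk being discussed.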


> @D. Wythe, could you please check it again? Maybe you have some good 
> testing on this case.
> 
> > -	return 0;
> > +	return sock_create_net(net, family, SOCK_STREAM, IPPROTO_TCP,
> > +			       &smc_sk(sk)->clcsock);
> >   }


* Re: [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion.
  2024-12-13  9:21 ` [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion Kuniyuki Iwashima
  2024-12-13 13:45   ` Wenjia Zhang
@ 2024-12-13 14:15   ` Chuck Lever
  2024-12-13 23:29   ` Allison Henderson
  2 siblings, 0 replies; 27+ messages in thread
From: Chuck Lever @ 2024-12-13 14:15 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: Kuniyuki Iwashima, netdev, Matthieu Baerts, Allison Henderson,
	Steve French, Wenjia Zhang, Jan Karcher, Jeff Layton

On 12/13/24 4:21 AM, Kuniyuki Iwashima wrote:
> Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference count
> the netns of kernel sockets."), TCP kernel socket has caused many UAF.
> 
> We have converted such sockets to hold netns refcnt, and we have the
> same pattern in cifs, mptcp, rds, smc, and sunrpc.
> 
> Let's drop the conversion and use sock_create_net() instead.
> 
> The changes for cifs, mptcp, and smc are straightforward.
> 
> For rds, we need to move maybe_get_net() before sock_create_net() and
> sock->ops->accept().
> 
> For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> call sock_create_kern() for others.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> Acked-by: Allison Henderson <allison.henderson@oracle.com>
> ---
> v3: Add missing mutex_unlock in rds_tcp_conn_path_connect().
> v2: Collect Acked-by from MPTCP and RDS maintainers
> 
> Cc: Steve French <sfrench@samba.org>
> Cc: Wenjia Zhang <wenjia@linux.ibm.com>
> Cc: Jan Karcher <jaka@linux.ibm.com>
> Cc: Chuck Lever <chuck.lever@oracle.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/smb/client/connect.c | 13 ++-----------
>   net/mptcp/subflow.c     | 10 +---------
>   net/rds/tcp.c           | 14 --------------
>   net/rds/tcp_connect.c   | 21 +++++++++++++++------
>   net/rds/tcp_listen.c    | 14 ++++++++++++--
>   net/smc/af_smc.c        | 21 ++-------------------
>   net/sunrpc/svcsock.c    | 12 ++++++------
>   net/sunrpc/xprtsock.c   | 12 ++++--------
>   8 files changed, 42 insertions(+), 75 deletions(-)
> 
> diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
> index c36c1b4ffe6e..7a67b86c0423 100644
> --- a/fs/smb/client/connect.c
> +++ b/fs/smb/client/connect.c
> @@ -3130,22 +3130,13 @@ generic_ip_connect(struct TCP_Server_Info *server)
>   	if (server->ssocket) {
>   		socket = server->ssocket;
>   	} else {
> -		struct net *net = cifs_net_ns(server);
> -		struct sock *sk;
> -
> -		rc = sock_create_kern(net, sfamily, SOCK_STREAM,
> -				      IPPROTO_TCP, &server->ssocket);
> +		rc = sock_create_net(cifs_net_ns(server), sfamily, SOCK_STREAM,
> +				     IPPROTO_TCP, &server->ssocket);
>   		if (rc < 0) {
>   			cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
>   			return rc;
>   		}
>   
> -		sk = server->ssocket->sk;
> -		__netns_tracker_free(net, &sk->ns_tracker, false);
> -		sk->sk_net_refcnt = 1;
> -		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(net, 1);
> -
>   		/* BB other socket options to set KEEPALIVE, NODELAY? */
>   		cifs_dbg(FYI, "Socket created\n");
>   		socket = server->ssocket;
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index fd021cf8286e..e7e8972bdfca 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1755,7 +1755,7 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
>   	if (unlikely(!sk->sk_socket))
>   		return -EINVAL;
>   
> -	err = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
> +	err = sock_create_net(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
>   	if (err)
>   		return err;
>   
> @@ -1768,14 +1768,6 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
>   	/* the newly created socket has to be in the same cgroup as its parent */
>   	mptcp_attach_cgroup(sk, sf->sk);
>   
> -	/* kernel sockets do not by default acquire net ref, but TCP timer
> -	 * needs it.
> -	 * Update ns_tracker to current stack trace and refcounted tracker.
> -	 */
> -	__netns_tracker_free(net, &sf->sk->ns_tracker, false);
> -	sf->sk->sk_net_refcnt = 1;
> -	get_net_track(net, &sf->sk->ns_tracker, GFP_KERNEL);
> -	sock_inuse_add(net, 1);
>   	err = tcp_set_ulp(sf->sk, "mptcp");
>   	if (err)
>   		goto err_free;
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 351ac1747224..4509900476f7 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -494,21 +494,7 @@ bool rds_tcp_tune(struct socket *sock)
>   
>   	tcp_sock_set_nodelay(sock->sk);
>   	lock_sock(sk);
> -	/* TCP timer functions might access net namespace even after
> -	 * a process which created this net namespace terminated.
> -	 */
> -	if (!sk->sk_net_refcnt) {
> -		if (!maybe_get_net(net)) {
> -			release_sock(sk);
> -			return false;
> -		}
> -		/* Update ns_tracker to current stack trace and refcounted tracker */
> -		__netns_tracker_free(net, &sk->ns_tracker, false);
>   
> -		sk->sk_net_refcnt = 1;
> -		netns_tracker_alloc(net, &sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(net, 1);
> -	}
>   	rtn = net_generic(net, rds_tcp_netid);
>   	if (rtn->sndbuf_size > 0) {
>   		sk->sk_sndbuf = rtn->sndbuf_size;
> diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
> index a0046e99d6df..c9449780f952 100644
> --- a/net/rds/tcp_connect.c
> +++ b/net/rds/tcp_connect.c
> @@ -93,6 +93,7 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
>   	struct sockaddr_in6 sin6;
>   	struct sockaddr_in sin;
>   	struct sockaddr *addr;
> +	struct net *net;
>   	int addrlen;
>   	bool isv6;
>   	int ret;
> @@ -107,20 +108,28 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
>   
>   	mutex_lock(&tc->t_conn_path_lock);
>   
> +	net = rds_conn_net(conn);
> +
>   	if (rds_conn_path_up(cp)) {
> -		mutex_unlock(&tc->t_conn_path_lock);
> -		return 0;
> +		ret = 0;
> +		goto out;
>   	}
> +
> +	if (!maybe_get_net(net)) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
>   	if (ipv6_addr_v4mapped(&conn->c_laddr)) {
> -		ret = sock_create_kern(rds_conn_net(conn), PF_INET,
> -				       SOCK_STREAM, IPPROTO_TCP, &sock);
> +		ret = sock_create_net(net, PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
>   		isv6 = false;
>   	} else {
> -		ret = sock_create_kern(rds_conn_net(conn), PF_INET6,
> -				       SOCK_STREAM, IPPROTO_TCP, &sock);
> +		ret = sock_create_net(net, PF_INET6, SOCK_STREAM, IPPROTO_TCP, &sock);
>   		isv6 = true;
>   	}
>   
> +	put_net(net);
> +
>   	if (ret < 0)
>   		goto out;
>   
> diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
> index 69aaf03ab93e..440ac9057148 100644
> --- a/net/rds/tcp_listen.c
> +++ b/net/rds/tcp_listen.c
> @@ -101,6 +101,7 @@ int rds_tcp_accept_one(struct socket *sock)
>   	struct rds_connection *conn;
>   	int ret;
>   	struct inet_sock *inet;
> +	struct net *net;
>   	struct rds_tcp_connection *rs_tcp = NULL;
>   	int conn_state;
>   	struct rds_conn_path *cp;
> @@ -108,7 +109,7 @@ int rds_tcp_accept_one(struct socket *sock)
>   	struct proto_accept_arg arg = {
>   		.flags = O_NONBLOCK,
>   		.kern = true,
> -		.hold_net = false,
> +		.hold_net = true,
>   	};
>   #if !IS_ENABLED(CONFIG_IPV6)
>   	struct in6_addr saddr, daddr;
> @@ -118,13 +119,22 @@ int rds_tcp_accept_one(struct socket *sock)
>   	if (!sock) /* module unload or netns delete in progress */
>   		return -ENETUNREACH;
>   
> +	net = sock_net(sock->sk);
> +
> +	if (!maybe_get_net(net))
> +		return -EINVAL;
> +
>   	ret = sock_create_lite(sock->sk->sk_family,
>   			       sock->sk->sk_type, sock->sk->sk_protocol,
>   			       &new_sock);
> -	if (ret)
> +	if (ret) {
> +		put_net(net);
>   		goto out;
> +	}
>   
>   	ret = sock->ops->accept(sock, new_sock, &arg);
> +	put_net(net);
> +
>   	if (ret < 0)
>   		goto out;
>   
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 6e93f188a908..7b0de80b3aca 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -3310,25 +3310,8 @@ static const struct proto_ops smc_sock_ops = {
>   
>   int smc_create_clcsk(struct net *net, struct sock *sk, int family)
>   {
> -	struct smc_sock *smc = smc_sk(sk);
> -	int rc;
> -
> -	rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
> -			      &smc->clcsock);
> -	if (rc)
> -		return rc;
> -
> -	/* smc_clcsock_release() does not wait smc->clcsock->sk's
> -	 * destruction;  its sk_state might not be TCP_CLOSE after
> -	 * smc->sk is close()d, and TCP timers can be fired later,
> -	 * which need net ref.
> -	 */
> -	sk = smc->clcsock->sk;
> -	__netns_tracker_free(net, &sk->ns_tracker, false);
> -	sk->sk_net_refcnt = 1;
> -	get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> -	sock_inuse_add(net, 1);
> -	return 0;
> +	return sock_create_net(net, family, SOCK_STREAM, IPPROTO_TCP,
> +			       &smc_sk(sk)->clcsock);
>   }
>   
>   static int __smc_create(struct net *net, struct socket *sock, int protocol,
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 9583bad3d150..cde5765f6f81 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1526,7 +1526,10 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
>   		return ERR_PTR(-EINVAL);
>   	}
>   
> -	error = sock_create_kern(net, family, type, protocol, &sock);
> +	if (protocol == IPPROTO_TCP)
> +		error = sock_create_net(net, family, type, protocol, &sock);
> +	else
> +		error = sock_create_kern(net, family, type, protocol, &sock);
>   	if (error < 0)
>   		return ERR_PTR(error);
>   
> @@ -1551,11 +1554,8 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
>   	newlen = error;
>   
>   	if (protocol == IPPROTO_TCP) {
> -		__netns_tracker_free(net, &sock->sk->ns_tracker, false);
> -		sock->sk->sk_net_refcnt = 1;
> -		get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(net, 1);
> -		if ((error = kernel_listen(sock, 64)) < 0)
> +		error = kernel_listen(sock, 64);
> +		if (error < 0)
>   			goto bummer;
>   	}
>   
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index feb1768e8a57..f3e139c30442 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1924,7 +1924,10 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
>   	struct socket *sock;
>   	int err;
>   
> -	err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
> +	if (protocol == IPPROTO_TCP)
> +		err = sock_create_net(xprt->xprt_net, family, type, protocol, &sock);
> +	else
> +		err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
>   	if (err < 0) {
>   		dprintk("RPC:       can't create %d transport socket (%d).\n",
>   				protocol, -err);
> @@ -1941,13 +1944,6 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
>   		goto out;
>   	}
>   
> -	if (protocol == IPPROTO_TCP) {
> -		__netns_tracker_free(xprt->xprt_net, &sock->sk->ns_tracker, false);
> -		sock->sk->sk_net_refcnt = 1;
> -		get_net_track(xprt->xprt_net, &sock->sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(xprt->xprt_net, 1);
> -	}
> -
>   	filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
>   	if (IS_ERR(filp))
>   		return ERR_CAST(filp);

For the svcsock.c hunks:

Acked-by: Chuck Lever <chuck.lever@oracle.com>


-- 
Chuck Lever


* Re: [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion.
  2024-12-13 13:54     ` Kuniyuki Iwashima
@ 2024-12-13 15:15       ` Wenjia Zhang
  0 siblings, 0 replies; 27+ messages in thread
From: Wenjia Zhang @ 2024-12-13 15:15 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: alibuda, allison.henderson, chuck.lever, davem, edumazet, horms,
	jaka, jlayton, kuba, kuni1840, matttbe, netdev, pabeni, sfrench



On 13.12.24 14:54, Kuniyuki Iwashima wrote:
> From: Wenjia Zhang <wenjia@linux.ibm.com>
> Date: Fri, 13 Dec 2024 14:45:20 +0100
>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>> index 6e93f188a908..7b0de80b3aca 100644
>>> --- a/net/smc/af_smc.c
>>> +++ b/net/smc/af_smc.c
>>> @@ -3310,25 +3310,8 @@ static const struct proto_ops smc_sock_ops = {
>>>    
>>>    int smc_create_clcsk(struct net *net, struct sock *sk, int family)
>>>    {
>>> -	struct smc_sock *smc = smc_sk(sk);
>>> -	int rc;
>>> -
>>> -	rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
>>> -			      &smc->clcsock);
>>> -	if (rc)
>>> -		return rc;
>>> -
>>> -	/* smc_clcsock_release() does not wait smc->clcsock->sk's
>>> -	 * destruction;  its sk_state might not be TCP_CLOSE after
>>> -	 * smc->sk is close()d, and TCP timers can be fired later,
>>> -	 * which need net ref.
>>> -	 */
>>> -	sk = smc->clcsock->sk;
>>> -	__netns_tracker_free(net, &sk->ns_tracker, false);
>>> -	sk->sk_net_refcnt = 1;
>>> -	get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
>>> -	sock_inuse_add(net, 1);
>> I don't think this line should be removed. Otherwise, the purpose here, to
>> manage the per-namespace statistics in the case of network namespace
>> isolation, would be lost.
> 
> Now it's counted in sk_alloc().
> 
> sock_create_net() below passes hold_net=true to sk_alloc(), and if
> sk->sk_net_refcnt (== hold_net) is true, sock_inuse_add() is
> called there.
> 
> See patches 9 and 10:
> https://lore.kernel.org/netdev/20241213092152.14057-10-kuniyu@amazon.com/
> https://lore.kernel.org/netdev/20241213092152.14057-11-kuniyu@amazon.com/
> 
> 
OK, I see. Thank you for pointing it out!

Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>

>> @D. Wythe, could you please check it again? Maybe you have some good
>> testing on this case.
>>
>>> -	return 0;
>>> +	return sock_create_net(net, family, SOCK_STREAM, IPPROTO_TCP,
>>> +			       &smc_sk(sk)->clcsock);
>>>    }



* Re: [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion.
  2024-12-13  9:21 ` [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion Kuniyuki Iwashima
  2024-12-13 13:45   ` Wenjia Zhang
  2024-12-13 14:15   ` Chuck Lever
@ 2024-12-13 23:29   ` Allison Henderson
  2 siblings, 0 replies; 27+ messages in thread
From: Allison Henderson @ 2024-12-13 23:29 UTC (permalink / raw)
  To: horms@kernel.org, edumazet@google.com, kuniyu@amazon.com,
	davem@davemloft.net, pabeni@redhat.com, kuba@kernel.org
  Cc: Chuck Lever III, kuni1840@gmail.com, wenjia@linux.ibm.com,
	jaka@linux.ibm.com, sfrench@samba.org, jlayton@kernel.org,
	netdev@vger.kernel.org, matttbe@kernel.org

On Fri, 2024-12-13 at 18:21 +0900, Kuniyuki Iwashima wrote:
> Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference count
> the netns of kernel sockets."), TCP kernel socket has caused many UAF.
> 
> We have converted such sockets to hold netns refcnt, and we have the
> same pattern in cifs, mptcp, rds, smc, and sunrpc.
> 
> Let's drop the conversion and use sock_create_net() instead.
> 
> The changes for cifs, mptcp, and smc are straightforward.
> 
> For rds, we need to move maybe_get_net() before sock_create_net() and
> sock->ops->accept().
> 
> For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
> call sock_create_kern() for others.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> Acked-by: Allison Henderson <allison.henderson@oracle.com>
> ---
> v3: Add missing mutex_unlock in rds_tcp_conn_path_connect().
> v2: Collect Acked-by from MPTCP and RDS maintainers
> 
> Cc: Steve French <sfrench@samba.org>
> Cc: Wenjia Zhang <wenjia@linux.ibm.com>
> Cc: Jan Karcher <jaka@linux.ibm.com>
> Cc: Chuck Lever <chuck.lever@oracle.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/smb/client/connect.c | 13 ++-----------
>  net/mptcp/subflow.c     | 10 +---------
>  net/rds/tcp.c           | 14 --------------
>  net/rds/tcp_connect.c   | 21 +++++++++++++++------
>  net/rds/tcp_listen.c    | 14 ++++++++++++--
>  net/smc/af_smc.c        | 21 ++-------------------
>  net/sunrpc/svcsock.c    | 12 ++++++------
>  net/sunrpc/xprtsock.c   | 12 ++++--------
>  8 files changed, 42 insertions(+), 75 deletions(-)
> 
> diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
> index c36c1b4ffe6e..7a67b86c0423 100644
> --- a/fs/smb/client/connect.c
> +++ b/fs/smb/client/connect.c
> @@ -3130,22 +3130,13 @@ generic_ip_connect(struct TCP_Server_Info *server)
>  	if (server->ssocket) {
>  		socket = server->ssocket;
>  	} else {
> -		struct net *net = cifs_net_ns(server);
> -		struct sock *sk;
> -
> -		rc = sock_create_kern(net, sfamily, SOCK_STREAM,
> -				      IPPROTO_TCP, &server->ssocket);
> +		rc = sock_create_net(cifs_net_ns(server), sfamily, SOCK_STREAM,
> +				     IPPROTO_TCP, &server->ssocket);
>  		if (rc < 0) {
>  			cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
>  			return rc;
>  		}
>  
> -		sk = server->ssocket->sk;
> -		__netns_tracker_free(net, &sk->ns_tracker, false);
> -		sk->sk_net_refcnt = 1;
> -		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(net, 1);
> -
>  		/* BB other socket options to set KEEPALIVE, NODELAY? */
>  		cifs_dbg(FYI, "Socket created\n");
>  		socket = server->ssocket;
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index fd021cf8286e..e7e8972bdfca 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1755,7 +1755,7 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
>  	if (unlikely(!sk->sk_socket))
>  		return -EINVAL;
>  
> -	err = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
> +	err = sock_create_net(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
>  	if (err)
>  		return err;
>  
> @@ -1768,14 +1768,6 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
>  	/* the newly created socket has to be in the same cgroup as its parent */
>  	mptcp_attach_cgroup(sk, sf->sk);
>  
> -	/* kernel sockets do not by default acquire net ref, but TCP timer
> -	 * needs it.
> -	 * Update ns_tracker to current stack trace and refcounted tracker.
> -	 */
> -	__netns_tracker_free(net, &sf->sk->ns_tracker, false);
> -	sf->sk->sk_net_refcnt = 1;
> -	get_net_track(net, &sf->sk->ns_tracker, GFP_KERNEL);
> -	sock_inuse_add(net, 1);
>  	err = tcp_set_ulp(sf->sk, "mptcp");
>  	if (err)
>  		goto err_free;
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 351ac1747224..4509900476f7 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -494,21 +494,7 @@ bool rds_tcp_tune(struct socket *sock)
>  
>  	tcp_sock_set_nodelay(sock->sk);
>  	lock_sock(sk);
> -	/* TCP timer functions might access net namespace even after
> -	 * a process which created this net namespace terminated.
> -	 */
> -	if (!sk->sk_net_refcnt) {
> -		if (!maybe_get_net(net)) {
> -			release_sock(sk);
> -			return false;
> -		}
> -		/* Update ns_tracker to current stack trace and refcounted tracker */
> -		__netns_tracker_free(net, &sk->ns_tracker, false);
>  
> -		sk->sk_net_refcnt = 1;
> -		netns_tracker_alloc(net, &sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(net, 1);
> -	}
>  	rtn = net_generic(net, rds_tcp_netid);
>  	if (rtn->sndbuf_size > 0) {
>  		sk->sk_sndbuf = rtn->sndbuf_size;
> diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
> index a0046e99d6df..c9449780f952 100644
> --- a/net/rds/tcp_connect.c
> +++ b/net/rds/tcp_connect.c
> @@ -93,6 +93,7 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
>  	struct sockaddr_in6 sin6;
>  	struct sockaddr_in sin;
>  	struct sockaddr *addr;
> +	struct net *net;
>  	int addrlen;
>  	bool isv6;
>  	int ret;
> @@ -107,20 +108,28 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
>  
>  	mutex_lock(&tc->t_conn_path_lock);
>  
> +	net = rds_conn_net(conn);
> +
>  	if (rds_conn_path_up(cp)) {
> -		mutex_unlock(&tc->t_conn_path_lock);
> -		return 0;
> +		ret = 0;
> +		goto out;
>  	}
> +
> +	if (!maybe_get_net(net)) {
> +		ret = -EINVAL;
> +		goto out;
> +	}

Ok, this looks much better.  Thank you!

Allison
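
For readers following the rds hunk, the pattern can be modelled in
userspace roughly as below. This is only a sketch under assumptions: the
model_* names are invented, C11 atomics stand in for the kernel's
refcounting, and -1 stands in for the -EINVAL used in the patch. It shows
a temporary reference taken with a "get unless zero" helper, the socket
taking its own reference while that pin is held, and the pin being
dropped afterwards.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a refcounted namespace; not the kernel's struct net. */
struct model_ns {
	atomic_int refcnt; /* 0 means teardown has already started */
};

/* Models maybe_get_net(): take a reference only if the count is non-zero. */
static bool model_maybe_get(struct model_ns *ns)
{
	int old = atomic_load(&ns->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&ns->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Models put_net(): drop one reference. */
static void model_put(struct model_ns *ns)
{
	int old = atomic_fetch_sub(&ns->refcnt, 1);

	assert(old > 0);
}

/*
 * Models the reworked connect path: pin the namespace, create a socket
 * that takes its own reference, then drop the temporary pin.
 * Returns 0 on success, -1 if the namespace is already going away.
 */
static int model_connect(struct model_ns *ns, bool *sock_holds_ref)
{
	if (!model_maybe_get(ns))
		return -1;

	/* Stands in for sock_create_net(): the socket takes its own ref. */
	atomic_fetch_add(&ns->refcnt, 1);
	*sock_holds_ref = true;

	model_put(ns); /* drop the temporary reference */
	return 0;
}
```
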

> +
>  	if (ipv6_addr_v4mapped(&conn->c_laddr)) {
> -		ret = sock_create_kern(rds_conn_net(conn), PF_INET,
> -				       SOCK_STREAM, IPPROTO_TCP, &sock);
> +		ret = sock_create_net(net, PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
>  		isv6 = false;
>  	} else {
> -		ret = sock_create_kern(rds_conn_net(conn), PF_INET6,
> -				       SOCK_STREAM, IPPROTO_TCP, &sock);
> +		ret = sock_create_net(net, PF_INET6, SOCK_STREAM, IPPROTO_TCP, &sock);
>  		isv6 = true;
>  	}
>  
> +	put_net(net);
> +
>  	if (ret < 0)
>  		goto out;
>  
> diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
> index 69aaf03ab93e..440ac9057148 100644
> --- a/net/rds/tcp_listen.c
> +++ b/net/rds/tcp_listen.c
> @@ -101,6 +101,7 @@ int rds_tcp_accept_one(struct socket *sock)
>  	struct rds_connection *conn;
>  	int ret;
>  	struct inet_sock *inet;
> +	struct net *net;
>  	struct rds_tcp_connection *rs_tcp = NULL;
>  	int conn_state;
>  	struct rds_conn_path *cp;
> @@ -108,7 +109,7 @@ int rds_tcp_accept_one(struct socket *sock)
>  	struct proto_accept_arg arg = {
>  		.flags = O_NONBLOCK,
>  		.kern = true,
> -		.hold_net = false,
> +		.hold_net = true,
>  	};
>  #if !IS_ENABLED(CONFIG_IPV6)
>  	struct in6_addr saddr, daddr;
> @@ -118,13 +119,22 @@ int rds_tcp_accept_one(struct socket *sock)
>  	if (!sock) /* module unload or netns delete in progress */
>  		return -ENETUNREACH;
>  
> +	net = sock_net(sock->sk);
> +
> +	if (!maybe_get_net(net))
> +		return -EINVAL;
> +
>  	ret = sock_create_lite(sock->sk->sk_family,
>  			       sock->sk->sk_type, sock->sk->sk_protocol,
>  			       &new_sock);
> -	if (ret)
> +	if (ret) {
> +		put_net(net);
>  		goto out;
> +	}
>  
>  	ret = sock->ops->accept(sock, new_sock, &arg);
> +	put_net(net);
> +
>  	if (ret < 0)
>  		goto out;
>  
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 6e93f188a908..7b0de80b3aca 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -3310,25 +3310,8 @@ static const struct proto_ops smc_sock_ops = {
>  
>  int smc_create_clcsk(struct net *net, struct sock *sk, int family)
>  {
> -	struct smc_sock *smc = smc_sk(sk);
> -	int rc;
> -
> -	rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
> -			      &smc->clcsock);
> -	if (rc)
> -		return rc;
> -
> -	/* smc_clcsock_release() does not wait smc->clcsock->sk's
> -	 * destruction;  its sk_state might not be TCP_CLOSE after
> -	 * smc->sk is close()d, and TCP timers can be fired later,
> -	 * which need net ref.
> -	 */
> -	sk = smc->clcsock->sk;
> -	__netns_tracker_free(net, &sk->ns_tracker, false);
> -	sk->sk_net_refcnt = 1;
> -	get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> -	sock_inuse_add(net, 1);
> -	return 0;
> +	return sock_create_net(net, family, SOCK_STREAM, IPPROTO_TCP,
> +			       &smc_sk(sk)->clcsock);
>  }
>  
>  static int __smc_create(struct net *net, struct socket *sock, int protocol,
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 9583bad3d150..cde5765f6f81 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1526,7 +1526,10 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
>  		return ERR_PTR(-EINVAL);
>  	}
>  
> -	error = sock_create_kern(net, family, type, protocol, &sock);
> +	if (protocol == IPPROTO_TCP)
> +		error = sock_create_net(net, family, type, protocol, &sock);
> +	else
> +		error = sock_create_kern(net, family, type, protocol, &sock);
>  	if (error < 0)
>  		return ERR_PTR(error);
>  
> @@ -1551,11 +1554,8 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
>  	newlen = error;
>  
>  	if (protocol == IPPROTO_TCP) {
> -		__netns_tracker_free(net, &sock->sk->ns_tracker, false);
> -		sock->sk->sk_net_refcnt = 1;
> -		get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(net, 1);
> -		if ((error = kernel_listen(sock, 64)) < 0)
> +		error = kernel_listen(sock, 64);
> +		if (error < 0)
>  			goto bummer;
>  	}
>  
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index feb1768e8a57..f3e139c30442 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1924,7 +1924,10 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
>  	struct socket *sock;
>  	int err;
>  
> -	err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
> +	if (protocol == IPPROTO_TCP)
> +		err = sock_create_net(xprt->xprt_net, family, type, protocol, &sock);
> +	else
> +		err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
>  	if (err < 0) {
>  		dprintk("RPC:       can't create %d transport socket (%d).\n",
>  				protocol, -err);
> @@ -1941,13 +1944,6 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
>  		goto out;
>  	}
>  
> -	if (protocol == IPPROTO_TCP) {
> -		__netns_tracker_free(xprt->xprt_net, &sock->sk->ns_tracker, false);
> -		sock->sk->sk_net_refcnt = 1;
> -		get_net_track(xprt->xprt_net, &sock->sk->ns_tracker, GFP_KERNEL);
> -		sock_inuse_add(xprt->xprt_net, 1);
> -	}
> -
>  	filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
>  	if (IS_ERR(filp))
>  		return ERR_CAST(filp);



* Re: [PATCH v3 net-next 04/15] socket: Pass hold_net to struct net_proto_family.create().
  2024-12-13  9:21 ` [PATCH v3 net-next 04/15] socket: Pass hold_net to struct net_proto_family.create() Kuniyuki Iwashima
  2024-12-13 13:46   ` Wenjia Zhang
@ 2024-12-17 10:24   ` Paolo Abeni
  1 sibling, 0 replies; 27+ messages in thread
From: Paolo Abeni @ 2024-12-17 10:24 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Simon Horman
  Cc: Kuniyuki Iwashima, netdev

On 12/13/24 10:21, Kuniyuki Iwashima wrote:
> We will introduce a new API to create a kernel socket with netns refcnt
> held.  Then, sk_alloc() needs the hold_net flag passed to __sock_create().
> 
> Let's pass it down to net_proto_family.create() and functions that call
> sk_alloc().
> 
> While at it, we convert the kern flag to boolean.
> 
> Note that we still need to pass hold_net to struct pppox_proto.create()
> and struct nfc_protocol.create() before passing hold_net to sk_alloc().
> 
> Also, we use !kern as hold_net in the accept() paths.  We will add the
> hold_net flag to struct proto_accept_arg later.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  crypto/af_alg.c                   |  2 +-
>  drivers/isdn/mISDN/socket.c       | 13 ++++++++-----
>  drivers/net/ppp/pppox.c           |  2 +-
>  include/linux/net.h               |  2 +-
>  include/net/bluetooth/bluetooth.h |  3 ++-
>  include/net/llc_conn.h            |  2 +-
>  net/appletalk/ddp.c               |  2 +-
>  net/atm/common.c                  |  3 ++-
>  net/atm/common.h                  |  3 ++-
>  net/atm/pvc.c                     |  4 ++--
>  net/atm/svc.c                     |  8 ++++----
>  net/ax25/af_ax25.c                |  2 +-
>  net/bluetooth/af_bluetooth.c      |  7 ++++---
>  net/bluetooth/bnep/sock.c         |  5 +++--
>  net/bluetooth/cmtp/sock.c         |  2 +-
>  net/bluetooth/hci_sock.c          |  4 ++--
>  net/bluetooth/hidp/sock.c         |  5 +++--
>  net/bluetooth/iso.c               | 11 ++++++-----
>  net/bluetooth/l2cap_sock.c        | 14 ++++++++------
>  net/bluetooth/rfcomm/sock.c       | 12 +++++++-----
>  net/bluetooth/sco.c               | 11 ++++++-----
>  net/caif/caif_socket.c            |  2 +-
>  net/can/af_can.c                  |  2 +-
>  net/ieee802154/socket.c           |  2 +-
>  net/ipv4/af_inet.c                |  2 +-
>  net/ipv6/af_inet6.c               |  2 +-
>  net/iucv/af_iucv.c                | 11 ++++++-----
>  net/kcm/kcmsock.c                 |  2 +-
>  net/key/af_key.c                  |  2 +-
>  net/llc/af_llc.c                  |  6 ++++--
>  net/llc/llc_conn.c                |  9 ++++++---
>  net/mctp/af_mctp.c                |  2 +-
>  net/netlink/af_netlink.c          |  8 ++++----
>  net/netrom/af_netrom.c            |  2 +-
>  net/nfc/af_nfc.c                  |  2 +-
>  net/packet/af_packet.c            |  2 +-
>  net/phonet/af_phonet.c            |  2 +-
>  net/qrtr/af_qrtr.c                |  2 +-
>  net/rds/af_rds.c                  |  2 +-
>  net/rose/af_rose.c                |  2 +-
>  net/rxrpc/af_rxrpc.c              |  2 +-
>  net/smc/af_smc.c                  | 15 ++++++++-------
>  net/socket.c                      |  2 +-
>  net/tipc/socket.c                 |  6 ++++--
>  net/unix/af_unix.c                |  9 +++++----
>  net/vmw_vsock/af_vsock.c          |  8 ++++----
>  net/x25/af_x25.c                  | 13 ++++++++-----
>  net/xdp/xsk.c                     |  2 +-
>  48 files changed, 133 insertions(+), 105 deletions(-)

The diffstat here and in patch 8/15 is IMHO scary.

I'm wondering if we could make this more palatable. Can we let
__sock_create() directly acquire the netns reference for kernel sockets
when asked? Something like:
---
diff --git a/net/socket.c b/net/socket.c
index 16402b8be5a7..23092f7576cf 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1577,6 +1577,13 @@ int __sock_create(struct net *net, int family, int type, int protocol,
 		goto out_module_put;
 	}

+	DEBUG_NET_WARN_ON_ONCE(!kern && !hold_net);
+	if (hold_net && kern) {
+		sock->sk->sk_net_refcnt = true;
+		get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL);
+		sock_inuse_add(net, 1);
+	}
+
 	/*
 	 * Now to bump the refcnt of the [loadable] module that owns this
 	 * socket at sock_release time we decrement its refcnt.
---

(Completely untested, just to explain my thoughts.) The goal would be to
drop patches 4 and 8 entirely.

Thanks,

Paolo



* Re: [PATCH v3 net-next 15/15] socket: Rename sock_create_kern() to sock_create_net_noref().
  2024-12-13  9:21 ` [PATCH v3 net-next 15/15] socket: Rename sock_create_kern() to sock_create_net_noref() Kuniyuki Iwashima
  2024-12-13 13:46   ` Wenjia Zhang
@ 2024-12-17 10:32   ` Paolo Abeni
  1 sibling, 0 replies; 27+ messages in thread
From: Paolo Abeni @ 2024-12-17 10:32 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Simon Horman
  Cc: Kuniyuki Iwashima, netdev

On 12/13/24 10:21, Kuniyuki Iwashima wrote:
> sock_create_kern() is quite a bad name, and the non-netdev folks tend
> to use it without taking care of the netns lifetime.
> 
> Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference count
> the netns of kernel sockets."), TCP sockets created by sock_create_kern()
> have caused many use-after-free.
> 
> Let's rename sock_create_kern() to sock_create_net_noref() and add fat
> documentation so that we no longer introduce the same issue in the future.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

IMHO the benefit/LoC ratio for this patch and the previous one is a bit
too low.

I would avoid the rename and just add the documentation. Instead, add a
suffix to the sock_create*() kernel variant that acquires the netns
reference (sock_create_kern_netref()?).
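
To make the naming suggestion concrete, here is a minimal userspace
sketch (not kernel code) of the two variants sharing one creation path.
Every model_* name is hypothetical, the refcount is a plain integer
rather than the kernel's tracker, and sock_create_kern_netref() itself
is only a proposed name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel objects; all names are hypothetical. */
struct model_net {
	int refcnt;		/* models the netns reference count */
};

struct model_sock {
	struct model_net *net;
	bool net_refcnt;	/* models sk->sk_net_refcnt */
};

/* Common creation path: pins the netns only when hold_net is set. */
static struct model_sock *model_sock_create(struct model_net *net,
					    bool hold_net)
{
	struct model_sock *sk = calloc(1, sizeof(*sk));

	if (!sk)
		return NULL;

	sk->net = net;
	sk->net_refcnt = hold_net;
	if (hold_net)
		net->refcnt++;

	return sk;
}

/* Existing name kept as-is: the socket does not pin the netns. */
static struct model_sock *model_sock_create_kern(struct model_net *net)
{
	return model_sock_create(net, false);
}

/* Suffixed variant: the socket pins the netns until release. */
static struct model_sock *model_sock_create_kern_netref(struct model_net *net)
{
	return model_sock_create(net, true);
}

/* Drops the netns reference only if this socket took one. */
static void model_sock_release(struct model_sock *sk)
{
	if (sk->net_refcnt) {
		assert(sk->net->refcnt > 0);
		sk->net->refcnt--;
	}
	free(sk);
}
```

In this model the existing callers keep their name and behaviour, and
only callers that need the netns pinned switch to the suffixed variant.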

Thanks,

Paolo



Thread overview: 27+ messages
2024-12-13  9:21 [PATCH v3 net-next 00/15] treewide: socket: Clean up sock_create() and friends Kuniyuki Iwashima
2024-12-13  9:21 ` [PATCH v3 net-next 01/15] socket: Un-export __sock_create() Kuniyuki Iwashima
2024-12-13  9:21 ` [PATCH v3 net-next 02/15] socket: Pass hold_net flag to __sock_create() Kuniyuki Iwashima
2024-12-13  9:21 ` [PATCH v3 net-next 03/15] smc: Pass kern to smc_sock_alloc() Kuniyuki Iwashima
2024-12-13 13:46   ` Wenjia Zhang
2024-12-13  9:21 ` [PATCH v3 net-next 04/15] socket: Pass hold_net to struct net_proto_family.create() Kuniyuki Iwashima
2024-12-13 13:46   ` Wenjia Zhang
2024-12-17 10:24   ` Paolo Abeni
2024-12-13  9:21 ` [PATCH v3 net-next 05/15] ppp: Pass hold_net to struct pppox_proto.create() Kuniyuki Iwashima
2024-12-13  9:21 ` [PATCH v3 net-next 06/15] nfc: Pass hold_net to struct nfc_protocol.create() Kuniyuki Iwashima
2024-12-13  9:21 ` [PATCH v3 net-next 07/15] socket: Add hold_net flag to struct proto_accept_arg Kuniyuki Iwashima
2024-12-13  9:21 ` [PATCH v3 net-next 08/15] socket: Pass hold_net to sk_alloc() Kuniyuki Iwashima
2024-12-13 13:45   ` Wenjia Zhang
2024-12-13  9:21 ` [PATCH v3 net-next 09/15] socket: Respect hold_net in sk_alloc() Kuniyuki Iwashima
2024-12-13  9:21 ` [PATCH v3 net-next 10/15] socket: Introduce sock_create_net() Kuniyuki Iwashima
2024-12-13  9:21 ` [PATCH v3 net-next 11/15] socket: Remove kernel socket conversion Kuniyuki Iwashima
2024-12-13 13:45   ` Wenjia Zhang
2024-12-13 13:54     ` Kuniyuki Iwashima
2024-12-13 15:15       ` Wenjia Zhang
2024-12-13 14:15   ` Chuck Lever
2024-12-13 23:29   ` Allison Henderson
2024-12-13  9:21 ` [PATCH v3 net-next 12/15] socket: Move sock_inuse_add() to sock.c Kuniyuki Iwashima
2024-12-13  9:21 ` [PATCH v3 net-next 13/15] socket: Use sock_create_net() instead of sock_create() Kuniyuki Iwashima
2024-12-13  9:21 ` [PATCH v3 net-next 14/15] socket: Rename sock_create() to sock_create_user() Kuniyuki Iwashima
2024-12-13  9:21 ` [PATCH v3 net-next 15/15] socket: Rename sock_create_kern() to sock_create_net_noref() Kuniyuki Iwashima
2024-12-13 13:46   ` Wenjia Zhang
2024-12-17 10:32   ` Paolo Abeni
