* [PATCH v2 net-next 0/7] socket: Make sock_create_kern() robust against misuse.
From: Kuniyuki Iwashima @ 2025-05-23 18:21 UTC
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
There are a bunch of weird usages of sock_create() and friends due
to poor documentation.
1) some subsystems use __sock_create(), but all of them can be
replaced with sock_create_kern()
2) some subsystems use sock_create(), but most of the sockets are
neither tied to userspace processes nor exposed via file descriptors,
yet are (most likely unintentionally) exposed to some BPF hooks
(infiniband, ISDN, iscsi, Xen PV call, ocfs2, smbd)
3) some subsystems use sock_create_kern() and convert the sockets
to hold netns refcnt (cifs, mptcp, nvme, rds, smc, and sunrpc)
The primary goal is to sort out such confusion and provide enough
documentation for future developers to choose an appropriate API.
Before commit 26abe14379f8 ("net: Modify sk_alloc to not reference
count the netns of kernel sockets."), sock_create_kern() held the
netns refcnt, and each caller dropped it if unnecessary:
sock_create_kern(&init_net, ..., &sock);
sk_change_net(sock->sk, net);
But that implicit API change ended up causing a lot of use-after-free
bugs, and eventually another helper was introduced:
sock_create_kern(net, ..., &sock);
sk_net_refcnt_upgrade(sock->sk);
Patch 2 renames sock_create_kern() to __sock_create_kern() to mark it
as a special-purpose API, and Patch 3 restores the original
sock_create_kern() that holds the netns refcnt.
Now, we can simply use sock_create_kern() or __sock_create_kern()
depending on the use case (except for rds).
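To make the refcount difference concrete, here is a minimal userspace
model in C. It is not kernel code: struct model_net, struct model_sock,
and every model_* function are hypothetical stand-ins, and the model
only tracks whether a socket pins its netns. It assumes, per the
description above, that the restored sock_create_kern() holds the netns
refcnt and that __sock_create_kern() followed by sk_net_refcnt_upgrade()
ends up in the same state.

```c
/* Userspace model only; NOT kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_net {
	int refcnt;		/* stands in for the netns refcount */
};

struct model_sock {
	struct model_net *net;
	bool holds_net;		/* true if this socket pins its netns */
};

/* Models __sock_create_kern(): the socket does not pin the netns. */
static struct model_sock *model_create_nohold(struct model_net *net)
{
	struct model_sock *sk = malloc(sizeof(*sk));

	if (!sk)
		return NULL;
	sk->net = net;
	sk->holds_net = false;
	return sk;
}

/* Models sk_net_refcnt_upgrade(): start pinning the netns after creation. */
static void model_refcnt_upgrade(struct model_sock *sk)
{
	if (!sk->holds_net) {
		sk->holds_net = true;
		sk->net->refcnt++;
	}
}

/* Models sock_create_kern() as restored by Patch 3: create and pin at once. */
static struct model_sock *model_create_hold(struct model_net *net)
{
	struct model_sock *sk = model_create_nohold(net);

	if (sk)
		model_refcnt_upgrade(sk);
	return sk;
}

/* Models sock_release(): drop the netns reference only if one was taken. */
static void model_release(struct model_sock *sk)
{
	if (sk->holds_net)
		sk->net->refcnt--;
	free(sk);
}
```

In this model, a caller that previously needed the two-step
create-then-upgrade sequence gets the same result from a single
model_create_hold() call, which is the simplification the series
aims for.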
Changes
v2:
patch 3: s/ret/err/ in sock_create_kern() for clarity
patch 4: newly added
patch 5: drop unnecessary change for sunrpc and update changelog
v1: https://lore.kernel.org/netdev/20250517035120.55560-1-kuniyu@amazon.com/
Kuniyuki Iwashima (7):
socket: Un-export __sock_create().
socket: Rename sock_create_kern() to __sock_create_kern().
socket: Restore sock_create_kern().
smb: client: Add missing net_passive_dec().
socket: Remove kernel socket conversion except for net/rds/.
socket: Replace most sock_create() calls with sock_create_kern().
socket: Clean up kdoc for sock_create() and sock_create_lite().
drivers/block/drbd/drbd_receiver.c | 12 +-
drivers/infiniband/hw/erdma/erdma_cm.c | 6 +-
drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
drivers/infiniband/sw/siw/siw_cm.c | 6 +-
drivers/isdn/mISDN/l1oip_core.c | 3 +-
drivers/nvme/host/tcp.c | 5 +-
drivers/nvme/target/tcp.c | 5 +-
drivers/soc/qcom/qmi_interface.c | 4 +-
drivers/target/iscsi/iscsi_target_login.c | 7 +-
drivers/xen/pvcalls-back.c | 6 +-
fs/afs/rxrpc.c | 2 +-
fs/dlm/lowcomms.c | 8 +-
fs/ocfs2/cluster/tcp.c | 8 +-
fs/smb/client/connect.c | 11 +-
fs/smb/server/transport_tcp.c | 7 +-
include/linux/net.h | 7 +-
net/9p/trans_fd.c | 9 +-
net/bluetooth/rfcomm/core.c | 3 +-
net/ceph/messenger.c | 6 +-
net/handshake/handshake-test.c | 32 ++--
net/ipv4/af_inet.c | 2 +-
net/ipv4/udp_tunnel_core.c | 2 +-
net/ipv6/ip6_udp_tunnel.c | 2 +-
net/l2tp/l2tp_core.c | 8 +-
net/mctp/test/route-test.c | 6 +-
net/mptcp/pm_kernel.c | 4 +-
net/mptcp/subflow.c | 7 +-
net/netfilter/ipvs/ip_vs_sync.c | 8 +-
net/qrtr/ns.c | 6 +-
net/rds/tcp_connect.c | 8 +-
net/rds/tcp_listen.c | 4 +-
net/rxrpc/rxperf.c | 4 +-
net/sctp/socket.c | 2 +-
net/smc/af_smc.c | 18 +--
net/smc/smc_inet.c | 2 +-
net/socket.c | 138 ++++++++++++------
net/sunrpc/clnt.c | 4 +-
net/sunrpc/svcsock.c | 6 +-
net/sunrpc/xprtsock.c | 12 +-
net/tipc/topsrv.c | 4 +-
net/wireless/nl80211.c | 4 +-
.../selftests/bpf/test_kmods/bpf_testmod.c | 4 +-
42 files changed, 219 insertions(+), 185 deletions(-)
--
2.49.0
* [PATCH v2 net-next 1/7] socket: Un-export __sock_create().
From: Kuniyuki Iwashima @ 2025-05-23 18:21 UTC
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
Since commit eeb1bd5c40ed ("net: Add a struct net parameter to
sock_create_kern"), we no longer need to export __sock_create()
and can replace all non-core users with sock_create_kern().
Let's convert them and un-export __sock_create().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/smb/client/connect.c | 4 ++--
include/linux/net.h | 2 --
net/9p/trans_fd.c | 9 +++++----
net/handshake/handshake-test.c | 32 ++++++++++++++------------------
net/socket.c | 3 +--
net/sunrpc/clnt.c | 4 ++--
net/sunrpc/svcsock.c | 2 +-
net/sunrpc/xprtsock.c | 6 +++---
net/wireless/nl80211.c | 4 ++--
9 files changed, 30 insertions(+), 36 deletions(-)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 6bf04d9a5491..c251a23a6447 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3350,8 +3350,8 @@ generic_ip_connect(struct TCP_Server_Info *server)
struct net *net = cifs_net_ns(server);
struct sock *sk;
- rc = __sock_create(net, sfamily, SOCK_STREAM,
- IPPROTO_TCP, &server->ssocket, 1);
+ rc = sock_create_kern(net, sfamily, SOCK_STREAM,
+ IPPROTO_TCP, &server->ssocket);
if (rc < 0) {
cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
return rc;
diff --git a/include/linux/net.h b/include/linux/net.h
index 0ff950eecc6b..26aaaa841f48 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -251,8 +251,6 @@ int sock_wake_async(struct socket_wq *sk_wq, int how, int band);
int sock_register(const struct net_proto_family *fam);
void sock_unregister(int family);
bool sock_is_registered(int family);
-int __sock_create(struct net *net, int family, int type, int proto,
- struct socket **res, int kern);
int sock_create(int family, int type, int proto, struct socket **res);
int sock_create_kern(struct net *net, int family, int type, int proto, struct socket **res);
int sock_create_lite(int family, int type, int proto, struct socket **res);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 339ec4e54778..842977f309b3 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -1006,8 +1006,9 @@ p9_fd_create_tcp(struct p9_client *client, const char *addr, char *args)
client->trans_opts.tcp.port = opts.port;
client->trans_opts.tcp.privport = opts.privport;
- err = __sock_create(current->nsproxy->net_ns, stor.ss_family,
- SOCK_STREAM, IPPROTO_TCP, &csocket, 1);
+
+ err = sock_create_kern(current->nsproxy->net_ns, stor.ss_family,
+ SOCK_STREAM, IPPROTO_TCP, &csocket);
if (err) {
pr_err("%s (%d): problem creating socket\n",
__func__, task_pid_nr(current));
@@ -1057,8 +1058,8 @@ p9_fd_create_unix(struct p9_client *client, const char *addr, char *args)
sun_server.sun_family = PF_UNIX;
strcpy(sun_server.sun_path, addr);
- err = __sock_create(current->nsproxy->net_ns, PF_UNIX,
- SOCK_STREAM, 0, &csocket, 1);
+ err = sock_create_kern(current->nsproxy->net_ns, PF_UNIX,
+ SOCK_STREAM, 0, &csocket);
if (err < 0) {
pr_err("%s (%d): problem creating socket\n",
__func__, task_pid_nr(current));
diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
index 55442b2f518a..4f300504f3e5 100644
--- a/net/handshake/handshake-test.c
+++ b/net/handshake/handshake-test.c
@@ -143,14 +143,18 @@ static void handshake_req_alloc_case(struct kunit *test)
kfree(result);
}
+static int handshake_sock_create(struct socket **sock)
+{
+ return sock_create_kern(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, sock);
+}
+
static void handshake_req_submit_test1(struct kunit *test)
{
struct socket *sock;
int err, result;
/* Arrange */
- err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
- &sock, 1);
+ err = handshake_sock_create(&sock);
KUNIT_ASSERT_EQ(test, err, 0);
/* Act */
@@ -190,8 +194,7 @@ static void handshake_req_submit_test3(struct kunit *test)
req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, req);
- err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
- &sock, 1);
+ err = handshake_sock_create(&sock);
KUNIT_ASSERT_EQ(test, err, 0);
sock->file = NULL;
@@ -216,8 +219,7 @@ static void handshake_req_submit_test4(struct kunit *test)
req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, req);
- err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
- &sock, 1);
+ err = handshake_sock_create(&sock);
KUNIT_ASSERT_EQ(test, err, 0);
filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
@@ -251,8 +253,7 @@ static void handshake_req_submit_test5(struct kunit *test)
req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, req);
- err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
- &sock, 1);
+ err = handshake_sock_create(&sock);
KUNIT_ASSERT_EQ(test, err, 0);
filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
@@ -289,8 +290,7 @@ static void handshake_req_submit_test6(struct kunit *test)
req2 = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, req2);
- err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
- &sock, 1);
+ err = handshake_sock_create(&sock);
KUNIT_ASSERT_EQ(test, err, 0);
filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filp);
@@ -321,8 +321,7 @@ static void handshake_req_cancel_test1(struct kunit *test)
req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, req);
- err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
- &sock, 1);
+ err = handshake_sock_create(&sock);
KUNIT_ASSERT_EQ(test, err, 0);
filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
@@ -357,8 +356,7 @@ static void handshake_req_cancel_test2(struct kunit *test)
req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, req);
- err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
- &sock, 1);
+ err = handshake_sock_create(&sock);
KUNIT_ASSERT_EQ(test, err, 0);
filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
@@ -399,8 +397,7 @@ static void handshake_req_cancel_test3(struct kunit *test)
req = handshake_req_alloc(&handshake_req_alloc_proto_good, GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, req);
- err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
- &sock, 1);
+ err = handshake_sock_create(&sock);
KUNIT_ASSERT_EQ(test, err, 0);
filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
@@ -457,8 +454,7 @@ static void handshake_req_destroy_test1(struct kunit *test)
req = handshake_req_alloc(&handshake_req_alloc_proto_destroy, GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, req);
- err = __sock_create(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
- &sock, 1);
+ err = handshake_sock_create(&sock);
KUNIT_ASSERT_EQ(test, err, 0);
filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
diff --git a/net/socket.c b/net/socket.c
index 9a0e720f0859..241d9767ae69 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1467,7 +1467,7 @@ EXPORT_SYMBOL(sock_wake_async);
* This function internally uses GFP_KERNEL.
*/
-int __sock_create(struct net *net, int family, int type, int protocol,
+static int __sock_create(struct net *net, int family, int type, int protocol,
struct socket **res, int kern)
{
int err;
@@ -1581,7 +1581,6 @@ int __sock_create(struct net *net, int family, int type, int protocol,
rcu_read_unlock();
goto out_sock_release;
}
-EXPORT_SYMBOL(__sock_create);
/**
* sock_create - creates a socket
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 6f75862d9782..f9f340171530 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1455,8 +1455,8 @@ static int rpc_sockname(struct net *net, struct sockaddr *sap, size_t salen,
struct socket *sock;
int err;
- err = __sock_create(net, sap->sa_family,
- SOCK_DGRAM, IPPROTO_UDP, &sock, 1);
+ err = sock_create_kern(net, sap->sa_family,
+ SOCK_DGRAM, IPPROTO_UDP, &sock);
if (err < 0) {
dprintk("RPC: can't create UDP socket (%d)\n", err);
goto out;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 72e5a01df3d3..e2c69ab17ac5 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1516,7 +1516,7 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
return ERR_PTR(-EINVAL);
}
- error = __sock_create(net, family, type, protocol, &sock, 1);
+ error = sock_create_kern(net, family, type, protocol, &sock);
if (error < 0)
return ERR_PTR(error);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 83cc095846d3..5ffe88145193 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1924,7 +1924,7 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
struct socket *sock;
int err;
- err = __sock_create(xprt->xprt_net, family, type, protocol, &sock, 1);
+ err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
if (err < 0) {
dprintk("RPC: can't create %d transport socket (%d).\n",
protocol, -err);
@@ -1999,8 +1999,8 @@ static int xs_local_setup_socket(struct sock_xprt *transport)
struct socket *sock;
int status;
- status = __sock_create(xprt->xprt_net, AF_LOCAL,
- SOCK_STREAM, 0, &sock, 1);
+ status = sock_create_kern(xprt->xprt_net, AF_LOCAL,
+ SOCK_STREAM, 0, &sock);
if (status < 0) {
dprintk("RPC: can't create AF_LOCAL "
"transport socket (%d).\n", -status);
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index fd5f79266471..98a7298e427d 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -13750,8 +13750,8 @@ static int nl80211_parse_wowlan_tcp(struct cfg80211_registered_device *rdev,
port = nla_get_u16_default(tb[NL80211_WOWLAN_TCP_SRC_PORT], 0);
#ifdef CONFIG_INET
/* allocate a socket and port for it and use it */
- err = __sock_create(wiphy_net(&rdev->wiphy), PF_INET, SOCK_STREAM,
- IPPROTO_TCP, &cfg->sock, 1);
+ err = sock_create_kern(wiphy_net(&rdev->wiphy), PF_INET, SOCK_STREAM,
+ IPPROTO_TCP, &cfg->sock);
if (err) {
kfree(cfg);
return err;
--
2.49.0
* [PATCH v2 net-next 2/7] socket: Rename sock_create_kern() to __sock_create_kern().
From: Kuniyuki Iwashima @ 2025-05-23 18:21 UTC
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
sock_create_kern() is a catchy name and is often chosen by non-networking
developers to create kernel sockets. But due to its poor documentation,
it has caused a bunch of netns use-after-free bugs:
* commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.")
* commit b013b817f32f ("nvme-tcp: fix use-after-free of netns by
kernel TCP socket.")
... and more in NFS, SMC, MPTCP, and RDS
Some non-networking maintainers mentioned that the socket API should
be more robust to prevent this type of issue. [0]
The current sock_create_kern() doesn't hold a reference to the netns,
which allows the netns to be removed while the socket is still around.
This is useful when the socket is used as the backend for a networking
device.
But this is rather a special case, for which netdev folks should use a
dedicated API, and we should provide sock_create_kern() as the standard
API for general in-kernel use cases.
In fact, we did so before commit 26abe14379f8 ("net: Modify sk_alloc
to not reference count the netns of kernel sockets."),
sock_create_kern(&init_net, ..., &sock)
sk_change_net(sock->sk, net);
but that implicit API change ended up causing a lot of problems.
Let's rename sock_create_kern() to __sock_create_kern() as a special
API and add thorough documentation.
The next patch will add sock_create_kern() that holds netns refcnt.
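As a rough illustration of the hazard described above, the following
userspace sketch models a netns that is torn down as soon as its
refcount reaches zero. It is not kernel code: toy_net, toy_sock, and
the toy_* helpers are hypothetical names, and the `alive` flag stands
in for memory that the real kernel would free.

```c
/* Userspace model only; NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_net {
	int refcnt;	/* references held on the netns */
	bool alive;	/* false once the netns has been dismantled */
};

struct toy_sock {
	struct toy_net *net;
	bool holds_net;	/* true for a socket that pins its netns */
};

/* Drop one reference; the netns goes away when the last one is dropped. */
static void toy_put_net(struct toy_net *net)
{
	if (--net->refcnt == 0)
		net->alive = false;	/* real code would free the netns here */
}

/*
 * hold == false models __sock_create_kern();
 * hold == true models the sock_create_kern() that the next patch adds.
 */
static struct toy_sock toy_sock_create(struct toy_net *net, bool hold)
{
	struct toy_sock sk = { .net = net, .holds_net = hold };

	if (hold)
		net->refcnt++;
	return sk;
}

/* Would touching sk->net still be safe, i.e. not a use-after-free? */
static bool toy_sock_net_is_valid(const struct toy_sock *sk)
{
	return sk->net->alive;
}

/* Release the socket, dropping its netns reference if it took one. */
static void toy_sock_release(struct toy_sock *sk)
{
	if (sk->holds_net)
		toy_put_net(sk->net);
}
```

In the model, a socket created with hold == false sees its netns
disappear as soon as the owner drops the last reference, while a socket
created with hold == true keeps the netns valid until the socket itself
is released.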
Link: https://lore.kernel.org/lkml/20250409084446.GA2771@lst.de/ #[0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> # net/mptcp
Acked-by: Chuck Lever <chuck.lever@oracle.com>
---
drivers/block/drbd/drbd_receiver.c | 12 +++---
drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
drivers/nvme/host/tcp.c | 6 +--
drivers/soc/qcom/qmi_interface.c | 4 +-
fs/afs/rxrpc.c | 2 +-
fs/dlm/lowcomms.c | 8 ++--
fs/smb/client/connect.c | 4 +-
include/linux/net.h | 3 +-
net/9p/trans_fd.c | 8 ++--
net/bluetooth/rfcomm/core.c | 3 +-
net/ceph/messenger.c | 6 +--
net/handshake/handshake-test.c | 2 +-
net/ipv4/af_inet.c | 2 +-
net/ipv4/udp_tunnel_core.c | 2 +-
net/ipv6/ip6_udp_tunnel.c | 2 +-
net/l2tp/l2tp_core.c | 8 ++--
net/mctp/test/route-test.c | 6 +--
net/mptcp/pm_kernel.c | 4 +-
net/mptcp/subflow.c | 4 +-
net/netfilter/ipvs/ip_vs_sync.c | 8 ++--
net/qrtr/ns.c | 6 +--
net/rds/tcp_connect.c | 8 ++--
net/rds/tcp_listen.c | 4 +-
net/rxrpc/rxperf.c | 4 +-
net/sctp/socket.c | 2 +-
net/smc/af_smc.c | 4 +-
net/smc/smc_inet.c | 2 +-
net/socket.c | 37 +++++++++++++------
net/sunrpc/clnt.c | 4 +-
net/sunrpc/svcsock.c | 2 +-
net/sunrpc/xprtsock.c | 6 +--
net/tipc/topsrv.c | 4 +-
net/wireless/nl80211.c | 4 +-
.../selftests/bpf/test_kmods/bpf_testmod.c | 4 +-
34 files changed, 102 insertions(+), 85 deletions(-)
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index e5a2e5f7887b..3e4619fad8c8 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -618,9 +618,9 @@ static struct socket *drbd_try_connect(struct drbd_connection *connection)
peer_addr_len = min_t(int, connection->peer_addr_len, sizeof(src_in6));
memcpy(&peer_in6, &connection->peer_addr, peer_addr_len);
- what = "sock_create_kern";
- err = sock_create_kern(&init_net, ((struct sockaddr *)&src_in6)->sa_family,
- SOCK_STREAM, IPPROTO_TCP, &sock);
+ what = "__sock_create_kern";
+ err = __sock_create_kern(&init_net, ((struct sockaddr *)&src_in6)->sa_family,
+ SOCK_STREAM, IPPROTO_TCP, &sock);
if (err < 0) {
sock = NULL;
goto out;
@@ -713,9 +713,9 @@ static int prepare_listen_socket(struct drbd_connection *connection, struct acce
my_addr_len = min_t(int, connection->my_addr_len, sizeof(struct sockaddr_in6));
memcpy(&my_addr, &connection->my_addr, my_addr_len);
- what = "sock_create_kern";
- err = sock_create_kern(&init_net, ((struct sockaddr *)&my_addr)->sa_family,
- SOCK_STREAM, IPPROTO_TCP, &s_listen);
+ what = "__sock_create_kern";
+ err = __sock_create_kern(&init_net, ((struct sockaddr *)&my_addr)->sa_family,
+ SOCK_STREAM, IPPROTO_TCP, &s_listen);
if (err) {
s_listen = NULL;
goto out;
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 7975fb0e2782..b4df63fdabad 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -241,7 +241,7 @@ static int rxe_qp_init_req(struct rxe_dev *rxe, struct rxe_qp *qp,
/* if we don't finish qp create make sure queue is valid */
skb_queue_head_init(&qp->req_pkts);
- err = sock_create_kern(&init_net, AF_INET, SOCK_DGRAM, 0, &qp->sk);
+ err = __sock_create_kern(&init_net, AF_INET, SOCK_DGRAM, 0, &qp->sk);
if (err < 0)
return err;
qp->sk->sk->sk_user_data = (void *)(uintptr_t)qp->elem.index;
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 8ae6cc2280ca..3d3bdc5e280f 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1756,9 +1756,9 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
queue->cmnd_capsule_len = sizeof(struct nvme_command) +
NVME_TCP_ADMIN_CCSZ;
- ret = sock_create_kern(current->nsproxy->net_ns,
- ctrl->addr.ss_family, SOCK_STREAM,
- IPPROTO_TCP, &queue->sock);
+ ret = __sock_create_kern(current->nsproxy->net_ns,
+ ctrl->addr.ss_family, SOCK_STREAM,
+ IPPROTO_TCP, &queue->sock);
if (ret) {
dev_err(nctrl->device,
"failed to create socket: %d\n", ret);
diff --git a/drivers/soc/qcom/qmi_interface.c b/drivers/soc/qcom/qmi_interface.c
index bc6d6379d8b1..c8339985b2fe 100644
--- a/drivers/soc/qcom/qmi_interface.c
+++ b/drivers/soc/qcom/qmi_interface.c
@@ -588,8 +588,8 @@ static struct socket *qmi_sock_create(struct qmi_handle *qmi,
struct socket *sock;
int ret;
- ret = sock_create_kern(&init_net, AF_QIPCRTR, SOCK_DGRAM,
- PF_QIPCRTR, &sock);
+ ret = __sock_create_kern(&init_net, AF_QIPCRTR, SOCK_DGRAM,
+ PF_QIPCRTR, &sock);
if (ret < 0)
return ERR_PTR(ret);
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index c1cadf8fb346..9b54cba9b751 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -53,7 +53,7 @@ int afs_open_socket(struct afs_net *net)
_enter("");
- ret = sock_create_kern(net->net, AF_RXRPC, SOCK_DGRAM, PF_INET6, &socket);
+ ret = __sock_create_kern(net->net, AF_RXRPC, SOCK_DGRAM, PF_INET6, &socket);
if (ret < 0)
goto error_1;
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 70abd4da17a6..9086c3807a94 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1580,8 +1580,8 @@ static int dlm_connect(struct connection *con)
}
/* Create a socket to communicate with */
- result = sock_create_kern(&init_net, dlm_local_addr[0].ss_family,
- SOCK_STREAM, dlm_proto_ops->proto, &sock);
+ result = __sock_create_kern(&init_net, dlm_local_addr[0].ss_family,
+ SOCK_STREAM, dlm_proto_ops->proto, &sock);
if (result < 0)
return result;
@@ -1761,8 +1761,8 @@ static int dlm_listen_for_all(void)
if (result < 0)
return result;
- result = sock_create_kern(&init_net, dlm_local_addr[0].ss_family,
- SOCK_STREAM, dlm_proto_ops->proto, &sock);
+ result = __sock_create_kern(&init_net, dlm_local_addr[0].ss_family,
+ SOCK_STREAM, dlm_proto_ops->proto, &sock);
if (result < 0) {
log_print("Can't create comms socket: %d", result);
return result;
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index c251a23a6447..37a2ba38f10e 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3350,8 +3350,8 @@ generic_ip_connect(struct TCP_Server_Info *server)
struct net *net = cifs_net_ns(server);
struct sock *sk;
- rc = sock_create_kern(net, sfamily, SOCK_STREAM,
- IPPROTO_TCP, &server->ssocket);
+ rc = __sock_create_kern(net, sfamily, SOCK_STREAM,
+ IPPROTO_TCP, &server->ssocket);
if (rc < 0) {
cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
return rc;
diff --git a/include/linux/net.h b/include/linux/net.h
index 26aaaa841f48..12180e00f882 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -252,7 +252,8 @@ int sock_register(const struct net_proto_family *fam);
void sock_unregister(int family);
bool sock_is_registered(int family);
int sock_create(int family, int type, int proto, struct socket **res);
-int sock_create_kern(struct net *net, int family, int type, int proto, struct socket **res);
+int __sock_create_kern(struct net *net, int family, int type, int proto,
+ struct socket **res);
int sock_create_lite(int family, int type, int proto, struct socket **res);
struct socket *sock_alloc(void);
void sock_release(struct socket *sock);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 842977f309b3..728d60904a20 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -1007,8 +1007,8 @@ p9_fd_create_tcp(struct p9_client *client, const char *addr, char *args)
client->trans_opts.tcp.port = opts.port;
client->trans_opts.tcp.privport = opts.privport;
- err = sock_create_kern(current->nsproxy->net_ns, stor.ss_family,
- SOCK_STREAM, IPPROTO_TCP, &csocket);
+ err = __sock_create_kern(current->nsproxy->net_ns, stor.ss_family,
+ SOCK_STREAM, IPPROTO_TCP, &csocket);
if (err) {
pr_err("%s (%d): problem creating socket\n",
__func__, task_pid_nr(current));
@@ -1058,8 +1058,8 @@ p9_fd_create_unix(struct p9_client *client, const char *addr, char *args)
sun_server.sun_family = PF_UNIX;
strcpy(sun_server.sun_path, addr);
- err = sock_create_kern(current->nsproxy->net_ns, PF_UNIX,
- SOCK_STREAM, 0, &csocket);
+ err = __sock_create_kern(current->nsproxy->net_ns, PF_UNIX,
+ SOCK_STREAM, 0, &csocket);
if (err < 0) {
pr_err("%s (%d): problem creating socket\n",
__func__, task_pid_nr(current));
diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index 20ea7dba0a9a..7ee7203aae22 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -200,7 +200,8 @@ static int rfcomm_l2sock_create(struct socket **sock)
BT_DBG("");
- err = sock_create_kern(&init_net, PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_L2CAP, sock);
+ err = __sock_create_kern(&init_net, PF_BLUETOOTH, SOCK_SEQPACKET,
+ BTPROTO_L2CAP, sock);
if (!err) {
struct sock *sk = (*sock)->sk;
sk->sk_data_ready = rfcomm_l2data_ready;
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index d1b5705dc0c6..84da1ca9ce82 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -442,10 +442,10 @@ int ceph_tcp_connect(struct ceph_connection *con)
ceph_pr_addr(&con->peer_addr));
BUG_ON(con->sock);
- /* sock_create_kern() allocates with GFP_KERNEL */
+ /* __sock_create_kern() allocates with GFP_KERNEL */
noio_flag = memalloc_noio_save();
- ret = sock_create_kern(read_pnet(&con->msgr->net), ss.ss_family,
- SOCK_STREAM, IPPROTO_TCP, &sock);
+ ret = __sock_create_kern(read_pnet(&con->msgr->net), ss.ss_family,
+ SOCK_STREAM, IPPROTO_TCP, &sock);
memalloc_noio_restore(noio_flag);
if (ret)
return ret;
diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
index 4f300504f3e5..d78fc3a8520d 100644
--- a/net/handshake/handshake-test.c
+++ b/net/handshake/handshake-test.c
@@ -145,7 +145,7 @@ static void handshake_req_alloc_case(struct kunit *test)
static int handshake_sock_create(struct socket **sock)
{
- return sock_create_kern(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, sock);
+ return __sock_create_kern(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP, sock);
}
static void handshake_req_submit_test1(struct kunit *test)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 76e38092cd8a..9b666648d621 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1631,7 +1631,7 @@ int inet_ctl_sock_create(struct sock **sk, unsigned short family,
struct net *net)
{
struct socket *sock;
- int rc = sock_create_kern(net, family, type, protocol, &sock);
+ int rc = __sock_create_kern(net, family, type, protocol, &sock);
if (rc == 0) {
*sk = sock->sk;
diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index 2326548997d3..6fd3f1df882b 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -15,7 +15,7 @@ int udp_sock_create4(struct net *net, struct udp_port_cfg *cfg,
struct socket *sock = NULL;
struct sockaddr_in udp_addr;
- err = sock_create_kern(net, AF_INET, SOCK_DGRAM, 0, &sock);
+ err = __sock_create_kern(net, AF_INET, SOCK_DGRAM, 0, &sock);
if (err < 0)
goto error;
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index c99053189ea8..34ba859d82b9 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -21,7 +21,7 @@ int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
int err;
struct socket *sock = NULL;
- err = sock_create_kern(net, AF_INET6, SOCK_DGRAM, 0, &sock);
+ err = __sock_create_kern(net, AF_INET6, SOCK_DGRAM, 0, &sock);
if (err < 0)
goto error;
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 369a2f2e459c..0f347775a8b4 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1494,8 +1494,8 @@ static int l2tp_tunnel_sock_create(struct net *net,
if (cfg->local_ip6 && cfg->peer_ip6) {
struct sockaddr_l2tpip6 ip6_addr = {0};
- err = sock_create_kern(net, AF_INET6, SOCK_DGRAM,
- IPPROTO_L2TP, &sock);
+ err = __sock_create_kern(net, AF_INET6, SOCK_DGRAM,
+ IPPROTO_L2TP, &sock);
if (err < 0)
goto out;
@@ -1522,8 +1522,8 @@ static int l2tp_tunnel_sock_create(struct net *net,
{
struct sockaddr_l2tpip ip_addr = {0};
- err = sock_create_kern(net, AF_INET, SOCK_DGRAM,
- IPPROTO_L2TP, &sock);
+ err = __sock_create_kern(net, AF_INET, SOCK_DGRAM,
+ IPPROTO_L2TP, &sock);
if (err < 0)
goto out;
diff --git a/net/mctp/test/route-test.c b/net/mctp/test/route-test.c
index 06c1897b685a..faa6f682f078 100644
--- a/net/mctp/test/route-test.c
+++ b/net/mctp/test/route-test.c
@@ -310,7 +310,7 @@ static void __mctp_route_test_init(struct kunit *test,
rt = mctp_test_create_route(&init_net, dev->mdev, 8, 68);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, rt);
- rc = sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
+ rc = __sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
KUNIT_ASSERT_EQ(test, rc, 0);
addr.smctp_family = AF_MCTP;
@@ -568,7 +568,7 @@ static void mctp_test_route_input_sk_keys(struct kunit *test)
rt = mctp_test_create_route(&init_net, dev->mdev, 8, 68);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, rt);
- rc = sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
+ rc = __sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
KUNIT_ASSERT_EQ(test, rc, 0);
msk = container_of(sock->sk, struct mctp_sock, sk);
@@ -1186,7 +1186,7 @@ static void mctp_test_route_output_key_create(struct kunit *test)
rt = mctp_test_create_route(&init_net, dev->mdev, dst, 68);
KUNIT_ASSERT_NOT_ERR_OR_NULL(test, rt);
- rc = sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
+ rc = __sock_create_kern(&init_net, AF_MCTP, SOCK_DGRAM, 0, &sock);
KUNIT_ASSERT_EQ(test, rc, 0);
dev->mdev->addrs = kmalloc(sizeof(u8), GFP_KERNEL);
diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
index d39e7c178460..a7467497de0f 100644
--- a/net/mptcp/pm_kernel.c
+++ b/net/mptcp/pm_kernel.c
@@ -637,8 +637,8 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
int backlog = 1024;
int err;
- err = sock_create_kern(sock_net(sk), entry->addr.family,
- SOCK_STREAM, IPPROTO_MPTCP, &entry->lsk);
+ err = __sock_create_kern(sock_net(sk), entry->addr.family,
+ SOCK_STREAM, IPPROTO_MPTCP, &entry->lsk);
if (err)
return err;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 15613d691bfe..602e689e991f 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1757,7 +1757,7 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
if (unlikely(!sk->sk_socket))
return -EINVAL;
- err = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
+ err = __sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
if (err)
return err;
@@ -1948,7 +1948,7 @@ static int subflow_ulp_init(struct sock *sk)
int err = 0;
/* disallow attaching ULP to a socket unless it has been
- * created with sock_create_kern()
+ * created with __sock_create_kern()
*/
if (!sk->sk_kern_sock) {
err = -EOPNOTSUPP;
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 3402675bf521..6c55471846cb 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1470,8 +1470,8 @@ static int make_send_sock(struct netns_ipvs *ipvs, int id,
int result, salen;
/* First create a socket */
- result = sock_create_kern(ipvs->net, ipvs->mcfg.mcast_af, SOCK_DGRAM,
- IPPROTO_UDP, &sock);
+ result = __sock_create_kern(ipvs->net, ipvs->mcfg.mcast_af, SOCK_DGRAM,
+ IPPROTO_UDP, &sock);
if (result < 0) {
pr_err("Error during creation of socket; terminating\n");
goto error;
@@ -1527,8 +1527,8 @@ static int make_receive_sock(struct netns_ipvs *ipvs, int id,
int result, salen;
/* First create a socket */
- result = sock_create_kern(ipvs->net, ipvs->bcfg.mcast_af, SOCK_DGRAM,
- IPPROTO_UDP, &sock);
+ result = __sock_create_kern(ipvs->net, ipvs->bcfg.mcast_af, SOCK_DGRAM,
+ IPPROTO_UDP, &sock);
if (result < 0) {
pr_err("Error during creation of socket; terminating\n");
goto error;
diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
index 3de9350cbf30..3496357b8650 100644
--- a/net/qrtr/ns.c
+++ b/net/qrtr/ns.c
@@ -692,8 +692,8 @@ int qrtr_ns_init(void)
INIT_LIST_HEAD(&qrtr_ns.lookups);
INIT_WORK(&qrtr_ns.work, qrtr_ns_worker);
- ret = sock_create_kern(&init_net, AF_QIPCRTR, SOCK_DGRAM,
- PF_QIPCRTR, &qrtr_ns.sock);
+ ret = __sock_create_kern(&init_net, AF_QIPCRTR, SOCK_DGRAM,
+ PF_QIPCRTR, &qrtr_ns.sock);
if (ret < 0)
return ret;
@@ -735,7 +735,7 @@ int qrtr_ns_init(void)
* qrtr module is inserted successfully.
*
* However, the reference count is increased twice in
- * sock_create_kern(): one is to increase the reference count of owner
+ * __sock_create_kern(): one is to increase the reference count of owner
* of qrtr socket's proto_ops struct; another is to increment the
* reference count of owner of qrtr proto struct. Therefore, we must
* decrement the module reference count twice to ensure that it keeps
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index a0046e99d6df..717e76e16a23 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -112,12 +112,12 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
return 0;
}
if (ipv6_addr_v4mapped(&conn->c_laddr)) {
- ret = sock_create_kern(rds_conn_net(conn), PF_INET,
- SOCK_STREAM, IPPROTO_TCP, &sock);
+ ret = __sock_create_kern(rds_conn_net(conn), PF_INET,
+ SOCK_STREAM, IPPROTO_TCP, &sock);
isv6 = false;
} else {
- ret = sock_create_kern(rds_conn_net(conn), PF_INET6,
- SOCK_STREAM, IPPROTO_TCP, &sock);
+ ret = __sock_create_kern(rds_conn_net(conn), PF_INET6,
+ SOCK_STREAM, IPPROTO_TCP, &sock);
isv6 = true;
}
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index d89bd8d0c354..9569b85fc596 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -278,8 +278,8 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6)
int addr_len;
int ret;
- ret = sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM,
- IPPROTO_TCP, &sock);
+ ret = __sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM,
+ IPPROTO_TCP, &sock);
if (ret < 0) {
rdsdebug("could not create %s listener socket: %d\n",
isv6 ? "IPv6" : "IPv4", ret);
diff --git a/net/rxrpc/rxperf.c b/net/rxrpc/rxperf.c
index 0377301156b0..40af834a7ff7 100644
--- a/net/rxrpc/rxperf.c
+++ b/net/rxrpc/rxperf.c
@@ -188,8 +188,8 @@ static int rxperf_open_socket(void)
struct socket *socket;
int ret;
- ret = sock_create_kern(&init_net, AF_RXRPC, SOCK_DGRAM, PF_INET6,
- &socket);
+ ret = __sock_create_kern(&init_net, AF_RXRPC, SOCK_DGRAM, PF_INET6,
+ &socket);
if (ret < 0)
goto error_1;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 90b75d4ec329..3249e0680235 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1329,7 +1329,7 @@ static int __sctp_setsockopt_connectx(struct sock *sk, struct sockaddr *kaddrs,
return err;
/* in-kernel sockets don't generally have a file allocated to them
- * if all they do is call sock_create_kern().
+ * if all they do is call __sock_create_kern().
*/
if (sk->sk_socket->file)
flags = sk->sk_socket->file->f_flags;
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 3760131f1484..d998ffed1712 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -3331,8 +3331,8 @@ int smc_create_clcsk(struct net *net, struct sock *sk, int family)
struct smc_sock *smc = smc_sk(sk);
int rc;
- rc = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
- &smc->clcsock);
+ rc = __sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
+ &smc->clcsock);
if (rc)
return rc;
diff --git a/net/smc/smc_inet.c b/net/smc/smc_inet.c
index a944e7dcb8b9..5dba8c0aa9fc 100644
--- a/net/smc/smc_inet.c
+++ b/net/smc/smc_inet.c
@@ -111,7 +111,7 @@ static struct inet_protosw smc_inet6_protosw = {
static unsigned int smc_sync_mss(struct sock *sk, u32 pmtu)
{
/* No need pass it through to clcsock, mss can always be set by
- * sock_create_kern or smc_setsockopt.
+ * __sock_create_kern or smc_setsockopt.
*/
return 0;
}
diff --git a/net/socket.c b/net/socket.c
index 241d9767ae69..7c4474c966c0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1600,22 +1600,37 @@ int sock_create(int family, int type, int protocol, struct socket **res)
EXPORT_SYMBOL(sock_create);
/**
- * sock_create_kern - creates a socket (kernel space)
- * @net: net namespace
- * @family: protocol family (AF_INET, ...)
- * @type: communication type (SOCK_STREAM, ...)
- * @protocol: protocol (0, ...)
- * @res: new socket
+ * __sock_create_kern - creates a socket for kernel space
*
- * A wrapper around __sock_create().
- * Returns 0 or an error. This function internally uses GFP_KERNEL.
+ * @net: net namespace
+ * @family: protocol family (AF_INET, ...)
+ * @type: communication type (SOCK_STREAM, ...)
+ * @protocol: protocol (0, ...)
+ * @res: new socket
+ *
+ * Creates a new socket and assigns it to @res.
+ *
+ * The socket is for kernel space and should not be exposed to
+ * userspace via a file descriptor nor BPF hooks except for LSM
+ * (see inet_create(), inet_release(), etc).
+ *
+ * The socket bypasses some LSMs that take care of @kern in
+ * security_socket_create() and security_socket_post_create().
+ *
+ * The socket **DOES NOT** hold a reference count of @net to allow
+ * it to be removed; the caller MUST ensure that the socket is always
+ * freed before @net.
+ *
+ * @net MUST be alive as of calling __sock_create_kern().
+ *
+ * Context: Process context. This function internally uses GFP_KERNEL.
+ * Return: 0 or an error.
*/
-
-int sock_create_kern(struct net *net, int family, int type, int protocol, struct socket **res)
+int __sock_create_kern(struct net *net, int family, int type, int protocol, struct socket **res)
{
return __sock_create(net, family, type, protocol, res, 1);
}
-EXPORT_SYMBOL(sock_create_kern);
+EXPORT_SYMBOL(__sock_create_kern);
static struct socket *__sys_socket_create(int family, int type, int protocol)
{
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index f9f340171530..e567776a53ab 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1455,8 +1455,8 @@ static int rpc_sockname(struct net *net, struct sockaddr *sap, size_t salen,
struct socket *sock;
int err;
- err = sock_create_kern(net, sap->sa_family,
- SOCK_DGRAM, IPPROTO_UDP, &sock);
+ err = __sock_create_kern(net, sap->sa_family,
+ SOCK_DGRAM, IPPROTO_UDP, &sock);
if (err < 0) {
dprintk("RPC: can't create UDP socket (%d)\n", err);
goto out;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index e2c69ab17ac5..adacfd03153a 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1516,7 +1516,7 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
return ERR_PTR(-EINVAL);
}
- error = sock_create_kern(net, family, type, protocol, &sock);
+ error = __sock_create_kern(net, family, type, protocol, &sock);
if (error < 0)
return ERR_PTR(error);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 5ffe88145193..6fb921ce6cf2 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1924,7 +1924,7 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
struct socket *sock;
int err;
- err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
+ err = __sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
if (err < 0) {
dprintk("RPC: can't create %d transport socket (%d).\n",
protocol, -err);
@@ -1999,8 +1999,8 @@ static int xs_local_setup_socket(struct sock_xprt *transport)
struct socket *sock;
int status;
- status = sock_create_kern(xprt->xprt_net, AF_LOCAL,
- SOCK_STREAM, 0, &sock);
+ status = __sock_create_kern(xprt->xprt_net, AF_LOCAL,
+ SOCK_STREAM, 0, &sock);
if (status < 0) {
dprintk("RPC: can't create AF_LOCAL "
"transport socket (%d).\n", -status);
diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c
index 8ee0c07d00e9..f970659a04f1 100644
--- a/net/tipc/topsrv.c
+++ b/net/tipc/topsrv.c
@@ -515,7 +515,7 @@ static int tipc_topsrv_create_listener(struct tipc_topsrv *srv)
struct sock *sk;
int rc;
- rc = sock_create_kern(srv->net, AF_TIPC, SOCK_SEQPACKET, 0, &lsock);
+ rc = __sock_create_kern(srv->net, AF_TIPC, SOCK_SEQPACKET, 0, &lsock);
if (rc < 0)
return rc;
@@ -553,7 +553,7 @@ static int tipc_topsrv_create_listener(struct tipc_topsrv *srv)
* after TIPC module is inserted successfully.
*
* However, the reference count is ever increased twice in
- * sock_create_kern(): one is to increase the reference count of owner
+ * __sock_create_kern(): one is to increase the reference count of owner
* of TIPC socket's proto_ops struct; another is to increment the
* reference count of owner of TIPC proto struct. Therefore, we must
* decrement the module reference count twice to ensure that it keeps
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 98a7298e427d..22607a34be71 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -13750,8 +13750,8 @@ static int nl80211_parse_wowlan_tcp(struct cfg80211_registered_device *rdev,
port = nla_get_u16_default(tb[NL80211_WOWLAN_TCP_SRC_PORT], 0);
#ifdef CONFIG_INET
/* allocate a socket and port for it and use it */
- err = sock_create_kern(wiphy_net(&rdev->wiphy), PF_INET, SOCK_STREAM,
- IPPROTO_TCP, &cfg->sock);
+ err = __sock_create_kern(wiphy_net(&rdev->wiphy), PF_INET, SOCK_STREAM,
+ IPPROTO_TCP, &cfg->sock);
if (err) {
kfree(cfg);
return err;
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
index 3220f1d28697..a2351a92069d 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
@@ -804,8 +804,8 @@ __bpf_kfunc int bpf_kfunc_init_sock(struct init_sock_args *args)
goto out;
}
- err = sock_create_kern(current->nsproxy->net_ns, args->af, args->type,
- proto, &sock);
+ err = __sock_create_kern(current->nsproxy->net_ns, args->af, args->type,
+ proto, &sock);
if (!err)
/* Set timeout for call to kernel_connect() to prevent it from hanging,
--
2.49.0
* [PATCH v2 net-next 3/7] socket: Restore sock_create_kern().
2025-05-23 18:21 [PATCH v2 net-next 0/7] socket: Make sock_create_kern() robust against misuse Kuniyuki Iwashima
2025-05-23 18:21 ` [PATCH v2 net-next 1/7] socket: Un-export __sock_create() Kuniyuki Iwashima
2025-05-23 18:21 ` [PATCH v2 net-next 2/7] socket: Rename sock_create_kern() to __sock_create_kern() Kuniyuki Iwashima
@ 2025-05-23 18:21 ` Kuniyuki Iwashima
2025-05-26 5:32 ` Christoph Hellwig
2025-05-23 18:21 ` [PATCH v2 net-next 4/7] smb: client: Add missing net_passive_dec() Kuniyuki Iwashima
` (3 subsequent siblings)
6 siblings, 1 reply; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-23 18:21 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
Let's restore sock_create_kern() that holds a netns reference.
Now, it's the same as the version before commit 26abe14379f8 ("net:
Modify sk_alloc to not reference count the netns of kernel sockets.").
Back then, after creating a socket in init_net, we used sk_change_net()
to drop the netns ref and switch to another netns, but now we can
simply use __sock_create_kern() instead.
$ git blame -L:sk_change_net include/net/sock.h 26abe14379f8~
DEBUG_NET_WARN_ON_ONCE() catches any path that calls sock_create_kern()
from a __net_init function; doing so would leak the netns because
__net_exit functions cannot run until the socket is removed.
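For illustration only (not part of this patch), the refcount difference
between the two helpers can be modeled in userspace. This is a toy sketch:
every toy_* name is hypothetical, and it does not reflect the real layout
of struct net or struct sock, nor the real teardown path.

```c
/* Userspace toy model, NOT kernel code: every toy_* name is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_net {
	int refcnt;		/* counted references held on this netns */
};

struct toy_sock {
	struct toy_net *net;
	bool net_refcnt;	/* models sk->sk_net_refcnt */
};

/* Models __sock_create_kern(): points at @net without holding a reference. */
static int toy_create_nohold(struct toy_net *net, struct toy_sock *sk)
{
	assert(net && sk);
	sk->net = net;
	sk->net_refcnt = false;
	return 0;
}

/* Models sock_create_kern(): same creation, then take a counted reference. */
static int toy_create_hold(struct toy_net *net, struct toy_sock *sk)
{
	int err = toy_create_nohold(net, sk);

	if (!err) {
		sk->net_refcnt = true;
		net->refcnt++;
	}
	return err;
}

/* Drops the counted reference, if the socket held one. */
static void toy_release(struct toy_sock *sk)
{
	if (sk->net_refcnt)
		sk->net->refcnt--;
	sk->net = NULL;
	sk->net_refcnt = false;
}

/* The netns may be dismantled only once no counted reference remains. */
static bool toy_net_can_exit(const struct toy_net *net)
{
	return net->refcnt == 0;
}
```

In this model, a socket from toy_create_hold() keeps toy_net_can_exit()
false until it is released, which mirrors why calling sock_create_kern()
from a __net_init path would pin the netns. A socket from
toy_create_nohold() never blocks exit, so its owner has to release it
before the netns goes away.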
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
v2: s/ret/err/ in sock_create_kern() for clarity
---
include/linux/net.h | 2 ++
net/socket.c | 42 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/include/linux/net.h b/include/linux/net.h
index 12180e00f882..b60e3afab344 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -254,6 +254,8 @@ bool sock_is_registered(int family);
int sock_create(int family, int type, int proto, struct socket **res);
int __sock_create_kern(struct net *net, int family, int type, int proto,
struct socket **res);
+int sock_create_kern(struct net *net, int family, int type, int proto,
+ struct socket **res);
int sock_create_lite(int family, int type, int proto, struct socket **res);
struct socket *sock_alloc(void);
void sock_release(struct socket *sock);
diff --git a/net/socket.c b/net/socket.c
index 7c4474c966c0..9ad352183fae 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1632,6 +1632,48 @@ int __sock_create_kern(struct net *net, int family, int type, int protocol, stru
}
EXPORT_SYMBOL(__sock_create_kern);
+/**
+ * sock_create_kern - creates a socket for kernel space
+ *
+ * @net: net namespace
+ * @family: protocol family (AF_INET, ...)
+ * @type: communication type (SOCK_STREAM, ...)
+ * @protocol: protocol (0, ...)
+ * @res: new socket
+ *
+ * Creates a new socket and assigns it to @res.
+ *
+ * The socket is for kernel space and should not be exposed to
+ * userspace via a file descriptor nor BPF hooks except for LSM
+ * (see inet_create(), inet_release(), etc).
+ *
+ * The socket bypasses some LSMs that take care of @kern in
+ * security_socket_create() and security_socket_post_create().
+ *
+ * The socket holds a reference count of @net so that the caller
+ * does not need to care about @net's lifetime.
+ *
+ * This MUST NOT be called from the __net_init path and @net MUST
+ * be alive as of calling sock_create_kern().
+ *
+ * Context: Process context. This function internally uses GFP_KERNEL.
+ * Return: 0 or an error.
+ */
+int sock_create_kern(struct net *net, int family, int type, int protocol,
+ struct socket **res)
+{
+ int err;
+
+ DEBUG_NET_WARN_ON_ONCE(!net_initialized(net));
+
+ err = __sock_create(net, family, type, protocol, res, 1);
+ if (!err)
+ sk_net_refcnt_upgrade((*res)->sk);
+
+ return err;
+}
+EXPORT_SYMBOL(sock_create_kern);
+
static struct socket *__sys_socket_create(int family, int type, int protocol)
{
struct socket *sock;
--
2.49.0
* [PATCH v2 net-next 4/7] smb: client: Add missing net_passive_dec().
2025-05-23 18:21 [PATCH v2 net-next 0/7] socket: Make sock_create_kern() robust against misuse Kuniyuki Iwashima
` (2 preceding siblings ...)
2025-05-23 18:21 ` [PATCH v2 net-next 3/7] socket: Restore sock_create_kern() Kuniyuki Iwashima
@ 2025-05-23 18:21 ` Kuniyuki Iwashima
2025-05-23 18:21 ` [PATCH v2 net-next 5/7] socket: Remove kernel socket conversion except for net/rds/ Kuniyuki Iwashima
` (2 subsequent siblings)
6 siblings, 0 replies; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-23 18:21 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme, stable
While reverting commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock
after rmmod"), I should have added net_passive_dec(), which was added
between the original commit and the revert by commit 5c70eb5c593d ("net:
better track kernel sockets lifetime").
Let's call net_passive_dec() in generic_ip_connect().
Note that this commit is only needed for 6.14+.
Fixes: 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"")
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
fs/smb/client/connect.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 37a2ba38f10e..afac23a5a3ec 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3359,6 +3359,7 @@ generic_ip_connect(struct TCP_Server_Info *server)
sk = server->ssocket->sk;
__netns_tracker_free(net, &sk->ns_tracker, false);
+ net_passive_dec(net);
sk->sk_net_refcnt = 1;
get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
sock_inuse_add(net, 1);
--
2.49.0
* [PATCH v2 net-next 5/7] socket: Remove kernel socket conversion except for net/rds/.
2025-05-23 18:21 [PATCH v2 net-next 0/7] socket: Make sock_create_kern() robust against misuse Kuniyuki Iwashima
` (3 preceding siblings ...)
2025-05-23 18:21 ` [PATCH v2 net-next 4/7] smb: client: Add missing net_passive_dec() Kuniyuki Iwashima
@ 2025-05-23 18:21 ` Kuniyuki Iwashima
2025-05-26 5:33 ` Christoph Hellwig
2025-05-23 18:21 ` [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern() Kuniyuki Iwashima
2025-05-23 18:21 ` [PATCH v2 net-next 7/7] socket: Clean up kdoc for sock_create() and sock_create_lite() Kuniyuki Iwashima
6 siblings, 1 reply; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-23 18:21 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference
count the netns of kernel sockets."), TCP kernel sockets have caused
many use-after-free (UAF) bugs.
We have converted such sockets to hold netns refcnt, and we have
the same pattern in cifs, mptcp, nvme, rds, smc, and sunrpc.
__sock_create_kern(..., &sock);
sk_net_refcnt_upgrade(sock->sk);
Let's drop the conversion and use sock_create_kern() instead.
The changes for cifs, mptcp, nvme, and smc are straightforward.
For sunrpc, we call sk_net_refcnt_upgrade() for IPPROTO_TCP only,
so we use sock_create_kern() for TCP and __sock_create_kern()
for the others.
For rds, we cannot drop sk_net_refcnt_upgrade() for accept()ed
sockets.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> # net/mptcp
Acked-by: Chuck Lever <chuck.lever@oracle.com>
---
v2: Drop unnecessary change for sunrpc and updated changelog for sunrpc
---
drivers/nvme/host/tcp.c | 7 +++----
fs/smb/client/connect.c | 12 ++----------
net/mptcp/subflow.c | 7 +------
net/smc/af_smc.c | 18 ++----------------
net/sunrpc/svcsock.c | 6 ++++--
net/sunrpc/xprtsock.c | 8 ++++----
6 files changed, 16 insertions(+), 42 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3d3bdc5e280f..fabb1cc02564 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1756,9 +1756,9 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
queue->cmnd_capsule_len = sizeof(struct nvme_command) +
NVME_TCP_ADMIN_CCSZ;
- ret = __sock_create_kern(current->nsproxy->net_ns,
- ctrl->addr.ss_family, SOCK_STREAM,
- IPPROTO_TCP, &queue->sock);
+ ret = sock_create_kern(current->nsproxy->net_ns,
+ ctrl->addr.ss_family, SOCK_STREAM,
+ IPPROTO_TCP, &queue->sock);
if (ret) {
dev_err(nctrl->device,
"failed to create socket: %d\n", ret);
@@ -1771,7 +1771,6 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
goto err_destroy_mutex;
}
- sk_net_refcnt_upgrade(queue->sock->sk);
nvme_tcp_reclassify_socket(queue->sock);
/* Single syn retry */
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index afac23a5a3ec..c7b4f5a7cca1 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3348,22 +3348,14 @@ generic_ip_connect(struct TCP_Server_Info *server)
socket = server->ssocket;
} else {
struct net *net = cifs_net_ns(server);
- struct sock *sk;
- rc = __sock_create_kern(net, sfamily, SOCK_STREAM,
- IPPROTO_TCP, &server->ssocket);
+ rc = sock_create_kern(net, sfamily, SOCK_STREAM,
+ IPPROTO_TCP, &server->ssocket);
if (rc < 0) {
cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
return rc;
}
- sk = server->ssocket->sk;
- __netns_tracker_free(net, &sk->ns_tracker, false);
- net_passive_dec(net);
- sk->sk_net_refcnt = 1;
- get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
- sock_inuse_add(net, 1);
-
/* BB other socket options to set KEEPALIVE, NODELAY? */
cifs_dbg(FYI, "Socket created\n");
socket = server->ssocket;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 602e689e991f..00e5cecb7683 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1757,7 +1757,7 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
if (unlikely(!sk->sk_socket))
return -EINVAL;
- err = __sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
+ err = sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP, &sf);
if (err)
return err;
@@ -1770,11 +1770,6 @@ int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
/* the newly created socket has to be in the same cgroup as its parent */
mptcp_attach_cgroup(sk, sf->sk);
- /* kernel sockets do not by default acquire net ref, but TCP timer
- * needs it.
- * Update ns_tracker to current stack trace and refcounted tracker.
- */
- sk_net_refcnt_upgrade(sf->sk);
err = tcp_set_ulp(sf->sk, "mptcp");
if (err)
goto err_free;
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index d998ffed1712..6140a9e386d0 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -3328,22 +3328,8 @@ static const struct proto_ops smc_sock_ops = {
int smc_create_clcsk(struct net *net, struct sock *sk, int family)
{
- struct smc_sock *smc = smc_sk(sk);
- int rc;
-
- rc = __sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
- &smc->clcsock);
- if (rc)
- return rc;
-
- /* smc_clcsock_release() does not wait smc->clcsock->sk's
- * destruction; its sk_state might not be TCP_CLOSE after
- * smc->sk is close()d, and TCP timers can be fired later,
- * which need net ref.
- */
- sk = smc->clcsock->sk;
- sk_net_refcnt_upgrade(sk);
- return 0;
+ return sock_create_kern(net, family, SOCK_STREAM, IPPROTO_TCP,
+ &smc_sk(sk)->clcsock);
}
static int __smc_create(struct net *net, struct socket *sock, int protocol,
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index adacfd03153a..10d83a03ccfa 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1516,7 +1516,10 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
return ERR_PTR(-EINVAL);
}
- error = __sock_create_kern(net, family, type, protocol, &sock);
+ if (protocol == IPPROTO_TCP)
+ error = sock_create_kern(net, family, type, protocol, &sock);
+ else
+ error = __sock_create_kern(net, family, type, protocol, &sock);
if (error < 0)
return ERR_PTR(error);
@@ -1541,7 +1544,6 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv,
newlen = error;
if (protocol == IPPROTO_TCP) {
- sk_net_refcnt_upgrade(sock->sk);
if ((error = kernel_listen(sock, 64)) < 0)
goto bummer;
}
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 6fb921ce6cf2..f9576bd8f9c5 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1924,7 +1924,10 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
struct socket *sock;
int err;
- err = __sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
+ if (protocol == IPPROTO_TCP)
+ err = sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
+ else
+ err = __sock_create_kern(xprt->xprt_net, family, type, protocol, &sock);
if (err < 0) {
dprintk("RPC: can't create %d transport socket (%d).\n",
protocol, -err);
@@ -1941,9 +1944,6 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt,
goto out;
}
- if (protocol == IPPROTO_TCP)
- sk_net_refcnt_upgrade(sock->sk);
-
filp = sock_alloc_file(sock, O_NONBLOCK, NULL);
if (IS_ERR(filp))
return ERR_CAST(filp);
--
2.49.0
* [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern().
2025-05-23 18:21 [PATCH v2 net-next 0/7] socket: Make sock_create_kern() robust against misuse Kuniyuki Iwashima
` (4 preceding siblings ...)
2025-05-23 18:21 ` [PATCH v2 net-next 5/7] socket: Remove kernel socket conversion except for net/rds/ Kuniyuki Iwashima
@ 2025-05-23 18:21 ` Kuniyuki Iwashima
2025-05-26 5:33 ` Christoph Hellwig
2025-05-26 5:35 ` Christoph Hellwig
2025-05-23 18:21 ` [PATCH v2 net-next 7/7] socket: Clean up kdoc for sock_create() and sock_create_lite() Kuniyuki Iwashima
6 siblings, 2 replies; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-23 18:21 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
With the sole exception of sctp_do_peeloff(), sockets created by
drivers and filesystems are neither tied to userspace processes nor
exposed via file descriptors.
Let's use sock_create_kern() for such in-kernel use cases as CIFS
client and NFS.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
drivers/infiniband/hw/erdma/erdma_cm.c | 6 ++++--
drivers/infiniband/sw/siw/siw_cm.c | 6 ++++--
drivers/isdn/mISDN/l1oip_core.c | 3 ++-
drivers/nvme/target/tcp.c | 5 +++--
drivers/target/iscsi/iscsi_target_login.c | 7 ++++---
drivers/xen/pvcalls-back.c | 6 ++++--
fs/ocfs2/cluster/tcp.c | 8 +++++---
fs/smb/server/transport_tcp.c | 7 ++++---
8 files changed, 30 insertions(+), 18 deletions(-)
diff --git a/drivers/infiniband/hw/erdma/erdma_cm.c b/drivers/infiniband/hw/erdma/erdma_cm.c
index e0acc185e719..cec758cec7fd 100644
--- a/drivers/infiniband/hw/erdma/erdma_cm.c
+++ b/drivers/infiniband/hw/erdma/erdma_cm.c
@@ -1026,7 +1026,8 @@ int erdma_connect(struct iw_cm_id *id, struct iw_cm_conn_param *params)
return -ENOENT;
erdma_qp_get(qp);
- ret = sock_create(AF_INET, SOCK_STREAM, IPPROTO_TCP, &s);
+ ret = sock_create_kern(current->nsproxy->net_ns,
+ AF_INET, SOCK_STREAM, IPPROTO_TCP, &s);
if (ret < 0)
goto error_put_qp;
@@ -1305,7 +1306,8 @@ int erdma_create_listen(struct iw_cm_id *id, int backlog)
if (addr_family != AF_INET)
return -EAFNOSUPPORT;
- ret = sock_create(addr_family, SOCK_STREAM, IPPROTO_TCP, &s);
+ ret = sock_create_kern(current->nsproxy->net_ns,
+ addr_family, SOCK_STREAM, IPPROTO_TCP, &s);
if (ret < 0)
return ret;
diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index 708b13993fdf..bea948640aba 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -1391,7 +1391,8 @@ int siw_connect(struct iw_cm_id *id, struct iw_cm_conn_param *params)
siw_dbg_qp(qp, "pd_len %d, laddr %pISp, raddr %pISp\n", pd_len, laddr,
raddr);
- rv = sock_create(v4 ? AF_INET : AF_INET6, SOCK_STREAM, IPPROTO_TCP, &s);
+ rv = sock_create_kern(current->nsproxy->net_ns,
+ v4 ? AF_INET : AF_INET6, SOCK_STREAM, IPPROTO_TCP, &s);
if (rv < 0)
goto error;
@@ -1767,7 +1768,8 @@ int siw_create_listen(struct iw_cm_id *id, int backlog)
if (addr_family != AF_INET && addr_family != AF_INET6)
return -EAFNOSUPPORT;
- rv = sock_create(addr_family, SOCK_STREAM, IPPROTO_TCP, &s);
+ rv = sock_create_kern(current->nsproxy->net_ns,
+ addr_family, SOCK_STREAM, IPPROTO_TCP, &s);
if (rv < 0)
return rv;
diff --git a/drivers/isdn/mISDN/l1oip_core.c b/drivers/isdn/mISDN/l1oip_core.c
index a5ad88a960d0..1451ec859a32 100644
--- a/drivers/isdn/mISDN/l1oip_core.c
+++ b/drivers/isdn/mISDN/l1oip_core.c
@@ -659,7 +659,8 @@ l1oip_socket_thread(void *data)
allow_signal(SIGTERM);
/* create socket */
- if (sock_create(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &socket)) {
+ if (sock_create_kern(current->nsproxy->net_ns,
+ PF_INET, SOCK_DGRAM, IPPROTO_UDP, &socket)) {
printk(KERN_ERR "%s: Failed to create socket.\n", __func__);
ret = -EIO;
goto fail;
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 12a5cb8641ca..4e499df746f4 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -2078,8 +2078,9 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
if (port->nport->inline_data_size < 0)
port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
- ret = sock_create(port->addr.ss_family, SOCK_STREAM,
- IPPROTO_TCP, &port->sock);
+ ret = sock_create_kern(current->nsproxy->net_ns,
+ port->addr.ss_family, SOCK_STREAM,
+ IPPROTO_TCP, &port->sock);
if (ret) {
pr_err("failed to create a socket\n");
goto err_port;
diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index c2ac9a99ebbb..c085a3aaca6e 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -796,10 +796,11 @@ int iscsit_setup_np(
return -EINVAL;
}
- ret = sock_create(sockaddr->ss_family, np->np_sock_type,
- np->np_ip_proto, &sock);
+ ret = sock_create_kern(current->nsproxy->net_ns,
+ sockaddr->ss_family, np->np_sock_type,
+ np->np_ip_proto, &sock);
if (ret < 0) {
- pr_err("sock_create() failed.\n");
+ pr_err("sock_create_kern() failed.\n");
return ret;
}
np->np_socket = sock;
diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index fd7ed65e0197..c404678e1924 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -406,7 +406,8 @@ static int pvcalls_back_connect(struct xenbus_device *dev,
sa->sa_family != AF_INET)
goto out;
- ret = sock_create(AF_INET, SOCK_STREAM, 0, &sock);
+ ret = sock_create_kern(current->nsproxy->net_ns,
+ AF_INET, SOCK_STREAM, 0, &sock);
if (ret < 0)
goto out;
ret = inet_stream_connect(sock, sa, req->u.connect.len, 0);
@@ -646,7 +647,8 @@ static int pvcalls_back_bind(struct xenbus_device *dev,
goto out;
}
- ret = sock_create(AF_INET, SOCK_STREAM, 0, &map->sock);
+ ret = sock_create_kern(current->nsproxy->net_ns,
+ AF_INET, SOCK_STREAM, 0, &map->sock);
if (ret < 0)
goto out;
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index fce9beb214f0..491916662561 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -1558,7 +1558,7 @@ static void o2net_start_connect(struct work_struct *work)
unsigned int nofs_flag;
/*
- * sock_create allocates the sock with GFP_KERNEL. We must
+ * sock_create_kern() allocates the sock with GFP_KERNEL. We must
* prevent the filesystem from being reentered by memory reclaim.
*/
nofs_flag = memalloc_nofs_save();
@@ -1600,7 +1600,8 @@ static void o2net_start_connect(struct work_struct *work)
goto out;
}
- ret = sock_create(PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
+ ret = sock_create_kern(current->nsproxy->net_ns,
+ PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
if (ret < 0) {
mlog(0, "can't create socket: %d\n", ret);
goto out;
@@ -1984,7 +1985,8 @@ static int o2net_open_listening_sock(__be32 addr, __be16 port)
.sin_port = port,
};
- ret = sock_create(PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
+ ret = sock_create_kern(current->nsproxy->net_ns,
+ PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
if (ret < 0) {
printk(KERN_ERR "o2net: Error %d while creating socket\n", ret);
goto out;
diff --git a/fs/smb/server/transport_tcp.c b/fs/smb/server/transport_tcp.c
index abedf510899a..e1e9cbe5742f 100644
--- a/fs/smb/server/transport_tcp.c
+++ b/fs/smb/server/transport_tcp.c
@@ -427,18 +427,19 @@ static void tcp_destroy_socket(struct socket *ksmbd_socket)
*/
static int create_socket(struct interface *iface)
{
+ struct net *net = current->nsproxy->net_ns;
int ret;
struct sockaddr_in6 sin6;
struct sockaddr_in sin;
struct socket *ksmbd_socket;
bool ipv4 = false;
- ret = sock_create(PF_INET6, SOCK_STREAM, IPPROTO_TCP, &ksmbd_socket);
+ ret = sock_create_kern(net, PF_INET6, SOCK_STREAM, IPPROTO_TCP, &ksmbd_socket);
if (ret) {
if (ret != -EAFNOSUPPORT)
pr_err("Can't create socket for ipv6, fallback to ipv4: %d\n", ret);
- ret = sock_create(PF_INET, SOCK_STREAM, IPPROTO_TCP,
- &ksmbd_socket);
+ ret = sock_create_kern(net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
+ &ksmbd_socket);
if (ret) {
pr_err("Can't create socket for ipv4: %d\n", ret);
goto out_clear;
--
2.49.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 net-next 7/7] socket: Clean up kdoc for sock_create() and sock_create_lite().
2025-05-23 18:21 [PATCH v2 net-next 0/7] socket: Make sock_create_kern() robust against misuse Kuniyuki Iwashima
` (5 preceding siblings ...)
2025-05-23 18:21 ` [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern() Kuniyuki Iwashima
@ 2025-05-23 18:21 ` Kuniyuki Iwashima
6 siblings, 0 replies; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-23 18:21 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn
Cc: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
__sock_create() is now static and the same doc exists on sock_create()
and sock_create_kern().
Also, __sock_create() says "On failure @res is set to %NULL.", but
this is always false.
In addition, the old style kdoc is a bit corrupted and we can't see the
DESCRIPTION section:
$ scripts/kernel-doc -man net/socket.c | scripts/split-man.pl /tmp/man
$ man /tmp/man/sock_create.9
Let's clean them up.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/socket.c | 58 ++++++++++++++++++++++------------------------------
1 file changed, 25 insertions(+), 33 deletions(-)
diff --git a/net/socket.c b/net/socket.c
index 9ad352183fae..e4e9f5cc5d70 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1315,18 +1315,20 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
}
/**
- * sock_create_lite - creates a socket
- * @family: protocol family (AF_INET, ...)
- * @type: communication type (SOCK_STREAM, ...)
- * @protocol: protocol (0, ...)
- * @res: new socket
+ * sock_create_lite - creates a socket
*
- * Creates a new socket and assigns it to @res, passing through LSM.
- * The new socket initialization is not complete, see kernel_accept().
- * Returns 0 or an error. On failure @res is set to %NULL.
- * This function internally uses GFP_KERNEL.
+ * @family: protocol family (AF_INET, ...)
+ * @type: communication type (SOCK_STREAM, ...)
+ * @protocol: protocol (0, ...)
+ * @res: new socket
+ *
+ * Creates a new socket and assigns it to @res, passing through LSM.
+ *
+ * The new socket initialization is not complete, see kernel_accept().
+ *
+ * Context: Process context. This function internally uses GFP_KERNEL.
+ * Return: 0 or an error. On failure @res is set to %NULL.
*/
-
int sock_create_lite(int family, int type, int protocol, struct socket **res)
{
int err;
@@ -1452,21 +1454,6 @@ int sock_wake_async(struct socket_wq *wq, int how, int band)
}
EXPORT_SYMBOL(sock_wake_async);
-/**
- * __sock_create - creates a socket
- * @net: net namespace
- * @family: protocol family (AF_INET, ...)
- * @type: communication type (SOCK_STREAM, ...)
- * @protocol: protocol (0, ...)
- * @res: new socket
- * @kern: boolean for kernel space sockets
- *
- * Creates a new socket and assigns it to @res, passing through LSM.
- * Returns 0 or an error. On failure @res is set to %NULL. @kern must
- * be set to true if the socket resides in kernel space.
- * This function internally uses GFP_KERNEL.
- */
-
static int __sock_create(struct net *net, int family, int type, int protocol,
struct socket **res, int kern)
{
@@ -1583,16 +1570,21 @@ static int __sock_create(struct net *net, int family, int type, int protocol,
}
/**
- * sock_create - creates a socket
- * @family: protocol family (AF_INET, ...)
- * @type: communication type (SOCK_STREAM, ...)
- * @protocol: protocol (0, ...)
- * @res: new socket
+ * sock_create - creates a socket for userspace
+ *
+ * @family: protocol family (AF_INET, ...)
+ * @type: communication type (SOCK_STREAM, ...)
+ * @protocol: protocol (0, ...)
+ * @res: new socket
*
- * A wrapper around __sock_create().
- * Returns 0 or an error. This function internally uses GFP_KERNEL.
+ * Creates a new socket and assigns it to @res, passing through LSM.
+ *
+ * The socket is for userspace and should be exposed via a file
+ * descriptor and BPF hooks (see inet_create(), inet_release(), etc).
+ *
+ * Context: Process context. This function internally uses GFP_KERNEL.
+ * Return: 0 or an error.
*/
-
int sock_create(int family, int type, int protocol, struct socket **res)
{
return __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0);
--
2.49.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 1/7] socket: Un-export __sock_create().
2025-05-23 18:21 ` [PATCH v2 net-next 1/7] socket: Un-export __sock_create() Kuniyuki Iwashima
@ 2025-05-26 5:29 ` Christoph Hellwig
2025-05-26 10:06 ` David Laight
2025-05-30 2:42 ` Kuniyuki Iwashima
0 siblings, 2 replies; 32+ messages in thread
From: Christoph Hellwig @ 2025-05-26 5:29 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, Simon Horman, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
On Fri, May 23, 2025 at 11:21:07AM -0700, Kuniyuki Iwashima wrote:
> Since commit eeb1bd5c40ed ("net: Add a struct net parameter to
> sock_create_kern"), we no longer need to export __sock_create()
> and can replace all non-core users with sock_create_kern().
>
> Let's convert them and un-export __sock_create().
The changes look good, but the commit log including subject line
is rather confusing. What you do is to replace all uses of
__sock_create with sock_create_kern, which works because
sock_create_kern just calls __sock_create with the last argument set
to 1 as those callers do it. This then allows marking __sock_create
static because all outside users are gone.
Please state that, i.e.
Subject: use sock_create_kern instead of open-coding it
Replace all callers of __sock_create that set the kernel argument to 1
with sock_create_kern, which is the improved interface for that.
Mark __sock_create static now that all users outside of socket.c
are gone.
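To make the relationship described above concrete, here is a minimal
userspace sketch, not kernel code. The model_ prefixed names and the
simplified struct net / struct socket are assumptions made for
illustration; only the shape of the wrapper (forwarding the call with
the last argument set to 1) comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; not the real definitions. */
struct net { int id; };
struct socket { struct net *net; bool kern; };

/* Single static slot keeps the sketch allocation-free (assumption). */
static struct socket model_sock_storage;

/* Model of the core helper that becomes static: @kern marks a kernel socket. */
static int model___sock_create(struct net *net, int family, int type,
			       int protocol, struct socket **res, int kern)
{
	(void)family;
	(void)type;
	(void)protocol;

	assert(res != NULL);
	model_sock_storage.net = net;
	model_sock_storage.kern = (kern != 0);
	*res = &model_sock_storage;
	return 0;
}

/* Model of the wrapper: the same call with the last argument fixed to 1. */
int model_sock_create_kern(struct net *net, int family, int type,
			   int protocol, struct socket **res)
{
	return model___sock_create(net, family, type, protocol, res, 1);
}
```

Because outside callers only need the wrapper in this model, the core
helper can be file-local, which mirrors the "mark it static" step.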
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 2/7] socket: Rename sock_create_kern() to __sock_create_kern().
2025-05-23 18:21 ` [PATCH v2 net-next 2/7] socket: Rename sock_create_kern() to __sock_create_kern() Kuniyuki Iwashima
@ 2025-05-26 5:30 ` Christoph Hellwig
2025-05-29 21:29 ` David Laight
2025-05-30 2:45 ` Kuniyuki Iwashima
0 siblings, 2 replies; 32+ messages in thread
From: Christoph Hellwig @ 2025-05-26 5:30 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, Simon Horman, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
On Fri, May 23, 2025 at 11:21:08AM -0700, Kuniyuki Iwashima wrote:
> Let's rename sock_create_kern() to __sock_create_kern() as a special
> API and add a fat documentation.
>
> The next patch will add sock_create_kern() that holds netns refcnt.
Maybe do this before patch 1 to reduce the churn of just touching a
lot of the same callers again?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 3/7] socket: Restore sock_create_kern().
2025-05-23 18:21 ` [PATCH v2 net-next 3/7] socket: Restore sock_create_kern() Kuniyuki Iwashima
@ 2025-05-26 5:32 ` Christoph Hellwig
2025-05-30 2:53 ` Kuniyuki Iwashima
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-05-26 5:32 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, Simon Horman, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
On Fri, May 23, 2025 at 11:21:09AM -0700, Kuniyuki Iwashima wrote:
> Let's restore sock_create_kern() that holds a netns reference.
>
> Now, it's the same as the version before commit 26abe14379f8 ("net:
> Modify sk_alloc to not reference count the netns of kernel sockets.").
>
> Back then, after creating a socket in init_net, we used sk_change_net()
> to drop the netns ref and switch to another netns, but now we can
> simply use __sock_create_kern() instead.
>
> $ git blame -L:sk_change_net include/net/sock.h 26abe14379f8~
>
> DEBUG_NET_WARN_ON_ONCE() is to catch a path calling sock_create_kern()
> from __net_init functions, since doing so would leak the netns as
> __net_exit functions cannot run until the socket is removed.
Is reusing the name as the old sock_create_kern a good idea? It can
lead to bugs by people used to the old semantics. It's also
not really an all that descriptive name for either variant. I'm
not really a net stack or namespace expert, but maybe we can come
up with more descriptive version for both this new sock_create_kern
and the old sock_create_kern/__sock_create_kern?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 5/7] socket: Remove kernel socket conversion except for net/rds/.
2025-05-23 18:21 ` [PATCH v2 net-next 5/7] socket: Remove kernel socket conversion except for net/rds/ Kuniyuki Iwashima
@ 2025-05-26 5:33 ` Christoph Hellwig
2025-05-30 2:59 ` Kuniyuki Iwashima
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-05-26 5:33 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, Simon Horman, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
On Fri, May 23, 2025 at 11:21:11AM -0700, Kuniyuki Iwashima wrote:
> Let's drop the conversion and use sock_create_kern() instead.
Please send a patch per subsystem that is converted to make the
commit log better and help with bisectability.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern().
2025-05-23 18:21 ` [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern() Kuniyuki Iwashima
@ 2025-05-26 5:33 ` Christoph Hellwig
2025-05-26 5:35 ` Christoph Hellwig
1 sibling, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2025-05-26 5:33 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, Simon Horman, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
On Fri, May 23, 2025 at 11:21:12AM -0700, Kuniyuki Iwashima wrote:
> Except for only one user, sctp_do_peeloff(), all sockets created
> by drivers and fs are not tied to userspace processes nor exposed
> via file descriptors.
>
> Let's use sock_create_kern() for such in-kernel use cases as CIFS
> client and NFS.
Same thing, one patch per subsystem please.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern().
2025-05-23 18:21 ` [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern() Kuniyuki Iwashima
2025-05-26 5:33 ` Christoph Hellwig
@ 2025-05-26 5:35 ` Christoph Hellwig
2025-05-30 3:03 ` Kuniyuki Iwashima
1 sibling, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-05-26 5:35 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, Simon Horman, Kuniyuki Iwashima, Chuck Lever,
Jeff Layton, Matthieu Baerts, Keith Busch, Jens Axboe,
Christoph Hellwig, Wenjia Zhang, Jan Karcher, Steve French,
netdev, mptcp, linux-nfs, linux-rdma, linux-nvme
On Fri, May 23, 2025 at 11:21:12AM -0700, Kuniyuki Iwashima wrote:
> Except for only one user, sctp_do_peeloff(), all sockets created
> by drivers and fs are not tied to userspace processes nor exposed
> via file descriptors.
>
> Let's use sock_create_kern() for such in-kernel use cases as CIFS
> client and NFS.
So if sock_create is now almost unused and the special case, should
it also be renamed to make that explicit and make people not accidentally
use it by default?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 1/7] socket: Un-export __sock_create().
2025-05-26 5:29 ` Christoph Hellwig
@ 2025-05-26 10:06 ` David Laight
2025-05-30 2:42 ` Kuniyuki Iwashima
1 sibling, 0 replies; 32+ messages in thread
From: David Laight @ 2025-05-26 10:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Willem de Bruijn, Simon Horman, Kuniyuki Iwashima,
Chuck Lever, Jeff Layton, Matthieu Baerts, Keith Busch,
Jens Axboe, Wenjia Zhang, Jan Karcher, Steve French, netdev,
mptcp, linux-nfs, linux-rdma, linux-nvme
On Mon, 26 May 2025 07:29:07 +0200
Christoph Hellwig <hch@lst.de> wrote:
> On Fri, May 23, 2025 at 11:21:07AM -0700, Kuniyuki Iwashima wrote:
> > Since commit eeb1bd5c40ed ("net: Add a struct net parameter to
> > sock_create_kern"), we no longer need to export __sock_create()
> > and can replace all non-core users with sock_create_kern().
> >
> > Let's convert them and un-export __sock_create().
>
> The changes look good, but the commit log including subject line
> is rather confusing. What you do is to replace all uses of
> __sock_create with sock_create_kern, which works because
> sock_create_kern just calls __sock_create with the last argument set
> to 1 as those callers do it. This then allows marking __sock_create
> static because all outside users are gone.
>
> Please state that, i.e.
>
> Subject: use sock_create_kern instead of open-coding it
>
> Replace all callers of __sock_create that set the kernel argument to 1
> with sock_create_kern, which is the improved interface for that.
> Mark __sock_create static now that all users outside of socket.c
> are gone.
I'd also like to see an explicit statement on all these patches
about whether the created sockets hold a reference to the namespace.
I know it is documented in the function definitions, but the issue
has always been that the callers get it wrong.
From what I remember, at this point in the patch series sock_create_kern()
doesn't hold a reference, but by the end of the series it does.
That just has to be a recipe for disaster and pretty much requires the
changes all go through the same tree in one merge window.
But the code touches multiple areas and the changes would normally go through
multiple trees.
So it's going to be hard to get all the acks and the patch accepted.
(Unless you persuade Linus to 'just apply the changes'.)
I think you need to look at three merge windows.
1) Add new function(s) for creating user/kernel sockets with/without holding
a namespace reference.
2) Update all the callers to use the new functions.
3) Delete the old functions.
There is no point modifying the callers twice, and the commits need to
explicitly state whether they want the namespace held or not.
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 2/7] socket: Rename sock_create_kern() to __sock_create_kern().
2025-05-26 5:30 ` Christoph Hellwig
@ 2025-05-29 21:29 ` David Laight
2025-05-30 3:05 ` Kuniyuki Iwashima
2025-05-30 2:45 ` Kuniyuki Iwashima
1 sibling, 1 reply; 32+ messages in thread
From: David Laight @ 2025-05-29 21:29 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Willem de Bruijn, Simon Horman, Kuniyuki Iwashima,
Chuck Lever, Jeff Layton, Matthieu Baerts, Keith Busch,
Jens Axboe, Wenjia Zhang, Jan Karcher, Steve French, netdev,
mptcp, linux-nfs, linux-rdma, linux-nvme
On Mon, 26 May 2025 07:30:13 +0200
Christoph Hellwig <hch@lst.de> wrote:
> On Fri, May 23, 2025 at 11:21:08AM -0700, Kuniyuki Iwashima wrote:
> > Let's rename sock_create_kern() to __sock_create_kern() as a special
> > API and add a fat documentation.
> >
> > The next patch will add sock_create_kern() that holds netns refcnt.
>
> Maybe do this before patch 1 to reduce the churn of just touching a
> lot of the same callers again?
You also really want untouched source files to fail to compile.
If nothing else it'll stop backports going badly awry.
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 1/7] socket: Un-export __sock_create().
2025-05-26 5:29 ` Christoph Hellwig
2025-05-26 10:06 ` David Laight
@ 2025-05-30 2:42 ` Kuniyuki Iwashima
1 sibling, 0 replies; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-30 2:42 UTC (permalink / raw)
To: hch
Cc: axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton, kbusch,
kuba, kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma,
matttbe, mptcp, netdev, pabeni, sfrench, wenjia, willemb
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 26 May 2025 07:29:07 +0200
> On Fri, May 23, 2025 at 11:21:07AM -0700, Kuniyuki Iwashima wrote:
> > Since commit eeb1bd5c40ed ("net: Add a struct net parameter to
> > sock_create_kern"), we no longer need to export __sock_create()
> > and can replace all non-core users with sock_create_kern().
> >
> > Let's convert them and un-export __sock_create().
>
> The changes look good, but the commit log including subject line
> is rather confusing. What you do is to replace all uses of
> __sock_create with sock_create_kern, which works because
> sock_create_kern just calls __sock_create with the last argument set
> to 1 as those callers do it. This then allows marking __sock_create
> static because all outside users are gone.
>
> Please state that, i.e.
Will do so.
Thanks!
>
> Subject: use sock_create_kern instead of open-coding it
>
> Replace all callers of __sock_create that set the kernel argument to 1
> with sock_create_kern, which is the improved interface for that.
> Mark __sock_create static now that all users outside of socket.c
> are gone.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 2/7] socket: Rename sock_create_kern() to __sock_create_kern().
2025-05-26 5:30 ` Christoph Hellwig
2025-05-29 21:29 ` David Laight
@ 2025-05-30 2:45 ` Kuniyuki Iwashima
1 sibling, 0 replies; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-30 2:45 UTC (permalink / raw)
To: hch
Cc: axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton, kbusch,
kuba, kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma,
matttbe, mptcp, netdev, pabeni, sfrench, wenjia, willemb
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 26 May 2025 07:30:13 +0200
> On Fri, May 23, 2025 at 11:21:08AM -0700, Kuniyuki Iwashima wrote:
> > Let's rename sock_create_kern() to __sock_create_kern() as a special
> > API and add a fat documentation.
> >
> > The next patch will add sock_create_kern() that holds netns refcnt.
>
> Maybe do this before patch 1 to reduce the churn of just touching a
> lot of the same callers again?
Makes sense, will do.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 3/7] socket: Restore sock_create_kern().
2025-05-26 5:32 ` Christoph Hellwig
@ 2025-05-30 2:53 ` Kuniyuki Iwashima
2025-06-02 5:08 ` Christoph Hellwig
0 siblings, 1 reply; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-30 2:53 UTC (permalink / raw)
To: hch
Cc: axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton, kbusch,
kuba, kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma,
matttbe, mptcp, netdev, pabeni, sfrench, wenjia, willemb
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 26 May 2025 07:32:27 +0200
> On Fri, May 23, 2025 at 11:21:09AM -0700, Kuniyuki Iwashima wrote:
> > Let's restore sock_create_kern() that holds a netns reference.
> >
> > Now, it's the same as the version before commit 26abe14379f8 ("net:
> > Modify sk_alloc to not reference count the netns of kernel sockets.").
> >
> > Back then, after creating a socket in init_net, we used sk_change_net()
> > to drop the netns ref and switch to another netns, but now we can
> > simply use __sock_create_kern() instead.
> >
> > $ git blame -L:sk_change_net include/net/sock.h 26abe14379f8~
> >
> > DEBUG_NET_WARN_ON_ONCE() is to catch a path calling sock_create_kern()
> > from __net_init functions, since doing so would leak the netns as
> > __net_exit functions cannot run until the socket is removed.
>
> Is reusing the name as the old sock_create_kern a good idea? It can
> lead to bugs by people used to the old semantics.
In the old days, sock_create_kern() did take a ref to netns,
but an implicit change that avoids taking the ref has caused
a lot of problems for people who were used to the old semantics.
This series rather rolls back the change, so I think using
the same name here is better than leaving the catchy
sock_create_kern() error-prone.
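As a rough illustration of the two semantics discussed here, the
following is a userspace model only, not the kernel implementation.
All names (model_create_no_ref(), model_create_hold_ref(), the refcnt
and net_refcnt fields) are hypothetical stand-ins; the only point is
that one variant pins the namespace and the other does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; field names are illustrative, not the kernel's. */
struct net { int refcnt; };
struct sock { struct net *net; bool net_refcnt; };

/* Model of the special-purpose variant: the socket points at the
 * namespace but does not keep it alive. */
void model_create_no_ref(struct net *net, struct sock *sk)
{
	sk->net = net;
	sk->net_refcnt = false;
}

/* Model of the restored variant: the socket pins the namespace. */
void model_create_hold_ref(struct net *net, struct sock *sk)
{
	sk->net = net;
	sk->net_refcnt = true;
	net->refcnt++;
}

/* Releasing a socket drops the reference only if the socket took one. */
void model_release(struct sock *sk)
{
	if (sk->net_refcnt) {
		assert(sk->net->refcnt > 0);
		sk->net->refcnt--;
	}
	sk->net = NULL;
}

/* In this model the namespace may go away once nobody holds a reference. */
bool model_netns_can_be_freed(const struct net *net)
{
	return net->refcnt == 0;
}
```

In this model, a socket created without a reference can outlive the
namespace it points at, which is the class of use-after-free the
holding variant is meant to rule out.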
> It's also
> not really an all that descriptive name for either variant. I'm
> not really a net stack or namespace expert, but maybe we can come
> up with more descriptive version for both this new sock_create_kern
> and the old sock_create_kern/__sock_create_kern?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 5/7] socket: Remove kernel socket conversion except for net/rds/.
2025-05-26 5:33 ` Christoph Hellwig
@ 2025-05-30 2:59 ` Kuniyuki Iwashima
2025-06-02 5:08 ` Christoph Hellwig
0 siblings, 1 reply; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-30 2:59 UTC (permalink / raw)
To: hch
Cc: axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton, kbusch,
kuba, kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma,
matttbe, mptcp, netdev, pabeni, sfrench, wenjia, willemb
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 26 May 2025 07:33:21 +0200
> On Fri, May 23, 2025 at 11:21:11AM -0700, Kuniyuki Iwashima wrote:
> > Let's drop the conversion and use sock_create_kern() instead.
>
> Please send a patch per subsystem that is converted to make the
> commit log better and help with bisectability.
Do you mean splitting this patch into per-subsystem patches
within the same series or sending non-netdev patches separately?
The former is fine, but I think this change should be done in
the same series as the main goal of the series is the changes
in this patch, making kernel TCP sockets hold netns ref in a
single place.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern().
2025-05-26 5:35 ` Christoph Hellwig
@ 2025-05-30 3:03 ` Kuniyuki Iwashima
2025-06-02 5:09 ` Christoph Hellwig
0 siblings, 1 reply; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-30 3:03 UTC (permalink / raw)
To: hch
Cc: axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton, kbusch,
kuba, kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma,
matttbe, mptcp, netdev, pabeni, sfrench, wenjia, willemb
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 26 May 2025 07:35:55 +0200
> On Fri, May 23, 2025 at 11:21:12AM -0700, Kuniyuki Iwashima wrote:
> > Except for only one user, sctp_do_peeloff(), all sockets created
> > by drivers and fs are not tied to userspace processes nor exposed
> > via file descriptors.
> >
> > Let's use sock_create_kern() for such in-kernel use cases as CIFS
> > client and NFS.
>
> So if sock_create is now almost unused and the special case, should
> it also be renamed to make that explicit and make people not accidentally
> use it by default?
I actually tried to do so as sock_create_user() in the
previous series but was advised to avoid rename as the benefit
against LoC was low.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 2/7] socket: Rename sock_create_kern() to __sock_create_kern().
2025-05-29 21:29 ` David Laight
@ 2025-05-30 3:05 ` Kuniyuki Iwashima
2025-05-30 6:48 ` David Laight
0 siblings, 1 reply; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-30 3:05 UTC (permalink / raw)
To: david.laight.linux
Cc: axboe, chuck.lever, davem, edumazet, hch, horms, jaka, jlayton,
kbusch, kuba, kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma,
matttbe, mptcp, netdev, pabeni, sfrench, wenjia, willemb
From: David Laight <david.laight.linux@gmail.com>
Date: Thu, 29 May 2025 22:29:11 +0100
> On Mon, 26 May 2025 07:30:13 +0200
> Christoph Hellwig <hch@lst.de> wrote:
>
> > On Fri, May 23, 2025 at 11:21:08AM -0700, Kuniyuki Iwashima wrote:
> > > Let's rename sock_create_kern() to __sock_create_kern() as a special
> > > API and add a fat documentation.
> > >
> > > The next patch will add sock_create_kern() that holds netns refcnt.
> >
> > Maybe do this before patch 1 to reduce the churn of just touching a
> > lot of the same callers again?
>
> You also really want untouched source files to fail to compile.
> If nothing else it'll stop backports going badly awry.
I didn't get what you wanted to say, but I remember the series
passed make all{yes,mod}config.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 2/7] socket: Rename sock_create_kern() to __sock_create_kern().
2025-05-30 3:05 ` Kuniyuki Iwashima
@ 2025-05-30 6:48 ` David Laight
0 siblings, 0 replies; 32+ messages in thread
From: David Laight @ 2025-05-30 6:48 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: axboe, chuck.lever, davem, edumazet, hch, horms, jaka, jlayton,
kbusch, kuba, kuniyu, linux-nfs, linux-nvme, linux-rdma, matttbe,
mptcp, netdev, pabeni, sfrench, wenjia, willemb
On Thu, 29 May 2025 20:05:32 -0700
Kuniyuki Iwashima <kuni1840@gmail.com> wrote:
> From: David Laight <david.laight.linux@gmail.com>
> Date: Thu, 29 May 2025 22:29:11 +0100
> > On Mon, 26 May 2025 07:30:13 +0200
> > Christoph Hellwig <hch@lst.de> wrote:
> >
> > > On Fri, May 23, 2025 at 11:21:08AM -0700, Kuniyuki Iwashima wrote:
> > > > Let's rename sock_create_kern() to __sock_create_kern() as a special
> > > > API and add a fat documentation.
> > > >
> > > > The next patch will add sock_create_kern() that holds netns refcnt.
> > >
> > > Maybe do this before patch 1 to reduce the churn of just touching a
> > > lot of the same callers again?
> >
> > You also really want untouched source files to fail to compile.
> > If nothing else it'll stop backports going badly awry.
>
> I didn't get what you wanted to say, but I remember the series
> passed make all{yes,mod}config.
One effect of the series seems to be changing sock_create_kern()
so that it 'holds' the network namespace.
Now if I backport one of the changed files to an old kernel version
it will still compile but won't work properly.
(Maybe you've removed the call where it acquired the 'hold'.)
So while the patch series bisects (assuming it all goes through
one tree - and it really needs to go through several) you are
relying on any backports picking up the changes.
(And also the changes to sock_create_kern() not being picked up
without all the other changes.)
Now backports ought to pick up the required dependent patches,
but it is much better to generate compile fails when patches
are missing.
Obscure run-time backport issues are annoying.
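A small userspace model of that hazard, under stated assumptions: the
two helpers below stand in for "the same function name in an old tree
and in a new tree", and caller_from_new_tree() stands in for a
backported file. None of these names exist in the kernel; they are
made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins, not the kernel structures. */
struct net { int refcnt; };
struct sock { struct net *net; };

typedef void (*create_fn)(struct net *net, struct sock *sk);

/* "Old tree": same signature, no namespace reference taken. */
void old_sock_create_kern(struct net *net, struct sock *sk)
{
	sk->net = net;
}

/* "New tree": same signature, namespace reference taken. */
void new_sock_create_kern(struct net *net, struct sock *sk)
{
	sk->net = net;
	net->refcnt++;
}

/* Caller written against the new tree: it no longer upgrades the
 * reference itself because it expects the helper to do so.
 * Returns the namespace refcount seen after socket creation. */
int caller_from_new_tree(create_fn sock_create_kern, struct net *net)
{
	struct sock sk;

	assert(net != NULL);
	sock_create_kern(net, &sk);
	return net->refcnt;
}
```

Both combinations compile because the signature is identical; only
the resulting reference count differs. A rename (or a changed
signature) would turn the second combination into a build failure
instead of a silent run-time difference.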
David
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 3/7] socket: Restore sock_create_kern().
2025-05-30 2:53 ` Kuniyuki Iwashima
@ 2025-06-02 5:08 ` Christoph Hellwig
2025-06-03 21:30 ` David Laight
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-06-02 5:08 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: hch, axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton,
kbusch, kuba, kuniyu, linux-nfs, linux-nvme, linux-rdma, matttbe,
mptcp, netdev, pabeni, sfrench, wenjia, willemb
On Thu, May 29, 2025 at 07:53:41PM -0700, Kuniyuki Iwashima wrote:
> In the old days, sock_create_kern() did take a ref to netns,
> but an implicit change that avoids taking the ref has caused
> a lot of problems for people who were used to the old semantics.
>
> This series rather rolls back the change, so I think using
> the same name here is better than leaving the catchy
> sock_create_kern() error-prone.
Ok.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 5/7] socket: Remove kernel socket conversion except for net/rds/.
2025-05-30 2:59 ` Kuniyuki Iwashima
@ 2025-06-02 5:08 ` Christoph Hellwig
0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2025-06-02 5:08 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: hch, axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton,
kbusch, kuba, kuniyu, linux-nfs, linux-nvme, linux-rdma, matttbe,
mptcp, netdev, pabeni, sfrench, wenjia, willemb
On Thu, May 29, 2025 at 07:59:33PM -0700, Kuniyuki Iwashima wrote:
> From: Christoph Hellwig <hch@lst.de>
> Date: Mon, 26 May 2025 07:33:21 +0200
> > On Fri, May 23, 2025 at 11:21:11AM -0700, Kuniyuki Iwashima wrote:
> > > Let's drop the conversion and use sock_create_kern() instead.
> >
> > Please send a patch per subsystem that is converted to make the
> > commit log better and help with bisectability.
>
> Do you mean splitting this patch into per-subsystem patches
> within the same series or sending non-netdev patches separately?
Please send them in the same series for now. I think they can go
in together, but without splitting them out it's hard to even get
all the required attention.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern().
2025-05-30 3:03 ` Kuniyuki Iwashima
@ 2025-06-02 5:09 ` Christoph Hellwig
2025-06-02 21:52 ` Kuniyuki Iwashima
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-06-02 5:09 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: hch, axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton,
kbusch, kuba, kuniyu, linux-nfs, linux-nvme, linux-rdma, matttbe,
mptcp, netdev, pabeni, sfrench, wenjia, willemb
On Thu, May 29, 2025 at 08:03:06PM -0700, Kuniyuki Iwashima wrote:
> I actually tried to do so as sock_create_user() in the
> previous series but was advised to avoid rename as the benefit
> against LoC was low.
I can't really parse this. What is the 'benefit against LoC'?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern().
2025-06-02 5:09 ` Christoph Hellwig
@ 2025-06-02 21:52 ` Kuniyuki Iwashima
2025-06-03 4:50 ` Christoph Hellwig
0 siblings, 1 reply; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-06-02 21:52 UTC (permalink / raw)
To: hch
Cc: axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton, kbusch,
kuba, kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma,
matttbe, mptcp, netdev, pabeni, sfrench, wenjia, willemb
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 2 Jun 2025 07:09:49 +0200
> On Thu, May 29, 2025 at 08:03:06PM -0700, Kuniyuki Iwashima wrote:
> > I actually tried to do so as sock_create_user() in the
> > previous series but was advised to avoid rename as the benefit
> > against LoC was low.
>
> I can't really parse this. What is the 'benefit against LoC'?
It was a kind of subjective opinion on whether the amount of changes
was worth it or not.
* Re: [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern().
2025-06-02 21:52 ` Kuniyuki Iwashima
@ 2025-06-03 4:50 ` Christoph Hellwig
2025-06-04 18:20 ` Kuniyuki Iwashima
0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2025-06-03 4:50 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: hch, axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton,
kbusch, kuba, kuniyu, linux-nfs, linux-nvme, linux-rdma, matttbe,
mptcp, netdev, pabeni, sfrench, wenjia, willemb
On Mon, Jun 02, 2025 at 02:52:47PM -0700, Kuniyuki Iwashima wrote:
> From: Christoph Hellwig <hch@lst.de>
> Date: Mon, 2 Jun 2025 07:09:49 +0200
> > On Thu, May 29, 2025 at 08:03:06PM -0700, Kuniyuki Iwashima wrote:
> > > I actually tried to do so as sock_create_user() in the
> > > previous series but was advised to avoid rename as the benefit
> > > against LoC was low.
> >
> > I can't really parse this. What is the 'benefit against LoC'?
>
> It was a kind of subjective opinion on whether the amount of changes
> was worth it or not.
So the simple scripted renaming was not worth it. Maybe I misunderstand,
but based on the reading we should basically have about a handful of
callers of the non-__kern variant left. Or is it a lot more?
* Re: [PATCH v2 net-next 3/7] socket: Restore sock_create_kern().
2025-06-02 5:08 ` Christoph Hellwig
@ 2025-06-03 21:30 ` David Laight
2025-06-04 18:36 ` Kuniyuki Iwashima
0 siblings, 1 reply; 32+ messages in thread
From: David Laight @ 2025-06-03 21:30 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Kuniyuki Iwashima, axboe, chuck.lever, davem, edumazet, horms,
jaka, jlayton, kbusch, kuba, kuniyu, linux-nfs, linux-nvme,
linux-rdma, matttbe, mptcp, netdev, pabeni, sfrench, wenjia,
willemb
On Mon, 2 Jun 2025 07:08:17 +0200
Christoph Hellwig <hch@lst.de> wrote:
> On Thu, May 29, 2025 at 07:53:41PM -0700, Kuniyuki Iwashima wrote:
> > In the old days, sock_create_kern() did take a ref to netns,
> > but an implicit change that avoids taking the ref has caused
> > a lot of problems for people who are used to the old semantics.
That must have been a long time ago.
Was it even long after the namespace code was added?
(I don't have a system with the git tree up at the moment)
> >
> > This series rather rolls back the change, so I think using
> > the same name here is better than leaving the catchy
> > sock_create_kern() error-prone.
>
> Ok.
Except that you are changing the semantics again.
So you end up with the same problem the other way around.
I can imagine code ending up with an extra reference to the ns.
The obvious name for a function for general driver use would be
kernel_socket() - matching the other functions that were added
when set_fs(KERNEL_DS) was removed.
I definitely aim to end up where the existing code fails to
compile - just to ensure all the code is found.
David
* Re: [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern().
2025-06-03 4:50 ` Christoph Hellwig
@ 2025-06-04 18:20 ` Kuniyuki Iwashima
2025-06-05 4:28 ` Christoph Hellwig
0 siblings, 1 reply; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-06-04 18:20 UTC (permalink / raw)
To: hch
Cc: axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton, kbusch,
kuba, kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma,
matttbe, mptcp, netdev, pabeni, sfrench, wenjia, willemb
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 3 Jun 2025 06:50:21 +0200
> On Mon, Jun 02, 2025 at 02:52:47PM -0700, Kuniyuki Iwashima wrote:
> > From: Christoph Hellwig <hch@lst.de>
> > Date: Mon, 2 Jun 2025 07:09:49 +0200
> > > On Thu, May 29, 2025 at 08:03:06PM -0700, Kuniyuki Iwashima wrote:
> > > > I actually tried to do so as sock_create_user() in the
> > > > previous series but was advised to avoid rename as the benefit
> > > > against LoC was low.
> > >
> > > I can't really parse this. What is the 'benefit against LoC'?
> >
> > It was a kind of subjective opinion on whether the amount of changes
> > was worth it or not.
>
> So the simple scripted renaming was not worth it. Maybe I misunderstand,
> but based on the reading we should basically have about a handful of
> callers of the non-__kern variant left. Or is it a lot more?
Yes, after this series, only 2 sock_create() callers are left, one in
sctp and another in core.
* Re: [PATCH v2 net-next 3/7] socket: Restore sock_create_kern().
2025-06-03 21:30 ` David Laight
@ 2025-06-04 18:36 ` Kuniyuki Iwashima
0 siblings, 0 replies; 32+ messages in thread
From: Kuniyuki Iwashima @ 2025-06-04 18:36 UTC (permalink / raw)
To: david.laight.linux
Cc: axboe, chuck.lever, davem, edumazet, hch, horms, jaka, jlayton,
kbusch, kuba, kuni1840, kuniyu, linux-nfs, linux-nvme, linux-rdma,
matttbe, mptcp, netdev, pabeni, sfrench, wenjia, willemb
From: David Laight <david.laight.linux@gmail.com>
Date: Tue, 3 Jun 2025 22:30:20 +0100
> On Mon, 2 Jun 2025 07:08:17 +0200
> Christoph Hellwig <hch@lst.de> wrote:
>
> > On Thu, May 29, 2025 at 07:53:41PM -0700, Kuniyuki Iwashima wrote:
> > > In the old days, sock_create_kern() did take a ref to netns,
> > > but an implicit change that avoids taking the ref has caused
> > > a lot of problems for people who are used to the old semantics.
>
> That must have been a long time ago.
> Was it even long after the namespace code was added?
> (I don't have a system with the git tree up at the moment)
2007: 1b8d7ae42d02 ("[NET]: Make socket creation namespace safe.")
2015: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
It's been a long time since the implicit change, but it's only _recently_
that people started to notice the issue, thanks to (or due to) k8s use
cases, e.g. fs mounted in netns (ef7134c7fc48, 1be52169c348 + b013b817f32f, etc).
>
> > >
> > > This series rather rolls back the change, so I think using
> > > the same name here is better than leaving the catchy
> > > sock_create_kern() error-prone.
> >
> > Ok.
>
> Except that you are changing the semantics again.
> So you end up with the same problem the other way around.
> I can imagine code ending up with an extra reference to the ns.
I don't think so, because it's a rare case where we want to use
the no-refcnt version, and it usually happens under net/ or
drivers/net.
Now we have a SOCKET entry in MAINTAINERS, so I can add sock_create
there so that we are always CCed to prevent such issues.
>
> The obvious name for a function for general driver use would be
> kernel_socket() - matching the other functions that were added
> when set_fs(KERNEL_DS) was removed.
kernel_socket() doesn't fit here as kernel_XXX() takes struct
socket, not struct sock.
>
> I definitely aim to end up where the existing code fails to
> compile - just to ensure all the code is found.
As you can see, patch 2, which renames sock_create_kern() to
__sock_create_kern(), does the job of finding all users with the help
of the compiler.
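
For readers following the refcnt discussion above, here is a minimal
userspace sketch of the two semantics being contrasted. It is a toy
model, not kernel code: the model_* types and functions are hypothetical
names made up for illustration, and the plain integer counter only
stands in for the netns reference count. Assuming that simplification,
it shows why a socket created by the no-refcnt variant does not keep the
namespace alive, while the restored sock_create_kern() behaviour does.

```c
/* Userspace toy model only. None of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_net {
	int refcnt;		/* references that keep the namespace alive */
};

struct model_sock {
	struct model_net *net;
	bool holds_net_ref;	/* stands in for the idea behind sk_net_refcnt */
};

/* Models the no-refcnt variant (__sock_create_kern() after patch 2):
 * the socket remembers its netns but does not pin it. */
static void model_sock_create_kern_noref(struct model_net *net,
					 struct model_sock *sk)
{
	sk->net = net;
	sk->holds_net_ref = false;
}

/* Models the restored sock_create_kern() (patch 3):
 * the socket pins the netns for its whole lifetime. */
static void model_sock_create_kern(struct model_net *net,
				   struct model_sock *sk)
{
	sk->net = net;
	sk->holds_net_ref = true;
	net->refcnt++;
}

/* Drops the reference only if this socket actually took one. */
static void model_sock_release(struct model_sock *sk)
{
	if (sk->holds_net_ref) {
		assert(sk->net->refcnt > 0);
		sk->net->refcnt--;
	}
	sk->net = NULL;
}

/* True when nothing pins the namespace, i.e. it could be dismantled. */
static bool model_net_can_be_freed(const struct model_net *net)
{
	return net->refcnt == 0;
}
```

In this model, a namespace with only a no-refcnt socket attached can
still be freed underneath that socket, which is the use-after-free
pattern mentioned in the cover letter. With the refcounted variant the
namespace stays pinned until model_sock_release() runs.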
* Re: [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern().
2025-06-04 18:20 ` Kuniyuki Iwashima
@ 2025-06-05 4:28 ` Christoph Hellwig
0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2025-06-05 4:28 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: hch, axboe, chuck.lever, davem, edumazet, horms, jaka, jlayton,
kbusch, kuba, kuniyu, linux-nfs, linux-nvme, linux-rdma, matttbe,
mptcp, netdev, pabeni, sfrench, wenjia, willemb
On Wed, Jun 04, 2025 at 11:20:17AM -0700, Kuniyuki Iwashima wrote:
> > So the simple scripted renaming was not worth it. Maybe I misunderstand,
> > but based on the reading we should basically have about a handful of
> > callers of the non-__kern variant left. Or is it a lot more?
>
> Yes, after this series, only 2 sock_create() callers are left, one in
> sctp and another in core.
Sounds easy enough to rename then, and doing so is useful to guide
people away from it.
end of thread, other threads:[~2025-06-05 4:28 UTC | newest]
Thread overview: 32+ messages
2025-05-23 18:21 [PATCH v2 net-next 0/7] socket: Make sock_create_kern() robust against misuse Kuniyuki Iwashima
2025-05-23 18:21 ` [PATCH v2 net-next 1/7] socket: Un-export __sock_create() Kuniyuki Iwashima
2025-05-26 5:29 ` Christoph Hellwig
2025-05-26 10:06 ` David Laight
2025-05-30 2:42 ` Kuniyuki Iwashima
2025-05-23 18:21 ` [PATCH v2 net-next 2/7] socket: Rename sock_create_kern() to __sock_create_kern() Kuniyuki Iwashima
2025-05-26 5:30 ` Christoph Hellwig
2025-05-29 21:29 ` David Laight
2025-05-30 3:05 ` Kuniyuki Iwashima
2025-05-30 6:48 ` David Laight
2025-05-30 2:45 ` Kuniyuki Iwashima
2025-05-23 18:21 ` [PATCH v2 net-next 3/7] socket: Restore sock_create_kern() Kuniyuki Iwashima
2025-05-26 5:32 ` Christoph Hellwig
2025-05-30 2:53 ` Kuniyuki Iwashima
2025-06-02 5:08 ` Christoph Hellwig
2025-06-03 21:30 ` David Laight
2025-06-04 18:36 ` Kuniyuki Iwashima
2025-05-23 18:21 ` [PATCH v2 net-next 4/7] smb: client: Add missing net_passive_dec() Kuniyuki Iwashima
2025-05-23 18:21 ` [PATCH v2 net-next 5/7] socket: Remove kernel socket conversion except for net/rds/ Kuniyuki Iwashima
2025-05-26 5:33 ` Christoph Hellwig
2025-05-30 2:59 ` Kuniyuki Iwashima
2025-06-02 5:08 ` Christoph Hellwig
2025-05-23 18:21 ` [PATCH v2 net-next 6/7] socket: Replace most sock_create() calls with sock_create_kern() Kuniyuki Iwashima
2025-05-26 5:33 ` Christoph Hellwig
2025-05-26 5:35 ` Christoph Hellwig
2025-05-30 3:03 ` Kuniyuki Iwashima
2025-06-02 5:09 ` Christoph Hellwig
2025-06-02 21:52 ` Kuniyuki Iwashima
2025-06-03 4:50 ` Christoph Hellwig
2025-06-04 18:20 ` Kuniyuki Iwashima
2025-06-05 4:28 ` Christoph Hellwig
2025-05-23 18:21 ` [PATCH v2 net-next 7/7] socket: Clean up kdoc for sock_create() and sock_create_lite() Kuniyuki Iwashima