From: Kuniyuki Iwashima <kuniyu@amazon.com>
To: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>,
Kuniyuki Iwashima <kuni1840@gmail.com>, <netdev@vger.kernel.org>
Subject: [PATCH v2 net-next 00/15] treewide: socket: Clean up sock_create() and friends.
Date: Tue, 10 Dec 2024 16:38:14 +0900
Message-ID: <20241210073829.62520-1-kuniyu@amazon.com>
There are a bunch of weird usages of sock_create() and friends due
to poor documentation.
  1) some subsystems use __sock_create(), but all of them can be
     replaced with sock_create_kern()
  2) some subsystems use sock_create(), but most of the sockets are
     not tied to userspace processes nor exposed via file descriptors
     but are (most likely unintentionally) exposed to some BPF hooks
     (infiniband, ISDN, NVMe over TCP, iscsi, Xen PV call, ocfs2, smbd)
  3) some subsystems use sock_create_kern() and convert the sockets
     to hold netns refcnt (cifs, mptcp, rds, smc, and sunrpc)
  4) the sockets of 2) and 3) are counted in /proc/net/sockstat even
     though they are untouchable from userspace
The primary goal is to sort out such confusion and provide enough
documentation for future developers to choose an appropriate API.
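As a small illustration of 4): the count in question is the first line
of /proc/net/sockstat, which reads "sockets: used <N>". The userspace
helpers below are only a sketch for checking that number, for example
before and after loading a subsystem that creates kernel sockets.
parse_sockets_used() and read_sockets_used() are names made up for this
example and are not part of the series.

```c
#include <stdio.h>

/* Hypothetical helper: parse "sockets: used <N>" into *used. */
static int parse_sockets_used(const char *line, long *used)
{
	return sscanf(line, "sockets: used %ld", used) == 1 ? 0 : -1;
}

/* Hypothetical helper: read the first line of a sockstat-style file. */
static int read_sockets_used(const char *path, long *used)
{
	char line[256];
	FILE *fp = fopen(path, "r");
	int ret = -1;

	if (!fp)
		return -1;

	if (fgets(line, sizeof(line), fp))
		ret = parse_sockets_used(line, used);

	fclose(fp);
	return ret;
}
```

Calling read_sockets_used("/proc/net/sockstat", &n) reports the value
for the caller's network namespace; both helpers return -1 if the file
cannot be opened or the line does not have the expected form.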
Regarding 3), we introduce a new API, sock_create_net(), that holds
a netns refcnt for a kernel socket. This removes the need for the
socket conversion, which those subsystems perform to avoid the
use-after-free triggered by TCP kernel sockets after commit
26abe14379f8 ("net: Modify sk_alloc to not reference count the netns
of kernel sockets.").
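To make the refcnt distinction concrete, here is a toy userspace model,
not kernel code. struct toy_net, struct toy_sock and every toy_*()
function are invented for this sketch, and the "alive" flag stands in
for the point at which the real netns would be freed. It only shows the
general idea: a socket that took no netns reference can be left pointing
at a dismantled netns, while one that holds a reference keeps the netns
around until the socket is released.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a network namespace; not the kernel's struct net. */
struct toy_net {
	int refcnt;
	bool alive;	/* false once the last reference is dropped */
};

/* Toy stand-in for a kernel socket. */
struct toy_sock {
	struct toy_net *net;
	bool holds_net;	/* true if this socket took a netns reference */
};

static void toy_net_init(struct toy_net *net)
{
	net->refcnt = 1;	/* reference owned by the netns creator */
	net->alive = true;
}

static void toy_net_put(struct toy_net *net)
{
	if (--net->refcnt == 0)
		net->alive = false;	/* a real netns would be freed here */
}

static void toy_sock_create(struct toy_sock *sk, struct toy_net *net,
			    bool hold_net)
{
	sk->net = net;
	sk->holds_net = hold_net;
	if (hold_net)
		net->refcnt++;
}

static void toy_sock_release(struct toy_sock *sk)
{
	if (sk->holds_net)
		toy_net_put(sk->net);
	sk->net = NULL;
}

/* Would dereferencing sk->net be safe right now (in this model)? */
static bool toy_sock_net_valid(const struct toy_sock *sk)
{
	return sk->net && sk->net->alive;
}
```

In this model, toy_sock_create(..., false) corresponds to the noref
flavour and toy_sock_create(..., true) to the flavour that holds the
netns; after the creator drops its reference with toy_net_put(), only
the latter still passes toy_sock_net_valid().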
Finally, we rename sock_create() and sock_create_kern() to
sock_create_user() and sock_create_net_noref(), respectively.
This intentionally breaks out-of-tree drivers to give the owners
a chance to choose an appropriate API.
Throughout the series, we follow the definition below:
  userspace socket:
    * created by sock_create_user()
    * holds the reference count of the network namespace
    * directly linked to a file descriptor
      * currently all sockets created by sane sock_create() users
        are tied to userspace processes and exposed via file descriptors
    * accessed via a file descriptor (and some BPF hooks except
      for BPF LSM)
    * counted in the first line of /proc/net/sockstat.

  kernel socket:
    * created by sock_create_net() or sock_create_net_noref()
      * the former holds the refcnt of netns, but the latter doesn't
    * not directly exposed to userspace via a file descriptor nor BPF
      except for BPF LSM
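The definition above boils down to two questions per call site. The
following sketch encodes that decision in plain C; enum sock_api,
choose_sock_api() and sock_api_name() are hypothetical names used only
for illustration, and only the three API names in the strings come from
this series.

```c
#include <stdbool.h>

/* Hypothetical labels for the three creation APIs named in the series. */
enum sock_api {
	SOCK_API_USER,		/* sock_create_user() */
	SOCK_API_NET,		/* sock_create_net() */
	SOCK_API_NET_NOREF,	/* sock_create_net_noref() */
};

/*
 * exposed_via_fd:  the socket is tied to a userspace process and
 *                  reachable through a file descriptor.
 * needs_netns_ref: the (kernel) socket must hold a netns refcnt.
 */
static enum sock_api choose_sock_api(bool exposed_via_fd, bool needs_netns_ref)
{
	if (exposed_via_fd)
		return SOCK_API_USER;

	return needs_netns_ref ? SOCK_API_NET : SOCK_API_NET_NOREF;
}

static const char *sock_api_name(enum sock_api api)
{
	switch (api) {
	case SOCK_API_USER:
		return "sock_create_user()";
	case SOCK_API_NET:
		return "sock_create_net()";
	case SOCK_API_NET_NOREF:
		return "sock_create_net_noref()";
	}
	return "unknown";
}
```

For instance, a subsystem whose socket never reaches userspace but must
keep its netns alive would get sock_create_net() from this helper.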
Note that __sock_create(kern=1) skips some LSMs (SELinux, AppArmor)
but not all; BPF LSM can enforce security regardless of the argument.
I didn't CC maintainers for mechanical changes as the CC list explodes.
Changes:
  v2:
    * Patch 8
      * Fix build error for PF_IUCV
    * Patch 12
      * Collect Acked-by from MPTCP/RDS maintainers

  v1: https://lore.kernel.org/netdev/20241206075504.24153-1-kuniyu@amazon.com/
Kuniyuki Iwashima (15):
  socket: Un-export __sock_create().
  socket: Pass hold_net flag to __sock_create().
  smc: Pass kern to smc_sock_alloc().
  socket: Pass hold_net to struct net_proto_family.create().
  ppp: Pass hold_net to struct pppox_proto.create().
  nfc: Pass hold_net to struct nfc_protocol.create().
  socket: Add hold_net flag to struct proto_accept_arg.
  socket: Pass hold_net to sk_alloc().
  socket: Respect hold_net in sk_alloc().
  socket: Don't count kernel sockets in /proc/net/sockstat.
  socket: Introduce sock_create_net().
  socket: Remove kernel socket conversion.
  socket: Use sock_create_net() instead of sock_create().
  socket: Rename sock_create() to sock_create_user().
  socket: Rename sock_create_kern() to sock_create_net_noref().
crypto/af_alg.c | 7 +-
drivers/block/drbd/drbd_receiver.c | 12 +-
drivers/infiniband/hw/erdma/erdma_cm.c | 6 +-
drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
drivers/infiniband/sw/siw/siw_cm.c | 6 +-
drivers/isdn/mISDN/l1oip_core.c | 3 +-
drivers/isdn/mISDN/socket.c | 17 +-
drivers/net/ppp/pppoe.c | 5 +-
drivers/net/ppp/pppox.c | 4 +-
drivers/net/ppp/pptp.c | 5 +-
drivers/net/tap.c | 2 +-
drivers/net/tun.c | 2 +-
drivers/nvme/host/tcp.c | 5 +-
drivers/nvme/target/tcp.c | 5 +-
drivers/soc/qcom/qmi_interface.c | 4 +-
drivers/target/iscsi/iscsi_target_login.c | 7 +-
drivers/xen/pvcalls-back.c | 7 +-
drivers/xen/pvcalls-front.c | 3 +-
fs/afs/rxrpc.c | 3 +-
fs/dlm/lowcomms.c | 8 +-
fs/ocfs2/cluster/tcp.c | 10 +-
fs/smb/client/connect.c | 13 +-
fs/smb/server/transport_tcp.c | 7 +-
include/linux/if_pppox.h | 3 +-
include/linux/net.h | 11 +-
include/net/bluetooth/bluetooth.h | 3 +-
include/net/llc_conn.h | 2 +-
include/net/sctp/structs.h | 2 +-
include/net/sock.h | 3 +-
io_uring/net.c | 2 +
net/9p/trans_fd.c | 8 +-
net/appletalk/ddp.c | 4 +-
net/atm/common.c | 5 +-
net/atm/common.h | 3 +-
net/atm/pvc.c | 4 +-
net/atm/svc.c | 8 +-
net/ax25/af_ax25.c | 7 +-
net/bluetooth/af_bluetooth.c | 9 +-
net/bluetooth/bnep/sock.c | 5 +-
net/bluetooth/cmtp/sock.c | 4 +-
net/bluetooth/hci_sock.c | 4 +-
net/bluetooth/hidp/sock.c | 5 +-
net/bluetooth/iso.c | 11 +-
net/bluetooth/l2cap_sock.c | 14 +-
net/bluetooth/rfcomm/core.c | 3 +-
net/bluetooth/rfcomm/sock.c | 12 +-
net/bluetooth/sco.c | 11 +-
net/bpf/test_run.c | 2 +-
net/caif/caif_socket.c | 4 +-
net/can/af_can.c | 4 +-
net/ceph/messenger.c | 6 +-
net/core/sock.c | 26 ++--
net/handshake/handshake-test.c | 33 ++--
net/ieee802154/socket.c | 4 +-
net/ipv4/af_inet.c | 7 +-
net/ipv4/udp_tunnel_core.c | 2 +-
net/ipv6/af_inet6.c | 4 +-
net/ipv6/ip6_udp_tunnel.c | 4 +-
net/iucv/af_iucv.c | 13 +-
net/kcm/kcmsock.c | 6 +-
net/key/af_key.c | 4 +-
net/l2tp/l2tp_core.c | 8 +-
net/l2tp/l2tp_ppp.c | 6 +-
net/llc/af_llc.c | 6 +-
net/llc/llc_conn.c | 11 +-
net/mctp/af_mctp.c | 4 +-
net/mctp/test/route-test.c | 6 +-
net/mptcp/pm_netlink.c | 4 +-
net/mptcp/subflow.c | 12 +-
net/netfilter/ipvs/ip_vs_sync.c | 8 +-
net/netlink/af_netlink.c | 11 +-
net/netrom/af_netrom.c | 7 +-
net/nfc/af_nfc.c | 5 +-
net/nfc/llcp.h | 3 +-
net/nfc/llcp_core.c | 3 +-
net/nfc/llcp_sock.c | 10 +-
net/nfc/nfc.h | 3 +-
net/nfc/rawsock.c | 5 +-
net/packet/af_packet.c | 4 +-
net/phonet/af_phonet.c | 4 +-
net/phonet/pep.c | 2 +-
net/qrtr/af_qrtr.c | 4 +-
net/qrtr/ns.c | 6 +-
net/rds/af_rds.c | 4 +-
net/rds/tcp.c | 14 --
net/rds/tcp_connect.c | 15 +-
net/rds/tcp_listen.c | 17 +-
net/rose/af_rose.c | 11 +-
net/rxrpc/af_rxrpc.c | 4 +-
net/rxrpc/rxperf.c | 4 +-
net/sctp/ipv6.c | 7 +-
net/sctp/protocol.c | 7 +-
net/sctp/socket.c | 6 +-
net/smc/af_smc.c | 38 ++---
net/smc/smc_inet.c | 2 +-
net/socket.c | 145 +++++++++++++-----
net/sunrpc/clnt.c | 4 +-
net/sunrpc/svcsock.c | 12 +-
net/sunrpc/xprtsock.c | 16 +-
net/tipc/socket.c | 8 +-
net/tipc/topsrv.c | 4 +-
net/unix/af_unix.c | 17 +-
net/vmw_vsock/af_vsock.c | 10 +-
net/wireless/nl80211.c | 4 +-
net/x25/af_x25.c | 13 +-
net/xdp/xsk.c | 4 +-
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 4 +-
107 files changed, 508 insertions(+), 399 deletions(-)
--
2.39.5 (Apple Git-154)