public inbox for bpf@vger.kernel.org
* [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers
@ 2026-04-08 10:30 Breno Leitao
  2026-04-08 10:30 ` [PATCH net-next v3 1/4] net: add getsockopt_iter callback to proto_ops Breno Leitao
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Breno Leitao @ 2026-04-08 10:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev
  Cc: io-uring, bpf, netdev, Linus Torvalds, linux-kernel, kernel-team,
	Breno Leitao

Currently, the .getsockopt callback requires __user pointers:

  int (*getsockopt)(struct socket *sock, int level,
                    int optname, char __user *optval, int __user *optlen);

This prevents kernel callers (io_uring, BPF) from using getsockopt on
levels other than SOL_SOCKET, since they pass kernel pointers.

Following Linus' suggestion [0], this series introduces sockopt_t, a
type-safe wrapper around iov_iter, and a getsockopt_iter callback that
works with both user and kernel buffers. AF_PACKET and CAN raw are
converted as initial users, with selftests covering the trickiest
conversion patterns.

[0] https://lore.kernel.org/all/CAHk-=whmzrO-BMU=uSVXbuoLi-3tJsO=0kHj1BCPBE3F2kVhTA@mail.gmail.com/

Updates from v2 to v3:

* Use two iov_iters in sockopt_t instead of a single one:
  a) .iter_in is populated by the caller and is read-only in the
  protocol's callback.

  b) .iter_out is populated by the protocol and is sent back to the
  caller.

  - This avoids having the protocol reset the iterator and change its
    data source inside the callback, making the callback implementation
    and conversion saner.

* Created sockptr_to_sockopt() to convert a sockptr to a sockopt, making
  the call to getsockopt_iter straightforward

Link: https://lore.kernel.org/all/CAHk-=whmzrO-BMU=uSVXbuoLi-3tJsO=0kHj1BCPBE3F2kVhTA@mail.gmail.com/ [0]
---
Changes in v3:
- Create two iov_iters in sockopt_t instead of a single one (Stanislav Fomichev)
- Implement the sockptr_to_sockopt() helper (Stanislav Fomichev)
- Link to v2: https://patch.msgid.link/20260401-getsockopt-v2-0-611df6771aff@debian.org

Changes in v2:
- Restore optlen even on error path (getsockopt_iter fails)
- Convert af_packet.c and CAN raw instead of netlink (given these are the
  most complicated ones).
- Link to v1: https://patch.msgid.link/20260130-getsockopt-v1-0-9154fcff6f95@debian.org

---
Breno Leitao (4):
      net: add getsockopt_iter callback to proto_ops
      net: call getsockopt_iter if available
      af_packet: convert to getsockopt_iter
      can: raw: convert to getsockopt_iter

 include/linux/net.h    | 23 +++++++++++++++++++++
 net/can/raw.c          | 28 ++++++++++++--------------
 net/packet/af_packet.c | 15 +++++++-------
 net/socket.c           | 54 +++++++++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 94 insertions(+), 26 deletions(-)
---
base-commit: 9c14d60a50c4b726a3613a02e8b74778e9964891
change-id: 20260130-getsockopt-9f36625eedcb

Best regards,
--  
Breno Leitao <leitao@debian.org>



* [PATCH net-next v3 1/4] net: add getsockopt_iter callback to proto_ops
  2026-04-08 10:30 [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers Breno Leitao
@ 2026-04-08 10:30 ` Breno Leitao
  2026-04-08 10:30 ` [PATCH net-next v3 2/4] net: call getsockopt_iter if available Breno Leitao
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Breno Leitao @ 2026-04-08 10:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev
  Cc: io-uring, bpf, netdev, Linus Torvalds, linux-kernel, kernel-team,
	Breno Leitao

Add a new getsockopt_iter callback to struct proto_ops that uses
sockopt_t, a type-safe wrapper around iov_iter. This provides a clean
interface for socket option operations that works with both user and
kernel buffers.

The sockopt_t type encapsulates two iov_iters (iter_in and iter_out) and
an optlen field.

The optlen field, although not suggested by Linus, serves as both input
(buffer size) and output (returned data size). It lets callbacks return
an arbitrary length independent of the bytes written via copy_to_iter(),
so it is kept separate from iov_iter.count.

This is preparatory work for removing the SOL_SOCKET level restriction
from io_uring getsockopt operations.

Keep in mind that iter_in and iter_out always point to the same data;
there are two of them only to keep the callback implementation sane.
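
As an illustration only, here is a minimal userspace sketch of the
two-iterator idea. It is not kernel code: struct mock_iter,
struct mock_sockopt and the mock_* helpers are hypothetical stand-ins
for struct iov_iter, sockopt_t, copy_from_iter() and copy_to_iter().
It shows that consuming iter_in does not move iter_out, so a callback
can read the caller's optval and then overwrite the same buffer without
resetting anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct iov_iter: a cursor over a buffer. */
struct mock_iter {
	char *base;	/* start of the shared buffer */
	size_t off;	/* bytes already consumed through this cursor */
	size_t count;	/* bytes still available through this cursor */
};

/* Same shape as sockopt_t in the patch, but built on the mock cursor. */
struct mock_sockopt {
	struct mock_iter iter_in;	/* protocol reads the caller's optval */
	struct mock_iter iter_out;	/* protocol writes the result */
	int optlen;			/* in: buffer size, out: returned size */
};

/* Point both cursors at the start of the same buffer. */
static void mock_sockopt_init(struct mock_sockopt *opt, void *buf, int len)
{
	assert(len >= 0);
	opt->iter_in = (struct mock_iter){ .base = buf, .off = 0,
					   .count = (size_t)len };
	opt->iter_out = opt->iter_in;
	opt->optlen = len;
}

/* Copy up to n bytes out of the buffer, advancing only this cursor. */
static size_t mock_copy_from_iter(void *dst, size_t n, struct mock_iter *it)
{
	if (n > it->count)
		n = it->count;
	memcpy(dst, it->base + it->off, n);
	it->off += n;
	it->count -= n;
	return n;
}

/* Copy up to n bytes into the buffer, advancing only this cursor. */
static size_t mock_copy_to_iter(const void *src, size_t n, struct mock_iter *it)
{
	if (n > it->count)
		n = it->count;
	memcpy(it->base + it->off, src, n);
	it->off += n;
	it->count -= n;
	return n;
}
```

With a single cursor, the read would leave no room for the write unless
the callback rewound it; the second cursor avoids that step.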

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/net.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/net.h b/include/linux/net.h
index a8e818de95b33..fdd48d5c94441 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -23,9 +23,30 @@
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include <linux/sockptr.h>
+#include <linux/uio.h>
 
 #include <uapi/linux/net.h>
 
+/**
+ * struct sockopt - socket option value container
+ * @iter_in: iov_iter for reading optval with the content from the caller.
+ *	     Use copy_from_iter() given this iov direction is ITER_SOURCE
+ * @iter_out: iov_iter for protocols to update optval data to userspace
+ *	      Use copy_to_iter() given this iov direction is ITER_DEST
+ * @optlen: serves as both input (buffer size) and output (returned data size).
+ *
+ * Type-safe wrapper for socket option data that works with both
+ * user and kernel buffers.
+ *
+ * The optlen field allows callbacks to return a specific length value
+ * independent of the bytes written via copy_to_iter().
+ */
+typedef struct sockopt {
+	struct iov_iter iter_in;
+	struct iov_iter iter_out;
+	int optlen;
+} sockopt_t;
+
 struct poll_table_struct;
 struct pipe_inode_info;
 struct inode;
@@ -192,6 +213,8 @@ struct proto_ops {
 				      unsigned int optlen);
 	int		(*getsockopt)(struct socket *sock, int level,
 				      int optname, char __user *optval, int __user *optlen);
+	int		(*getsockopt_iter)(struct socket *sock, int level,
+					   int optname, sockopt_t *opt);
 	void		(*show_fdinfo)(struct seq_file *m, struct socket *sock);
 	int		(*sendmsg)   (struct socket *sock, struct msghdr *m,
 				      size_t total_len);

-- 
2.52.0



* [PATCH net-next v3 2/4] net: call getsockopt_iter if available
  2026-04-08 10:30 [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers Breno Leitao
  2026-04-08 10:30 ` [PATCH net-next v3 1/4] net: add getsockopt_iter callback to proto_ops Breno Leitao
@ 2026-04-08 10:30 ` Breno Leitao
  2026-04-08 10:30 ` [PATCH net-next v3 3/4] af_packet: convert to getsockopt_iter Breno Leitao
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Breno Leitao @ 2026-04-08 10:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev
  Cc: io-uring, bpf, netdev, Linus Torvalds, linux-kernel, kernel-team,
	Breno Leitao

Update do_sock_getsockopt() to use the new getsockopt_iter callback
when available. Add a sockptr_to_sockopt() helper so that the new path:

1. Reads optlen from user/kernel space
2. Initializes a sockopt_t with the appropriate iov_iter (kvec for
   kernel, ubuf for user buffers) and sets opt.optlen
3. Calls the protocol's getsockopt_iter callback
4. Writes opt.optlen back to user/kernel space

The optlen is always written back, even on failure. Some protocols
(e.g. CAN raw) return -ERANGE and set optlen to the required buffer
size so userspace knows how much to allocate.

The callback is responsible for setting opt.optlen to indicate the
returned data size.

Note that iter_out does not need to be copied back in
do_sock_getsockopt().

When optval is not kernel (the userspace path), sockptr_to_sockopt()
sets up opt->iter_out as an ITER_DEST ubuf iterator pointing directly at
the userspace buffer (optval.user). So when getsockopt_iter
implementations call copy_to_iter(..., &opt->iter_out), the data is
written directly to userspace, with no intermediate kernel buffer
involved.

When optval.is_kernel is true (the in-kernel path, e.g. from io_uring),
the kvec points at the already-provided kernel buffer (optval.kernel),
so the data lands in the caller's buffer directly via the kvec-backed
iterator.

In both cases the protocol callback writes through the iterator directly
to the final destination. There is nothing to copy back; only optlen
needs to be written back.
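
The control flow described above can be modelled in userspace as
follows. This is a simplified sketch, not the patch itself: the
mock_* names are hypothetical, the iterators are left out, and plain
pointers replace sockptr_t. The point it illustrates is that optlen is
written back even when the callback fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for sockopt_t (iterators omitted). */
struct mock_sockopt {
	void *buf;	/* caller's optval buffer */
	int optlen;	/* in: buffer size, out: returned or required size */
};

typedef int (*mock_getsockopt_iter_t)(struct mock_sockopt *opt);

/*
 * Models the getsockopt_iter branch: read optlen, call the protocol,
 * then write optlen back regardless of the callback's result.
 */
static int mock_do_getsockopt(mock_getsockopt_iter_t cb, void *optval,
			      int *optlen)
{
	struct mock_sockopt opt;
	int err;

	assert(cb && optlen);

	/* A negative length is rejected before the callback runs. */
	if (*optlen < 0)
		return -EINVAL;

	opt.buf = optval;
	opt.optlen = *optlen;

	err = cb(&opt);

	/* Written back even when err != 0, so -ERANGE can carry a size. */
	*optlen = opt.optlen;
	return err;
}

/* Example protocol callback (data copy omitted): always needs 16 bytes. */
static int mock_needs_16(struct mock_sockopt *opt)
{
	int too_small = opt->optlen < 16;

	opt->optlen = 16;
	return too_small ? -ERANGE : 0;
}
```

A caller that passes a 4-byte buffer gets -ERANGE back and finds 16 in
its length variable, which is the discovery mechanism the commit message
refers to.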

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 net/socket.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index ade2ff5845a0c..a25e513cf0f47 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -77,6 +77,7 @@
 #include <linux/mount.h>
 #include <linux/pseudo_fs.h>
 #include <linux/security.h>
+#include <linux/uio.h>
 #include <linux/syscalls.h>
 #include <linux/compat.h>
 #include <linux/kmod.h>
@@ -2349,11 +2350,45 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
 INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
 							 int optname));
 
+/*
+ * Initialize a sockopt_t from sockptr optval/optlen, setting up iov_iter
+ * for both input and output directions.
+ * It is important to remember that both iterators point to the same data,
+ * but .iter_in is read-only and .iter_out is write-only for protocol callbacks.
+ */
+static int sockptr_to_sockopt(sockopt_t *opt, sockptr_t optval,
+			      sockptr_t optlen, struct kvec *kvec)
+{
+	int koptlen;
+
+	if (copy_from_sockptr(&koptlen, optlen, sizeof(int)))
+		return -EFAULT;
+
+	if (koptlen < 0)
+		return -EINVAL;
+
+	if (optval.is_kernel) {
+		kvec->iov_base = optval.kernel;
+		kvec->iov_len = koptlen;
+		iov_iter_kvec(&opt->iter_out, ITER_DEST, kvec, 1, koptlen);
+		iov_iter_kvec(&opt->iter_in, ITER_SOURCE, kvec, 1, koptlen);
+	} else {
+		iov_iter_ubuf(&opt->iter_out, ITER_DEST, optval.user, koptlen);
+		iov_iter_ubuf(&opt->iter_in, ITER_SOURCE, optval.user,
+			      koptlen);
+	}
+	opt->optlen = koptlen;
+
+	return 0;
+}
+
 int do_sock_getsockopt(struct socket *sock, bool compat, int level,
 		       int optname, sockptr_t optval, sockptr_t optlen)
 {
 	int max_optlen __maybe_unused = 0;
 	const struct proto_ops *ops;
+	struct kvec kvec;
+	sockopt_t opt;
 	int err;
 
 	err = security_socket_getsockopt(sock, level, optname);
@@ -2366,15 +2401,28 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
 	ops = READ_ONCE(sock->ops);
 	if (level == SOL_SOCKET) {
 		err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
-	} else if (unlikely(!ops->getsockopt)) {
-		err = -EOPNOTSUPP;
-	} else {
+	} else if (ops->getsockopt_iter) {
+		err = sockptr_to_sockopt(&opt, optval, optlen, &kvec);
+		if (err)
+			return err;
+
+		err = ops->getsockopt_iter(sock, level, optname, &opt);
+
+		/* Always write back optlen, even on failure. Some protocols
+		 * (e.g. CAN raw) return -ERANGE and set optlen to the
+		 * required buffer size so userspace can discover it.
+		 */
+		if (copy_to_sockptr(optlen, &opt.optlen, sizeof(int)))
+			return -EFAULT;
+	} else if (ops->getsockopt) {
 		if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
 			      "Invalid argument type"))
 			return -EOPNOTSUPP;
 
 		err = ops->getsockopt(sock, level, optname, optval.user,
 				      optlen.user);
+	} else {
+		err = -EOPNOTSUPP;
 	}
 
 	if (!compat)

-- 
2.52.0



* [PATCH net-next v3 3/4] af_packet: convert to getsockopt_iter
  2026-04-08 10:30 [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers Breno Leitao
  2026-04-08 10:30 ` [PATCH net-next v3 1/4] net: add getsockopt_iter callback to proto_ops Breno Leitao
  2026-04-08 10:30 ` [PATCH net-next v3 2/4] net: call getsockopt_iter if available Breno Leitao
@ 2026-04-08 10:30 ` Breno Leitao
  2026-04-08 10:30 ` [PATCH net-next v3 4/4] can: raw: " Breno Leitao
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Breno Leitao @ 2026-04-08 10:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev
  Cc: io-uring, bpf, netdev, Linus Torvalds, linux-kernel, kernel-team,
	Breno Leitao

Convert AF_PACKET's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of put_user()/copy_to_user()
- For PACKET_HDRLEN which reads from optval: use opt->iter_in with
  copy_from_iter() for the input read, then the common opt->iter_out
  copy_to_iter() epilogue handles the output
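
For readers unfamiliar with the PACKET_HDRLEN pattern, a userspace
sketch of the read-then-write sequence follows. Everything prefixed
mock_/MOCK_ is hypothetical, and the version numbers and header sizes
are placeholders, not the real TPACKET values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for struct iov_iter and sockopt_t. */
struct mock_iter { char *base; size_t off; size_t count; };
struct mock_sockopt { struct mock_iter iter_in, iter_out; int optlen; };

static void mock_sockopt_init(struct mock_sockopt *opt, void *buf, int len)
{
	assert(len >= 0);
	opt->iter_in = (struct mock_iter){ buf, 0, (size_t)len };
	opt->iter_out = opt->iter_in;
	opt->optlen = len;
}

static size_t mock_copy_from_iter(void *dst, size_t n, struct mock_iter *it)
{
	if (n > it->count)
		n = it->count;
	memcpy(dst, it->base + it->off, n);
	it->off += n;
	it->count -= n;
	return n;
}

static size_t mock_copy_to_iter(const void *src, size_t n, struct mock_iter *it)
{
	if (n > it->count)
		n = it->count;
	memcpy(it->base + it->off, src, n);
	it->off += n;
	it->count -= n;
	return n;
}

/* Placeholder versions and header sizes, NOT the real TPACKET values. */
enum { MOCK_V1 = 0, MOCK_V2 = 1 };
enum { MOCK_V1_HDRLEN = 32, MOCK_V2_HDRLEN = 48 };

static int mock_hdrlen_getsockopt(struct mock_sockopt *opt)
{
	int len = opt->optlen;
	int val;

	if (len < 0)
		return -EINVAL;
	if (len > (int)sizeof(int))
		len = sizeof(int);
	if (len < (int)sizeof(int))
		return -EINVAL;

	/* Input: the caller passes the version number in optval. */
	if (mock_copy_from_iter(&val, (size_t)len, &opt->iter_in) != (size_t)len)
		return -EFAULT;

	switch (val) {
	case MOCK_V1:
		val = MOCK_V1_HDRLEN;
		break;
	case MOCK_V2:
		val = MOCK_V2_HDRLEN;
		break;
	default:
		return -EINVAL;
	}

	/* Output: iter_out still starts at offset 0 of the same buffer. */
	opt->optlen = len;
	if (mock_copy_to_iter(&val, (size_t)len, &opt->iter_out) != (size_t)len)
		return -EFAULT;
	return 0;
}
```

On an unknown version the sketch returns -EINVAL before touching
iter_out, so the caller's buffer is left unchanged.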

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 net/packet/af_packet.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index bb2d88205e5a6..1da78b6ad3d5f 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -49,6 +49,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/ethtool.h>
+#include <linux/uio.h>
 #include <linux/filter.h>
 #include <linux/types.h>
 #include <linux/mm.h>
@@ -4051,7 +4052,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval,
 }
 
 static int packet_getsockopt(struct socket *sock, int level, int optname,
-			     char __user *optval, int __user *optlen)
+			     sockopt_t *opt)
 {
 	int len;
 	int val, lv = sizeof(val);
@@ -4065,8 +4066,7 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 	if (level != SOL_PACKET)
 		return -ENOPROTOOPT;
 
-	if (get_user(len, optlen))
-		return -EFAULT;
+	len = opt->optlen;
 
 	if (len < 0)
 		return -EINVAL;
@@ -4115,7 +4115,7 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 			len = sizeof(int);
 		if (len < sizeof(int))
 			return -EINVAL;
-		if (copy_from_user(&val, optval, len))
+		if (copy_from_iter(&val, len, &opt->iter_in) != len)
 			return -EFAULT;
 		switch (val) {
 		case TPACKET_V1:
@@ -4171,9 +4171,8 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 
 	if (len > lv)
 		len = lv;
-	if (put_user(len, optlen))
-		return -EFAULT;
-	if (copy_to_user(optval, data, len))
+	opt->optlen = len;
+	if (copy_to_iter(data, len, &opt->iter_out) != len)
 		return -EFAULT;
 	return 0;
 }
@@ -4672,7 +4671,7 @@ static const struct proto_ops packet_ops = {
 	.listen =	sock_no_listen,
 	.shutdown =	sock_no_shutdown,
 	.setsockopt =	packet_setsockopt,
-	.getsockopt =	packet_getsockopt,
+	.getsockopt_iter =	packet_getsockopt,
 	.sendmsg =	packet_sendmsg,
 	.recvmsg =	packet_recvmsg,
 	.mmap =		packet_mmap,

-- 
2.52.0



* [PATCH net-next v3 4/4] can: raw: convert to getsockopt_iter
  2026-04-08 10:30 [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers Breno Leitao
                   ` (2 preceding siblings ...)
  2026-04-08 10:30 ` [PATCH net-next v3 3/4] af_packet: convert to getsockopt_iter Breno Leitao
@ 2026-04-08 10:30 ` Breno Leitao
  2026-04-08 11:26 ` [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers David Laight
  2026-04-08 17:02 ` Stanislav Fomichev
  5 siblings, 0 replies; 10+ messages in thread
From: Breno Leitao @ 2026-04-08 10:30 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev
  Cc: io-uring, bpf, netdev, Linus Torvalds, linux-kernel, kernel-team,
	Breno Leitao

Convert CAN raw socket's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of copy_to_user()
- For CAN_RAW_FILTER and CAN_RAW_XL_VCID_OPTS: on -ERANGE, set
  opt->optlen to the required buffer size. The wrapper writes this
  back to userspace even on error, preserving the existing API that
  lets userspace discover the needed allocation size.
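
A userspace sketch of the CAN_RAW_FILTER-style branch, seen from the
callback's side, might look like this. struct mock_filter and the other
mock_* names are hypothetical; the iterator is reduced to a plain
destination pointer and locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filter entry, standing in for struct can_filter. */
struct mock_filter {
	unsigned int id;
	unsigned int mask;
};

/* Hypothetical, simplified stand-in for sockopt_t. */
struct mock_sockopt {
	char *out;	/* destination buffer (iter_out in the patch) */
	int optlen;	/* in: buffer size, out: returned or required size */
};

static int mock_filter_getsockopt(const struct mock_filter *filters, int count,
				  struct mock_sockopt *opt)
{
	int len = opt->optlen;
	int fsize = count * (int)sizeof(struct mock_filter);

	assert(count >= 0);
	if (len < 0)
		return -EINVAL;

	if (len < fsize) {
		/* Too small: report the needed size, copy nothing. */
		opt->optlen = fsize;
		return -ERANGE;
	}

	if (len > fsize)
		len = fsize;
	memcpy(opt->out, filters, (size_t)len);

	/* Success: report how many bytes were actually returned. */
	opt->optlen = len;
	return 0;
}
```

The callback no longer touches the caller's length variable itself; it
only updates opt->optlen and relies on the wrapper to propagate it.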

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 net/can/raw.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/net/can/raw.c b/net/can/raw.c
index eee244ffc31ec..6f9ef867a13f2 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -760,7 +760,7 @@ static int raw_setsockopt(struct socket *sock, int level, int optname,
 }
 
 static int raw_getsockopt(struct socket *sock, int level, int optname,
-			  char __user *optval, int __user *optlen)
+			  sockopt_t *opt)
 {
 	struct sock *sk = sock->sk;
 	struct raw_sock *ro = raw_sk(sk);
@@ -770,8 +770,7 @@ static int raw_getsockopt(struct socket *sock, int level, int optname,
 
 	if (level != SOL_CAN_RAW)
 		return -EINVAL;
-	if (get_user(len, optlen))
-		return -EFAULT;
+	len = opt->optlen;
 	if (len < 0)
 		return -EINVAL;
 
@@ -787,12 +786,12 @@ static int raw_getsockopt(struct socket *sock, int level, int optname,
 			if (len < fsize) {
 				/* return -ERANGE and needed space in optlen */
 				err = -ERANGE;
-				if (put_user(fsize, optlen))
-					err = -EFAULT;
+				opt->optlen = fsize;
 			} else {
 				if (len > fsize)
 					len = fsize;
-				if (copy_to_user(optval, ro->filter, len))
+				if (copy_to_iter(ro->filter, len,
+						 &opt->iter_out) != len)
 					err = -EFAULT;
 			}
 		} else {
@@ -801,7 +800,7 @@ static int raw_getsockopt(struct socket *sock, int level, int optname,
 		release_sock(sk);
 
 		if (!err)
-			err = put_user(len, optlen);
+			opt->optlen = len;
 		return err;
 	}
 	case CAN_RAW_ERR_FILTER:
@@ -845,16 +844,16 @@ static int raw_getsockopt(struct socket *sock, int level, int optname,
 		if (len < sizeof(ro->raw_vcid_opts)) {
 			/* return -ERANGE and needed space in optlen */
 			err = -ERANGE;
-			if (put_user(sizeof(ro->raw_vcid_opts), optlen))
-				err = -EFAULT;
+			opt->optlen = sizeof(ro->raw_vcid_opts);
 		} else {
 			if (len > sizeof(ro->raw_vcid_opts))
 				len = sizeof(ro->raw_vcid_opts);
-			if (copy_to_user(optval, &ro->raw_vcid_opts, len))
+			if (copy_to_iter(&ro->raw_vcid_opts, len,
+					 &opt->iter_out) != len)
 				err = -EFAULT;
 		}
 		if (!err)
-			err = put_user(len, optlen);
+			opt->optlen = len;
 		return err;
 	}
 	case CAN_RAW_JOIN_FILTERS:
@@ -868,9 +867,8 @@ static int raw_getsockopt(struct socket *sock, int level, int optname,
 		return -ENOPROTOOPT;
 	}
 
-	if (put_user(len, optlen))
-		return -EFAULT;
-	if (copy_to_user(optval, val, len))
+	opt->optlen = len;
+	if (copy_to_iter(val, len, &opt->iter_out) != len)
 		return -EFAULT;
 	return 0;
 }
@@ -1077,7 +1075,7 @@ static const struct proto_ops raw_ops = {
 	.listen        = sock_no_listen,
 	.shutdown      = sock_no_shutdown,
 	.setsockopt    = raw_setsockopt,
-	.getsockopt    = raw_getsockopt,
+	.getsockopt_iter = raw_getsockopt,
 	.sendmsg       = raw_sendmsg,
 	.recvmsg       = raw_recvmsg,
 	.mmap          = sock_no_mmap,

-- 
2.52.0



* Re: [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers
  2026-04-08 10:30 [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers Breno Leitao
                   ` (3 preceding siblings ...)
  2026-04-08 10:30 ` [PATCH net-next v3 4/4] can: raw: " Breno Leitao
@ 2026-04-08 11:26 ` David Laight
  2026-04-08 13:52   ` Breno Leitao
  2026-04-08 13:56   ` Stefan Metzmacher
  2026-04-08 17:02 ` Stanislav Fomichev
  5 siblings, 2 replies; 10+ messages in thread
From: David Laight @ 2026-04-08 11:26 UTC (permalink / raw)
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev, io-uring, bpf, netdev, Linus Torvalds,
	linux-kernel, kernel-team

On Wed, 08 Apr 2026 03:30:28 -0700
Breno Leitao <leitao@debian.org> wrote:

> Currently, the .getsockopt callback requires __user pointers:
> 
>   int (*getsockopt)(struct socket *sock, int level,
>                     int optname, char __user *optval, int __user *optlen);
> 
> This prevents kernel callers (io_uring, BPF) from using getsockopt on
> levels other than SOL_SOCKET, since they pass kernel pointers.
> 
> Following Linus' suggestion [0], this series introduces sockopt_t, a
> type-safe wrapper around iov_iter, and a getsockopt_iter callback that
> works with both user and kernel buffers. AF_PACKET and CAN raw are
> converted as initial users, with selftests covering the trickiest
> conversion patterns.

What are you doing about the cases where 'optlen' is a complete lie?
IIRC there is one related to some form of async io where it is just
the length of the header, the actual buffer length depends on
data in the header.
This doesn't matter with the existing code for applications, when they
get it wrong they just crash.
But kernel users will need to pass the actual buffer length separately
from optlen.
It also affects any code that tries to cache the actual data and copy
it back to userspace in the syscall wrapper - which makes sense for
most short getsockopt.

(This is different from historic code where the length might be
assumed to be 4 regardless of what was passed.)

	David


* Re: [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers
  2026-04-08 11:26 ` [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers David Laight
@ 2026-04-08 13:52   ` Breno Leitao
  2026-04-08 18:56     ` David Laight
  2026-04-08 13:56   ` Stefan Metzmacher
  1 sibling, 1 reply; 10+ messages in thread
From: Breno Leitao @ 2026-04-08 13:52 UTC (permalink / raw)
  To: David Laight
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev, io-uring, bpf, netdev, Linus Torvalds,
	linux-kernel, kernel-team

Hello David,

On Wed, Apr 08, 2026 at 12:26:53PM +0100, David Laight wrote:
> On Wed, 08 Apr 2026 03:30:28 -0700
> Breno Leitao <leitao@debian.org> wrote:
>
> > Currently, the .getsockopt callback requires __user pointers:
> >
> >   int (*getsockopt)(struct socket *sock, int level,
> >                     int optname, char __user *optval, int __user *optlen);
> >
> > This prevents kernel callers (io_uring, BPF) from using getsockopt on
> > levels other than SOL_SOCKET, since they pass kernel pointers.
> >
> > Following Linus' suggestion [0], this series introduces sockopt_t, a
> > type-safe wrapper around iov_iter, and a getsockopt_iter callback that
> > works with both user and kernel buffers. AF_PACKET and CAN raw are
> > converted as initial users, with selftests covering the trickiest
> > conversion patterns.
>
> What are you doing about the cases where 'optlen' is a complete lie?

Is this incorrect optlen originating from userspace, and getting into
the .getsockopt callbacks?

> IIRC there is one related to some form of async io where it is just
> the length of the header, the actual buffer length depends on
> data in the header.

Could you point me to the relevant code so I can examine this case?

> This doesn't matter with the existing code for applications, when they
> get it wrong they just crash.

Is this crash being triggered by the protocol callbacks?

I tried searching for this but couldn't find it. I'd appreciate any
hints you could provide about this case.

Thanks
--breno


* Re: [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers
  2026-04-08 11:26 ` [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers David Laight
  2026-04-08 13:52   ` Breno Leitao
@ 2026-04-08 13:56   ` Stefan Metzmacher
  1 sibling, 0 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2026-04-08 13:56 UTC (permalink / raw)
  To: David Laight, Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, axboe,
	Stanislav Fomichev, io-uring, bpf, netdev, Linus Torvalds,
	linux-kernel, kernel-team

Am 08.04.26 um 13:26 schrieb David Laight:
> On Wed, 08 Apr 2026 03:30:28 -0700
> Breno Leitao <leitao@debian.org> wrote:
> 
>> Currently, the .getsockopt callback requires __user pointers:
>>
>>    int (*getsockopt)(struct socket *sock, int level,
>>                      int optname, char __user *optval, int __user *optlen);
>>
>> This prevents kernel callers (io_uring, BPF) from using getsockopt on
>> levels other than SOL_SOCKET, since they pass kernel pointers.
>>
>> Following Linus' suggestion [0], this series introduces sockopt_t, a
>> type-safe wrapper around iov_iter, and a getsockopt_iter callback that
>> works with both user and kernel buffers. AF_PACKET and CAN raw are
>> converted as initial users, with selftests covering the trickiest
>> conversion patterns.
> 
> What are you doing about the cases where 'optlen' is a complete lie?
> IIRC there is one related to some form of async io where it is just
> the length of the header, the actual buffer length depends on
> data in the header.
> This doesn't matter with the existing code for applications, when they
> get it wrong they just crash.
> But kernel users will need to pass the actual buffer length separately
> from optlen.
> It also affects any code that tries to cache the actual data and copy
> it back to userspace in the syscall wrapper - which makes sense for
> most short getsockopt.
> 
> (This is different from historic code where the length might be
> assumed to be 4 regardless of what was passed.)

As the insane legacy cases can only happen for keeping
compatibility with existing userspace applications,
we could get the original optval and optlen __user pointers
out of sockopt_t again via something like:

char __user * __must_check sockopt_get_insame_legacy_optval(sockopt_t *sopt);
int __user * __must_check sockopt_get_insame_legacy_optlen(sockopt_t *sopt);

And for kernel callers they return NULL and the code should
turn that into -EINVAL or something similar.

Then legacy stuff can do what they need, but most things are
sane and able to be called via io_uring and in kernel users.

Unrelated to legacy stuff I think it should be an opt-in
(or at least opt-out) for the writeback of optlen.

metze


* Re: [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers
  2026-04-08 10:30 [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers Breno Leitao
                   ` (4 preceding siblings ...)
  2026-04-08 11:26 ` [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers David Laight
@ 2026-04-08 17:02 ` Stanislav Fomichev
  5 siblings, 0 replies; 10+ messages in thread
From: Stanislav Fomichev @ 2026-04-08 17:02 UTC (permalink / raw)
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev, io-uring, bpf, netdev, Linus Torvalds,
	linux-kernel, kernel-team

On 04/08, Breno Leitao wrote:
> Currently, the .getsockopt callback requires __user pointers:
> 
>   int (*getsockopt)(struct socket *sock, int level,
>                     int optname, char __user *optval, int __user *optlen);
> 
> This prevents kernel callers (io_uring, BPF) from using getsockopt on
> levels other than SOL_SOCKET, since they pass kernel pointers.
> 
> Following Linus' suggestion [0], this series introduces sockopt_t, a
> type-safe wrapper around iov_iter, and a getsockopt_iter callback that
> works with both user and kernel buffers. AF_PACKET and CAN raw are
> converted as initial users, with selftests covering the trickiest
> conversion patterns.
> 
> [0] https://lore.kernel.org/all/CAHk-=whmzrO-BMU=uSVXbuoLi-3tJsO=0kHj1BCPBE3F2kVhTA@mail.gmail.com/
> 
> Updates from v2 to v3:
> 
> * Use two iov in sockopt_t instead of a single one:
>   a) .iter_in that is populated by the caller and will be read-only in
>   the protocols callback.
> 
>   b) .iter_out will be populated by the protocol and it will be sent
>   back to the caller.
> 
>   - This will avoid changing the protocol reset and changing the data
>     source at the callback, making the driver callback implementation
>     and converstion saner.
> 
> * created sockptr_to_sockopt() to convert sockptr to sockopt, making the
>   call to getsockopt_iter straight-forward
> 
> Link: https://lore.kernel.org/all/CAHk-=whmzrO-BMU=uSVXbuoLi-3tJsO=0kHj1BCPBE3F2kVhTA@mail.gmail.com/ [0]
> ---
> Changes in v3:
> - Create Two iov in sockopt_t instead of a single one (Stanislav Fomichev)
> - Implement the sockptr_to_sockopt() helper (Stanislav Fomichev)
> - Link to v2: https://patch.msgid.link/20260401-getsockopt-v2-0-611df6771aff@debian.org
> 
> Changes in v2:
> - Restore optlen even on error path (getsockopt_iter fails)
> - Move af_packet.c and can instead of netlink (given these are the most
>   complicate ones).
> - Link to v1: https://patch.msgid.link/20260130-getsockopt-v1-0-9154fcff6f95@debian.org

LGTM! Not sure what's your plan for the selftest? You wanna keep it
outside or maybe repost v4 with it?

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

I'm also not sure your unconditional 'copy-optlen-back' will work for every
proto, but I think we can put something into sockopt_t to make it avoid
the copy if needed in the future.


* Re: [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers
  2026-04-08 13:52   ` Breno Leitao
@ 2026-04-08 18:56     ` David Laight
  0 siblings, 0 replies; 10+ messages in thread
From: David Laight @ 2026-04-08 18:56 UTC (permalink / raw)
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev, io-uring, bpf, netdev, Linus Torvalds,
	linux-kernel, kernel-team

On Wed, 8 Apr 2026 06:52:54 -0700
Breno Leitao <leitao@debian.org> wrote:

> Hello David,
> 
> On Wed, Apr 08, 2026 at 12:26:53PM +0100, David Laight wrote:
> > On Wed, 08 Apr 2026 03:30:28 -0700
> > Breno Leitao <leitao@debian.org> wrote:
> >  
> > > Currently, the .getsockopt callback requires __user pointers:
> > >
> > >   int (*getsockopt)(struct socket *sock, int level,
> > >                     int optname, char __user *optval, int __user *optlen);
> > >
> > > This prevents kernel callers (io_uring, BPF) from using getsockopt on
> > > levels other than SOL_SOCKET, since they pass kernel pointers.
> > >
> > > Following Linus' suggestion [0], this series introduces sockopt_t, a
> > > type-safe wrapper around iov_iter, and a getsockopt_iter callback that
> > > works with both user and kernel buffers. AF_PACKET and CAN raw are
> > > converted as initial users, with selftests covering the trickiest
> > > conversion patterns.  
> >
> > What are you doing about the cases where 'optlen' is a complete lie?  
> 
> Is this incorrect optlen originating from userspace, and getting into
> the .getsockopt callbacks?

Look at tcp_ao_copy_mkts_to_user() in net/ipv4/tcp_ao.c
This isn't 'old code'; it was added in 2023.

Basically what is being transferred is an array and 'optlen' is the
size of one element.
The number of elements is in the first one.

Yes, it is completely broken.

There was also some very old code that just didn't check the length
(probably only for 'int' sized parameters).
That might all have disappeared when decnet support was removed.

There was also a very longstanding bug that pretty much all the IP
protocols would treat negative lengths as 4.
That got 'fixed' not long ago, I do wonder how many applications that
broke! Passing an uninitialised on-stack variable would have worked
(for 'int' parameters) provided it wasn't in [0..3].
Even then there is code that will copy 1 byte (instead of 4) when
a short length is passed - but it only does something sensible on LE.

I've been through all this code trying to replace the 'int *optlen'
with 'unsigned int optlen' and then returning the updated length
(or -ERRNO) to the wrapper.
That simplifies 99% of the code.
However there are a very small number of places that want to return
an error with a corrected length.
If you were starting from scratch you could say that returning a bigger
length would return a specific errno (maybe -ERANGE) and the updated
length - but there is no consistency.

I pretty much decided that the getsockopt() functions would have to
be able to return one of:
	-errno
	length
	GETSOCKOPT_RVAL(errno, length)
with the wrapper separating out the merged value.
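
One possible way to pack and unpack such a merged value is sketched
below. The encoding is purely an assumption made for illustration (the
message does not specify one), and MOCK_GETSOCKOPT_RVAL,
MOCK_MAX_ERRNO and mock_split_rval() are hypothetical names. Plain
errors stay in [-4095, -1], plain lengths are non-negative, and the
merged form is pushed below -4095 so the three cases never collide.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MOCK_MAX_ERRNO 4095LL

/* Pack a positive errno and a non-negative length below the -errno range. */
#define MOCK_GETSOCKOPT_RVAL(err, len) \
	(-((((long long)(len)) + 1) * (MOCK_MAX_ERRNO + 1) + (err)))

struct mock_result {
	int err;	/* 0 or a negative errno */
	int len;	/* valid only when has_len is true */
	bool has_len;	/* true when a length should be written back */
};

/* What the wrapper would do: separate the merged return value. */
static struct mock_result mock_split_rval(long long rval)
{
	struct mock_result res = { 0, 0, false };
	long long v;

	if (rval >= 0) {			/* plain length */
		res.len = (int)rval;
		res.has_len = true;
		return res;
	}
	if (rval >= -MOCK_MAX_ERRNO) {		/* plain -errno */
		res.err = (int)rval;
		return res;
	}

	v = -rval;				/* merged errno + length */
	res.err = -(int)(v % (MOCK_MAX_ERRNO + 1));
	res.len = (int)(v / (MOCK_MAX_ERRNO + 1) - 1);
	res.has_len = true;
	return res;
}
```

For example, MOCK_GETSOCKOPT_RVAL(ERANGE, 16) decodes to an error of
-ERANGE together with a length of 16, while a bare -ERANGE decodes to
the error alone.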

	David


> 
> > IIRC there is one related to some form of async io where it is just
> > the length of the header, the actual buffer length depends on
> > data in the header.  
> 
> Could you point me to the relevant code so I can examine this case?
> 
> > This doesn't matter with the existing code for applications, when they
> > get it wrong they just crash.  
> 
> Is this crash being triggered by the protocol callbacks?
> 
> I tried searching for this but couldn't find it. I'd appreciate any
> hints you could provide about this case.
> 
> Thanks
> --breno


