[PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size

Netdev List
 help / color / mirror / Atom feed

* [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size
@ 2026-05-15  7:15 Yuyang Huang
  2026-05-15  7:15 ` [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size Yuyang Huang
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Yuyang Huang @ 2026-05-15  7:15 UTC (permalink / raw)
  To: Yuyang Huang
  Cc: David S. Miller, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko,
	Daniel Borkmann, Eduard Zingerman, Eric Dumazet, Jakub Kicinski,
	Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan,
	Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf,
	linux-kernel, linux-kselftest, netdev

The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but
several commands write outputs back to userspace unconditionally. If the
caller passes a short buffer, this can lead to out-of-bounds writes,
potentially overwriting adjacent userspace memory.

This series addresses this by introducing size-gating based on field type:

1) Mandatory fields (original ABI): Return -EINVAL in __sys_bpf() if the
   user-provided buffer size is smaller than the minimum size required to
   cover these fields. This hardens the syscall entry point for several
   commands.
2) Optional fields (later revisions): Skip writeback if the user-provided
   buffer size is too small to cover them. This is applied to
   'query.revision' in BPF_PROG_QUERY.

The first patch implements the plumbing and enforcement in the kernel.
The second patch adds a selftest to verify the behavior.

Yuyang Huang (2):
  bpf: align syscall writeback behavior with caller-declared size
  selftests/bpf: Add verification for BPF_PROG_QUERY attr size
    boundaries

 drivers/net/netkit.c                          |  5 +-
 include/linux/bpf-cgroup.h                    |  5 +-
 include/linux/bpf_mprog.h                     |  4 +-
 include/net/netkit.h                          |  6 +-
 include/net/tcx.h                             |  5 +-
 kernel/bpf/cgroup.c                           | 13 +--
 kernel/bpf/mprog.c                            |  5 +-
 kernel/bpf/syscall.c                          | 34 ++++++--
 kernel/bpf/tcx.c                              |  5 +-
 .../selftests/bpf/prog_tests/bpf_attr_size.c  | 84 +++++++++++++++++++
 10 files changed, 141 insertions(+), 25 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_attr_size.c

-- 
2.54.0.563.g4f69b47b94-goog

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-15  7:15 [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Yuyang Huang
@ 2026-05-15  7:15 ` Yuyang Huang
  2026-05-15  8:14   ` bot+bpf-ci
  2026-05-15  7:15 ` [PATCH bpf-next 2/2] selftests/bpf: Add verification for BPF_PROG_QUERY attr size boundaries Yuyang Huang
  2026-05-18  3:29 ` [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Alexei Starovoitov
  2 siblings, 1 reply; 17+ messages in thread
From: Yuyang Huang @ 2026-05-15  7:15 UTC (permalink / raw)
  To: Yuyang Huang
  Cc: David S. Miller, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko,
	Daniel Borkmann, Eduard Zingerman, Eric Dumazet, Jakub Kicinski,
	Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan,
	Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf,
	linux-kernel, linux-kselftest, netdev, Maciej Żenczykowski,
	Lorenzo Colitti

The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but
several commands write outputs back to userspace unconditionally. Because
copy_to_user() does not fault on adjacent mapped memory, a short userspace
buffer results in out-of-bounds writes, potentially overwriting adjacent
userspace memory.

Address this by introducing two policies based on field type:

1) Mandatory fields (original ABI): Return -EINVAL in __sys_bpf() if the
   buffer size does not cover them. This hardens the syscall front-gate
   for the following commands:
   - BPF_PROG_QUERY (min size: query.prog_cnt)
   - BPF_PROG_TEST_RUN (min size: test.duration)
   - BPF_*_GET_NEXT_ID (min size: next_id)
   - BPF_OBJ_GET_INFO_BY_FD (min size: info.info_len)
   - BPF_TASK_FD_QUERY (minimum size: task_fd_query.probe_addr)
   - BPF_MAP_*_BATCH (min size: batch.flags)

2) Optional fields (later revisions): Skip writeback if the buffer size
   does not cover the field. This is applied to BPF_PROG_QUERY's
   'query.revision'. Older userspace passing a smaller size (e.g., 40
   bytes) will have the write safely skipped. This size-gating pattern
   mirrors the existing precedent used for 'log_true_size' (verifier.c)
   and 'btf_log_true_size' (btf.c).

To support this, the user-declared 'size' is plumbed from __sys_bpf()
through the query dispatchers (cgroup, tcx, netkit) to the underlying
writeback helpers in cgroup.c and mprog.c.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Link: https://lore.kernel.org/r/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com
---
 drivers/net/netkit.c       |  5 +++--
 include/linux/bpf-cgroup.h |  5 +++--
 include/linux/bpf_mprog.h  |  4 ++--
 include/net/netkit.h       |  6 ++++--
 include/net/tcx.h          |  5 +++--
 kernel/bpf/cgroup.c        | 13 +++++++------
 kernel/bpf/mprog.c         |  5 +++--
 kernel/bpf/syscall.c       | 34 +++++++++++++++++++++++++++++-----
 kernel/bpf/tcx.c           |  5 +++--
 9 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
index 5e2eecc3165d..680607d6e039 100644
--- a/drivers/net/netkit.c
+++ b/drivers/net/netkit.c
@@ -813,7 +813,8 @@ int netkit_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog)
 	return ret;
 }
 
-int netkit_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
+int netkit_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
+		      u32 uattr_size)
 {
 	struct net_device *dev;
 	int ret;
@@ -826,7 +827,7 @@ int netkit_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
 		ret = PTR_ERR(dev);
 		goto out;
 	}
-	ret = bpf_mprog_query(attr, uattr, netkit_entry_fetch(dev, false));
+	ret = bpf_mprog_query(attr, uattr, uattr_size, netkit_entry_fetch(dev, false));
 out:
 	rtnl_unlock();
 	return ret;
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index b2e79c2b41d5..4d0cc65976a1 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -421,7 +421,7 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr,
 			   enum bpf_prog_type ptype);
 int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 int cgroup_bpf_prog_query(const union bpf_attr *attr,
-			  union bpf_attr __user *uattr);
+			  union bpf_attr __user *uattr, u32 uattr_size);
 
 const struct bpf_func_proto *
 cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
@@ -452,7 +452,8 @@ static inline int cgroup_bpf_link_attach(const union bpf_attr *attr,
 }
 
 static inline int cgroup_bpf_prog_query(const union bpf_attr *attr,
-					union bpf_attr __user *uattr)
+					union bpf_attr __user *uattr,
+					u32 uattr_size)
 {
 	return -EINVAL;
 }
diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h
index 0b9f4caeeb0a..fa479ace854a 100644
--- a/include/linux/bpf_mprog.h
+++ b/include/linux/bpf_mprog.h
@@ -72,7 +72,7 @@
  *   // bpf_mprog user-side lock
  *   // fetch active @entry from attach location
  *   [...]
- *   ret = bpf_mprog_query(attr, uattr, entry);
+ *   ret = bpf_mprog_query(attr, uattr, uattr_size, entry);
  *   // bpf_mprog user-side unlock
  *
  *  Data/fast path:
@@ -329,7 +329,7 @@ int bpf_mprog_detach(struct bpf_mprog_entry *entry,
 		     u32 flags, u32 id_or_fd, u64 revision);
 
 int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
-		    struct bpf_mprog_entry *entry);
+		    u32 uattr_size, struct bpf_mprog_entry *entry);
 
 static inline bool bpf_mprog_supported(enum bpf_prog_type type)
 {
diff --git a/include/net/netkit.h b/include/net/netkit.h
index 9ec0163739f4..fe209d1f9a64 100644
--- a/include/net/netkit.h
+++ b/include/net/netkit.h
@@ -9,7 +9,8 @@
 int netkit_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 int netkit_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 int netkit_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog);
-int netkit_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr);
+int netkit_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
+		      u32 uattr_size);
 INDIRECT_CALLABLE_DECLARE(struct net_device *netkit_peer_dev(struct net_device *dev));
 #else
 static inline int netkit_prog_attach(const union bpf_attr *attr,
@@ -31,7 +32,8 @@ static inline int netkit_prog_detach(const union bpf_attr *attr,
 }
 
 static inline int netkit_prog_query(const union bpf_attr *attr,
-				    union bpf_attr __user *uattr)
+				    union bpf_attr __user *uattr,
+				    u32 uattr_size)
 {
 	return -EINVAL;
 }
diff --git a/include/net/tcx.h b/include/net/tcx.h
index 23a61af13547..610626b39676 100644
--- a/include/net/tcx.h
+++ b/include/net/tcx.h
@@ -166,7 +166,7 @@ int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog);
 void tcx_uninstall(struct net_device *dev, bool ingress);
 
 int tcx_prog_query(const union bpf_attr *attr,
-		   union bpf_attr __user *uattr);
+		   union bpf_attr __user *uattr, u32 uattr_size);
 
 static inline void dev_tcx_uninstall(struct net_device *dev)
 {
@@ -194,7 +194,8 @@ static inline int tcx_prog_detach(const union bpf_attr *attr,
 }
 
 static inline int tcx_prog_query(const union bpf_attr *attr,
-				 union bpf_attr __user *uattr)
+				 union bpf_attr __user *uattr,
+				 u32 uattr_size)
 {
 	return -EINVAL;
 }
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 876f6a81a9b6..2c2bdaa86aa7 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1208,7 +1208,7 @@ static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 
 /* Must be called with cgroup_mutex held to avoid races. */
 static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
-			      union bpf_attr __user *uattr)
+			      union bpf_attr __user *uattr, u32 uattr_size)
 {
 	__u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
 	bool effective_query = attr->query.query_flags & BPF_F_QUERY_EFFECTIVE;
@@ -1259,7 +1259,8 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		return -EFAULT;
 	if (!effective_query && from_atype == to_atype)
 		revision = cgrp->bpf.revisions[from_atype];
-	if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
+	if (uattr_size >= offsetofend(union bpf_attr, query.revision) &&
+	    copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
 		return -EFAULT;
 	if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
 		/* return early if user requested only program count + flags */
@@ -1312,12 +1313,12 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 }
 
 static int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
-			    union bpf_attr __user *uattr)
+			    union bpf_attr __user *uattr, u32 uattr_size)
 {
 	int ret;
 
 	cgroup_lock();
-	ret = __cgroup_bpf_query(cgrp, attr, uattr);
+	ret = __cgroup_bpf_query(cgrp, attr, uattr, uattr_size);
 	cgroup_unlock();
 	return ret;
 }
@@ -1520,7 +1521,7 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 }
 
 int cgroup_bpf_prog_query(const union bpf_attr *attr,
-			  union bpf_attr __user *uattr)
+			  union bpf_attr __user *uattr, u32 uattr_size)
 {
 	struct cgroup *cgrp;
 	int ret;
@@ -1529,7 +1530,7 @@ int cgroup_bpf_prog_query(const union bpf_attr *attr,
 	if (IS_ERR(cgrp))
 		return PTR_ERR(cgrp);
 
-	ret = cgroup_bpf_query(cgrp, attr, uattr);
+	ret = cgroup_bpf_query(cgrp, attr, uattr, uattr_size);
 
 	cgroup_put(cgrp);
 	return ret;
diff --git a/kernel/bpf/mprog.c b/kernel/bpf/mprog.c
index 1394168062e8..822d9c4c0db4 100644
--- a/kernel/bpf/mprog.c
+++ b/kernel/bpf/mprog.c
@@ -393,7 +393,7 @@ int bpf_mprog_detach(struct bpf_mprog_entry *entry,
 }
 
 int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
-		    struct bpf_mprog_entry *entry)
+		    u32 uattr_size, struct bpf_mprog_entry *entry)
 {
 	u32 __user *uprog_flags, *ulink_flags;
 	u32 __user *uprog_id, *ulink_id;
@@ -413,7 +413,8 @@ int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
 	}
 	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
 		return -EFAULT;
-	if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
+	if (uattr_size >= offsetofend(union bpf_attr, query.revision) &&
+	    copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
 		return -EFAULT;
 	if (copy_to_user(&uattr->query.count, &count, sizeof(count)))
 		return -EFAULT;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a3c0214ca934..a46b0510d9e2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4654,7 +4654,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 #define BPF_PROG_QUERY_LAST_FIELD query.revision
 
 static int bpf_prog_query(const union bpf_attr *attr,
-			  union bpf_attr __user *uattr)
+			  union bpf_attr __user *uattr, u32 uattr_size)
 {
 	if (!bpf_net_capable())
 		return -EPERM;
@@ -4693,7 +4693,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_GETSOCKOPT:
 	case BPF_CGROUP_SETSOCKOPT:
 	case BPF_LSM_CGROUP:
-		return cgroup_bpf_prog_query(attr, uattr);
+		return cgroup_bpf_prog_query(attr, uattr, uattr_size);
 	case BPF_LIRC_MODE2:
 		return lirc_prog_query(attr, uattr);
 	case BPF_FLOW_DISSECTOR:
@@ -4706,10 +4706,10 @@ static int bpf_prog_query(const union bpf_attr *attr,
 		return sock_map_bpf_prog_query(attr, uattr);
 	case BPF_TCX_INGRESS:
 	case BPF_TCX_EGRESS:
-		return tcx_prog_query(attr, uattr);
+		return tcx_prog_query(attr, uattr, uattr_size);
 	case BPF_NETKIT_PRIMARY:
 	case BPF_NETKIT_PEER:
-		return netkit_prog_query(attr, uattr);
+		return netkit_prog_query(attr, uattr, uattr_size);
 	default:
 		return -EINVAL;
 	}
@@ -6260,20 +6260,30 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
 		err = bpf_prog_detach(&attr);
 		break;
 	case BPF_PROG_QUERY:
-		err = bpf_prog_query(&attr, uattr.user);
+		if (size < offsetofend(union bpf_attr, query.prog_cnt))
+			return -EINVAL;
+		err = bpf_prog_query(&attr, uattr.user, size);
 		break;
 	case BPF_PROG_TEST_RUN:
+		if (size < offsetofend(union bpf_attr, test.duration))
+			return -EINVAL;
 		err = bpf_prog_test_run(&attr, uattr.user);
 		break;
 	case BPF_PROG_GET_NEXT_ID:
+		if (size < offsetofend(union bpf_attr, next_id))
+			return -EINVAL;
 		err = bpf_obj_get_next_id(&attr, uattr.user,
 					  &prog_idr, &prog_idr_lock);
 		break;
 	case BPF_MAP_GET_NEXT_ID:
+		if (size < offsetofend(union bpf_attr, next_id))
+			return -EINVAL;
 		err = bpf_obj_get_next_id(&attr, uattr.user,
 					  &map_idr, &map_idr_lock);
 		break;
 	case BPF_BTF_GET_NEXT_ID:
+		if (size < offsetofend(union bpf_attr, next_id))
+			return -EINVAL;
 		err = bpf_obj_get_next_id(&attr, uattr.user,
 					  &btf_idr, &btf_idr_lock);
 		break;
@@ -6284,6 +6294,8 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
 		err = bpf_map_get_fd_by_id(&attr);
 		break;
 	case BPF_OBJ_GET_INFO_BY_FD:
+		if (size < offsetofend(union bpf_attr, info.info_len))
+			return -EINVAL;
 		err = bpf_obj_get_info_by_fd(&attr, uattr.user);
 		break;
 	case BPF_RAW_TRACEPOINT_OPEN:
@@ -6296,22 +6308,32 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
 		err = bpf_btf_get_fd_by_id(&attr);
 		break;
 	case BPF_TASK_FD_QUERY:
+		if (size < offsetofend(union bpf_attr, task_fd_query.probe_addr))
+			return -EINVAL;
 		err = bpf_task_fd_query(&attr, uattr.user);
 		break;
 	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
 		err = map_lookup_and_delete_elem(&attr);
 		break;
 	case BPF_MAP_LOOKUP_BATCH:
+		if (size < offsetofend(union bpf_attr, batch.flags))
+			return -EINVAL;
 		err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_LOOKUP_BATCH);
 		break;
 	case BPF_MAP_LOOKUP_AND_DELETE_BATCH:
+		if (size < offsetofend(union bpf_attr, batch.flags))
+			return -EINVAL;
 		err = bpf_map_do_batch(&attr, uattr.user,
 				       BPF_MAP_LOOKUP_AND_DELETE_BATCH);
 		break;
 	case BPF_MAP_UPDATE_BATCH:
+		if (size < offsetofend(union bpf_attr, batch.flags))
+			return -EINVAL;
 		err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_UPDATE_BATCH);
 		break;
 	case BPF_MAP_DELETE_BATCH:
+		if (size < offsetofend(union bpf_attr, batch.flags))
+			return -EINVAL;
 		err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_DELETE_BATCH);
 		break;
 	case BPF_LINK_CREATE:
@@ -6324,6 +6346,8 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
 		err = bpf_link_get_fd_by_id(&attr);
 		break;
 	case BPF_LINK_GET_NEXT_ID:
+		if (size < offsetofend(union bpf_attr, next_id))
+			return -EINVAL;
 		err = bpf_obj_get_next_id(&attr, uattr.user,
 					  &link_idr, &link_idr_lock);
 		break;
diff --git a/kernel/bpf/tcx.c b/kernel/bpf/tcx.c
index 02db0113b8e7..2a91f6075511 100644
--- a/kernel/bpf/tcx.c
+++ b/kernel/bpf/tcx.c
@@ -119,7 +119,8 @@ void tcx_uninstall(struct net_device *dev, bool ingress)
 		tcx_entry_free(entry);
 }
 
-int tcx_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
+int tcx_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
+		   u32 uattr_size)
 {
 	bool ingress = attr->query.attach_type == BPF_TCX_INGRESS;
 	struct net *net = current->nsproxy->net_ns;
@@ -132,7 +133,7 @@ int tcx_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
 		ret = -ENODEV;
 		goto out;
 	}
-	ret = bpf_mprog_query(attr, uattr, tcx_entry_fetch(dev, ingress));
+	ret = bpf_mprog_query(attr, uattr, uattr_size, tcx_entry_fetch(dev, ingress));
 out:
 	rtnl_unlock();
 	return ret;
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-15  7:15 ` [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size Yuyang Huang
@ 2026-05-15  8:14   ` bot+bpf-ci
  2026-05-21 11:40     ` Yuyang Huang
  0 siblings, 1 reply; 17+ messages in thread
From: bot+bpf-ci @ 2026-05-15  8:14 UTC (permalink / raw)
  To: yuyanghuang, yuyanghuang
  Cc: davem, ast, andrew+netdev, andrii, daniel, eddyz87, edumazet,
	kuba, jolsa, john.fastabend, memxor, martin.lau, razor, pabeni,
	shuah, horms, song, sdf, yonghong.song, bpf, linux-kernel,
	linux-kselftest, netdev, maze, lorenzo, ast, andrii, daniel,
	martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2436 bytes --]

>     bpf: align syscall writeback behavior with caller-declared size
>
>     The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but
>     several commands write outputs back to userspace unconditionally. Because
>     copy_to_user() does not fault on adjacent mapped memory, a short userspace
>     buffer results in out-of-bounds writes, potentially overwriting adjacent
>     userspace memory.
>
>     Address this by introducing two policies based on field type:
>
>     1) Mandatory fields (original ABI): Return -EINVAL in __sys_bpf() if the
>        buffer size does not cover them. This hardens the syscall front-gate
>        for the following commands:
>        - BPF_PROG_QUERY (min size: query.prog_cnt)
>        - BPF_PROG_TEST_RUN (min size: test.duration)
>        - BPF_*_GET_NEXT_ID (min size: next_id)
>        - BPF_OBJ_GET_INFO_BY_FD (min size: info.info_len)
>        - BPF_TASK_FD_QUERY (minimum size: task_fd_query.probe_addr)
>        - BPF_MAP_*_BATCH (min size: batch.flags)
>
>     2) Optional fields (later revisions): Skip writeback if the buffer size
>        does not cover the field. This is applied to BPF_PROG_QUERY's
>        'query.revision'. Older userspace passing a smaller size (e.g., 40
>        bytes) will have the write safely skipped. This size-gating pattern
>        mirrors the existing precedent used for 'log_true_size' (verifier.c)
>        and 'btf_log_true_size' (btf.c).
>
>     To support this, the user-declared 'size' is plumbed from __sys_bpf()
>     through the query dispatchers (cgroup, tcx, netkit) to the underlying
>     writeback helpers in cgroup.c and mprog.c.
>
>     Cc: Maciej Żenczykowski <maze@google.com>
>     Cc: Lorenzo Colitti <lorenzo@google.com>
>     Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
>     Link: https://lore.kernel.org/r/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com

This looks like a bug fix for an out-of-bounds write vulnerability.
Should this include a Fixes: tag to identify which commit introduced the
unconditional copy_to_user() calls for query.revision without checking
the user-provided buffer size?

  Fixes: 3fe213c040b3 ("adding ci files")


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25905928331

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-15  8:14   ` bot+bpf-ci
@ 2026-05-21 11:40     ` Yuyang Huang
  2026-05-24 11:22       ` Alexei Starovoitov
  0 siblings, 1 reply; 17+ messages in thread
From: Yuyang Huang @ 2026-05-21 11:40 UTC (permalink / raw)
  To: bot+bpf-ci
  Cc: davem, ast, andrew+netdev, andrii, daniel, eddyz87, edumazet,
	kuba, jolsa, john.fastabend, memxor, martin.lau, razor, pabeni,
	shuah, horms, song, sdf, yonghong.song, bpf, linux-kernel,
	linux-kselftest, netdev, maze, lorenzo, martin.lau, clm,
	ihor.solodrai

>This looks like a bug fix for an out-of-bounds write vulnerability.
>Should this include a Fixes: tag to identify which commit introduced the
>unconditional copy_to_user() calls for query.revision without checking
>the user-provided buffer size?

 > Fixes: 3fe213c040b3 ("adding ci files")

I completely missed this checkpatch comment. I will add the Fixes tag
in v2 if there is a consensus that this patch is the right approach to
resolve the issue.


On Fri, May 15, 2026 at 5:14 PM <bot+bpf-ci@kernel.org> wrote:
>
> >     bpf: align syscall writeback behavior with caller-declared size
> >
> >     The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but
> >     several commands write outputs back to userspace unconditionally. Because
> >     copy_to_user() does not fault on adjacent mapped memory, a short userspace
> >     buffer results in out-of-bounds writes, potentially overwriting adjacent
> >     userspace memory.
> >
> >     Address this by introducing two policies based on field type:
> >
> >     1) Mandatory fields (original ABI): Return -EINVAL in __sys_bpf() if the
> >        buffer size does not cover them. This hardens the syscall front-gate
> >        for the following commands:
> >        - BPF_PROG_QUERY (min size: query.prog_cnt)
> >        - BPF_PROG_TEST_RUN (min size: test.duration)
> >        - BPF_*_GET_NEXT_ID (min size: next_id)
> >        - BPF_OBJ_GET_INFO_BY_FD (min size: info.info_len)
> >        - BPF_TASK_FD_QUERY (minimum size: task_fd_query.probe_addr)
> >        - BPF_MAP_*_BATCH (min size: batch.flags)
> >
> >     2) Optional fields (later revisions): Skip writeback if the buffer size
> >        does not cover the field. This is applied to BPF_PROG_QUERY's
> >        'query.revision'. Older userspace passing a smaller size (e.g., 40
> >        bytes) will have the write safely skipped. This size-gating pattern
> >        mirrors the existing precedent used for 'log_true_size' (verifier.c)
> >        and 'btf_log_true_size' (btf.c).
> >
> >     To support this, the user-declared 'size' is plumbed from __sys_bpf()
> >     through the query dispatchers (cgroup, tcx, netkit) to the underlying
> >     writeback helpers in cgroup.c and mprog.c.
> >
> >     Cc: Maciej Żenczykowski <maze@google.com>
> >     Cc: Lorenzo Colitti <lorenzo@google.com>
> >     Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
> >     Link: https://lore.kernel.org/r/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com
>
> This looks like a bug fix for an out-of-bounds write vulnerability.
> Should this include a Fixes: tag to identify which commit introduced the
> unconditional copy_to_user() calls for query.revision without checking
> the user-provided buffer size?
>
>   Fixes: 3fe213c040b3 ("adding ci files")
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25905928331

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-21 11:40     ` Yuyang Huang
@ 2026-05-24 11:22       ` Alexei Starovoitov
  2026-05-25  7:21         ` Yuyang Huang
  0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2026-05-24 11:22 UTC (permalink / raw)
  To: Yuyang Huang, Daniel Borkmann
  Cc: bot+bpf-ci, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko,
	Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov,
	Paolo Abeni, Shuah Khan, Simon Horman, Song Liu,
	Stanislav Fomichev, Yonghong Song, bpf, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Network Development,
	Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau,
	Chris Mason, Ihor Solodrai

On Thu, May 21, 2026 at 1:41 PM Yuyang Huang <yuyanghuang@google.com> wrote:
>
> >This looks like a bug fix for an out-of-bounds write vulnerability.
> >Should this include a Fixes: tag to identify which commit introduced the
> >unconditional copy_to_user() calls for query.revision without checking
> >the user-provided buffer size?
>
>  > Fixes: 3fe213c040b3 ("adding ci files")
>
> I completely missed this checkpatch comment. I will add the Fixes tag
> in v2 if there is a consensus that this patch is the right approach to
> resolve the issue.

AI is wrong.
The fixes tag should probably point to 053c8e1f235dc or a subsequent patch?

bpf_mprog_query() and tcx_prog_query() were only introduced there.
How come your userspace passes shorter uatter to query newer
BPF_TCX_EGRESS attachment?

Please provide more details.

I'm still not convinced that what you're saying:
"
For example, our Android net_test  suite uses this legacy 40-byte
Python ctypes struct layout:

  # legacy 40-byte layout (pre-revision field)
  BpfAttrProgQuery = cstruct.Struct(
    "bpf_attr_prog_query", "=IIIIQI4xQ",
    "target_fd attach_type query_flags attach_flags prog_ids_ptr
prog_cnt prog_attach_flags"
  )
  # Invoked via: syscall(__NR_bpf, BPF_PROG_QUERY, &attr, 40)
"
is a kernel bug.

Looks like user space is querying new attachment types
with old structure.
Potentially the check can only be around:
        case BPF_TCX_INGRESS:
        case BPF_TCX_EGRESS:
                return tcx_prog_query(attr, uattr);

to be nice to broken user space,
but, really, attr->query.attach_type shouldn't be equal to BPF_TCX_EGRESS
with a smaller uattr.

pw-bot: cr

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-24 11:22       ` Alexei Starovoitov
@ 2026-05-25  7:21         ` Yuyang Huang
  2026-05-28  5:42           ` Leon Hwang
  2026-05-29  1:03           ` Alexei Starovoitov
  0 siblings, 2 replies; 17+ messages in thread
From: Yuyang Huang @ 2026-05-25  7:21 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn,
	Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa,
	John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau,
	Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman,
	Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Network Development,
	Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau,
	Chris Mason, Ihor Solodrai

Thanks for the review comment. To better explain the problem, please
bear with me through this long email reply.


>AI is wrong.
>The fixes tag should probably point to 053c8e1f235dc or a subsequent patch?

Thanks for pointing it out. I think the correct Fixes tags should be
as follows, and I will split them individually in the v2 patches:

Fixes: 053c8e1f235d ("bpf: Add generic attach/detach/query API for multi-progs")
Fixes: 120933984460 ("bpf: Implement mprog API on top of existing cgroup progs")

>Please provide more details.
>I'm still not convinced that what you're saying... is a kernel bug.
>Looks like user space is querying new attachment types with old structure.

I created a small C program using BPF_PROG_QUERY to query
BPF_CGROUP_INET_INGRESS (not the BPF_TCX_EGRESS) to demonstrate the
full problem , hope it clarifies:

```c
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>

#define __NR_bpf 321
#define BPF_PROG_QUERY 16
#define BPF_CGROUP_INET_INGRESS 0
int main() {
    char buf[64];
    memset(buf, 0xAA, 64); // Set 0xAA signature
    memset(buf, 0, 40);    // Zero out legacy 40-byte struct
    *(int*)(buf + 0) = open("/sys/fs/cgroup", O_RDONLY);
    *(int*)(buf + 4) = BPF_CGROUP_INET_INGRESS; // Explicitly set attach_type

    // Call BPF_PROG_QUERY (16) with legacy size (40) on x86_64 (syscall 321)
    long ret = syscall(__NR_bpf, BPF_PROG_QUERY, buf, 40);
    printf("Syscall return value: %ld\n", ret);
    printf("Revision (offsets 56-63): %016llx\n", *(unsigned long
long*)(buf + 56));
    close(*(int*)(buf + 0)); // Clean up cgroup FD
    return 0;
}
```

Before the patch:
  Syscall return value: 0
  Revision (offsets 56-63): 0000000000000001   <-- Stack/Heap Corruption!

After the patch (Safe / revision writeback is gated):
  Syscall return value: 0
  Revision (offsets 56-63): aaaaaaaaaaaaaaaa   <-- Safe (Signature intact)

For this example, the regression lies in legacy CGROUP queries:
1. BPF_PROG_QUERY for cgroups is not a new feature; it has been in the
kernel since 2017.
2. There are existing legacy userspace applications and language
wrappers (like our android net-tests) written years ago with older
structure layouts (passing size = 40, ending at
query.prog_attach_flags) that query cgroups.
3. In June 2025, Commit 120933984460 backported "revision" support to
cgroup queries, introducing an unconditional writeback in
`__cgroup_bpf_query()` at offset 56:
```c
// In __cgroup_bpf_query() (kernel/bpf/cgroup.c) - Introduced in 120933984460:
if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
        return -EFAULT;
```

Now lets talk about BPF_TCX_EGRESS:

> bpf_mprog_query() and tcx_prog_query() were only introduced there.
> How come your userspace passes shorter uatter to query newer
> BPF_TCX_EGRESS attachment?

I understand your point, but two common cases exist where userspace
will legitimately query BPF_TCX_EGRESS with a 40-byte structure:

First, a generic BPF querying tool (assume it called `./bpf-query`)
compiled in 2020 (when the query buffer size was 40 bytes) might
accept the attach type dynamically from the command line:
  ./bpf-query --type 47 --fd 3  # 47 is BPF_TCX_EGRESS

The unmodified 2020 binary populates the struct dynamically and
invokes the syscall with size = 40:
```c
   ....
   attr.query.attach_type = 47; // BPF_TCX_EGRESS (passed dynamically
at runtime)
   syscall(__NR_bpf, BPF_PROG_QUERY, &attr, 40);
```

Without the size check, the unconditional writeback of revision at
offset 56 would result in an out-of-bounds write on the stack.

Second, in dynamic language environments (like Python), UAPI
structures are often declared manually. The developer might only
declare the subset of fields the application actually uses (40 bytes
in this case) to optimize memory overhead. Even with this subset
declaration, we believe the kernel should ideally respect the declared
size limit to prevent accidental out-of-bounds writes into adjacent
userspace memory.

> Looks like user space is querying new attachment types
> with old structure.
> Potentially the check can only be around:
>         case BPF_TCX_INGRESS:
>         case BPF_TCX_EGRESS:
>                 return tcx_prog_query(attr, uattr);
>
> to be nice to broken user space,
> but, really, attr->query.attach_type shouldn't be equal to BPF_TCX_EGRESS
> with a smaller uattr.

Add gating to the TCX path, will not be able to fix the legacy cgroup
query path. As shown above, the cgroup query path has the exact same
OOB writeback regression affecting legacy userspace. Since we must
plumb uattr_size to cgroup.c to resolve the cgroup regression safely,
applying the same consistent size-gating in bpf_mprog_query() seemed
like the most consistent and robust architectural path.

Feel free to let us know your thoughts.

Thanks,
Yuyang

On Sun, May 24, 2026 at 8:22 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, May 21, 2026 at 1:41 PM Yuyang Huang <yuyanghuang@google.com> wrote:
> >
> > >This looks like a bug fix for an out-of-bounds write vulnerability.
> > >Should this include a Fixes: tag to identify which commit introduced the
> > >unconditional copy_to_user() calls for query.revision without checking
> > >the user-provided buffer size?
> >
> >  > Fixes: 3fe213c040b3 ("adding ci files")
> >
> > I completely missed this checkpatch comment. I will add the Fixes tag
> > in v2 if there is a consensus that this patch is the right approach to
> > resolve the issue.
>
> AI is wrong.
> The fixes tag should probably point to 053c8e1f235dc or a subsequent patch?
>
> bpf_mprog_query() and tcx_prog_query() were only introduced there.
> How come your userspace passes shorter uatter to query newer
> BPF_TCX_EGRESS attachment?
>
> Please provide more details.
>
> I'm still not convinced that what you're saying:
> "
> For example, our Android net_test  suite uses this legacy 40-byte
> Python ctypes struct layout:
>
>   # legacy 40-byte layout (pre-revision field)
>   BpfAttrProgQuery = cstruct.Struct(
>     "bpf_attr_prog_query", "=IIIIQI4xQ",
>     "target_fd attach_type query_flags attach_flags prog_ids_ptr
> prog_cnt prog_attach_flags"
>   )
>   # Invoked via: syscall(__NR_bpf, BPF_PROG_QUERY, &attr, 40)
> "
> is a kernel bug.
>
> Looks like user space is querying new attachment types
> with old structure.
> Potentially the check can only be around:
>         case BPF_TCX_INGRESS:
>         case BPF_TCX_EGRESS:
>                 return tcx_prog_query(attr, uattr);
>
> to be nice to broken user space,
> but, really, attr->query.attach_type shouldn't be equal to BPF_TCX_EGRESS
> with a smaller uattr.
>
> pw-bot: cr

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-25  7:21         ` Yuyang Huang
@ 2026-05-28  5:42           ` Leon Hwang
  2026-05-28 13:20             ` Yuyang Huang
  2026-05-29  1:03           ` Alexei Starovoitov
  1 sibling, 1 reply; 17+ messages in thread
From: Leon Hwang @ 2026-05-28  5:42 UTC (permalink / raw)
  To: Yuyang Huang, Alexei Starovoitov
  Cc: Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn,
	Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa,
	John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau,
	Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman,
	Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Network Development,
	Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau,
	Chris Mason, Ihor Solodrai

On 25/5/26 15:21, Yuyang Huang wrote:
[...]
> 
> Feel free to let us know your thoughts.
> 
I believe this is a user space issue instead of a kernel bug.

I tried to use mmap() memory as uattr that got -EFAULT instead of crash.

[................] /* mmap() memory */
             ^ tail 40B as uattr
                    ^ 56B offset for copy_to_user()

Thanks,
Leon


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-28  5:42           ` Leon Hwang
@ 2026-05-28 13:20             ` Yuyang Huang
  2026-05-28 14:36               ` Leon Hwang
  0 siblings, 1 reply; 17+ messages in thread
From: Yuyang Huang @ 2026-05-28 13:20 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Alexei Starovoitov, Daniel Borkmann, bot+bpf-ci,
	Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard,
	Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov,
	Paolo Abeni, Shuah Khan, Simon Horman, Song Liu,
	Stanislav Fomichev, Yonghong Song, bpf, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Network Development,
	Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau,
	Chris Mason, Ihor Solodrai

On Thu, May 28, 2026 at 1:43 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> On 25/5/26 15:21, Yuyang Huang wrote:
> [...]
> >
> > Feel free to let us know your thoughts.
> >
> I believe this is a user space issue instead of a kernel bug.
>
> I tried to use mmap() memory as uattr that got -EFAULT instead of crash.
>
> [................] /* mmap() memory */
>              ^ tail 40B as uattr
>                     ^ 56B offset for copy_to_user()
>
> Thanks,
> Leon
>

Thanks for testing this!

There are some discussion in the original thread:
https://lore.kernel.org/all/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com/
as follows, which might answer your question

> > > If the uattr indeed has less than needed space, then for
> > >          if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
> > >                  return -EFAULT;
> > > the kernel will return -EFAULT to user space.
> > >
> > > Maybe userspace didn't handle the return code properly and causing
> > > user space corruption and segfaults. This shouldn't be a kernel issue.
> > > Maybe I missed something?
> >
> > That's not how that works at all.
> >
> > copy_to_user() will only fail and thus EFAULT will only be returned if
> > the memory area copy_to_user() is trying to copy into isn't
> > owned/mapped by the user (or perhaps is read-only protected, not sure
> > about this last one).
> >
> > Because memory is mapped in (at least) 4K pages, the memory after a
> > user buffer is almost always still valid memory.  It might be unused,
> > or it might be something on the stack - like a return address, or it
> > might be on the heap - metadata tracking, or a different memory
> > allocation perhaps entirely.

You might hit the same case as maze@ mentioned in the thread.

To trigger -EFAULT, you likely positioned `uattr` at the very end of a
mapped page immediately followed by a protected page

Could you share the test program you created so we can verify?

Please check the test program I shared earlier in the thread (where
uattr is stored on the stack); the BPF syscall returned 0, but stack
corruption occurred.

If you think my test program contains a bug, feel free to let me know.

Thanks,

Yuyang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-28 13:20             ` Yuyang Huang
@ 2026-05-28 14:36               ` Leon Hwang
  2026-05-28 15:08                 ` Lorenzo Colitti
  0 siblings, 1 reply; 17+ messages in thread
From: Leon Hwang @ 2026-05-28 14:36 UTC (permalink / raw)
  To: Yuyang Huang
  Cc: Alexei Starovoitov, Daniel Borkmann, bot+bpf-ci,
	Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard,
	Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov,
	Paolo Abeni, Shuah Khan, Simon Horman, Song Liu,
	Stanislav Fomichev, Yonghong Song, bpf, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Network Development,
	Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau,
	Chris Mason, Ihor Solodrai

On 2026/5/28 21:20, Yuyang Huang wrote:
> On Thu, May 28, 2026 at 1:43 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> On 25/5/26 15:21, Yuyang Huang wrote:
>> [...]
>>>
>>> Feel free to let us know your thoughts.
>>>
>> I believe this is a user space issue instead of a kernel bug.
>>
>> I tried to use mmap() memory as uattr that got -EFAULT instead of crash.
>>
>> [................] /* mmap() memory */
>>              ^ tail 40B as uattr
>>                     ^ 56B offset for copy_to_user()
>>
>> Thanks,
>> Leon
>>
> 
> Thanks for testing this!
> 
> There are some discussion in the original thread:
> https://lore.kernel.org/all/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com/
> as follows, which might answer your question
> 

It seems you haven't convinced Alexei in that thread.

>>>> If the uattr indeed has less than needed space, then for
>>>>          if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
>>>>                  return -EFAULT;
>>>> the kernel will return -EFAULT to user space.
>>>>
>>>> Maybe userspace didn't handle the return code properly and causing
>>>> user space corruption and segfaults. This shouldn't be a kernel issue.
>>>> Maybe I missed something?
>>>
>>> That's not how that works at all.
>>>
>>> copy_to_user() will only fail and thus EFAULT will only be returned if
>>> the memory area copy_to_user() is trying to copy into isn't
>>> owned/mapped by the user (or perhaps is read-only protected, not sure
>>> about this last one).
>>>
>>> Because memory is mapped in (at least) 4K pages, the memory after a
>>> user buffer is almost always still valid memory.  It might be unused,
>>> or it might be something on the stack - like a return address, or it
>>> might be on the heap - metadata tracking, or a different memory
>>> allocation perhaps entirely.
> 
> You might hit the same case as maze@ mentioned in the thread.
> 
> To trigger -EFAULT, you likely positioned `uattr` at the very end of a
> mapped page immediately followed by a protected page
> 
> Could you share the test program you created so we can verify?
> 

Attached below.

> Please check the test program I shared earlier in the thread (where
> uattr is stored on the stack); the BPF syscall returned 0, but stack
> corruption occurred.
> 

To avoid such stack corruption, you should reserve enough space for the
query, e.g., by extracting union bpf_attr from kernel BTF vmlinux.

Thanks,
Leon

> If you think my test program contains a bug, feel free to let me know.
> 
> Thanks,
> 
> Yuyang

---

Assisted-by: Copilot:gemini-3-1-pro-preview

// SPDX-License-Identifier: GPL-2.0
#include <uapi/linux/if_link.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/bpf.h>
#include <net/if.h>
#include <test_progs.h>

#define loopback 1

#include "test_tc_link.skel.h"
#include "tc_helpers.h"

#define SHORT_QUERY_SIZE	offsetofend(union bpf_attr,
query.prog_attach_flags)

/*
 * test_tail_uattr_out_of_mmap:
 *
 * Places uattr at the very tail of a 1-page anonymous mmap so that the
 * mandatory fields (target_ifindex..prog_attach_flags, 40 bytes) fit
inside the page
 * but query.revision (+56) falls in the unmapped page immediately after.
 *
 *   mmap layout (1 page, e.g. 4096 bytes):
 *
 *     [0 ............ page_size - SHORT_QUERY_SIZE - 1]  unused
 *     [page_size - 40 ................. page_size - 1 ]  uattr:
target_ifindex..prog_cnt
 *     [page_size ..................................   ]  UNMAPPED
 *               ^
 *               uattr + 56 (revision) lands here
 */
static void test_tail_uattr_out_of_mmap(void)
{
	long page_size = sysconf(_SC_PAGE_SIZE);
	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
	struct test_tc_link *skel;
	union bpf_attr *attr;
	void *mem, *tail;
	int fd, err;

	skel = test_tc_link__open_and_load();
	if (!ASSERT_OK_PTR(skel, "skel_load"))
		return;

	fd = bpf_program__fd(skel->progs.tc1);

	err = bpf_prog_attach_opts(fd, loopback, BPF_TCX_INGRESS, &opta);
	if (!ASSERT_OK(err, "prog_attach"))
		goto cleanup_skel;

	/*
	 * Allocate 2 contiguous pages then immediately unmap the second one.
	 * This guarantees the page following the first is unmapped, regardless
	 * of what the runtime placed there beforehand.
	 * Place uattr at the tail: last SHORT_QUERY_SIZE (40) bytes of page 1.
	 * uattr + 56 (revision) therefore lands 16 bytes past the page end,
	 * in the unmapped region.
	 */
	mem = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (!ASSERT_OK_PTR(mem, "mmap"))
		goto detach;
	munmap((char *)mem + page_size, page_size);

	tail = (char *)mem + page_size - SHORT_QUERY_SIZE;
	memset(tail, 0, SHORT_QUERY_SIZE);

	attr = (union bpf_attr *)tail;
	attr->query.target_ifindex = loopback;
	attr->query.attach_type = BPF_TCX_INGRESS;

	err = syscall(__NR_bpf, BPF_PROG_QUERY, tail, SHORT_QUERY_SIZE);
	ASSERT_OK(err, "syscall");
	ASSERT_OK(errno, "errno");
	ASSERT_EQ(attr->query.prog_cnt, 1, "prog_cnt_written");

	munmap(mem, page_size);	/* second page already unmapped above */
detach:
	bpf_prog_detach_opts(fd, loopback, BPF_TCX_INGRESS, &optd);
cleanup_skel:
	test_tc_link__destroy(skel);
}

void test_mmap_uattr_corruption(void)
{
	if (test__start_subtest("tail_uattr_out_of_mmap"))
		test_tail_uattr_out_of_mmap();
}


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-28 14:36               ` Leon Hwang
@ 2026-05-28 15:08                 ` Lorenzo Colitti
  2026-05-28 16:01                   ` Yuyang Huang
  0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Colitti @ 2026-05-28 15:08 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Yuyang Huang, Alexei Starovoitov, Daniel Borkmann, bot+bpf-ci,
	Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard,
	Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov,
	Paolo Abeni, Shuah Khan, Simon Horman, Song Liu,
	Stanislav Fomichev, Yonghong Song, bpf, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Network Development,
	Maciej Żenczykowski, Martin KaFai Lau, Chris Mason,
	Ihor Solodrai

On Thu, May 28, 2026 at 11:37 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> To avoid such stack corruption, you should reserve enough space for the
> query, e.g., by extracting union bpf_attr from kernel BTF vmlinux.

That seems unreasonable. There's already a size in the bpf syscall,
why can't the kernel respect that?

Also, the length of bpf_attr has increased over time as the kernel
adds more elements. Doesn't that mean that even if the userspace
program passes a sufficient size, a future kernel could start writing
more bytes and start overwriting memory? Breaking userspace on kernel
upgrades should never happen.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-28 15:08                 ` Lorenzo Colitti
@ 2026-05-28 16:01                   ` Yuyang Huang
  2026-05-28 16:18                     ` Yuyang Huang
  0 siblings, 1 reply; 17+ messages in thread
From: Yuyang Huang @ 2026-05-28 16:01 UTC (permalink / raw)
  To: Lorenzo Colitti
  Cc: Leon Hwang, Alexei Starovoitov, Daniel Borkmann, bot+bpf-ci,
	Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard,
	Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov,
	Paolo Abeni, Shuah Khan, Simon Horman, Song Liu,
	Stanislav Fomichev, Yonghong Song, bpf, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Network Development,
	Maciej Żenczykowski, Martin KaFai Lau, Chris Mason,
	Ihor Solodrai

>It seems you haven't convinced Alexei in that thread.
>Attached below.

Thanks for the test program. It proves my hypothesis: you had to
manually unmap a page boundary to force that `-EFAULT`. The part that
is still under discussion is whether there is a kernel bug or not. I
think there is no question about the behavior of when the -EFAULT will
be triggered.

> To avoid such stack corruption, you should reserve enough space for the
> query, e.g., by extracting union bpf_attr from kernel BTF vmlinux.

I agree that your suggestion to extract the layout from BTF is good
practice when writing a new program from scratch. However, our focus
here is on maintaining backward compatibility for existing binaries.

I think the core question is: **is a 40-byte query a valid request or not?**

From my understanding, under the syscall ABI contract, passing a
40-byte buffer (at least for query BPF_CGROUP_INET_INGRESS which has
been stable and supported since 2017) is 100% valid and legal. **The
kernel must never break existing userspace on a kernel upgrade**.

One example of a syscall that maintains such backward compatibility is
openat2. The documentation notes it as follows:

EINVAL size was smaller than **any known version of struct open_how**.

If this bpf() syscall want to follow the same way to maintain the ABI
contract, your `mmap` test actually let kernel return -EINVAL to a
userspace program that passes a 40-byte buffer (declaring `size =
40`),  because the kernel unconditionally tries to write to offset 56,
which happens to be unmapped.

Are you arguing that the `bpf()` syscall does not need to follow the
ABI contract to maintain backward compatibility, and that it is
expected that newer kernels can break binaries compiled against older
kernels? If you believe that all binaries must be recompiled or
dynamically updated against the latest vmlinux BTF upon every kernel
upgrade, please be explicit about it.

Thanks,

Yuyang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-28 16:01                   ` Yuyang Huang
@ 2026-05-28 16:18                     ` Yuyang Huang
  0 siblings, 0 replies; 17+ messages in thread
From: Yuyang Huang @ 2026-05-28 16:18 UTC (permalink / raw)
  To: Lorenzo Colitti
  Cc: Leon Hwang, Alexei Starovoitov, Daniel Borkmann, bot+bpf-ci,
	Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard,
	Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend,
	Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov,
	Paolo Abeni, Shuah Khan, Simon Horman, Song Liu,
	Stanislav Fomichev, Yonghong Song, bpf, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Network Development,
	Maciej Żenczykowski, Martin KaFai Lau, Chris Mason,
	Ihor Solodrai

>The fixes tag should probably point to 053c8e1f235dc or a subsequent patch?

>bpf_mprog_query() and tcx_prog_query() were only introduced there.
>How come your userspace passes shorter uatter to query newer
>BPF_TCX_EGRESS attachment?

btw, If there is a strong preference for the BPF_TCX_EGRESS type,
specifically, that userspace shouldn't send a size smaller than the
newly added structure when using the new attchment type, I can modify
my patch to limit the fix to the BPF_CGROUP_INET_INGRESS type only. I
do agree newly added BPF_TCX_EGRESS and the existing
BPF_CGROUP_INET_INGRESS worth a different discussion.

On Fri, May 29, 2026 at 12:01 AM Yuyang Huang <yuyanghuang@google.com> wrote:
>
> >It seems you haven't convinced Alexei in that thread.
> >Attached below.
>
> Thanks for the test program. It proves my hypothesis: you had to
> manually unmap a page boundary to force that `-EFAULT`. The part that
> is still under discussion is whether there is a kernel bug or not. I
> think there is no question about the behavior of when the -EFAULT will
> be triggered.
>
> > To avoid such stack corruption, you should reserve enough space for the
> > query, e.g., by extracting union bpf_attr from kernel BTF vmlinux.
>
> I agree that your suggestion to extract the layout from BTF is good
> practice when writing a new program from scratch. However, our focus
> here is on maintaining backward compatibility for existing binaries.
>
> I think the core question is: **is a 40-byte query a valid request or not?**
>
> From my understanding, under the syscall ABI contract, passing a
> 40-byte buffer (at least for query BPF_CGROUP_INET_INGRESS which has
> been stable and supported since 2017) is 100% valid and legal. **The
> kernel must never break existing userspace on a kernel upgrade**.
>
> One example of a syscall that maintains such backward compatibility is
> openat2. The documentation notes it as follows:
>
> EINVAL size was smaller than **any known version of struct open_how**.
>
> If this bpf() syscall want to follow the same way to maintain the ABI
> contract, your `mmap` test actually let kernel return -EINVAL to a
> userspace program that passes a 40-byte buffer (declaring `size =
> 40`),  because the kernel unconditionally tries to write to offset 56,
> which happens to be unmapped.
>
> Are you arguing that the `bpf()` syscall does not need to follow the
> ABI contract to maintain backward compatibility, and that it is
> expected that newer kernels can break binaries compiled against older
> kernels? If you believe that all binaries must be recompiled or
> dynamically updated against the latest vmlinux BTF upon every kernel
> upgrade, please be explicit about it.
>
> Thanks,
>
> Yuyang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-25  7:21         ` Yuyang Huang
  2026-05-28  5:42           ` Leon Hwang
@ 2026-05-29  1:03           ` Alexei Starovoitov
  2026-05-29  4:44             ` Yuyang Huang
  1 sibling, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2026-05-29  1:03 UTC (permalink / raw)
  To: Yuyang Huang
  Cc: Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn,
	Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa,
	John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau,
	Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman,
	Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Network Development,
	Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau,
	Chris Mason, Ihor Solodrai

On Mon May 25, 2026 at 12:21 AM PDT, Yuyang Huang wrote:
>
> For this example, the regression lies in legacy CGROUP queries:
> 1. BPF_PROG_QUERY for cgroups is not a new feature; it has been in the
> kernel since 2017.
> 2. There are existing legacy userspace applications and language
> wrappers (like our android net-tests) written years ago with older
> structure layouts (passing size = 40, ending at
> query.prog_attach_flags) that query cgroups.
> 3. In June 2025, Commit 120933984460 backported "revision" support to
> cgroup queries, introducing an unconditional writeback in
> `__cgroup_bpf_query()` at offset 56:
> ```c
> // In __cgroup_bpf_query() (kernel/bpf/cgroup.c) - Introduced in 120933984460:
> if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
>         return -EFAULT;
> ```

Ok. This is fair. 120933984460 is indeed problematic.

> Now lets talk about BPF_TCX_EGRESS:
>
>> bpf_mprog_query() and tcx_prog_query() were only introduced there.
>> How come your userspace passes shorter uatter to query newer
>> BPF_TCX_EGRESS attachment?
>
> I understand your point, but two common cases exist where userspace
> will legitimately query BPF_TCX_EGRESS with a 40-byte structure:
>
> First, a generic BPF querying tool (assume it called `./bpf-query`)
> compiled in 2020 (when the query buffer size was 40 bytes) might
> accept the attach type dynamically from the command line:
>   ./bpf-query --type 47 --fd 3  # 47 is BPF_TCX_EGRESS

Don't buy this at all. This one is clearly a user space bug.

> Add gating to the TCX path, will not be able to fix the legacy cgroup
> query path. As shown above, the cgroup query path has the exact same
> OOB writeback regression affecting legacy userspace. Since we must
> plumb uattr_size to cgroup.c to resolve the cgroup regression safely,
> applying the same consistent size-gating in bpf_mprog_query() seemed
> like the most consistent and robust architectural path.

Not really. Your patch adds checks to what looks like "random" copy_to_user()
places. This thread is clear indication that it's not a "robust architectural path".

True robust path would be to wrap every copy_to_user() into another helper
that takes uattr start pointer and size, does extra check, and
returns something like ENOSPC (and not EFAULT),
but that would be a significant churn.

So I prefer a minimal patch that add single check in bpf_prog_query()
that checks that user space supplied buffer is large enough for attr->query.
If not -> EFAULT. That would be better than overwritting.
One can argue that partial copy_to_user() in __cgroup_bpf_query()
should still be allowed, but I'm against partial and
inconsistent queries.
query.revision might seem benign, but if we do it for all copy_to_user()
we better return ENOSPC to differentiate vs EFAULT to tell user space
to fix itself.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size
  2026-05-29  1:03           ` Alexei Starovoitov
@ 2026-05-29  4:44             ` Yuyang Huang
  0 siblings, 0 replies; 17+ messages in thread
From: Yuyang Huang @ 2026-05-29  4:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn,
	Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa,
	John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau,
	Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman,
	Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Network Development,
	Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau,
	Chris Mason, Ihor Solodrai, Todd Kjos, Carlos Llamas,
	Kalesh Singh

On Fri, May 29, 2026 at 9:03 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon May 25, 2026 at 12:21 AM PDT, Yuyang Huang wrote:

Thanks for the reply.

> Ok. This is fair. 120933984460 is indeed problematic.

I'm glad to hear we agree that 120933984460 is problematic.

> Not really. Your patch adds checks to what looks like "random" copy_to_user()
> places. This thread is clear indication that it's not a "robust architectural path".
>
> True robust path would be to wrap every copy_to_user() into another helper
> that takes uattr start pointer and size, does extra check, and
> returns something like ENOSPC (and not EFAULT),
> but that would be a significant churn.

I agree that the truly robust path would be a proper copy_to_user()
wrapper instead of doing the ad-hoc check in the current path, but
indeed, that will be a big change.

> So I prefer a minimal patch that add single check in bpf_prog_query()
> that checks that user space supplied buffer is large enough for attr->query.
> If not -> EFAULT. That would be better than overwritting.
> One can argue that partial copy_to_user() in __cgroup_bpf_query()
> should still be allowed, but I'm against partial and
> inconsistent queries.
> query.revision might seem benign, but if we do it for all copy_to_user()
> we better return ENOSPC to differentiate vs EFAULT to tell user space
> to fix itself.

Okay, I understand the preference is for a minimal patch to implement
the fix. Just to make sure I fully understand your suggestion:

Are you proposing that we add a single check at the entry gate in
`bpf_prog_query()` like this?

```c
static int bpf_prog_query(const union bpf_attr *attr,
                          union bpf_attr __user *uattr, u32 uattr_size)
{
        if (uattr_size < offsetofend(union bpf_attr, query.revision))
                return -EFAULT;
        ...
}
```

If we implement this, I want to confirm if you are okay with the
consequence for legacy cgroup queries (e.g.,
`BPF_CGROUP_INET_INGRESS`):

1. Before 120933984460: Legacy userspace compiled years ago with a
40-byte `bpf_attr` layout (ending at `prog_attach_flags`) regularly
queries cgroups passing `size = 40`, and it worked perfectly.
2. With this check: These unmodified legacy applications will now fail
with `-EFAULT` on newer kernels because they pass `size = 40` which is
less than 64. This means all user-space applicaion is expected to
"fix" their code when along with the kernel upgrade.

Is your preference to explicitly fail these legacy `size = 40` cgroup
queries (breaking backward compatibility for them) to avoid "partial
and inconsistent queries"?

If we want to keep these legacy queries working safely (without
failing them and without causing memory corruption), we cannot use a
single check in `bpf_prog_query()`. We would still be forced to plumb
`uattr_size` to `__cgroup_bpf_query()` so we can conditionally skip
the `revision` writeback when `size < 64`.

Would love to get your confirmation on which trade-off you prefer
before I send V2.

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH bpf-next 2/2] selftests/bpf: Add verification for BPF_PROG_QUERY attr size boundaries
  2026-05-15  7:15 [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Yuyang Huang
  2026-05-15  7:15 ` [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size Yuyang Huang
@ 2026-05-15  7:15 ` Yuyang Huang
  2026-05-18  3:29 ` [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Alexei Starovoitov
  2 siblings, 0 replies; 17+ messages in thread
From: Yuyang Huang @ 2026-05-15  7:15 UTC (permalink / raw)
  To: Yuyang Huang
  Cc: David S. Miller, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko,
	Daniel Borkmann, Eduard Zingerman, Eric Dumazet, Jakub Kicinski,
	Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan,
	Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf,
	linux-kernel, linux-kselftest, netdev, Maciej Żenczykowski,
	Lorenzo Colitti

Add a new selftest to verify that the BPF syscall (specifically
BPF_PROG_QUERY) correctly respects the caller-declared attribute
size boundaries:
- Optional output fields (like query.revision) are not written if the
  caller-declared size ends before them.
- Calls with a size below the mandatory minimum return -EINVAL.
- Full-size calls still receive the optional fields normally.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Link: https://lore.kernel.org/r/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com

Tested with virtme-ng:
# ./test_progs -t bpf_attr_size
#17/1    bpf_attr_size/query_size_boundaries:OK
#17/2    bpf_attr_size/query_mandatory_too_short_einval:OK
#17      bpf_attr_size:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
---
 .../selftests/bpf/prog_tests/bpf_attr_size.c  | 84 +++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_attr_size.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_attr_size.c b/tools/testing/selftests/bpf/prog_tests/bpf_attr_size.c
new file mode 100644
index 000000000000..65fd717782de
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_attr_size.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Google LLC */
+#include <linux/bpf.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <test_progs.h>
+#include "test_tc_link.skel.h"
+#include "tc_helpers.h"
+
+#define OLD_QUERY_SIZE		offsetofend(union bpf_attr, query.prog_cnt)
+#define FULL_QUERY_SIZE		offsetofend(union bpf_attr, query.revision)
+#define SHORT_QUERY_SIZE	offsetofend(union bpf_attr, query.attach_type)
+
+static void test_query_size_boundaries(void)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	struct test_tc_link *skel;
+	union bpf_attr attr;
+	int fd, err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		return;
+
+	fd = bpf_program__fd(skel->progs.tc1);
+
+	err = bpf_prog_attach_opts(fd, loopback, BPF_TCX_INGRESS, &opta);
+	if (!ASSERT_OK(err, "prog_attach"))
+		goto cleanup;
+
+	/* 1. Old size: revision must not be written */
+	memset(&attr, 0, sizeof(attr));
+	attr.query.target_ifindex = loopback;
+	attr.query.attach_type = BPF_TCX_INGRESS;
+
+	err = syscall(__NR_bpf, BPF_PROG_QUERY, &attr, OLD_QUERY_SIZE);
+	if (!ASSERT_OK(err, "query_old_size"))
+		goto detach;
+
+	ASSERT_EQ(attr.query.prog_cnt, 1, "prog_cnt_written");
+	ASSERT_EQ(attr.query.revision, 0, "revision_not_written");
+
+	/* 2. Full size: revision must be written normally */
+	memset(&attr, 0, sizeof(attr));
+	attr.query.target_ifindex = loopback;
+	attr.query.attach_type = BPF_TCX_INGRESS;
+
+	err = syscall(__NR_bpf, BPF_PROG_QUERY, &attr, FULL_QUERY_SIZE);
+	if (!ASSERT_OK(err, "query_full_size"))
+		goto detach;
+
+	ASSERT_EQ(attr.query.prog_cnt, 1, "prog_cnt_written");
+	ASSERT_GT(attr.query.revision, 0, "revision_written");
+
+detach:
+	err = bpf_prog_detach_opts(fd, loopback, BPF_TCX_INGRESS, &optd);
+	ASSERT_OK(err, "prog_detach");
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+static void test_query_mandatory_too_short_einval(void)
+{
+	union bpf_attr attr;
+	int err;
+
+	/* Below minimum size: must return -EINVAL */
+	memset(&attr, 0, sizeof(attr));
+	attr.query.target_ifindex = loopback;
+	attr.query.attach_type = BPF_TCX_INGRESS;
+
+	err = syscall(__NR_bpf, BPF_PROG_QUERY, &attr, SHORT_QUERY_SIZE);
+	ASSERT_EQ(err, -1, "query_too_short_fails");
+	ASSERT_EQ(errno, EINVAL, "query_too_short_einval");
+}
+
+void test_bpf_attr_size(void)
+{
+	if (test__start_subtest("query_size_boundaries"))
+		test_query_size_boundaries();
+	if (test__start_subtest("query_mandatory_too_short_einval"))
+		test_query_mandatory_too_short_einval();
+}
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size
  2026-05-15  7:15 [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Yuyang Huang
  2026-05-15  7:15 ` [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size Yuyang Huang
  2026-05-15  7:15 ` [PATCH bpf-next 2/2] selftests/bpf: Add verification for BPF_PROG_QUERY attr size boundaries Yuyang Huang
@ 2026-05-18  3:29 ` Alexei Starovoitov
  2026-05-18  5:07   ` Yuyang Huang
  2 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2026-05-18  3:29 UTC (permalink / raw)
  To: Yuyang Huang
  Cc: David S. Miller, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko,
	Daniel Borkmann, Eduard Zingerman, Eric Dumazet, Jakub Kicinski,
	Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan,
	Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf,
	LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development

On Fri, May 15, 2026 at 12:15 AM Yuyang Huang <yuyanghuang@google.com> wrote:
>
> The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but
> several commands write outputs back to userspace unconditionally. If the
> caller passes a short buffer, this can lead to out-of-bounds writes,
> potentially overwriting adjacent userspace memory.

The whole thing sounds like a user space bug.
Please demonstrate a case where this issue is seen
by using libbpf APIs.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size
  2026-05-18  3:29 ` [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Alexei Starovoitov
@ 2026-05-18  5:07   ` Yuyang Huang
  0 siblings, 0 replies; 17+ messages in thread
From: Yuyang Huang @ 2026-05-18  5:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko,
	Daniel Borkmann, Eduard Zingerman, Eric Dumazet, Jakub Kicinski,
	Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi,
	Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan,
	Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf,
	LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development

>The whole thing sounds like a user space bug.

Could you please clarify why you consider this a userspace bug?

From our perspective, this is a binary backward compatibility issue.
An older binary compiled against older UAPI headers has no way of
knowing about newer fields. If it passes size = 40 (the correct size
at the time), the kernel violates the ABI contract by writing beyond
that declared size.

> Please demonstrate a case where this issue is seen by using libbpf APIs.

Our old project does not use libbpf; we interface directly with the
raw BPF syscall.

For example, our Android net_test  suite uses this legacy 40-byte
Python ctypes struct layout:

  # legacy 40-byte layout (pre-revision field)
  BpfAttrProgQuery = cstruct.Struct(
    "bpf_attr_prog_query", "=IIIIQI4xQ",
    "target_fd attach_type query_flags attach_flags prog_ids_ptr
prog_cnt prog_attach_flags"
  )
  # Invoked via: syscall(__NR_bpf, BPF_PROG_QUERY, &attr, 40)

As shown in the original discussion thread, without this fix, a newer
kernel unconditionally writes the optional revision field at offset
56. Since copy_to_user doesn't fault on adjacent mapped memory, this
silently corrupts 8 bytes of the caller's heap (corrupting the Python
interpreter).

This size-gating pattern in this patch should be identical to the
pattern introduced by commit 47a71c1f9af0 in btf.c.

I am not 100% sure but, it seems running a binary statically linked
with an older libbpf version (or directly using raw BPF syscalls)
might face a similar issue? Do we expect those binaries to be broken
in the newer kernel version?

On Mon, May 18, 2026 at 12:30 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, May 15, 2026 at 12:15 AM Yuyang Huang <yuyanghuang@google.com> wrote:
> >
> > The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but
> > several commands write outputs back to userspace unconditionally. If the
> > caller passes a short buffer, this can lead to out-of-bounds writes,
> > potentially overwriting adjacent userspace memory.
>
> The whole thing sounds like a user space bug.
> Please demonstrate a case where this issue is seen
> by using libbpf APIs.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-05-29  4:44 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-15  7:15 [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Yuyang Huang
2026-05-15  7:15 ` [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size Yuyang Huang
2026-05-15  8:14   ` bot+bpf-ci
2026-05-21 11:40     ` Yuyang Huang
2026-05-24 11:22       ` Alexei Starovoitov
2026-05-25  7:21         ` Yuyang Huang
2026-05-28  5:42           ` Leon Hwang
2026-05-28 13:20             ` Yuyang Huang
2026-05-28 14:36               ` Leon Hwang
2026-05-28 15:08                 ` Lorenzo Colitti
2026-05-28 16:01                   ` Yuyang Huang
2026-05-28 16:18                     ` Yuyang Huang
2026-05-29  1:03           ` Alexei Starovoitov
2026-05-29  4:44             ` Yuyang Huang
2026-05-15  7:15 ` [PATCH bpf-next 2/2] selftests/bpf: Add verification for BPF_PROG_QUERY attr size boundaries Yuyang Huang
2026-05-18  3:29 ` [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Alexei Starovoitov
2026-05-18  5:07   ` Yuyang Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox