* [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size
@ 2026-05-15 7:15 Yuyang Huang
2026-05-15 7:15 ` [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size Yuyang Huang
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Yuyang Huang @ 2026-05-15 7:15 UTC (permalink / raw)
To: Yuyang Huang
Cc: David S. Miller, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko,
Daniel Borkmann, Eduard Zingerman, Eric Dumazet, Jakub Kicinski,
Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi,
Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan,
Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf,
linux-kernel, linux-kselftest, netdev
The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but
several commands write outputs back to userspace unconditionally. If the
caller passes a short buffer, this can lead to out-of-bounds writes,
potentially overwriting adjacent userspace memory.
This series addresses this by introducing size-gating based on field type:
1) Mandatory fields (original ABI): Return -EINVAL in __sys_bpf() if the
user-provided buffer size is smaller than the minimum size required to
cover these fields. This hardens the syscall entry point for several
commands.
2) Optional fields (later revisions): Skip writeback if the user-provided
buffer size is too small to cover them. This is applied to
'query.revision' in BPF_PROG_QUERY.
The first patch implements the plumbing and enforcement in the kernel.
The second patch adds a selftest to verify the behavior.
Yuyang Huang (2):
bpf: align syscall writeback behavior with caller-declared size
selftests/bpf: Add verification for BPF_PROG_QUERY attr size
boundaries
drivers/net/netkit.c | 5 +-
include/linux/bpf-cgroup.h | 5 +-
include/linux/bpf_mprog.h | 4 +-
include/net/netkit.h | 6 +-
include/net/tcx.h | 5 +-
kernel/bpf/cgroup.c | 13 +--
kernel/bpf/mprog.c | 5 +-
kernel/bpf/syscall.c | 34 ++++++--
kernel/bpf/tcx.c | 5 +-
.../selftests/bpf/prog_tests/bpf_attr_size.c | 84 +++++++++++++++++++
10 files changed, 141 insertions(+), 25 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_attr_size.c
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply [flat|nested] 17+ messages in thread* [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-15 7:15 [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Yuyang Huang @ 2026-05-15 7:15 ` Yuyang Huang 2026-05-15 8:14 ` bot+bpf-ci 2026-05-15 7:15 ` [PATCH bpf-next 2/2] selftests/bpf: Add verification for BPF_PROG_QUERY attr size boundaries Yuyang Huang 2026-05-18 3:29 ` [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Alexei Starovoitov 2 siblings, 1 reply; 17+ messages in thread From: Yuyang Huang @ 2026-05-15 7:15 UTC (permalink / raw) To: Yuyang Huang Cc: David S. Miller, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, linux-kernel, linux-kselftest, netdev, Maciej Żenczykowski, Lorenzo Colitti The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but several commands write outputs back to userspace unconditionally. Because copy_to_user() does not fault on adjacent mapped memory, a short userspace buffer results in out-of-bounds writes, potentially overwriting adjacent userspace memory. Address this by introducing two policies based on field type: 1) Mandatory fields (original ABI): Return -EINVAL in __sys_bpf() if the buffer size does not cover them. This hardens the syscall front-gate for the following commands: - BPF_PROG_QUERY (min size: query.prog_cnt) - BPF_PROG_TEST_RUN (min size: test.duration) - BPF_*_GET_NEXT_ID (min size: next_id) - BPF_OBJ_GET_INFO_BY_FD (min size: info.info_len) - BPF_TASK_FD_QUERY (minimum size: task_fd_query.probe_addr) - BPF_MAP_*_BATCH (min size: batch.flags) 2) Optional fields (later revisions): Skip writeback if the buffer size does not cover the field. This is applied to BPF_PROG_QUERY's 'query.revision'. Older userspace passing a smaller size (e.g., 40 bytes) will have the write safely skipped. This size-gating pattern mirrors the existing precedent used for 'log_true_size' (verifier.c) and 'btf_log_true_size' (btf.c). To support this, the user-declared 'size' is plumbed from __sys_bpf() through the query dispatchers (cgroup, tcx, netkit) to the underlying writeback helpers in cgroup.c and mprog.c. Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Yuyang Huang <yuyanghuang@google.com> Link: https://lore.kernel.org/r/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com --- drivers/net/netkit.c | 5 +++-- include/linux/bpf-cgroup.h | 5 +++-- include/linux/bpf_mprog.h | 4 ++-- include/net/netkit.h | 6 ++++-- include/net/tcx.h | 5 +++-- kernel/bpf/cgroup.c | 13 +++++++------ kernel/bpf/mprog.c | 5 +++-- kernel/bpf/syscall.c | 34 +++++++++++++++++++++++++++++----- kernel/bpf/tcx.c | 5 +++-- 9 files changed, 57 insertions(+), 25 deletions(-) diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c index 5e2eecc3165d..680607d6e039 100644 --- a/drivers/net/netkit.c +++ b/drivers/net/netkit.c @@ -813,7 +813,8 @@ int netkit_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog) return ret; } -int netkit_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) +int netkit_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, + u32 uattr_size) { struct net_device *dev; int ret; @@ -826,7 +827,7 @@ int netkit_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) ret = PTR_ERR(dev); goto out; } - ret = bpf_mprog_query(attr, uattr, netkit_entry_fetch(dev, false)); + ret = bpf_mprog_query(attr, uattr, uattr_size, netkit_entry_fetch(dev, false)); out: rtnl_unlock(); return ret; diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index b2e79c2b41d5..4d0cc65976a1 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -421,7 +421,7 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype); int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); int cgroup_bpf_prog_query(const union bpf_attr *attr, - union bpf_attr __user *uattr); + union bpf_attr __user *uattr, u32 uattr_size); const struct bpf_func_proto * cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog); @@ -452,7 +452,8 @@ static inline int cgroup_bpf_link_attach(const union bpf_attr *attr, } static inline int cgroup_bpf_prog_query(const union bpf_attr *attr, - union bpf_attr __user *uattr) + union bpf_attr __user *uattr, + u32 uattr_size) { return -EINVAL; } diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h index 0b9f4caeeb0a..fa479ace854a 100644 --- a/include/linux/bpf_mprog.h +++ b/include/linux/bpf_mprog.h @@ -72,7 +72,7 @@ * // bpf_mprog user-side lock * // fetch active @entry from attach location * [...] - * ret = bpf_mprog_query(attr, uattr, entry); + * ret = bpf_mprog_query(attr, uattr, uattr_size, entry); * // bpf_mprog user-side unlock * * Data/fast path: @@ -329,7 +329,7 @@ int bpf_mprog_detach(struct bpf_mprog_entry *entry, u32 flags, u32 id_or_fd, u64 revision); int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, - struct bpf_mprog_entry *entry); + u32 uattr_size, struct bpf_mprog_entry *entry); static inline bool bpf_mprog_supported(enum bpf_prog_type type) { diff --git a/include/net/netkit.h b/include/net/netkit.h index 9ec0163739f4..fe209d1f9a64 100644 --- a/include/net/netkit.h +++ b/include/net/netkit.h @@ -9,7 +9,8 @@ int netkit_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog); int netkit_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); int netkit_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog); -int netkit_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr); +int netkit_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, + u32 uattr_size); INDIRECT_CALLABLE_DECLARE(struct net_device *netkit_peer_dev(struct net_device *dev)); #else static inline int netkit_prog_attach(const union bpf_attr *attr, @@ -31,7 +32,8 @@ static inline int netkit_prog_detach(const union bpf_attr *attr, } static inline int netkit_prog_query(const union bpf_attr *attr, - union bpf_attr __user *uattr) + union bpf_attr __user *uattr, + u32 uattr_size) { return -EINVAL; } diff --git a/include/net/tcx.h b/include/net/tcx.h index 23a61af13547..610626b39676 100644 --- a/include/net/tcx.h +++ b/include/net/tcx.h @@ -166,7 +166,7 @@ int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog); void tcx_uninstall(struct net_device *dev, bool ingress); int tcx_prog_query(const union bpf_attr *attr, - union bpf_attr __user *uattr); + union bpf_attr __user *uattr, u32 uattr_size); static inline void dev_tcx_uninstall(struct net_device *dev) { @@ -194,7 +194,8 @@ static inline int tcx_prog_detach(const union bpf_attr *attr, } static inline int tcx_prog_query(const union bpf_attr *attr, - union bpf_attr __user *uattr) + union bpf_attr __user *uattr, + u32 uattr_size) { return -EINVAL; } diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 876f6a81a9b6..2c2bdaa86aa7 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1208,7 +1208,7 @@ static int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog, /* Must be called with cgroup_mutex held to avoid races. */ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, - union bpf_attr __user *uattr) + union bpf_attr __user *uattr, u32 uattr_size) { __u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags); bool effective_query = attr->query.query_flags & BPF_F_QUERY_EFFECTIVE; @@ -1259,7 +1259,8 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, return -EFAULT; if (!effective_query && from_atype == to_atype) revision = cgrp->bpf.revisions[from_atype]; - if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision))) + if (uattr_size >= offsetofend(union bpf_attr, query.revision) && + copy_to_user(&uattr->query.revision, &revision, sizeof(revision))) return -EFAULT; if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt) /* return early if user requested only program count + flags */ @@ -1312,12 +1313,12 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, } static int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr, - union bpf_attr __user *uattr) + union bpf_attr __user *uattr, u32 uattr_size) { int ret; cgroup_lock(); - ret = __cgroup_bpf_query(cgrp, attr, uattr); + ret = __cgroup_bpf_query(cgrp, attr, uattr, uattr_size); cgroup_unlock(); return ret; } @@ -1520,7 +1521,7 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) } int cgroup_bpf_prog_query(const union bpf_attr *attr, - union bpf_attr __user *uattr) + union bpf_attr __user *uattr, u32 uattr_size) { struct cgroup *cgrp; int ret; @@ -1529,7 +1530,7 @@ int cgroup_bpf_prog_query(const union bpf_attr *attr, if (IS_ERR(cgrp)) return PTR_ERR(cgrp); - ret = cgroup_bpf_query(cgrp, attr, uattr); + ret = cgroup_bpf_query(cgrp, attr, uattr, uattr_size); cgroup_put(cgrp); return ret; diff --git a/kernel/bpf/mprog.c b/kernel/bpf/mprog.c index 1394168062e8..822d9c4c0db4 100644 --- a/kernel/bpf/mprog.c +++ b/kernel/bpf/mprog.c @@ -393,7 +393,7 @@ int bpf_mprog_detach(struct bpf_mprog_entry *entry, } int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, - struct bpf_mprog_entry *entry) + u32 uattr_size, struct bpf_mprog_entry *entry) { u32 __user *uprog_flags, *ulink_flags; u32 __user *uprog_id, *ulink_id; @@ -413,7 +413,8 @@ int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, } if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) return -EFAULT; - if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision))) + if (uattr_size >= offsetofend(union bpf_attr, query.revision) && + copy_to_user(&uattr->query.revision, &revision, sizeof(revision))) return -EFAULT; if (copy_to_user(&uattr->query.count, &count, sizeof(count))) return -EFAULT; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index a3c0214ca934..a46b0510d9e2 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4654,7 +4654,7 @@ static int bpf_prog_detach(const union bpf_attr *attr) #define BPF_PROG_QUERY_LAST_FIELD query.revision static int bpf_prog_query(const union bpf_attr *attr, - union bpf_attr __user *uattr) + union bpf_attr __user *uattr, u32 uattr_size) { if (!bpf_net_capable()) return -EPERM; @@ -4693,7 +4693,7 @@ static int bpf_prog_query(const union bpf_attr *attr, case BPF_CGROUP_GETSOCKOPT: case BPF_CGROUP_SETSOCKOPT: case BPF_LSM_CGROUP: - return cgroup_bpf_prog_query(attr, uattr); + return cgroup_bpf_prog_query(attr, uattr, uattr_size); case BPF_LIRC_MODE2: return lirc_prog_query(attr, uattr); case BPF_FLOW_DISSECTOR: @@ -4706,10 +4706,10 @@ static int bpf_prog_query(const union bpf_attr *attr, return sock_map_bpf_prog_query(attr, uattr); case BPF_TCX_INGRESS: case BPF_TCX_EGRESS: - return tcx_prog_query(attr, uattr); + return tcx_prog_query(attr, uattr, uattr_size); case BPF_NETKIT_PRIMARY: case BPF_NETKIT_PEER: - return netkit_prog_query(attr, uattr); + return netkit_prog_query(attr, uattr, uattr_size); default: return -EINVAL; } @@ -6260,20 +6260,30 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size) err = bpf_prog_detach(&attr); break; case BPF_PROG_QUERY: - err = bpf_prog_query(&attr, uattr.user); + if (size < offsetofend(union bpf_attr, query.prog_cnt)) + return -EINVAL; + err = bpf_prog_query(&attr, uattr.user, size); break; case BPF_PROG_TEST_RUN: + if (size < offsetofend(union bpf_attr, test.duration)) + return -EINVAL; err = bpf_prog_test_run(&attr, uattr.user); break; case BPF_PROG_GET_NEXT_ID: + if (size < offsetofend(union bpf_attr, next_id)) + return -EINVAL; err = bpf_obj_get_next_id(&attr, uattr.user, &prog_idr, &prog_idr_lock); break; case BPF_MAP_GET_NEXT_ID: + if (size < offsetofend(union bpf_attr, next_id)) + return -EINVAL; err = bpf_obj_get_next_id(&attr, uattr.user, &map_idr, &map_idr_lock); break; case BPF_BTF_GET_NEXT_ID: + if (size < offsetofend(union bpf_attr, next_id)) + return -EINVAL; err = bpf_obj_get_next_id(&attr, uattr.user, &btf_idr, &btf_idr_lock); break; @@ -6284,6 +6294,8 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size) err = bpf_map_get_fd_by_id(&attr); break; case BPF_OBJ_GET_INFO_BY_FD: + if (size < offsetofend(union bpf_attr, info.info_len)) + return -EINVAL; err = bpf_obj_get_info_by_fd(&attr, uattr.user); break; case BPF_RAW_TRACEPOINT_OPEN: @@ -6296,22 +6308,32 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size) err = bpf_btf_get_fd_by_id(&attr); break; case BPF_TASK_FD_QUERY: + if (size < offsetofend(union bpf_attr, task_fd_query.probe_addr)) + return -EINVAL; err = bpf_task_fd_query(&attr, uattr.user); break; case BPF_MAP_LOOKUP_AND_DELETE_ELEM: err = map_lookup_and_delete_elem(&attr); break; case BPF_MAP_LOOKUP_BATCH: + if (size < offsetofend(union bpf_attr, batch.flags)) + return -EINVAL; err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_LOOKUP_BATCH); break; case BPF_MAP_LOOKUP_AND_DELETE_BATCH: + if (size < offsetofend(union bpf_attr, batch.flags)) + return -EINVAL; err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_LOOKUP_AND_DELETE_BATCH); break; case BPF_MAP_UPDATE_BATCH: + if (size < offsetofend(union bpf_attr, batch.flags)) + return -EINVAL; err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_UPDATE_BATCH); break; case BPF_MAP_DELETE_BATCH: + if (size < offsetofend(union bpf_attr, batch.flags)) + return -EINVAL; err = bpf_map_do_batch(&attr, uattr.user, BPF_MAP_DELETE_BATCH); break; case BPF_LINK_CREATE: @@ -6324,6 +6346,8 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size) err = bpf_link_get_fd_by_id(&attr); break; case BPF_LINK_GET_NEXT_ID: + if (size < offsetofend(union bpf_attr, next_id)) + return -EINVAL; err = bpf_obj_get_next_id(&attr, uattr.user, &link_idr, &link_idr_lock); break; diff --git a/kernel/bpf/tcx.c b/kernel/bpf/tcx.c index 02db0113b8e7..2a91f6075511 100644 --- a/kernel/bpf/tcx.c +++ b/kernel/bpf/tcx.c @@ -119,7 +119,8 @@ void tcx_uninstall(struct net_device *dev, bool ingress) tcx_entry_free(entry); } -int tcx_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) +int tcx_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, + u32 uattr_size) { bool ingress = attr->query.attach_type == BPF_TCX_INGRESS; struct net *net = current->nsproxy->net_ns; @@ -132,7 +133,7 @@ int tcx_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) ret = -ENODEV; goto out; } - ret = bpf_mprog_query(attr, uattr, tcx_entry_fetch(dev, ingress)); + ret = bpf_mprog_query(attr, uattr, uattr_size, tcx_entry_fetch(dev, ingress)); out: rtnl_unlock(); return ret; -- 2.54.0.563.g4f69b47b94-goog ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-15 7:15 ` [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size Yuyang Huang @ 2026-05-15 8:14 ` bot+bpf-ci 2026-05-21 11:40 ` Yuyang Huang 0 siblings, 1 reply; 17+ messages in thread From: bot+bpf-ci @ 2026-05-15 8:14 UTC (permalink / raw) To: yuyanghuang, yuyanghuang Cc: davem, ast, andrew+netdev, andrii, daniel, eddyz87, edumazet, kuba, jolsa, john.fastabend, memxor, martin.lau, razor, pabeni, shuah, horms, song, sdf, yonghong.song, bpf, linux-kernel, linux-kselftest, netdev, maze, lorenzo, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai [-- Attachment #1: Type: text/plain, Size: 2436 bytes --] > bpf: align syscall writeback behavior with caller-declared size > > The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but > several commands write outputs back to userspace unconditionally. Because > copy_to_user() does not fault on adjacent mapped memory, a short userspace > buffer results in out-of-bounds writes, potentially overwriting adjacent > userspace memory. > > Address this by introducing two policies based on field type: > > 1) Mandatory fields (original ABI): Return -EINVAL in __sys_bpf() if the > buffer size does not cover them. This hardens the syscall front-gate > for the following commands: > - BPF_PROG_QUERY (min size: query.prog_cnt) > - BPF_PROG_TEST_RUN (min size: test.duration) > - BPF_*_GET_NEXT_ID (min size: next_id) > - BPF_OBJ_GET_INFO_BY_FD (min size: info.info_len) > - BPF_TASK_FD_QUERY (minimum size: task_fd_query.probe_addr) > - BPF_MAP_*_BATCH (min size: batch.flags) > > 2) Optional fields (later revisions): Skip writeback if the buffer size > does not cover the field. This is applied to BPF_PROG_QUERY's > 'query.revision'. Older userspace passing a smaller size (e.g., 40 > bytes) will have the write safely skipped. This size-gating pattern > mirrors the existing precedent used for 'log_true_size' (verifier.c) > and 'btf_log_true_size' (btf.c). > > To support this, the user-declared 'size' is plumbed from __sys_bpf() > through the query dispatchers (cgroup, tcx, netkit) to the underlying > writeback helpers in cgroup.c and mprog.c. > > Cc: Maciej Żenczykowski <maze@google.com> > Cc: Lorenzo Colitti <lorenzo@google.com> > Signed-off-by: Yuyang Huang <yuyanghuang@google.com> > Link: https://lore.kernel.org/r/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com This looks like a bug fix for an out-of-bounds write vulnerability. Should this include a Fixes: tag to identify which commit introduced the unconditional copy_to_user() calls for query.revision without checking the user-provided buffer size? Fixes: 3fe213c040b3 ("adding ci files") --- AI reviewed your patch. Please fix the bug or email reply why it's not a bug. See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25905928331 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-15 8:14 ` bot+bpf-ci @ 2026-05-21 11:40 ` Yuyang Huang 2026-05-24 11:22 ` Alexei Starovoitov 0 siblings, 1 reply; 17+ messages in thread From: Yuyang Huang @ 2026-05-21 11:40 UTC (permalink / raw) To: bot+bpf-ci Cc: davem, ast, andrew+netdev, andrii, daniel, eddyz87, edumazet, kuba, jolsa, john.fastabend, memxor, martin.lau, razor, pabeni, shuah, horms, song, sdf, yonghong.song, bpf, linux-kernel, linux-kselftest, netdev, maze, lorenzo, martin.lau, clm, ihor.solodrai >This looks like a bug fix for an out-of-bounds write vulnerability. >Should this include a Fixes: tag to identify which commit introduced the >unconditional copy_to_user() calls for query.revision without checking >the user-provided buffer size? > Fixes: 3fe213c040b3 ("adding ci files") I completely missed this checkpatch comment. I will add the Fixes tag in v2 if there is a consensus that this patch is the right approach to resolve the issue. On Fri, May 15, 2026 at 5:14 PM <bot+bpf-ci@kernel.org> wrote: > > > bpf: align syscall writeback behavior with caller-declared size > > > > The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but > > several commands write outputs back to userspace unconditionally. Because > > copy_to_user() does not fault on adjacent mapped memory, a short userspace > > buffer results in out-of-bounds writes, potentially overwriting adjacent > > userspace memory. > > > > Address this by introducing two policies based on field type: > > > > 1) Mandatory fields (original ABI): Return -EINVAL in __sys_bpf() if the > > buffer size does not cover them. This hardens the syscall front-gate > > for the following commands: > > - BPF_PROG_QUERY (min size: query.prog_cnt) > > - BPF_PROG_TEST_RUN (min size: test.duration) > > - BPF_*_GET_NEXT_ID (min size: next_id) > > - BPF_OBJ_GET_INFO_BY_FD (min size: info.info_len) > > - BPF_TASK_FD_QUERY (minimum size: task_fd_query.probe_addr) > > - BPF_MAP_*_BATCH (min size: batch.flags) > > > > 2) Optional fields (later revisions): Skip writeback if the buffer size > > does not cover the field. This is applied to BPF_PROG_QUERY's > > 'query.revision'. Older userspace passing a smaller size (e.g., 40 > > bytes) will have the write safely skipped. This size-gating pattern > > mirrors the existing precedent used for 'log_true_size' (verifier.c) > > and 'btf_log_true_size' (btf.c). > > > > To support this, the user-declared 'size' is plumbed from __sys_bpf() > > through the query dispatchers (cgroup, tcx, netkit) to the underlying > > writeback helpers in cgroup.c and mprog.c. > > > > Cc: Maciej Żenczykowski <maze@google.com> > > Cc: Lorenzo Colitti <lorenzo@google.com> > > Signed-off-by: Yuyang Huang <yuyanghuang@google.com> > > Link: https://lore.kernel.org/r/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com > > This looks like a bug fix for an out-of-bounds write vulnerability. > Should this include a Fixes: tag to identify which commit introduced the > unconditional copy_to_user() calls for query.revision without checking > the user-provided buffer size? > > Fixes: 3fe213c040b3 ("adding ci files") > > > --- > AI reviewed your patch. Please fix the bug or email reply why it's not a bug. > See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md > > CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25905928331 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-21 11:40 ` Yuyang Huang @ 2026-05-24 11:22 ` Alexei Starovoitov 2026-05-25 7:21 ` Yuyang Huang 0 siblings, 1 reply; 17+ messages in thread From: Alexei Starovoitov @ 2026-05-24 11:22 UTC (permalink / raw) To: Yuyang Huang, Daniel Borkmann Cc: bot+bpf-ci, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development, Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau, Chris Mason, Ihor Solodrai On Thu, May 21, 2026 at 1:41 PM Yuyang Huang <yuyanghuang@google.com> wrote: > > >This looks like a bug fix for an out-of-bounds write vulnerability. > >Should this include a Fixes: tag to identify which commit introduced the > >unconditional copy_to_user() calls for query.revision without checking > >the user-provided buffer size? > > > Fixes: 3fe213c040b3 ("adding ci files") > > I completely missed this checkpatch comment. I will add the Fixes tag > in v2 if there is a consensus that this patch is the right approach to > resolve the issue. AI is wrong. The fixes tag should probably point to 053c8e1f235dc or a subsequent patch? bpf_mprog_query() and tcx_prog_query() were only introduced there. How come your userspace passes shorter uatter to query newer BPF_TCX_EGRESS attachment? Please provide more details. I'm still not convinced that what you're saying: " For example, our Android net_test suite uses this legacy 40-byte Python ctypes struct layout: # legacy 40-byte layout (pre-revision field) BpfAttrProgQuery = cstruct.Struct( "bpf_attr_prog_query", "=IIIIQI4xQ", "target_fd attach_type query_flags attach_flags prog_ids_ptr prog_cnt prog_attach_flags" ) # Invoked via: syscall(__NR_bpf, BPF_PROG_QUERY, &attr, 40) " is a kernel bug. Looks like user space is querying new attachment types with old structure. Potentially the check can only be around: case BPF_TCX_INGRESS: case BPF_TCX_EGRESS: return tcx_prog_query(attr, uattr); to be nice to broken user space, but, really, attr->query.attach_type shouldn't be equal to BPF_TCX_EGRESS with a smaller uattr. pw-bot: cr ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-24 11:22 ` Alexei Starovoitov @ 2026-05-25 7:21 ` Yuyang Huang 2026-05-28 5:42 ` Leon Hwang 2026-05-29 1:03 ` Alexei Starovoitov 0 siblings, 2 replies; 17+ messages in thread From: Yuyang Huang @ 2026-05-25 7:21 UTC (permalink / raw) To: Alexei Starovoitov Cc: Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development, Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau, Chris Mason, Ihor Solodrai Thanks for the review comment. To better explain the problem, please bear with me through this long email reply. >AI is wrong. >The fixes tag should probably point to 053c8e1f235dc or a subsequent patch? Thanks for pointing it out. I think the correct Fixes tags should be as follows, and I will split them individually in the v2 patches: Fixes: 053c8e1f235d ("bpf: Add generic attach/detach/query API for multi-progs") Fixes: 120933984460 ("bpf: Implement mprog API on top of existing cgroup progs") >Please provide more details. >I'm still not convinced that what you're saying... is a kernel bug. >Looks like user space is querying new attachment types with old structure. I created a small C program using BPF_PROG_QUERY to query BPF_CGROUP_INET_INGRESS (not the BPF_TCX_EGRESS) to demonstrate the full problem , hope it clarifies: ```c #include <unistd.h> #include <sys/syscall.h> #include <stdio.h> #include <string.h> #include <fcntl.h> #define __NR_bpf 321 #define BPF_PROG_QUERY 16 #define BPF_CGROUP_INET_INGRESS 0 int main() { char buf[64]; memset(buf, 0xAA, 64); // Set 0xAA signature memset(buf, 0, 40); // Zero out legacy 40-byte struct *(int*)(buf + 0) = open("/sys/fs/cgroup", O_RDONLY); *(int*)(buf + 4) = BPF_CGROUP_INET_INGRESS; // Explicitly set attach_type // Call BPF_PROG_QUERY (16) with legacy size (40) on x86_64 (syscall 321) long ret = syscall(__NR_bpf, BPF_PROG_QUERY, buf, 40); printf("Syscall return value: %ld\n", ret); printf("Revision (offsets 56-63): %016llx\n", *(unsigned long long*)(buf + 56)); close(*(int*)(buf + 0)); // Clean up cgroup FD return 0; } ``` Before the patch: Syscall return value: 0 Revision (offsets 56-63): 0000000000000001 <-- Stack/Heap Corruption! After the patch (Safe / revision writeback is gated): Syscall return value: 0 Revision (offsets 56-63): aaaaaaaaaaaaaaaa <-- Safe (Signature intact) For this example, the regression lies in legacy CGROUP queries: 1. BPF_PROG_QUERY for cgroups is not a new feature; it has been in the kernel since 2017. 2. There are existing legacy userspace applications and language wrappers (like our android net-tests) written years ago with older structure layouts (passing size = 40, ending at query.prog_attach_flags) that query cgroups. 3. In June 2025, Commit 120933984460 backported "revision" support to cgroup queries, introducing an unconditional writeback in `__cgroup_bpf_query()` at offset 56: ```c // In __cgroup_bpf_query() (kernel/bpf/cgroup.c) - Introduced in 120933984460: if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision))) return -EFAULT; ``` Now lets talk about BPF_TCX_EGRESS: > bpf_mprog_query() and tcx_prog_query() were only introduced there. > How come your userspace passes shorter uatter to query newer > BPF_TCX_EGRESS attachment? I understand your point, but two common cases exist where userspace will legitimately query BPF_TCX_EGRESS with a 40-byte structure: First, a generic BPF querying tool (assume it called `./bpf-query`) compiled in 2020 (when the query buffer size was 40 bytes) might accept the attach type dynamically from the command line: ./bpf-query --type 47 --fd 3 # 47 is BPF_TCX_EGRESS The unmodified 2020 binary populates the struct dynamically and invokes the syscall with size = 40: ```c .... attr.query.attach_type = 47; // BPF_TCX_EGRESS (passed dynamically at runtime) syscall(__NR_bpf, BPF_PROG_QUERY, &attr, 40); ``` Without the size check, the unconditional writeback of revision at offset 56 would result in an out-of-bounds write on the stack. Second, in dynamic language environments (like Python), UAPI structures are often declared manually. The developer might only declare the subset of fields the application actually uses (40 bytes in this case) to optimize memory overhead. Even with this subset declaration, we believe the kernel should ideally respect the declared size limit to prevent accidental out-of-bounds writes into adjacent userspace memory. > Looks like user space is querying new attachment types > with old structure. > Potentially the check can only be around: > case BPF_TCX_INGRESS: > case BPF_TCX_EGRESS: > return tcx_prog_query(attr, uattr); > > to be nice to broken user space, > but, really, attr->query.attach_type shouldn't be equal to BPF_TCX_EGRESS > with a smaller uattr. Add gating to the TCX path, will not be able to fix the legacy cgroup query path. As shown above, the cgroup query path has the exact same OOB writeback regression affecting legacy userspace. Since we must plumb uattr_size to cgroup.c to resolve the cgroup regression safely, applying the same consistent size-gating in bpf_mprog_query() seemed like the most consistent and robust architectural path. Feel free to let us know your thoughts. Thanks, Yuyang On Sun, May 24, 2026 at 8:22 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, May 21, 2026 at 1:41 PM Yuyang Huang <yuyanghuang@google.com> wrote: > > > > >This looks like a bug fix for an out-of-bounds write vulnerability. > > >Should this include a Fixes: tag to identify which commit introduced the > > >unconditional copy_to_user() calls for query.revision without checking > > >the user-provided buffer size? > > > > > Fixes: 3fe213c040b3 ("adding ci files") > > > > I completely missed this checkpatch comment. I will add the Fixes tag > > in v2 if there is a consensus that this patch is the right approach to > > resolve the issue. > > AI is wrong. > The fixes tag should probably point to 053c8e1f235dc or a subsequent patch? > > bpf_mprog_query() and tcx_prog_query() were only introduced there. > How come your userspace passes shorter uatter to query newer > BPF_TCX_EGRESS attachment? > > Please provide more details. > > I'm still not convinced that what you're saying: > " > For example, our Android net_test suite uses this legacy 40-byte > Python ctypes struct layout: > > # legacy 40-byte layout (pre-revision field) > BpfAttrProgQuery = cstruct.Struct( > "bpf_attr_prog_query", "=IIIIQI4xQ", > "target_fd attach_type query_flags attach_flags prog_ids_ptr > prog_cnt prog_attach_flags" > ) > # Invoked via: syscall(__NR_bpf, BPF_PROG_QUERY, &attr, 40) > " > is a kernel bug. > > Looks like user space is querying new attachment types > with old structure. > Potentially the check can only be around: > case BPF_TCX_INGRESS: > case BPF_TCX_EGRESS: > return tcx_prog_query(attr, uattr); > > to be nice to broken user space, > but, really, attr->query.attach_type shouldn't be equal to BPF_TCX_EGRESS > with a smaller uattr. > > pw-bot: cr ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-25 7:21 ` Yuyang Huang @ 2026-05-28 5:42 ` Leon Hwang 2026-05-28 13:20 ` Yuyang Huang 2026-05-29 1:03 ` Alexei Starovoitov 1 sibling, 1 reply; 17+ messages in thread From: Leon Hwang @ 2026-05-28 5:42 UTC (permalink / raw) To: Yuyang Huang, Alexei Starovoitov Cc: Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development, Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau, Chris Mason, Ihor Solodrai On 25/5/26 15:21, Yuyang Huang wrote: [...] > > Feel free to let us know your thoughts. > I believe this is a user space issue instead of a kernel bug. I tried to use mmap() memory as uattr that got -EFAULT instead of crash. [................] /* mmap() memory */ ^ tail 40B as uattr ^ 56B offset for copy_to_user() Thanks, Leon ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-28 5:42 ` Leon Hwang @ 2026-05-28 13:20 ` Yuyang Huang 2026-05-28 14:36 ` Leon Hwang 0 siblings, 1 reply; 17+ messages in thread From: Yuyang Huang @ 2026-05-28 13:20 UTC (permalink / raw) To: Leon Hwang Cc: Alexei Starovoitov, Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development, Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau, Chris Mason, Ihor Solodrai On Thu, May 28, 2026 at 1:43 PM Leon Hwang <leon.hwang@linux.dev> wrote: > > On 25/5/26 15:21, Yuyang Huang wrote: > [...] > > > > Feel free to let us know your thoughts. > > > I believe this is a user space issue instead of a kernel bug. > > I tried to use mmap() memory as uattr that got -EFAULT instead of crash. > > [................] /* mmap() memory */ > ^ tail 40B as uattr > ^ 56B offset for copy_to_user() > > Thanks, > Leon > Thanks for testing this! There are some discussion in the original thread: https://lore.kernel.org/all/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com/ as follows, which might answer your question > > > If the uattr indeed has less than needed space, then for > > > if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision))) > > > return -EFAULT; > > > the kernel will return -EFAULT to user space. > > > > > > Maybe userspace didn't handle the return code properly and causing > > > user space corruption and segfaults. This shouldn't be a kernel issue. > > > Maybe I missed something? > > > > That's not how that works at all. > > > > copy_to_user() will only fail and thus EFAULT will only be returned if > > the memory area copy_to_user() is trying to copy into isn't > > owned/mapped by the user (or perhaps is read-only protected, not sure > > about this last one). > > > > Because memory is mapped in (at least) 4K pages, the memory after a > > user buffer is almost always still valid memory. It might be unused, > > or it might be something on the stack - like a return address, or it > > might be on the heap - metadata tracking, or a different memory > > allocation perhaps entirely. You might hit the same case as maze@ mentioned in the thread. To trigger -EFAULT, you likely positioned `uattr` at the very end of a mapped page immediately followed by a protected page Could you share the test program you created so we can verify? Please check the test program I shared earlier in the thread (where uattr is stored on the stack); the BPF syscall returned 0, but stack corruption occurred. If you think my test program contains a bug, feel free to let me know. Thanks, Yuyang ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-28 13:20 ` Yuyang Huang @ 2026-05-28 14:36 ` Leon Hwang 2026-05-28 15:08 ` Lorenzo Colitti 0 siblings, 1 reply; 17+ messages in thread From: Leon Hwang @ 2026-05-28 14:36 UTC (permalink / raw) To: Yuyang Huang Cc: Alexei Starovoitov, Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development, Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau, Chris Mason, Ihor Solodrai On 2026/5/28 21:20, Yuyang Huang wrote: > On Thu, May 28, 2026 at 1:43 PM Leon Hwang <leon.hwang@linux.dev> wrote: >> >> On 25/5/26 15:21, Yuyang Huang wrote: >> [...] >>> >>> Feel free to let us know your thoughts. >>> >> I believe this is a user space issue instead of a kernel bug. >> >> I tried to use mmap() memory as uattr that got -EFAULT instead of crash. >> >> [................] /* mmap() memory */ >> ^ tail 40B as uattr >> ^ 56B offset for copy_to_user() >> >> Thanks, >> Leon >> > > Thanks for testing this! > > There are some discussion in the original thread: > https://lore.kernel.org/all/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com/ > as follows, which might answer your question > It seems you haven't convinced Alexei in that thread. >>>> If the uattr indeed has less than needed space, then for >>>> if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision))) >>>> return -EFAULT; >>>> the kernel will return -EFAULT to user space. >>>> >>>> Maybe userspace didn't handle the return code properly and causing >>>> user space corruption and segfaults. This shouldn't be a kernel issue. >>>> Maybe I missed something? >>> >>> That's not how that works at all. >>> >>> copy_to_user() will only fail and thus EFAULT will only be returned if >>> the memory area copy_to_user() is trying to copy into isn't >>> owned/mapped by the user (or perhaps is read-only protected, not sure >>> about this last one). >>> >>> Because memory is mapped in (at least) 4K pages, the memory after a >>> user buffer is almost always still valid memory. It might be unused, >>> or it might be something on the stack - like a return address, or it >>> might be on the heap - metadata tracking, or a different memory >>> allocation perhaps entirely. > > You might hit the same case as maze@ mentioned in the thread. > > To trigger -EFAULT, you likely positioned `uattr` at the very end of a > mapped page immediately followed by a protected page > > Could you share the test program you created so we can verify? > Attached below. > Please check the test program I shared earlier in the thread (where > uattr is stored on the stack); the BPF syscall returned 0, but stack > corruption occurred. > To avoid such stack corruption, you should reserve enough space for the query, e.g., by extracting union bpf_attr from kernel BTF vmlinux. Thanks, Leon > If you think my test program contains a bug, feel free to let me know. > > Thanks, > > Yuyang --- Assisted-by: Copilot:gemini-3-1-pro-preview // SPDX-License-Identifier: GPL-2.0 #include <uapi/linux/if_link.h> #include <sys/mman.h> #include <sys/syscall.h> #include <linux/bpf.h> #include <net/if.h> #include <test_progs.h> #define loopback 1 #include "test_tc_link.skel.h" #include "tc_helpers.h" #define SHORT_QUERY_SIZE offsetofend(union bpf_attr, query.prog_attach_flags) /* * test_tail_uattr_out_of_mmap: * * Places uattr at the very tail of a 1-page anonymous mmap so that the * mandatory fields (target_ifindex..prog_attach_flags, 40 bytes) fit inside the page * but query.revision (+56) falls in the unmapped page immediately after. * * mmap layout (1 page, e.g. 4096 bytes): * * [0 ............ page_size - SHORT_QUERY_SIZE - 1] unused * [page_size - 40 ................. page_size - 1 ] uattr: target_ifindex..prog_cnt * [page_size .................................. ] UNMAPPED * ^ * uattr + 56 (revision) lands here */ static void test_tail_uattr_out_of_mmap(void) { long page_size = sysconf(_SC_PAGE_SIZE); LIBBPF_OPTS(bpf_prog_attach_opts, opta); LIBBPF_OPTS(bpf_prog_detach_opts, optd); struct test_tc_link *skel; union bpf_attr *attr; void *mem, *tail; int fd, err; skel = test_tc_link__open_and_load(); if (!ASSERT_OK_PTR(skel, "skel_load")) return; fd = bpf_program__fd(skel->progs.tc1); err = bpf_prog_attach_opts(fd, loopback, BPF_TCX_INGRESS, &opta); if (!ASSERT_OK(err, "prog_attach")) goto cleanup_skel; /* * Allocate 2 contiguous pages then immediately unmap the second one. * This guarantees the page following the first is unmapped, regardless * of what the runtime placed there beforehand. * Place uattr at the tail: last SHORT_QUERY_SIZE (40) bytes of page 1. * uattr + 56 (revision) therefore lands 16 bytes past the page end, * in the unmapped region. */ mem = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (!ASSERT_OK_PTR(mem, "mmap")) goto detach; munmap((char *)mem + page_size, page_size); tail = (char *)mem + page_size - SHORT_QUERY_SIZE; memset(tail, 0, SHORT_QUERY_SIZE); attr = (union bpf_attr *)tail; attr->query.target_ifindex = loopback; attr->query.attach_type = BPF_TCX_INGRESS; err = syscall(__NR_bpf, BPF_PROG_QUERY, tail, SHORT_QUERY_SIZE); ASSERT_OK(err, "syscall"); ASSERT_OK(errno, "errno"); ASSERT_EQ(attr->query.prog_cnt, 1, "prog_cnt_written"); munmap(mem, page_size); /* second page already unmapped above */ detach: bpf_prog_detach_opts(fd, loopback, BPF_TCX_INGRESS, &optd); cleanup_skel: test_tc_link__destroy(skel); } void test_mmap_uattr_corruption(void) { if (test__start_subtest("tail_uattr_out_of_mmap")) test_tail_uattr_out_of_mmap(); } ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-28 14:36 ` Leon Hwang @ 2026-05-28 15:08 ` Lorenzo Colitti 2026-05-28 16:01 ` Yuyang Huang 0 siblings, 1 reply; 17+ messages in thread From: Lorenzo Colitti @ 2026-05-28 15:08 UTC (permalink / raw) To: Leon Hwang Cc: Yuyang Huang, Alexei Starovoitov, Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development, Maciej Żenczykowski, Martin KaFai Lau, Chris Mason, Ihor Solodrai On Thu, May 28, 2026 at 11:37 PM Leon Hwang <leon.hwang@linux.dev> wrote: > To avoid such stack corruption, you should reserve enough space for the > query, e.g., by extracting union bpf_attr from kernel BTF vmlinux. That seems unreasonable. There's already a size in the bpf syscall, why can't the kernel respect that? Also, the length of bpf_attr has increased over time as the kernel adds more elements. Doesn't that mean that even if the userspace program passes a sufficient size, a future kernel could start writing more bytes and start overwriting memory? Breaking userspace on kernel upgrades should never happen. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-28 15:08 ` Lorenzo Colitti @ 2026-05-28 16:01 ` Yuyang Huang 2026-05-28 16:18 ` Yuyang Huang 0 siblings, 1 reply; 17+ messages in thread From: Yuyang Huang @ 2026-05-28 16:01 UTC (permalink / raw) To: Lorenzo Colitti Cc: Leon Hwang, Alexei Starovoitov, Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development, Maciej Żenczykowski, Martin KaFai Lau, Chris Mason, Ihor Solodrai >It seems you haven't convinced Alexei in that thread. >Attached below. Thanks for the test program. It proves my hypothesis: you had to manually unmap a page boundary to force that `-EFAULT`. The part that is still under discussion is whether there is a kernel bug or not. I think there is no question about the behavior of when the -EFAULT will be triggered. > To avoid such stack corruption, you should reserve enough space for the > query, e.g., by extracting union bpf_attr from kernel BTF vmlinux. I agree that your suggestion to extract the layout from BTF is good practice when writing a new program from scratch. However, our focus here is on maintaining backward compatibility for existing binaries. I think the core question is: **is a 40-byte query a valid request or not?** From my understanding, under the syscall ABI contract, passing a 40-byte buffer (at least for query BPF_CGROUP_INET_INGRESS which has been stable and supported since 2017) is 100% valid and legal. **The kernel must never break existing userspace on a kernel upgrade**. One example of a syscall that maintains such backward compatibility is openat2. The documentation notes it as follows: EINVAL size was smaller than **any known version of struct open_how**. If this bpf() syscall want to follow the same way to maintain the ABI contract, your `mmap` test actually let kernel return -EINVAL to a userspace program that passes a 40-byte buffer (declaring `size = 40`), because the kernel unconditionally tries to write to offset 56, which happens to be unmapped. Are you arguing that the `bpf()` syscall does not need to follow the ABI contract to maintain backward compatibility, and that it is expected that newer kernels can break binaries compiled against older kernels? If you believe that all binaries must be recompiled or dynamically updated against the latest vmlinux BTF upon every kernel upgrade, please be explicit about it. Thanks, Yuyang ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-28 16:01 ` Yuyang Huang @ 2026-05-28 16:18 ` Yuyang Huang 0 siblings, 0 replies; 17+ messages in thread From: Yuyang Huang @ 2026-05-28 16:18 UTC (permalink / raw) To: Lorenzo Colitti Cc: Leon Hwang, Alexei Starovoitov, Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development, Maciej Żenczykowski, Martin KaFai Lau, Chris Mason, Ihor Solodrai >The fixes tag should probably point to 053c8e1f235dc or a subsequent patch? >bpf_mprog_query() and tcx_prog_query() were only introduced there. >How come your userspace passes shorter uatter to query newer >BPF_TCX_EGRESS attachment? btw, If there is a strong preference for the BPF_TCX_EGRESS type, specifically, that userspace shouldn't send a size smaller than the newly added structure when using the new attchment type, I can modify my patch to limit the fix to the BPF_CGROUP_INET_INGRESS type only. I do agree newly added BPF_TCX_EGRESS and the existing BPF_CGROUP_INET_INGRESS worth a different discussion. On Fri, May 29, 2026 at 12:01 AM Yuyang Huang <yuyanghuang@google.com> wrote: > > >It seems you haven't convinced Alexei in that thread. > >Attached below. > > Thanks for the test program. It proves my hypothesis: you had to > manually unmap a page boundary to force that `-EFAULT`. The part that > is still under discussion is whether there is a kernel bug or not. I > think there is no question about the behavior of when the -EFAULT will > be triggered. > > > To avoid such stack corruption, you should reserve enough space for the > > query, e.g., by extracting union bpf_attr from kernel BTF vmlinux. > > I agree that your suggestion to extract the layout from BTF is good > practice when writing a new program from scratch. However, our focus > here is on maintaining backward compatibility for existing binaries. > > I think the core question is: **is a 40-byte query a valid request or not?** > > From my understanding, under the syscall ABI contract, passing a > 40-byte buffer (at least for query BPF_CGROUP_INET_INGRESS which has > been stable and supported since 2017) is 100% valid and legal. **The > kernel must never break existing userspace on a kernel upgrade**. > > One example of a syscall that maintains such backward compatibility is > openat2. The documentation notes it as follows: > > EINVAL size was smaller than **any known version of struct open_how**. > > If this bpf() syscall want to follow the same way to maintain the ABI > contract, your `mmap` test actually let kernel return -EINVAL to a > userspace program that passes a 40-byte buffer (declaring `size = > 40`), because the kernel unconditionally tries to write to offset 56, > which happens to be unmapped. > > Are you arguing that the `bpf()` syscall does not need to follow the > ABI contract to maintain backward compatibility, and that it is > expected that newer kernels can break binaries compiled against older > kernels? If you believe that all binaries must be recompiled or > dynamically updated against the latest vmlinux BTF upon every kernel > upgrade, please be explicit about it. > > Thanks, > > Yuyang ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-25 7:21 ` Yuyang Huang 2026-05-28 5:42 ` Leon Hwang @ 2026-05-29 1:03 ` Alexei Starovoitov 2026-05-29 4:44 ` Yuyang Huang 1 sibling, 1 reply; 17+ messages in thread From: Alexei Starovoitov @ 2026-05-29 1:03 UTC (permalink / raw) To: Yuyang Huang Cc: Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development, Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau, Chris Mason, Ihor Solodrai On Mon May 25, 2026 at 12:21 AM PDT, Yuyang Huang wrote: > > For this example, the regression lies in legacy CGROUP queries: > 1. BPF_PROG_QUERY for cgroups is not a new feature; it has been in the > kernel since 2017. > 2. There are existing legacy userspace applications and language > wrappers (like our android net-tests) written years ago with older > structure layouts (passing size = 40, ending at > query.prog_attach_flags) that query cgroups. > 3. In June 2025, Commit 120933984460 backported "revision" support to > cgroup queries, introducing an unconditional writeback in > `__cgroup_bpf_query()` at offset 56: > ```c > // In __cgroup_bpf_query() (kernel/bpf/cgroup.c) - Introduced in 120933984460: > if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision))) > return -EFAULT; > ``` Ok. This is fair. 120933984460 is indeed problematic. > Now lets talk about BPF_TCX_EGRESS: > >> bpf_mprog_query() and tcx_prog_query() were only introduced there. >> How come your userspace passes shorter uatter to query newer >> BPF_TCX_EGRESS attachment? > > I understand your point, but two common cases exist where userspace > will legitimately query BPF_TCX_EGRESS with a 40-byte structure: > > First, a generic BPF querying tool (assume it called `./bpf-query`) > compiled in 2020 (when the query buffer size was 40 bytes) might > accept the attach type dynamically from the command line: > ./bpf-query --type 47 --fd 3 # 47 is BPF_TCX_EGRESS Don't buy this at all. This one is clearly a user space bug. > Add gating to the TCX path, will not be able to fix the legacy cgroup > query path. As shown above, the cgroup query path has the exact same > OOB writeback regression affecting legacy userspace. Since we must > plumb uattr_size to cgroup.c to resolve the cgroup regression safely, > applying the same consistent size-gating in bpf_mprog_query() seemed > like the most consistent and robust architectural path. Not really. Your patch adds checks to what looks like "random" copy_to_user() places. This thread is clear indication that it's not a "robust architectural path". True robust path would be to wrap every copy_to_user() into another helper that takes uattr start pointer and size, does extra check, and returns something like ENOSPC (and not EFAULT), but that would be a significant churn. So I prefer a minimal patch that add single check in bpf_prog_query() that checks that user space supplied buffer is large enough for attr->query. If not -> EFAULT. That would be better than overwritting. One can argue that partial copy_to_user() in __cgroup_bpf_query() should still be allowed, but I'm against partial and inconsistent queries. query.revision might seem benign, but if we do it for all copy_to_user() we better return ENOSPC to differentiate vs EFAULT to tell user space to fix itself. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size 2026-05-29 1:03 ` Alexei Starovoitov @ 2026-05-29 4:44 ` Yuyang Huang 0 siblings, 0 replies; 17+ messages in thread From: Yuyang Huang @ 2026-05-29 4:44 UTC (permalink / raw) To: Alexei Starovoitov Cc: Daniel Borkmann, bot+bpf-ci, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Eduard, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development, Maciej Żenczykowski, Lorenzo Colitti, Martin KaFai Lau, Chris Mason, Ihor Solodrai, Todd Kjos, Carlos Llamas, Kalesh Singh On Fri, May 29, 2026 at 9:03 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Mon May 25, 2026 at 12:21 AM PDT, Yuyang Huang wrote: Thanks for the reply. > Ok. This is fair. 120933984460 is indeed problematic. I'm glad to hear we agree that 120933984460 is problematic. > Not really. Your patch adds checks to what looks like "random" copy_to_user() > places. This thread is clear indication that it's not a "robust architectural path". > > True robust path would be to wrap every copy_to_user() into another helper > that takes uattr start pointer and size, does extra check, and > returns something like ENOSPC (and not EFAULT), > but that would be a significant churn. I agree that the truly robust path would be a proper copy_to_user() wrapper instead of doing the ad-hoc check in the current path, but indeed, that will be a big change. > So I prefer a minimal patch that add single check in bpf_prog_query() > that checks that user space supplied buffer is large enough for attr->query. > If not -> EFAULT. That would be better than overwritting. > One can argue that partial copy_to_user() in __cgroup_bpf_query() > should still be allowed, but I'm against partial and > inconsistent queries. > query.revision might seem benign, but if we do it for all copy_to_user() > we better return ENOSPC to differentiate vs EFAULT to tell user space > to fix itself. Okay, I understand the preference is for a minimal patch to implement the fix. Just to make sure I fully understand your suggestion: Are you proposing that we add a single check at the entry gate in `bpf_prog_query()` like this? ```c static int bpf_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, u32 uattr_size) { if (uattr_size < offsetofend(union bpf_attr, query.revision)) return -EFAULT; ... } ``` If we implement this, I want to confirm if you are okay with the consequence for legacy cgroup queries (e.g., `BPF_CGROUP_INET_INGRESS`): 1. Before 120933984460: Legacy userspace compiled years ago with a 40-byte `bpf_attr` layout (ending at `prog_attach_flags`) regularly queries cgroups passing `size = 40`, and it worked perfectly. 2. With this check: These unmodified legacy applications will now fail with `-EFAULT` on newer kernels because they pass `size = 40` which is less than 64. This means all user-space applicaion is expected to "fix" their code when along with the kernel upgrade. Is your preference to explicitly fail these legacy `size = 40` cgroup queries (breaking backward compatibility for them) to avoid "partial and inconsistent queries"? If we want to keep these legacy queries working safely (without failing them and without causing memory corruption), we cannot use a single check in `bpf_prog_query()`. We would still be forced to plumb `uattr_size` to `__cgroup_bpf_query()` so we can conditionally skip the `revision` writeback when `size < 64`. Would love to get your confirmation on which trade-off you prefer before I send V2. Thanks, Yuyang ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH bpf-next 2/2] selftests/bpf: Add verification for BPF_PROG_QUERY attr size boundaries 2026-05-15 7:15 [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Yuyang Huang 2026-05-15 7:15 ` [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size Yuyang Huang @ 2026-05-15 7:15 ` Yuyang Huang 2026-05-18 3:29 ` [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Alexei Starovoitov 2 siblings, 0 replies; 17+ messages in thread From: Yuyang Huang @ 2026-05-15 7:15 UTC (permalink / raw) To: Yuyang Huang Cc: David S. Miller, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, linux-kernel, linux-kselftest, netdev, Maciej Żenczykowski, Lorenzo Colitti Add a new selftest to verify that the BPF syscall (specifically BPF_PROG_QUERY) correctly respects the caller-declared attribute size boundaries: - Optional output fields (like query.revision) are not written if the caller-declared size ends before them. - Calls with a size below the mandatory minimum return -EINVAL. - Full-size calls still receive the optional fields normally. Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Yuyang Huang <yuyanghuang@google.com> Link: https://lore.kernel.org/r/CANP3RGfZTXM_u=E_atoomPZXutoQJ02nOMkCCR-YBZbOm2suWA@mail.gmail.com Tested with virtme-ng: # ./test_progs -t bpf_attr_size #17/1 bpf_attr_size/query_size_boundaries:OK #17/2 bpf_attr_size/query_mandatory_too_short_einval:OK #17 bpf_attr_size:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED --- .../selftests/bpf/prog_tests/bpf_attr_size.c | 84 +++++++++++++++++++ 1 file changed, 84 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_attr_size.c diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_attr_size.c b/tools/testing/selftests/bpf/prog_tests/bpf_attr_size.c new file mode 100644 index 000000000000..65fd717782de --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/bpf_attr_size.c @@ -0,0 +1,84 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2026 Google LLC */ +#include <linux/bpf.h> +#include <unistd.h> +#include <sys/syscall.h> +#include <test_progs.h> +#include "test_tc_link.skel.h" +#include "tc_helpers.h" + +#define OLD_QUERY_SIZE offsetofend(union bpf_attr, query.prog_cnt) +#define FULL_QUERY_SIZE offsetofend(union bpf_attr, query.revision) +#define SHORT_QUERY_SIZE offsetofend(union bpf_attr, query.attach_type) + +static void test_query_size_boundaries(void) +{ + LIBBPF_OPTS(bpf_prog_attach_opts, opta); + LIBBPF_OPTS(bpf_prog_detach_opts, optd); + struct test_tc_link *skel; + union bpf_attr attr; + int fd, err; + + skel = test_tc_link__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel_load")) + return; + + fd = bpf_program__fd(skel->progs.tc1); + + err = bpf_prog_attach_opts(fd, loopback, BPF_TCX_INGRESS, &opta); + if (!ASSERT_OK(err, "prog_attach")) + goto cleanup; + + /* 1. Old size: revision must not be written */ + memset(&attr, 0, sizeof(attr)); + attr.query.target_ifindex = loopback; + attr.query.attach_type = BPF_TCX_INGRESS; + + err = syscall(__NR_bpf, BPF_PROG_QUERY, &attr, OLD_QUERY_SIZE); + if (!ASSERT_OK(err, "query_old_size")) + goto detach; + + ASSERT_EQ(attr.query.prog_cnt, 1, "prog_cnt_written"); + ASSERT_EQ(attr.query.revision, 0, "revision_not_written"); + + /* 2. Full size: revision must be written normally */ + memset(&attr, 0, sizeof(attr)); + attr.query.target_ifindex = loopback; + attr.query.attach_type = BPF_TCX_INGRESS; + + err = syscall(__NR_bpf, BPF_PROG_QUERY, &attr, FULL_QUERY_SIZE); + if (!ASSERT_OK(err, "query_full_size")) + goto detach; + + ASSERT_EQ(attr.query.prog_cnt, 1, "prog_cnt_written"); + ASSERT_GT(attr.query.revision, 0, "revision_written"); + +detach: + err = bpf_prog_detach_opts(fd, loopback, BPF_TCX_INGRESS, &optd); + ASSERT_OK(err, "prog_detach"); +cleanup: + test_tc_link__destroy(skel); +} + +static void test_query_mandatory_too_short_einval(void) +{ + union bpf_attr attr; + int err; + + /* Below minimum size: must return -EINVAL */ + memset(&attr, 0, sizeof(attr)); + attr.query.target_ifindex = loopback; + attr.query.attach_type = BPF_TCX_INGRESS; + + err = syscall(__NR_bpf, BPF_PROG_QUERY, &attr, SHORT_QUERY_SIZE); + ASSERT_EQ(err, -1, "query_too_short_fails"); + ASSERT_EQ(errno, EINVAL, "query_too_short_einval"); +} + +void test_bpf_attr_size(void) +{ + if (test__start_subtest("query_size_boundaries")) + test_query_size_boundaries(); + if (test__start_subtest("query_mandatory_too_short_einval")) + test_query_mandatory_too_short_einval(); +} -- 2.54.0.563.g4f69b47b94-goog ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size 2026-05-15 7:15 [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Yuyang Huang 2026-05-15 7:15 ` [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size Yuyang Huang 2026-05-15 7:15 ` [PATCH bpf-next 2/2] selftests/bpf: Add verification for BPF_PROG_QUERY attr size boundaries Yuyang Huang @ 2026-05-18 3:29 ` Alexei Starovoitov 2026-05-18 5:07 ` Yuyang Huang 2 siblings, 1 reply; 17+ messages in thread From: Alexei Starovoitov @ 2026-05-18 3:29 UTC (permalink / raw) To: Yuyang Huang Cc: David S. Miller, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development On Fri, May 15, 2026 at 12:15 AM Yuyang Huang <yuyanghuang@google.com> wrote: > > The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but > several commands write outputs back to userspace unconditionally. If the > caller passes a short buffer, this can lead to out-of-bounds writes, > potentially overwriting adjacent userspace memory. The whole thing sounds like a user space bug. Please demonstrate a case where this issue is seen by using libbpf APIs. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size 2026-05-18 3:29 ` [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Alexei Starovoitov @ 2026-05-18 5:07 ` Yuyang Huang 0 siblings, 0 replies; 17+ messages in thread From: Yuyang Huang @ 2026-05-18 5:07 UTC (permalink / raw) To: Alexei Starovoitov Cc: David S. Miller, Alexei Starovoitov, Andrew Lunn, Andrii Nakryiko, Daniel Borkmann, Eduard Zingerman, Eric Dumazet, Jakub Kicinski, Jiri Olsa, John Fastabend, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Nikolay Aleksandrov, Paolo Abeni, Shuah Khan, Simon Horman, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, LKML, open list:KERNEL SELFTEST FRAMEWORK, Network Development >The whole thing sounds like a user space bug. Could you please clarify why you consider this a userspace bug? From our perspective, this is a binary backward compatibility issue. An older binary compiled against older UAPI headers has no way of knowing about newer fields. If it passes size = 40 (the correct size at the time), the kernel violates the ABI contract by writing beyond that declared size. > Please demonstrate a case where this issue is seen by using libbpf APIs. Our old project does not use libbpf; we interface directly with the raw BPF syscall. For example, our Android net_test suite uses this legacy 40-byte Python ctypes struct layout: # legacy 40-byte layout (pre-revision field) BpfAttrProgQuery = cstruct.Struct( "bpf_attr_prog_query", "=IIIIQI4xQ", "target_fd attach_type query_flags attach_flags prog_ids_ptr prog_cnt prog_attach_flags" ) # Invoked via: syscall(__NR_bpf, BPF_PROG_QUERY, &attr, 40) As shown in the original discussion thread, without this fix, a newer kernel unconditionally writes the optional revision field at offset 56. Since copy_to_user doesn't fault on adjacent mapped memory, this silently corrupts 8 bytes of the caller's heap (corrupting the Python interpreter). This size-gating pattern in this patch should be identical to the pattern introduced by commit 47a71c1f9af0 in btf.c. I am not 100% sure but, it seems running a binary statically linked with an older libbpf version (or directly using raw BPF syscalls) might face a similar issue? Do we expect those binaries to be broken in the newer kernel version? On Mon, May 18, 2026 at 12:30 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Fri, May 15, 2026 at 12:15 AM Yuyang Huang <yuyanghuang@google.com> wrote: > > > > The bpf(cmd, attr, size) syscall copies up to 'size' bytes on input, but > > several commands write outputs back to userspace unconditionally. If the > > caller passes a short buffer, this can lead to out-of-bounds writes, > > potentially overwriting adjacent userspace memory. > > The whole thing sounds like a user space bug. > Please demonstrate a case where this issue is seen > by using libbpf APIs. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2026-05-29 4:44 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-15 7:15 [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Yuyang Huang 2026-05-15 7:15 ` [PATCH bpf-next 1/2] bpf: align syscall writeback behavior with caller-declared size Yuyang Huang 2026-05-15 8:14 ` bot+bpf-ci 2026-05-21 11:40 ` Yuyang Huang 2026-05-24 11:22 ` Alexei Starovoitov 2026-05-25 7:21 ` Yuyang Huang 2026-05-28 5:42 ` Leon Hwang 2026-05-28 13:20 ` Yuyang Huang 2026-05-28 14:36 ` Leon Hwang 2026-05-28 15:08 ` Lorenzo Colitti 2026-05-28 16:01 ` Yuyang Huang 2026-05-28 16:18 ` Yuyang Huang 2026-05-29 1:03 ` Alexei Starovoitov 2026-05-29 4:44 ` Yuyang Huang 2026-05-15 7:15 ` [PATCH bpf-next 2/2] selftests/bpf: Add verification for BPF_PROG_QUERY attr size boundaries Yuyang Huang 2026-05-18 3:29 ` [PATCH bpf-next 0/2] bpf: Align syscall writeback behavior with user-declared size Alexei Starovoitov 2026-05-18 5:07 ` Yuyang Huang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox