Linux userland API discussions
* Re: [PATCH net-next v2 1/4] selftests: net: Move some UAPI header inclusions after libc ones
From: Matthieu Baerts @ 2026-01-26 18:13 UTC
  To: Thomas Weißschuh
  Cc: netdev, linux-kernel, linux-api, Arnd Bergmann, linux-kselftest,
	mptcp, linux-security-module, bpf, libc-alpha,
	Carlos O'Donell, Adhemerval Zanella, Rich Felker, klibc,
	Florian Weimer, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, David S. Miller, Jakub Kicinski, Simon Horman,
	Shuah Khan, Mat Martineau, Geliang Tang, Mickaël Salaün,
	Günther Noack, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa
In-Reply-To: <20260120-uapi-sockaddr-v2-1-63c319111cf6@linutronix.de>

Hi Thomas,

On 20/01/2026 15:10, Thomas Weißschuh wrote:
> Interleaving inclusions of UAPI headers and libc headers is problematic.
> Both sets of headers define conflicting symbols. To enable their
> coexistence a compatibility mechanism is in place.
> 
> An upcoming change will define 'struct sockaddr' from linux/socket.h.
> However sys/socket.h from libc does not yet handle this case and a
> symbol conflict will arise.
> 
> Furthermore libc-compat.h evaluates the state of the libc
> inclusions only once, at the point it is included first. If another
> problematic header from libc is included later, symbol conflicts arise.
> This will trigger other duplicate definitions when linux/libc-compat.h
> is added to linux/socket.h
> 
> Move the inclusion of UAPI headers after the inclusion of the glibc
> ones, so the libc-compat.h continues to work correctly.

Thank you for looking at this!

Here is my (late, sorry) review for the modifications related to MPTCP:
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_diag.c b/tools/testing/selftests/net/mptcp/mptcp_diag.c
> index 8e0b1b8d84b6..af25ebfd2915 100644
> --- a/tools/testing/selftests/net/mptcp/mptcp_diag.c
> +++ b/tools/testing/selftests/net/mptcp/mptcp_diag.c
> @@ -1,11 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /* Copyright (c) 2025, Kylin Software */
>  
> -#include <linux/sock_diag.h>
> -#include <linux/rtnetlink.h>
> -#include <linux/inet_diag.h>
> -#include <linux/netlink.h>
> -#include <linux/compiler.h>
>  #include <sys/socket.h>
>  #include <netinet/in.h>
>  #include <linux/tcp.h>

There is a remaining one (linux/tcp.h) here that you might want to move
below too.
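
For reference, the ordering described in the commit message can be
sketched as below. This is only an illustration, not part of the patch:
the function name uapi_and_libc_coexist() is made up, and the choice of
headers is an assumption that matches a glibc-based toolchain, where
linux/libc-compat.h can see which libc headers were already included.

```c
#include <assert.h>

/* libc headers first, so that linux/libc-compat.h can see their guards. */
#include <sys/socket.h>
#include <netinet/in.h>

/* UAPI headers last; the compat logic is evaluated when they are included. */
#include <linux/in.h>
#include <linux/netlink.h>
#include <linux/tcp.h>

/* Hypothetical helper: returns 1 when both sets of definitions are usable. */
int uapi_and_libc_coexist(void)
{
	struct sockaddr_in sin = { .sin_family = AF_INET };
	struct sockaddr_nl snl = { .nl_family = AF_NETLINK };

	/* struct sockaddr_in is padded to the size of struct sockaddr. */
	assert(sizeof(sin) == sizeof(struct sockaddr));
	return sin.sin_family == AF_INET && snl.nl_family == AF_NETLINK;
}
```

Swapping the two include groups is the interleaving the commit message
warns about, since the compat state is then sampled before the libc
headers had a chance to announce themselves.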

> @@ -17,6 +12,12 @@
>  #include <errno.h>
>  #include <stdio.h>
>  
> +#include <linux/sock_diag.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/inet_diag.h>
> +#include <linux/netlink.h>
> +#include <linux/compiler.h>

Note that I just noticed this is the only file from this directory where
the "includes" are not sorted by type and in alphabetical order, see
pm_nl_ctl.c as an example. A bit of a detail, but if you plan to send a
new version, do you mind doing that too here while at it, please?

If not, I can look at that later, but better to avoid doing that in
parallel.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Jeff Layton @ 2026-01-26 16:43 UTC
  To: Jan Kara, The 8472
  Cc: Zack Weinberg, Rich Felker, Alejandro Colomar, Vincent Lefevre,
	Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
	GNU libc development
In-Reply-To: <pt7hcmgnzwveyzxdfpxtrmz2bt5tki5wosu3kkboil7bjrolyr@hd4ctkpzzqzi>

On Mon, 2026-01-26 at 16:56 +0100, Jan Kara wrote:
> On Mon 26-01-26 14:53:12, The 8472 wrote:
> > On 26/01/2026 13:15, Jan Kara wrote:
> > > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
> > > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> > > > > >       [QUERY: Do delayed errors ever happen in any of these situations?
> > > > > > 
> > > > > >          - The fd is not the last reference to the open file description
> > > > > > 
> > > > > >          - The OFD was opened with O_RDONLY
> > > > > > 
> > > > > >          - The OFD was opened with O_RDWR but has never actually
> > > > > >            been written to
> > > > > > 
> > > > > >          - No data has been written to the OFD since the last call to
> > > > > >            fsync() for that OFD
> > > > > > 
> > > > > >          - No data has been written to the OFD since the last call to
> > > > > >            fdatasync() for that OFD
> > > > > > 
> > > > > >          If we can give some guidance about when people don’t need to
> > > > > >          worry about delayed errors, it would be helpful.]
> > > > 
> > > > In particular, I really hope delayed errors *aren’t* ever reported
> > > > when you close a file descriptor that *isn’t* the last reference
> > > > to its open file description, because the thread-safe way to close
> > > > stdout without losing write errors[2] depends on that not happening.
> > > 
> > > So I've checked and in Linux ->flush callback for the file is called
> > > whenever you close a file descriptor (regardless of whether there are other
> > > file descriptors pointing to the same file description) so it's up to the
> > > filesystem implementation what it decides to do and which error it will
> > > return... Checking the implementations, e.g. FUSE and NFS *will* return
> > > delayed writeback errors on *first* descriptor close even if there are
> > > other still open descriptors for the description AFAICS.

...and I really wish they _didn't_.

Reporting a writeback error on close is not particularly useful. Most
filesystems don't require you to write back all data on a close(). A
successful close() on those just means that no error has happened yet.

Any application that cares about writeback errors needs to fsync(),
full stop.
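
A minimal sketch of what that looks like from the application side,
assuming a plain regular file. The helper name write_durably() is made
up for this illustration; the point is only that fsync() is the call
whose result decides whether the data made it, while the result of
close() is not used as a data-integrity signal.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf to path and make sure it reached
 * stable storage. Returns 0 on success, -1 with errno set on failure.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd, saved;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		p += n;
		len -= (size_t)n;
	}

	/* This is where writeback errors are reported. */
	if (fsync(fd) < 0)
		goto fail;

	/* The return value of close() is deliberately not checked here. */
	close(fd);
	return 0;

fail:
	saved = errno;
	close(fd);
	errno = saved;
	return -1;
}
```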

> > Regarding the "first", does that mean the errors only get delivered once?
> 
> I've added Jeff to CC who should be able to provide you with a more
> authoritative answer but AFAIK the answer is yes.
> 
> E.g. NFS does:
> 
> static int
> nfs_file_flush(struct file *file, fl_owner_t id)
> {
> ...
>         /* Flush writes to the server and return any errors */
>         since = filemap_sample_wb_err(file->f_mapping);
>         nfs_wb_all(inode);
>         return filemap_check_wb_err(file->f_mapping, since);
> }
> 
> which will write back all outstanding data on the first close and report
> an error if one happened. A following close has nothing to flush and thus
> no error to report.
> 
> That being said if you call fsync(2) you'll still get the error back again
> because fsync uses a separate writeback error counter in the file
> description. But again only the first fsync(2) will return the error.
> Following fsyncs will report no error.
> 

Note that NFS is "special" in that it will flush data on close() in
order to maintain close-to-open cache consistency.

Technically, what NFS is doing above is sampling the errseq_t in the
mapping, and then writing back any dirty data, and then checking for
errors that happened since the sample. close() will only report
writeback errors that happened within that window. If a preexisting
writeback error occurred before "since" was sampled, then it won't
report that here...which is weird, and another good argument for not
reporting or checking for writeback errors at close().


> > I.e. if a concurrent fork/exec happens for process spawning and the
> > fork-child closes the file descriptors then this closing may basically
> > receive the errors and the parent will not see them (unless additional
> > errors happen)?
> 
> Correct AFAICT.
>

It will see them if it calls fsync(). Reporting on close() is iffy.

> > Or if _any_ part of the program dups the descriptor and then closes it
> > without reporting errors then all users of that descriptor must consider
> > error delivery on close to be unreliable?
> 
> Correct as well AFAICT.
> 
> I should probably also add that traditional filesystems (classical local
> disk based filesystems) don't bother with reporting delayed errors on
> close(2) *at all*. So unless you call fsync(2) you will never learn there
> was any writeback error. After all for these filesystems there are good
> chances writeback didn't even start by the time you are calling close(2).
> So overall I'd say that error reporting from close(2) is so random and
> filesystem dependent that the errors are not worth paying attention to. If
> you really care about data integrity (and thus writeback errors) you must
> call fsync(2), in which case the kernel provides an at least somewhat
> consistent error reporting story.
> 

+1.

tl;dr: the only useful error from close() is EBADF.
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Jan Kara @ 2026-01-26 15:56 UTC
  To: The 8472
  Cc: Jan Kara, Zack Weinberg, Rich Felker, Alejandro Colomar,
	Vincent Lefevre, Alexander Viro, Christian Brauner, linux-fsdevel,
	linux-api, GNU libc development, Jeff Layton
In-Reply-To: <c59361e4-ad50-4cdf-888e-3d9a4aa6f69b@infinite-source.de>

On Mon 26-01-26 14:53:12, The 8472 wrote:
> On 26/01/2026 13:15, Jan Kara wrote:
> > On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
> > > On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> > > > >       [QUERY: Do delayed errors ever happen in any of these situations?
> > > > > 
> > > > >          - The fd is not the last reference to the open file description
> > > > > 
> > > > >          - The OFD was opened with O_RDONLY
> > > > > 
> > > > >          - The OFD was opened with O_RDWR but has never actually
> > > > >            been written to
> > > > > 
> > > > >          - No data has been written to the OFD since the last call to
> > > > >            fsync() for that OFD
> > > > > 
> > > > >          - No data has been written to the OFD since the last call to
> > > > >            fdatasync() for that OFD
> > > > > 
> > > > >          If we can give some guidance about when people don’t need to
> > > > >          worry about delayed errors, it would be helpful.]
> > > 
> > > In particular, I really hope delayed errors *aren’t* ever reported
> > > when you close a file descriptor that *isn’t* the last reference
> > > to its open file description, because the thread-safe way to close
> > > stdout without losing write errors[2] depends on that not happening.
> > 
> > So I've checked and in Linux ->flush callback for the file is called
> > whenever you close a file descriptor (regardless of whether there are other
> > file descriptors pointing to the same file description) so it's up to the
> > filesystem implementation what it decides to do and which error it will
> > return... Checking the implementations, e.g. FUSE and NFS *will* return
> > delayed writeback errors on *first* descriptor close even if there are
> > other still open descriptors for the description AFAICS.
> Regarding the "first", does that mean the errors only get delivered once?

I've added Jeff to CC who should be able to provide you with a more
authoritative answer but AFAIK the answer is yes.

E.g. NFS does:

static int
nfs_file_flush(struct file *file, fl_owner_t id)
{
...
        /* Flush writes to the server and return any errors */
        since = filemap_sample_wb_err(file->f_mapping);
        nfs_wb_all(inode);
        return filemap_check_wb_err(file->f_mapping, since);
}

which will write back all outstanding data on the first close and report
an error if one happened. A following close has nothing to flush and thus
no error to report.

That being said if you call fsync(2) you'll still get the error back again
because fsync uses a separate writeback error counter in the file
description. But again only the first fsync(2) will return the error.
Following fsyncs will report no error.

> I.e. if a concurrent fork/exec happens for process spawning and the
> fork-child closes the file descriptors then this closing may basically
> receive the errors and the parent will not see them (unless additional
> errors happen)?

Correct AFAICT.

> Or if _any_ part of the program dups the descriptor and then closes it
> without reporting errors then all users of that descriptor must consider
> error delivery on close to be unreliable?

Correct as well AFAICT.

I should probably also add that traditional filesystems (classical local
disk based filesystems) don't bother with reporting delayed errors on
close(2) *at all*. So unless you call fsync(2) you will never learn there
was any writeback error. After all for these filesystems there are good
chances writeback didn't even start by the time you are calling close(2).
So overall I'd say that error reporting from close(2) is so random and
filesystem dependent that the errors are not worth paying attention to. If
you really care about data integrity (and thus writeback errors) you must
call fsync(2), in which case the kernel provides an at least somewhat
consistent error reporting story.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* [PATCH bpf-next v8 9/9] selftests/bpf: Add tests to verify map create failure log
From: Leon Hwang @ 2026-01-26 15:14 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
	Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
	Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
	linux-kselftest, kernel-patches-bot
In-Reply-To: <20260126151409.52072-1-leon.hwang@linux.dev>

Add tests to verify that the kernel reports the expected error messages
when map creation fails.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../selftests/bpf/prog_tests/map_init.c       | 168 ++++++++++++++++++
 1 file changed, 168 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/map_init.c b/tools/testing/selftests/bpf/prog_tests/map_init.c
index 14a31109dd0e..89e6daf2fcfd 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_init.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_init.c
@@ -212,3 +212,171 @@ void test_map_init(void)
 	if (test__start_subtest("pcpu_lru_map_init"))
 		test_pcpu_lru_map_init();
 }
+
+#define BPF_LOG_FIXED	8
+
+static void test_map_create(enum bpf_map_type map_type, const char *map_name,
+			    struct bpf_map_create_opts *opts, const char *exp_msg)
+{
+	const int key_size = 4, value_size = 4, max_entries = 1;
+	char log_buf[128];
+	int fd;
+	LIBBPF_OPTS(bpf_log_opts, log_opts);
+
+	log_buf[0] = '\0';
+	log_opts.log_buf = log_buf;
+	log_opts.log_size = sizeof(log_buf);
+	log_opts.log_level = BPF_LOG_FIXED;
+	opts->log_opts = &log_opts;
+	fd = bpf_map_create(map_type, map_name, key_size, value_size, max_entries, opts);
+	if (!ASSERT_LT(fd, 0, "bpf_map_create")) {
+		close(fd);
+		return;
+	}
+
+	ASSERT_STREQ(log_buf, exp_msg, "log_buf");
+	ASSERT_EQ(log_opts.log_true_size, strlen(exp_msg) + 1, "log_true_size");
+}
+
+static void test_map_create_array(struct bpf_map_create_opts *opts, const char *exp_msg)
+{
+	test_map_create(BPF_MAP_TYPE_ARRAY, "test_map_create", opts, exp_msg);
+}
+
+static void test_invalid_vmlinux_value_type_id_struct_ops(void)
+{
+	const char *msg = "btf_vmlinux_value_type_id can only be used with struct_ops maps.\n";
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		    .btf_vmlinux_value_type_id = 1,
+	);
+
+	test_map_create_array(&opts, msg);
+}
+
+static void test_invalid_vmlinux_value_type_id_kv_type_id(void)
+{
+	const char *msg = "btf_vmlinux_value_type_id is mutually exclusive with btf_key_type_id and btf_value_type_id.\n";
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		    .btf_vmlinux_value_type_id = 1,
+		    .btf_key_type_id = 1,
+	);
+
+	test_map_create(BPF_MAP_TYPE_STRUCT_OPS, "test_map_create", &opts, msg);
+}
+
+static void test_invalid_value_type_id(void)
+{
+	const char *msg = "Invalid btf_value_type_id.\n";
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		    .btf_key_type_id = 1,
+	);
+
+	test_map_create_array(&opts, msg);
+}
+
+static void test_invalid_map_extra(void)
+{
+	const char *msg = "Invalid map_extra.\n";
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		    .map_extra = 1,
+	);
+
+	test_map_create_array(&opts, msg);
+}
+
+static void test_invalid_numa_node(void)
+{
+	const char *msg = "Invalid numa_node.\n";
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		    .map_flags = BPF_F_NUMA_NODE,
+		    .numa_node = 0xFF,
+	);
+
+	test_map_create_array(&opts, msg);
+}
+
+static void test_invalid_map_type(void)
+{
+	const char *msg = "Invalid map_type.\n";
+	LIBBPF_OPTS(bpf_map_create_opts, opts);
+
+	test_map_create(__MAX_BPF_MAP_TYPE, "test_map_create", &opts, msg);
+}
+
+static void test_invalid_token_fd(void)
+{
+	const char *msg = "Invalid map_token_fd.\n";
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		    .map_flags = BPF_F_TOKEN_FD,
+		    .token_fd = 0xFF,
+	);
+
+	test_map_create_array(&opts, msg);
+}
+
+static void test_invalid_map_name(void)
+{
+	const char *msg = "Invalid map_name.\n";
+	LIBBPF_OPTS(bpf_map_create_opts, opts);
+
+	test_map_create(BPF_MAP_TYPE_ARRAY, "test-!@#", &opts, msg);
+}
+
+static void test_invalid_btf_fd(void)
+{
+	const char *msg = "Invalid btf_fd.\n";
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		    .btf_fd = -1,
+		    .btf_key_type_id = 1,
+		    .btf_value_type_id = 1,
+	);
+
+	test_map_create_array(&opts, msg);
+}
+
+static void test_excl_prog_hash_size_1(void)
+{
+	const char *msg = "Invalid excl_prog_hash_size.\n";
+	const char *hash = "DEADCODE";
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		    .excl_prog_hash = hash,
+	);
+
+	test_map_create_array(&opts, msg);
+}
+
+static void test_excl_prog_hash_size_2(void)
+{
+	const char *msg = "Invalid excl_prog_hash_size.\n";
+	LIBBPF_OPTS(bpf_map_create_opts, opts,
+		    .excl_prog_hash_size = 1,
+	);
+
+	test_map_create_array(&opts, msg);
+}
+
+void test_map_create_failure(void)
+{
+	if (test__start_subtest("invalid_vmlinux_value_type_id_struct_ops"))
+		test_invalid_vmlinux_value_type_id_struct_ops();
+	if (test__start_subtest("invalid_vmlinux_value_type_id_kv_type_id"))
+		test_invalid_vmlinux_value_type_id_kv_type_id();
+	if (test__start_subtest("invalid_value_type_id"))
+		test_invalid_value_type_id();
+	if (test__start_subtest("invalid_map_extra"))
+		test_invalid_map_extra();
+	if (test__start_subtest("invalid_numa_node"))
+		test_invalid_numa_node();
+	if (test__start_subtest("invalid_map_type"))
+		test_invalid_map_type();
+	if (test__start_subtest("invalid_token_fd"))
+		test_invalid_token_fd();
+	if (test__start_subtest("invalid_map_name"))
+		test_invalid_map_name();
+	if (test__start_subtest("invalid_btf_fd"))
+		test_invalid_btf_fd();
+	if (test__start_subtest("invalid_excl_prog_hash_size_1"))
+		test_excl_prog_hash_size_1();
+	if (test__start_subtest("invalid_excl_prog_hash_size_2"))
+		test_excl_prog_hash_size_2();
+}
-- 
2.52.0



* [PATCH bpf-next v8 8/9] libbpf: Add common attr support for map_create
From: Leon Hwang @ 2026-01-26 15:14 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
	Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
	Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
	linux-kselftest, kernel-patches-bot
In-Reply-To: <20260126151409.52072-1-leon.hwang@linux.dev>

With the previous commit adding common attribute support for
BPF_MAP_CREATE, users can now retrieve detailed error messages when map
creation fails via the log_buf field.

Introduce struct bpf_log_opts with the following fields:
log_buf, log_size, log_level, and log_true_size.

Extend bpf_map_create_opts with a new field log_opts, allowing users to
capture and inspect log messages on map creation failures.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 tools/lib/bpf/bpf.c | 16 +++++++++++++++-
 tools/lib/bpf/bpf.h | 17 ++++++++++++++++-
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 9d8740761b7a..0c3e40844d80 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -209,6 +209,9 @@ int bpf_map_create(enum bpf_map_type map_type,
 		   const struct bpf_map_create_opts *opts)
 {
 	const size_t attr_sz = offsetofend(union bpf_attr, excl_prog_hash_size);
+	const size_t attr_common_sz = sizeof(struct bpf_common_attr);
+	struct bpf_common_attr attr_common;
+	struct bpf_log_opts *log_opts;
 	union bpf_attr attr;
 	int fd;
 
@@ -242,7 +245,18 @@ int bpf_map_create(enum bpf_map_type map_type,
 	attr.excl_prog_hash = ptr_to_u64(OPTS_GET(opts, excl_prog_hash, NULL));
 	attr.excl_prog_hash_size = OPTS_GET(opts, excl_prog_hash_size, 0);
 
-	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
+	log_opts = OPTS_GET(opts, log_opts, NULL);
+	if (log_opts && feat_supported(NULL, FEAT_BPF_SYSCALL_COMMON_ATTRS)) {
+		memset(&attr_common, 0, attr_common_sz);
+		attr_common.log_buf = ptr_to_u64(OPTS_GET(log_opts, log_buf, NULL));
+		attr_common.log_size = OPTS_GET(log_opts, log_size, 0);
+		attr_common.log_level = OPTS_GET(log_opts, log_level, 0);
+		fd = sys_bpf_ext_fd(BPF_MAP_CREATE, &attr, attr_sz, &attr_common, attr_common_sz);
+		OPTS_SET(log_opts, log_true_size, attr_common.log_true_size);
+	} else {
+		fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
+		OPTS_SET(log_opts, log_true_size, 0);
+	}
 	return libbpf_err_errno(fd);
 }
 
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 2c8e88ddb674..59673f094f86 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -37,6 +37,18 @@ extern "C" {
 
 LIBBPF_API int libbpf_set_memlock_rlim(size_t memlock_bytes);
 
+struct bpf_log_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+
+	char *log_buf;
+	__u32 log_size;
+	__u32 log_level;
+	__u32 log_true_size;
+
+	size_t :0;
+};
+#define bpf_log_opts__last_field log_true_size
+
 struct bpf_map_create_opts {
 	size_t sz; /* size of this struct for forward/backward compatibility */
 
@@ -57,9 +69,12 @@ struct bpf_map_create_opts {
 
 	const void *excl_prog_hash;
 	__u32 excl_prog_hash_size;
+
+	struct bpf_log_opts *log_opts;
+
 	size_t :0;
 };
-#define bpf_map_create_opts__last_field excl_prog_hash_size
+#define bpf_map_create_opts__last_field log_opts
 
 LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
 			      const char *map_name,
-- 
2.52.0



* [PATCH bpf-next v8 6/9] bpf: Add syscall common attributes support for btf_load
From: Leon Hwang @ 2026-01-26 15:14 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
	Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
	Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
	linux-kselftest, kernel-patches-bot
In-Reply-To: <20260126151409.52072-1-leon.hwang@linux.dev>

Since bpf_log_attr_init() now supports struct bpf_common_attr, pass the
common attributes to it to enable syscall common attributes support for
BPF_BTF_LOAD.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf_verifier.h | 3 ++-
 kernel/bpf/log.c             | 5 +++--
 kernel/bpf/syscall.c         | 8 +++++---
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 28e22a03ac84..732bc4baee1c 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -642,7 +642,8 @@ struct bpf_log_attr {
 
 int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
 				struct bpf_attrs *attrs_common);
-int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs);
+int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
+			       struct bpf_attrs *attrs_common);
 int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log);
 
 #define BPF_MAX_SUBPROGS 256
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index f1ed24157d71..3cccb0c5e482 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -902,13 +902,14 @@ int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs
 				 offsetof(union bpf_attr, log_true_size), attrs_common);
 }
 
-int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs)
+int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
+			       struct bpf_attrs *attrs_common)
 {
 	const union bpf_attr *attr = attrs->attr;
 
 	return bpf_log_attr_init(log_attr, attrs, attr->btf_log_buf, attr->btf_log_size,
 				 attr->btf_log_level, offsetof(union bpf_attr, btf_log_true_size),
-				 NULL);
+				 attrs_common);
 }
 
 int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1ed007511776..040b105ab676 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5449,7 +5449,8 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 
 #define BPF_BTF_LOAD_LAST_FIELD btf_token_fd
 
-static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
+static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size,
+			struct bpf_attrs *attrs_common)
 {
 	struct bpf_token *token = NULL;
 	struct bpf_log_attr log_attr;
@@ -5463,7 +5464,7 @@ static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_
 		return -EINVAL;
 
 	bpf_attrs_init(&attrs, attr, uattr, uattr_size);
-	err = bpf_btf_load_log_attr_init(&log_attr, &attrs);
+	err = bpf_btf_load_log_attr_init(&log_attr, &attrs, attrs_common);
 	if (err)
 		return err;
 
@@ -6297,7 +6298,8 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size,
 		err = bpf_raw_tracepoint_open(&attr);
 		break;
 	case BPF_BTF_LOAD:
-		err = bpf_btf_load(&attr, uattr, size);
+		bpf_attrs_init(&attrs_common, &attr_common, uattr_common, size_common);
+		err = bpf_btf_load(&attr, uattr, size, &attrs_common);
 		break;
 	case BPF_BTF_GET_FD_BY_ID:
 		err = bpf_btf_get_fd_by_id(&attr);
-- 
2.52.0



* [PATCH bpf-next v8 7/9] bpf: Add syscall common attributes support for map_create
From: Leon Hwang @ 2026-01-26 15:14 UTC
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
	Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
	Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
	linux-kselftest, kernel-patches-bot
In-Reply-To: <20260126151409.52072-1-leon.hwang@linux.dev>

Currently, many BPF_MAP_CREATE failures return -EINVAL without providing
any explanation to userspace.

With extended BPF syscall support, detailed error messages can now be
reported via the log buffer, allowing users to understand the specific
reason for a failed map creation.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf_verifier.h |  2 ++
 kernel/bpf/log.c             | 30 +++++++++++++++++
 kernel/bpf/syscall.c         | 65 ++++++++++++++++++++++++++++++------
 3 files changed, 87 insertions(+), 10 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 732bc4baee1c..917293a552b6 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -644,6 +644,8 @@ int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs
 				struct bpf_attrs *attrs_common);
 int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
 			       struct bpf_attrs *attrs_common);
+struct bpf_verifier_log *bpf_log_attr_create_vlog(struct bpf_log_attr *log_attr,
+						  struct bpf_attrs *attrs_common);
 int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log);
 
 #define BPF_MAX_SUBPROGS 256
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 3cccb0c5e482..d7933a412c36 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -912,6 +912,36 @@ int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *
 				 attrs_common);
 }
 
+struct bpf_verifier_log *bpf_log_attr_create_vlog(struct bpf_log_attr *log_attr,
+						  struct bpf_attrs *attrs_common)
+{
+	const struct bpf_common_attr *common = attrs_common->attr;
+	struct bpf_verifier_log *log;
+	int err;
+
+	memset(log_attr, 0, sizeof(*log_attr));
+	log_attr->log_buf = common->log_buf;
+	log_attr->log_size = common->log_size;
+	log_attr->log_level = common->log_level;
+	log_attr->attrs_common = attrs_common;
+
+	if (!log_attr->log_buf)
+		return NULL;
+
+	log = kzalloc(sizeof(*log), GFP_KERNEL);
+	if (!log)
+		return ERR_PTR(-ENOMEM);
+
+	err = bpf_vlog_init(log, log_attr->log_level, u64_to_user_ptr(log_attr->log_buf),
+			    log_attr->log_size);
+	if (err) {
+		kfree(log);
+		return ERR_PTR(err);
+	}
+
+	return log;
+}
+
 int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log)
 {
 	u32 log_true_size, off;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 040b105ab676..a596a3f22ade 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1370,7 +1370,7 @@ static bool bpf_net_capable(void)
 
 #define BPF_MAP_CREATE_LAST_FIELD excl_prog_hash_size
 /* called via syscall */
-static int map_create(union bpf_attr *attr, bpfptr_t uattr)
+static int __map_create(union bpf_attr *attr, bpfptr_t uattr, struct bpf_verifier_log *log)
 {
 	const struct bpf_map_ops *ops;
 	struct bpf_token *token = NULL;
@@ -1382,8 +1382,10 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
 	int err;
 
 	err = CHECK_ATTR(BPF_MAP_CREATE);
-	if (err)
+	if (err) {
+		bpf_log(log, "Invalid attr.\n");
 		return -EINVAL;
+	}
 
 	/* check BPF_F_TOKEN_FD flag, remember if it's set, and then clear it
 	 * to avoid per-map type checks tripping on unknown flag
@@ -1392,17 +1394,25 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
 	attr->map_flags &= ~BPF_F_TOKEN_FD;
 
 	if (attr->btf_vmlinux_value_type_id) {
-		if (attr->map_type != BPF_MAP_TYPE_STRUCT_OPS ||
-		    attr->btf_key_type_id || attr->btf_value_type_id)
+		if (attr->map_type != BPF_MAP_TYPE_STRUCT_OPS) {
+			bpf_log(log, "btf_vmlinux_value_type_id can only be used with struct_ops maps.\n");
 			return -EINVAL;
+		}
+		if (attr->btf_key_type_id || attr->btf_value_type_id) {
+			bpf_log(log, "btf_vmlinux_value_type_id is mutually exclusive with btf_key_type_id and btf_value_type_id.\n");
+			return -EINVAL;
+		}
 	} else if (attr->btf_key_type_id && !attr->btf_value_type_id) {
+		bpf_log(log, "Invalid btf_value_type_id.\n");
 		return -EINVAL;
 	}
 
 	if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
 	    attr->map_type != BPF_MAP_TYPE_ARENA &&
-	    attr->map_extra != 0)
+	    attr->map_extra != 0) {
+		bpf_log(log, "Invalid map_extra.\n");
 		return -EINVAL;
+	}
 
 	f_flags = bpf_get_file_flag(attr->map_flags);
 	if (f_flags < 0)
@@ -1410,13 +1420,17 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
 
 	if (numa_node != NUMA_NO_NODE &&
 	    ((unsigned int)numa_node >= nr_node_ids ||
-	     !node_online(numa_node)))
+	     !node_online(numa_node))) {
+		bpf_log(log, "Invalid numa_node.\n");
 		return -EINVAL;
+	}
 
 	/* find map type and init map: hashtable vs rbtree vs bloom vs ... */
 	map_type = attr->map_type;
-	if (map_type >= ARRAY_SIZE(bpf_map_types))
+	if (map_type >= ARRAY_SIZE(bpf_map_types)) {
+		bpf_log(log, "Invalid map_type.\n");
 		return -EINVAL;
+	}
 	map_type = array_index_nospec(map_type, ARRAY_SIZE(bpf_map_types));
 	ops = bpf_map_types[map_type];
 	if (!ops)
@@ -1434,8 +1448,10 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
 
 	if (token_flag) {
 		token = bpf_token_get_from_fd(attr->map_token_fd);
-		if (IS_ERR(token))
+		if (IS_ERR(token)) {
+			bpf_log(log, "Invalid map_token_fd.\n");
 			return PTR_ERR(token);
+		}
 
 		/* if current token doesn't grant map creation permissions,
 		 * then we can't use this token, so ignore it and rely on
@@ -1518,8 +1534,10 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
 
 	err = bpf_obj_name_cpy(map->name, attr->map_name,
 			       sizeof(attr->map_name));
-	if (err < 0)
+	if (err < 0) {
+		bpf_log(log, "Invalid map_name.\n");
 		goto free_map;
+	}
 
 	preempt_disable();
 	map->cookie = gen_cookie_next(&bpf_map_cookie);
@@ -1542,6 +1560,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
 
 		btf = btf_get_by_fd(attr->btf_fd);
 		if (IS_ERR(btf)) {
+			bpf_log(log, "Invalid btf_fd.\n");
 			err = PTR_ERR(btf);
 			goto free_map;
 		}
@@ -1569,6 +1588,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
 		bpfptr_t uprog_hash = make_bpfptr(attr->excl_prog_hash, uattr.is_kernel);
 
 		if (attr->excl_prog_hash_size != SHA256_DIGEST_SIZE) {
+			bpf_log(log, "Invalid excl_prog_hash_size.\n");
 			err = -EINVAL;
 			goto free_map;
 		}
@@ -1584,6 +1604,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
 			goto free_map;
 		}
 	} else if (attr->excl_prog_hash_size) {
+		bpf_log(log, "Invalid excl_prog_hash_size.\n");
 		err = -EINVAL;
 		goto free_map;
 	}
@@ -1622,6 +1643,29 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
 	return err;
 }
 
+static int map_create(union bpf_attr *attr, bpfptr_t uattr, struct bpf_attrs *attrs_common)
+{
+	struct bpf_verifier_log *log;
+	struct bpf_log_attr log_attr;
+	int err, ret;
+
+	log = bpf_log_attr_create_vlog(&log_attr, attrs_common);
+	if (IS_ERR(log))
+		return PTR_ERR(log);
+
+	err = __map_create(attr, uattr, log);
+	if (err >= 0)
+		goto free;
+
+	ret = bpf_log_attr_finalize(&log_attr, log);
+	if (ret)
+		err = ret;
+
+free:
+	kfree(log);
+	return err;
+}
+
 void bpf_map_inc(struct bpf_map *map)
 {
 	atomic64_inc(&map->refcnt);
@@ -6234,7 +6278,8 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size,
 
 	switch (cmd) {
 	case BPF_MAP_CREATE:
-		err = map_create(&attr, uattr);
+		bpf_attrs_init(&attrs_common, &attr_common, uattr_common, size_common);
+		err = map_create(&attr, uattr, &attrs_common);
 		break;
 	case BPF_MAP_LOOKUP_ELEM:
 		err = map_lookup_elem(&attr);
-- 
2.52.0



* [PATCH bpf-next v8 5/9] bpf: Refactor reporting btf_log_true_size for btf_load
From: Leon Hwang @ 2026-01-26 15:14 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
	Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
	Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
	linux-kselftest, kernel-patches-bot
In-Reply-To: <20260126151409.52072-1-leon.hwang@linux.dev>

The next commit will add support for reporting logs via extended common
attributes, including 'log_true_size'.

To prepare for that, refactor how 'btf_log_true_size' is reported, so that
'log_true_size' can also be reported via the extended common attributes.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf_verifier.h |  1 +
 include/linux/btf.h          |  3 ++-
 kernel/bpf/btf.c             | 32 +++++++++-----------------------
 kernel/bpf/log.c             |  9 +++++++++
 kernel/bpf/syscall.c         | 10 +++++++++-
 5 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7eb024e83d2d..28e22a03ac84 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -642,6 +642,7 @@ struct bpf_log_attr {
 
 int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
 				struct bpf_attrs *attrs_common);
+int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs);
 int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log);
 
 #define BPF_MAX_SUBPROGS 256
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 48108471c5b1..2812caa6c60e 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -145,7 +145,8 @@ const char *btf_get_name(const struct btf *btf);
 void btf_get(struct btf *btf);
 void btf_put(struct btf *btf);
 const struct btf_header *btf_header(const struct btf *btf);
-int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_sz);
+struct bpf_log_attr;
+int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, struct bpf_log_attr *log_attr);
 struct btf *btf_get_by_fd(int fd);
 int btf_get_info_by_fd(const struct btf *btf,
 		       const union bpf_attr *attr,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 8959f3bc1e92..3565570601e5 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5856,25 +5856,11 @@ static int btf_check_type_tags(struct btf_verifier_env *env,
 	return 0;
 }
 
-static int finalize_log(struct bpf_verifier_log *log, bpfptr_t uattr, u32 uattr_size)
-{
-	u32 log_true_size;
-	int err;
-
-	err = bpf_vlog_finalize(log, &log_true_size);
-
-	if (uattr_size >= offsetofend(union bpf_attr, btf_log_true_size) &&
-	    copy_to_bpfptr_offset(uattr, offsetof(union bpf_attr, btf_log_true_size),
-				  &log_true_size, sizeof(log_true_size)))
-		err = -EFAULT;
-
-	return err;
-}
-
-static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
+static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr,
+			     struct bpf_log_attr *log_attr)
 {
 	bpfptr_t btf_data = make_bpfptr(attr->btf, uattr.is_kernel);
-	char __user *log_ubuf = u64_to_user_ptr(attr->btf_log_buf);
+	char __user *log_ubuf = u64_to_user_ptr(log_attr->log_buf);
 	struct btf_struct_metas *struct_meta_tab;
 	struct btf_verifier_env *env = NULL;
 	struct btf *btf = NULL;
@@ -5891,8 +5877,8 @@ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uat
 	/* user could have requested verbose verifier output
 	 * and supplied buffer to store the verification trace
 	 */
-	err = bpf_vlog_init(&env->log, attr->btf_log_level,
-			    log_ubuf, attr->btf_log_size);
+	err = bpf_vlog_init(&env->log, log_attr->log_level,
+			    log_ubuf, log_attr->log_size);
 	if (err)
 		goto errout_free;
 
@@ -5953,7 +5939,7 @@ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uat
 		}
 	}
 
-	err = finalize_log(&env->log, uattr, uattr_size);
+	err = bpf_log_attr_finalize(log_attr, &env->log);
 	if (err)
 		goto errout_free;
 
@@ -5965,7 +5951,7 @@ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uat
 	btf_free_struct_meta_tab(btf);
 errout:
 	/* overwrite err with -ENOSPC or -EFAULT */
-	ret = finalize_log(&env->log, uattr, uattr_size);
+	ret = bpf_log_attr_finalize(log_attr, &env->log);
 	if (ret)
 		err = ret;
 errout_free:
@@ -8136,12 +8122,12 @@ static int __btf_new_fd(struct btf *btf)
 	return anon_inode_getfd("btf", &btf_fops, btf, O_RDONLY | O_CLOEXEC);
 }
 
-int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
+int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, struct bpf_log_attr *log_attr)
 {
 	struct btf *btf;
 	int ret;
 
-	btf = btf_parse(attr, uattr, uattr_size);
+	btf = btf_parse(attr, uattr, log_attr);
 	if (IS_ERR(btf))
 		return PTR_ERR(btf);
 
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index c0b816e84384..f1ed24157d71 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -902,6 +902,15 @@ int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs
 				 offsetof(union bpf_attr, log_true_size), attrs_common);
 }
 
+int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs)
+{
+	const union bpf_attr *attr = attrs->attr;
+
+	return bpf_log_attr_init(log_attr, attrs, attr->btf_log_buf, attr->btf_log_size,
+				 attr->btf_log_level, offsetof(union bpf_attr, btf_log_true_size),
+				 NULL);
+}
+
 int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log)
 {
 	u32 log_true_size, off;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3d1d1181b9b4..1ed007511776 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5452,6 +5452,9 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
 {
 	struct bpf_token *token = NULL;
+	struct bpf_log_attr log_attr;
+	struct bpf_attrs attrs;
+	int err;
 
 	if (CHECK_ATTR(BPF_BTF_LOAD))
 		return -EINVAL;
@@ -5459,6 +5462,11 @@ static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_
 	if (attr->btf_flags & ~BPF_F_TOKEN_FD)
 		return -EINVAL;
 
+	bpf_attrs_init(&attrs, attr, uattr, uattr_size);
+	err = bpf_btf_load_log_attr_init(&log_attr, &attrs);
+	if (err)
+		return err;
+
 	if (attr->btf_flags & BPF_F_TOKEN_FD) {
 		token = bpf_token_get_from_fd(attr->btf_token_fd);
 		if (IS_ERR(token))
@@ -5476,7 +5484,7 @@ static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_
 
 	bpf_token_put(token);
 
-	return btf_new_fd(attr, uattr, uattr_size);
+	return btf_new_fd(attr, uattr, &log_attr);
 }
 
 #define BPF_BTF_GET_FD_BY_ID_LAST_FIELD fd_by_id_token_fd
-- 
2.52.0



* [PATCH bpf-next v8 4/9] bpf: Add syscall common attributes support for prog_load
From: Leon Hwang @ 2026-01-26 15:14 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
	Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
	Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
	linux-kselftest, kernel-patches-bot
In-Reply-To: <20260126151409.52072-1-leon.hwang@linux.dev>

For BPF_PROG_LOAD, the log buffer in the common attributes can conflict
with the one in 'union bpf_attr'.

To make the use of these two log buffers unambiguous, logging is allowed
if:

* Both are identical in 'log_buf', 'log_size' and 'log_level'.
* Only one of them is provided, in which case that one is used for logging.

If both provide 'log_buf' but differ in any of these fields, return -EINVAL.
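
The rule above can be modeled in plain userspace C. This is only a sketch
for illustration, not the kernel code: 'struct sketch_log' and
'sketch_pick_log()' are hypothetical names, and the two inputs stand in for
the log fields of 'union bpf_attr' and of the common attributes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one set of log fields. */
struct sketch_log {
	uint64_t buf;
	uint32_t size;
	uint32_t level;
};

/*
 * Pick the log settings to use. 'cmd_log' models the fields in
 * 'union bpf_attr'; 'common' models the common attributes and may be NULL.
 */
static int sketch_pick_log(const struct sketch_log *cmd_log,
			   const struct sketch_log *common,
			   struct sketch_log *out)
{
	*out = *cmd_log;

	/* Both buffers given: every field has to match. */
	if (cmd_log->buf && common && common->buf &&
	    (cmd_log->buf != common->buf ||
	     cmd_log->size != common->size ||
	     cmd_log->level != common->level))
		return -EINVAL;

	/* Only the common buffer given: fall back to it. */
	if (!cmd_log->buf && common && common->buf)
		*out = *common;

	return 0;
}
```

A mismatch is rejected up front rather than silently preferring one buffer,
so a caller never has to guess which buffer received the log.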

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf_verifier.h |  4 +++-
 kernel/bpf/log.c             | 29 ++++++++++++++++++++++++++---
 kernel/bpf/syscall.c         |  9 ++++++---
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 4a0c5ef296b9..7eb024e83d2d 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -637,9 +637,11 @@ struct bpf_log_attr {
 	u32 log_level;
 	struct bpf_attrs *attrs;
 	u32 offsetof_log_true_size;
+	struct bpf_attrs *attrs_common;
 };
 
-int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs);
+int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
+				struct bpf_attrs *attrs_common);
 int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log);
 
 #define BPF_MAX_SUBPROGS 256
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 457b724c4176..c0b816e84384 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -865,23 +865,41 @@ void print_insn_state(struct bpf_verifier_env *env, const struct bpf_verifier_st
 }
 
 static int bpf_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs, u64 log_buf,
-			     u32 log_size, u32 log_level, int offsetof_log_true_size)
+			     u32 log_size, u32 log_level, int offsetof_log_true_size,
+			     struct bpf_attrs *attrs_common)
 {
+	const struct bpf_common_attr *common = attrs_common ? attrs_common->attr : NULL;
+
 	memset(log_attr, 0, sizeof(*log_attr));
 	log_attr->log_buf = log_buf;
 	log_attr->log_size = log_size;
 	log_attr->log_level = log_level;
 	log_attr->attrs = attrs;
 	log_attr->offsetof_log_true_size = offsetof_log_true_size;
+	log_attr->attrs_common = attrs_common;
+
+	if (log_buf && common && common->log_buf &&
+		(log_buf != common->log_buf ||
+		 log_size != common->log_size ||
+		 log_level != common->log_level))
+		return -EINVAL;
+
+	if (!log_buf && common && common->log_buf) {
+		log_attr->log_buf = common->log_buf;
+		log_attr->log_size = common->log_size;
+		log_attr->log_level = common->log_level;
+	}
+
 	return 0;
 }
 
-int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs)
+int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
+				struct bpf_attrs *attrs_common)
 {
 	const union bpf_attr *attr = attrs->attr;
 
 	return bpf_log_attr_init(log_attr, attrs, attr->log_buf, attr->log_size, attr->log_level,
-				 offsetof(union bpf_attr, log_true_size));
+				 offsetof(union bpf_attr, log_true_size), attrs_common);
 }
 
 int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log)
@@ -901,5 +919,10 @@ int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log
 	    copy_to_bpfptr_offset(log_attr->attrs->uattr, off, &log_true_size, size))
 		err = -EFAULT;
 
+	off = offsetof(struct bpf_common_attr, log_true_size);
+	if (log_attr->attrs_common && log_attr->attrs_common->size >= off + size &&
+	    copy_to_bpfptr_offset(log_attr->attrs_common->uattr, off, &log_true_size, size))
+		err = -EFAULT;
+
 	return err;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d422664e00dd..3d1d1181b9b4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2865,7 +2865,8 @@ static int bpf_prog_mark_insn_arrays_ready(struct bpf_prog *prog)
 /* last field in 'union bpf_attr' used by this command */
 #define BPF_PROG_LOAD_LAST_FIELD keyring_id
 
-static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
+static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size,
+			 struct bpf_attrs *attrs_common)
 {
 	enum bpf_prog_type type = attr->prog_type;
 	struct bpf_prog *prog, *dst_prog = NULL;
@@ -3085,7 +3086,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 		goto free_prog_sec;
 
 	bpf_attrs_init(&attrs, attr, uattr, uattr_size);
-	err = bpf_prog_load_log_attr_init(&log_attr, &attrs);
+	err = bpf_prog_load_log_attr_init(&log_attr, &attrs, attrs_common);
 	if (err < 0)
 		goto free_used_maps;
 
@@ -6190,6 +6191,7 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size,
 		     bpfptr_t uattr_common, unsigned int size_common)
 {
 	struct bpf_common_attr attr_common;
+	struct bpf_attrs attrs_common;
 	union bpf_attr attr;
 	int err;
 
@@ -6241,7 +6243,8 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size,
 		err = map_freeze(&attr);
 		break;
 	case BPF_PROG_LOAD:
-		err = bpf_prog_load(&attr, uattr, size);
+		bpf_attrs_init(&attrs_common, &attr_common, uattr_common, size_common);
+		err = bpf_prog_load(&attr, uattr, size, &attrs_common);
 		break;
 	case BPF_OBJ_PIN:
 		err = bpf_obj_pin(&attr);
-- 
2.52.0



* [PATCH bpf-next v8 3/9] bpf: Refactor reporting log_true_size for prog_load
From: Leon Hwang @ 2026-01-26 15:14 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
	Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
	Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
	linux-kselftest, kernel-patches-bot
In-Reply-To: <20260126151409.52072-1-leon.hwang@linux.dev>

The next commit will add support for reporting logs via extended common
attributes, including 'log_true_size'.

To prepare for that, refactor the 'log_true_size' reporting logic by
introducing a new struct bpf_log_attr to encapsulate log-related behavior:

 * bpf_prog_load_log_attr_init(): initialize the log fields, which will
   support extended common attributes in the next commit.
 * bpf_log_attr_finalize(): handle log finalization and write back
   'log_true_size' to userspace.
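
The write-back step can be illustrated with a small userspace model. It is a
sketch under assumptions: 'sketch_report_true_size()' is a hypothetical
name, a plain byte buffer stands in for the user attribute, and memcpy()
stands in for the copy to userspace. The point it shows is the size check:
the field is only written when the attribute passed by the caller is large
enough to contain it.

```c
#include <stdint.h>
#include <string.h>

/*
 * Write 'log_true_size' at byte offset 'off' of the caller's attribute,
 * but only if the caller's attribute ('uattr_size' bytes) covers that field.
 * Returns 1 if the value was written, 0 if the write was skipped.
 */
static int sketch_report_true_size(void *uattr, uint32_t uattr_size,
				   uint32_t off, uint32_t log_true_size)
{
	if (uattr_size < off + sizeof(log_true_size))
		return 0;	/* older, shorter attribute: nothing to report into */

	memcpy((unsigned char *)uattr + off, &log_true_size,
	       sizeof(log_true_size));
	return 1;
}
```

Keeping the offset as a parameter is what lets one finalize helper serve
commands whose 'log_true_size' field lives at different offsets.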

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/bpf.h          | 19 ++++++++++++++++-
 include/linux/bpf_verifier.h | 11 ++++++++++
 kernel/bpf/log.c             | 40 ++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c         |  9 +++++++-
 kernel/bpf/verifier.c        | 19 ++++++-----------
 5 files changed, 83 insertions(+), 15 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4427c6e98331..1946f35b44fb 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2903,8 +2903,25 @@ int bpf_get_file_flag(int flags);
 int bpf_check_uarg_tail_zero(bpfptr_t uaddr, size_t expected_size,
 			     size_t actual_size);
 
+struct bpf_attrs {
+	const void *attr;
+	bpfptr_t uattr;
+	u32 size;
+};
+
+static inline void bpf_attrs_init(struct bpf_attrs *attrs, const void *attr, bpfptr_t uattr,
+				  u32 size)
+{
+	memset(attrs, 0, sizeof(*attrs));
+	attrs->attr = attr;
+	attrs->uattr = uattr;
+	attrs->size = size;
+}
+
 /* verify correctness of eBPF program */
-int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size);
+struct bpf_log_attr;
+int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, bpfptr_t uattr,
+	      struct bpf_log_attr *log_attr);
 
 #ifndef CONFIG_BPF_JIT_ALWAYS_ON
 void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 8355b585cd18..4a0c5ef296b9 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -631,6 +631,17 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
 	return log && log->level;
 }
 
+struct bpf_log_attr {
+	u64 log_buf;
+	u32 log_size;
+	u32 log_level;
+	struct bpf_attrs *attrs;
+	u32 offsetof_log_true_size;
+};
+
+int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs);
+int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log);
+
 #define BPF_MAX_SUBPROGS 256
 
 struct bpf_subprog_arg_info {
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index a0c3b35de2ce..457b724c4176 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -863,3 +863,43 @@ void print_insn_state(struct bpf_verifier_env *env, const struct bpf_verifier_st
 	}
 	print_verifier_state(env, vstate, frameno, false);
 }
+
+static int bpf_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs, u64 log_buf,
+			     u32 log_size, u32 log_level, int offsetof_log_true_size)
+{
+	memset(log_attr, 0, sizeof(*log_attr));
+	log_attr->log_buf = log_buf;
+	log_attr->log_size = log_size;
+	log_attr->log_level = log_level;
+	log_attr->attrs = attrs;
+	log_attr->offsetof_log_true_size = offsetof_log_true_size;
+	return 0;
+}
+
+int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs)
+{
+	const union bpf_attr *attr = attrs->attr;
+
+	return bpf_log_attr_init(log_attr, attrs, attr->log_buf, attr->log_size, attr->log_level,
+				 offsetof(union bpf_attr, log_true_size));
+}
+
+int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log)
+{
+	u32 log_true_size, off;
+	size_t size;
+	int err;
+
+	if (!log)
+		return 0;
+
+	err = bpf_vlog_finalize(log, &log_true_size);
+
+	size = sizeof(log_true_size);
+	off = log_attr->offsetof_log_true_size;
+	if (log_attr->attrs && log_attr->attrs->size >= off + size &&
+	    copy_to_bpfptr_offset(log_attr->attrs->uattr, off, &log_true_size, size))
+		err = -EFAULT;
+
+	return err;
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 69bfcffb4389..d422664e00dd 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2871,6 +2871,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	struct bpf_prog *prog, *dst_prog = NULL;
 	struct btf *attach_btf = NULL;
 	struct bpf_token *token = NULL;
+	struct bpf_log_attr log_attr;
+	struct bpf_attrs attrs;
 	bool bpf_cap;
 	int err;
 	char license[128];
@@ -3082,8 +3084,13 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (err)
 		goto free_prog_sec;
 
+	bpf_attrs_init(&attrs, attr, uattr, uattr_size);
+	err = bpf_prog_load_log_attr_init(&log_attr, &attrs);
+	if (err < 0)
+		goto free_used_maps;
+
 	/* run eBPF verifier */
-	err = bpf_check(&prog, attr, uattr, uattr_size);
+	err = bpf_check(&prog, attr, uattr, &log_attr);
 	if (err < 0)
 		goto free_used_maps;
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c2f2650db9fd..134871f46afb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -25643,12 +25643,12 @@ static int compute_scc(struct bpf_verifier_env *env)
 	return err;
 }
 
-int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
+int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr,
+	      struct bpf_log_attr *log_attr)
 {
 	u64 start_time = ktime_get_ns();
 	struct bpf_verifier_env *env;
 	int i, len, ret = -EINVAL, err;
-	u32 log_true_size;
 	bool is_priv;
 
 	BTF_TYPE_EMIT(enum bpf_features);
@@ -25695,9 +25695,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	/* user could have requested verbose verifier output
 	 * and supplied buffer to store the verification trace
 	 */
-	ret = bpf_vlog_init(&env->log, attr->log_level,
-			    (char __user *) (unsigned long) attr->log_buf,
-			    attr->log_size);
+	ret = bpf_vlog_init(&env->log, log_attr->log_level,
+			    u64_to_user_ptr(log_attr->log_buf),
+			    log_attr->log_size);
 	if (ret)
 		goto err_unlock;
 
@@ -25847,17 +25847,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	env->prog->aux->verified_insns = env->insn_processed;
 
 	/* preserve original error even if log finalization is successful */
-	err = bpf_vlog_finalize(&env->log, &log_true_size);
+	err = bpf_log_attr_finalize(log_attr, &env->log);
 	if (err)
 		ret = err;
 
-	if (uattr_size >= offsetofend(union bpf_attr, log_true_size) &&
-	    copy_to_bpfptr_offset(uattr, offsetof(union bpf_attr, log_true_size),
-				  &log_true_size, sizeof(log_true_size))) {
-		ret = -EFAULT;
-		goto err_release_maps;
-	}
-
 	if (ret)
 		goto err_release_maps;
 
-- 
2.52.0



* [PATCH bpf-next v8 2/9] libbpf: Add support for extended bpf syscall
From: Leon Hwang @ 2026-01-26 15:14 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
	Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
	Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
	linux-kselftest, kernel-patches-bot
In-Reply-To: <20260126151409.52072-1-leon.hwang@linux.dev>

To support the extended BPF syscall introduced in the previous commit,
introduce the following internal APIs:

* 'sys_bpf_ext()'
* 'sys_bpf_ext_fd()'
  They wrap the raw 'syscall()' interface to support passing extended
  attributes.
* 'probe_sys_bpf_ext()'
  Check whether the running kernel supports the BPF syscall common
  attributes.
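
The command selection and the probe's result handling can be sketched as
pure functions. This is an illustration, not the libbpf code:
'SKETCH_BPF_COMMON_ATTRS' mirrors the flag value from the previous commit,
and both function names are hypothetical. The probe passes a NULL common
attributes pointer together with the flag; the assumption here is that a
kernel that understands the flag fails while copying from that pointer
(EFAULT), while any other error means the flag is not supported.

```c
#include <errno.h>
#include <stddef.h>

/* Local mirror of the flag value introduced by the previous commit. */
#define SKETCH_BPF_COMMON_ATTRS (1 << 16)

/* Set the flag only when common attributes are actually passed. */
static int sketch_ext_cmd(int cmd, const void *attr_common)
{
	return attr_common ? (cmd | SKETCH_BPF_COMMON_ATTRS)
			   : (cmd & ~SKETCH_BPF_COMMON_ATTRS);
}

/*
 * Interpret the outcome of the probe syscall.
 * 'fd' is the syscall return value, 'saved_errno' the errno it left behind.
 * Returns 1 if supported, 0 if not, -EINVAL if the probe unexpectedly
 * succeeded (the caller is expected to close 'fd' in that case).
 */
static int sketch_classify_probe(int fd, int saved_errno)
{
	if (fd >= 0)
		return -EINVAL;
	return saved_errno == EFAULT ? 1 : 0;
}
```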

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 tools/lib/bpf/bpf.c             | 36 +++++++++++++++++++++++++++++++++
 tools/lib/bpf/features.c        |  8 ++++++++
 tools/lib/bpf/libbpf_internal.h |  3 +++
 3 files changed, 47 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5846de364209..9d8740761b7a 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -69,6 +69,42 @@ static inline __u64 ptr_to_u64(const void *ptr)
 	return (__u64) (unsigned long) ptr;
 }
 
+static inline int sys_bpf_ext(enum bpf_cmd cmd, union bpf_attr *attr,
+			      unsigned int size,
+			      struct bpf_common_attr *attr_common,
+			      unsigned int size_common)
+{
+	cmd = attr_common ? (cmd | BPF_COMMON_ATTRS) : (cmd & ~BPF_COMMON_ATTRS);
+	return syscall(__NR_bpf, cmd, attr, size, attr_common, size_common);
+}
+
+static inline int sys_bpf_ext_fd(enum bpf_cmd cmd, union bpf_attr *attr,
+				 unsigned int size,
+				 struct bpf_common_attr *attr_common,
+				 unsigned int size_common)
+{
+	int fd;
+
+	fd = sys_bpf_ext(cmd, attr, size, attr_common, size_common);
+	return ensure_good_fd(fd);
+}
+
+int probe_sys_bpf_ext(void)
+{
+	const size_t attr_sz = offsetofend(union bpf_attr, prog_token_fd);
+	union bpf_attr attr;
+	int fd;
+
+	memset(&attr, 0, attr_sz);
+	fd = syscall(__NR_bpf, BPF_PROG_LOAD | BPF_COMMON_ATTRS, &attr, attr_sz, NULL,
+		     sizeof(struct bpf_common_attr));
+	if (fd >= 0) {
+		close(fd);
+		return -EINVAL;
+	}
+	return errno == EFAULT ? 1 : 0;
+}
+
 static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
 			  unsigned int size)
 {
diff --git a/tools/lib/bpf/features.c b/tools/lib/bpf/features.c
index b842b83e2480..e0d646a9e233 100644
--- a/tools/lib/bpf/features.c
+++ b/tools/lib/bpf/features.c
@@ -506,6 +506,11 @@ static int probe_kern_arg_ctx_tag(int token_fd)
 	return probe_fd(prog_fd);
 }
 
+static int probe_bpf_syscall_common_attrs(int token_fd)
+{
+	return probe_sys_bpf_ext();
+}
+
 typedef int (*feature_probe_fn)(int /* token_fd */);
 
 static struct kern_feature_cache feature_cache;
@@ -581,6 +586,9 @@ static struct kern_feature_desc {
 	[FEAT_BTF_QMARK_DATASEC] = {
 		"BTF DATASEC names starting from '?'", probe_kern_btf_qmark_datasec,
 	},
+	[FEAT_BPF_SYSCALL_COMMON_ATTRS] = {
+		"BPF syscall common attributes support", probe_bpf_syscall_common_attrs,
+	},
 };
 
 bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id)
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index fc59b21b51b5..aa16be869c4f 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -392,6 +392,8 @@ enum kern_feature_id {
 	FEAT_ARG_CTX_TAG,
 	/* Kernel supports '?' at the front of datasec names */
 	FEAT_BTF_QMARK_DATASEC,
+	/* Kernel supports BPF syscall common attributes */
+	FEAT_BPF_SYSCALL_COMMON_ATTRS,
 	__FEAT_CNT,
 };
 
@@ -757,4 +759,5 @@ int probe_fd(int fd);
 #define SHA256_DWORD_SIZE SHA256_DIGEST_LENGTH / sizeof(__u64)
 
 void libbpf_sha256(const void *data, size_t len, __u8 out[SHA256_DIGEST_LENGTH]);
+int probe_sys_bpf_ext(void);
 #endif /* __LIBBPF_LIBBPF_INTERNAL_H */
-- 
2.52.0



* [PATCH bpf-next v8 1/9] bpf: Extend BPF syscall with common attributes support
From: Leon Hwang @ 2026-01-26 15:14 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
	Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
	Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
	linux-kselftest, kernel-patches-bot
In-Reply-To: <20260126151409.52072-1-leon.hwang@linux.dev>

Extend the BPF syscall to support a set of common attributes shared
across all BPF commands:

1. 'log_buf': User-provided buffer for storing logs.
2. 'log_size': Size of the log buffer.
3. 'log_level': Log verbosity level.
4. 'log_true_size': The actual size of the log, as reported by the kernel.

These common attributes are passed as the 4th argument to the BPF
syscall, with the 5th argument specifying the size of this structure.

To indicate the use of these common attributes from userspace, a new flag
'BPF_COMMON_ATTRS' ('1 << 16') is introduced. This flag is OR-ed into the
'cmd' field of the syscall.

When 'cmd & BPF_COMMON_ATTRS' is set, the kernel will copy the common
attributes from userspace into kernel space for use.
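
The intake of the common attributes can be modeled in userspace as follows.
This is a sketch under assumptions, not the kernel implementation:
'sketch_common_attr' and 'SKETCH_BPF_COMMON_ATTRS' are local mirrors of the
definitions added here, 'sketch_take_common()' is a hypothetical name, and a
plain pointer stands in for the user pointer. The -E2BIG result for a
non-zero tail is my understanding of what bpf_check_uarg_tail_zero()
returns.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of what this patch adds to the UAPI header. */
#define SKETCH_BPF_COMMON_ATTRS (1 << 16)

struct sketch_common_attr {
	uint64_t log_buf;
	uint32_t log_size;
	uint32_t log_level;
	uint32_t log_true_size;
};

/*
 * 'src' stands in for the 4th syscall argument, 'src_size' for the 5th.
 * On success the flag is stripped from '*cmd' and 'dst' holds the copy.
 */
static int sketch_take_common(int *cmd, struct sketch_common_attr *dst,
			      const void *src, size_t src_size)
{
	const unsigned char *bytes = src;
	size_t i, n;

	memset(dst, 0, sizeof(*dst));
	if (!(*cmd & SKETCH_BPF_COMMON_ATTRS))
		return 0;	/* flag absent: common attributes stay zeroed */

	/* A larger struct from newer userspace is fine if its tail is zero. */
	for (i = sizeof(*dst); i < src_size; i++)
		if (bytes[i])
			return -E2BIG;

	*cmd &= ~SKETCH_BPF_COMMON_ATTRS;

	/* A shorter struct from older userspace leaves the rest zeroed. */
	n = src_size < sizeof(*dst) ? src_size : sizeof(*dst);
	memcpy(dst, src, n);
	return 0;
}
```

Zeroing the destination first and clamping the copy size is what keeps the
structure extensible in both directions.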

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 include/linux/syscalls.h       |  3 ++-
 include/uapi/linux/bpf.h       |  8 ++++++++
 kernel/bpf/syscall.c           | 25 +++++++++++++++++++++----
 tools/include/uapi/linux/bpf.h |  8 ++++++++
 4 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index cf84d98964b2..729659202d77 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -937,7 +937,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
 asmlinkage long sys_getrandom(char __user *buf, size_t count,
 			      unsigned int flags);
 asmlinkage long sys_memfd_create(const char __user *uname_ptr, unsigned int flags);
-asmlinkage long sys_bpf(int cmd, union bpf_attr __user *attr, unsigned int size);
+asmlinkage long sys_bpf(int cmd, union bpf_attr __user *attr, unsigned int size,
+			struct bpf_common_attr __user *attr_common, unsigned int size_common);
 asmlinkage long sys_execveat(int dfd, const char __user *filename,
 			const char __user *const __user *argv,
 			const char __user *const __user *envp, int flags);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 44e7dbc278e3..656757e7a4fb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -986,6 +986,7 @@ enum bpf_cmd {
 	BPF_PROG_STREAM_READ_BY_FD,
 	BPF_PROG_ASSOC_STRUCT_OPS,
 	__MAX_BPF_CMD,
+	BPF_COMMON_ATTRS = 1 << 16, /* Indicate carrying syscall common attrs. */
 };
 
 enum bpf_map_type {
@@ -1492,6 +1493,13 @@ struct bpf_stack_build_id {
 	};
 };
 
+struct bpf_common_attr {
+	__u64 log_buf;
+	__u32 log_size;
+	__u32 log_level;
+	__u32 log_true_size;
+};
+
 #define BPF_OBJ_NAME_LEN 16U
 
 enum {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b9184545c3fd..69bfcffb4389 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -6179,8 +6179,10 @@ static int prog_assoc_struct_ops(union bpf_attr *attr)
 	return ret;
 }
 
-static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
+static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size,
+		     bpfptr_t uattr_common, unsigned int size_common)
 {
+	struct bpf_common_attr attr_common;
 	union bpf_attr attr;
 	int err;
 
@@ -6194,6 +6196,20 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
 	if (copy_from_bpfptr(&attr, uattr, size) != 0)
 		return -EFAULT;
 
+	memset(&attr_common, 0, sizeof(attr_common));
+	if (cmd & BPF_COMMON_ATTRS) {
+		err = bpf_check_uarg_tail_zero(uattr_common, sizeof(attr_common), size_common);
+		if (err)
+			return err;
+
+		cmd &= ~BPF_COMMON_ATTRS;
+		size_common = min_t(u32, size_common, sizeof(attr_common));
+		if (copy_from_bpfptr(&attr_common, uattr_common, size_common) != 0)
+			return -EFAULT;
+	} else {
+		size_common = 0;
+	}
+
 	err = security_bpf(cmd, &attr, size, uattr.is_kernel);
 	if (err < 0)
 		return err;
@@ -6329,9 +6345,10 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
 	return err;
 }
 
-SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
+SYSCALL_DEFINE5(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size,
+		struct bpf_common_attr __user *, uattr_common, unsigned int, size_common)
 {
-	return __sys_bpf(cmd, USER_BPFPTR(uattr), size);
+	return __sys_bpf(cmd, USER_BPFPTR(uattr), size, USER_BPFPTR(uattr_common), size_common);
 }
 
 static bool syscall_prog_is_valid_access(int off, int size,
@@ -6362,7 +6379,7 @@ BPF_CALL_3(bpf_sys_bpf, int, cmd, union bpf_attr *, attr, u32, attr_size)
 	default:
 		return -EINVAL;
 	}
-	return __sys_bpf(cmd, KERNEL_BPFPTR(attr), attr_size);
+	return __sys_bpf(cmd, KERNEL_BPFPTR(attr), attr_size, KERNEL_BPFPTR(NULL), 0);
 }
 
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 3ca7d76e05f0..39022e07f6fd 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -986,6 +986,7 @@ enum bpf_cmd {
 	BPF_PROG_STREAM_READ_BY_FD,
 	BPF_PROG_ASSOC_STRUCT_OPS,
 	__MAX_BPF_CMD,
+	BPF_COMMON_ATTRS = 1 << 16, /* Indicate carrying syscall common attrs. */
 };
 
 enum bpf_map_type {
@@ -1492,6 +1493,13 @@ struct bpf_stack_build_id {
 	};
 };
 
+struct bpf_common_attr {
+	__u64 log_buf;
+	__u32 log_size;
+	__u32 log_level;
+	__u32 log_true_size;
+};
+
 #define BPF_OBJ_NAME_LEN 16U
 
 enum {
-- 
2.52.0
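
The size handling in the __sys_bpf() hunk above (tail check, then a clamped copy into a zeroed struct) can be modelled in plain userspace C. This is only a sketch: model_check_tail_zero() and model_copy_common_attr() are hypothetical names, they work on ordinary memory instead of user pointers, and they leave out the other checks the real kernel helper performs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the struct added by the patch (layout copied from the diff). */
struct common_attr_model {
	uint64_t log_buf;
	uint32_t log_size;
	uint32_t log_level;
	uint32_t log_true_size;
};

/*
 * Hypothetical model of the tail check: bytes beyond what this "kernel"
 * knows about must all be zero, otherwise the caller relies on fields
 * that would be silently ignored.
 */
static int model_check_tail_zero(const unsigned char *buf, size_t known, size_t actual)
{
	size_t i;

	if (actual <= known)
		return 0;
	for (i = known; i < actual; i++)
		if (buf[i] != 0)
			return -E2BIG;
	return 0;
}

/* Hypothetical model of the copy: zero-fill first, then copy min(actual, known). */
static int model_copy_common_attr(struct common_attr_model *dst,
				  const void *src, size_t actual)
{
	int err = model_check_tail_zero(src, sizeof(*dst), actual);

	if (err)
		return err;
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, actual < sizeof(*dst) ? actual : sizeof(*dst));
	return 0;
}
```

The effect is that a caller built against a shorter struct sees the missing fields as zero, while a caller built against a longer struct is only accepted if the extra bytes are all zero.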



* [PATCH bpf-next v8 0/9] bpf: Extend BPF syscall with common attributes support
From: Leon Hwang @ 2026-01-26 15:14 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
	Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
	Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
	linux-kselftest, kernel-patches-bot

This patch series builds upon the discussion in
"[PATCH bpf-next v4 0/4] bpf: Improve error reporting for freplace attachment failure" [1].

This patch series introduces support for *common attributes* in the BPF
syscall, providing a unified mechanism for passing shared metadata across
all BPF commands.

The initial set of common attributes includes:

1. 'log_buf': User-provided buffer for storing log output.
2. 'log_size': Size of the provided log buffer.
3. 'log_level': Verbosity level for logging.
4. 'log_true_size': The size of the log as reported by the kernel.

With this extension, the BPF syscall will be able to return meaningful
error messages (e.g., when map creation fails), improving debuggability
and user experience.
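
As a rough illustration of the calling convention described above, here is a userspace sketch. It assumes the flag value and struct layout from patch 1/9; the SKETCH_/sketch_ names are hypothetical and defined locally, since none of this is in released UAPI headers, and the final interface may differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value and layout copied from the proposed UAPI in patch 1/9. */
#define SKETCH_BPF_COMMON_ATTRS (1 << 16)

struct sketch_bpf_common_attr {
	uint64_t log_buf;	/* user pointer to the log buffer, as a u64 */
	uint32_t log_size;	/* size of that buffer in bytes */
	uint32_t log_level;	/* verbosity */
	uint32_t log_true_size;	/* written back by the kernel */
};

/* Hypothetical helper: fill the common attrs for a caller-owned log buffer. */
static void sketch_init_log(struct sketch_bpf_common_attr *c,
			    char *buf, uint32_t size, uint32_t level)
{
	memset(c, 0, sizeof(*c));
	c->log_buf = (uint64_t)(uintptr_t)buf;
	c->log_size = size;
	c->log_level = level;
}

/* Hypothetical helper: OR the flag into the command number. */
static int sketch_cmd(int cmd)
{
	return cmd | SKETCH_BPF_COMMON_ATTRS;
}

/*
 * Hypothetical wrapper for the five-argument form of bpf(2) proposed by
 * the series. A kernel without the series is expected to reject the
 * flagged command.
 */
static inline long sketch_sys_bpf_ext(int cmd, void *attr, unsigned int size,
				      struct sketch_bpf_common_attr *common)
{
	return syscall(__NR_bpf, sketch_cmd(cmd), attr, size,
		       common, (unsigned int)sizeof(*common));
}
```

A caller would fill the command's usual union bpf_attr as before, call sketch_init_log() on a separate struct, and read the log buffer and log_true_size after the call returns.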

Links:
[1] https://lore.kernel.org/bpf/20250224153352.64689-1-leon.hwang@linux.dev/

Changes:
v7 -> v8:
* Return 0 when fd < 0 and errno != EFAULT in probe_sys_bpf_ext(), then simplify
  probe_bpf_syscall_common_attrs() (per Alexei and Andrii).
* v7: https://lore.kernel.org/bpf/20260123032445.125259-1-leon.hwang@linux.dev/

v6 -> v7:
* Return -errno when fd < 0 and errno != EFAULT in probe_sys_bpf_ext().
* Convert return value of probe_sys_bpf_ext() to bool in
  probe_bpf_syscall_common_attrs().
* Address comments from Andrii:
  * Drop the comment, and handle fd >= 0 case explicitly in
    probe_sys_bpf_ext().
  * Return an error when fd >= 0 in probe_sys_bpf_ext().
* v6: https://lore.kernel.org/bpf/20260120152424.40766-1-leon.hwang@linux.dev/

v5 -> v6:
* Address comments from Andrii:
  * Rename some variables.
  * Drop unnecessary 'close(fd)' in libbpf.
  * Rename FEAT_EXTENDED_SYSCALL to FEAT_BPF_SYSCALL_COMMON_ATTRS with
    updated description in libbpf.
  * Use EINVAL instead of EUSERS, as EUSERS is not used in bpf yet.
  * Rename struct bpf_syscall_common_attr_opts to bpf_log_opts in libbpf.
  * Add 'OPTS_SET(log_opts, log_true_size, 0);' in libbpf's 'bpf_map_create()'.
* v5: https://lore.kernel.org/bpf/20260112145616.44195-1-leon.hwang@linux.dev/

v4 -> v5:
* Rework reporting 'log_true_size' for prog_load, btf_load, and map_create
  (per Alexei).
* v4: https://lore.kernel.org/bpf/20260106172018.57757-1-leon.hwang@linux.dev/

RFC v3 -> v4:
* Drop RFC.
* Address comments from Andrii:
  * Add parentheses in 'sys_bpf_ext()'.
  * Avoid creating new fd in 'probe_sys_bpf_ext()'.
  * Add a new struct to wrap log fields in libbpf.
* Address comments from Alexei:
  * Do not skip writing to user space when log_true_size is zero.
  * Do not use 'bool' arguments.
  * Drop the added WARN_ON_ONCE()s.
* v3: https://lore.kernel.org/bpf/20251002154841.99348-1-leon.hwang@linux.dev/

RFC v2 -> RFC v3:
* Rename probe_sys_bpf_extended to probe_sys_bpf_ext.
* Refactor reporting 'log_true_size' for prog_load.
* Refactor reporting 'btf_log_true_size' for btf_load.
* Add warnings for internal bugs in map_create.
* Check log_true_size in test cases.
* Address comment from Alexei:
  * Change kvzalloc/kvfree to kzalloc/kfree.
* Address comments from Andrii:
  * Move BPF_COMMON_ATTRS to 'enum bpf_cmd' alongside brief comment.
  * Add bpf_check_uarg_tail_zero() for extra checks.
  * Rename sys_bpf_extended to sys_bpf_ext.
  * Rename sys_bpf_fd_extended to sys_bpf_ext_fd.
  * Probe the new feature using NULL and -EFAULT.
  * Move probe_sys_bpf_ext to libbpf_internal.h and drop LIBBPF_API.
  * Return -EUSERS when log attrs conflict between bpf_attr and
    bpf_common_attr.
  * Avoid touching bpf_vlog_init().
  * Update the reason messages in map_create.
  * Finalize the log using __cleanup().
  * Report log size to users.
  * Change type of log_buf from '__u64' to 'const char *' and cast type
    using ptr_to_u64() in bpf_map_create().
  * Do not return -EOPNOTSUPP when kernel doesn't support this feature
    in bpf_map_create().
  * Add log_level support for map creation for consistency.
* Address comment from Eduard:
  * Use common_attrs->log_level instead of BPF_LOG_FIXED.
* v2: https://lore.kernel.org/bpf/20250911163328.93490-1-leon.hwang@linux.dev/

RFC v1 -> RFC v2:
* Fix build error reported by test bot.
* Address comments from Alexei:
  * Drop new uapi for freplace.
  * Add common attributes support for prog_load and btf_load.
  * Add common attributes support for map_create.
* v1: https://lore.kernel.org/bpf/20250728142346.95681-1-leon.hwang@linux.dev/

Leon Hwang (9):
  bpf: Extend BPF syscall with common attributes support
  libbpf: Add support for extended bpf syscall
  bpf: Refactor reporting log_true_size for prog_load
  bpf: Add syscall common attributes support for prog_load
  bpf: Refactor reporting btf_log_true_size for btf_load
  bpf: Add syscall common attributes support for btf_load
  bpf: Add syscall common attributes support for map_create
  libbpf: Add common attr support for map_create
  selftests/bpf: Add tests to verify map create failure log

 include/linux/bpf.h                           |  19 +-
 include/linux/bpf_verifier.h                  |  17 ++
 include/linux/btf.h                           |   3 +-
 include/linux/syscalls.h                      |   3 +-
 include/uapi/linux/bpf.h                      |   8 +
 kernel/bpf/btf.c                              |  32 +---
 kernel/bpf/log.c                              | 103 +++++++++++
 kernel/bpf/syscall.c                          | 122 ++++++++++---
 kernel/bpf/verifier.c                         |  19 +-
 tools/include/uapi/linux/bpf.h                |   8 +
 tools/lib/bpf/bpf.c                           |  52 +++++-
 tools/lib/bpf/bpf.h                           |  17 +-
 tools/lib/bpf/features.c                      |   8 +
 tools/lib/bpf/libbpf_internal.h               |   3 +
 .../selftests/bpf/prog_tests/map_init.c       | 168 ++++++++++++++++++
 15 files changed, 521 insertions(+), 61 deletions(-)

--
2.52.0



* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: The 8472 @ 2026-01-26 13:53 UTC (permalink / raw)
  To: Jan Kara, Zack Weinberg
  Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre,
	Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
	GNU libc development
In-Reply-To: <whaocgx6bopndbpag2wazn2ko4skxl4pe6owbavj3wblxjps4s@ntdfvzwggxv3>

On 26/01/2026 13:15, Jan Kara wrote:
> On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
>> On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
>>
>>>>       [QUERY: Do delayed errors ever happen in any of these situations?
>>>>
>>>>          - The fd is not the last reference to the open file description
>>>>
>>>>          - The OFD was opened with O_RDONLY
>>>>
>>>>          - The OFD was opened with O_RDWR but has never actually
>>>>            been written to
>>>>
>>>>          - No data has been written to the OFD since the last call to
>>>>            fsync() for that OFD
>>>>
>>>>          - No data has been written to the OFD since the last call to
>>>>            fdatasync() for that OFD
>>>>
>>>>          If we can give some guidance about when people don’t need to
>>>>          worry about delayed errors, it would be helpful.]
>>
>> In particular, I really hope delayed errors *aren’t* ever reported
>> when you close a file descriptor that *isn’t* the last reference
>> to its open file description, because the thread-safe way to close
>> stdout without losing write errors[2] depends on that not happening.
> 
> So I've checked and in Linux the ->flush callback for the file is called
> whenever you close a file descriptor (regardless of whether there are other
> file descriptors pointing to the same file description) so it's up to
> the filesystem implementation what it decides to do and which error it will
> return... Checking the implementations, e.g. FUSE and NFS *will* return
> delayed writeback errors on *first* descriptor close even if there are
> other still open descriptors for the description AFAICS.
Regarding the "first", does that mean the errors only get delivered once?
I.e. if a concurrent fork/exec happens for process spawning and the fork-child
closes the file descriptors, then that close may receive the errors
and the parent will not see them (unless additional errors happen)?
Or if _any_ part of the program dups the descriptor and then closes it without
reporting errors, then must all users of that descriptor consider error delivery
on close to be unreliable?


* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Jan Kara @ 2026-01-26 12:15 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre,
	Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
	linux-api, GNU libc development
In-Reply-To: <de07d292-99d8-44e8-b7d6-c491ac5fe5be@app.fastmail.com>

On Sun 25-01-26 10:37:01, Zack Weinberg wrote:
> On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> >>     Delayed errors reported by close()
> >>
> >>         In a variety of situations, most notably when writing to a file
> >>         that is hosted on a network file server, write(2) operations may
> >>         “optimistically” return successfully as soon as the write has
> >>         been queued for processing.
> >>
> >>         close(2) waits for confirmation that *most* of the processing
> >>         for previous writes to a file has been completed, and reports
> >>         any errors that the earlier write() calls *would have* reported,
> >>         if they hadn’t returned optimistically.  Especially, close()
> >>         will report “disk full” (ENOSPC) and “disk quota exceeded”
> >>         (EDQUOT) errors that write() didn’t wait for.
> >
> > The Rust standard library team is also interested in this topic, there
> > is lively discussion[1] whether it makes sense to surface errors from
> > close at all. Our current default is to ignore them.
> > It is my understanding that errors may not have happened yet at
> > the time of close due to delayed writeback or additional descriptors
> > pointing to the description, e.g. in a forked child, and thus
> > close() is not a reliable mechanism for error detection and
> > fsync() is the only available option.
> >
> > [1] https://github.com/rust-lang/libs-team/issues/705
> 
> This is something I care about a lot as well, but I currently don’t
> have an *opinion*.  To form an informed opinion, I need the answers
> to these questions:
> 
> >>      [QUERY: Do delayed errors ever happen in any of these situations?
> >>
> >>         - The fd is not the last reference to the open file description
> >>
> >>         - The OFD was opened with O_RDONLY
> >>
> >>         - The OFD was opened with O_RDWR but has never actually
> >>           been written to
> >>
> >>         - No data has been written to the OFD since the last call to
> >>           fsync() for that OFD
> >>
> >>         - No data has been written to the OFD since the last call to
> >>           fdatasync() for that OFD
> >>
> >>         If we can give some guidance about when people don’t need to
> >>         worry about delayed errors, it would be helpful.]
> 
> In particular, I really hope delayed errors *aren’t* ever reported
> when you close a file descriptor that *isn’t* the last reference
> to its open file description, because the thread-safe way to close
> stdout without losing write errors[2] depends on that not happening.

So I've checked and in Linux the ->flush callback for the file is called
whenever you close a file descriptor (regardless of whether there are other
file descriptors pointing to the same file description) so it's up to
the filesystem implementation what it decides to do and which error it will
return... Checking the implementations, e.g. FUSE and NFS *will* return
delayed writeback errors on *first* descriptor close even if there are
other still open descriptors for the description AFAICS.

> And whether the Rust stdlib can legitimately say “leaving aside the
> additional cost of calling fsync(), you do not *need* the error return
> from close() because you can call fsync() first,” depends on whether
> it’s actually true that you *won’t* ever get a delayed error from
> close() if you called fsync() first and didn’t do any more output in
> between (assume the fd has no duplicates here).  I would not be
> surprised at all if those FUSE guys insisted on their right to make
> 
>     char msg[] = "soon I will be invincible\n";
>     int fd = open("/test-fuse-fs/test.txt", O_WRONLY, 0666);
>     write(fd, msg, sizeof(msg) - 1);
>     fsync(fd);
>     close(fd);
> 
> return an error *only* from the close, not the write or the fsync.

So fsync(2) must make sure data is persistently stored and return an error if
it was not. Thus, as a VFS person, I'd consider it a filesystem bug if an
error preventing the data from being read later was not returned from fsync(2).
OTOH that doesn't necessarily mean that a later close() doesn't return an
error: e.g. FUSE communicates with the server on close, that request can fail,
and the error can be returned.

With this in mind let me now try to answer your remaining questions:

> >>         - The OFD was opened with O_RDONLY

If the filesystem supports atime, close can in principle report that the atime
update failed.

> >>         - The OFD was opened with O_RDWR but has never actually
> >>           been written to

The same as above but with inode mtime updates.

> >>         - No data has been written to the OFD since the last call to
> >>           fsync() for that OFD

No writeback errors should happen in this case. As I wrote above I'd
consider this a filesystem bug.

> >>
> >>         - No data has been written to the OFD since the last call to
> >>           fdatasync() for that OFD

Errors can happen because some inode metadata (in practice probably only
inode time stamps) may still need to be written out.

So in the cases described above (except for fsync()) you may get delayed
errors on close. But since in all those cases no data is lost, I don't
think 99.9% of applications care at all...
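
Read together, the answers above suggest a pattern for code that does care: call fsync() first and treat its result as the data-integrity signal, then call close() exactly once and record its result separately. A minimal sketch follows; write_all(), sync_and_close() and struct flush_result are hypothetical names, and this reflects one reading of the thread, not a guarantee made by any particular filesystem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct flush_result {
	int fsync_errno;	/* 0 if fsync() succeeded */
	int close_errno;	/* 0 if close() succeeded */
};

/* Write the whole buffer, retrying on short writes and EINTR. Returns 0 or -errno. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: fsync() first, then close() exactly once.
 * The fd is gone after this call regardless of the result, so the
 * caller must not retry close().
 */
static struct flush_result sync_and_close(int fd)
{
	struct flush_result r = { 0, 0 };

	if (fsync(fd) < 0)
		r.fsync_errno = errno;
	if (close(fd) < 0)
		r.close_errno = errno;
	return r;
}
```

Keeping the two errno values apart lets the caller report a failed fsync() as possible data loss, and a close() failure after a successful fsync() as something less severe, such as a failed timestamp update or a failed request to a FUSE server.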

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Florian Weimer @ 2026-01-26  8:51 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: The 8472, Rich Felker, Alejandro Colomar, Vincent Lefevre,
	Jan Kara, Alexander Viro, Christian Brauner, linux-fsdevel,
	linux-api, GNU libc development
In-Reply-To: <de07d292-99d8-44e8-b7d6-c491ac5fe5be@app.fastmail.com>

* Zack Weinberg:

> In particular, I really hope delayed errors *aren’t* ever reported
> when you close a file descriptor that *isn’t* the last reference
> to its open file description, because the thread-safe way to close
> stdout without losing write errors[2] depends on that not happening.

> [2] https://stackoverflow.com/a/50865617 (third code block)

Are you sure about that?  It means that errors are never reported if a
shell script redirects standard output over multiple commands.

Thanks,
Florian



* Re: [PATCH bpf-next v7 2/9] libbpf: Add support for extended bpf syscall
From: Leon Hwang @ 2026-01-26  2:03 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Willem de Bruijn, Jason Xing, Tao Chen,
	Mykyta Yatsenko, Kumar Kartikeya Dwivedi, Anton Protopopov,
	Amery Hung, Rong Tao, LKML, Linux API,
	open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
In-Reply-To: <CAEf4BzYhhf7Jd6DDr2XVf=3gKeMMmrkWW9Sr49QxuW6QudSKig@mail.gmail.com>



On 24/1/26 02:52, Andrii Nakryiko wrote:
> On Thu, Jan 22, 2026 at 8:19 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 23/1/26 12:12, Alexei Starovoitov wrote:
>>> On Thu, Jan 22, 2026 at 8:07 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>>
>>>>
>>>>
>>>> On 23/1/26 11:55, Alexei Starovoitov wrote:
>>>>> On Thu, Jan 22, 2026 at 7:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>>>>
>>>>>>
>>>>>> +static int probe_bpf_syscall_common_attrs(int token_fd)
>>>>>> +{
>>>>>> +       int ret;
>>>>>> +
>>>>>> +       ret = probe_sys_bpf_ext();
>>>>>> +       return ret > 0;
>>>>>> +}
>>>>>
>>>>> When you look at the above, what thoughts come to mind?
>>>>>
>>>>> ... and please don't use ai for answers.
>>>>
>>>> My initial thought was whether probe_fd() is needed here to handle and
>>>> close a returned fd, since the return value of probe_sys_bpf_ext() isn’t
>>>> obvious from the call site.
> 
> Have you looked at how probes are called (in feat_supported())? They
> all follow the same contract: > 0 (normally just 1) means the feature is
> supported, 0 means the feature is not supported, and < 0 means something
> went wrong. In that case libbpf will log an error and will assume the
> feature is not supported.
> 

I’ve looked at feat_supported().

Even though I was aware of the probe contract, I should have thought it
through more carefully in the context of feat_supported() and
probe_sys_bpf_ext(). With that in mind, your suggestion makes sense now.

> probe_sys_bpf_ext() should either follow that convention or drop the
> probe_ prefix altogether to avoid confusion. And then
> probe_bpf_syscall_common_attrs() is necessary only as a wrapper around
> probe_sys_bpf_ext() to ignore mandatory (but unused) token_fd argument
> (so to make it "pluggable" into feat_supported() framework).
> 
> So, just make probe_sys_bpf_ext() follow probe contract as described,
> and then just:
> 
> static int probe_bpf_syscall_common_attrs(int token_fd)
> {
>     return probe_sys_bpf_ext();
> }
> 

I’ll make probe_sys_bpf_ext() follow the standard probe convention, and
keep probe_bpf_syscall_common_attrs() as a thin wrapper to ignore the
mandatory (but unused) token_fd argument, so it plugs cleanly into the
feat_supported() framework.
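
For reference, the contract can be sketched as a small classifier over the probing syscall's outcome. This assumes the NULL-pointer/-EFAULT probing approach listed in the changelog; sketch_classify_probe() and sketch_feature_enabled() are hypothetical names, and the -EINVAL returned for an unexpected fd is an arbitrary choice for illustration, not necessarily what libbpf will use.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical classifier for the outcome of the probing syscall.
 * 'fd' is the syscall's return value and 'err' the errno seen when fd < 0.
 * Returns > 0 if supported, 0 if not supported, < 0 if the probe misbehaved.
 */
static int sketch_classify_probe(int fd, int err)
{
	if (fd >= 0)
		return -EINVAL;	/* the probe is never expected to create an fd */
	if (err == EFAULT)
		return 1;	/* kernel tried to read the NULL common attrs */
	return 0;		/* any other failure: treat as unsupported */
}

/* Hypothetical caller-side view: a probe error counts as "not supported". */
static int sketch_feature_enabled(int probe_ret)
{
	return probe_ret > 0;
}
```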

> Alternatively, just make probe_sys_bpf_ext() take token_fd (but ignore
> it), and just use probe_sys_bpf_ext() directly for feat_supported().
> 
> 
> probe_fd() is not suitable here because it's for a common case when we
> expect syscall to succeed and create fd, in which case that successful
> fd represents successful feature detection. This is not the case here,
> so probe_fd() is not what you should use.
> 

Agreed as well that probe_fd() is not suitable here, since this probe is
not expected to return a successful FD.

Thanks for the detailed explanation.

Thanks,
Leon

>>>
>>> Fair enough, but then collapse it into one helper if FD is a concern.
>>> My question was about stylistic/taste preferences.
>>
>> Understood, thanks for the clarification.
>>
>> I’ll rework it with the stylistic preference in mind.
>>
>> Thanks,
>> Leon
>>



* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Zack Weinberg @ 2026-01-25 15:37 UTC (permalink / raw)
  To: The 8472, Rich Felker
  Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
	Christian Brauner, linux-fsdevel, linux-api, GNU libc development
In-Reply-To: <7654b75b-6697-4aad-93fc-29fa9b734bdb@infinite-source.de>

On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> On 24/01/2026 22:39, Rich Felker wrote:
>> On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
>>> On 23/01/2026 01:33, Zack Weinberg wrote:
>>>
>>> [...]
>>>
>>>> ERRORS
>>>>          EBADF  The fd argument was not a valid, open file descriptor.
>>>
>>> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
>>> on close[0], that makes it more difficult to reliably detect bugs relating
>>> to double-closes of file descriptors.
>>
>> Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
>> it?
>
> Not when I brought it up last time, no[0]
>
> [0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/

It seems to me that Antonio Muscemi’s point is valid for *most* errno
codes.  Like, a whole lot of them exist just to give more information
*to a human user* about the cause of an unrecoverable error.  Take
the list of “error codes that indicate a delayed error from a previous
write(2) operation,” from a little later in the draft, for instance:
there’s no plausible way for a *program* to react differently to
EFBIG, EDQUOT, and ENOSPC, but we expect that the *user* will want
to react differently, so we want different error messages for each,
so they’re different error codes.  It’s not a problem if the kernel
produces an error code of this type that wasn’t in the official
documented list, because the program doesn’t need to treat it specially.

But EBADF is different; it has the very specific meaning “user space
passed an invalid file descriptor to a system call,” which almost
always indicates a *bug in the program*, and allowing that meaning to
be diluted is not OK.  It’s getting off topic for this conversation,
but there’s a short list of other errno codes that indicate a specific
situation that the *program* should respond to in a specific way
(EAGAIN, EINTR, EINPROGRESS, EFAULT, and EPIPE are the only ones
I can think of) and maybe it would spark a more constructive
conversation on the kernel side if we presented a *comprehensive*
list of errno codes that FUSE servers shouldn’t be allowed to produce
with a specific rationale for each.

>>     Delayed errors reported by close()
>>
>>         In a variety of situations, most notably when writing to a file
>>         that is hosted on a network file server, write(2) operations may
>>         “optimistically” return successfully as soon as the write has
>>         been queued for processing.
>>
>>         close(2) waits for confirmation that *most* of the processing
>>         for previous writes to a file has been completed, and reports
>>         any errors that the earlier write() calls *would have* reported,
>>         if they hadn’t returned optimistically.  Especially, close()
>>         will report “disk full” (ENOSPC) and “disk quota exceeded”
>>         (EDQUOT) errors that write() didn’t wait for.
>
> The Rust standard library team is also interested in this topic, there
> is lively discussion[1] whether it makes sense to surface errors from
> close at all. Our current default is to ignore them.
> It is my understanding that errors may not have happened yet at
> the time of close due to delayed writeback or additional descriptors
> pointing to the description, e.g. in a forked child, and thus
> close() is not a reliable mechanism for error detection and
> fsync() is the only available option.
>
> [1] https://github.com/rust-lang/libs-team/issues/705

This is something I care about a lot as well, but I currently don’t
have an *opinion*.  To form an informed opinion, I need the answers
to these questions:

>>      [QUERY: Do delayed errors ever happen in any of these situations?
>>
>>         - The fd is not the last reference to the open file description
>>
>>         - The OFD was opened with O_RDONLY
>>
>>         - The OFD was opened with O_RDWR but has never actually
>>           been written to
>>
>>         - No data has been written to the OFD since the last call to
>>           fsync() for that OFD
>>
>>         - No data has been written to the OFD since the last call to
>>           fdatasync() for that OFD
>>
>>         If we can give some guidance about when people don’t need to
>>         worry about delayed errors, it would be helpful.]

In particular, I really hope delayed errors *aren’t* ever reported
when you close a file descriptor that *isn’t* the last reference
to its open file description, because the thread-safe way to close
stdout without losing write errors[2] depends on that not happening.

And whether the Rust stdlib can legitimately say “leaving aside the
additional cost of calling fsync(), you do not *need* the error return
from close() because you can call fsync() first,” depends on whether
it’s actually true that you *won’t* ever get a delayed error from
close() if you called fsync() first and didn’t do any more output in
between (assume the fd has no duplicates here).  I would not be
surprised at all if those FUSE guys insisted on their right to make

    char msg[] = "soon I will be invincible\n";
    int fd = open("/test-fuse-fs/test.txt", O_WRONLY, 0666);
    write(fd, msg, sizeof(msg) - 1);
    fsync(fd);
    close(fd);

return an error *only* from the close, not the write or the fsync.
And I also wouldn’t be surprised at all to find production NFS or
SMB servers that did that.

[2] https://stackoverflow.com/a/50865617 (third code block)

zw


* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: The 8472 @ 2026-01-24 21:57 UTC (permalink / raw)
  To: Rich Felker
  Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
	Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
	GNU libc development
In-Reply-To: <20260124213934.GI6263@brightrain.aerifal.cx>

On 24/01/2026 22:39, Rich Felker wrote:
> On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
>> On 23/01/2026 01:33, Zack Weinberg wrote:
>>
>> [...]
>>
>>> ERRORS
>>>          EBADF  The fd argument was not a valid, open file descriptor.
>>
>> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
>> on close[0], that makes it more difficult to reliably detect bugs relating
>> to double-closes of file descriptors.
> 
> Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
> it?

Not when I brought it up last time, no[0]

> I wonder if that could even have security implications. I think
> you could detect these fraudulent EBADFs (albeit not under conditions
> where there's a race bug) by performing fcntl/F_GETFD before close and
> knowing the EBADF from close is fake if fcntl didn't return EBADF, but that
> seems like an unreasonable cost to work around FUSE behaving badly.
> 
> Rich

That's pretty much the workaround[1] we use, but due to the extra syscall it's
only done in debug builds.
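
In C, the same idea could look roughly like the sketch below. It is not the code from [1]; sketch_checked_close() and enum close_verdict are hypothetical names, and as noted in the quoted text the check cannot help when another thread closes or reuses the descriptor between the two calls.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum close_verdict {
	CLOSE_OK,		/* close() returned 0 */
	CLOSE_BAD_FD,		/* fd was not open: likely a double close */
	CLOSE_FS_ERROR,		/* fd was open; the error came from the filesystem */
};

/*
 * Hypothetical debug helper: query the fd with fcntl(F_GETFD) first, so
 * that an EBADF from close() on an fd that was open can be attributed to
 * the filesystem instead of to a double close.
 */
static enum close_verdict sketch_checked_close(int fd, int *close_errno)
{
	int was_open = fcntl(fd, F_GETFD) != -1 || errno != EBADF;
	int ret = close(fd);

	*close_errno = ret < 0 ? errno : 0;
	if (ret == 0)
		return CLOSE_OK;
	if (!was_open)
		return CLOSE_BAD_FD;
	return CLOSE_FS_ERROR;
}
```

The extra fcntl() call is the cost mentioned above, which is why a check of this kind is more plausible in debug builds than on every close.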

[0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/
[1] https://github.com/rust-lang/rust/blob/021fc25b7a48f6051bee1e1f06c7a277e4de1cc9/library/std/src/sys/fs/unix.rs#L981-L999


* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Rich Felker @ 2026-01-24 21:39 UTC (permalink / raw)
  To: The 8472
  Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
	Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
	GNU libc development
In-Reply-To: <0f60995f-370f-4c2d-aaa6-731716657f9d@infinite-source.de>

On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
> On 23/01/2026 01:33, Zack Weinberg wrote:
> 
> [...]
> 
> > ERRORS
> >         EBADF  The fd argument was not a valid, open file descriptor.
> 
> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
> on close[0], that makes it more difficult to reliably detect bugs relating
> to double-closes of file descriptors.

Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
it? I wonder if that could even have security implications. I think
you could detect these fraudulent EBADFs (albeit not under conditions
where there's a race bug) by performing fcntl/F_GETFD before close and
knowing the EBADF from close is fake if fcntl didn't return EBADF, but that
seems like an unreasonable cost to work around FUSE behaving badly.

Rich


* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: The 8472 @ 2026-01-24 19:34 UTC (permalink / raw)
  To: Zack Weinberg, Alejandro Colomar
  Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
	Rich Felker, linux-fsdevel, linux-api, GNU libc development
In-Reply-To: <1ec25e49-841e-4b04-911d-66e3b9ff4471@app.fastmail.com>

On 23/01/2026 01:33, Zack Weinberg wrote:

[...]

> ERRORS
>         EBADF  The fd argument was not a valid, open file descriptor.

Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
on close[0], that makes it more difficult to reliably detect bugs relating
to double-closes of file descriptors.

[...]

>     Delayed errors reported by close()
> 
>         In a variety of situations, most notably when writing to a file
>         that is hosted on a network file server, write(2) operations may
>         “optimistically” return successfully as soon as the write has
>         been queued for processing.
> 
>         close(2) waits for confirmation that *most* of the processing
>         for previous writes to a file has been completed, and reports
>         any errors that the earlier write() calls *would have* reported,
>         if they hadn’t returned optimistically.  Especially, close()
>         will report “disk full” (ENOSPC) and “disk quota exceeded”
>         (EDQUOT) errors that write() didn’t wait for.
> 
>         (To wait for *all* processing to complete, it is necessary to
>         use fsync(2) as well.)
> 
>         Because of these delayed errors, it’s important to check the
>         return value of close() and handle any errors it reports.
>         Ignoring delayed errors can cause silent loss of data.
> 
>         However, when handling delayed errors, keep in mind that the
>         close() call should *not* be repeated.  When close() has a
>         delayed error to report, it still closes the file before
>         returning.  The file descriptor number might already have been
>         reused for some other file, especially in multithreaded
>         programs.  To make another attempt at the failed writes, it’s
>         necessary to reopen the file and start all over again.
> 
>      [QUERY: Do delayed errors ever happen in any of these situations?
> 
>         - The fd is not the last reference to the open file description
> 
>         - The OFD was opened with O_RDONLY
> 
>         - The OFD was opened with O_RDWR but has never actually
>           been written to
> 
>         - No data has been written to the OFD since the last call to
>           fsync() for that OFD
> 
>         - No data has been written to the OFD since the last call to
>           fdatasync() for that OFD
> 
>         If we can give some guidance about when people don’t need to
>         worry about delayed errors, it would be helpful.]
> 

The Rust standard library team is also interested in this topic; there
is a lively discussion[1] about whether it makes sense to surface errors
from close at all. Our current default is to ignore them.
It is my understanding that errors may not have happened yet at
the time of close due to delayed writeback or additional descriptors
pointing to the description, e.g. in a forked child, and thus
close() is not a reliable mechanism for error detection and
fsync() is the only available option.

Some users do care specifically about the unusual behavior
on NFS, and don't want to use a heavy hammer like fsync. It's unfortunate
that there's no middle ground to get errors on an open file descriptor
or initiate the NFS flush behavior without a full fsync.
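
For concreteness, the fsync-then-close pattern can be sketched in C
roughly as below. This is only an illustration under my own assumptions:
write_durably() is a made-up helper name, not an existing API, and it
pays the full fsync cost that some users want to avoid.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name invented for this sketch): write buf to path
 * and report errors from write(), fsync() and close().
 * Returns 0 on success or a negative errno value.
 */
static int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int err = 0;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -errno;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			err = -errno;
			break;
		}
		p += n;
		len -= (size_t)n;
	}

	/* fsync() waits for writeback, so delayed write errors surface here. */
	if (!err && fsync(fd) < 0)
		err = -errno;

	/*
	 * Call close() exactly once. Even when it reports an error the
	 * descriptor is already closed, so a retry could hit an unrelated
	 * file that reused the number.
	 */
	if (close(fd) < 0 && !err)
		err = -errno;

	return err;
}
```

The close() result is still checked, but only as a secondary signal; the
first error seen is the one returned.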


[0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/
[1] https://github.com/rust-lang/libs-team/issues/705


^ permalink raw reply

* Re: [PATCH 0/2] mount: add OPEN_TREE_NAMESPACE
From: Askar Safin @ 2026-01-24 10:13 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andy Lutomirski, Jeff Layton, amir73il, cyphar, jack, josef,
	linux-fsdevel, viro, Lennart Poettering, David Howells,
	Yunkai Zhang, cgel.zte, Menglong Dong, linux-kernel, initramfs,
	containers, linux-api, news, lwn, Jonathan Corbet, Rob Landley,
	Christoph Hellwig
In-Reply-To: <20260123-autofrei-einspannen-7e65a6100e6e@brauner>

On Fri, Jan 23, 2026 at 1:23 PM Christian Brauner <brauner@kernel.org> wrote:
> The current patchset makes nullfs unconditional. As each mount

Oops, I missed that "fs: use nullfs unconditionally as the real
rootfs" is present in vfs.all.

-- 
Askar Safin

^ permalink raw reply

* Re: [RESEND PATCH bpf-next v6 2/9] libbpf: Add support for extended bpf syscall
From: Andrii Nakryiko @ 2026-01-23 18:54 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Willem de Bruijn, Jason Xing, Tao Chen,
	Mykyta Yatsenko, Kumar Kartikeya Dwivedi, Anton Protopopov,
	Amery Hung, Rong Tao, linux-kernel, linux-api, linux-kselftest,
	kernel-patches-bot
In-Reply-To: <d8f37588-2b7d-447a-ae4f-dc81e1b573c5@linux.dev>

On Thu, Jan 22, 2026 at 5:41 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 23/1/26 08:53, Andrii Nakryiko wrote:
> > On Tue, Jan 20, 2026 at 7:26 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >> To support the extended BPF syscall introduced in the previous commit,
> >> introduce the following internal APIs:
> >>
> >> * 'sys_bpf_ext()'
> >> * 'sys_bpf_ext_fd()'
> >>   They wrap the raw 'syscall()' interface to support passing extended
> >>   attributes.
> >> * 'probe_sys_bpf_ext()'
> >>   Check whether current kernel supports the BPF syscall common attributes.
> >>
> >> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> >> ---
> >>  tools/lib/bpf/bpf.c             | 32 ++++++++++++++++++++++++++++++++
> >>  tools/lib/bpf/features.c        |  8 ++++++++
> >>  tools/lib/bpf/libbpf_internal.h |  3 +++
> >>  3 files changed, 43 insertions(+)
> >>
> >> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> >> index 21b57a629916..ed9c6eaeb656 100644
> >> --- a/tools/lib/bpf/bpf.c
> >> +++ b/tools/lib/bpf/bpf.c
> >> @@ -69,6 +69,38 @@ static inline __u64 ptr_to_u64(const void *ptr)
> >>         return (__u64) (unsigned long) ptr;
> >>  }
> >>
> >> +static inline int sys_bpf_ext(enum bpf_cmd cmd, union bpf_attr *attr,
> >> +                             unsigned int size,
> >> +                             struct bpf_common_attr *attr_common,
> >> +                             unsigned int size_common)
> >> +{
> >> +       cmd = attr_common ? (cmd | BPF_COMMON_ATTRS) : (cmd & ~BPF_COMMON_ATTRS);
> >> +       return syscall(__NR_bpf, cmd, attr, size, attr_common, size_common);
> >> +}
> >> +
> >> +static inline int sys_bpf_ext_fd(enum bpf_cmd cmd, union bpf_attr *attr,
> >> +                                unsigned int size,
> >> +                                struct bpf_common_attr *attr_common,
> >> +                                unsigned int size_common)
> >> +{
> >> +       int fd;
> >> +
> >> +       fd = sys_bpf_ext(cmd, attr, size, attr_common, size_common);
> >> +       return ensure_good_fd(fd);
> >> +}
> >> +
> >> +int probe_sys_bpf_ext(void)
> >> +{
> >> +       const size_t attr_sz = offsetofend(union bpf_attr, prog_token_fd);
> >> +       union bpf_attr attr;
> >> +
> >> +       memset(&attr, 0, attr_sz);
> >> +       /* This syscall() will return error always. */
> >
> > I'll cite myself from the last review:
> >
> >> But fd should really not be >= 0, and if it is -- it's some problem,
> >> so I'd return an error in that case to keep us aware, which is why I'm
> >> saying I'd just return inside if (fd >= 0) { }
> >
> > I didn't say let's just ignore syscall return with (void) cast and
> > happily check errno no matter what, did I? Drop the comment, and
> > handle fd >= 0 case explicitly, please.
> >
>
> My mistake — sorry for the misunderstanding.
>
> You’re right; the return value should not be ignored. In the next
> revision, I’ll handle the fd >= 0 case explicitly and drop the comment.
> The logic will be updated along the lines of:
>
> fd = syscall(__NR_bpf, BPF_PROG_LOAD | BPF_COMMON_ATTRS,
>              &attr, attr_sz, NULL, sizeof(struct bpf_common_attr));
> if (fd >= 0) {
>         close(fd);
>         return 0;
> }
> return errno == EFAULT;
>

Well, no, it should be:

fd = syscall(...);
if (fd >= 0) {
    close(fd);
    return -EINVAL;
}

return errno == EFAULT ? 1 : 0;
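
To spell out the three outcomes separately from the syscall itself, here
is a small standalone sketch. probe_result() is a hypothetical helper
name used only for illustration; the reading of EFAULT in the comment is
my assumption about why that errno signals support.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libbpf): map the outcome of the
 * probing syscall to a probe result. 'fd' is the syscall return value,
 * 'err' is the errno value observed right after the call.
 */
static int probe_result(int fd, int err)
{
	if (fd >= 0) {
		/* The probe is built to fail; success means something is off. */
		close(fd);
		return -EINVAL;
	}

	/*
	 * Assumption: EFAULT means the kernel tried to read the NULL
	 * common attrs pointer, i.e. it understands the extension.
	 */
	return err == EFAULT ? 1 : 0;
}
```

So probe_result(-1, EFAULT) yields 1, any other errno yields 0, and an
unexpectedly valid fd is closed and reported as -EINVAL.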

> Thanks,
> Leon
>
>

^ permalink raw reply

* Re: [PATCH bpf-next v7 2/9] libbpf: Add support for extended bpf syscall
From: Andrii Nakryiko @ 2026-01-23 18:52 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Christian Brauner, Seth Forshee, Yuichiro Tsuji,
	Andrey Albershteyn, Willem de Bruijn, Jason Xing, Tao Chen,
	Mykyta Yatsenko, Kumar Kartikeya Dwivedi, Anton Protopopov,
	Amery Hung, Rong Tao, LKML, Linux API,
	open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
In-Reply-To: <419976da-f296-4418-8dfe-8ad50a9f8cb5@linux.dev>

On Thu, Jan 22, 2026 at 8:19 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 23/1/26 12:12, Alexei Starovoitov wrote:
> > On Thu, Jan 22, 2026 at 8:07 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >>
> >>
> >> On 23/1/26 11:55, Alexei Starovoitov wrote:
> >>> On Thu, Jan 22, 2026 at 7:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>>>
> >>>>
> >>>> +static int probe_bpf_syscall_common_attrs(int token_fd)
> >>>> +{
> >>>> +       int ret;
> >>>> +
> >>>> +       ret = probe_sys_bpf_ext();
> >>>> +       return ret > 0;
> >>>> +}
> >>>
> >>> When you look at the above, what thoughts come to mind?
> >>>
> >>> ... and please don't use ai for answers.
> >>
> >> My initial thought was whether probe_fd() is needed here to handle and
> >> close a returned fd, since the return value of probe_sys_bpf_ext() isn’t
> >> obvious from the call site.

Have you looked at how probes are called (in feat_supported())? They
all follow the same contract: > 0 (normally just 1) means the feature is
supported, 0 means the feature is not supported, and < 0 means something
went wrong. In the last case libbpf will log an error and will assume the
feature is not supported.

probe_sys_bpf_ext() should either follow that convention or drop the
probe_ prefix altogether to avoid confusion. And then
probe_bpf_syscall_common_attrs() is necessary only as a wrapper around
probe_sys_bpf_ext() to ignore mandatory (but unused) token_fd argument
(so to make it "pluggable" into feat_supported() framework).

So, just make probe_sys_bpf_ext() follow probe contract as described,
and then just:

static int probe_bpf_syscall_common_attrs(int token_fd)
{
    return probe_sys_bpf_ext();
}

Alternatively, just make probe_sys_bpf_ext() take token_fd (but ignore
it), and just use probe_sys_bpf_ext() directly for feat_supported().


probe_fd() is not suitable here because it's for a common case when we
expect syscall to succeed and create fd, in which case that successful
fd represents successful feature detection. This is not the case here,
so probe_fd() is not what you should use.
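
As a rough standalone illustration of that contract (this is not the
actual libbpf code; feature_supported() and the stub_* probes are names
invented for this sketch):

```c
#include <stdbool.h>
#include <stdio.h>

/* Probe contract: > 0 supported, 0 not supported, < 0 probe failed. */
typedef int (*probe_fn)(int token_fd);

/* Hypothetical stand-in for the caller side of the contract. */
static bool feature_supported(probe_fn probe, int token_fd)
{
	int ret = probe(token_fd);

	if (ret < 0) {
		/* Log the failure and treat the feature as unsupported. */
		fprintf(stderr, "probe failed: %d, assuming unsupported\n", ret);
		return false;
	}
	return ret > 0;
}

/* Stub replacing the real syscall-based probe; always says "supported". */
static int stub_probe_sys_bpf_ext(void)
{
	return 1;
}

/* Thin wrapper: token_fd is mandatory in the signature but unused. */
static int stub_probe_common_attrs(int token_fd)
{
	(void)token_fd;
	return stub_probe_sys_bpf_ext();
}

/* Stubs for the other two outcomes of the contract. */
static int stub_probe_unsupported(int token_fd)
{
	(void)token_fd;
	return 0;
}

static int stub_probe_broken(int token_fd)
{
	(void)token_fd;
	return -22; /* e.g. -EINVAL */
}
```

With the probe itself honoring the contract, the wrapper has nothing to
translate and only adapts the signature.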

> >
> > Fair enough, but then collapse it into one helper if FD is a concern.
> > My question was about stylistic/taste preferences.
>
> Understood, thanks for the clarification.
>
> I’ll rework it with the stylistic preference in mind.
>
> Thanks,
> Leon
>

^ permalink raw reply

* Re: [PATCH v7 07/16] ext4: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-01-23 15:49 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
	linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
	Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko, glaubitz,
	frank.li, Theodore Tso, adilger.kernel, Carlos Maiolino,
	Steve French, Paulo Alcantara, Ronnie Sahlberg, Shyam Prasad N,
	Trond Myklebust, Anna Schumaker, Jaegeuk Kim, Chao Yu,
	Hans de Goede, senozhatsky, Chuck Lever
In-Reply-To: <20260123002904.GM5945@frogsfrogsfrogs>



On Thu, Jan 22, 2026, at 7:29 PM, Darrick J. Wong wrote:
> On Thu, Jan 22, 2026 at 11:03:02AM -0500, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> 
>> Report ext4's case sensitivity behavior via the FS_XFLAG_CASEFOLD
>> flag. ext4 always preserves case at rest.
>> 
>> Case sensitivity is a per-directory setting in ext4. If the queried
>> inode is a casefolded directory, report case-insensitive; otherwise
>> report case-sensitive (standard POSIX behavior).
>> 
>> Reviewed-by: Jan Kara <jack@suse.cz>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  fs/ext4/ioctl.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>> 
>> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
>> index 7ce0fc40aec2..462da7aadc80 100644
>> --- a/fs/ext4/ioctl.c
>> +++ b/fs/ext4/ioctl.c
>> @@ -996,6 +996,13 @@ int ext4_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
>>  	if (ext4_has_feature_project(inode->i_sb))
>>  		fa->fsx_projid = from_kprojid(&init_user_ns, ei->i_projid);
>>  
>> +	/*
>> +	 * Case folding is a directory attribute in ext4. Set FS_XFLAG_CASEFOLD
>> +	 * for directories with the casefold attribute; all other inodes use
>> +	 * standard case-sensitive semantics.
>> +	 */
>> +	if (IS_CASEFOLDED(inode))
>> +		fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
>
> Curious.  Shouldn't the VFS set FS_XFLAG_CASEFOLD if the VFS casefolding
> flag is set?
>
> OTOH, there are more filesystems that apparently support casefolding
> (given the size of this patchset) than actually set S_CASEFOLD.  I think
> I'm ignorant of something here...

I'm not clear if there's a review action needed. Help?


-- 
Chuck Lever

^ permalink raw reply

