* [PATCH bpf-next v7 2/9] libbpf: Add support for extended bpf syscall
From: Leon Hwang @ 2026-01-23 3:24 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
linux-kselftest, kernel-patches-bot
In-Reply-To: <20260123032445.125259-1-leon.hwang@linux.dev>
To support the extended BPF syscall introduced in the previous commit,
introduce the following internal APIs:
* 'sys_bpf_ext()'
* 'sys_bpf_ext_fd()'
They wrap the raw 'syscall()' interface to support passing extended
attributes.
* 'probe_sys_bpf_ext()'
Check whether current kernel supports the BPF syscall common attributes.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
tools/lib/bpf/bpf.c | 36 +++++++++++++++++++++++++++++++++
tools/lib/bpf/features.c | 11 ++++++++++
tools/lib/bpf/libbpf_internal.h | 3 +++
3 files changed, 50 insertions(+)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 21b57a629916..fc87552b1378 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -69,6 +69,42 @@ static inline __u64 ptr_to_u64(const void *ptr)
return (__u64) (unsigned long) ptr;
}
+static inline int sys_bpf_ext(enum bpf_cmd cmd, union bpf_attr *attr,
+ unsigned int size,
+ struct bpf_common_attr *attr_common,
+ unsigned int size_common)
+{
+ cmd = attr_common ? (cmd | BPF_COMMON_ATTRS) : (cmd & ~BPF_COMMON_ATTRS);
+ return syscall(__NR_bpf, cmd, attr, size, attr_common, size_common);
+}
+
+static inline int sys_bpf_ext_fd(enum bpf_cmd cmd, union bpf_attr *attr,
+ unsigned int size,
+ struct bpf_common_attr *attr_common,
+ unsigned int size_common)
+{
+ int fd;
+
+ fd = sys_bpf_ext(cmd, attr, size, attr_common, size_common);
+ return ensure_good_fd(fd);
+}
+
+int probe_sys_bpf_ext(void)
+{
+ const size_t attr_sz = offsetofend(union bpf_attr, prog_token_fd);
+ union bpf_attr attr;
+ int fd;
+
+ memset(&attr, 0, attr_sz);
+ fd = syscall(__NR_bpf, BPF_PROG_LOAD | BPF_COMMON_ATTRS, &attr, attr_sz, NULL,
+ sizeof(struct bpf_common_attr));
+ if (fd >= 0) {
+ close(fd);
+ return -EINVAL;
+ }
+ return errno == EFAULT ? 1 : -errno;
+}
+
static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
unsigned int size)
{
diff --git a/tools/lib/bpf/features.c b/tools/lib/bpf/features.c
index b842b83e2480..e82c2afead43 100644
--- a/tools/lib/bpf/features.c
+++ b/tools/lib/bpf/features.c
@@ -506,6 +506,14 @@ static int probe_kern_arg_ctx_tag(int token_fd)
return probe_fd(prog_fd);
}
+static int probe_bpf_syscall_common_attrs(int token_fd)
+{
+ int ret;
+
+ ret = probe_sys_bpf_ext();
+ return ret > 0;
+}
+
typedef int (*feature_probe_fn)(int /* token_fd */);
static struct kern_feature_cache feature_cache;
@@ -581,6 +589,9 @@ static struct kern_feature_desc {
[FEAT_BTF_QMARK_DATASEC] = {
"BTF DATASEC names starting from '?'", probe_kern_btf_qmark_datasec,
},
+ [FEAT_BPF_SYSCALL_COMMON_ATTRS] = {
+ "BPF syscall common attributes support", probe_bpf_syscall_common_attrs,
+ },
};
bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id)
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index fc59b21b51b5..aa16be869c4f 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -392,6 +392,8 @@ enum kern_feature_id {
FEAT_ARG_CTX_TAG,
/* Kernel supports '?' at the front of datasec names */
FEAT_BTF_QMARK_DATASEC,
+ /* Kernel supports BPF syscall common attributes */
+ FEAT_BPF_SYSCALL_COMMON_ATTRS,
__FEAT_CNT,
};
@@ -757,4 +759,5 @@ int probe_fd(int fd);
#define SHA256_DWORD_SIZE SHA256_DIGEST_LENGTH / sizeof(__u64)
void libbpf_sha256(const void *data, size_t len, __u8 out[SHA256_DIGEST_LENGTH]);
+int probe_sys_bpf_ext(void);
#endif /* __LIBBPF_LIBBPF_INTERNAL_H */
--
2.52.0
^ permalink raw reply related
* [PATCH bpf-next v7 3/9] bpf: Refactor reporting log_true_size for prog_load
From: Leon Hwang @ 2026-01-23 3:24 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
linux-kselftest, kernel-patches-bot
In-Reply-To: <20260123032445.125259-1-leon.hwang@linux.dev>
The next commit will add support for reporting logs via extended common
attributes, including 'log_true_size'.
To prepare for that, refactor the 'log_true_size' reporting logic by
introducing a new struct bpf_log_attr to encapsulate log-related behavior:
* bpf_prog_load_log_attr_init(): initialize the log fields, which will
support extended common attributes in the next commit.
* bpf_log_attr_finalize(): handle log finalization and write back
'log_true_size' to userspace.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/linux/bpf.h | 19 ++++++++++++++++-
include/linux/bpf_verifier.h | 11 ++++++++++
kernel/bpf/log.c | 40 ++++++++++++++++++++++++++++++++++++
kernel/bpf/syscall.c | 9 +++++++-
kernel/bpf/verifier.c | 19 ++++++-----------
5 files changed, 83 insertions(+), 15 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5936f8e2996f..3a525a7e8747 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2867,8 +2867,25 @@ int bpf_get_file_flag(int flags);
int bpf_check_uarg_tail_zero(bpfptr_t uaddr, size_t expected_size,
size_t actual_size);
+struct bpf_attrs {
+ const void *attr;
+ bpfptr_t uattr;
+ u32 size;
+};
+
+static inline void bpf_attrs_init(struct bpf_attrs *attrs, const void *attr, bpfptr_t uattr,
+ u32 size)
+{
+ memset(attrs, 0, sizeof(*attrs));
+ attrs->attr = attr;
+ attrs->uattr = uattr;
+ attrs->size = size;
+}
+
/* verify correctness of eBPF program */
-int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size);
+struct bpf_log_attr;
+int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, bpfptr_t uattr,
+ struct bpf_log_attr *log_attr);
#ifndef CONFIG_BPF_JIT_ALWAYS_ON
void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 8355b585cd18..4a0c5ef296b9 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -631,6 +631,17 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
return log && log->level;
}
+struct bpf_log_attr {
+ u64 log_buf;
+ u32 log_size;
+ u32 log_level;
+ struct bpf_attrs *attrs;
+ u32 offsetof_log_true_size;
+};
+
+int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs);
+int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log);
+
#define BPF_MAX_SUBPROGS 256
struct bpf_subprog_arg_info {
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index a0c3b35de2ce..457b724c4176 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -863,3 +863,43 @@ void print_insn_state(struct bpf_verifier_env *env, const struct bpf_verifier_st
}
print_verifier_state(env, vstate, frameno, false);
}
+
+static int bpf_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs, u64 log_buf,
+ u32 log_size, u32 log_level, int offsetof_log_true_size)
+{
+ memset(log_attr, 0, sizeof(*log_attr));
+ log_attr->log_buf = log_buf;
+ log_attr->log_size = log_size;
+ log_attr->log_level = log_level;
+ log_attr->attrs = attrs;
+ log_attr->offsetof_log_true_size = offsetof_log_true_size;
+ return 0;
+}
+
+int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs)
+{
+ const union bpf_attr *attr = attrs->attr;
+
+ return bpf_log_attr_init(log_attr, attrs, attr->log_buf, attr->log_size, attr->log_level,
+ offsetof(union bpf_attr, log_true_size));
+}
+
+int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log)
+{
+ u32 log_true_size, off;
+ size_t size;
+ int err;
+
+ if (!log)
+ return 0;
+
+ err = bpf_vlog_finalize(log, &log_true_size);
+
+ size = sizeof(log_true_size);
+ off = log_attr->offsetof_log_true_size;
+ if (log_attr->attrs && log_attr->attrs->size >= off + size &&
+ copy_to_bpfptr_offset(log_attr->attrs->uattr, off, &log_true_size, size))
+ err = -EFAULT;
+
+ return err;
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3564b5bf3689..8468baf545c5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2871,6 +2871,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
struct bpf_prog *prog, *dst_prog = NULL;
struct btf *attach_btf = NULL;
struct bpf_token *token = NULL;
+ struct bpf_log_attr log_attr;
+ struct bpf_attrs attrs;
bool bpf_cap;
int err;
char license[128];
@@ -3082,8 +3084,13 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
if (err)
goto free_prog_sec;
+ bpf_attrs_init(&attrs, attr, uattr, uattr_size);
+ err = bpf_prog_load_log_attr_init(&log_attr, &attrs);
+ if (err < 0)
+ goto free_used_maps;
+
/* run eBPF verifier */
- err = bpf_check(&prog, attr, uattr, uattr_size);
+ err = bpf_check(&prog, attr, uattr, &log_attr);
if (err < 0)
goto free_used_maps;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c7f5234d5fd2..03d56d3d3f89 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -25593,12 +25593,12 @@ static int compute_scc(struct bpf_verifier_env *env)
return err;
}
-int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
+int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr,
+ struct bpf_log_attr *log_attr)
{
u64 start_time = ktime_get_ns();
struct bpf_verifier_env *env;
int i, len, ret = -EINVAL, err;
- u32 log_true_size;
bool is_priv;
BTF_TYPE_EMIT(enum bpf_features);
@@ -25645,9 +25645,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
/* user could have requested verbose verifier output
* and supplied buffer to store the verification trace
*/
- ret = bpf_vlog_init(&env->log, attr->log_level,
- (char __user *) (unsigned long) attr->log_buf,
- attr->log_size);
+ ret = bpf_vlog_init(&env->log, log_attr->log_level,
+ u64_to_user_ptr(log_attr->log_buf),
+ log_attr->log_size);
if (ret)
goto err_unlock;
@@ -25797,17 +25797,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
env->prog->aux->verified_insns = env->insn_processed;
/* preserve original error even if log finalization is successful */
- err = bpf_vlog_finalize(&env->log, &log_true_size);
+ err = bpf_log_attr_finalize(log_attr, &env->log);
if (err)
ret = err;
- if (uattr_size >= offsetofend(union bpf_attr, log_true_size) &&
- copy_to_bpfptr_offset(uattr, offsetof(union bpf_attr, log_true_size),
- &log_true_size, sizeof(log_true_size))) {
- ret = -EFAULT;
- goto err_release_maps;
- }
-
if (ret)
goto err_release_maps;
--
2.52.0
^ permalink raw reply related
* [PATCH bpf-next v7 4/9] bpf: Add syscall common attributes support for prog_load
From: Leon Hwang @ 2026-01-23 3:24 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
linux-kselftest, kernel-patches-bot
In-Reply-To: <20260123032445.125259-1-leon.hwang@linux.dev>
The log buffer of common attributes would be confusing with the one in
'union bpf_attr' for BPF_PROG_LOAD.
In order to clarify the usage of these two log buffers, they both can be
used for logging if:
* They are same, including 'log_buf', 'log_level' and 'log_size'.
* One of them is missing, then another one will be used for logging.
If they both have 'log_buf' but they are not same totally, return -EINVAL.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/linux/bpf_verifier.h | 4 +++-
kernel/bpf/log.c | 29 ++++++++++++++++++++++++++---
kernel/bpf/syscall.c | 9 ++++++---
3 files changed, 35 insertions(+), 7 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 4a0c5ef296b9..7eb024e83d2d 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -637,9 +637,11 @@ struct bpf_log_attr {
u32 log_level;
struct bpf_attrs *attrs;
u32 offsetof_log_true_size;
+ struct bpf_attrs *attrs_common;
};
-int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs);
+int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
+ struct bpf_attrs *attrs_common);
int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log);
#define BPF_MAX_SUBPROGS 256
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 457b724c4176..c0b816e84384 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -865,23 +865,41 @@ void print_insn_state(struct bpf_verifier_env *env, const struct bpf_verifier_st
}
static int bpf_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs, u64 log_buf,
- u32 log_size, u32 log_level, int offsetof_log_true_size)
+ u32 log_size, u32 log_level, int offsetof_log_true_size,
+ struct bpf_attrs *attrs_common)
{
+ const struct bpf_common_attr *common = attrs_common ? attrs_common->attr : NULL;
+
memset(log_attr, 0, sizeof(*log_attr));
log_attr->log_buf = log_buf;
log_attr->log_size = log_size;
log_attr->log_level = log_level;
log_attr->attrs = attrs;
log_attr->offsetof_log_true_size = offsetof_log_true_size;
+ log_attr->attrs_common = attrs_common;
+
+ if (log_buf && common && common->log_buf &&
+ (log_buf != common->log_buf ||
+ log_size != common->log_size ||
+ log_level != common->log_level))
+ return -EINVAL;
+
+ if (!log_buf && common && common->log_buf) {
+ log_attr->log_buf = common->log_buf;
+ log_attr->log_size = common->log_size;
+ log_attr->log_level = common->log_level;
+ }
+
return 0;
}
-int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs)
+int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
+ struct bpf_attrs *attrs_common)
{
const union bpf_attr *attr = attrs->attr;
return bpf_log_attr_init(log_attr, attrs, attr->log_buf, attr->log_size, attr->log_level,
- offsetof(union bpf_attr, log_true_size));
+ offsetof(union bpf_attr, log_true_size), attrs_common);
}
int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log)
@@ -901,5 +919,10 @@ int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log
copy_to_bpfptr_offset(log_attr->attrs->uattr, off, &log_true_size, size))
err = -EFAULT;
+ off = offsetof(struct bpf_common_attr, log_true_size);
+ if (log_attr->attrs_common && log_attr->attrs_common->size >= off + size &&
+ copy_to_bpfptr_offset(log_attr->attrs_common->uattr, off, &log_true_size, size))
+ err = -EFAULT;
+
return err;
}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8468baf545c5..8f3e489667d7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2865,7 +2865,8 @@ static int bpf_prog_mark_insn_arrays_ready(struct bpf_prog *prog)
/* last field in 'union bpf_attr' used by this command */
#define BPF_PROG_LOAD_LAST_FIELD keyring_id
-static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
+static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size,
+ struct bpf_attrs *attrs_common)
{
enum bpf_prog_type type = attr->prog_type;
struct bpf_prog *prog, *dst_prog = NULL;
@@ -3085,7 +3086,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
goto free_prog_sec;
bpf_attrs_init(&attrs, attr, uattr, uattr_size);
- err = bpf_prog_load_log_attr_init(&log_attr, &attrs);
+ err = bpf_prog_load_log_attr_init(&log_attr, &attrs, attrs_common);
if (err < 0)
goto free_used_maps;
@@ -6174,6 +6175,7 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size,
bpfptr_t uattr_common, unsigned int size_common)
{
struct bpf_common_attr attr_common;
+ struct bpf_attrs attrs_common;
union bpf_attr attr;
int err;
@@ -6225,7 +6227,8 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size,
err = map_freeze(&attr);
break;
case BPF_PROG_LOAD:
- err = bpf_prog_load(&attr, uattr, size);
+ bpf_attrs_init(&attrs_common, &attr_common, uattr_common, size_common);
+ err = bpf_prog_load(&attr, uattr, size, &attrs_common);
break;
case BPF_OBJ_PIN:
err = bpf_obj_pin(&attr);
--
2.52.0
^ permalink raw reply related
* [PATCH bpf-next v7 5/9] bpf: Refactor reporting btf_log_true_size for btf_load
From: Leon Hwang @ 2026-01-23 3:24 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
linux-kselftest, kernel-patches-bot
In-Reply-To: <20260123032445.125259-1-leon.hwang@linux.dev>
In the next commit, it will be able to report logs via extended common
attributes, which will report 'log_true_size' via the extended common
attributes meanwhile.
Therefore, refactor the way of 'btf_log_true_size' reporting in order to
report 'log_true_size' via the extended common attributes easily.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/linux/bpf_verifier.h | 1 +
include/linux/btf.h | 3 ++-
kernel/bpf/btf.c | 32 +++++++++-----------------------
kernel/bpf/log.c | 9 +++++++++
kernel/bpf/syscall.c | 10 +++++++++-
5 files changed, 30 insertions(+), 25 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7eb024e83d2d..28e22a03ac84 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -642,6 +642,7 @@ struct bpf_log_attr {
int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
struct bpf_attrs *attrs_common);
+int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs);
int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log);
#define BPF_MAX_SUBPROGS 256
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 48108471c5b1..2812caa6c60e 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -145,7 +145,8 @@ const char *btf_get_name(const struct btf *btf);
void btf_get(struct btf *btf);
void btf_put(struct btf *btf);
const struct btf_header *btf_header(const struct btf *btf);
-int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_sz);
+struct bpf_log_attr;
+int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, struct bpf_log_attr *log_attr);
struct btf *btf_get_by_fd(int fd);
int btf_get_info_by_fd(const struct btf *btf,
const union bpf_attr *attr,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index d10b3404260f..136fdd8f73b2 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5856,25 +5856,11 @@ static int btf_check_type_tags(struct btf_verifier_env *env,
return 0;
}
-static int finalize_log(struct bpf_verifier_log *log, bpfptr_t uattr, u32 uattr_size)
-{
- u32 log_true_size;
- int err;
-
- err = bpf_vlog_finalize(log, &log_true_size);
-
- if (uattr_size >= offsetofend(union bpf_attr, btf_log_true_size) &&
- copy_to_bpfptr_offset(uattr, offsetof(union bpf_attr, btf_log_true_size),
- &log_true_size, sizeof(log_true_size)))
- err = -EFAULT;
-
- return err;
-}
-
-static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
+static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr,
+ struct bpf_log_attr *log_attr)
{
bpfptr_t btf_data = make_bpfptr(attr->btf, uattr.is_kernel);
- char __user *log_ubuf = u64_to_user_ptr(attr->btf_log_buf);
+ char __user *log_ubuf = u64_to_user_ptr(log_attr->log_buf);
struct btf_struct_metas *struct_meta_tab;
struct btf_verifier_env *env = NULL;
struct btf *btf = NULL;
@@ -5891,8 +5877,8 @@ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uat
/* user could have requested verbose verifier output
* and supplied buffer to store the verification trace
*/
- err = bpf_vlog_init(&env->log, attr->btf_log_level,
- log_ubuf, attr->btf_log_size);
+ err = bpf_vlog_init(&env->log, log_attr->log_level,
+ log_ubuf, log_attr->log_size);
if (err)
goto errout_free;
@@ -5953,7 +5939,7 @@ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uat
}
}
- err = finalize_log(&env->log, uattr, uattr_size);
+ err = bpf_log_attr_finalize(log_attr, &env->log);
if (err)
goto errout_free;
@@ -5965,7 +5951,7 @@ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uat
btf_free_struct_meta_tab(btf);
errout:
/* overwrite err with -ENOSPC or -EFAULT */
- ret = finalize_log(&env->log, uattr, uattr_size);
+ ret = bpf_log_attr_finalize(log_attr, &env->log);
if (ret)
err = ret;
errout_free:
@@ -8134,12 +8120,12 @@ static int __btf_new_fd(struct btf *btf)
return anon_inode_getfd("btf", &btf_fops, btf, O_RDONLY | O_CLOEXEC);
}
-int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
+int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, struct bpf_log_attr *log_attr)
{
struct btf *btf;
int ret;
- btf = btf_parse(attr, uattr, uattr_size);
+ btf = btf_parse(attr, uattr, log_attr);
if (IS_ERR(btf))
return PTR_ERR(btf);
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index c0b816e84384..f1ed24157d71 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -902,6 +902,15 @@ int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs
offsetof(union bpf_attr, log_true_size), attrs_common);
}
+int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs)
+{
+ const union bpf_attr *attr = attrs->attr;
+
+ return bpf_log_attr_init(log_attr, attrs, attr->btf_log_buf, attr->btf_log_size,
+ attr->btf_log_level, offsetof(union bpf_attr, btf_log_true_size),
+ NULL);
+}
+
int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log)
{
u32 log_true_size, off;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8f3e489667d7..6241ef35c164 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5436,6 +5436,9 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
{
struct bpf_token *token = NULL;
+ struct bpf_log_attr log_attr;
+ struct bpf_attrs attrs;
+ int err;
if (CHECK_ATTR(BPF_BTF_LOAD))
return -EINVAL;
@@ -5443,6 +5446,11 @@ static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_
if (attr->btf_flags & ~BPF_F_TOKEN_FD)
return -EINVAL;
+ bpf_attrs_init(&attrs, attr, uattr, uattr_size);
+ err = bpf_btf_load_log_attr_init(&log_attr, &attrs);
+ if (err)
+ return err;
+
if (attr->btf_flags & BPF_F_TOKEN_FD) {
token = bpf_token_get_from_fd(attr->btf_token_fd);
if (IS_ERR(token))
@@ -5460,7 +5468,7 @@ static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_
bpf_token_put(token);
- return btf_new_fd(attr, uattr, uattr_size);
+ return btf_new_fd(attr, uattr, &log_attr);
}
#define BPF_BTF_GET_FD_BY_ID_LAST_FIELD fd_by_id_token_fd
--
2.52.0
^ permalink raw reply related
* [PATCH bpf-next v7 6/9] bpf: Add syscall common attributes support for btf_load
From: Leon Hwang @ 2026-01-23 3:24 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
linux-kselftest, kernel-patches-bot
In-Reply-To: <20260123032445.125259-1-leon.hwang@linux.dev>
Since bpf_log_attr_init() now supports struct bpf_common_attr, pass the
common attributes to it to enable syscall common attributes support for
BPF_BTF_LOAD.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/linux/bpf_verifier.h | 3 ++-
kernel/bpf/log.c | 5 +++--
kernel/bpf/syscall.c | 8 +++++---
3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 28e22a03ac84..732bc4baee1c 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -642,7 +642,8 @@ struct bpf_log_attr {
int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
struct bpf_attrs *attrs_common);
-int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs);
+int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
+ struct bpf_attrs *attrs_common);
int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log);
#define BPF_MAX_SUBPROGS 256
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index f1ed24157d71..3cccb0c5e482 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -902,13 +902,14 @@ int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs
offsetof(union bpf_attr, log_true_size), attrs_common);
}
-int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs)
+int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
+ struct bpf_attrs *attrs_common)
{
const union bpf_attr *attr = attrs->attr;
return bpf_log_attr_init(log_attr, attrs, attr->btf_log_buf, attr->btf_log_size,
attr->btf_log_level, offsetof(union bpf_attr, btf_log_true_size),
- NULL);
+ attrs_common);
}
int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6241ef35c164..d17cb243c5b3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5433,7 +5433,8 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
#define BPF_BTF_LOAD_LAST_FIELD btf_token_fd
-static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
+static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size,
+ struct bpf_attrs *attrs_common)
{
struct bpf_token *token = NULL;
struct bpf_log_attr log_attr;
@@ -5447,7 +5448,7 @@ static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_
return -EINVAL;
bpf_attrs_init(&attrs, attr, uattr, uattr_size);
- err = bpf_btf_load_log_attr_init(&log_attr, &attrs);
+ err = bpf_btf_load_log_attr_init(&log_attr, &attrs, attrs_common);
if (err)
return err;
@@ -6281,7 +6282,8 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size,
err = bpf_raw_tracepoint_open(&attr);
break;
case BPF_BTF_LOAD:
- err = bpf_btf_load(&attr, uattr, size);
+ bpf_attrs_init(&attrs_common, &attr_common, uattr_common, size_common);
+ err = bpf_btf_load(&attr, uattr, size, &attrs_common);
break;
case BPF_BTF_GET_FD_BY_ID:
err = bpf_btf_get_fd_by_id(&attr);
--
2.52.0
^ permalink raw reply related
* [PATCH bpf-next v7 7/9] bpf: Add syscall common attributes support for map_create
From: Leon Hwang @ 2026-01-23 3:24 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
linux-kselftest, kernel-patches-bot
In-Reply-To: <20260123032445.125259-1-leon.hwang@linux.dev>
Currently, many BPF_MAP_CREATE failures return -EINVAL without providing
any explanation to userspace.
With extended BPF syscall support, detailed error messages can now be
reported via the log buffer, allowing users to understand the specific
reason for a failed map creation.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/linux/bpf_verifier.h | 2 ++
kernel/bpf/log.c | 30 +++++++++++++++++
kernel/bpf/syscall.c | 65 ++++++++++++++++++++++++++++++------
3 files changed, 87 insertions(+), 10 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 732bc4baee1c..917293a552b6 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -644,6 +644,8 @@ int bpf_prog_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs
struct bpf_attrs *attrs_common);
int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *attrs,
struct bpf_attrs *attrs_common);
+struct bpf_verifier_log *bpf_log_attr_create_vlog(struct bpf_log_attr *log_attr,
+ struct bpf_attrs *attrs_common);
int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log);
#define BPF_MAX_SUBPROGS 256
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 3cccb0c5e482..d7933a412c36 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -912,6 +912,36 @@ int bpf_btf_load_log_attr_init(struct bpf_log_attr *log_attr, struct bpf_attrs *
attrs_common);
}
+struct bpf_verifier_log *bpf_log_attr_create_vlog(struct bpf_log_attr *log_attr,
+ struct bpf_attrs *attrs_common)
+{
+ const struct bpf_common_attr *common = attrs_common->attr;
+ struct bpf_verifier_log *log;
+ int err;
+
+ memset(log_attr, 0, sizeof(*log_attr));
+ log_attr->log_buf = common->log_buf;
+ log_attr->log_size = common->log_size;
+ log_attr->log_level = common->log_level;
+ log_attr->attrs_common = attrs_common;
+
+ if (!log_attr->log_buf)
+ return NULL;
+
+ log = kzalloc(sizeof(*log), GFP_KERNEL);
+ if (!log)
+ return ERR_PTR(-ENOMEM);
+
+ err = bpf_vlog_init(log, log_attr->log_level, u64_to_user_ptr(log_attr->log_buf),
+ log_attr->log_size);
+ if (err) {
+ kfree(log);
+ return ERR_PTR(err);
+ }
+
+ return log;
+}
+
int bpf_log_attr_finalize(struct bpf_log_attr *log_attr, struct bpf_verifier_log *log)
{
u32 log_true_size, off;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d17cb243c5b3..0cfa4029a829 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1370,7 +1370,7 @@ static bool bpf_net_capable(void)
#define BPF_MAP_CREATE_LAST_FIELD excl_prog_hash_size
/* called via syscall */
-static int map_create(union bpf_attr *attr, bpfptr_t uattr)
+static int __map_create(union bpf_attr *attr, bpfptr_t uattr, struct bpf_verifier_log *log)
{
const struct bpf_map_ops *ops;
struct bpf_token *token = NULL;
@@ -1382,8 +1382,10 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
int err;
err = CHECK_ATTR(BPF_MAP_CREATE);
- if (err)
+ if (err) {
+ bpf_log(log, "Invalid attr.\n");
return -EINVAL;
+ }
/* check BPF_F_TOKEN_FD flag, remember if it's set, and then clear it
* to avoid per-map type checks tripping on unknown flag
@@ -1392,17 +1394,25 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
attr->map_flags &= ~BPF_F_TOKEN_FD;
if (attr->btf_vmlinux_value_type_id) {
- if (attr->map_type != BPF_MAP_TYPE_STRUCT_OPS ||
- attr->btf_key_type_id || attr->btf_value_type_id)
+ if (attr->map_type != BPF_MAP_TYPE_STRUCT_OPS) {
+ bpf_log(log, "btf_vmlinux_value_type_id can only be used with struct_ops maps.\n");
return -EINVAL;
+ }
+ if (attr->btf_key_type_id || attr->btf_value_type_id) {
+ bpf_log(log, "btf_vmlinux_value_type_id is mutually exclusive with btf_key_type_id and btf_value_type_id.\n");
+ return -EINVAL;
+ }
} else if (attr->btf_key_type_id && !attr->btf_value_type_id) {
+ bpf_log(log, "Invalid btf_value_type_id.\n");
return -EINVAL;
}
if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
attr->map_type != BPF_MAP_TYPE_ARENA &&
- attr->map_extra != 0)
+ attr->map_extra != 0) {
+ bpf_log(log, "Invalid map_extra.\n");
return -EINVAL;
+ }
f_flags = bpf_get_file_flag(attr->map_flags);
if (f_flags < 0)
@@ -1410,13 +1420,17 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
if (numa_node != NUMA_NO_NODE &&
((unsigned int)numa_node >= nr_node_ids ||
- !node_online(numa_node)))
+ !node_online(numa_node))) {
+ bpf_log(log, "Invalid numa_node.\n");
return -EINVAL;
+ }
/* find map type and init map: hashtable vs rbtree vs bloom vs ... */
map_type = attr->map_type;
- if (map_type >= ARRAY_SIZE(bpf_map_types))
+ if (map_type >= ARRAY_SIZE(bpf_map_types)) {
+ bpf_log(log, "Invalid map_type.\n");
return -EINVAL;
+ }
map_type = array_index_nospec(map_type, ARRAY_SIZE(bpf_map_types));
ops = bpf_map_types[map_type];
if (!ops)
@@ -1434,8 +1448,10 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
if (token_flag) {
token = bpf_token_get_from_fd(attr->map_token_fd);
- if (IS_ERR(token))
+ if (IS_ERR(token)) {
+ bpf_log(log, "Invalid map_token_fd.\n");
return PTR_ERR(token);
+ }
/* if current token doesn't grant map creation permissions,
* then we can't use this token, so ignore it and rely on
@@ -1518,8 +1534,10 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
err = bpf_obj_name_cpy(map->name, attr->map_name,
sizeof(attr->map_name));
- if (err < 0)
+ if (err < 0) {
+ bpf_log(log, "Invalid map_name.\n");
goto free_map;
+ }
preempt_disable();
map->cookie = gen_cookie_next(&bpf_map_cookie);
@@ -1542,6 +1560,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
btf = btf_get_by_fd(attr->btf_fd);
if (IS_ERR(btf)) {
+ bpf_log(log, "Invalid btf_fd.\n");
err = PTR_ERR(btf);
goto free_map;
}
@@ -1569,6 +1588,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
bpfptr_t uprog_hash = make_bpfptr(attr->excl_prog_hash, uattr.is_kernel);
if (attr->excl_prog_hash_size != SHA256_DIGEST_SIZE) {
+ bpf_log(log, "Invalid excl_prog_hash_size.\n");
err = -EINVAL;
goto free_map;
}
@@ -1584,6 +1604,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
goto free_map;
}
} else if (attr->excl_prog_hash_size) {
+ bpf_log(log, "Invalid excl_prog_hash_size.\n");
err = -EINVAL;
goto free_map;
}
@@ -1622,6 +1643,29 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
return err;
}
+static int map_create(union bpf_attr *attr, bpfptr_t uattr, struct bpf_attrs *attrs_common)
+{
+ struct bpf_verifier_log *log;
+ struct bpf_log_attr log_attr;
+ int err, ret;
+
+ log = bpf_log_attr_create_vlog(&log_attr, attrs_common);
+ if (IS_ERR(log))
+ return PTR_ERR(log);
+
+ err = __map_create(attr, uattr, log);
+ if (err >= 0)
+ goto free;
+
+ ret = bpf_log_attr_finalize(&log_attr, log);
+ if (ret)
+ err = ret;
+
+free:
+ kfree(log);
+ return err;
+}
+
void bpf_map_inc(struct bpf_map *map)
{
atomic64_inc(&map->refcnt);
@@ -6218,7 +6262,8 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size,
switch (cmd) {
case BPF_MAP_CREATE:
- err = map_create(&attr, uattr);
+ bpf_attrs_init(&attrs_common, &attr_common, uattr_common, size_common);
+ err = map_create(&attr, uattr, &attrs_common);
break;
case BPF_MAP_LOOKUP_ELEM:
err = map_lookup_elem(&attr);
--
2.52.0
^ permalink raw reply related
* [PATCH bpf-next v7 8/9] libbpf: Add common attr support for map_create
From: Leon Hwang @ 2026-01-23 3:24 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
linux-kselftest, kernel-patches-bot
In-Reply-To: <20260123032445.125259-1-leon.hwang@linux.dev>
With the previous commit adding common attribute support for
BPF_MAP_CREATE, users can now retrieve detailed error messages when map
creation fails via the log_buf field.
Introduce struct bpf_log_opts with the following fields:
log_buf, log_size, log_level, and log_true_size.
Extend bpf_map_create_opts with a new field log_opts, allowing users to
capture and inspect log messages on map creation failures.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
tools/lib/bpf/bpf.c | 16 +++++++++++++++-
tools/lib/bpf/bpf.h | 17 ++++++++++++++++-
2 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index fc87552b1378..eb9127af6d04 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -209,6 +209,9 @@ int bpf_map_create(enum bpf_map_type map_type,
const struct bpf_map_create_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, excl_prog_hash_size);
+ const size_t attr_common_sz = sizeof(struct bpf_common_attr);
+ struct bpf_common_attr attr_common;
+ struct bpf_log_opts *log_opts;
union bpf_attr attr;
int fd;
@@ -242,7 +245,18 @@ int bpf_map_create(enum bpf_map_type map_type,
attr.excl_prog_hash = ptr_to_u64(OPTS_GET(opts, excl_prog_hash, NULL));
attr.excl_prog_hash_size = OPTS_GET(opts, excl_prog_hash_size, 0);
- fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
+ log_opts = OPTS_GET(opts, log_opts, NULL);
+ if (log_opts && feat_supported(NULL, FEAT_BPF_SYSCALL_COMMON_ATTRS)) {
+ memset(&attr_common, 0, attr_common_sz);
+ attr_common.log_buf = ptr_to_u64(OPTS_GET(log_opts, log_buf, NULL));
+ attr_common.log_size = OPTS_GET(log_opts, log_size, 0);
+ attr_common.log_level = OPTS_GET(log_opts, log_level, 0);
+ fd = sys_bpf_ext_fd(BPF_MAP_CREATE, &attr, attr_sz, &attr_common, attr_common_sz);
+ OPTS_SET(log_opts, log_true_size, attr_common.log_true_size);
+ } else {
+ fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
+ OPTS_SET(log_opts, log_true_size, 0);
+ }
return libbpf_err_errno(fd);
}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 2c8e88ddb674..59673f094f86 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -37,6 +37,18 @@ extern "C" {
LIBBPF_API int libbpf_set_memlock_rlim(size_t memlock_bytes);
+struct bpf_log_opts {
+ size_t sz; /* size of this struct for forward/backward compatibility */
+
+ char *log_buf;
+ __u32 log_size;
+ __u32 log_level;
+ __u32 log_true_size;
+
+ size_t :0;
+};
+#define bpf_log_opts__last_field log_true_size
+
struct bpf_map_create_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
@@ -57,9 +69,12 @@ struct bpf_map_create_opts {
const void *excl_prog_hash;
__u32 excl_prog_hash_size;
+
+ struct bpf_log_opts *log_opts;
+
size_t :0;
};
-#define bpf_map_create_opts__last_field excl_prog_hash_size
+#define bpf_map_create_opts__last_field log_opts
LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
const char *map_name,
--
2.52.0
^ permalink raw reply related
* [PATCH bpf-next v7 9/9] selftests/bpf: Add tests to verify map create failure log
From: Leon Hwang @ 2026-01-23 3:24 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Leon Hwang, Willem de Bruijn, Jason Xing,
Tao Chen, Mykyta Yatsenko, Kumar Kartikeya Dwivedi,
Anton Protopopov, Amery Hung, Rong Tao, linux-kernel, linux-api,
linux-kselftest, kernel-patches-bot
In-Reply-To: <20260123032445.125259-1-leon.hwang@linux.dev>
Add tests to verify that the kernel reports the expected error messages
when map creation fails.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
.../selftests/bpf/prog_tests/map_init.c | 168 ++++++++++++++++++
1 file changed, 168 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/map_init.c b/tools/testing/selftests/bpf/prog_tests/map_init.c
index 14a31109dd0e..89e6daf2fcfd 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_init.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_init.c
@@ -212,3 +212,171 @@ void test_map_init(void)
if (test__start_subtest("pcpu_lru_map_init"))
test_pcpu_lru_map_init();
}
+
+#define BPF_LOG_FIXED 8
+
+static void test_map_create(enum bpf_map_type map_type, const char *map_name,
+ struct bpf_map_create_opts *opts, const char *exp_msg)
+{
+ const int key_size = 4, value_size = 4, max_entries = 1;
+ char log_buf[128];
+ int fd;
+ LIBBPF_OPTS(bpf_log_opts, log_opts);
+
+ log_buf[0] = '\0';
+ log_opts.log_buf = log_buf;
+ log_opts.log_size = sizeof(log_buf);
+ log_opts.log_level = BPF_LOG_FIXED;
+ opts->log_opts = &log_opts;
+ fd = bpf_map_create(map_type, map_name, key_size, value_size, max_entries, opts);
+ if (!ASSERT_LT(fd, 0, "bpf_map_create")) {
+ close(fd);
+ return;
+ }
+
+ ASSERT_STREQ(log_buf, exp_msg, "log_buf");
+ ASSERT_EQ(log_opts.log_true_size, strlen(exp_msg) + 1, "log_true_size");
+}
+
+static void test_map_create_array(struct bpf_map_create_opts *opts, const char *exp_msg)
+{
+ test_map_create(BPF_MAP_TYPE_ARRAY, "test_map_create", opts, exp_msg);
+}
+
+static void test_invalid_vmlinux_value_type_id_struct_ops(void)
+{
+ const char *msg = "btf_vmlinux_value_type_id can only be used with struct_ops maps.\n";
+ LIBBPF_OPTS(bpf_map_create_opts, opts,
+ .btf_vmlinux_value_type_id = 1,
+ );
+
+ test_map_create_array(&opts, msg);
+}
+
+static void test_invalid_vmlinux_value_type_id_kv_type_id(void)
+{
+ const char *msg = "btf_vmlinux_value_type_id is mutually exclusive with btf_key_type_id and btf_value_type_id.\n";
+ LIBBPF_OPTS(bpf_map_create_opts, opts,
+ .btf_vmlinux_value_type_id = 1,
+ .btf_key_type_id = 1,
+ );
+
+ test_map_create(BPF_MAP_TYPE_STRUCT_OPS, "test_map_create", &opts, msg);
+}
+
+static void test_invalid_value_type_id(void)
+{
+ const char *msg = "Invalid btf_value_type_id.\n";
+ LIBBPF_OPTS(bpf_map_create_opts, opts,
+ .btf_key_type_id = 1,
+ );
+
+ test_map_create_array(&opts, msg);
+}
+
+static void test_invalid_map_extra(void)
+{
+ const char *msg = "Invalid map_extra.\n";
+ LIBBPF_OPTS(bpf_map_create_opts, opts,
+ .map_extra = 1,
+ );
+
+ test_map_create_array(&opts, msg);
+}
+
+static void test_invalid_numa_node(void)
+{
+ const char *msg = "Invalid numa_node.\n";
+ LIBBPF_OPTS(bpf_map_create_opts, opts,
+ .map_flags = BPF_F_NUMA_NODE,
+ .numa_node = 0xFF,
+ );
+
+ test_map_create_array(&opts, msg);
+}
+
+static void test_invalid_map_type(void)
+{
+ const char *msg = "Invalid map_type.\n";
+ LIBBPF_OPTS(bpf_map_create_opts, opts);
+
+ test_map_create(__MAX_BPF_MAP_TYPE, "test_map_create", &opts, msg);
+}
+
+static void test_invalid_token_fd(void)
+{
+ const char *msg = "Invalid map_token_fd.\n";
+ LIBBPF_OPTS(bpf_map_create_opts, opts,
+ .map_flags = BPF_F_TOKEN_FD,
+ .token_fd = 0xFF,
+ );
+
+ test_map_create_array(&opts, msg);
+}
+
+static void test_invalid_map_name(void)
+{
+ const char *msg = "Invalid map_name.\n";
+ LIBBPF_OPTS(bpf_map_create_opts, opts);
+
+ test_map_create(BPF_MAP_TYPE_ARRAY, "test-!@#", &opts, msg);
+}
+
+static void test_invalid_btf_fd(void)
+{
+ const char *msg = "Invalid btf_fd.\n";
+ LIBBPF_OPTS(bpf_map_create_opts, opts,
+ .btf_fd = -1,
+ .btf_key_type_id = 1,
+ .btf_value_type_id = 1,
+ );
+
+ test_map_create_array(&opts, msg);
+}
+
+static void test_excl_prog_hash_size_1(void)
+{
+ const char *msg = "Invalid excl_prog_hash_size.\n";
+ const char *hash = "DEADCODE";
+ LIBBPF_OPTS(bpf_map_create_opts, opts,
+ .excl_prog_hash = hash,
+ );
+
+ test_map_create_array(&opts, msg);
+}
+
+static void test_excl_prog_hash_size_2(void)
+{
+ const char *msg = "Invalid excl_prog_hash_size.\n";
+ LIBBPF_OPTS(bpf_map_create_opts, opts,
+ .excl_prog_hash_size = 1,
+ );
+
+ test_map_create_array(&opts, msg);
+}
+
+void test_map_create_failure(void)
+{
+ if (test__start_subtest("invalid_vmlinux_value_type_id_struct_ops"))
+ test_invalid_vmlinux_value_type_id_struct_ops();
+ if (test__start_subtest("invalid_vmlinux_value_type_id_kv_type_id"))
+ test_invalid_vmlinux_value_type_id_kv_type_id();
+ if (test__start_subtest("invalid_value_type_id"))
+ test_invalid_value_type_id();
+ if (test__start_subtest("invalid_map_extra"))
+ test_invalid_map_extra();
+ if (test__start_subtest("invalid_numa_node"))
+ test_invalid_numa_node();
+ if (test__start_subtest("invalid_map_type"))
+ test_invalid_map_type();
+ if (test__start_subtest("invalid_token_fd"))
+ test_invalid_token_fd();
+ if (test__start_subtest("invalid_map_name"))
+ test_invalid_map_name();
+ if (test__start_subtest("invalid_btf_fd"))
+ test_invalid_btf_fd();
+ if (test__start_subtest("invalid_excl_prog_hash_size_1"))
+ test_excl_prog_hash_size_1();
+ if (test__start_subtest("invalid_excl_prog_hash_size_2"))
+ test_excl_prog_hash_size_2();
+}
--
2.52.0
^ permalink raw reply related
* Re: [PATCH bpf-next v7 2/9] libbpf: Add support for extended bpf syscall
From: Alexei Starovoitov @ 2026-01-23 3:55 UTC (permalink / raw)
To: Leon Hwang
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Willem de Bruijn, Jason Xing, Tao Chen,
Mykyta Yatsenko, Kumar Kartikeya Dwivedi, Anton Protopopov,
Amery Hung, Rong Tao, LKML, Linux API,
open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
In-Reply-To: <20260123032445.125259-3-leon.hwang@linux.dev>
On Thu, Jan 22, 2026 at 7:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
> +static int probe_bpf_syscall_common_attrs(int token_fd)
> +{
> + int ret;
> +
> + ret = probe_sys_bpf_ext();
> + return ret > 0;
> +}
When you look at the above, what thoughts come to mind?
... and please don't use ai for answers.
^ permalink raw reply
* Re: [PATCH bpf-next v7 2/9] libbpf: Add support for extended bpf syscall
From: Leon Hwang @ 2026-01-23 4:06 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Willem de Bruijn, Jason Xing, Tao Chen,
Mykyta Yatsenko, Kumar Kartikeya Dwivedi, Anton Protopopov,
Amery Hung, Rong Tao, LKML, Linux API,
open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
In-Reply-To: <CAADnVQJLz+nMHCZXUgy2MOxwFczEHNbG8ZUgfZeUY4yXFUKcNw@mail.gmail.com>
On 23/1/26 11:55, Alexei Starovoitov wrote:
> On Thu, Jan 22, 2026 at 7:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>> +static int probe_bpf_syscall_common_attrs(int token_fd)
>> +{
>> + int ret;
>> +
>> + ret = probe_sys_bpf_ext();
>> + return ret > 0;
>> +}
>
> When you look at the above, what thoughts come to mind?
>
> ... and please don't use ai for answers.
My initial thought was whether probe_fd() is needed here to handle and
close a returned fd, since the return value of probe_sys_bpf_ext() isn’t
obvious from the call site.
Thanks,
Leon
^ permalink raw reply
* Re: [PATCH bpf-next v7 2/9] libbpf: Add support for extended bpf syscall
From: Alexei Starovoitov @ 2026-01-23 4:12 UTC (permalink / raw)
To: Leon Hwang
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Willem de Bruijn, Jason Xing, Tao Chen,
Mykyta Yatsenko, Kumar Kartikeya Dwivedi, Anton Protopopov,
Amery Hung, Rong Tao, LKML, Linux API,
open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
In-Reply-To: <a0ce1dab-7d7e-4b04-a033-4f0611090d34@linux.dev>
On Thu, Jan 22, 2026 at 8:07 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 23/1/26 11:55, Alexei Starovoitov wrote:
> > On Thu, Jan 22, 2026 at 7:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >>
> >> +static int probe_bpf_syscall_common_attrs(int token_fd)
> >> +{
> >> + int ret;
> >> +
> >> + ret = probe_sys_bpf_ext();
> >> + return ret > 0;
> >> +}
> >
> > When you look at the above, what thoughts come to mind?
> >
> > ... and please don't use ai for answers.
>
> My initial thought was whether probe_fd() is needed here to handle and
> close a returned fd, since the return value of probe_sys_bpf_ext() isn’t
> obvious from the call site.
Fair enough, but then collapse it into one helper if FD is a concern.
My question was about stylistic/taste preferences.
^ permalink raw reply
* Re: [PATCH bpf-next v7 2/9] libbpf: Add support for extended bpf syscall
From: Leon Hwang @ 2026-01-23 4:19 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Willem de Bruijn, Jason Xing, Tao Chen,
Mykyta Yatsenko, Kumar Kartikeya Dwivedi, Anton Protopopov,
Amery Hung, Rong Tao, LKML, Linux API,
open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
In-Reply-To: <CAADnVQ+HJkOikzE3KPhOkd1KNugs7=1dZKY1mfog-ez8noyrDA@mail.gmail.com>
On 23/1/26 12:12, Alexei Starovoitov wrote:
> On Thu, Jan 22, 2026 at 8:07 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 23/1/26 11:55, Alexei Starovoitov wrote:
>>> On Thu, Jan 22, 2026 at 7:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>>
>>>>
>>>> +static int probe_bpf_syscall_common_attrs(int token_fd)
>>>> +{
>>>> + int ret;
>>>> +
>>>> + ret = probe_sys_bpf_ext();
>>>> + return ret > 0;
>>>> +}
>>>
>>> When you look at the above, what thoughts come to mind?
>>>
>>> ... and please don't use ai for answers.
>>
>> My initial thought was whether probe_fd() is needed here to handle and
>> close a returned fd, since the return value of probe_sys_bpf_ext() isn’t
>> obvious from the call site.
>
> Fair enough, but then collapse it into one helper if FD is a concern.
> My question was about stylistic/taste preferences.
Understood, thanks for the clarification.
I’ll rework it with the stylistic preference in mind.
Thanks,
Leon
^ permalink raw reply
* Re: [PATCH 0/2] mount: add OPEN_TREE_NAMESPACE
From: Christian Brauner @ 2026-01-23 10:23 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Jeff Layton, Askar Safin, amir73il, cyphar, jack, josef,
linux-fsdevel, viro, Lennart Poettering, David Howells,
Yunkai Zhang, cgel.zte, Menglong Dong, linux-kernel, initramfs,
containers, linux-api, news, lwn, Jonathan Corbet, Rob Landley,
emily, Christoph Hellwig
In-Reply-To: <CALCETrUZC+sdfpVqqjeC_pqmd+-W84Rq7ron8Vx9MaSSohhJ2g@mail.gmail.com>
On Wed, Jan 21, 2026 at 10:00:19AM -0800, Andy Lutomirski wrote:
> > On Jan 19, 2026, at 2:21 PM, Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Mon, 2026-01-19 at 11:05 -0800, Andy Lutomirski wrote:
> >>> On Mon, Jan 19, 2026 at 10:56 AM Askar Safin <safinaskar@gmail.com> wrote:
> >>>
> >>> Christian Brauner <brauner@kernel.org>:
> >>>> Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to
> >>>> OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of
> >>>> returning a file descriptor referring to that mount tree
> >>>> OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor
> >>>> to a new mount namespace. In that new mount namespace the copied mount
> >>>> tree has been mounted on top of a copy of the real rootfs.
> >>>
> >>> I want to point at security benefits of this.
> >>>
> >>> [[ TL;DR: [1] and [2] are very big changes to how mount namespaces work.
> >>> I like them, and I think they should get wider exposure. ]]
> >>>
> >>> If this patchset ([1]) and [2] both land (they are both in "next" now and
> >>> likely will be submitted to mainline soon) and "nullfs_rootfs" is passed on
> >>> command line, then mount namespace created by open_tree(OPEN_TREE_NAMESPACE) will
> >>> usually contain exactly 2 mounts: nullfs and whatever was passed to
> >>> open_tree(OPEN_TREE_NAMESPACE).
> >>>
> >>> This means that even if attacker somehow is able to unmount its root and
> >>> get access to underlying mounts, then the only underlying thing they will
> >>> get is nullfs.
> >>>
> >>> Also this means that other mounts are not only hidden in new namespace, they
> >>> are fully absent. This prevents attacks discussed here: [3], [4].
> >>>
> >>> Also this means that (assuming we have both [1] and [2] and "nullfs_rootfs"
> >>> is passed), there is no anymore hidden writable mount shared by all containers,
> >>> potentially available to attackers. This is concern raised in [5]:
> >>>
> >>>> You want rootfs to be a NULLFS instead of ramfs. You don't seem to want it to
> >>>> actually _be_ a filesystem. Even with your "fix", containers could communicate
> >>>> with each _other_ through it if it becomes accessible. If a container can get
> >>>> access to an empty initramfs and write into it, it can ask/answer the question
> >>>> "Are there any other containers on this machine running stux24" and then coordinate.
> >>
> >> I think this new OPEN_TREE_NAMESPACE is nifty, but I don't think the
> >> path that gives it sensible behavior should be conditional like this.
> >> Either make it *always* mount on top of nullfs (regardless of boot
> >> options) or find some way to have it actually be the root. I assume
> >> the latter is challenging for some reason.
> >>
> >
> > I think that's the plan. I suggested the same to Christian last week,
> > and he was amenable to removing the option and just always doing a
> > nullfs_rootfs mount.
> >
> > We think that older runtimes should still "just work" with this scheme.
> > Out of an abundance of caution, we _might_ want a command-line option
> > to make it go back to old way, in case we find some userland stuff that
> > doesn't like this for some reason, but hopefully we won't even need
> > that.
>
> What I mean is: even if for some reason the kernel is running in a
> mode where the *initial* rootfs is a real fs, I think it would be nice
> for OPEN_TREE_NAMESPACE to use nullfs.
The current patchset makes nullfs unconditional. As each mount
namespaces creates a new copy of the namespace root of the namespace it
was created from all mount namespace have nullfs as namespace root.
So every OPEN_TREE_NAMESPACE/FSMOUNT_NAMESPACE will be mounted on top of
nullfs as we always take the namespace root. If we have to make nullfs
conditional then yes, we could still do that - althoug it would be ugly
in various ways.
I would love to keep nullfs unconditional because it means I can wipe a
whole class of MNT_LOCKED nonsense from the face of the earth
afterwards.
^ permalink raw reply
* Re: [PATCH v7 01/16] fs: Add case sensitivity flags to file_kattr
From: Jan Kara @ 2026-01-23 12:51 UTC (permalink / raw)
To: Chuck Lever
Cc: Al Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-ext4,
linux-xfs, linux-cifs, linux-nfs, linux-api, linux-f2fs-devel,
hirofumi, linkinjeon, sj1557.seo, yuezhang.mo,
almaz.alexandrovich, slava, glaubitz, frank.li, tytso,
adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever
In-Reply-To: <20260122160311.1117669-2-cel@kernel.org>
On Thu 22-01-26 11:02:56, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> Enable upper layers such as NFSD to retrieve case sensitivity
> information from file systems by adding FS_XFLAG_CASEFOLD and
> FS_XFLAG_CASENONPRESERVING flags.
>
> Filesystems report case-insensitive or case-nonpreserving behavior
> by setting these flags directly in fa->fsx_xflags. The default
> (flags unset) indicates POSIX semantics: case-sensitive and
> case-preserving. These flags are read-only; userspace cannot set
> them via ioctl.
>
> Remove struct file_kattr initialization from fileattr_fill_xflags()
> and fileattr_fill_flags(). Callers at ioctl/syscall entry points
> zero-initialize the struct themselves, which allows them to pass
> hints (flags_valid, fsx_valid) to the filesystem's ->fileattr_get()
> callback via the fa argument. Filesystem handlers that invoke these
> fill functions can now set flags directly in fa->fsx_xflags before
> calling them, without the fill functions zeroing those values.
>
> Case sensitivity information is exported to userspace via the
> fa_xflags field in the FS_IOC_FSGETXATTR ioctl and file_getattr()
> system call.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This scheme looks good. But AFAICT declared 'fa' needs to be zeroed-out
also in file_getattr()? Otherwise the patch looks good to me.
Honza
> @@ -323,7 +319,7 @@ int ioctl_setflags(struct file *file, unsigned int __user *argp)
> {
> struct mnt_idmap *idmap = file_mnt_idmap(file);
> struct dentry *dentry = file->f_path.dentry;
> - struct file_kattr fa;
> + struct file_kattr fa = {};
> unsigned int flags;
> int err;
>
> @@ -355,7 +351,7 @@ int ioctl_fssetxattr(struct file *file, void __user *argp)
> {
> struct mnt_idmap *idmap = file_mnt_idmap(file);
> struct dentry *dentry = file->f_path.dentry;
> - struct file_kattr fa;
> + struct file_kattr fa = {};
> int err;
>
> err = copy_fsxattr_from_user(&fa, argp);
> @@ -434,7 +430,7 @@ SYSCALL_DEFINE5(file_setattr, int, dfd, const char __user *, filename,
> struct filename *name __free(putname) = NULL;
> unsigned int lookup_flags = 0;
> struct file_attr fattr;
> - struct file_kattr fa;
> + struct file_kattr fa = {};
> int error;
>
> BUILD_BUG_ON(sizeof(struct file_attr) < FILE_ATTR_SIZE_VER0);
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 59eaad774371..f0417c4d1fca 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -496,7 +496,7 @@ xfs_ioc_fsgetxattra(
> xfs_inode_t *ip,
> void __user *arg)
> {
> - struct file_kattr fa;
> + struct file_kattr fa = {};
>
> xfs_ilock(ip, XFS_ILOCK_SHARED);
> xfs_fill_fsxattr(ip, XFS_ATTR_FORK, &fa);
> diff --git a/include/linux/fileattr.h b/include/linux/fileattr.h
> index f89dcfad3f8f..709de829659f 100644
> --- a/include/linux/fileattr.h
> +++ b/include/linux/fileattr.h
> @@ -16,7 +16,8 @@
>
> /* Read-only inode flags */
> #define FS_XFLAG_RDONLY_MASK \
> - (FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR)
> + (FS_XFLAG_PREALLOC | FS_XFLAG_HASATTR | \
> + FS_XFLAG_CASEFOLD | FS_XFLAG_CASENONPRESERVING)
>
> /* Flags to indicate valid value of fsx_ fields */
> #define FS_XFLAG_VALUES_MASK \
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 66ca526cf786..919148beaa8c 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -253,6 +253,8 @@ struct file_attr {
> #define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
> #define FS_XFLAG_DAX 0x00008000 /* use DAX for IO */
> #define FS_XFLAG_COWEXTSIZE 0x00010000 /* CoW extent size allocator hint */
> +#define FS_XFLAG_CASEFOLD 0x00020000 /* case-insensitive lookups */
> +#define FS_XFLAG_CASENONPRESERVING 0x00040000 /* case not preserved */
> #define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
>
> /* the read-only stuff doesn't really belong here, but any other place is
> --
> 2.52.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Zack Weinberg @ 2026-01-23 14:05 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
In-Reply-To: <aXLGdWGTrYo1s6v7@devuan>
On Thu, Jan 22, 2026, at 8:02 PM, Alejandro Colomar wrote:
> On Thu, Jan 22, 2026 at 07:33:58PM -0500, Zack Weinberg wrote:
> [...]
>
>> (Alejandro, do you have a preference between -man
>> and -mdoc markup?)
>
> Strong preference for man(7).
OK.
>> close(),” below.
>
> Punctuation like commas should go outside of the quotes (yes, I know
> some styles do that, but we don't).
Will correct.
>> HISTORY
>> The close() system call was present in Unix V7.
>
> That would be simply stated as:
>
> V7.
Looking at other really old system calls (fork(), open(), read(), _exit(), link()),
they all say "SVr4, 4.3BSD, POSIX.1-2001" and that's what this one said too,
before I changed it. I think I'll put it back the way it was.
zw
^ permalink raw reply
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Alejandro Colomar @ 2026-01-23 14:44 UTC (permalink / raw)
To: Al Viro
Cc: Zack Weinberg, Vincent Lefevre, Jan Kara, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
In-Reply-To: <20260123013859.GI3183987@ZenIV>
[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]
Hi Al,
On Fri, Jan 23, 2026 at 01:38:59AM +0000, Al Viro wrote:
> On Fri, Jan 23, 2026 at 02:02:53AM +0100, Alejandro Colomar wrote:
> > > HISTORY
> > > The close() system call was present in Unix V7.
> >
> > That would be simply stated as:
> >
> > V7.
> >
> > We could also document the first POSIX standard, as not all Unix APIs
> > were standardized at the same time. Thus:
> >
> > V7, POSIX.1-1988.
> >
> > Thanks!
>
> 11/3/71 SYS CLOSE (II)
> NAME close -- close a file
> SYNOPSIS (file descriptor in r0)
> sys close / close = 6.
> DESCRIPTION Given a file descriptor such as returned from an open or
> creat call, close closes the associated file. A close of
> all files is automatic on exit, but since processes are
> limited to 10 simultaneously open files, close is
> necessary to programs which deal with many files.
> FILES
> SEE ALSO creat, open
> DIAGNOSTICS The error bit (c—bit) is set for an unknown file
> descriptor.
> BUGS
> OWNER ken, dmr
>
> That's V1 manual. In V3 we already get EBADF on unopened descriptor;
> in _all_ cases there close(N) ends up with descriptor N not opened.
Thanks! Then it should actually be
V1, POSIX.1-1988.
Let's not document the history change from V3, as those details are
better documented as part of the V3 manual and reading the sources.
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH v7 07/16] ext4: Report case sensitivity in fileattr_get
From: Chuck Lever @ 2026-01-23 15:49 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko, glaubitz,
frank.li, Theodore Tso, adilger.kernel, Carlos Maiolino,
Steve French, Paulo Alcantara, Ronnie Sahlberg, Shyam Prasad N,
Trond Myklebust, Anna Schumaker, Jaegeuk Kim, Chao Yu,
Hans de Goede, senozhatsky, Chuck Lever
In-Reply-To: <20260123002904.GM5945@frogsfrogsfrogs>
On Thu, Jan 22, 2026, at 7:29 PM, Darrick J. Wong wrote:
> On Thu, Jan 22, 2026 at 11:03:02AM -0500, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> Report ext4's case sensitivity behavior via the FS_XFLAG_CASEFOLD
>> flag. ext4 always preserves case at rest.
>>
>> Case sensitivity is a per-directory setting in ext4. If the queried
>> inode is a casefolded directory, report case-insensitive; otherwise
>> report case-sensitive (standard POSIX behavior).
>>
>> Reviewed-by: Jan Kara <jack@suse.cz>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> fs/ext4/ioctl.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
>> index 7ce0fc40aec2..462da7aadc80 100644
>> --- a/fs/ext4/ioctl.c
>> +++ b/fs/ext4/ioctl.c
>> @@ -996,6 +996,13 @@ int ext4_fileattr_get(struct dentry *dentry, struct file_kattr *fa)
>> if (ext4_has_feature_project(inode->i_sb))
>> fa->fsx_projid = from_kprojid(&init_user_ns, ei->i_projid);
>>
>> + /*
>> + * Case folding is a directory attribute in ext4. Set FS_XFLAG_CASEFOLD
>> + * for directories with the casefold attribute; all other inodes use
>> + * standard case-sensitive semantics.
>> + */
>> + if (IS_CASEFOLDED(inode))
>> + fa->fsx_xflags |= FS_XFLAG_CASEFOLD;
>
> Curious. Shouldn't the VFS set FS_XFLAG_CASEFOLD if the VFS casefolding
> flag is set?
>
> OTOH, there are more filesystems that apparently support casefolding
> (given the size of this patchset) than actually set S_CASEFOLD. I think
> I'm ignorant of something here...
I'm not clear if there's a review action needed. Help?
--
Chuck Lever
^ permalink raw reply
* Re: [PATCH bpf-next v7 2/9] libbpf: Add support for extended bpf syscall
From: Andrii Nakryiko @ 2026-01-23 18:52 UTC (permalink / raw)
To: Leon Hwang
Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Willem de Bruijn, Jason Xing, Tao Chen,
Mykyta Yatsenko, Kumar Kartikeya Dwivedi, Anton Protopopov,
Amery Hung, Rong Tao, LKML, Linux API,
open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
In-Reply-To: <419976da-f296-4418-8dfe-8ad50a9f8cb5@linux.dev>
On Thu, Jan 22, 2026 at 8:19 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 23/1/26 12:12, Alexei Starovoitov wrote:
> > On Thu, Jan 22, 2026 at 8:07 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >>
> >>
> >> On 23/1/26 11:55, Alexei Starovoitov wrote:
> >>> On Thu, Jan 22, 2026 at 7:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>>>
> >>>>
> >>>> +static int probe_bpf_syscall_common_attrs(int token_fd)
> >>>> +{
> >>>> + int ret;
> >>>> +
> >>>> + ret = probe_sys_bpf_ext();
> >>>> + return ret > 0;
> >>>> +}
> >>>
> >>> When you look at the above, what thoughts come to mind?
> >>>
> >>> ... and please don't use ai for answers.
> >>
> >> My initial thought was whether probe_fd() is needed here to handle and
> >> close a returned fd, since the return value of probe_sys_bpf_ext() isn’t
> >> obvious from the call site.
Have you looked at how probes are called (in feat_supported()?) They
all follow the same contract: > 0 (normally just 1) means feature is
supported, 0 means feature is not supported, and <0 means something
went wrong. Libbpf will log an error and will assume feature is not
supported.
probe_sys_bpf_ext() should either follow that convention or drop the
probe_ prefix altogether to avoid confusion. And then
probe_bpf_syscall_common_attrs() is necessary only as a wrapper around
probe_sys_bpf_ext() to ignore mandatory (but unused) token_fd argument
(so to make it "pluggable" into feat_supported() framework).
So, just make probe_sys_bpf_ext() follow probe contract as described,
and then just:
static int probe_bpf_syscall_common_attr(int token_fd)
{
return probe_sys_bpf_ext();
}
Alternatively, just make probe_sys_bpf_ext() take token_fd (but ignore
it), and just use probe_sys_bpf_ext() directly for feat_supported().
probe_fd() is not suitable here because it's for a common case when we
expect syscall to succeed and create fd, in which case that successful
fd represents successful feature detection. This is not the case here,
so probe_fd() is not what you should use.
> >
> > Fair enough, but then collapse it into one helper if FD is a concern.
> > My question was about stylistic/taste preferences.
>
> Understood, thanks for the clarification.
>
> I’ll rework it with the stylistic preference in mind.
>
> Thanks,
> Leon
>
^ permalink raw reply
* Re: [RESEND PATCH bpf-next v6 2/9] libbpf: Add support for extended bpf syscall
From: Andrii Nakryiko @ 2026-01-23 18:54 UTC (permalink / raw)
To: Leon Hwang
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Willem de Bruijn, Jason Xing, Tao Chen,
Mykyta Yatsenko, Kumar Kartikeya Dwivedi, Anton Protopopov,
Amery Hung, Rong Tao, linux-kernel, linux-api, linux-kselftest,
kernel-patches-bot
In-Reply-To: <d8f37588-2b7d-447a-ae4f-dc81e1b573c5@linux.dev>
On Thu, Jan 22, 2026 at 5:41 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 23/1/26 08:53, Andrii Nakryiko wrote:
> > On Tue, Jan 20, 2026 at 7:26 AM Leon Hwang <leon.hwang@linux.dev> wrote:
> >>
> >> To support the extended BPF syscall introduced in the previous commit,
> >> introduce the following internal APIs:
> >>
> >> * 'sys_bpf_ext()'
> >> * 'sys_bpf_ext_fd()'
> >> They wrap the raw 'syscall()' interface to support passing extended
> >> attributes.
> >> * 'probe_sys_bpf_ext()'
> >> Check whether current kernel supports the BPF syscall common attributes.
> >>
> >> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> >> ---
> >> tools/lib/bpf/bpf.c | 32 ++++++++++++++++++++++++++++++++
> >> tools/lib/bpf/features.c | 8 ++++++++
> >> tools/lib/bpf/libbpf_internal.h | 3 +++
> >> 3 files changed, 43 insertions(+)
> >>
> >> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> >> index 21b57a629916..ed9c6eaeb656 100644
> >> --- a/tools/lib/bpf/bpf.c
> >> +++ b/tools/lib/bpf/bpf.c
> >> @@ -69,6 +69,38 @@ static inline __u64 ptr_to_u64(const void *ptr)
> >> return (__u64) (unsigned long) ptr;
> >> }
> >>
> >> +static inline int sys_bpf_ext(enum bpf_cmd cmd, union bpf_attr *attr,
> >> + unsigned int size,
> >> + struct bpf_common_attr *attr_common,
> >> + unsigned int size_common)
> >> +{
> >> + cmd = attr_common ? (cmd | BPF_COMMON_ATTRS) : (cmd & ~BPF_COMMON_ATTRS);
> >> + return syscall(__NR_bpf, cmd, attr, size, attr_common, size_common);
> >> +}
> >> +
> >> +static inline int sys_bpf_ext_fd(enum bpf_cmd cmd, union bpf_attr *attr,
> >> + unsigned int size,
> >> + struct bpf_common_attr *attr_common,
> >> + unsigned int size_common)
> >> +{
> >> + int fd;
> >> +
> >> + fd = sys_bpf_ext(cmd, attr, size, attr_common, size_common);
> >> + return ensure_good_fd(fd);
> >> +}
> >> +
> >> +int probe_sys_bpf_ext(void)
> >> +{
> >> + const size_t attr_sz = offsetofend(union bpf_attr, prog_token_fd);
> >> + union bpf_attr attr;
> >> +
> >> + memset(&attr, 0, attr_sz);
> >> + /* This syscall() will return error always. */
> >
> > I'll cite myself from the last review:
> >
> >> But fd should really not be >= 0, and if it is -- it's some problem,
> >> so I'd return an error in that case to keep us aware, which is why I'm
> >> saying I'd just return inside if (fd >= 0) { }
> >
> > I didn't say let's just ignore syscall return with (void) cast and
> > happily check errno no matter what, did I? Drop the comment, and
> > handle fd >= 0 case explicitly, please.
> >
>
> My mistake — sorry for the misunderstanding.
>
> You’re right; the return value should not be ignored. In the next
> revision, I’ll handle the fd >= 0 case explicitly and drop the comment.
> The logic will be updated along the lines of:
>
> fd = syscall(__NR_bpf, BPF_PROG_LOAD | BPF_COMMON_ATTRS,
> &attr, attr_sz, NULL, sizeof(struct bpf_common_attr));
> if (fd >= 0) {
> close(fd);
> return 0;
> }
> return errno == EFAULT;
>
well no, it should be
fd = syscall(...);
if (fd >= 0) {
close(fd);
return -EINVAL;
}
return errno == EFAULT ? 1 : 0;
> Thanks,
> Leon
>
>
^ permalink raw reply
* Re: [PATCH 0/2] mount: add OPEN_TREE_NAMESPACE
From: Askar Safin @ 2026-01-24 10:13 UTC (permalink / raw)
To: Christian Brauner
Cc: Andy Lutomirski, Jeff Layton, amir73il, cyphar, jack, josef,
linux-fsdevel, viro, Lennart Poettering, David Howells,
Yunkai Zhang, cgel.zte, Menglong Dong, linux-kernel, initramfs,
containers, linux-api, news, lwn, Jonathan Corbet, Rob Landley,
Christoph Hellwig
In-Reply-To: <20260123-autofrei-einspannen-7e65a6100e6e@brauner>
On Fri, Jan 23, 2026 at 1:23 PM Christian Brauner <brauner@kernel.org> wrote:
> The current patchset makes nullfs unconditional. As each mount
Oops, I missed that "fs: use nullfs unconditionally as the real
rootfs" is present in vfs.all.
--
Askar Safin
^ permalink raw reply
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: The 8472 @ 2026-01-24 19:34 UTC (permalink / raw)
To: Zack Weinberg, Alejandro Colomar
Cc: Vincent Lefevre, Jan Kara, Alexander Viro, Christian Brauner,
Rich Felker, linux-fsdevel, linux-api, GNU libc development
In-Reply-To: <1ec25e49-841e-4b04-911d-66e3b9ff4471@app.fastmail.com>
On 23/01/2026 01:33, Zack Weinberg wrote:
[...]
> ERRORS
> EBADF The fd argument was not a valid, open file descriptor.
Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
on close[0], that makes it more difficult to reliably detect bugs relating
to double-closes of file descriptors.
[...]
> Delayed errors reported by close()
>
> In a variety of situations, most notably when writing to a file
> that is hosted on a network file server, write(2) operations may
> “optimistically” return successfully as soon as the write has
> been queued for processing.
>
> close(2) waits for confirmation that *most* of the processing
> for previous writes to a file has been completed, and reports
> any errors that the earlier write() calls *would have* reported,
> if they hadn’t returned optimistically. Especially, close()
> will report “disk full” (ENOSPC) and “disk quota exceeded”
> (EDQUOT) errors that write() didn’t wait for.
>
> (To wait for *all* processing to complete, it is necessary to
> use fsync(2) as well.)
>
> Because of these delayed errors, it’s important to check the
> return value of close() and handle any errors it reports.
> Ignoring delayed errors can cause silent loss of data.
>
> However, when handling delayed errors, keep in mind that the
> close() call should *not* be repeated. When close() has a
> delayed error to report, it still closes the file before
> returning. The file descriptor number might already have been
> reused for some other file, especially in multithreaded
> programs. To make another attempt at the failed writes, it’s
> necessary to reopen the file and start all over again.
>
> [QUERY: Do delayed errors ever happen in any of these situations?
>
> - The fd is not the last reference to the open file description
>
> - The OFD was opened with O_RDONLY
>
> - The OFD was opened with O_RDWR but has never actually
> been written to
>
> - No data has been written to the OFD since the last call to
> fsync() for that OFD
>
> - No data has been written to the OFD since the last call to
> fdatasync() for that OFD
>
> If we can give some guidance about when people don’t need to
> worry about delayed errors, it would be helpful.]
>
The Rust standard library team is also interested in this topic, there
is lively discussion[1] whether it makes sense to surface errors from
close at all. Our current default is to ignore them.
It is my understanding that errors may not have happened yet at
the time of close due to delayed writeback or additional descriptors
pointing to the description, e.g. in a forked child, and thus
close() is not a reliable mechanism for error detection and
fsync() is the only available option.
Some users do care specifically about the unusual behavior
on NFS, and don't want to use a heavy hammer like fsync. It's unfortunate
that there's no middle ground to get errors on an open file descriptor
or initiate the NFS flush behavior without a full fsync.
[0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/
[1] https://github.com/rust-lang/libs-team/issues/705
^ permalink raw reply
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Rich Felker @ 2026-01-24 21:39 UTC (permalink / raw)
To: The 8472
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
In-Reply-To: <0f60995f-370f-4c2d-aaa6-731716657f9d@infinite-source.de>
On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
> On 23/01/2026 01:33, Zack Weinberg wrote:
>
> [...]
>
> > ERRORS
> > EBADF The fd argument was not a valid, open file descriptor.
>
> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
> on close[0], that makes it more difficult to reliably detect bugs relating
> to double-closes of file descriptors.
Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
it? I wonder if that could even have security implications. I think
you could detect these fraudulent EBADFs (albeit not under conditions
where there's a race bug) by performing fcntl/F_GETFD before close and
knowing the EBADF from close is fake is fcntl didn't EBADF, but that
seems like an unreasonable cost to work around FUSE behaving badly.
Rich
^ permalink raw reply
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: The 8472 @ 2026-01-24 21:57 UTC (permalink / raw)
To: Rich Felker
Cc: Zack Weinberg, Alejandro Colomar, Vincent Lefevre, Jan Kara,
Alexander Viro, Christian Brauner, linux-fsdevel, linux-api,
GNU libc development
In-Reply-To: <20260124213934.GI6263@brightrain.aerifal.cx>
On 24/01/2026 22:39, Rich Felker wrote:
> On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
>> On 23/01/2026 01:33, Zack Weinberg wrote:
>>
>> [...]
>>
>>> ERRORS
>>> EBADF The fd argument was not a valid, open file descriptor.
>>
>> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
>> on close[0], that makes it more difficult to reliably detect bugs relating
>> to double-closes of file descriptors.
>
> Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
> it?
Not when I brought it up last time, no[0]
> I wonder if that could even have security implications. I think
> you could detect these fraudulent EBADFs (albeit not under conditions
> where there's a race bug) by performing fcntl/F_GETFD before close and
> knowing the EBADF from close is fake is fcntl didn't EBADF, but that
> seems like an unreasonable cost to work around FUSE behaving badly.
>
> Rich
That's pretty much the workaround[1] we use, but due to the extra syscall it's
only done in debug builds.
[0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/
[1] https://github.com/rust-lang/rust/blob/021fc25b7a48f6051bee1e1f06c7a277e4de1cc9/library/std/src/sys/fs/unix.rs#L981-L999
^ permalink raw reply
* Re: [RFC v1] man/man2/close.2: CAVEATS: Document divergence from POSIX.1-2024
From: Zack Weinberg @ 2026-01-25 15:37 UTC (permalink / raw)
To: The 8472, Rich Felker
Cc: Alejandro Colomar, Vincent Lefevre, Jan Kara, Alexander Viro,
Christian Brauner, linux-fsdevel, linux-api, GNU libc development
In-Reply-To: <7654b75b-6697-4aad-93fc-29fa9b734bdb@infinite-source.de>
On Sat, Jan 24, 2026, at 4:57 PM, The 8472 wrote:
> On 24/01/2026 22:39, Rich Felker wrote:
>> On Sat, Jan 24, 2026 at 08:34:01PM +0100, The 8472 wrote:
>>> On 23/01/2026 01:33, Zack Weinberg wrote:
>>>
>>> [...]
>>>
>>>> ERRORS
>>>> EBADF The fd argument was not a valid, open file descriptor.
>>>
>>> Unfortunately EBADF from FUSE is passed through unfiltered by the kernel
>>> on close[0], that makes it more difficult to reliably detect bugs relating
>>> to double-closes of file descriptors.
>>
>> Wow, that's a nasty bug. Are the kernel folks not amenable to fixing
>> it?
>
> Not when I brought it up last time, no[0]
>
> [0] https://lore.kernel.org/linux-fsdevel/1b946a20-5e8a-497e-96ef-f7b1e037edcb@infinite-source.de/
It seems to me that Antonio Muscemi’s point is valid for *most* errno
codes. Like, a whole lot of them exist just to give more information
*to a human user* about the cause of an unrecoverable error. Take
the list of “error codes that indicate a delayed error from a previous
write(2) operation,” from a little later in the draft, for instance:
there’s no plausible way for a *program* to react differently to
EFBIG, EDQUOT, and ENOSPC, but we expect that the *user* will want
to react differently, so we want different error messages for each,
so they’re different error codes. It’s not a problem if the kernel
produces an error code of this type that wasn’t in the official
documented list, because the program doesn’t need to treat it specially.
But EBADF is different; it has the very specific meaning “user space
passed an invalid file descriptor to a system call,” which almost
always indicates a *bug in the program*, and allowing that meaning to
be diluted is not OK. It’s getting off topic for this conversation,
but there’s a short list of other errno codes that indicate a specific
situation that the *program* should respond to in a specific way
(EAGAIN, EINTR, EINPROGRESS, EFAULT, and EPIPE are the only ones
I can think of) and maybe it would spark a more constructive
conversation on the kernel side if we presented a *comprehensive*
list of errno codes that FUSE servers shouldn’t be allowed to produce
with a specific rationale for each.
>> Delayed errors reported by close()
>>
>> In a variety of situations, most notably when writing to a file
>> that is hosted on a network file server, write(2) operations may
>> “optimistically” return successfully as soon as the write has
>> been queued for processing.
>>
>> close(2) waits for confirmation that *most* of the processing
>> for previous writes to a file has been completed, and reports
>> any errors that the earlier write() calls *would have* reported,
>> if they hadn’t returned optimistically. Especially, close()
>> will report “disk full” (ENOSPC) and “disk quota exceeded”
>> (EDQUOT) errors that write() didn’t wait for.
>
> The Rust standard library team is also interested in this topic, there
> is lively discussion[1] whether it makes sense to surface errors from
> close at all. Our current default is to ignore them.
> It is my understanding that errors may not have happened yet at
> the time of close due to delayed writeback or additional descriptors
> pointing to the description, e.g. in a forked child, and thus
> close() is not a reliable mechanism for error detection and
> fsync() is the only available option.
>
> [1] https://github.com/rust-lang/libs-team/issues/705
This is something I care about a lot as well, but I currently don’t
have an *opinion*. To form an informed opinion, I need the answers
to these questions:
>> [QUERY: Do delayed errors ever happen in any of these situations?
>>
>> - The fd is not the last reference to the open file description
>>
>> - The OFD was opened with O_RDONLY
>>
>> - The OFD was opened with O_RDWR but has never actually
>> been written to
>>
>> - No data has been written to the OFD since the last call to
>> fsync() for that OFD
>>
>> - No data has been written to the OFD since the last call to
>> fdatasync() for that OFD
>>
>> If we can give some guidance about when people don’t need to
>> worry about delayed errors, it would be helpful.]
In particular, I really hope delayed errors *aren’t* ever reported
when you close a file descriptor that *isn’t* the last reference
to its open file description, because the thread-safe way to close
stdout without losing write errors[2] depends on that not happening.
And whether the Rust stdlib can legitimately say “leaving aside the
additional cost of calling fsync(), you do not *need* the error return
from close() because you can call fsync() first,” depends on whether
it’s actually true that you *won’t* ever get a delayed error from
close() if you called fsync() first and didn’t do any more output in
between (assume the fd has no duplicates here). I would not be
surprised at all if those FUSE guys insisted on their right to make
char msg[] = "soon I will be invincible\n";
int fd = open("/test-fuse-fs/test.txt", O_WRONLY, 0666);
write(fd, msg, sizeof(msg) - 1);
fsync(fd);
close(fd);
return an error *only* from the close, not the write or the fsync.
And I also wouldn’t be surprised at all to find production NFS or
SMB servers that did that.
[2] https://stackoverflow.com/a/50865617 (third code block)
zw
^ permalink raw reply
* Re: [PATCH bpf-next v7 2/9] libbpf: Add support for extended bpf syscall
From: Leon Hwang @ 2026-01-26 2:03 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
Christian Brauner, Seth Forshee, Yuichiro Tsuji,
Andrey Albershteyn, Willem de Bruijn, Jason Xing, Tao Chen,
Mykyta Yatsenko, Kumar Kartikeya Dwivedi, Anton Protopopov,
Amery Hung, Rong Tao, LKML, Linux API,
open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot
In-Reply-To: <CAEf4BzYhhf7Jd6DDr2XVf=3gKeMMmrkWW9Sr49QxuW6QudSKig@mail.gmail.com>
On 24/1/26 02:52, Andrii Nakryiko wrote:
> On Thu, Jan 22, 2026 at 8:19 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 23/1/26 12:12, Alexei Starovoitov wrote:
>>> On Thu, Jan 22, 2026 at 8:07 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>>
>>>>
>>>>
>>>> On 23/1/26 11:55, Alexei Starovoitov wrote:
>>>>> On Thu, Jan 22, 2026 at 7:25 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>>>>>
>>>>>>
>>>>>> +static int probe_bpf_syscall_common_attrs(int token_fd)
>>>>>> +{
>>>>>> + int ret;
>>>>>> +
>>>>>> + ret = probe_sys_bpf_ext();
>>>>>> + return ret > 0;
>>>>>> +}
>>>>>
>>>>> When you look at the above, what thoughts come to mind?
>>>>>
>>>>> ... and please don't use ai for answers.
>>>>
>>>> My initial thought was whether probe_fd() is needed here to handle and
>>>> close a returned fd, since the return value of probe_sys_bpf_ext() isn’t
>>>> obvious from the call site.
>
> Have you looked at how probes are called (in feat_supported()?) They
> all follow the same contract: > 0 (normally just 1) means feature is
> supported, 0 means feature is not supported, and <0 means something
> went wrong. Libbpf will log an error and will assume feature is not
> supported.
>
I’ve looked at feat_supported().
Even though I was aware of the probe contract, I should have thought it
through more carefully in the context of feat_supported() and
probe_sys_bpf_ext(). With that in mind, your suggestion makes sense now.
> probe_sys_bpf_ext() should either follow that convention or drop the
> probe_ prefix altogether to avoid confusion. And then
> probe_bpf_syscall_common_attrs() is necessary only as a wrapper around
> probe_sys_bpf_ext() to ignore mandatory (but unused) token_fd argument
> (so to make it "pluggable" into feat_supported() framework).
>
> So, just make probe_sys_bpf_ext() follow probe contract as described,
> and then just:
>
> static int probe_bpf_syscall_common_attr(int token_fd)
> {
> return probe_sys_bpf_ext();
> }
>
I’ll make probe_sys_bpf_ext() follow the standard probe convention, and
keep probe_bpf_syscall_common_attrs() as a thin wrapper to ignore the
mandatory (but unused) token_fd argument, so it plugs cleanly into
feat_supported() framework.
> Alternatively, just make probe_sys_bpf_ext() take token_fd (but ignore
> it), and just use probe_sys_bpf_ext() directly for feat_supported().
>
>
> probe_fd() is not suitable here because it's for a common case when we
> expect syscall to succeed and create fd, in which case that successful
> fd represents successful feature detection. This is not the case here,
> so probe_fd() is not what you should use.
>
Agreed as well that probe_fd() is not suitable here, since this probe is
not expected to return a successful FD.
Thanks for the detailed explanation.
Thanks,
Leon
>>>
>>> Fair enough, but then collapse it into one helper if FD is a concern.
>>> My question was about stylistic/taste preferences.
>>
>> Understood, thanks for the clarification.
>>
>> I’ll rework it with the stylistic preference in mind.
>>
>> Thanks,
>> Leon
>>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox