Linux Security Modules development

* Re: [PATCH bpf-next v3 1/2] bpf: add bpf_init_inode_xattr kfunc for atomic inode labeling
From: David Windsor @ 2026-06-18 22:50 UTC
  To: bot+bpf-ci
  Cc: viro, brauner, jack, ast, daniel, john.fastabend, andrii, eddyz87,
	memxor, martin.lau, song, yonghong.song, jolsa, emil, kpsingh,
	mattbobrowski, paul, jmorris, serge, zohar, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, stephen.smalley.work, omosnace,
	casey, shuah, linux-kernel, linux-fsdevel, bpf,
	linux-security-module, linux-integrity, selinux, linux-kselftest,
	martin.lau, clm, ihor.solodrai
In-Reply-To: <e9ec622a8a49be299cbbfd459e2fc10693bcd65c6dbf00c9e2645281564e5579@mail.kernel.org>

On Thu, Jun 18, 2026 at 5:22 PM <bot+bpf-ci@kernel.org> wrote:
>
> [ ... ]
> >
> > +struct xattr_ctx {
> > +     struct xattr *xattrs;
> > +     int *xattr_count;
> > +};
> > +
>
> Paul Moore suggested in v2 renaming this to 'struct lsm_xattrs' to avoid
> overloading the term 'ctx' in the LSM space. The struct was renamed from
> 'lsm_xattr_ctx' to 'xattr_ctx' in v3, but 'ctx' is still in the name. Was
> there a reason to keep 'ctx' in the name, or would renaming to 'lsm_xattrs'
> as Paul suggested be preferable?
>

Indeed he did; I mistakenly inverted his original request.

Will send a fix for this, as well as converting GFP_KERNEL to GFP_NOFS
in the allocation in __bpf_init_inode_xattr.
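
For reference, here is a minimal userspace model of the rename. It assumes that only the struct name changes and that the members stay as in v3. `struct xattr` is stubbed with the kernel's three fields. The slot helper copies the v3 lsm_get_xattr_slot() logic. The GFP_NOFS change has no userspace equivalent, so it is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct xattr. */
struct xattr {
	const char *name;
	void *value;
	size_t value_len;
};

/*
 * Hypothetical post-rename shape ('struct lsm_xattrs', as suggested in
 * v2). The members are unchanged from v3's 'struct xattr_ctx'.
 */
struct lsm_xattrs {
	struct xattr *xattrs;
	int *xattr_count;
};

/*
 * Same logic as lsm_get_xattr_slot() in the v3 patch: return the next
 * free slot and bump the shared count, or NULL if state is missing.
 * Like the original, it does no bounds checking. In the kernel, the
 * array is sized from the sum of every LSM's lbs_xattr_count.
 */
static struct xattr *lsm_get_xattr_slot(struct lsm_xattrs *ctx)
{
	if (!ctx || !ctx->xattrs || !ctx->xattr_count)
		return NULL;
	return &ctx->xattrs[(*ctx->xattr_count)++];
}
```

Two calls on a four-slot array return &slots[0] and then &slots[1], leaving the count at 2.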


> >  /* Default (no) options for the capable function */
> >  #define CAP_OPT_NONE 0x0
> >  /* If capable should audit the security request */
>
> [ ... ]
>
> > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > index 564071a92d7d..86a8e188b900 100644
> > --- a/kernel/bpf/bpf_lsm.c
> > +++ b/kernel/bpf/bpf_lsm.c
> > @@ -113,6 +113,9 @@ void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
> >  }
> >  #endif
> >
> > +BTF_ID_LIST_SINGLE(bpf_lsm_inode_init_security_btf_ids, func,
> > +                bpf_lsm_inode_init_security)
> > +
> >  int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
> >                       const struct bpf_prog *prog)
> >  {
> > @@ -137,6 +140,12 @@ int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
> >               return -EINVAL;
> >       }
> >
> > +     /* bpf reserves a fixed number of xattr slots for itself.
> > +      * Set the attach limit so the trampoline rejects excess attaches.
> > +      */
>
> This isn't a bug, but a subsystem pattern flags this as potentially
> concerning: the BPF subsystem comment-style rule for files under kernel/bpf/
> requires multi-line comments to have the opening /* on its own line, with
> comment text starting on the next line. The preferred form would be:
>
>         /*
>          * bpf reserves a fixed number of xattr slots for itself.
>          * Set the attach limit so the trampoline rejects excess attaches.
>          */
>
> > +     if (btf_id == bpf_lsm_inode_init_security_btf_ids[0])
> > +             prog->aux->attach_limit = BPF_LSM_INODE_INIT_XATTRS;
> > +
> >       return 0;
> >  }
> >
>
> [ ... ]
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27788616397

* Re: [PATCH] KEYS: avoid filesystem reclaim while holding keyring->sem
From: Eric Biggers @ 2026-06-18 22:32 UTC
  To: Mohammed EL Kadiri
  Cc: dhowells, jarkko, paul, jmorris, serge, keyrings,
	linux-security-module, linux-kernel, stable, syzkaller-bugs,
	syzbot+f55b043dacf43776b50c
In-Reply-To: <20260614150041.21172-1-med08elkadiri@gmail.com>

On Sun, Jun 14, 2026 at 04:00:41PM +0100, Mohammed EL Kadiri wrote:
> __key_link_begin() runs with keyring->sem held and calls
> assoc_array_insert(), which does GFP_KERNEL allocations.  Those
> allocations may enter filesystem reclaim, evict an fscrypt-protected
> inode, and reach keyring_clear() via fscrypt_put_master_key() --
> taking a keyring semaphore of the same lockdep class and closing a
> keyring->sem -> fs_reclaim -> keyring->sem cycle reported by syzbot.
> 
> Wrap the assoc_array_insert() call with memalloc_nofs_save() /
> memalloc_nofs_restore() so reclaim cannot recurse into the keys
> subsystem while keyring->sem is held.
> 
> Reported-by: syzbot+f55b043dacf43776b50c@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=f55b043dacf43776b50c
> Fixes: d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key")
> Cc: stable@vger.kernel.org
> Signed-off-by: Mohammed EL Kadiri <med08elkadiri@gmail.com>

My patch "fscrypt: Replace mk_users keyring with simple list"
(https://lore.kernel.org/linux-fscrypt/20260618221921.87896-1-ebiggers@kernel.org/)
fixes this lockdep false positive by making fscrypt no longer use
'struct key' keyrings to keep track of user claims to fscrypt master
keys.  That eliminates the need to clear such keyrings during filesystem
reclaim.

So this patch to security/keys/keyring.c isn't needed, unless there's
another reason for it.

- Eric
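
The quoted patch relies on the scoped memalloc_nofs_save()/memalloc_nofs_restore() API. That API marks the task rather than individual call sites, so every allocation inside the scope has __GFP_FS stripped, including the GFP_KERNEL allocations in assoc_array_insert(). As a result, reclaim cannot re-enter filesystem code such as inode eviction. The sketch below is a minimal userspace model of that behavior, not kernel code. The names and flag values are hypothetical stand-ins for PF_MEMALLOC_NOFS, __GFP_FS and current_gfp_context().

```c
#include <assert.h>

/* Hypothetical stand-ins; the real values live in the kernel headers. */
#define MODEL_PF_MEMALLOC_NOFS 0x1u
#define MODEL_GFP_FS           0x80u
#define MODEL_GFP_KERNEL       (MODEL_GFP_FS | 0x40u) /* FS plus a stand-in IO bit */

/* Stands in for current->flags: per task, so per thread here. */
static _Thread_local unsigned int task_flags;

/* Like memalloc_nofs_save(): set the flag, return its previous state. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_PF_MEMALLOC_NOFS;

	task_flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Like memalloc_nofs_restore(): restore exactly the saved state, so
 * nested scopes unwind correctly. */
static void model_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* Like current_gfp_context(): inside a NOFS scope, even a GFP_KERNEL
 * request loses the FS bit before it reaches reclaim. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (task_flags & MODEL_PF_MEMALLOC_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}
```

Because restore puts back the saved bit instead of clearing it, an inner save/restore pair nested inside an outer one leaves the outer NOFS scope in force.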

* Re: [PATCH bpf-next v3 1/2] bpf: add bpf_init_inode_xattr kfunc for atomic inode labeling
From: bot+bpf-ci @ 2026-06-18 21:22 UTC
  To: dwindsor, viro, brauner, jack, ast, daniel, john.fastabend,
	andrii, eddyz87, memxor, martin.lau, song, yonghong.song, jolsa,
	emil, kpsingh, mattbobrowski, paul, jmorris, serge, zohar,
	roberto.sassu, dmitry.kasatkin, eric.snowberg,
	stephen.smalley.work, omosnace, casey, shuah
  Cc: linux-kernel, linux-fsdevel, bpf, linux-security-module,
	linux-integrity, selinux, linux-kselftest, dwindsor, ast, andrii,
	daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260618203411.73917-2-dwindsor@gmail.com>

> diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
> index 768aca2dc0f0..7abc3f3d1a67 100644
> --- a/fs/bpf_fs_kfuncs.c
> +++ b/fs/bpf_fs_kfuncs.c

[ ... ]

> diff --git a/include/linux/security.h b/include/linux/security.h
> index 153e9043058f..1f8e84e7dd7e 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -68,6 +68,11 @@ struct watch;
>  struct watch_notification;
>  struct lsm_ctx;
>
> +struct xattr_ctx {
> +	struct xattr *xattrs;
> +	int *xattr_count;
> +};
> +

Paul Moore suggested in v2 renaming this to 'struct lsm_xattrs' to avoid
overloading the term 'ctx' in the LSM space. The struct was renamed from
'lsm_xattr_ctx' to 'xattr_ctx' in v3, but 'ctx' is still in the name. Was
there a reason to keep 'ctx' in the name, or would renaming to 'lsm_xattrs'
as Paul suggested be preferable?

>  /* Default (no) options for the capable function */
>  #define CAP_OPT_NONE 0x0
>  /* If capable should audit the security request */

[ ... ]

> diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> index 564071a92d7d..86a8e188b900 100644
> --- a/kernel/bpf/bpf_lsm.c
> +++ b/kernel/bpf/bpf_lsm.c
> @@ -113,6 +113,9 @@ void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
>  }
>  #endif
>
> +BTF_ID_LIST_SINGLE(bpf_lsm_inode_init_security_btf_ids, func,
> +		   bpf_lsm_inode_init_security)
> +
>  int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
>  			const struct bpf_prog *prog)
>  {
> @@ -137,6 +140,12 @@ int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
>  		return -EINVAL;
>  	}
>
> +	/* bpf reserves a fixed number of xattr slots for itself.
> +	 * Set the attach limit so the trampoline rejects excess attaches.
> +	 */

This isn't a bug, but a subsystem pattern flags this as potentially
concerning: the BPF subsystem comment-style rule for files under kernel/bpf/
requires multi-line comments to have the opening /* on its own line, with
comment text starting on the next line. The preferred form would be:

	/*
	 * bpf reserves a fixed number of xattr slots for itself.
	 * Set the attach limit so the trampoline rejects excess attaches.
	 */

> +	if (btf_id == bpf_lsm_inode_init_security_btf_ids[0])
> +		prog->aux->attach_limit = BPF_LSM_INODE_INIT_XATTRS;
> +
>  	return 0;
>  }
>

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27788616397

* [PATCH bpf-next v3 2/2] selftests/bpf: add tests for bpf_init_inode_xattr kfunc
From: David Windsor @ 2026-06-18 20:34 UTC
  To: viro, brauner, jack, ast, daniel, john.fastabend, andrii, eddyz87,
	memxor, martin.lau, song, yonghong.song, jolsa, emil, kpsingh,
	mattbobrowski, paul, jmorris, serge, zohar, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, stephen.smalley.work, omosnace,
	casey, shuah
  Cc: linux-kernel, linux-fsdevel, bpf, linux-security-module,
	linux-integrity, selinux, linux-kselftest, David Windsor
In-Reply-To: <20260618203411.73917-1-dwindsor@gmail.com>

Test atomic inode xattr labeling by BPF LSM programs attached to
inode_init_security, and the cap on programs attached to that hook.

Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 tools/testing/selftests/bpf/bpf_kfuncs.h      |   5 +
 .../selftests/bpf/prog_tests/fs_kfuncs.c      | 105 +++++++++++++++++-
 .../bpf/progs/test_init_inode_xattr.c         |  31 ++++++
 3 files changed, 140 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_init_inode_xattr.c
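
The init_inode_xattr_attach_cap subtest exercises the trampoline check added in patch 1/2. It expects four attaches to succeed, the fifth to fail with -E2BIG, and a detach to free a slot. The sketch below is a hypothetical userspace model of that sequence. It covers only the attach_limit comparison and ignores the global BPF_MAX_TRAMP_LINKS limit and everything else the real trampoline does.

```c
#include <assert.h>
#include <errno.h>

/* Mirrors BPF_LSM_INODE_INIT_XATTRS from include/linux/bpf_lsm.h. */
#define MODEL_BPF_LSM_INODE_INIT_XATTRS 4

/* Hypothetical stand-in for one trampoline's progs_cnt[kind]. */
struct model_tramp {
	int progs_cnt;
};

/* Mirrors the added check in bpf_trampoline_add_prog():
 * attach_limit == 0 means unlimited. */
static int model_attach(struct model_tramp *tr, unsigned int attach_limit)
{
	if (attach_limit && (unsigned int)tr->progs_cnt >= attach_limit)
		return -E2BIG;
	tr->progs_cnt++;
	return 0;
}

/* Detaching a link frees its slot on the trampoline. */
static void model_detach(struct model_tramp *tr)
{
	if (tr->progs_cnt > 0)
		tr->progs_cnt--;
}
```

Because the check compares against the number of programs already attached, the limit takes effect at attach time rather than when the hook runs.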

diff --git a/tools/testing/selftests/bpf/bpf_kfuncs.h b/tools/testing/selftests/bpf/bpf_kfuncs.h
index ae71e9b69051..69d3641ee2d8 100644
--- a/tools/testing/selftests/bpf/bpf_kfuncs.h
+++ b/tools/testing/selftests/bpf/bpf_kfuncs.h
@@ -92,4 +92,9 @@ extern int bpf_set_dentry_xattr(struct dentry *dentry, const char *name__str,
 				const struct bpf_dynptr *value_p, int flags) __ksym __weak;
 extern int bpf_remove_dentry_xattr(struct dentry *dentry, const char *name__str) __ksym __weak;
 
+struct xattr_ctx;
+extern int bpf_init_inode_xattr(struct xattr_ctx *xattr_ctx,
+				const char *name__str,
+				const struct bpf_dynptr *value_p) __ksym __weak;
+
 #endif
diff --git a/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c b/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c
index 43a26ec69a8e..0898898fb125 100644
--- a/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c
+++ b/tools/testing/selftests/bpf/prog_tests/fs_kfuncs.c
@@ -9,9 +9,10 @@
 #include <test_progs.h>
 #include "test_get_xattr.skel.h"
 #include "test_set_remove_xattr.skel.h"
+#include "test_init_inode_xattr.skel.h"
 #include "test_fsverity.skel.h"
 
-static const char testfile[] = "/tmp/test_progs_fs_kfuncs";
+static const char testfile[] = "/tmp/labelme";
 
 static void test_get_xattr(const char *name, const char *value, bool allow_access)
 {
@@ -268,6 +269,102 @@ static void test_fsverity(void)
 	remove(testfile);
 }
 
+static void test_init_inode_xattr(void)
+{
+	struct test_init_inode_xattr *skel = NULL;
+	int fd = -1, err;
+	char value_out[64];
+	const char *testfile_new = "/tmp/test_progs_fs_kfuncs_new";
+
+	skel = test_init_inode_xattr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_init_inode_xattr__open_and_load"))
+		return;
+
+	skel->bss->monitored_pid = getpid();
+	err = test_init_inode_xattr__attach(skel);
+	if (!ASSERT_OK(err, "test_init_inode_xattr__attach"))
+		goto out;
+
+	/* Trigger inode_init_security */
+	fd = open(testfile_new, O_CREAT | O_RDWR, 0644);
+	if (!ASSERT_GE(fd, 0, "create_file"))
+		goto out;
+
+	ASSERT_EQ(skel->data->init_result, 0, "init_result");
+
+	/* initxattrs prepends "security." to the name. */
+	err = getxattr(testfile_new, "security.bpf.test_label", value_out,
+		       sizeof(value_out));
+	if (err < 0 && errno == ENODATA) {
+		printf("%s:SKIP:filesystem did not apply LSM xattrs\n",
+		       __func__);
+		test__skip();
+		goto out;
+	}
+	if (!ASSERT_GE(err, 0, "getxattr"))
+		goto out;
+
+	ASSERT_EQ(err, (int)sizeof(skel->data->xattr_value), "xattr_size");
+	ASSERT_EQ(strncmp(value_out, "unconfined_u:object_r:user_home_t:s0",
+			  sizeof("unconfined_u:object_r:user_home_t:s0")), 0,
+		  "xattr_value");
+
+out:
+	close(fd);
+	test_init_inode_xattr__destroy(skel);
+	remove(testfile_new);
+}
+
+/* Keep in sync with BPF_LSM_INODE_INIT_XATTRS in include/linux/bpf_lsm.h. */
+#define INIT_INODE_XATTR_MAX 4
+
+/* At most INIT_INODE_XATTR_MAX programs can attach to inode_init_security. */
+static void test_init_inode_xattr_attach_cap(void)
+{
+	struct test_init_inode_xattr *skel[INIT_INODE_XATTR_MAX + 1] = {};
+	struct bpf_link *link[INIT_INODE_XATTR_MAX + 1] = {};
+	struct bpf_link *extra = NULL;
+	int i, err;
+
+	/* Fill all available xattr slots */
+	for (i = 0; i < INIT_INODE_XATTR_MAX; i++) {
+		skel[i] = test_init_inode_xattr__open_and_load();
+		if (!ASSERT_OK_PTR(skel[i], "open_and_load"))
+			goto out;
+
+		link[i] = bpf_program__attach_lsm(skel[i]->progs.test_init_inode_xattr);
+		if (!ASSERT_OK_PTR(link[i], "attach_within_cap"))
+			goto out;
+	}
+
+	skel[INIT_INODE_XATTR_MAX] = test_init_inode_xattr__open_and_load();
+	if (!ASSERT_OK_PTR(skel[INIT_INODE_XATTR_MAX], "open_and_load_extra"))
+		goto out;
+
+	/* New additions fail with -E2BIG */
+	extra = bpf_program__attach_lsm(skel[INIT_INODE_XATTR_MAX]->progs.test_init_inode_xattr);
+	err = -errno;
+	if (!ASSERT_ERR_PTR(extra, "attach_over_cap_should_fail")) {
+		bpf_link__destroy(extra);
+		goto out;
+	}
+	ASSERT_EQ(err, -E2BIG, "attach_over_cap_errno");
+
+	bpf_link__destroy(link[0]);
+	link[0] = NULL; /* avoid double free in cleanup */
+
+	/* Freeing a slot lets the extra program attach */
+	extra = bpf_program__attach_lsm(skel[INIT_INODE_XATTR_MAX]->progs.test_init_inode_xattr);
+	ASSERT_OK_PTR(extra, "attach_after_detach");
+
+out:
+	bpf_link__destroy(extra);
+	for (i = 0; i <= INIT_INODE_XATTR_MAX; i++) {
+		bpf_link__destroy(link[i]);
+		test_init_inode_xattr__destroy(skel[i]);
+	}
+}
+
 void test_fs_kfuncs(void)
 {
 	/* Matches xattr_names in progs/test_get_xattr.c */
@@ -286,6 +383,12 @@ void test_fs_kfuncs(void)
 	if (test__start_subtest("set_remove_xattr"))
 		test_set_remove_xattr();
 
+	if (test__start_subtest("init_inode_xattr"))
+		test_init_inode_xattr();
+
+	if (test__start_subtest("init_inode_xattr_attach_cap"))
+		test_init_inode_xattr_attach_cap();
+
 	if (test__start_subtest("fsverity"))
 		test_fsverity();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_init_inode_xattr.c b/tools/testing/selftests/bpf/progs/test_init_inode_xattr.c
new file mode 100644
index 000000000000..6f0e8b02ff88
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_init_inode_xattr.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Cisco Systems, Inc. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+#include "bpf_kfuncs.h"
+
+char _license[] SEC("license") = "GPL";
+
+__u32 monitored_pid;
+int init_result = -1;
+
+static const char xattr_name[] = "bpf.test_label";
+char xattr_value[] = "unconfined_u:object_r:user_home_t:s0";
+
+SEC("lsm.s/inode_init_security")
+int BPF_PROG(test_init_inode_xattr, struct inode *inode, struct inode *dir,
+	     const struct qstr *qstr, struct xattr_ctx *xattr_ctx)
+{
+	struct bpf_dynptr value_ptr;
+	__u32 pid;
+
+	pid = bpf_get_current_pid_tgid() >> 32;
+	if (pid != monitored_pid)
+		return 0;
+
+	bpf_dynptr_from_mem(xattr_value, sizeof(xattr_value), 0, &value_ptr);
+	init_result = bpf_init_inode_xattr(xattr_ctx, xattr_name, &value_ptr);
+
+	return 0;
+}
-- 
2.53.0


* [PATCH bpf-next v3 1/2] bpf: add bpf_init_inode_xattr kfunc for atomic inode labeling
From: David Windsor @ 2026-06-18 20:34 UTC
  To: viro, brauner, jack, ast, daniel, john.fastabend, andrii, eddyz87,
	memxor, martin.lau, song, yonghong.song, jolsa, emil, kpsingh,
	mattbobrowski, paul, jmorris, serge, zohar, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, stephen.smalley.work, omosnace,
	casey, shuah
  Cc: linux-kernel, linux-fsdevel, bpf, linux-security-module,
	linux-integrity, selinux, linux-kselftest, David Windsor
In-Reply-To: <20260618203411.73917-1-dwindsor@gmail.com>

Add bpf_init_inode_xattr() kfunc for BPF LSM programs to atomically set
xattrs via the inode_init_security hook using lsm_get_xattr_slot().

The inode_init_security hook previously took the xattr array and count
as two separate output parameters (struct xattr *xattrs, int
*xattr_count), which BPF programs cannot write to. Pass the xattr state
as a single context object (struct xattr_ctx) instead, and have
bpf_init_inode_xattr() take that context directly. Update the existing
in-tree callers of inode_init_security to take and forward the new
xattr_ctx.

A previous attempt [1] required a kmalloc string output protocol for
the xattr name. Since commit 6bcdfd2cac55 ("security: Allow all LSMs to
provide xattrs for inode_init_security hook") [2], the xattr name is no
longer allocated; it is a static constant.

Because we rely on the hook-specific ctx layout, the kfunc is
restricted to lsm/inode_init_security. Restrict the xattr names that
may be set via this kfunc to the bpf.* namespace.

Link: https://kernsec.org/pipermail/linux-security-module-archive/2022-October/034878.html [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6bcdfd2cac55 [2]
Suggested-by: Song Liu <song@kernel.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 fs/bpf_fs_kfuncs.c                | 106 +++++++++++++++++++++++++++++-
 include/linux/bpf.h               |   1 +
 include/linux/bpf_lsm.h           |   3 +
 include/linux/evm.h               |   9 +--
 include/linux/lsm_hook_defs.h     |   4 +-
 include/linux/lsm_hooks.h         |  16 ++---
 include/linux/security.h          |   5 ++
 kernel/bpf/bpf_lsm.c              |  10 +++
 kernel/bpf/trampoline.c           |   3 +
 security/bpf/hooks.c              |   1 +
 security/integrity/evm/evm_main.c |   8 ++-
 security/security.c               |   7 +-
 security/selinux/hooks.c          |   4 +-
 security/smack/smack_lsm.c        |  27 ++++----
 14 files changed, 166 insertions(+), 38 deletions(-)
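
__bpf_init_inode_xattr() validates the name and value and then packs the value and the NUL-terminated name into one allocation, pointing xattr->name into that buffer. The LSM framework frees only each slot's value after initxattrs runs, which is presumably why the name is packed into the same buffer: one free releases both. The sketch below is a minimal userspace model of the validation and packing. It uses malloc() in place of kmalloc() and hypothetical MODEL_* constants mirroring XATTR_BPF_LSM_SUFFIX ("bpf."), XATTR_NAME_MAX and XATTR_SIZE_MAX. It omits the per-inode BPF slot count and the dynptr handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_XATTR_BPF_PREFIX "bpf." /* XATTR_BPF_LSM_SUFFIX */
#define MODEL_XATTR_NAME_MAX   255    /* XATTR_NAME_MAX */
#define MODEL_XATTR_SIZE_MAX   65536  /* XATTR_SIZE_MAX */

/* Userspace stand-in for the kernel's struct xattr. */
struct xattr {
	const char *name;
	void *value;
	size_t value_len;
};

/* Validate like __bpf_init_inode_xattr(), then lay out
 * [value bytes][name bytes]['\0'] in a single buffer. */
static int model_fill_xattr(struct xattr *x, const char *name,
			    const void *value, size_t value_len)
{
	size_t name_len = strlen(name);
	char *buf;

	if (name_len == 0 || name_len > MODEL_XATTR_NAME_MAX)
		return -EINVAL;
	/* Only names in the bpf.* namespace are accepted. */
	if (strncmp(name, MODEL_XATTR_BPF_PREFIX,
		    sizeof(MODEL_XATTR_BPF_PREFIX) - 1))
		return -EPERM;
	if (value_len == 0 || value_len > MODEL_XATTR_SIZE_MAX)
		return -EINVAL;

	buf = malloc(value_len + name_len + 1);
	if (!buf)
		return -ENOMEM;
	memcpy(buf, value, value_len);
	memcpy(buf + value_len, name, name_len + 1); /* includes the NUL */

	x->value = buf;                 /* freeing value frees the name too */
	x->name = buf + value_len;
	x->value_len = value_len;
	return 0;
}
```

After a successful call, free(x.value) releases both the value and the name. A name outside the bpf.* namespace is rejected with -EPERM before anything is allocated.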

diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 768aca2dc0f0..7abc3f3d1a67 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -10,6 +10,7 @@
 #include <linux/fsnotify.h>
 #include <linux/file.h>
 #include <linux/kernfs.h>
+#include <linux/lsm_hooks.h>
 #include <linux/mm.h>
 #include <linux/xattr.h>
 
@@ -374,6 +375,97 @@ __bpf_kfunc struct inode *bpf_real_inode(struct dentry *dentry)
 	return d_real_inode(dentry);
 }
 
+static int bpf_xattrs_used(const struct xattr_ctx *ctx)
+{
+	const size_t prefix_len = sizeof(XATTR_BPF_LSM_SUFFIX) - 1;
+	int i, n = 0;
+
+	for (i = 0; i < *ctx->xattr_count; i++) {
+		const char *name = ctx->xattrs[i].name;
+
+		if (name && !strncmp(name, XATTR_BPF_LSM_SUFFIX, prefix_len))
+			n++;
+	}
+	return n;
+}
+
+static int __bpf_init_inode_xattr(struct xattr_ctx *xattr_ctx,
+				  const char *name__str,
+				  const struct bpf_dynptr *value_p)
+{
+	struct bpf_dynptr_kern *value_ptr = (struct bpf_dynptr_kern *)value_p;
+	size_t name_len;
+	void *xattr_value;
+	struct xattr *xattr;
+	struct xattr *xattrs;
+	int *xattr_count;
+	const void *value;
+	u32 value_len;
+
+	if (!xattr_ctx || !name__str)
+		return -EINVAL;
+
+	xattrs = xattr_ctx->xattrs;
+	xattr_count = xattr_ctx->xattr_count;
+	if (!xattrs || !xattr_count)
+		return -EINVAL;
+	if (bpf_xattrs_used(xattr_ctx) >= BPF_LSM_INODE_INIT_XATTRS)
+		return -ENOSPC;
+
+	name_len = strlen(name__str);
+	if (name_len == 0 || name_len > XATTR_NAME_MAX)
+		return -EINVAL;
+	if (strncmp(name__str, XATTR_BPF_LSM_SUFFIX,
+		    sizeof(XATTR_BPF_LSM_SUFFIX) - 1))
+		return -EPERM;
+
+	value_len = __bpf_dynptr_size(value_ptr);
+	if (value_len == 0 || value_len > XATTR_SIZE_MAX)
+		return -EINVAL;
+
+	value = __bpf_dynptr_data(value_ptr, value_len);
+	if (!value)
+		return -EINVAL;
+
+	/* Combine xattr value + name into one allocation. */
+	xattr_value = kmalloc(value_len + name_len + 1, GFP_KERNEL);
+	if (!xattr_value)
+		return -ENOMEM;
+
+	memcpy(xattr_value, value, value_len);
+	memcpy(xattr_value + value_len, name__str, name_len);
+	((char *)xattr_value)[value_len + name_len] = '\0';
+
+	xattr = lsm_get_xattr_slot(xattr_ctx);
+	if (!xattr) {
+		kfree(xattr_value);
+		return -ENOSPC;
+	}
+
+	xattr->value = xattr_value;
+	xattr->name = (const char *)xattr_value + value_len;
+	xattr->value_len = value_len;
+
+	return 0;
+}
+
+/**
+ * bpf_init_inode_xattr - set an xattr on a new inode from inode_init_security
+ * @xattr_ctx: inode_init_security xattr state from the hook context
+ * @name__str: xattr name (e.g., "bpf.file_label")
+ * @value_p: dynptr containing the xattr value
+ *
+ * Only callable from lsm/inode_init_security programs.
+ *
+ * Return: 0 on success, negative error on failure.
+ */
+__bpf_kfunc int bpf_init_inode_xattr(struct xattr_ctx *xattr_ctx,
+				     const char *name__str,
+				     const struct bpf_dynptr *value_p)
+{
+	return __bpf_init_inode_xattr(xattr_ctx, name__str, value_p);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_fs_kfunc_set_ids)
@@ -385,13 +477,25 @@ BTF_ID_FLAGS(func, bpf_get_file_xattr, KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_set_dentry_xattr, KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_remove_dentry_xattr, KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_real_inode, KF_SLEEPABLE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_init_inode_xattr, KF_SLEEPABLE)
 BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
 
+BTF_ID_LIST(bpf_lsm_inode_init_security_btf_ids)
+BTF_ID(func, bpf_lsm_inode_init_security)
+
+BTF_ID_LIST(bpf_init_inode_xattr_btf_ids)
+BTF_ID(func, bpf_init_inode_xattr)
+
 static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
 {
 	if (!btf_id_set8_contains(&bpf_fs_kfunc_set_ids, kfunc_id) ||
-	    prog->type == BPF_PROG_TYPE_LSM)
+	    prog->type == BPF_PROG_TYPE_LSM) {
+		/* bpf_init_inode_xattr only attaches to inode_init_security. */
+		if (kfunc_id == bpf_init_inode_xattr_btf_ids[0] &&
+		    prog->aux->attach_btf_id != bpf_lsm_inode_init_security_btf_ids[0])
+			return -EACCES;
 		return 0;
+	}
 	return -EACCES;
 }
 
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7719f6528445..f14bfcda78db 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1752,6 +1752,7 @@ struct bpf_prog_aux {
 	u32 real_func_cnt; /* includes hidden progs, only used for JIT and freeing progs */
 	u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */
 	u32 attach_btf_id; /* in-kernel BTF type id to attach to */
+	u32 attach_limit; /* max concurrent attachments (0 = unlimited) */
 	u32 attach_st_ops_member_off;
 	u32 ctx_arg_info_size;
 	u32 max_rdonly_access;
diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index 143775a27a2a..b655c708818e 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -19,6 +19,9 @@
 #include <linux/lsm_hook_defs.h>
 #undef LSM_HOOK
 
+/* max bpf xattrs per inode */
+#define BPF_LSM_INODE_INIT_XATTRS 4
+
 struct bpf_storage_blob {
 	struct bpf_local_storage __rcu *storage;
 };
diff --git a/include/linux/evm.h b/include/linux/evm.h
index 913f4573b203..0aa151288b36 100644
--- a/include/linux/evm.h
+++ b/include/linux/evm.h
@@ -12,6 +12,8 @@
 #include <linux/integrity.h>
 #include <linux/xattr.h>
 
+struct xattr_ctx;
+
 #ifdef CONFIG_EVM
 extern int evm_set_key(void *key, size_t keylen);
 extern enum integrity_status evm_verifyxattr(struct dentry *dentry,
@@ -21,8 +23,8 @@ extern enum integrity_status evm_verifyxattr(struct dentry *dentry,
 int evm_fix_hmac(struct dentry *dentry, const char *xattr_name,
 		 const char *xattr_value, size_t xattr_value_len);
 int evm_inode_init_security(struct inode *inode, struct inode *dir,
-			    const struct qstr *qstr, struct xattr *xattrs,
-			    int *xattr_count);
+			    const struct qstr *qstr,
+			    struct xattr_ctx *xattr_ctx);
 extern bool evm_revalidate_status(const char *xattr_name);
 extern int evm_protected_xattr_if_enabled(const char *req_xattr_name);
 extern int evm_read_protected_xattrs(struct dentry *dentry, u8 *buffer,
@@ -63,8 +65,7 @@ static inline int evm_fix_hmac(struct dentry *dentry, const char *xattr_name,
 
 static inline int evm_inode_init_security(struct inode *inode, struct inode *dir,
 					  const struct qstr *qstr,
-					  struct xattr *xattrs,
-					  int *xattr_count)
+					  struct xattr_ctx *xattr_ctx)
 {
 	return 0;
 }
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 65c9609ec207..f62780fbeb9e 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -116,8 +116,8 @@ LSM_HOOK(int, 0, inode_alloc_security, struct inode *inode)
 LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
 LSM_HOOK(void, LSM_RET_VOID, inode_free_security_rcu, void *inode_security)
 LSM_HOOK(int, -EOPNOTSUPP, inode_init_security, struct inode *inode,
-	 struct inode *dir, const struct qstr *qstr, struct xattr *xattrs,
-	 int *xattr_count)
+	 struct inode *dir, const struct qstr *qstr,
+	 struct xattr_ctx *xattr_ctx)
 LSM_HOOK(int, 0, inode_init_security_anon, struct inode *inode,
 	 const struct qstr *name, const struct inode *context_inode)
 LSM_HOOK(int, 0, inode_create, struct inode *dir, struct dentry *dentry,
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index b4f8cad53ddb..710e48caaeba 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -200,20 +200,18 @@ extern struct lsm_static_calls_table static_calls_table __ro_after_init;
 
 /**
  * lsm_get_xattr_slot - Return the next available slot and increment the index
- * @xattrs: array storing LSM-provided xattrs
- * @xattr_count: number of already stored xattrs (updated)
+ * @ctx: xattr state shared by inode_init_security hooks
  *
- * Retrieve the first available slot in the @xattrs array to fill with an xattr,
- * and increment @xattr_count.
+ * Retrieve the first available slot in the @ctx->xattrs array to fill with an
+ * xattr, and increment @ctx->xattr_count.
  *
- * Return: The slot to fill in @xattrs if non-NULL, NULL otherwise.
+ * Return: The slot to fill in @ctx->xattrs if non-NULL, NULL otherwise.
  */
-static inline struct xattr *lsm_get_xattr_slot(struct xattr *xattrs,
-					       int *xattr_count)
+static inline struct xattr *lsm_get_xattr_slot(struct xattr_ctx *ctx)
 {
-	if (unlikely(!xattrs))
+	if (unlikely(!ctx || !ctx->xattrs || !ctx->xattr_count))
 		return NULL;
-	return &xattrs[(*xattr_count)++];
+	return &ctx->xattrs[(*ctx->xattr_count)++];
 }
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/include/linux/security.h b/include/linux/security.h
index 153e9043058f..1f8e84e7dd7e 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -68,6 +68,11 @@ struct watch;
 struct watch_notification;
 struct lsm_ctx;
 
+struct xattr_ctx {
+	struct xattr *xattrs;
+	int *xattr_count;
+};
+
 /* Default (no) options for the capable function */
 #define CAP_OPT_NONE 0x0
 /* If capable should audit the security request */
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 564071a92d7d..86a8e188b900 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -113,6 +113,9 @@ void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
 }
 #endif
 
+BTF_ID_LIST_SINGLE(bpf_lsm_inode_init_security_btf_ids, func,
+		   bpf_lsm_inode_init_security)
+
 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 			const struct bpf_prog *prog)
 {
@@ -137,6 +140,12 @@ int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 		return -EINVAL;
 	}
 
+	/* bpf reserves a fixed number of xattr slots for itself.
+	 * Set the attach limit so the trampoline rejects excess attaches.
+	 */
+	if (btf_id == bpf_lsm_inode_init_security_btf_ids[0])
+		prog->aux->attach_limit = BPF_LSM_INODE_INIT_XATTRS;
+
 	return 0;
 }
 
@@ -315,6 +324,7 @@ BTF_ID(func, bpf_lsm_inode_create)
 BTF_ID(func, bpf_lsm_inode_free_security)
 BTF_ID(func, bpf_lsm_inode_getattr)
 BTF_ID(func, bpf_lsm_inode_getxattr)
+BTF_ID(func, bpf_lsm_inode_init_security)
 BTF_ID(func, bpf_lsm_inode_mknod)
 BTF_ID(func, bpf_lsm_inode_need_killpriv)
 BTF_ID(func, bpf_lsm_inode_post_setxattr)
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 1a721fc4bef5..b41b02173e24 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -859,6 +859,9 @@ static int bpf_trampoline_add_prog(struct bpf_trampoline *tr,
 	}
 	if (cnt >= BPF_MAX_TRAMP_LINKS)
 		return -E2BIG;
+	if (node->link->prog->aux->attach_limit &&
+	    tr->progs_cnt[kind] >= node->link->prog->aux->attach_limit)
+		return -E2BIG;
 	if (!hlist_unhashed(&node->tramp_hlist))
 		/* prog already linked */
 		return -EBUSY;
diff --git a/security/bpf/hooks.c b/security/bpf/hooks.c
index 40efde233f3a..d7c44c5c0e30 100644
--- a/security/bpf/hooks.c
+++ b/security/bpf/hooks.c
@@ -30,6 +30,7 @@ static int __init bpf_lsm_init(void)
 
 struct lsm_blob_sizes bpf_lsm_blob_sizes __ro_after_init = {
 	.lbs_inode = sizeof(struct bpf_storage_blob),
+	.lbs_xattr_count = BPF_LSM_INODE_INIT_XATTRS,
 };
 
 DEFINE_LSM(bpf) = {
diff --git a/security/integrity/evm/evm_main.c b/security/integrity/evm/evm_main.c
index b59e3f121b8a..e0a05162accc 100644
--- a/security/integrity/evm/evm_main.c
+++ b/security/integrity/evm/evm_main.c
@@ -1062,14 +1062,16 @@ static int evm_inode_copy_up_xattr(struct dentry *src, const char *name)
  * evm_inode_init_security - initializes security.evm HMAC value
  */
 int evm_inode_init_security(struct inode *inode, struct inode *dir,
-			    const struct qstr *qstr, struct xattr *xattrs,
-			    int *xattr_count)
+			    const struct qstr *qstr,
+			    struct xattr_ctx *xattr_ctx)
 {
 	struct evm_xattr *xattr_data;
 	struct xattr *xattr, *evm_xattr;
+	struct xattr *xattrs;
 	bool evm_protected_xattrs = false;
 	int rc;
 
+	xattrs = xattr_ctx ? xattr_ctx->xattrs : NULL;
 	if (!(evm_initialized & EVM_INIT_HMAC) || !xattrs)
 		return 0;
 
@@ -1087,7 +1089,7 @@ int evm_inode_init_security(struct inode *inode, struct inode *dir,
 	if (!evm_protected_xattrs)
 		return 0;
 
-	evm_xattr = lsm_get_xattr_slot(xattrs, xattr_count);
+	evm_xattr = lsm_get_xattr_slot(xattr_ctx);
 	/*
 	 * Array terminator (xattr name = NULL) must be the first non-filled
 	 * xattr slot.
diff --git a/security/security.c b/security/security.c
index 71aea8fdf014..8f82a1352356 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1334,6 +1334,7 @@ int security_inode_init_security(struct inode *inode, struct inode *dir,
 {
 	struct lsm_static_call *scall;
 	struct xattr *new_xattrs = NULL;
+	struct xattr_ctx xattr_ctx;
 	int ret = -EOPNOTSUPP, xattr_count = 0;
 
 	if (unlikely(IS_PRIVATE(inode)))
@@ -1349,10 +1350,12 @@ int security_inode_init_security(struct inode *inode, struct inode *dir,
 		if (!new_xattrs)
 			return -ENOMEM;
 	}
+	xattr_ctx.xattrs = new_xattrs;
+	xattr_ctx.xattr_count = &xattr_count;
 
 	lsm_for_each_hook(scall, inode_init_security) {
-		ret = scall->hl->hook.inode_init_security(inode, dir, qstr, new_xattrs,
-						  &xattr_count);
+		ret = scall->hl->hook.inode_init_security(inode, dir, qstr,
+							  &xattr_ctx);
 		if (ret && ret != -EOPNOTSUPP)
 			goto out;
 		/*
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 1a713d96206f..faa8a6b9c45b 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2962,7 +2962,7 @@ static int selinux_dentry_create_files_as(struct dentry *dentry, int mode,
 
 static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
 				       const struct qstr *qstr,
-				       struct xattr *xattrs, int *xattr_count)
+				       struct xattr_ctx *xattr_ctx)
 {
 	const struct cred_security_struct *crsec = selinux_cred(current_cred());
 	struct superblock_security_struct *sbsec;
@@ -2992,7 +2992,7 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
 	    !(sbsec->flags & SBLABEL_MNT))
 		return -EOPNOTSUPP;
 
-	xattr = lsm_get_xattr_slot(xattrs, xattr_count);
+	xattr = lsm_get_xattr_slot(xattr_ctx);
 	if (xattr) {
 		rc = security_sid_to_context_force(newsid,
 						   &context, &clen);
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index ff115068c5c0..8ed5648a0116 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -981,10 +981,10 @@ smk_rule_transmutes(struct smack_known *subject,
 }
 
 static int
-xattr_dupval(struct xattr *xattrs, int *xattr_count,
+xattr_dupval(struct xattr_ctx *xattr_ctx,
 	     const char *name, const void *value, unsigned int vallen)
 {
-	struct xattr * const xattr = lsm_get_xattr_slot(xattrs, xattr_count);
+	struct xattr * const xattr = lsm_get_xattr_slot(xattr_ctx);
 
 	if (!xattr)
 		return 0;
@@ -1003,14 +1003,13 @@ xattr_dupval(struct xattr *xattrs, int *xattr_count,
  * @inode: the newly created inode
  * @dir: containing directory object
  * @qstr: unused
- * @xattrs: where to put the attributes
- * @xattr_count: current number of LSM-provided xattrs (updated)
+ * @xattr_ctx: where to put attributes and update count
  *
  * Returns 0 if it all works out, -ENOMEM if there's no memory
  */
 static int smack_inode_init_security(struct inode *inode, struct inode *dir,
 				     const struct qstr *qstr,
-				     struct xattr *xattrs, int *xattr_count)
+				     struct xattr_ctx *xattr_ctx)
 {
 	struct task_smack *tsp = smack_cred(current_cred());
 	struct inode_smack * const issp = smack_inode(inode);
@@ -1057,21 +1056,19 @@ static int smack_inode_init_security(struct inode *inode, struct inode *dir,
 		if (S_ISDIR(inode->i_mode)) {
 			transflag = SMK_INODE_TRANSMUTE;
 
-			if (xattr_dupval(xattrs, xattr_count,
-				XATTR_SMACK_TRANSMUTE,
-				TRANS_TRUE,
-				TRANS_TRUE_SIZE
-			))
+			if (xattr_dupval(xattr_ctx,
+					 XATTR_SMACK_TRANSMUTE,
+					 TRANS_TRUE,
+					 TRANS_TRUE_SIZE))
 				rc = -ENOMEM;
 		}
 	}
 
 	if (rc == 0)
-		if (xattr_dupval(xattrs, xattr_count,
-			    XATTR_SMACK_SUFFIX,
-			    issp->smk_inode->smk_known,
-		     strlen(issp->smk_inode->smk_known)
-		))
+		if (xattr_dupval(xattr_ctx,
+				 XATTR_SMACK_SUFFIX,
+				 issp->smk_inode->smk_known,
+				 strlen(issp->smk_inode->smk_known)))
 			rc = -ENOMEM;
 instant_inode:
 	issp->smk_flags |= (SMK_INODE_INSTANT | transflag);
-- 
2.53.0


^ permalink raw reply related

* [PATCH bpf-next v3 0/2] bpf: add bpf_init_inode_xattr kfunc for atomic inode labeling
From: David Windsor @ 2026-06-18 20:34 UTC (permalink / raw)
  To: viro, brauner, jack, ast, daniel, john.fastabend, andrii, eddyz87,
	memxor, martin.lau, song, yonghong.song, jolsa, emil, kpsingh,
	mattbobrowski, paul, jmorris, serge, zohar, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, stephen.smalley.work, omosnace,
	casey, shuah
  Cc: linux-kernel, linux-fsdevel, bpf, linux-security-module,
	linux-integrity, selinux, linux-kselftest, David Windsor

Many in-kernel LSMs (SELinux, Smack, IMA) store security labels in
extended attributes. For these LSMs, atomic labeling during inode
creation is critical: if the inode becomes accessible before its xattr
is set, it is briefly unlabeled, which can disrupt LSMs making policy
decisions based on file labels.

Existing LSMs solve this by setting xattrs directly in the
inode_init_security hook, which runs before the inode becomes
accessible. BPF LSM programs currently lack this capability because
the hook uses an output parameter (xattr_count) that BPF programs
cannot write to, and existing kfuncs like bpf_set_dentry_xattr
require a dentry that isn't available until after the inode is
accessible.

This series introduces the bpf_init_inode_xattr() kfunc. It takes the
combined xattr context argument of the inode_init_security hook, which
gives it access to both xattrs and xattr_count, and updates xattr_count
internally via lsm_get_xattr_slot().
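A minimal userspace model of the slot handout that the combined context enables. It assumes the v3 layout of struct xattr_ctx (an array pointer plus a pointer to the shared count) and a trimmed-down struct xattr; demo_get_xattr_slot() and demo_label() are hypothetical stand-ins, not the kernel's lsm_get_xattr_slot():

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down stand-in for the kernel's struct xattr (assumption). */
struct xattr {
	const char *name;
	void *value;
	size_t value_len;
};

/* Same layout as the v3 struct xattr_ctx. */
struct xattr_ctx {
	struct xattr *xattrs;	/* NULL if the caller wants no xattrs */
	int *xattr_count;	/* shared by every LSM on the hook */
};

/* Model of lsm_get_xattr_slot(): claim the next slot, bump the count. */
static struct xattr *demo_get_xattr_slot(struct xattr_ctx *ctx)
{
	assert(ctx->xattr_count != NULL);
	if (!ctx->xattrs)
		return NULL;
	return &ctx->xattrs[(*ctx->xattr_count)++];
}

/* What one LSM's inode_init_security() might do with its slot. */
static void demo_label(struct xattr_ctx *ctx, const char *name)
{
	struct xattr *x = demo_get_xattr_slot(ctx);

	if (!x)
		return;		/* nothing requested: not an error */
	x->name = name;
	x->value = NULL;	/* value omitted in this sketch */
	x->value_len = 0;
}
```

Every caller that goes through the same context appends to the same array, so the count seen by security_inode_init_security() reflects all of them. Like the existing kernel helper, the sketch does no bounds check and relies on the caller sizing the array; the per-invocation BPF_LSM_INODE_INIT_XATTRS cap from the changelog is not modelled.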

v3:
  - rename struct lsm_xattr_ctx to struct xattr_ctx (Paul)
  - increase BPF_LSM_INODE_INIT_XATTRS to 4 (Song)
  - enforce per-hook attachment cap at attach time to prevent
    runtime rejection (Paul)
  - add init_inode_xattr_attach_cap selftest

v2:
  - pass the xattr state as a combined context object and drop the
    verifier fixup path (Kumar)
  - restrict bpf_init_inode_xattr labels to bpf.* namespace (Matt)
  - cap bpf_init_inode_xattr() at BPF_LSM_INODE_INIT_XATTRS slots per
    invocation (AI)

Link: https://lore.kernel.org/all/20260503211835.16103-1-dwindsor@gmail.com/ [v2]

David Windsor (2):
  bpf: add bpf_init_inode_xattr kfunc for atomic inode labeling
  selftests/bpf: add tests for bpf_init_inode_xattr kfunc

 fs/bpf_fs_kfuncs.c                            | 106 +++++++++++++++++-
 include/linux/bpf.h                           |   1 +
 include/linux/bpf_lsm.h                       |   3 +
 include/linux/evm.h                           |   9 +-
 include/linux/lsm_hook_defs.h                 |   4 +-
 include/linux/lsm_hooks.h                     |  16 ++-
 include/linux/security.h                      |   5 +
 kernel/bpf/bpf_lsm.c                          |  10 ++
 kernel/bpf/trampoline.c                       |   3 +
 security/bpf/hooks.c                          |   1 +
 security/integrity/evm/evm_main.c             |   8 +-
 security/security.c                           |   7 +-
 security/selinux/hooks.c                      |   4 +-
 security/smack/smack_lsm.c                    |  27 ++---
 tools/testing/selftests/bpf/bpf_kfuncs.h      |   5 +
 .../selftests/bpf/prog_tests/fs_kfuncs.c      | 105 ++++++++++++++++-
 .../bpf/progs/test_init_inode_xattr.c         |  31 +++++
 17 files changed, 306 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_init_inode_xattr.c


base-commit: e771677c937da5808f7b6c1f0e4a97ec1a84f8a8
-- 
2.53.0


^ permalink raw reply

* Re: Landlock: LANDLOCK_ACCESS_FS_IOCTL_DEV bypass via io_uring IORING_OP_URING_CMD
From: Bryam Vargas @ 2026-06-18 20:11 UTC (permalink / raw)
  To: Günther Noack; +Cc: Mickaël Salaün, linux-security-module
In-Reply-To: <ajPhlNSgUcmBoFcM@google.com>

Günther,

Thanks, and #65 looks right.

On the approach: it's a Landlock-only change either way, both hooks already
exist, so no io_uring core churn.

Coarse (block ring creation) can hang off security_uring_allowed(), the existing
io_uring_setup() gate. That matches the creation-control direction Mickaël raised
-- the socket-creation work he said would suit io_uring too -- and it's a fine
default, since most sandboxes don't need io_uring. One caveat: it overlaps
kernel.io_uring_disabled and a seccomp filter on io_uring_setup, so the
Landlock-specific win is mainly composing it in a ruleset.

Fine-grained (gate device uring_cmd) is the only one that closes the asymmetry I
reported. It uses security_uring_cmd() -- the hook SELinux and Smack already have
and we don't -- and needs no new right: gate device files on the existing
IOCTL_DEV, mirroring hook_file_ioctl_common(). All-or-nothing per device, since
cmd_op is a private number space.

So I'd go coarse-first as you suggest, and keep the uring_cmd gate as the granular
step; it's little code and reuses an existing right. Happy to prototype either
once you and Mickaël settle on the shape; I'll hold until then.

Bryam

^ permalink raw reply

* Re: [PATCH v5 7/8] vfs: Replace security_sb_mount/security_move_mount with granular hooks
From: Bryam Vargas @ 2026-06-18 19:33 UTC (permalink / raw)
  To: Song Liu
  Cc: Christian Brauner, Al Viro, Stephen Smalley, Ondrej Mosnacek,
	Mickaël Salaün, John Johansen, Paul Moore, James Morris,
	Serge Hallyn, linux-security-module, linux-fsdevel, linux-kernel

Song,

> +	err = security_mount_change_type(path, ms_flags);

This gates the propagation change on the mount(2) path. The same change on
the newer mount_setattr(2)/open_tree_attr(2) path is left open:
do_mount_setattr() -> mount_setattr_commit() calls change_mnt_propagation()
for the propagation and writes the MNT_NOEXEC/NOSUID/NODEV/READONLY flags --
the same work do_change_type() and do_reconfigure_mnt() do, but with no
hook. security_sb_mount() never reached that path either, so the gap isn't
new. But once this series checks the mount(2) propagation and remount
paths, mount_setattr(2) is the one path left without a check.

It's reachable. A Landlock domain denies mount(2) for the confined task, so
mount(MS_PRIVATE) and a remount clearing noexec both return -EPERM -- but
mount_setattr(propagation=MS_PRIVATE) and
mount_setattr(attr_clr=MOUNT_ATTR_NOEXEC) succeed, and the task then runs a
binary on a mount the policy marked noexec. A SELinux/AppArmor policy that
denies the mount has the same gap. With this series applied,
do_mount_setattr() still carries no security_ call, so the divergence
stands.
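For reference, the two mount_setattr(2) calls described above can be sketched as follows. This assumes kernel UAPI headers new enough (5.12+) to provide struct mount_attr in <linux/mount.h>; the demo_ helpers are hypothetical names, and the sketch does not itself set up a Landlock domain or the noexec mount:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>		/* AT_FDCWD */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mount.h>	/* struct mount_attr, MOUNT_ATTR_*, MS_PRIVATE */

/* Same effect as mount(NULL, path, NULL, MS_PRIVATE, NULL). */
static struct mount_attr demo_attr_make_private(void)
{
	struct mount_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.propagation = MS_PRIVATE;
	return attr;
}

/* Clears noexec, like a remount that drops MS_NOEXEC. */
static struct mount_attr demo_attr_clear_noexec(void)
{
	struct mount_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.attr_clr = MOUNT_ATTR_NOEXEC;
	return attr;
}

/* Apply @attr to the mount at @path (not recursive). Returns 0 or -errno. */
static int demo_mount_setattr(const char *path, struct mount_attr *attr)
{
	assert(path != NULL && attr != NULL);
#ifdef __NR_mount_setattr
	if (syscall(__NR_mount_setattr, AT_FDCWD, path, 0, attr,
		    sizeof(*attr)) < 0)
		return -errno;
	return 0;
#else
	return -ENOSYS;
#endif
}
```

Neither call goes through security_sb_mount() or the new mount(2) hooks, which is the divergence described above.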

Adding the propagation hook and a reconfigure hook in
mount_setattr_commit() would cover mount_setattr too. Happy to send that as
a patch if you want it folded in.

Bryam

^ permalink raw reply

* Re: [PATCH v5 7/8] vfs: Replace security_sb_mount/security_move_mount with granular hooks
From: Christian Brauner @ 2026-06-18 14:02 UTC (permalink / raw)
  To: Song Liu
  Cc: Christian Brauner, linux-security-module, linux-fsdevel, selinux,
	apparmor, paul, jmorris, serge, viro, jack, john.johansen,
	stephen.smalley.work, omosnace, mic, gnoack, takedakn,
	penguin-kernel, herton, kernel-team
In-Reply-To: <CAPhsuW7Wn8GYrsrRhEFXQH5buaP+pdTKc0UV8Mn0B3OnNN-44g@mail.gmail.com>

On 2026-06-18 18:56:42+08:00, Song Liu wrote:
> On Wed, Jun 17, 2026 at 9:53 PM Christian Brauner <brauner@kernel.org> wrote:
> 
> > On Thu, May 28, 2026 at 11:26:06AM -0700, Song Liu wrote:
> 
> [...]
> 
> > >
> >
> > This again is racy as it is called outside of the namespace semaphore:
> >
> >         err = security_mount_bind(&old_path, path, recurse);
> >         if (err)
> >                 return err;
> >
> >         if (mnt_ns_loop(old_path.dentry))
> >                 return -EINVAL;
> >
> >         LOCK_MOUNT(mp, path);
> >         if (IS_ERR(mp.parent))
> >                 return PTR_ERR(mp.parent);
> >
> > After LOCK_MOUNT @path might point to a completely different mount than
> > the one you performed your security checks on.
> 
> I thought we agreed at LSF/MM/BPF 2026 to add the LSM hooks
> before taking namespace semaphore, so that it is possible for LSMs
> to defend against DoS attacks on namespace semaphore? Did I
> miss/misunderstand something?

I think there was a misunderstanding. What I pointed out was that it's a
trade-off. If we do call security hooks under the namespace semaphore or
mount lock, then anything that's called under there must take care not to
cause deadlocks - which is especially easy to do with the mount lock, and
even with the namespace semaphore it may get hairy (automounts etc). The
DoS thing is another worry, but if an LSM does stupid things we tell it
to not do stupid things and to go away.

But as the hooks are done right now they are meaningless from a security
perspective. You might have a policy that allows mounting on dentry_a
and deny mounting on dentry_b: before LOCK_MOUNT*() you may see dentry_a
and allow the mount but after LOCK_MOUNT*() someone raced you and shoved
a dentry_b mount onto dentry_b and now you allow overmounting dentry_b
which your policy didn't allow -> hosed.
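A sequential toy model of this race (not kernel code): struct demo_mountpoint, demo_racer() and the two demo_ mount functions are hypothetical, and the "lock" is only a comment marking the point after which nothing can change:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy mountpoint: name of the mount currently visible at the path. */
struct demo_mountpoint {
	const char *top;
};

/* Toy policy from the example: dentry_a may be mounted on, dentry_b not. */
static bool demo_policy_allows(const char *target)
{
	return strcmp(target, "dentry_a") == 0;
}

/* Another task changing what the path resolves to. */
static void demo_racer(struct demo_mountpoint *mp)
{
	mp->top = "dentry_b";
}

typedef void (*demo_race_fn)(struct demo_mountpoint *);

/* Hook before the lock: returns where the mount lands, NULL if denied. */
static const char *demo_check_then_lock(struct demo_mountpoint *mp,
					demo_race_fn race)
{
	assert(mp != NULL);
	if (!demo_policy_allows(mp->top))
		return NULL;
	if (race)
		race(mp);	/* window between the hook and LOCK_MOUNT */
	/* LOCK_MOUNT: state is stable from here on */
	return mp->top;
}

/* Hook after the lock: it sees the state the mount will actually use. */
static const char *demo_lock_then_check(struct demo_mountpoint *mp,
					demo_race_fn race)
{
	assert(mp != NULL);
	if (race)
		race(mp);	/* the race can only land before the lock */
	/* LOCK_MOUNT */
	if (!demo_policy_allows(mp->top))
		return NULL;
	return mp->top;
}
```

Checking after the lock closes the window, at the cost described above: whatever the hook does then runs under the namespace semaphore or mount lock and must not deadlock.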

> > Placement of this hook suffers from the same issue as the bind mount
> > hook. Here it's worse because the security layer isn't even informed
> > about MOVE_MOUNT_BENEATH which completely alters the mount relationship.
> 
> Current hook security_move_mount doesn't handle
> MOVE_MOUNT_BENEATH. But we can add mflags to security_mount_move().
> Do we need anything other than mflags?

I think you either need to pass three mounts (source, target, top_mnt),
where target == top_mnt for a move that isn't MOVE_MOUNT_BENEATH, or you
need two separate hooks, because for MOVE_MOUNT_BENEATH you may want a
tri-part policy: source, target, top_mnt.
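One possible shape for a three-mount request, as a toy model: struct demo_move_req, DEMO_MOVE_BENEATH and the name tables are hypothetical, chosen only to show a policy that looks at source, target and top_mnt separately:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MOVE_BENEATH 0x1u	/* stand-in for MOVE_MOUNT_BENEATH */

/* Hypothetical argument bundle for a move-mount hook. */
struct demo_move_req {
	const char *source;	/* mount being moved */
	const char *target;	/* where it is attached */
	const char *top_mnt;	/* mount on top at the target path */
	unsigned int flags;
};

static bool demo_in(const char *name, const char *const *set)
{
	for (; *set; set++)
		if (strcmp(name, *set) == 0)
			return true;
	return false;
}

/* Toy policy tables. */
static const char *const demo_movable[] = { "scratch", NULL };
static const char *const demo_targets[] = { "mnt_a", NULL };
static const char *const demo_no_beneath[] = { "locked", NULL };

static bool demo_move_allowed(const struct demo_move_req *req)
{
	bool beneath = (req->flags & DEMO_MOVE_BENEATH) != 0;

	/* For an ordinary move there is no separate top mount. */
	if (!beneath)
		assert(strcmp(req->target, req->top_mnt) == 0);

	if (!demo_in(req->source, demo_movable) ||
	    !demo_in(req->target, demo_targets))
		return false;
	/* Third leg of the policy: never slide a mount under these. */
	return !(beneath && demo_in(req->top_mnt, demo_no_beneath));
}
```

The sketch only illustrates why a policy may want top_mnt as a separate input; whether to pass it explicitly or to use two hooks is the design question left open here.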

^ permalink raw reply

* Re: Landlock: LANDLOCK_ACCESS_FS_IOCTL_DEV bypass via io_uring IORING_OP_URING_CMD
From: Günther Noack @ 2026-06-18 12:16 UTC (permalink / raw)
  To: Bryam Vargas; +Cc: Mickaël Salaün, linux-security-module
In-Reply-To: <20260617230237.14718-1-hexlabsecurity@proton.me>

Hello Bryam!

On Wed, Jun 17, 2026 at 11:02:41PM +0000, Bryam Vargas wrote:
> Thanks Günther, and thanks for filing #64.
> 
> Straight to your two questions:
> 
> 1. Block: you're right. blkdev_uring_cmd() has a single case, BLOCK_URING_CMD_DISCARD,
>    and the blkdev.h note that it's a separate number space is fair, so I'm not arguing
>    it should be a generic ioctl multiplexer. The "others, through other devices" are on
>    NVMe: the namespace char dev takes NVME_URING_CMD_IO / _IO_VEC, and AFAICT a
>    write-capable confined task can reach IO passthrough (write, DSM/discard) with no
>    capability, since nvme_cmd_allowed() only wants FMODE_WRITE there.
> 
>    Correction to my own report: I overstated the ceiling. The NVMe admin ops
>    (format, sanitize, firmware, security-send) sit behind capable(CAP_SYS_ADMIN)
>    in nvme_cmd_allowed(), so a Landlocked unprivileged task can't reach them. The
>    A:H / 8.4 figure was wrong; only namespace IO is in scope for a confined task.
> 
> 2. Truncate: correct, no sidestep, and none looks possible. I went through every
>    f_op->uring_cmd provider (block, NVMe, btrfs encoded I/O, FUSE, ublk, sockets, ...)
>    and none change file size; truncate(2)/ftruncate(2) keep their own hook. Please
>    ignore the "and truncate where relevant" line in my suggested direction, it was
>    speculative.
> 
> On framing: I'm happy to call this a coverage gap rather than a bypass. IOCTL_DEV was
> never documented to cover io_uring, so nothing it promised is broken. The one hard fact
> is the asymmetry: ioctl(2) BLKDISCARD is denied (IOCTL_DEV, and it's not in
> is_masked_device_ioctl()), the same op via uring_cmd isn't, and SELinux/Smack already
> hook security_uring_cmd while Landlock doesn't. Whether that's worth a hook or just the
> doc clarification Mickaël mentioned is your call.

Agreed, a coverage gap is in my mind the right way to think about it.

I filed this issue about that gap:
https://github.com/landlock-lsm/linux/issues/65

Even though that's technically a feature request, you were quite
right to point it out.

As I'm saying on that issue description as well, there are in principle
multiple ways of blocking such a feature.  It is possible to block it at
the fine-grained layer in uring_cmd, but maybe a more practical way to
go about it would be to block the creation of an io_uring itself, since
most sandboxed processes do not normally make use of that feature.

(IMHO, we have already made a similar mistake in networking, where we
first built restrictions for individual TCP operations, but left all the
other protocols unrestricted.  Maybe the better approach is to start
with the coarser restriction that addresses the majority of use cases
and then provide more granular controls later.)

I'd be interested to hear people's opinions.

(Mickaël, if you feel framing this as a feature request is the wrong
approach, please speak up as well.)


> If you do want one, I can send an RFC for an all-or-nothing "IOCTL_DEV for any uring_cmd
> on a device file" hook (cmd_op is a private number space, so porting
> is_masked_device_ioctl() wouldn't be right). Otherwise I'll drop the provider detail
> into #64 and leave it at the doc fix.

I'd be happy to review your patches for the issue.  But let's find a
consensus on the overall approach first -- that will hopefully also save
you from going around in circles too much in the implementation.

Thanks!
—Günther

^ permalink raw reply

* Re: [PATCH v5 7/8] vfs: Replace security_sb_mount/security_move_mount with granular hooks
From: Song Liu @ 2026-06-18 10:56 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-security-module, linux-fsdevel, selinux, apparmor, paul,
	jmorris, serge, viro, jack, john.johansen, stephen.smalley.work,
	omosnace, mic, gnoack, takedakn, penguin-kernel, herton,
	kernel-team
In-Reply-To: <20260617-laufbahn-eifrig-charmant-a48f357a0c52@brauner>

On Wed, Jun 17, 2026 at 9:53 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Thu, May 28, 2026 at 11:26:06AM -0700, Song Liu wrote:
[...]
> >
> > +     err = security_mount_bind(&old_path, path, recurse);
> > +     if (err)
> > +             return err;
>
> This again is racy as it is called outside of the namespace semaphore:
>
>         err = security_mount_bind(&old_path, path, recurse);
>         if (err)
>                 return err;
>
>         if (mnt_ns_loop(old_path.dentry))
>                 return -EINVAL;
>
>         LOCK_MOUNT(mp, path);
>         if (IS_ERR(mp.parent))
>                 return PTR_ERR(mp.parent);
>
> After LOCK_MOUNT @path might point to a completely different mount than
> the one you performed your security checks on.

I thought we agreed at LSF/MM/BPF 2026 to add the LSM hooks
before taking namespace semaphore, so that it is possible for LSMs
to defend against DoS attacks on namespace semaphore? Did I
miss/misunderstand something?

> > +
> >       if (mnt_ns_loop(old_path.dentry))
> >               return -EINVAL;
> >
[...]
> >
> >       err = parse_monolithic_mount_data(fc, data);
> > +     if (!err)
> > +             err = security_mount_remount(fc, path, mnt_flags, flags,
> > +                                         data);
> >       if (!err) {
> >               down_write(&sb->s_umount);
> >               err = -EPERM;
> > @@ -3708,6 +3724,10 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
> >       if (err)
> >               return err;
> >
> > +     err = security_mount_move(&old_path, path);
> > +     if (err)
> > +             return err;
>
> Placement of this hook suffers from the same issue as the bind mount
> hook. Here it's worse because the security layer isn't even informed
> about MOVE_MOUNT_BENEATH which completely alters the mount relationship.

Current hook security_move_mount doesn't handle
MOVE_MOUNT_BENEATH. But we can add mflags to security_mount_move().
Do we need anything other than mflags?

Thanks,
Song

^ permalink raw reply

* Re: [RFC PATCH 1/2] landlock: fix TCP Fast Open connection bypass
From: Bryam Vargas @ 2026-06-18  1:25 UTC (permalink / raw)
  To: Matthieu Buffet
  Cc: Mickaël Salaün, Günther Noack, Mikhail Ivanov,
	Paul Moore, Eric Dumazet, Neal Cardwell, linux-security-module,
	netdev, linux-kernel
In-Reply-To: <20260617180526.15627-2-matthieu@buffet.re>

Thanks Matthieu. #41 is yours, so no competing patch from me. I built your v0
(Landlock + MPTCP) and ran an A/B: without it, a confined task with CONNECT_TCP
denied still reaches the port via sendto(MSG_FASTOPEN); with it, that path is now
denied too, on IPv4 and IPv6.

Tested-by: Bryam Vargas <hexlabsecurity@proton.me>

One scope note, since you mention MPTCP: an MPTCP socket isn't covered.
sk_is_tcp() is false for the mptcp parent (sk_protocol is IPPROTO_MPTCP), so
neither the new sendmsg hook nor the existing socket_connect one mediates it. On
the patched kernel my MPTCP arm still reaches the blocked port via both connect()
and MSG_FASTOPEN. If MPTCP is meant to be in scope for CONNECT_TCP, the guard
wants `|| sk->sk_protocol == IPPROTO_MPTCP` (not sk_is_mptcp(), which is the
subflow flag).
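The guard under discussion can be modelled in userspace as follows. struct demo_sock and both demo_ functions are hypothetical; demo_sk_is_tcp() approximates the kernel's sk_is_tcp() (an inet SOCK_STREAM socket whose protocol is IPPROTO_TCP), and the include_mptcp switch adds the `sk->sk_protocol == IPPROTO_MPTCP` term suggested above:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262	/* Linux UAPI value; older libcs lack it */
#endif
#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN 0x20000000	/* Linux UAPI value */
#endif

/* Toy view of the socket fields the guard reads. */
struct demo_sock {
	int family;	/* AF_INET, AF_INET6, ... */
	int type;	/* SOCK_STREAM, SOCK_DGRAM, ... */
	int protocol;	/* resolved protocol, as in sk->sk_protocol */
};

/* Approximation of sk_is_tcp(): inet stream socket using IPPROTO_TCP. */
static bool demo_sk_is_tcp(const struct demo_sock *sk)
{
	return (sk->family == AF_INET || sk->family == AF_INET6) &&
	       sk->type == SOCK_STREAM && sk->protocol == IPPROTO_TCP;
}

/*
 * Does this sendmsg() need a CONNECT_TCP check? Mirrors the RFC guard
 * (TCP, destination address present, MSG_FASTOPEN set); include_mptcp
 * adds the suggested IPPROTO_MPTCP term for the MPTCP parent socket.
 */
static bool demo_needs_connect_check(const struct demo_sock *sk,
				     const void *msg_name, int msg_flags,
				     bool include_mptcp)
{
	bool tcp_like;

	assert(sk != NULL);
	tcp_like = demo_sk_is_tcp(sk) ||
		   (include_mptcp && sk->protocol == IPPROTO_MPTCP);
	return tcp_like && msg_name != NULL && (msg_flags & MSG_FASTOPEN) != 0;
}
```

In the kernel, a socket created with protocol 0 already carries the resolved protocol (IPPROTO_TCP for an inet stream socket), which is why the model compares the resolved value.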

Bryam

^ permalink raw reply

* Re: Landlock: LANDLOCK_ACCESS_FS_IOCTL_DEV bypass via io_uring IORING_OP_URING_CMD
From: Bryam Vargas @ 2026-06-17 23:02 UTC (permalink / raw)
  To: Günther Noack; +Cc: Mickaël Salaün, linux-security-module
In-Reply-To: <ajJtTHyqWTmX7lHo@google.com>

Thanks Günther, and thanks for filing #64.

Straight to your two questions:

1. Block: you're right. blkdev_uring_cmd() has a single case, BLOCK_URING_CMD_DISCARD,
   and the blkdev.h note that it's a separate number space is fair, so I'm not arguing
   it should be a generic ioctl multiplexer. The "others, through other devices" are on
   NVMe: the namespace char dev takes NVME_URING_CMD_IO / _IO_VEC, and AFAICT a
   write-capable confined task can reach IO passthrough (write, DSM/discard) with no
   capability, since nvme_cmd_allowed() only wants FMODE_WRITE there.

   Correction to my own report: I overstated the ceiling. The NVMe admin ops
   (format, sanitize, firmware, security-send) sit behind capable(CAP_SYS_ADMIN)
   in nvme_cmd_allowed(), so a Landlocked unprivileged task can't reach them. The
   A:H / 8.4 figure was wrong; only namespace IO is in scope for a confined task.

2. Truncate: correct, no sidestep, and none looks possible. I went through every
   f_op->uring_cmd provider (block, NVMe, btrfs encoded I/O, FUSE, ublk, sockets, ...)
   and none change file size; truncate(2)/ftruncate(2) keep their own hook. Please
   ignore the "and truncate where relevant" line in my suggested direction, it was
   speculative.

On framing: I'm happy to call this a coverage gap rather than a bypass. IOCTL_DEV was
never documented to cover io_uring, so nothing it promised is broken. The one hard fact
is the asymmetry: ioctl(2) BLKDISCARD is denied (IOCTL_DEV, and it's not in
is_masked_device_ioctl()), the same op via uring_cmd isn't, and SELinux/Smack already
hook security_uring_cmd while Landlock doesn't. Whether that's worth a hook or just the
doc clarification Mickaël mentioned is your call.

If you do want one, I can send an RFC for an all-or-nothing "IOCTL_DEV for any uring_cmd
on a device file" hook (cmd_op is a private number space, so porting
is_masked_device_ioctl() wouldn't be right). Otherwise I'll drop the provider detail
into #64 and leave it at the doc fix.

Bryam

^ permalink raw reply

* [RFC PATCH 2/2] selftests/landlock: Add test for TCP fast open
From: Matthieu Buffet @ 2026-06-17 18:05 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Mickaël Salaün, Günther Noack,
	linux-security-module, Mikhail Ivanov, Paul Moore, Eric Dumazet,
	Neal Cardwell, linux-kernel, netdev, Matthieu Buffet
In-Reply-To: <20260617180526.15627-1-matthieu@buffet.re>

Enforce that TCP Fast Open is controlled by
LANDLOCK_ACCESS_NET_CONNECT_TCP. Semantics of connect() and
sendmsg(MSG_FASTOPEN) should be identical from Landlock's perspective.
Also enforce error code consistency, since UDP sockets ignore
the MSG_FASTOPEN flag while Unix sockets reject it.

Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
---
 tools/testing/selftests/landlock/net_test.c | 155 ++++++++++++++++++++
 1 file changed, 155 insertions(+)

diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
index 0c256e7c8675..177ed28e70f6 100644
--- a/tools/testing/selftests/landlock/net_test.c
+++ b/tools/testing/selftests/landlock/net_test.c
@@ -258,6 +258,64 @@ static int connect_variant(const int sock_fd,
 	return connect_variant_addrlen(sock_fd, srv, get_addrlen(srv, false));
 }
 
+static int sendto_variant_addrlen(const int sock_fd,
+				  const struct service_fixture *const srv,
+				  const socklen_t addrlen, void *buf,
+				  size_t len, size_t flags)
+{
+	const struct sockaddr *dst = NULL;
+	ssize_t ret;
+
+	/*
+	 * We never want our processes to be killed by SIGPIPE: we check return
+	 * codes and errno, so that we have actual error messages.
+	 */
+	flags |= MSG_NOSIGNAL;
+
+	if (srv != NULL) {
+		switch (srv->protocol.domain) {
+		case AF_UNSPEC:
+		case AF_INET:
+			dst = (const struct sockaddr *)&srv->ipv4_addr;
+			break;
+
+		case AF_INET6:
+			dst = (const struct sockaddr *)&srv->ipv6_addr;
+			break;
+
+		case AF_UNIX:
+			dst = (const struct sockaddr *)&srv->unix_addr;
+			break;
+
+		default:
+			errno = EAFNOSUPPORT;
+			return -errno;
+		}
+	}
+
+	ret = sendto(sock_fd, buf, len, flags, dst, addrlen);
+	if (ret < 0)
+		return -errno;
+
+	/* errno is not set in cases of partial writes. */
+	if (ret != len)
+		return -EINTR;
+
+	return 0;
+}
+
+static int sendto_variant(const int sock_fd,
+			  const struct service_fixture *const srv, void *buf,
+			  size_t len, size_t flags)
+{
+	socklen_t addrlen = 0;
+
+	if (srv != NULL)
+		addrlen = get_addrlen(srv, false);
+
+	return sendto_variant_addrlen(sock_fd, srv, addrlen, buf, len, flags);
+}
+
 FIXTURE(protocol)
 {
 	struct service_fixture srv0, srv1, srv2, unspec_any0, unspec_srv0;
@@ -950,6 +1008,103 @@ TEST_F(protocol, connect_unspec)
 	EXPECT_EQ(0, close(bind_fd));
 }
 
+TEST_F(protocol, tcp_fastopen)
+{
+	const bool restricted = variant->sandbox == TCP_SANDBOX &&
+				variant->prot.type == SOCK_STREAM &&
+				(variant->prot.protocol == IPPROTO_TCP ||
+				 variant->prot.protocol == IPPROTO_IP) &&
+				(variant->prot.domain == AF_INET ||
+				 variant->prot.domain == AF_INET6);
+	const struct landlock_ruleset_attr ruleset_attr = {
+		.handled_access_net = LANDLOCK_ACCESS_NET_CONNECT_TCP,
+	};
+	int bind_fd, client_fd, status;
+	char buf;
+	pid_t child;
+
+	bind_fd = socket_variant(&self->srv0);
+	ASSERT_LE(0, bind_fd);
+	EXPECT_EQ(0, bind_variant(bind_fd, &self->srv0));
+	if (self->srv0.protocol.type == SOCK_STREAM)
+		EXPECT_EQ(0, listen(bind_fd, backlog));
+
+	child = fork();
+	ASSERT_LE(0, child);
+	if (child == 0) {
+		int connect_fd, ret;
+
+		/* Closes listening socket for the child. */
+		EXPECT_EQ(0, close(bind_fd));
+
+		connect_fd = socket_variant(&self->srv0);
+		ASSERT_LE(0, connect_fd);
+
+		if (variant->sandbox == TCP_SANDBOX) {
+			const int ruleset_fd = landlock_create_ruleset(
+				&ruleset_attr, sizeof(ruleset_attr), 0);
+			ASSERT_LE(0, ruleset_fd);
+
+			enforce_ruleset(_metadata, ruleset_fd);
+			EXPECT_EQ(0, close(ruleset_fd));
+		}
+
+		/* Fast Open with no address. */
+		ret = sendto_variant(connect_fd, NULL, NULL, 0, MSG_FASTOPEN);
+		if (self->srv0.protocol.domain == AF_UNIX) {
+			ASSERT_EQ(-ENOTCONN, ret);
+		} else if (self->srv0.protocol.type == SOCK_DGRAM) {
+			ASSERT_EQ(-EDESTADDRREQ, ret);
+		} else {
+			ASSERT_EQ(-EINVAL, ret);
+		}
+
+		/* Fast Open to a denied address. */
+		ret = sendto_variant(connect_fd, &self->srv0, "A", 1,
+				     MSG_FASTOPEN);
+		if (restricted) {
+			ASSERT_EQ(-EACCES, ret);
+		} else if (self->srv0.protocol.domain == AF_UNIX &&
+			   self->srv0.protocol.type == SOCK_STREAM) {
+			ASSERT_EQ(-EOPNOTSUPP, ret);
+		} else {
+			ASSERT_EQ(0, ret);
+		}
+
+		EXPECT_EQ(0, close(connect_fd));
+		_exit(_metadata->exit_code);
+		return;
+	}
+
+	client_fd = bind_fd;
+	if (!restricted && self->srv0.protocol.type == SOCK_STREAM &&
+	    self->srv0.protocol.domain != AF_UNIX) {
+		client_fd = accept(bind_fd, NULL, 0);
+		ASSERT_LE(0, client_fd);
+	}
+
+	if (restricted) {
+		EXPECT_EQ(-1, read(client_fd, &buf, 1));
+		EXPECT_EQ(ENOTCONN, errno);
+	} else if (self->srv0.protocol.domain == AF_UNIX &&
+		   self->srv0.protocol.type == SOCK_STREAM) {
+		EXPECT_EQ(-1, read(client_fd, &buf, 1));
+		EXPECT_EQ(EINVAL, errno);
+	} else {
+		EXPECT_EQ(1, read(client_fd, &buf, 1));
+		EXPECT_EQ('A', buf);
+	}
+
+	EXPECT_EQ(child, waitpid(child, &status, 0));
+	EXPECT_EQ(1, WIFEXITED(status));
+	EXPECT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+
+	if (client_fd != bind_fd)
+		EXPECT_LE(0, close(client_fd));
+
+	EXPECT_EQ(0, close(bind_fd));
+}
+
 FIXTURE(ipv4)
 {
 	struct service_fixture srv0, srv1;
-- 
2.47.3


^ permalink raw reply related

* Re: Landlock: LANDLOCK_ACCESS_NET_CONNECT_TCP bypass via TCP Fast Open
From: Matthieu Buffet @ 2026-06-17 18:05 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Mickaël Salaün, Günther Noack,
	linux-security-module, Mikhail Ivanov, Paul Moore, Eric Dumazet,
	Neal Cardwell, linux-kernel, netdev, Matthieu Buffet
In-Reply-To: <20260617.eemahv8ui7Ee@digikod.net>

Hi,

On 6/17/2026 4:22 PM, Mickaël Salaün wrote:
> Thanks for the report.  This was previously identified by Mikhail and
> Matthieu, see the related issue:
> https://github.com/landlock-lsm/linux/issues/41

(I worked on a v0 patch for that issue after I first reported it to
Mickaël, missing the fact that it was already documented as a GitHub
issue. I then tried a more generic approach that failed. Here's the v0,
rebased on the beginning of -next to ease backporting; it might be a
good start. For instance, someone with more performance/benchmarking
background might want to add an unlikely() around the MSG_FASTOPEN
condition in the hot code path?)

Have a nice day!

Matthieu Buffet (2):
  landlock: fix TCP Fast Open connection bypass
  selftests/landlock: Add test for TCP fast open

 security/landlock/net.c                     |  17 +++
 tools/testing/selftests/landlock/net_test.c | 155 ++++++++++++++++++++
 2 files changed, 172 insertions(+)

base-commit: 0ce4243509d1580349dd0d50624036d6b097e958
-- 
2.47.3

^ permalink raw reply

* [RFC PATCH 1/2] landlock: fix TCP Fast Open connection bypass
From: Matthieu Buffet @ 2026-06-17 18:05 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Mickaël Salaün, Günther Noack,
	linux-security-module, Mikhail Ivanov, Paul Moore, Eric Dumazet,
	Neal Cardwell, linux-kernel, netdev, Matthieu Buffet
In-Reply-To: <20260617180526.15627-1-matthieu@buffet.re>

The documentation of the socket_connect() LSM hook states that it
controls connecting a socket to a remote address. It has not been the
case since the addition of TCP Fast Open (RFC 7413) support, which allows
opening a TCP connection (thus, setting a socket's destination address)
via the MSG_FASTOPEN flag passed to sendto()/sendmsg()/sendmmsg(). The
problem then got duplicated into MPTCP.

Landlock did not take it into account when its TCP support was added,
leaving a bypass of TCP connect policy.

Ideally a call to the LSM hook would be added in the fastopen code path,
in order to fix this generically. But connect() hooks are designed to run
with the socket locked, unlike sendmsg() hooks.

Closes: https://github.com/landlock-lsm/linux/issues/41
Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
---
 security/landlock/net.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/security/landlock/net.c b/security/landlock/net.c
index 4ee4002a8f56..a2375762c18b 100644
--- a/security/landlock/net.c
+++ b/security/landlock/net.c
@@ -246,9 +246,26 @@ static int hook_socket_connect(struct socket *const sock,
 					   access_request);
 }
 
+static int hook_socket_sendmsg(struct socket *const sock,
+			       struct msghdr *const msg, const int size)
+{
+	struct sockaddr *const address = msg->msg_name;
+	const int addrlen = msg->msg_namelen;
+
+	if (sk_is_tcp(sock->sk) && address != NULL &&
+	    (msg->msg_flags & MSG_FASTOPEN) != 0) {
+		return current_check_access_socket(
+			sock, address, addrlen,
+			LANDLOCK_ACCESS_NET_CONNECT_TCP);
+	}
+
+	return 0;
+}
+
 static struct security_hook_list landlock_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(socket_bind, hook_socket_bind),
 	LSM_HOOK_INIT(socket_connect, hook_socket_connect),
+	LSM_HOOK_INIT(socket_sendmsg, hook_socket_sendmsg),
 };
 
 __init void landlock_add_net_hooks(void)
-- 
2.47.3


^ permalink raw reply related

* Re: Landlock: LANDLOCK_ACCESS_FS_IOCTL_DEV bypass via io_uring IORING_OP_URING_CMD
From: Günther Noack @ 2026-06-17 15:31 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Jens Axboe, Bryam Vargas, Paul Moore, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, linux-security-module,
	Tingmao Wang
In-Reply-To: <20260617.aeNg7Aeseez4@digikod.net>

On Wed, Jun 17, 2026 at 04:16:38PM +0200, Mickaël Salaün wrote:
> As I explained in previous (private) reports, there are currently no
> io_uring hooks implemented for Landlock because there is no use for
> them.
> 
> An io_uring "bypass" was already mentioned to us twice in March, but an
> io_uring personality credential is not a Landlock bypass.  The Landlock
> threat model is about enforcing restrictions on a sandboxed subject when
> it accesses new kernel resources.  The credential identifies a set of
> access rights, so in the case of io_uring, the subject is inherited by
> the io_uring personality (i.e. the file descriptor).  If a sandboxed
> task creates an io_uring personality, it will be sandboxed with the same
> restrictions, which is BTW an interesting property (e.g. passing a
> restricted io_uring FD to other processes).

Remark on the side: We have previously received bug reports due to
io_uring using different credentials, but this report is not about that.

Instead, it is about the block device "discard" operation, which is
accessible through both (a) the ioctl() interface and (b) an io_uring
interface.  The report is, in my reading, about the fact that the access
through (a) can be blocked with Landlock, while the access through (b)
cannot be blocked through Landlock.  (See the other answer I sent.)

But either way, as you are also saying here, we should probably document
better what the threat model for Landlock is, so that security
researchers (and AI models) can refer to that.  It'll result in less
work for everyone.

I opened https://github.com/landlock-lsm/linux/issues/64 to track it and
collected some notes.

—Günther

* Re: Landlock: LANDLOCK_ACCESS_FS_IOCTL_DEV bypass via io_uring IORING_OP_URING_CMD
From: Jens Axboe @ 2026-06-17 14:56 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Bryam Vargas, Günther Noack, Paul Moore, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, linux-security-module,
	Tingmao Wang
In-Reply-To: <20260617.aeNg7Aeseez4@digikod.net>

On 6/17/26 8:16 AM, Mickaël Salaün wrote:
> On Tue, Jun 16, 2026 at 08:44:55PM -0600, Jens Axboe wrote:
>> On 6/16/26 8:25 PM, Bryam Vargas wrote:
>>> Thanks Jens, noted: the fix belongs in Landlock. Mickaël has the full
>>> report.
>>
>> Indeed - and hence no need to bother anyone else with it by blasting it
>> wide. I've already explained this multiple times, but on the private
>> security list, when the occasional AI report comes in on things like
>> this. Hence why it's a bit tiring to see the same stuff come across,
>> once again.
>>
>> For the landlock folks, I'd suggest taking a look at what hooks already
>> exist (and existed, when landlock was merged) for selinux etc, that'd
>> be a really good hint on the existing surface covered.
> 
> As I explained in previous (private) reports, there are currently no
> io_uring hooks implemented for Landlock because there is no use for
> them.
> 
> An io_uring "bypass" was already mentioned to us twice in March, but an
> io_uring personality credential is not a Landlock bypass.  The Landlock
> threat model is about enforcing restrictions on a sandboxed subject when
> it accesses new kernel resources.  The credential identifies a set of
> access rights, so in the case of io_uring, the subject is inherited by
> the io_uring personality (i.e. the file descriptor).  If a sandboxed
> task creates an io_uring personality, it will be sandboxed with the same
> restrictions, which is BTW an interesting property (e.g. passing a
> restricted io_uring FD to other processes).
> 
> A sandboxed process cannot create an io_uring personality that would
> bypass its own restrictions, so there is no Landlock bypass.  Inheriting
> or receiving a file descriptor is not restricted by Landlock because
> these are operations from outside (or before) the sandbox.  If we want
> to restrict them, we need to restrict the processes creating such file
> descriptors.
> 
> Inherited or passed file descriptors are outside the Landlock threat
> model because Landlock is only one part of the security policy when
> (willingly) interacting with other processes.  In a nutshell, it's the
> security capability model (where an object has some associated rights).
> For instance, if a process willingly passes a file descriptor tied to a
> secret file, then the receiving side can (and should be able to) read
> it, being sandboxed with Landlock or not.  The scope of Landlock is to
> drop ambient rights, but if an *unsandboxed* process is OK to pass a
> sensitive resource, then that's a security architecture issue (i.e. a
> confused deputy attack).
> 
> A nice side effect of this approach is that a process can sandbox itself
> with a specific Landlock security policy and create an io_uring file
> descriptor that will inherit the Landlock restrictions.  It can then
> pass this FD to other processes with the guarantee that this FD will
> only give access to resources allowed by the Landlock policy.
> 
> Landlock could implement the security_uring_sqpoll() hook, but for now
> the use case is not clear to me.  We are working on controlling socket
> creation according to their properties and I think the same approach
> would be useful for io_uring:
> https://lore.kernel.org/r/20251118134639.3314803-1-ivanov.mikhail1@huawei-partners.com
> 
> I agree that this might be confusing and I plan to improve Landlock
> documentation to make this clearer and simpler for AI to take into
> account.

I think updating the documentation is a good idea. Most of these are AI
nonsense anyway, and it'd hopefully help if the documentation reflected
what you wrote above. Even if not, then at least when the next one of
these slop reports come in, the reply can be as simple as just linking
to the documentation.

> BTW, we also have an open issue to add io_uring tests:
> https://github.com/landlock-lsm/linux/issues/23

Nice!

-- 
Jens Axboe

* Re: Landlock: LANDLOCK_ACCESS_NET_CONNECT_TCP bypass via TCP Fast Open
From: Mickaël Salaün @ 2026-06-17 14:22 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Günther Noack, Matthieu Buffet, Paul Moore, Eric Dumazet,
	Neal Cardwell, linux-security-module, netdev, linux-kernel,
	Mikhail Ivanov
In-Reply-To: <20260616201615.275032-1-hexlabsecurity@proton.me>

Hi,

Thanks for the report.  This was previously identified by Mikhail and
Matthieu, see the related issue:
https://github.com/landlock-lsm/linux/issues/41


On Tue, Jun 16, 2026 at 08:16:22PM +0000, Bryam Vargas wrote:
> Hello Mickaël, and Landlock folks,
> 
> A task confined by a Landlock ruleset that handles
> LANDLOCK_ACCESS_NET_CONNECT_TCP and is denied connecting to a given port can
> still establish a TCP connection to that port by using TCP Fast Open, i.e.
> sendto(fd, ..., MSG_FASTOPEN, &dst, dstlen) on a fresh stream socket. The
> network-egress confinement for TCP connect is silently bypassed.
> 
> Affected
> --------
> Any kernel with CONFIG_SECURITY_LANDLOCK=y and Landlock enabled that supports
> the TCP network access rights (Landlock ABI >= 4, since Linux 6.7). Confirmed by
> source inspection on mainline (v7.1-rc7) and reproduced on Linux 7.0.11
> (Landlock ABI 8). No CONFIG beyond Landlock + IPv4/IPv6 TCP; TCP Fast Open client
> is enabled by the per-netns default (net.ipv4.tcp_fastopen has TFO_CLIENT_ENABLE
> set), so no sysctl change and no setsockopt are required.
> 
> Root cause
> ----------
> LANDLOCK_ACCESS_NET_CONNECT_TCP is enforced only by the socket_connect LSM hook
> (hook_socket_connect -> current_check_access_socket). security_socket_connect()
> has exactly one call site in the tree, net/socket.c (the connect(2) syscall).
> 
> TCP Fast Open performs an implicit connect inside sendmsg:
> 
>   tcp_sendmsg_locked()            net/ipv4/tcp.c  (MSG_FASTOPEN branch)
>    -> tcp_sendmsg_fastopen()      net/ipv4/tcp.c
>    -> __inet_stream_connect(..., is_sendmsg=1)  net/ipv4/af_inet.c
>    -> sk->sk_prot->connect()      net/ipv4/af_inet.c  -> tcp_v4_connect()
> 
> This path establishes the connection to the address taken from msg_name but
> never calls security_socket_connect(). The only LSM hook fired on the sendmsg
> path is security_socket_sendmsg(), and Landlock registers no socket_sendmsg
> hook, so LANDLOCK_ACCESS_NET_CONNECT_TCP is never re-checked. __inet_stream_connect()
> itself carries no LSM hook (only the cgroup-BPF pre_connect, a different
> mechanism).
> 
> Notably the kernel already mediates the analogous AF_UNIX implicit-connect on the
> send path via the unix_may_send hook, which Landlock does register
> (hook_unix_may_send) -- so the sendmsg-implies-connect pattern is recognized, but
> the TCP Fast Open case has no equivalent coverage. The MPTCP fast-open path
> (mptcp_sendmsg_fastopen -> __inet_stream_connect) is a second producer of the
> same unmediated connect (by source inspection; not separately reproduced).
> 
> Reproducer
> ----------
> A self-contained, fully unprivileged PoC is available on request. It forks an
> unconfined TFO-capable loopback listener, then in a child applies a Landlock
> ruleset handling LANDLOCK_ACCESS_NET_CONNECT_TCP with no allow rule
> (landlock_create_ruleset() with handled_access_net =
> LANDLOCK_ACCESS_NET_CONNECT_TCP, no landlock_add_rule(), then
> landlock_restrict_self(); every TCP connect is denied) and tries the forbidden
> port two ways:
> 
>   (1) connect(fd, &dst)                 -> -EACCES   (Landlock enforces CONNECT_TCP)
>   (2) sendto(fd2, buf, len, MSG_FASTOPEN, &dst, dstlen)
>                                         -> succeeds; the listener accepts the
>                                            connection and reads the payload.
> 
> Observed on Linux 7.0.11 (Landlock ABI 8):
> 
>   [1] connect(2)            -> ret=-1 errno=13 (Permission denied)
>   [2] sendto(MSG_FASTOPEN)  -> ret=14 errno=0 (OK/queued)
>   [+] listener ACCEPTED the confined child's connection; payload="..."
> 
> connect(2) to the port is denied while sendto(MSG_FASTOPEN) reaches the identical
> port and delivers data.
> 
> Impact
> ------
> A sandbox that uses LANDLOCK_ACCESS_NET_CONNECT_TCP to restrict outbound TCP
> (e.g. to keep a confined component from reaching an internal service or a
> metadata endpoint) can be escaped by an unprivileged, self-confined task with no
> CAP and no namespace transition -- for any destination port, since the
> implicit-connect path never consults the connect hook regardless of address (the
> run above shows one port). It is an integrity bypass of the
> network-confinement property; no memory safety is involved.
> I score it CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:C/C:N/I:H/A:N (6.5 Medium) -- the
> confined task escapes the policy authority that defined its sandbox, a scope
> change; 5.5 if you treat the Landlock boundary as the same authority (S:U).
> 
> Note on the in-flight UDP series
> --------------------------------
> The "landlock: Add UDP access control support" series (v5, Matthieu Buffet,
> https://lore.kernel.org/r/20260611162107.49278-3-matthieu@buffet.re) adds a
> socket_sendmsg hook, hook_socket_sendmsg(), but it returns 0 for non-UDP
> sockets:
> 
>     if (sk_is_udp(sock->sk))
>             access_request = LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP;
>     else
>             return 0;
> 
> so a TCP socket using MSG_FASTOPEN still bypasses LANDLOCK_ACCESS_NET_CONNECT_TCP
> even after that series lands. It may be most convenient to fix this there.
> 
> Suggested direction
> -------------------
> Re-check LANDLOCK_ACCESS_NET_CONNECT_TCP on the implicit-connect path: either have
> the socket_sendmsg hook evaluate CONNECT_TCP for stream sockets when the call
> performs an implicit connect (mirroring the AF_UNIX unix_may_send handling), or
> place the check inside __inet_stream_connect() so a single chokepoint covers
> connect(2), TCP Fast Open, and the MPTCP fast-open sibling.
> 
> I am happy to send a patch for this if you would like me to.

Yes please.

> 
> Best regards,
> 
> Bryam Vargas
> Independent security researcher, HEXLAB S.A.S., Cali, Colombia
> hexlabsecurity@proton.me
> 
> 

* Re: Landlock: LANDLOCK_ACCESS_FS_IOCTL_DEV bypass via io_uring IORING_OP_URING_CMD
From: Mickaël Salaün @ 2026-06-17 14:16 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bryam Vargas, Günther Noack, Paul Moore, Keith Busch,
	Christoph Hellwig, Sagi Grimberg, linux-security-module,
	Tingmao Wang
In-Reply-To: <209b76b4-e028-4af7-bdcb-b5813fef32fc@kernel.dk>

On Tue, Jun 16, 2026 at 08:44:55PM -0600, Jens Axboe wrote:
> On 6/16/26 8:25 PM, Bryam Vargas wrote:
> > Thanks Jens, noted: the fix belongs in Landlock. Mickaël has the full
> > report.
> 
> Indeed - and hence no need to bother anyone else with it by blasting it
> wide. I've already explained this multiple times, but on the private
> security list, when the occasional AI report comes in on things like
> this. Hence why it's a bit tiring to see the same stuff come across,
> once again.
> 
> For the landlock folks, I'd suggest taking a look at what hooks already
> exist (and existed, when landlock was merged) for selinux etc, that'd
> be a really good hint on the existing surface covered.

As I explained in previous (private) reports, there are currently no
io_uring hooks implemented for Landlock because there is no use for
them.

An io_uring "bypass" was already mentioned to us twice in March, but an
io_uring personality credential is not a Landlock bypass.  The Landlock
threat model is about enforcing restrictions on a sandboxed subject when
it accesses new kernel resources.  The credential identifies a set of
access rights, so in the case of io_uring, the subject is inherited by
the io_uring personality (i.e. the file descriptor).  If a sandboxed
task creates an io_uring personality, it will be sandboxed with the same
restrictions, which is BTW an interesting property (e.g. passing a
restricted io_uring FD to other processes).

A sandboxed process cannot create an io_uring personality that would
bypass its own restrictions, so there is no Landlock bypass.  Inheriting
or receiving a file descriptor is not restricted by Landlock because
these are operations from outside (or before) the sandbox.  If we want
to restrict them, we need to restrict the processes creating such file
descriptors.

Inherited or passed file descriptors are outside the Landlock threat
model because Landlock is only one part of the security policy when
(willingly) interacting with other processes.  In a nutshell, it's the
security capability model (where an object has some associated rights).
For instance, if a process willingly passes a file descriptor tied to a
secret file, then the receiving side can (and should be able to) read
it, being sandboxed with Landlock or not.  The scope of Landlock is to
drop ambient rights, but if an *unsandboxed* process is OK to pass a
sensitive resource, then that's a security architecture issue (i.e. a
confused deputy attack).

A nice side effect of this approach is that a process can sandbox itself
with a specific Landlock security policy and create an io_uring file
descriptor that will inherit the Landlock restrictions.  It can then
pass this FD to other processes with the guarantee that this FD will
only give access to resources allowed by the Landlock policy.

Landlock could implement the security_uring_sqpoll() hook, but for now
the use case is not clear to me.  We are working on controlling socket
creation according to their properties and I think the same approach
would be useful for io_uring:
https://lore.kernel.org/r/20251118134639.3314803-1-ivanov.mikhail1@huawei-partners.com

I agree that this might be confusing and I plan to improve Landlock
documentation to make this clearer and simpler for AI to take into
account.

BTW, we also have an open issue to add io_uring tests:
https://github.com/landlock-lsm/linux/issues/23

Regards,
 Mickaël

* Re: [PATCH v5 7/8] vfs: Replace security_sb_mount/security_move_mount with granular hooks
From: Christian Brauner @ 2026-06-17 13:53 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-security-module, linux-fsdevel, selinux, apparmor, paul,
	jmorris, serge, viro, jack, john.johansen, stephen.smalley.work,
	omosnace, mic, gnoack, takedakn, penguin-kernel, herton,
	kernel-team
In-Reply-To: <20260528182607.3150386-8-song@kernel.org>

On Thu, May 28, 2026 at 11:26:06AM -0700, Song Liu wrote:
> Replace the monolithic security_sb_mount() call in path_mount() and
> security_move_mount() in vfs_move_mount() with the new granular mount
> hooks:
> 
> - do_loopback(): call security_mount_bind()
> - do_new_mount(): call security_mount_new()
> - do_remount(): call security_mount_remount()
> - do_reconfigure_mnt(): call security_mount_reconfigure()
> - do_move_mount_old(): call security_mount_move()
> - do_change_type(): call security_mount_change_type()
> - vfs_move_mount(): replace security_move_mount() with
>   security_mount_move()
> 
> The new hooks are called at the individual operation level with
> appropriate context (resolved paths, fs_context), rather than at
> the top of path_mount() with raw string arguments.
> 
> Code generated with the assistance of Claude, reviewed by human.
> 
> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  fs/namespace.c | 41 ++++++++++++++++++++++++++++++-----------
>  1 file changed, 30 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index fe919abd2f01..43f22c5e2bf4 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2888,6 +2888,10 @@ static int do_change_type(const struct path *path, int ms_flags)
>  	if (!type)
>  		return -EINVAL;
>  
> +	err = security_mount_change_type(path, ms_flags);
> +	if (err)
> +		return err;
> +
>  	guard(namespace_excl)();
>  
>  	err = may_change_propagation(mnt);
> @@ -3006,6 +3010,10 @@ static int do_loopback(const struct path *path, const char *old_name,
>  	if (err)
>  		return err;
>  
> +	err = security_mount_bind(&old_path, path, recurse);
> +	if (err)
> +		return err;

This again is racy as it is called outside of the namespace semaphore:

        err = security_mount_bind(&old_path, path, recurse);
        if (err)
                return err;

        if (mnt_ns_loop(old_path.dentry))
                return -EINVAL;

        LOCK_MOUNT(mp, path);
        if (IS_ERR(mp.parent))
                return PTR_ERR(mp.parent);

After LOCK_MOUNT, @path might point to a completely different mount than
the one you performed your security checks on.

> +
>  	if (mnt_ns_loop(old_path.dentry))
>  		return -EINVAL;
>  
> @@ -3328,7 +3336,8 @@ static void mnt_warn_timestamp_expiry(const struct path *mountpoint,
>   * superblock it refers to.  This is triggered by specifying MS_REMOUNT|MS_BIND
>   * to mount(2).
>   */
> -static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
> +static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags,
> +			      unsigned long flags)
>  {
>  	struct super_block *sb = path->mnt->mnt_sb;
>  	struct mount *mnt = real_mount(path->mnt);
> @@ -3343,6 +3352,10 @@ static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
>  	if (!can_change_locked_flags(mnt, mnt_flags))
>  		return -EPERM;
>  
> +	ret = security_mount_reconfigure(path, mnt_flags, flags);
> +	if (ret)
> +		return ret;
> +
>  	/*
>  	 * We're only checking whether the superblock is read-only not
>  	 * changing it, so only take down_read(&sb->s_umount).
> @@ -3366,7 +3379,7 @@ static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
>   * on it - tough luck.
>   */
>  static int do_remount(const struct path *path, int sb_flags,
> -		      int mnt_flags, void *data)
> +		      int mnt_flags, void *data, unsigned long flags)
>  {
>  	int err;
>  	struct super_block *sb = path->mnt->mnt_sb;
> @@ -3393,6 +3406,9 @@ static int do_remount(const struct path *path, int sb_flags,
>  	fc->oldapi = true;
>  
>  	err = parse_monolithic_mount_data(fc, data);
> +	if (!err)
> +		err = security_mount_remount(fc, path, mnt_flags, flags,
> +					    data);
>  	if (!err) {
>  		down_write(&sb->s_umount);
>  		err = -EPERM;
> @@ -3708,6 +3724,10 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
>  	if (err)
>  		return err;
>  
> +	err = security_mount_move(&old_path, path);
> +	if (err)
> +		return err;

Placement of this hook suffers from the same issue as the bind mount
hook. Here it's worse because the security layer isn't even informed
about MOVE_MOUNT_BENEATH which completely alters the mount relationship.

* Re: [GIT PULL] selinux/selinux-pr-20260615
From: Kuan-Wei Chiu @ 2026-06-17 13:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Moore, selinux, linux-security-module, linux-kernel,
	Andrew Morton, jserv, marscheng
In-Reply-To: <CAHk-=wjBF1ZXNRRqnA+KDFzqxZaXPgmDc8=Ly3+RdxUXWuve9Q@mail.gmail.com>

Hi Linus,

On Wed, Jun 17, 2026 at 12:54:44PM +0100, Linus Torvalds wrote:
> On Tue, 16 Jun 2026 at 03:55, Paul Moore <paul@paul-moore.com> wrote:
> >
> > - Avoid nontransitive comparisons in our sorting code
> >
> > Done to prevent unexpected sorting results due to overflow.  Qualys
> > documented a similar issue with glibc:
> > https://www.qualys.com/2024/01/30/qsort.txt
> 
> So this is clearly worth fixing in the selinux code regardless, but
> did anybody check whether our sorting routines in lib/sort.c actually
> have any overflow issues with non-transitive comparison functions?
> 
> Strange sort order may be confusing but tends to be largely harmless
> (the confusion might then obviously cause other issues).
> 
> The whole "confuses the sort function enough to result in bad
> accesses" might be worth fixing in lib/sort.c if somebody looked into
> it...
> 
Since I made most of the recent changes to lib/sort.c, I can
hopefully shed some light on this.

With the current Linux lib/sort.c implementation, passing a compare
function that lacks transitivity will absolutely **not** lead to any
out-of-bounds memory accesses. Unlike glibc, which defaults to merge
sort and falls back to heapsort if malloc fails, the kernel uses a
strict in-place heapsort. Because of this, the compare and swap
operations will always operate safely within the boundaries of the
provided array.

However, it still inevitably leads to unexpected sorting results. This
has caused actual user-visible issues in the past (the previous ACPI
breakage being an example [1][2]). It turns out it is easy for people
to accidentally write comparators that violate transitivity, which is
why I submitted a patch previously to emphasize the properties a
comparator must satisfy. [3]

I have actually thought about whether we could detect transitivity
violations at runtime. But if we map this to graph theory: treating
each element as a node and the comparison results as directed edges,
detecting a violation is equivalent to finding a cycle in the graph.
Doing this would require an O(n^2) time complexity, which is obviously
unacceptable at runtime.

[1]: https://lore.kernel.org/lkml/70674dc7-5586-4183-8953-8095567e73df@gmail.com/
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=233323f9b9f828cd7cd5145ad811c1990b692542
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e0a15f8b4bd47548032acccdbeb5b9083b3675e

Regards,
Kuan-Wei

* Re: [GIT PULL] lsm/lsm-pr-20260615
From: pr-tracker-bot @ 2026-06-17 11:59 UTC (permalink / raw)
  To: Paul Moore; +Cc: Linus Torvalds, linux-security-module, linux-kernel
In-Reply-To: <9b359189953cc739f62fc94af4c24a27@paul-moore.com>

The pull request you sent on Mon, 15 Jun 2026 22:55:38 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm.git tags/lsm-pr-20260615

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/87599bd29856ea7bfdd62591c581c8be5a4719ee

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

* Re: [GIT PULL] selinux/selinux-pr-20260615
From: pr-tracker-bot @ 2026-06-17 11:58 UTC (permalink / raw)
  To: Paul Moore; +Cc: Linus Torvalds, selinux, linux-security-module, linux-kernel
In-Reply-To: <577e6fb29cf0b9c335748aa5fa026275@paul-moore.com>

The pull request you sent on Mon, 15 Jun 2026 22:55:42 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git tags/selinux-pr-20260615

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/231e9d447ea97033ae8b8dff7b910e6269d7c5af

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

* Re: [GIT PULL] selinux/selinux-pr-20260615
From: Linus Torvalds @ 2026-06-17 11:54 UTC (permalink / raw)
  To: Paul Moore; +Cc: selinux, linux-security-module, linux-kernel
In-Reply-To: <577e6fb29cf0b9c335748aa5fa026275@paul-moore.com>

On Tue, 16 Jun 2026 at 03:55, Paul Moore <paul@paul-moore.com> wrote:
>
> - Avoid nontransitive comparisons in our sorting code
>
> Done to prevent unexpected sorting results due to overflow.  Qualys
> documented a similar issue with glibc:
> https://www.qualys.com/2024/01/30/qsort.txt

So this is clearly worth fixing in the selinux code regardless, but
did anybody check whether our sorting routines in lib/sort.c actually
have any overflow issues with non-transitive comparison functions?

Strange sort order may be confusing but tends to be largely harmless
(the confusion might then obviously cause other issues).

The whole "confuses the sort function enough to result in bad
accesses" might be worth fixing in lib/sort.c if somebody looked into
it...

                 Linus
