* [PATCH v6 0/5] lsm: introduce lsm_config_self_policy() and lsm_config_system_policy() syscalls
From: Maxime Bélair @ 2025-10-10 13:25 UTC
To: linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
This patchset introduces two new syscalls, lsm_config_self_policy() and
lsm_config_system_policy(), along with the associated Linux Security Module
hooks security_lsm_config_*_policy(), providing a unified interface for loading
and managing LSM policies. These syscalls complement the existing per‑LSM
pseudo‑filesystem mechanism and work even when those filesystems are not
mounted or available.
With these new syscalls, users and administrators may lock down access to
the pseudo‑filesystem yet still manage LSM policies. Two tightly-scoped
entry points then replace the many file operations exposed by those
filesystems, significantly reducing the attack surface. This is
particularly useful in containers or processes already confined by
Landlock, where these pseudo‑filesystems are typically unavailable.
Because they provide a logical and unified interface, these syscalls are
simpler to use than several heterogeneous pseudo‑filesystems and avoid
edge cases such as partially loaded policies. They also eliminate VFS
overhead, yielding performance gains, notably when many policies are
loaded, for instance at boot time.
This initial implementation is intentionally minimal to limit the scope
of changes. Currently, only policy loading is supported. The new LSM
hooks are registered by AppArmor, SELinux and Smack. However, any
LSM can adopt this interface, and future patches could extend this
syscall to support more operations, such as replacing, removing, or
querying loaded policies.
Landlock already provides three Landlock‑specific syscalls (e.g.
landlock_add_rule()) to restrict ambient rights for sets of processes
without touching any pseudo-filesystem. lsm_config_*_policy() generalizes
that approach to the entire LSM layer: any module can choose to support
either or both of these syscalls, expose its policy operations through a
uniform interface, and reap the advantages outlined above.
This patchset is available at [1], a minimal user space example
showing how to use lsm_config_system_policy with AppArmor is at [2] and a
performance benchmark of both syscalls is available at [3].
[1] https://github.com/emixam16/linux/tree/lsm_syscall_v6
[2] https://gitlab.com/emixam16/apparmor/tree/lsm_syscall_v6
[3] https://gitlab.com/-/snippets/4864908
---
Changes in v6
- Add support for SELinux and Smack
Changes in v5
- Improve syscall input verification
- Do not export security_lsm_config_*_policy symbols
Changes in v4
- Make the syscall's maximum buffer size defined per module
- Fix a memory leak
Changes in v3
- Fix typos
Changes in v2
- Split lsm_manage_policy() into two distinct syscalls:
lsm_config_self_policy() and lsm_config_system_policy()
- The LSM hook now calls only the appropriate LSM (and not all LSMs)
- Add a configuration variable to limit the buffer size of these
syscalls
- AppArmor now allows stacking policies through lsm_config_self_policy()
and loading policies in any namespace through
lsm_config_system_policy()
---
Maxime Bélair (5):
Wire up lsm_config_self_policy and lsm_config_system_policy syscalls
lsm: introduce security_lsm_config_*_policy hooks
AppArmor: add support for lsm_config_self_policy and
lsm_config_system_policy
SELinux: add support for lsm_config_system_policy
Smack: add support for lsm_config_self_policy and
lsm_config_system_policy
arch/alpha/kernel/syscalls/syscall.tbl | 2 +
arch/arm/tools/syscall.tbl | 2 +
arch/m68k/kernel/syscalls/syscall.tbl | 2 +
arch/microblaze/kernel/syscalls/syscall.tbl | 2 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 2 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +
arch/parisc/kernel/syscalls/syscall.tbl | 2 +
arch/powerpc/kernel/syscalls/syscall.tbl | 2 +
arch/s390/kernel/syscalls/syscall.tbl | 2 +
arch/sh/kernel/syscalls/syscall.tbl | 2 +
arch/sparc/kernel/syscalls/syscall.tbl | 2 +
arch/x86/entry/syscalls/syscall_32.tbl | 2 +
arch/x86/entry/syscalls/syscall_64.tbl | 2 +
arch/xtensa/kernel/syscalls/syscall.tbl | 2 +
include/linux/lsm_hook_defs.h | 4 +
include/linux/security.h | 20 +++++
include/linux/syscalls.h | 5 ++
include/uapi/asm-generic/unistd.h | 6 +-
include/uapi/linux/lsm.h | 8 ++
kernel/sys_ni.c | 2 +
security/apparmor/apparmorfs.c | 31 +++++++
security/apparmor/include/apparmor.h | 4 +
security/apparmor/include/apparmorfs.h | 3 +
security/apparmor/lsm.c | 84 +++++++++++++++++++
security/lsm_syscalls.c | 21 +++++
security/security.c | 60 +++++++++++++
security/selinux/hooks.c | 27 ++++++
security/selinux/include/security.h | 7 ++
security/selinux/selinuxfs.c | 16 +++-
security/smack/smack.h | 8 ++
security/smack/smack_lsm.c | 73 ++++++++++++++++
security/smack/smackfs.c | 2 +-
tools/include/uapi/asm-generic/unistd.h | 6 +-
.../arch/x86/entry/syscalls/syscall_64.tbl | 2 +
35 files changed, 412 insertions(+), 7 deletions(-)
base-commit: 9c32cda43eb78f78c73aee4aa344b777714e259b
--
2.48.1
* [PATCH v6 2/5] lsm: introduce security_lsm_config_*_policy hooks
From: Maxime Bélair @ 2025-10-10 13:25 UTC
To: linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
In-Reply-To: <20251010132610.12001-1-maxime.belair@canonical.com>
Define two new LSM hooks, security_lsm_config_self_policy and
security_lsm_config_system_policy, and wire them into the corresponding
lsm_config_*_policy() syscalls so that LSMs can register a unified
interface for policy management. This initial, minimal implementation
only supports the LSM_POLICY_LOAD operation to limit changes.
Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
---
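The dispatch rule described above (call only the hook of the LSM whose id
matches, otherwise fall back to the hook's default return value) can be
sketched in user space as follows. This is only an illustrative model, not
the kernel code: struct policy_hook, hook_ok(), hook_unsupported() and
dispatch_policy() are hypothetical names, and the -EINVAL default mirrors
the LSM_HOOK() default declared in this patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for one registered LSM policy hook. */
struct policy_hook {
	uint32_t lsm_id;
	int (*config)(uint32_t op, const void *buf, size_t size, uint32_t flags);
};

/* Toy hook that accepts every request (illustrative only). */
static int hook_ok(uint32_t op, const void *buf, size_t size, uint32_t flags)
{
	(void)op; (void)buf; (void)size; (void)flags;
	return 0;
}

/* Toy hook that rejects every request (illustrative only). */
static int hook_unsupported(uint32_t op, const void *buf, size_t size,
			    uint32_t flags)
{
	(void)op; (void)buf; (void)size; (void)flags;
	return -EOPNOTSUPP;
}

/*
 * Call the hook of the first entry whose id equals lsm_id and stop there.
 * If no entry matches, return -EINVAL, the default used for these hooks.
 */
static int dispatch_policy(const struct policy_hook *hooks, size_t nhooks,
			   uint32_t lsm_id, uint32_t op, const void *buf,
			   size_t size, uint32_t flags)
{
	int rc = -EINVAL;

	for (size_t i = 0; i < nhooks; i++) {
		if (hooks[i].lsm_id == lsm_id) {
			rc = hooks[i].config(op, buf, size, flags);
			break;
		}
	}

	return rc;
}
```

With a table holding one entry per LSM, a request for an id that no entry
carries returns -EINVAL without calling any hook, which is the behaviour
introduced in v2 of this series ("calls only the appropriate LSM").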
include/linux/lsm_hook_defs.h | 4 +++
include/linux/security.h | 20 ++++++++++++
include/uapi/linux/lsm.h | 8 +++++
security/lsm_syscalls.c | 13 ++++++--
security/security.c | 60 +++++++++++++++++++++++++++++++++++
5 files changed, 103 insertions(+), 2 deletions(-)
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index bf3bbac4e02a..50b6e8aed787 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -464,3 +464,7 @@ LSM_HOOK(int, 0, bdev_alloc_security, struct block_device *bdev)
LSM_HOOK(void, LSM_RET_VOID, bdev_free_security, struct block_device *bdev)
LSM_HOOK(int, 0, bdev_setintegrity, struct block_device *bdev,
enum lsm_integrity_type type, const void *value, size_t size)
+LSM_HOOK(int, -EINVAL, lsm_config_self_policy, u32 op, void __user *buf,
+ size_t size, u32 flags)
+LSM_HOOK(int, -EINVAL, lsm_config_system_policy, u32 op,
+ void __user *buf, size_t size, u32 flags)
diff --git a/include/linux/security.h b/include/linux/security.h
index cc9b54d95d22..54acaee4a994 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -581,6 +581,11 @@ void security_bdev_free(struct block_device *bdev);
int security_bdev_setintegrity(struct block_device *bdev,
enum lsm_integrity_type type, const void *value,
size_t size);
+int security_lsm_config_self_policy(u32 lsm_id, u32 op, void __user *buf,
+ size_t size, u32 flags);
+int security_lsm_config_system_policy(u32 lsm_id, u32 op, void __user *buf,
+ size_t size, u32 flags);
+
#else /* CONFIG_SECURITY */
/**
@@ -1603,6 +1608,21 @@ static inline int security_bdev_setintegrity(struct block_device *bdev,
return 0;
}
+static inline int security_lsm_config_self_policy(u32 lsm_id, u32 op,
+ void __user *buf,
+ size_t size, u32 flags)
+{
+
+ return -EOPNOTSUPP;
+}
+
+static inline int security_lsm_config_system_policy(u32 lsm_id, u32 op,
+ void __user *buf,
+ size_t size, u32 flags)
+{
+
+ return -EOPNOTSUPP;
+}
#endif /* CONFIG_SECURITY */
#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/include/uapi/linux/lsm.h b/include/uapi/linux/lsm.h
index 938593dfd5da..2b9432a30cdc 100644
--- a/include/uapi/linux/lsm.h
+++ b/include/uapi/linux/lsm.h
@@ -90,4 +90,12 @@ struct lsm_ctx {
*/
#define LSM_FLAG_SINGLE 0x0001
+/*
+ * LSM_POLICY_XXX definitions identify the different operations
+ * to configure LSM policies
+ */
+
+#define LSM_POLICY_UNDEF 0
+#define LSM_POLICY_LOAD 100
+
#endif /* _UAPI_LINUX_LSM_H */
diff --git a/security/lsm_syscalls.c b/security/lsm_syscalls.c
index b02a7623dea6..0796673b6f19 100644
--- a/security/lsm_syscalls.c
+++ b/security/lsm_syscalls.c
@@ -122,11 +122,20 @@ SYSCALL_DEFINE3(lsm_list_modules, u64 __user *, ids, u32 __user *, size,
SYSCALL_DEFINE6(lsm_config_self_policy, u32, lsm_id, u32, op, void __user *,
buf, u32 __user, size, u32, common_flags, u32, flags)
{
- return 0;
+ if (common_flags) // Reserved for future use
+ return -EINVAL;
+
+ return security_lsm_config_self_policy(lsm_id, op, buf, size, flags);
}
SYSCALL_DEFINE6(lsm_config_system_policy, u32, lsm_id, u32, op, void __user *,
buf, u32 __user, size, u32, common_flags, u32, flags)
{
- return 0;
+ if (common_flags) // Reserved for future use
+ return -EINVAL;
+
+ if (!capable(CAP_MAC_ADMIN))
+ return -EPERM;
+
+ return security_lsm_config_system_policy(lsm_id, op, buf, size, flags);
}
diff --git a/security/security.c b/security/security.c
index fb57e8fddd91..eeb61b27cd56 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5883,6 +5883,66 @@ int security_bdev_setintegrity(struct block_device *bdev,
}
EXPORT_SYMBOL(security_bdev_setintegrity);
+/**
+ * security_lsm_config_self_policy() - Configure caller's LSM policies
+ * @lsm_id: id of the LSM to target
+ * @op: Operation to perform (one of the LSM_POLICY_XXX values)
+ * @buf: userspace pointer to policy data
+ * @size: size of @buf
+ * @flags: lsm policy configuration flags
+ *
+ * Configure the policies of an LSM for the current domain/user. This notably
+ * allows updating them even when the lsmfs is unavailable or restricted.
+ * Currently, only LSM_POLICY_LOAD is supported.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_lsm_config_self_policy(u32 lsm_id, u32 op, void __user *buf,
+ size_t size, u32 flags)
+{
+ int rc = LSM_RET_DEFAULT(lsm_config_self_policy);
+ struct lsm_static_call *scall;
+
+ lsm_for_each_hook(scall, lsm_config_self_policy) {
+ if ((scall->hl->lsmid->id) == lsm_id) {
+ rc = scall->hl->hook.lsm_config_self_policy(op, buf, size, flags);
+ break;
+ }
+ }
+
+ return rc;
+}
+
+/**
+ * security_lsm_config_system_policy() - Configure system LSM policies
+ * @lsm_id: id of the lsm to target
+ * @op: Operation to perform (one of the LSM_POLICY_XXX values)
+ * @buf: userspace pointer to policy data
+ * @size: size of @buf
+ * @flags: lsm policy configuration flags
+ *
+ * Configure the policies of an LSM for the whole system. This notably allows
+ * updating them even when the lsmfs is unavailable or restricted. Currently,
+ * only LSM_POLICY_LOAD is supported.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_lsm_config_system_policy(u32 lsm_id, u32 op, void __user *buf,
+ size_t size, u32 flags)
+{
+ int rc = LSM_RET_DEFAULT(lsm_config_system_policy);
+ struct lsm_static_call *scall;
+
+ lsm_for_each_hook(scall, lsm_config_system_policy) {
+ if ((scall->hl->lsmid->id) == lsm_id) {
+ rc = scall->hl->hook.lsm_config_system_policy(op, buf, size, flags);
+ break;
+ }
+ }
+
+ return rc;
+}
+
#ifdef CONFIG_PERF_EVENTS
/**
* security_perf_event_open() - Check if a perf event open is allowed
--
2.48.1
* [PATCH v6 3/5] AppArmor: add support for lsm_config_self_policy and lsm_config_system_policy
From: Maxime Bélair @ 2025-10-10 13:25 UTC
To: linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
In-Reply-To: <20251010132610.12001-1-maxime.belair@canonical.com>
Enable users to manage AppArmor policies through the new hooks
lsm_config_self_policy and lsm_config_system_policy.
lsm_config_self_policy allows stacking existing policies in the kernel.
This ensures that it can only further restrict the caller and can never
be used to gain new privileges.
lsm_config_system_policy allows loading or replacing AppArmor policies in
any AppArmor namespace and is restricted to CAP_MAC_ADMIN.
Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
---
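For lsm_config_system_policy, the AppArmor hook in this patch expects the
buffer in the form "<ns>\0<policy>", with an empty <ns> for the root
namespace. A caller could assemble such a buffer roughly as below. This is
a sketch under stated assumptions: aa_build_system_buf() is a hypothetical
helper name, not part of this series or of libapparmor.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Build "<ns>\0<policy>" in a freshly allocated buffer.
 * ns is a NUL-terminated namespace name ("" selects the root namespace),
 * policy points to policy_len bytes of policy data.
 * Returns the buffer (to be released with free()) and stores its total
 * length in *out_len, or returns NULL if the allocation fails.
 */
static unsigned char *aa_build_system_buf(const char *ns, const void *policy,
					  size_t policy_len, size_t *out_len)
{
	size_t ns_len = strlen(ns);
	size_t total = ns_len + 1 + policy_len;
	unsigned char *buf = malloc(total);

	if (!buf)
		return NULL;

	memcpy(buf, ns, ns_len);	/* namespace name */
	buf[ns_len] = '\0';		/* separator expected by the hook */
	memcpy(buf + ns_len + 1, policy, policy_len);	/* policy data */

	*out_len = total;
	return buf;
}
```

The resulting pointer and length would be passed as buf and size. Going by
the checks in the hook below, the namespace name including its terminating
NUL has to fit in AA_PROFILE_NAME_MAX_SIZE bytes and the whole buffer in
AA_PROFILE_MAX_SIZE bytes; the helper above does not enforce either limit.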
security/apparmor/apparmorfs.c | 31 ++++++++++
security/apparmor/include/apparmor.h | 4 ++
security/apparmor/include/apparmorfs.h | 3 +
security/apparmor/lsm.c | 84 ++++++++++++++++++++++++++
4 files changed, 122 insertions(+)
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 6039afae4bfc..6df43299b045 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -439,6 +439,37 @@ static ssize_t policy_update(u32 mask, const char __user *buf, size_t size,
return error;
}
+/**
+ * aa_profile_load_ns_name - load a profile into the current namespace identified by name
+ * @name: The name of the namespace to load the policy in. "" for root_ns
+ * @name_size: size of @name. 0 for root ns
+ * @buf: buffer containing the user-provided policy
+ * @size: size of @buf
+ * @ppos: position pointer in the file
+ *
+ * Returns: 0 on success, negative value on error
+ */
+ssize_t aa_profile_load_ns_name(char *name, size_t name_size, const void __user *buf,
+ size_t size, loff_t *ppos)
+{
+ struct aa_ns *ns;
+
+ if (name_size == 0)
+ ns = aa_get_ns(root_ns);
+ else
+ ns = aa_lookupn_ns(root_ns, name, name_size);
+
+ if (!ns)
+ return -EINVAL;
+
+ int error = policy_update(AA_MAY_LOAD_POLICY | AA_MAY_REPLACE_POLICY,
+ buf, size, ppos, ns);
+
+ aa_put_ns(ns);
+
+ return error >= 0 ? 0 : error;
+}
+
/* .load file hook fn to load policy */
static ssize_t profile_load(struct file *f, const char __user *buf, size_t size,
loff_t *pos)
diff --git a/security/apparmor/include/apparmor.h b/security/apparmor/include/apparmor.h
index f83934913b0f..1d9a2881a8b9 100644
--- a/security/apparmor/include/apparmor.h
+++ b/security/apparmor/include/apparmor.h
@@ -62,5 +62,9 @@ extern unsigned int aa_g_path_max;
#define AA_DEFAULT_CLEVEL 0
#endif /* CONFIG_SECURITY_APPARMOR_EXPORT_BINARY */
+/* Syscall-related buffer size limits */
+
+#define AA_PROFILE_NAME_MAX_SIZE (1 << 9)
+#define AA_PROFILE_MAX_SIZE (1 << 28)
#endif /* __APPARMOR_H */
diff --git a/security/apparmor/include/apparmorfs.h b/security/apparmor/include/apparmorfs.h
index 1e94904f68d9..fd415afb7659 100644
--- a/security/apparmor/include/apparmorfs.h
+++ b/security/apparmor/include/apparmorfs.h
@@ -112,6 +112,9 @@ int __aafs_profile_mkdir(struct aa_profile *profile, struct dentry *parent);
void __aafs_ns_rmdir(struct aa_ns *ns);
int __aafs_ns_mkdir(struct aa_ns *ns, struct dentry *parent, const char *name,
struct dentry *dent);
+ssize_t aa_profile_load_ns_name(char *name, size_t name_len, const void __user *buf,
+ size_t size, loff_t *ppos);
+
struct aa_loaddata;
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 9b6c2f157f83..0c127f9dae19 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -1275,6 +1275,86 @@ static int apparmor_socket_shutdown(struct socket *sock, int how)
return aa_sock_perm(OP_SHUTDOWN, AA_MAY_SHUTDOWN, sock);
}
+/**
+ * apparmor_lsm_config_self_policy - Stack a profile
+ * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
+ * @buf: buffer containing the user-provided name of the profile to stack
+ * @size: size of @buf
+ * @flags: reserved for future use; must be zero
+ *
+ * Returns: 0 on success, negative value on error
+ */
+static int apparmor_lsm_config_self_policy(u32 op, void __user *buf,
+ size_t size, u32 flags)
+{
+ char *name;
+ long name_size;
+ int ret;
+
+ if (op != LSM_POLICY_LOAD || flags)
+ return -EOPNOTSUPP;
+ if (size == 0)
+ return -EINVAL;
+ if (size > AA_PROFILE_NAME_MAX_SIZE)
+ return -E2BIG;
+
+ name = kmalloc(size, GFP_KERNEL);
+ if (!name)
+ return -ENOMEM;
+
+ name_size = strncpy_from_user(name, buf, size);
+ if (name_size <= 0) {
+ kfree(name);
+ return name_size;
+ } else if (name_size == size) {
+ kfree(name);
+ return -E2BIG;
+ }
+
+ ret = aa_change_profile(name, AA_CHANGE_STACK);
+
+ kfree(name);
+
+ return ret;
+}
+
+/**
+ * apparmor_lsm_config_system_policy - Load or replace a system policy
+ * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
+ * @buf: user-supplied buffer in the form "<ns>\0<policy>"
+ * <ns> is the namespace to load the policy into (empty string for root)
+ * <policy> is the policy to load
+ * @size: size of @buf
+ * @flags: reserved for future use; must be zero
+ *
+ * Returns: 0 on success, negative value on error
+ */
+static int apparmor_lsm_config_system_policy(u32 op, void __user *buf,
+ size_t size, u32 flags)
+{
+ loff_t pos = 0; // Partial writing is not currently supported
+ char ns_name[AA_PROFILE_NAME_MAX_SIZE];
+ long ns_size;
+ size_t max_ns_size = min(size, AA_PROFILE_NAME_MAX_SIZE);
+
+ if (op != LSM_POLICY_LOAD || flags)
+ return -EOPNOTSUPP;
+ if (size < 2)
+ return -EINVAL;
+ if (size > AA_PROFILE_MAX_SIZE)
+ return -E2BIG;
+
+ ns_size = strncpy_from_user(ns_name, buf, max_ns_size);
+ if (ns_size < 0)
+ return ns_size;
+ if (ns_size == max_ns_size)
+ return -E2BIG;
+
+ return aa_profile_load_ns_name(ns_name, ns_size, buf + ns_size + 1,
+ size - ns_size - 1, &pos);
+}
+
+
#ifdef CONFIG_NETWORK_SECMARK
/**
* apparmor_socket_sock_rcv_skb - check perms before associating skb to sk
@@ -1483,6 +1563,10 @@ static struct security_hook_list apparmor_hooks[] __ro_after_init = {
LSM_HOOK_INIT(socket_getsockopt, apparmor_socket_getsockopt),
LSM_HOOK_INIT(socket_setsockopt, apparmor_socket_setsockopt),
LSM_HOOK_INIT(socket_shutdown, apparmor_socket_shutdown),
+
+ LSM_HOOK_INIT(lsm_config_self_policy, apparmor_lsm_config_self_policy),
+ LSM_HOOK_INIT(lsm_config_system_policy,
+ apparmor_lsm_config_system_policy),
#ifdef CONFIG_NETWORK_SECMARK
LSM_HOOK_INIT(socket_sock_rcv_skb, apparmor_socket_sock_rcv_skb),
#endif
--
2.48.1
* [PATCH v6 1/5] Wire up lsm_config_self_policy and lsm_config_system_policy syscalls
From: Maxime Bélair @ 2025-10-10 13:25 UTC
To: linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
In-Reply-To: <20251010132610.12001-1-maxime.belair@canonical.com>
Add support for the new lsm_config_self_policy and
lsm_config_system_policy syscalls, providing a unified API for loading
and modifying LSM policies for the current user and for the entire
system, respectively, without requiring the LSM’s pseudo-filesystems.
Benefits:
- Works even if the LSM pseudo-filesystem isn’t mounted or available
(e.g. in containers)
- Offers a logical and unified interface rather than multiple
heterogeneous pseudo-filesystems
- Avoids the overhead of other kernel interfaces for better efficiency
Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
---
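Since no libc wrapper exists for these syscalls, user space would reach
them through syscall(2). The sketch below shows one possible thin wrapper.
The SKETCH_* constants and sketch_lsm_config_policy() are hypothetical
names; the numbers 468/469 are the ones assigned in the syscall tables of
this series and may be assigned differently in other trees, so they should
be treated as assumptions rather than stable ABI. The wrapper also rejects
a non-zero common_flags before entering the kernel, mirroring the check the
series adds on the kernel side.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed values, copied from this series (not from a released kernel). */
#define SKETCH_NR_LSM_CONFIG_SELF_POLICY	468
#define SKETCH_NR_LSM_CONFIG_SYSTEM_POLICY	469
#define SKETCH_LSM_POLICY_LOAD			100

/*
 * Thin wrapper usable for either syscall: nr selects which one.
 * Returns 0 on success, or -1 with errno set on failure.
 * common_flags is reserved and must be zero; fail early with EINVAL
 * instead of entering the kernel when it is not.
 */
static long sketch_lsm_config_policy(long nr, uint32_t lsm_id, uint32_t op,
				     const void *buf, uint32_t size,
				     uint32_t common_flags, uint32_t flags)
{
	if (common_flags) {
		errno = EINVAL;
		return -1;
	}

	return syscall(nr, (long)lsm_id, (long)op, buf, (long)size,
		       (long)common_flags, (long)flags);
}
```

A caller would pass the LSM identifier from <linux/lsm.h> (for example
LSM_ID_APPARMOR), SKETCH_LSM_POLICY_LOAD as op, and the policy buffer with
its size. On a kernel without this series the call is expected to fail, but
the exact errno depends on what, if anything, occupies those syscall
numbers there.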
arch/alpha/kernel/syscalls/syscall.tbl | 2 ++
arch/arm/tools/syscall.tbl | 2 ++
arch/m68k/kernel/syscalls/syscall.tbl | 2 ++
arch/microblaze/kernel/syscalls/syscall.tbl | 2 ++
arch/mips/kernel/syscalls/syscall_n32.tbl | 2 ++
arch/mips/kernel/syscalls/syscall_n64.tbl | 2 ++
arch/mips/kernel/syscalls/syscall_o32.tbl | 2 ++
arch/parisc/kernel/syscalls/syscall.tbl | 2 ++
arch/powerpc/kernel/syscalls/syscall.tbl | 2 ++
arch/s390/kernel/syscalls/syscall.tbl | 2 ++
arch/sh/kernel/syscalls/syscall.tbl | 2 ++
arch/sparc/kernel/syscalls/syscall.tbl | 2 ++
arch/x86/entry/syscalls/syscall_32.tbl | 2 ++
arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
arch/xtensa/kernel/syscalls/syscall.tbl | 2 ++
include/linux/syscalls.h | 5 +++++
include/uapi/asm-generic/unistd.h | 6 +++++-
kernel/sys_ni.c | 2 ++
security/lsm_syscalls.c | 12 ++++++++++++
tools/include/uapi/asm-generic/unistd.h | 6 +++++-
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
21 files changed, 61 insertions(+), 2 deletions(-)
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 2dd6340de6b4..4fc75352220d 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -507,3 +507,5 @@
575 common listxattrat sys_listxattrat
576 common removexattrat sys_removexattrat
577 common open_tree_attr sys_open_tree_attr
+578 common lsm_config_self_policy sys_lsm_config_self_policy
+579 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 27c1d5ebcd91..326483cb94a4 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -482,3 +482,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 9fe47112c586..d37364df1cd7 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -467,3 +467,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 7b6e97828e55..9d58ebfcf967 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -473,3 +473,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index aa70e371bb54..8627b5f56280 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -406,3 +406,5 @@
465 n32 listxattrat sys_listxattrat
466 n32 removexattrat sys_removexattrat
467 n32 open_tree_attr sys_open_tree_attr
+468 n32 lsm_config_self_policy sys_lsm_config_self_policy
+469 n32 lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 1e8c44c7b614..813207b61f58 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -382,3 +382,5 @@
465 n64 listxattrat sys_listxattrat
466 n64 removexattrat sys_removexattrat
467 n64 open_tree_attr sys_open_tree_attr
+468 n64 lsm_config_self_policy sys_lsm_config_self_policy
+469 n64 lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 114a5a1a6230..9cd0946b4370 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -455,3 +455,5 @@
465 o32 listxattrat sys_listxattrat
466 o32 removexattrat sys_removexattrat
467 o32 open_tree_attr sys_open_tree_attr
+468 o32 lsm_config_self_policy sys_lsm_config_self_policy
+469 o32 lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 94df3cb957e9..9db01dd55793 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -466,3 +466,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 9a084bdb8926..97714acb39ab 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -558,3 +558,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index a4569b96ef06..d2b0f14fb516 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -470,3 +470,5 @@
465 common listxattrat sys_listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 52a7652fcff6..210d7118ce16 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -471,3 +471,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 83e45eb6c095..494417d80680 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -513,3 +513,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ac007ea00979..36c2c538e04f 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -473,3 +473,5 @@
465 i386 listxattrat sys_listxattrat
466 i386 removexattrat sys_removexattrat
467 i386 open_tree_attr sys_open_tree_attr
+468 i386 lsm_config_self_policy sys_lsm_config_self_policy
+469 i386 lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index cfb5ca41e30d..7eefbccfe531 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -391,6 +391,8 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
#
# Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index f657a77314f8..90d86a54a952 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -438,3 +438,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e5603cc91963..43b53fbd44be 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -988,6 +988,11 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *
asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
u32 size, u32 flags);
asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
+asmlinkage long sys_lsm_config_self_policy(u32 lsm_id, u32 op, void __user *buf,
+ u32 __user size, u32 common_flags, u32 flags);
+asmlinkage long sys_lsm_config_system_policy(u32 lsm_id, u32 op, void __user *buf,
+ u32 __user size, u32 common_flags, u32 flags);
+
/*
* Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 2892a45023af..021d0689c929 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -851,9 +851,13 @@ __SYSCALL(__NR_listxattrat, sys_listxattrat)
__SYSCALL(__NR_removexattrat, sys_removexattrat)
#define __NR_open_tree_attr 467
__SYSCALL(__NR_open_tree_attr, sys_open_tree_attr)
+#define __NR_lsm_config_self_policy 468
+__SYSCALL(__NR_lsm_config_self_policy, sys_lsm_config_self_policy)
+#define __NR_lsm_config_system_policy 469
+__SYSCALL(__NR_lsm_config_system_policy, sys_lsm_config_system_policy)
#undef __NR_syscalls
-#define __NR_syscalls 468
+#define __NR_syscalls 470
/*
* 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index c00a86931f8c..3ecebcd3fbe0 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -172,6 +172,8 @@ COND_SYSCALL_COMPAT(fadvise64_64);
COND_SYSCALL(lsm_get_self_attr);
COND_SYSCALL(lsm_set_self_attr);
COND_SYSCALL(lsm_list_modules);
+COND_SYSCALL(lsm_config_self_policy);
+COND_SYSCALL(lsm_config_system_policy);
/* CONFIG_MMU only */
COND_SYSCALL(swapon);
diff --git a/security/lsm_syscalls.c b/security/lsm_syscalls.c
index 8440948a690c..b02a7623dea6 100644
--- a/security/lsm_syscalls.c
+++ b/security/lsm_syscalls.c
@@ -118,3 +118,15 @@ SYSCALL_DEFINE3(lsm_list_modules, u64 __user *, ids, u32 __user *, size,
return lsm_active_cnt;
}
+
+SYSCALL_DEFINE6(lsm_config_self_policy, u32, lsm_id, u32, op, void __user *,
+ buf, u32 __user, size, u32, common_flags, u32, flags)
+{
+ return 0;
+}
+
+SYSCALL_DEFINE6(lsm_config_system_policy, u32, lsm_id, u32, op, void __user *,
+ buf, u32 __user, size, u32, common_flags, u32, flags)
+{
+ return 0;
+}
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 2892a45023af..021d0689c929 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -851,9 +851,13 @@ __SYSCALL(__NR_listxattrat, sys_listxattrat)
__SYSCALL(__NR_removexattrat, sys_removexattrat)
#define __NR_open_tree_attr 467
__SYSCALL(__NR_open_tree_attr, sys_open_tree_attr)
+#define __NR_lsm_config_self_policy 468
+__SYSCALL(__NR_lsm_config_self_policy, sys_lsm_config_self_policy)
+#define __NR_lsm_config_system_policy 469
+__SYSCALL(__NR_lsm_config_system_policy, sys_lsm_config_system_policy)
#undef __NR_syscalls
-#define __NR_syscalls 468
+#define __NR_syscalls 470
/*
* 32 bit systems traditionally used different
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index cfb5ca41e30d..7eefbccfe531 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -391,6 +391,8 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
#
# Due to a historical design error, certain syscalls are numbered differently
--
2.48.1
* [PATCH v6 4/5] SELinux: add support for lsm_config_system_policy
From: Maxime Bélair @ 2025-10-10 13:25 UTC
To: linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
In-Reply-To: <20251010132610.12001-1-maxime.belair@canonical.com>
Enable users to manage SELinux policies through the new hook
lsm_config_system_policy. This feature is restricted to CAP_MAC_ADMIN.
Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
---
security/selinux/hooks.c | 27 +++++++++++++++++++++++++++
security/selinux/include/security.h | 7 +++++++
security/selinux/selinuxfs.c | 16 ++++++++++++----
3 files changed, 46 insertions(+), 4 deletions(-)
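For readers who want to try the hook from user space, the following is a minimal sketch, not part of the patch. It assumes the syscall number from this series' tables (469, which may differ or collide in other trees), LSM_ID_SELINUX as defined in include/uapi/linux/lsm.h, and LSM_POLICY_LOAD as defined in the v5 header (100, which may have changed). All HYP_* names and both helper functions are hypothetical. Because the v6 syscall takes the size as a u32, the sketch checks the image length before calling.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Assumed values, see the paragraph above; none come from a released kernel. */
#define HYP_NR_lsm_config_system_policy 469
#define HYP_LSM_ID_SELINUX              101
#define HYP_LSM_POLICY_LOAD             100

/* The syscall takes a u32 size: reject empty or oversized images up front. */
int policy_size_to_u32(size_t len, uint32_t *out)
{
	assert(out != NULL);
	if (len == 0)
		return -EINVAL;
	if ((uint64_t)len > UINT32_MAX)
		return -EFBIG;
	*out = (uint32_t)len;
	return 0;
}

/*
 * Hypothetical wrapper: hands a binary policy image to SELinux.
 * Returns the syscall result, or a negative errno value on failure.
 * The caller is expected to hold CAP_MAC_ADMIN.
 */
long selinux_load_policy(const void *image, size_t len)
{
	uint32_t size;
	long ret;
	int rc = policy_size_to_u32(len, &size);

	if (rc)
		return rc;

	/* common_flags and flags must both be zero for now. */
	ret = syscall(HYP_NR_lsm_config_system_policy,
		      (long)HYP_LSM_ID_SELINUX, (long)HYP_LSM_POLICY_LOAD,
		      image, (long)size, 0L, 0L);
	return ret < 0 ? -errno : ret;
}
```

On a kernel without this series, the wrapper would fail or reach an unrelated syscall, so it should only be run against a kernel built from the tree in question.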
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e7a7dcab81db..3d14d4e47937 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -7196,6 +7196,31 @@ static int selinux_uring_allowed(void)
}
#endif /* CONFIG_IO_URING */
+/**
+ * selinux_lsm_config_system_policy - Load a system-wide SELinux policy
+ * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
+ * @buf: user-supplied buffer holding the binary policy image
+ * @size: size of @buf
+ * @flags: reserved for future use; must be zero
+ *
+ * Returns: number of bytes consumed on success, negative value on error
+ */
+static int selinux_lsm_config_system_policy(u32 op, void __user *buf,
+ size_t size, u32 flags)
+{
+ loff_t pos = 0;
+
+ if (op != LSM_POLICY_LOAD || flags)
+ return -EOPNOTSUPP;
+
+ if (!selinux_null.dentry || !selinux_null.dentry->d_sb ||
+ !selinux_null.dentry->d_sb->s_fs_info)
+ return -ENODEV;
+
+ return __sel_write_load(selinux_null.dentry->d_sb->s_fs_info, buf, size,
+ &pos);
+}
+
static const struct lsm_id selinux_lsmid = {
.name = "selinux",
.id = LSM_ID_SELINUX,
@@ -7499,6 +7524,8 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
#ifdef CONFIG_PERF_EVENTS
LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
#endif
+ LSM_HOOK_INIT(lsm_config_system_policy, selinux_lsm_config_system_policy),
+
};
static __init int selinux_init(void)
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index e7827ed7be5f..7b779ea43cc3 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -389,7 +389,14 @@ struct selinux_kernel_status {
extern void selinux_status_update_setenforce(bool enforcing);
extern void selinux_status_update_policyload(u32 seqno);
extern void selinux_complete_init(void);
+
+struct selinux_fs_info;
+
extern struct path selinux_null;
+extern ssize_t __sel_write_load(struct selinux_fs_info *fsi,
+ const char __user *buf, size_t count,
+ loff_t *ppos);
+
extern void selnl_notify_setenforce(int val);
extern void selnl_notify_policyload(u32 seqno);
extern int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm);
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 47480eb2189b..1f7e611d8300 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -567,11 +567,11 @@ static int sel_make_policy_nodes(struct selinux_fs_info *fsi,
return ret;
}
-static ssize_t sel_write_load(struct file *file, const char __user *buf,
- size_t count, loff_t *ppos)
+ssize_t __sel_write_load(struct selinux_fs_info *fsi,
+ const char __user *buf, size_t count,
+ loff_t *ppos)
{
- struct selinux_fs_info *fsi;
struct selinux_load_state load_state;
ssize_t length;
void *data = NULL;
@@ -605,7 +605,6 @@ static ssize_t sel_write_load(struct file *file, const char __user *buf,
pr_warn_ratelimited("SELinux: failed to load policy\n");
goto out;
}
- fsi = file_inode(file)->i_sb->s_fs_info;
length = sel_make_policy_nodes(fsi, load_state.policy);
if (length) {
pr_warn_ratelimited("SELinux: failed to initialize selinuxfs\n");
@@ -626,6 +625,15 @@ static ssize_t sel_write_load(struct file *file, const char __user *buf,
return length;
}
+static ssize_t sel_write_load(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct selinux_fs_info *fsi = file_inode(file)->i_sb->s_fs_info;
+
+ return __sel_write_load(fsi, buf, count, ppos);
+}
+
+
static const struct file_operations sel_load_ops = {
.write = sel_write_load,
.llseek = generic_file_llseek,
--
2.48.1
* Re: [PATCH v5 0/3] lsm: introduce lsm_config_self_policy() and lsm_config_system_policy() syscalls
From: Maxime Bélair @ 2025-10-10 13:34 UTC (permalink / raw)
To: Casey Schaufler, linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, takedakn, penguin-kernel, song, rdunlap,
linux-api, apparmor, linux-kernel
In-Reply-To: <5ae541ce-613f-47c0-8a23-1ec9a0b346cf@schaufler-ca.com>
On 7/9/25 18:48, Casey Schaufler wrote:
> On 7/9/2025 1:00 AM, Maxime Bélair wrote:
>> This patchset introduces two new syscalls: lsm_config_self_policy(),
>> lsm_config_system_policy() and the associated Linux Security Module hooks
>> security_lsm_config_*_policy(), providing a unified interface for loading
>> and managing LSM policies. These syscalls complement the existing per‑LSM
>> pseudo‑filesystem mechanism and work even when those filesystems are not
>> mounted or available.
>>
>> With these new syscalls, users and administrators may lock down access to
>> the pseudo‑filesystem yet still manage LSM policies. Two tightly-scoped
>> entry points then replace the many file operations exposed by those
>> filesystems, significantly reducing the attack surface. This is
>> particularly useful in containers or processes already confined by
>> Landlock, where these pseudo‑filesystems are typically unavailable.
>>
>> Because they provide a logical and unified interface, these syscalls are
>> simpler to use than several heterogeneous pseudo‑filesystems and avoid
>> edge cases such as partially loaded policies. They also eliminates VFS
>> overhead, yielding performance gains notably when many policies are
>> loaded, for instance at boot time.
>>
>> This initial implementation is intentionally minimal to limit the scope
>> of changes. Currently, only policy loading is supported, and only
>> AppArmor registers this LSM hook. However, any LSM can adopt this
>> interface, and future patches could extend this syscall to support more
>> operations, such as replacing, removing, or querying loaded policies.
>
> It would help me be more confident in the interface if you also included
> hooks for SELinux and Smack. The API needs to be general enough to support
> SELinux's atomic policy load, Smack's atomic and incremental load options,
> and Smack's self rule loads. I really don't want to have to implement
> lsm_config_self_policy2() when I decide to us it for Smack.
>
I provided a minimal initial implementation for SELinux and Smack in v6.
For SELinux, I implemented only lsm_config_system_policy, which
currently allows loading policies through this syscall.
For Smack, I supported both hooks, allowing modification of both global
and subject rules. However, since modifying even the subject rules is a
privileged operation, both operations are limited to CAP_MAC_ADMIN.
If we could ensure that the new rules only further restrict capabilities,
we could allow loading subject rules with fewer privileges.
>>
>> Landlock already provides three Landlock‑specific syscalls (e.g.
>> landlock_add_rule()) to restrict ambient rights for sets of processes
>> without touching any pseudo-filesystem. lsm_config_*_policy() generalizes
>> that approach to the entire LSM layer, so any module can choose to
>> support either or both of these syscalls, and expose its policy
>> operations through a uniform interface and reap the advantages outlined
>> above.
>>
>> This patchset is available at [1], a minimal user space example
>> showing how to use lsm_config_system_policy with AppArmor is at [2] and a
>> performance benchmark of both syscalls is available at [3].
>>
>> [1] https://github.com/emixam16/linux/tree/lsm_syscall
>> [2] https://gitlab.com/emixam16/apparmor/tree/lsm_syscall
>> [3] https://gitlab.com/-/snippets/4864908
>>
>> ---
>> Changes in v5
>> - Improve syscall input verification
>> - Do not export security_lsm_config_*_policy symbols
>>
>> Changes in v4
>> - Make the syscall's maximum buffer size defined per module
>> - Fix a memory leak
>>
>> Changes in v3
>> - Fix typos
>>
>> Changes in v2
>> - Split lsm_manage_policy() into two distinct syscalls:
>> lsm_config_self_policy() and lsm_config_system_policy()
>> - The LSM hook now calls only the appropriate LSM (and not all LSMs)
>> - Add a configuration variable to limit the buffer size of these
>> syscalls
>> - AppArmor now allows stacking policies through lsm_config_self_policy()
>> and loading policies in any namespace through
>> lsm_config_system_policy()
>> ---
>>
>> Maxime Bélair (3):
>> Wire up lsm_config_self_policy and lsm_config_system_policy syscalls
>> lsm: introduce security_lsm_config_*_policy hooks
>> AppArmor: add support for lsm_config_self_policy and
>> lsm_config_system_policy
>>
>> arch/alpha/kernel/syscalls/syscall.tbl | 2 +
>> arch/arm/tools/syscall.tbl | 2 +
>> arch/m68k/kernel/syscalls/syscall.tbl | 2 +
>> arch/microblaze/kernel/syscalls/syscall.tbl | 2 +
>> arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +
>> arch/mips/kernel/syscalls/syscall_n64.tbl | 2 +
>> arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +
>> arch/parisc/kernel/syscalls/syscall.tbl | 2 +
>> arch/powerpc/kernel/syscalls/syscall.tbl | 2 +
>> arch/s390/kernel/syscalls/syscall.tbl | 2 +
>> arch/sh/kernel/syscalls/syscall.tbl | 2 +
>> arch/sparc/kernel/syscalls/syscall.tbl | 2 +
>> arch/x86/entry/syscalls/syscall_32.tbl | 2 +
>> arch/x86/entry/syscalls/syscall_64.tbl | 2 +
>> arch/xtensa/kernel/syscalls/syscall.tbl | 2 +
>> include/linux/lsm_hook_defs.h | 4 +
>> include/linux/security.h | 20 +++++
>> include/linux/syscalls.h | 5 ++
>> include/uapi/asm-generic/unistd.h | 6 +-
>> include/uapi/linux/lsm.h | 8 ++
>> kernel/sys_ni.c | 2 +
>> security/apparmor/apparmorfs.c | 31 +++++++
>> security/apparmor/include/apparmor.h | 4 +
>> security/apparmor/include/apparmorfs.h | 3 +
>> security/apparmor/lsm.c | 84 +++++++++++++++++++
>> security/lsm_syscalls.c | 25 ++++++
>> security/security.c | 60 +++++++++++++
>> tools/include/uapi/asm-generic/unistd.h | 6 +-
>> .../arch/x86/entry/syscalls/syscall_64.tbl | 2 +
>> 29 files changed, 288 insertions(+), 2 deletions(-)
>>
>>
>> base-commit: 9c32cda43eb78f78c73aee4aa344b777714e259b
* Re: [PATCH v5 2/3] lsm: introduce security_lsm_config_*_policy hooks
From: Maxime Bélair @ 2025-10-10 13:32 UTC (permalink / raw)
To: Mickaël Salaün
Cc: linux-security-module, john.johansen, paul, jmorris, serge, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel
In-Reply-To: <20250820.Ao3iquoshaiB@digikod.net>
On 8/20/25 16:21, Mickaël Salaün wrote:
> On Wed, Jul 09, 2025 at 10:00:55AM +0200, Maxime Bélair wrote:
>> Define two new LSM hooks: security_lsm_config_self_policy and
>> security_lsm_config_system_policy and wire them into the corresponding
>> lsm_config_*_policy() syscalls so that LSMs can register a unified
>> interface for policy management. This initial, minimal implementation
>> only supports the LSM_POLICY_LOAD operation to limit changes.
>>
>> Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
>> ---
>> include/linux/lsm_hook_defs.h | 4 +++
>> include/linux/security.h | 20 ++++++++++++
>> include/uapi/linux/lsm.h | 8 +++++
>> security/lsm_syscalls.c | 17 ++++++++--
>> security/security.c | 60 +++++++++++++++++++++++++++++++++++
>> 5 files changed, 107 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
>> index bf3bbac4e02a..fca490444643 100644
>> --- a/include/linux/lsm_hook_defs.h
>> +++ b/include/linux/lsm_hook_defs.h
>> @@ -464,3 +464,7 @@ LSM_HOOK(int, 0, bdev_alloc_security, struct block_device *bdev)
>> LSM_HOOK(void, LSM_RET_VOID, bdev_free_security, struct block_device *bdev)
>> LSM_HOOK(int, 0, bdev_setintegrity, struct block_device *bdev,
>> enum lsm_integrity_type type, const void *value, size_t size)
>> +LSM_HOOK(int, -EINVAL, lsm_config_self_policy, u32 lsm_id, u32 op,
>> + void __user *buf, size_t size, u32 flags)
>> +LSM_HOOK(int, -EINVAL, lsm_config_system_policy, u32 lsm_id, u32 op,
>> + void __user *buf, size_t size, u32 flags)
>> diff --git a/include/linux/security.h b/include/linux/security.h
>> index cc9b54d95d22..54acaee4a994 100644
>> --- a/include/linux/security.h
>> +++ b/include/linux/security.h
>> @@ -581,6 +581,11 @@ void security_bdev_free(struct block_device *bdev);
>> int security_bdev_setintegrity(struct block_device *bdev,
>> enum lsm_integrity_type type, const void *value,
>> size_t size);
>> +int security_lsm_config_self_policy(u32 lsm_id, u32 op, void __user *buf,
>> + size_t size, u32 flags);
>> +int security_lsm_config_system_policy(u32 lsm_id, u32 op, void __user *buf,
>> + size_t size, u32 flags);
>> +
>> #else /* CONFIG_SECURITY */
>>
>> /**
>> @@ -1603,6 +1608,21 @@ static inline int security_bdev_setintegrity(struct block_device *bdev,
>> return 0;
>> }
>>
>> +static inline int security_lsm_config_self_policy(u32 lsm_id, u32 op,
>> + void __user *buf,
>> + size_t size, u32 flags)
>> +{
>> +
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static inline int security_lsm_config_system_policy(u32 lsm_id, u32 op,
>> + void __user *buf,
>> + size_t size, u32 flags)
>> +{
>> +
>> + return -EOPNOTSUPP;
>> +}
>> #endif /* CONFIG_SECURITY */
>>
>> #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
>> diff --git a/include/uapi/linux/lsm.h b/include/uapi/linux/lsm.h
>> index 938593dfd5da..2b9432a30cdc 100644
>> --- a/include/uapi/linux/lsm.h
>> +++ b/include/uapi/linux/lsm.h
>> @@ -90,4 +90,12 @@ struct lsm_ctx {
>> */
>> #define LSM_FLAG_SINGLE 0x0001
>>
>> +/*
>> + * LSM_POLICY_XXX definitions identify the different operations
>> + * to configure LSM policies
>> + */
>> +
>> +#define LSM_POLICY_UNDEF 0
>> +#define LSM_POLICY_LOAD 100
>
> Why the gap between 0 and 100?
>
>> +
>> #endif /* _UAPI_LINUX_LSM_H */
>> diff --git a/security/lsm_syscalls.c b/security/lsm_syscalls.c
>> index a3cb6dab8102..dd016ba6976c 100644
>> --- a/security/lsm_syscalls.c
>> +++ b/security/lsm_syscalls.c
>> @@ -122,11 +122,24 @@ SYSCALL_DEFINE3(lsm_list_modules, u64 __user *, ids, u32 __user *, size,
>> SYSCALL_DEFINE5(lsm_config_self_policy, u32, lsm_id, u32, op, void __user *,
>> buf, u32 __user *, size, u32, flags)
>
> Given these are a multiplexor syscalls, I'm wondering if they should not
> have common flags and LSM-specific flags. Alternatively, the op
> argument could also contains some optional flags. In either case, the
> documentation should guide LSM developers for flags that may be shared
> amongst LSMs.
>
> Examples of such flags could be to restrict the whole process instead of
> the calling thread.
>
Indeed, v6 takes both common_flags and flags. For now neither accepts
any flag, to keep this patchset simple, but we could discuss which
flags we want to support.
>> {
>> - return 0;
>> + size_t usize;
>> +
>> + if (get_user(usize, size))
>
> Size should just be u32, not a pointer.
Indeed
>
>> + return -EFAULT;
>> +
>> + return security_lsm_config_self_policy(lsm_id, op, buf, usize, flags);
>> }
>>
>> SYSCALL_DEFINE5(lsm_config_system_policy, u32, lsm_id, u32, op, void __user *,
>> buf, u32 __user *, size, u32, flags)
>> {
>> - return 0;
>> + size_t usize;
>> +
>> + if (!capable(CAP_SYS_ADMIN))
>> + return -EPERM;
>
> I like this mandatory capability check for this specific syscall. This
> makes the semantic clearer. However, to avoid the superpower of
> CAP_SYS_ADMIN, I'm wondering how we could use the CAP_MAC_ADMIN instead.
> This syscall could require CAP_MAC_ADMIN, and current LSMs (relying on a
> filesystem interface for policy configuration) could also enforce
> CAP_SYS_ADMIN for compatibility reasons.
I agree and lsm_config_system_policy is now restricted to CAP_MAC_ADMIN
in v6.
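As a user-space aid, a caller could check for CAP_MAC_ADMIN before attempting a system policy load. The following is only a sketch: it reads the effective capability mask from /proc/self/status and tests bit 33, which is the value of CAP_MAC_ADMIN in linux/capability.h. The function names and the HYP_ macro are hypothetical, and the result is a hint only, since the kernel performs the authoritative check.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HYP_CAP_MAC_ADMIN 33 /* mirrors CAP_MAC_ADMIN in linux/capability.h */

/* Return 1 if bit 'cap' is set in a hexadecimal capability mask string. */
int capmask_has(const char *hex_mask, unsigned int cap)
{
	unsigned long long mask;

	assert(hex_mask != NULL);
	mask = strtoull(hex_mask, NULL, 16); /* skips leading whitespace */
	return cap < 64 && ((mask >> cap) & 1ULL);
}

/* Return 1 if the calling process has CAP_MAC_ADMIN in its effective set. */
int self_has_cap_mac_admin(void)
{
	char line[256];
	int found = 0;
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return 0;
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "CapEff:", 7) == 0) {
			found = capmask_has(line + 7, HYP_CAP_MAC_ADMIN);
			break;
		}
	}
	fclose(f);
	return found;
}
```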
>
> In fact, this "system" syscall could be a "namespace" syscall, which
> would take a security/LSM namespace file descriptor as argument. If the
> namespace is not the initial namespace, any CAP_SYS_ADMIN implemented by
> current LSMs could be avoided. See
> https://lore.kernel.org/r/CAHC9VhRGMmhxbajwQNfGFy+ZFF1uN=UEBjqQZQ4UBy7yds3eVQ@mail.gmail.com
I would appreciate additional feedback on the best way to handle
namespaces for this syscall.
Possible approaches include:
- Passing a value in buf (as I did in patch 3/5 of v6 for AppArmor). This
is simple and lets each LSM handle namespaces as it sees fit. However,
it may slightly complicate the policy format.
- Passing a file descriptor as a syscall argument. This offers a cleaner
interface but couples the pseudofs to this syscall, reducing some of
its advantages.
- Providing no support for namespaces at this time.
I tend to prefer the first approach here, but I'm open to suggestions.
>
>> +
>> + if (get_user(usize, size))
>
> ditto
>
>> + return -EFAULT;
>> +
>> + return security_lsm_config_system_policy(lsm_id, op, buf, usize, flags);
>> }
>> diff --git a/security/security.c b/security/security.c
>> index fb57e8fddd91..166d7d9936d0 100644
>> --- a/security/security.c
>> +++ b/security/security.c
>> @@ -5883,6 +5883,66 @@ int security_bdev_setintegrity(struct block_device *bdev,
>> }
>> EXPORT_SYMBOL(security_bdev_setintegrity);
>>
>> +/**
>> + * security_lsm_config_self_policy() - Configure caller's LSM policies
>> + * @lsm_id: id of the LSM to target
>> + * @op: Operation to perform (one of the LSM_POLICY_XXX values)
>> + * @buf: userspace pointer to policy data
>> + * @size: size of @buf
>> + * @flags: lsm policy configuration flags
>> + *
>> + * Configure the policies of a LSM for the current domain/user. This notably
>> + * allows to update them even when the lsmfs is unavailable or restricted.
>> + * Currently, only LSM_POLICY_LOAD is supported.
>> + *
>> + * Return: Returns 0 on success, error on failure.
>> + */
>> +int security_lsm_config_self_policy(u32 lsm_id, u32 op, void __user *buf,
>> + size_t size, u32 flags)
>> +{
>> + int rc = LSM_RET_DEFAULT(lsm_config_self_policy);
>> + struct lsm_static_call *scall;
>> +
>> + lsm_for_each_hook(scall, lsm_config_self_policy) {
>> + if ((scall->hl->lsmid->id) == lsm_id) {
>> + rc = scall->hl->hook.lsm_config_self_policy(lsm_id, op, buf, size, flags);
>
> The lsm_id should not be passed to the hook.
Indeed
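To make the agreed change concrete, here is a small user-space model of the dispatch, offered as a sketch rather than the kernel code: the table is searched by LSM id, only the matching hook runs, and the hook no longer receives lsm_id. The default of -EINVAL mirrors the LSM_HOOK() definition quoted above. All hyp_* names and both stub hooks are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hook signature without lsm_id, as requested in the review. */
typedef int (*hyp_policy_hook_t)(uint32_t op, void *buf, size_t size,
				 uint32_t flags);

struct hyp_lsm_entry {
	uint64_t id;            /* LSM_ID_* value of the module */
	hyp_policy_hook_t hook; /* its policy configuration hook */
};

/* Stub that only accepts op 100 with no flags, like the v6 hooks. */
int hyp_stub_load_only(uint32_t op, void *buf, size_t size, uint32_t flags)
{
	(void)buf;
	(void)size;
	return (op != 100 || flags) ? -EOPNOTSUPP : 0;
}

/* Stub that reports how many bytes it was handed. */
int hyp_stub_echo_size(uint32_t op, void *buf, size_t size, uint32_t flags)
{
	(void)op;
	(void)buf;
	(void)flags;
	return (int)size;
}

/* Call the hook of the LSM whose id matches; no other LSM is consulted. */
int hyp_dispatch(const struct hyp_lsm_entry *tbl, size_t n, uint64_t lsm_id,
		 uint32_t op, void *buf, size_t size, uint32_t flags)
{
	int rc = -EINVAL; /* default when no registered LSM matches */
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tbl[i].id == lsm_id) {
			rc = tbl[i].hook(op, buf, size, flags);
			break;
		}
	}
	return rc;
}
```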
>
> The LSM syscall should manage the argument copy and buffer allocation
> instead of duplicating this code in each LSM hook implementation (see
> other LSM syscalls).
I get your point, but the methods used internally by LSMs already handle
the allocation themselves through a char __user * parameter:
- smack: smk_write_rules_list
- selinux: sel_write_load
- apparmor: policy_update
Hence, I think it's actually better to let LSMs handle allocations.
>
>> + break;
>> + }
>> + }
>> +
>> + return rc;
>> +}
>> +
>> +/**
>> + * security_lsm_config_system_policy() - Configure system LSM policies
>> + * @lsm_id: id of the lsm to target
>> + * @op: Operation to perform (one of the LSM_POLICY_XXX values)
>> + * @buf: userspace pointer to policy data
>> + * @size: size of @buf
>> + * @flags: lsm policy configuration flags
>> + *
>> + * Configure the policies of a LSM for the whole system. This notably allows
>> + * to update them even when the lsmfs is unavailable or restricted. Currently,
>> + * only LSM_POLICY_LOAD is supported.
>> + *
>> + * Return: Returns 0 on success, error on failure.
>> + */
>> +int security_lsm_config_system_policy(u32 lsm_id, u32 op, void __user *buf,
>> + size_t size, u32 flags)
>> +{
>> + int rc = LSM_RET_DEFAULT(lsm_config_system_policy);
>> + struct lsm_static_call *scall;
>> +
>> + lsm_for_each_hook(scall, lsm_config_system_policy) {
>> + if ((scall->hl->lsmid->id) == lsm_id) {
>> + rc = scall->hl->hook.lsm_config_system_policy(lsm_id, op, buf, size, flags);
>
> ditto
>
>> + break;
>> + }
>> + }
>> +
>> + return rc;
>> +}
>> +
>> #ifdef CONFIG_PERF_EVENTS
>> /**
>> * security_perf_event_open() - Check if a perf event open is allowed
>> --
>> 2.48.1
>>
>>
* [PATCH v6 5/5] Smack: add support for lsm_config_self_policy and lsm_config_system_policy
From: Maxime Bélair @ 2025-10-10 13:25 UTC (permalink / raw)
To: linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
In-Reply-To: <20251010132610.12001-1-maxime.belair@canonical.com>
Enable users to manage Smack policies through the new hooks
lsm_config_self_policy and lsm_config_system_policy.
lsm_config_self_policy allows adding Smack policies for the current cred.
For now it remains restricted to CAP_MAC_ADMIN.
lsm_config_system_policy allows adding global Smack policies. This is
restricted to CAP_MAC_ADMIN.
Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
---
security/smack/smack.h | 8 +++++
security/smack/smack_lsm.c | 73 ++++++++++++++++++++++++++++++++++++++
security/smack/smackfs.c | 2 +-
3 files changed, 82 insertions(+), 1 deletion(-)
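Both Smack hooks expect a buffer of the form "<fmt><policy>", where <fmt> is a single raw byte (not an ASCII digit) that is passed on as the format argument of smk_write_rules_list. The sketch below shows how user space might frame such a buffer. It is not part of the patch: the function name is hypothetical, and the HYP_SMK_* values are assumed to mirror the format constants that are private to smackfs.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to match SMK_FIXED24_FMT, SMK_LONG_FMT and SMK_CHANGE_FMT. */
enum hyp_smk_fmt {
	HYP_SMK_FIXED24_FMT = 0,
	HYP_SMK_LONG_FMT    = 1,
	HYP_SMK_CHANGE_FMT  = 2,
};

/*
 * Build "<fmt><policy>": one format byte followed by the rule text.
 * Returns a malloc'd buffer that the caller must free and stores its
 * length in *out_len. Returns NULL on allocation failure or when the
 * rule text is empty, since the hooks reject buffers shorter than 2 bytes.
 */
unsigned char *smack_frame_policy(uint8_t fmt, const char *rules,
				  size_t *out_len)
{
	unsigned char *buf;
	size_t n;

	assert(rules != NULL && out_len != NULL);
	n = strlen(rules);
	if (n == 0)
		return NULL;

	buf = malloc(n + 1);
	if (!buf)
		return NULL;

	buf[0] = fmt;               /* raw format byte */
	memcpy(buf + 1, rules, n);  /* rule text, without a trailing NUL */
	*out_len = n + 1;
	return buf;
}
```

The resulting buffer and length would then be passed as buf and size to either syscall, with the Smack LSM id.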
diff --git a/security/smack/smack.h b/security/smack/smack.h
index bf6a6ed3946c..3e3d30dfdcf7 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -275,6 +275,14 @@ struct smk_audit_info {
#endif
};
+/*
+ * This function is in smackfs.c
+ */
+ssize_t smk_write_rules_list(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos,
+ struct list_head *rule_list,
+ struct mutex *rule_lock, int format);
+
/*
* These functions are in smack_access.c
*/
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 99833168604e..bf4bb2242768 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -5027,6 +5027,76 @@ static int smack_uring_cmd(struct io_uring_cmd *ioucmd)
#endif /* CONFIG_IO_URING */
+/**
+ * smack_lsm_config_system_policy - Configure a system smack policy
+ * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
+ * @buf: User-supplied buffer in the form "<fmt><policy>"
+ * <fmt> is the 1-byte format of <policy>
+ * <policy> is the policy to load
+ * @size: size of @buf
+ * @flags: reserved for future use; must be zero
+ *
+ * Returns: number of written rules on success, negative value on error
+ */
+static int smack_lsm_config_system_policy(u32 op, void __user *buf, size_t size,
+ u32 flags)
+{
+ loff_t pos = 0;
+ u8 fmt;
+
+ if (op != LSM_POLICY_LOAD || flags)
+ return -EOPNOTSUPP;
+
+ if (size < 2)
+ return -EINVAL;
+
+ if (get_user(fmt, (u8 __user *)buf))
+ return -EFAULT;
+
+ return smk_write_rules_list(NULL, buf + 1, size - 1, &pos, NULL, NULL, fmt);
+}
+
+/**
+ * smack_lsm_config_self_policy - Configure a smack policy for the current cred
+ * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
+ * @buf: User-supplied buffer in the form "<fmt><policy>"
+ * <fmt> is the 1-byte format of <policy>
+ * <policy> is the policy to load
+ * @size: size of @buf
+ * @flags: reserved for future use; must be zero
+ *
+ * Returns: number of written rules on success, negative value on error
+ */
+static int smack_lsm_config_self_policy(u32 op, void __user *buf, size_t size,
+ u32 flags)
+{
+ loff_t pos = 0;
+ u8 fmt;
+ struct task_smack *tsp;
+
+ if (op != LSM_POLICY_LOAD || flags)
+ return -EOPNOTSUPP;
+
+ if (size < 2)
+ return -EINVAL;
+
+ if (get_user(fmt, (u8 __user *)buf))
+ return -EFAULT;
+ /*
+ * smk_write_rules_list could be used to gain privileges.
+ * This function is thus restricted to CAP_MAC_ADMIN.
+ * TODO: Ensure that the new rule does not give extra privileges
+ * before dropping this CAP_MAC_ADMIN check.
+ */
+ if (!capable(CAP_MAC_ADMIN))
+ return -EPERM;
+
+
+ tsp = smack_cred(current_cred());
+ return smk_write_rules_list(NULL, buf + 1, size - 1, &pos, &tsp->smk_rules,
+ &tsp->smk_rules_lock, fmt);
+}
+
struct lsm_blob_sizes smack_blob_sizes __ro_after_init = {
.lbs_cred = sizeof(struct task_smack),
.lbs_file = sizeof(struct smack_known *),
@@ -5203,6 +5273,9 @@ static struct security_hook_list smack_hooks[] __ro_after_init = {
LSM_HOOK_INIT(uring_sqpoll, smack_uring_sqpoll),
LSM_HOOK_INIT(uring_cmd, smack_uring_cmd),
#endif
+ LSM_HOOK_INIT(lsm_config_self_policy, smack_lsm_config_self_policy),
+ LSM_HOOK_INIT(lsm_config_system_policy, smack_lsm_config_system_policy),
+
};
diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
index 90a67e410808..ed1814588d56 100644
--- a/security/smack/smackfs.c
+++ b/security/smack/smackfs.c
@@ -441,7 +441,7 @@ static ssize_t smk_parse_long_rule(char *data, struct smack_parsed_rule *rule,
* "subject<whitespace>object<whitespace>
* acc_enable<whitespace>acc_disable[<whitespace>...]"
*/
-static ssize_t smk_write_rules_list(struct file *file, const char __user *buf,
+ssize_t smk_write_rules_list(struct file *file, const char __user *buf,
size_t count, loff_t *ppos,
struct list_head *rule_list,
struct mutex *rule_lock, int format)
--
2.48.1
* Re: [PATCH v4 00/30] Live Update Orchestrator
From: Pasha Tatashin @ 2025-10-10 12:45 UTC (permalink / raw)
To: Pratyush Yadav
Cc: jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes, corbet,
rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
parav, leonro, witu, hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <mafs0ms5zn0nm.fsf@kernel.org>
On Thu, Oct 9, 2025 at 6:58 PM Pratyush Yadav <pratyush@kernel.org> wrote:
>
> On Tue, Oct 07 2025, Pasha Tatashin wrote:
>
> > On Sun, Sep 28, 2025 at 9:03 PM Pasha Tatashin
> > <pasha.tatashin@soleen.com> wrote:
> >>
> [...]
> > 4. New File-Lifecycle-Bound Global State
> > ----------------------------------------
> > A new mechanism for managing global state was proposed, designed to be
> > tied to the lifecycle of the preserved files themselves. This would
> > allow a file owner (e.g., the IOMMU subsystem) to save and retrieve
> > global state that is only relevant when one or more of its FDs are
> > being managed by LUO.
>
> Is this going to replace LUO subsystems? If yes, then why? The global
> state will likely need to have its own lifecycle just like the FDs, and
> subsystems are a simple and clean abstraction to control that. I get the
> idea of only "activating" a subsystem when one or more of its FDs are
> participating in LUO, but we can do that while keeping subsystems
> around.
>
> >
> > The key characteristics of this new mechanism are:
> > The global state is optionally created on the first preserve() call
> > for a given file handler.
> > The state can be updated on subsequent preserve() calls.
> > The state is destroyed when the last corresponding file is unpreserved
> > or finished.
> > The data can be accessed during boot.
> >
> > I am thinking of an API like this.
> >
> > 1. Add three more callbacks to liveupdate_file_ops:
> > /*
> > * Optional. Called by LUO during first get global state call.
> > * The handler should allocate/KHO preserve its global state object and return a
> > * pointer to it via 'obj'. It must also provide a u64 handle (e.g., a physical
> > * address of preserved memory) via 'data_handle' that LUO will save.
> > * Return: 0 on success.
> > */
> > int (*global_state_create)(struct liveupdate_file_handler *h,
> > void **obj, u64 *data_handle);
> >
> > /*
> > * Optional. Called by LUO in the new kernel
> > * before the first access to the global state. The handler receives
> > * the preserved u64 data_handle and should use it to reconstruct its
> > * global state object, returning a pointer to it via 'obj'.
> > * Return: 0 on success.
> > */
> > int (*global_state_restore)(struct liveupdate_file_handler *h,
> > u64 data_handle, void **obj);
> >
> > /*
> > * Optional. Called by LUO after the last
> > * file for this handler is unpreserved or finished. The handler
> > * must free its global state object and any associated resources.
> > */
> > void (*global_state_destroy)(struct liveupdate_file_handler *h, void *obj);
> >
> > The get/put global state data:
> >
> > /* Get and lock the data with file_handler scoped lock */
> > int liveupdate_fh_global_state_get(struct liveupdate_file_handler *h,
> > void **obj);
> >
> > /* Unlock the data */
> > void liveupdate_fh_global_state_put(struct liveupdate_file_handler *h);
>
> IMHO this looks clunky and overcomplicated. Each LUO FD type knows what
> its subsystem is. It should talk to it directly. I don't get why we are
> adding this intermediate step.
>
> Here is how I imagine the proposed API would compare against subsystems
> with hugetlb as an example (hugetlb support is still WIP, so I'm still
> not clear on specifics, but this is how I imagine it will work):
>
> - Hugetlb subsystem needs to track its huge page pools and which pages
> are allocated and free. This is its global state. The pools get
> reconstructed after kexec. Post-kexec, the free pages are ready for
> allocation from other "regular" files and the pages used in LUO files
> are reserved.
Thinking more about this, HugeTLB is different from iommufd/iommu-core
vfiofd/pci because it supports many types of FDs, such as memfd and
guest_memfd (1G support is coming soon!). Also, since not all memfds
or guest_memfd instances require HugeTLB, binding their lifecycles to
HugeTLB doesn't make sense here. I agree that a subsystem is more
appropriate for this use case.
Pasha
* Re: [PATCH 0/2] Fix to EOPNOTSUPP double conversion in ioctl_setflags()
From: Christian Brauner @ 2025-10-10 11:47 UTC (permalink / raw)
To: linux-api, linux-fsdevel, linux-kernel, linux-xfs,
Andrey Albershteyn
Cc: Christian Brauner, Jan Kara, Jiri Slaby, Arnd Bergmann,
Andrey Albershteyn
In-Reply-To: <20251008-eopnosupp-fix-v1-0-5990de009c9f@kernel.org>
On Wed, 08 Oct 2025 14:44:16 +0200, Andrey Albershteyn wrote:
> Revert original double conversion patch from ENOIOCTLCMD to EOPNOSUPP for
> vfs_fileattr_get and vfs_fileattr_set. Instead, convert ENOIOCTLCMD only
> where necessary.
>
> To: linux-api@vger.kernel.org
> To: linux-fsdevel@vger.kernel.org
> To: linux-kernel@vger.kernel.org
> To: linux-xfs@vger.kernel.org,
> Cc: "Jan Kara" <jack@suse.cz>
> Cc: "Jiri Slaby" <jirislaby@kernel.org>
> Cc: "Christian Brauner" <brauner@kernel.org>
> Cc: "Arnd Bergmann" <arnd@arndb.de>
>
> [...]
Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes
[1/2] Revert "fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP"
https://git.kernel.org/vfs/vfs/c/4dd5b5ac089b
[2/2] fs: return EOPNOTSUPP from file_setattr/file_getattr syscalls
https://git.kernel.org/vfs/vfs/c/d90ad28e8aa4
* Re: [PATCH 2/2] fs: return EOPNOTSUPP from file_setattr/file_getattr syscalls
From: Christian Brauner @ 2025-10-10 11:45 UTC (permalink / raw)
To: Andrey Albershteyn
Cc: Darrick J. Wong, linux-api, linux-fsdevel, linux-kernel,
linux-xfs, Jan Kara, Jiri Slaby, Arnd Bergmann,
Andrey Albershteyn
In-Reply-To: <q6phvrrl2fumjwwd66d5glauch76uca4rr5pkvl2dwaxzx62bm@sjcixwa7r6r5>
On Fri, Oct 10, 2025 at 12:05:04PM +0200, Andrey Albershteyn wrote:
> On 2025-10-09 10:20:41, Darrick J. Wong wrote:
> > On Wed, Oct 08, 2025 at 02:44:18PM +0200, Andrey Albershteyn wrote:
> > > These syscalls call the vfs_fileattr_get/set functions, which return
> > > ENOIOCTLCMD if the filesystem doesn't support setting file attributes on
> > > an inode. For syscalls, EOPNOTSUPP is a more appropriate error to return.
> > >
> > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > > ---
> > > fs/file_attr.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/fs/file_attr.c b/fs/file_attr.c
> > > index 460b2dd21a85..5e3e2aba97b5 100644
> > > --- a/fs/file_attr.c
> > > +++ b/fs/file_attr.c
> > > @@ -416,6 +416,8 @@ SYSCALL_DEFINE5(file_getattr, int, dfd, const char __user *, filename,
> > > }
> > >
> > > error = vfs_fileattr_get(filepath.dentry, &fa);
> > > + if (error == -ENOIOCTLCMD)
> >
> > Hrm. Back in 6.17, XFS would return ENOTTY if you called ->fileattr_get
> > on a special file:
> >
> > int
> > xfs_fileattr_get(
> > struct dentry *dentry,
> > struct file_kattr *fa)
> > {
> > struct xfs_inode *ip = XFS_I(d_inode(dentry));
> >
> > if (d_is_special(dentry))
> > return -ENOTTY;
> > ...
> > }
> >
> > Given that there are other fileattr_[gs]et implementations out there
> > that might return ENOTTY (e.g. fuse servers and other externally
> > maintained filesystems), I think both syscall functions need to check
> > for that as well:
> >
> > if (error == -ENOIOCTLCMD || error == -ENOTTY)
> > return -EOPNOTSUPP;
>
> Makes sense (it looks like ubifs, jfs and gfs2 also return ENOTTY for
> special files). I haven't found ENOTTY being used for anything else
> there.
I'm folding this in.
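For reference, the check suggested above can be sketched outside the kernel
as follows. This is only an illustration, not the applied commit:
ENOIOCTLCMD is a kernel-internal code that userspace headers do not export,
so the sketch defines it locally (515, as in include/linux/errno.h), and the
helper name fileattr_err_to_syscall() is hypothetical.

```c
#include <errno.h>

/* Kernel-internal code; not exported to userspace, so defined here for the sketch. */
#ifndef ENOIOCTLCMD
#define ENOIOCTLCMD 515
#endif

/*
 * Hypothetical helper: map the "not supported" results that a
 * ->fileattr_get/->fileattr_set implementation may return to the
 * error the syscalls should report. Everything else passes through.
 */
int fileattr_err_to_syscall(int error)
{
    if (error == -ENOIOCTLCMD || error == -ENOTTY)
        return -EOPNOTSUPP;
    return error;
}
```

Both syscall paths would apply the same mapping to the result of
vfs_fileattr_get() and vfs_fileattr_set() respectively.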
^ permalink raw reply
* Re: [PATCH 2/2] fs: return EOPNOTSUPP from file_setattr/file_getattr syscalls
From: Andrey Albershteyn @ 2025-10-10 10:05 UTC (permalink / raw)
To: Darrick J. Wong
Cc: linux-api, linux-fsdevel, linux-kernel, linux-xfs, Jan Kara,
Jiri Slaby, Christian Brauner, Arnd Bergmann, Andrey Albershteyn
In-Reply-To: <20251009172041.GA6174@frogsfrogsfrogs>
On 2025-10-09 10:20:41, Darrick J. Wong wrote:
> On Wed, Oct 08, 2025 at 02:44:18PM +0200, Andrey Albershteyn wrote:
> > These syscalls call the vfs_fileattr_get/set functions, which return
> > ENOIOCTLCMD if the filesystem doesn't support setting file attributes on
> > an inode. For syscalls, EOPNOTSUPP is a more appropriate error to return.
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > ---
> > fs/file_attr.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/file_attr.c b/fs/file_attr.c
> > index 460b2dd21a85..5e3e2aba97b5 100644
> > --- a/fs/file_attr.c
> > +++ b/fs/file_attr.c
> > @@ -416,6 +416,8 @@ SYSCALL_DEFINE5(file_getattr, int, dfd, const char __user *, filename,
> > }
> >
> > error = vfs_fileattr_get(filepath.dentry, &fa);
> > + if (error == -ENOIOCTLCMD)
>
> Hrm. Back in 6.17, XFS would return ENOTTY if you called ->fileattr_get
> on a special file:
>
> int
> xfs_fileattr_get(
> struct dentry *dentry,
> struct file_kattr *fa)
> {
> struct xfs_inode *ip = XFS_I(d_inode(dentry));
>
> if (d_is_special(dentry))
> return -ENOTTY;
> ...
> }
>
> Given that there are other fileattr_[gs]et implementations out there
> that might return ENOTTY (e.g. fuse servers and other externally
> maintained filesystems), I think both syscall functions need to check
> for that as well:
>
> if (error == -ENOIOCTLCMD || error == -ENOTTY)
> return -EOPNOTSUPP;
Makes sense (it looks like ubifs, jfs and gfs2 also return ENOTTY for
special files). I haven't found ENOTTY being used for anything else
there.
--
- Andrey
^ permalink raw reply
* [PATCH v2 3/3] init: remove /proc/sys/kernel/real-root-dev
From: Askar Safin @ 2025-10-10 9:40 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel
Cc: Linus Torvalds, Greg Kroah-Hartman, Christian Brauner, Al Viro,
Jan Kara, Christoph Hellwig, Jens Axboe, Andy Shevchenko,
Aleksa Sarai, Thomas Weißschuh, Julian Stecklina, Gao Xiang,
Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
In-Reply-To: <20251010094047.3111495-1-safinaskar@gmail.com>
It was used only by the linuxrc initrd code path, which the previous
patch removed, so it is not used anymore.
Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
Documentation/admin-guide/sysctl/kernel.rst | 6 ------
include/uapi/linux/sysctl.h | 1 -
init/do_mounts_initrd.c | 20 --------------------
3 files changed, 27 deletions(-)
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 8b49eab937d0..cc958c228bc2 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1215,12 +1215,6 @@ that support this feature.
== ===========================================================================
-real-root-dev
-=============
-
-See Documentation/admin-guide/initrd.rst.
-
-
reboot-cmd (SPARC only)
=======================
diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
index 63d1464cb71c..1c7fe0f4dca4 100644
--- a/include/uapi/linux/sysctl.h
+++ b/include/uapi/linux/sysctl.h
@@ -92,7 +92,6 @@ enum
KERN_DOMAINNAME=8, /* string: domainname */
KERN_PANIC=15, /* int: panic timeout */
- KERN_REALROOTDEV=16, /* real root device to mount after initrd */
KERN_SPARC_REBOOT=21, /* reboot command on Sparc */
KERN_CTLALTDEL=22, /* int: allow ctl-alt-del to reboot */
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index d4f5f4c60a22..fb0c9d3b722f 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -8,31 +8,11 @@
unsigned long initrd_start, initrd_end;
int initrd_below_start_ok;
-static unsigned int real_root_dev; /* do_proc_dointvec cannot handle kdev_t */
static int __initdata mount_initrd = 1;
phys_addr_t phys_initrd_start __initdata;
unsigned long phys_initrd_size __initdata;
-#ifdef CONFIG_SYSCTL
-static const struct ctl_table kern_do_mounts_initrd_table[] = {
- {
- .procname = "real-root-dev",
- .data = &real_root_dev,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- },
-};
-
-static __init int kernel_do_mounts_initrd_sysctls_init(void)
-{
- register_sysctl_init("kernel", kern_do_mounts_initrd_table);
- return 0;
-}
-late_initcall(kernel_do_mounts_initrd_sysctls_init);
-#endif /* CONFIG_SYSCTL */
-
static int __init no_initrd(char *str)
{
pr_warn("noinitrd option is deprecated and will be removed soon\n");
--
2.47.3
^ permalink raw reply related
* [PATCH v2 2/3] initrd: remove deprecated code path (linuxrc)
From: Askar Safin @ 2025-10-10 9:40 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel
Cc: Linus Torvalds, Greg Kroah-Hartman, Christian Brauner, Al Viro,
Jan Kara, Christoph Hellwig, Jens Axboe, Andy Shevchenko,
Aleksa Sarai, Thomas Weißschuh, Julian Stecklina, Gao Xiang,
Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
In-Reply-To: <20251010094047.3111495-1-safinaskar@gmail.com>
Remove the linuxrc initrd code path, which was deprecated in 2020.
Initramfs and (non-initial) RAM disks (i.e. brd) still work.
Both built-in and bootloader-supplied initramfs still work.
The non-linuxrc initrd code path (i.e. using /dev/ram as the final root
filesystem) still works, but it now prints a deprecation message.
Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
.../admin-guide/kernel-parameters.txt | 4 +-
fs/init.c | 14 ---
include/linux/init_syscalls.h | 1 -
include/linux/initrd.h | 2 -
init/do_mounts.c | 4 +-
init/do_mounts.h | 18 +---
init/do_mounts_initrd.c | 85 ++-----------------
init/do_mounts_rd.c | 17 +---
8 files changed, 17 insertions(+), 128 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 521ab3425504..24d8899d8a39 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4285,7 +4285,7 @@
Note that this argument takes precedence over
the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
- noinitrd [RAM] Tells the kernel not to load any configured
+ noinitrd [Deprecated,RAM] Tells the kernel not to load any configured
initial RAM disk.
nointremap [X86-64,Intel-IOMMU,EARLY] Do not enable interrupt
@@ -5299,7 +5299,7 @@
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
See Documentation/admin-guide/blockdev/ramdisk.rst.
- ramdisk_start= [RAM] RAM disk image start address
+ ramdisk_start= [Deprecated,RAM] RAM disk image start address
random.trust_cpu=off
[KNL,EARLY] Disable trusting the use of the CPU's
diff --git a/fs/init.c b/fs/init.c
index 07f592ccdba8..60719494d9a0 100644
--- a/fs/init.c
+++ b/fs/init.c
@@ -27,20 +27,6 @@ int __init init_mount(const char *dev_name, const char *dir_name,
return ret;
}
-int __init init_umount(const char *name, int flags)
-{
- int lookup_flags = LOOKUP_MOUNTPOINT;
- struct path path;
- int ret;
-
- if (!(flags & UMOUNT_NOFOLLOW))
- lookup_flags |= LOOKUP_FOLLOW;
- ret = kern_path(name, lookup_flags, &path);
- if (ret)
- return ret;
- return path_umount(&path, flags);
-}
-
int __init init_chdir(const char *filename)
{
struct path path;
diff --git a/include/linux/init_syscalls.h b/include/linux/init_syscalls.h
index 92045d18cbfc..0bdbc458a881 100644
--- a/include/linux/init_syscalls.h
+++ b/include/linux/init_syscalls.h
@@ -2,7 +2,6 @@
int __init init_mount(const char *dev_name, const char *dir_name,
const char *type_page, unsigned long flags, void *data_page);
-int __init init_umount(const char *name, int flags);
int __init init_chdir(const char *filename);
int __init init_chroot(const char *filename);
int __init init_chown(const char *filename, uid_t user, gid_t group, int flags);
diff --git a/include/linux/initrd.h b/include/linux/initrd.h
index f1a1f4c92ded..7e5d26c8136f 100644
--- a/include/linux/initrd.h
+++ b/include/linux/initrd.h
@@ -3,8 +3,6 @@
#ifndef __LINUX_INITRD_H
#define __LINUX_INITRD_H
-#define INITRD_MINOR 250 /* shouldn't collide with /dev/ram* too soon ... */
-
/* starting block # of image */
extern int rd_image_start;
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 0f2f44e6250c..1054ad3c905a 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -476,13 +476,11 @@ void __init prepare_namespace(void)
if (saved_root_name[0])
ROOT_DEV = parse_root_device(saved_root_name);
- if (initrd_load(saved_root_name))
- goto out;
+ initrd_load();
if (root_wait)
wait_for_root(saved_root_name);
mount_root(saved_root_name);
-out:
devtmpfs_mount();
init_mount(".", "/", NULL, MS_MOVE, NULL);
init_chroot(".");
diff --git a/init/do_mounts.h b/init/do_mounts.h
index 6069ea3eb80d..a386ee5314c9 100644
--- a/init/do_mounts.h
+++ b/init/do_mounts.h
@@ -23,25 +23,15 @@ static inline __init int create_dev(char *name, dev_t dev)
}
#ifdef CONFIG_BLK_DEV_RAM
-
-int __init rd_load_disk(int n);
-int __init rd_load_image(char *from);
-
+int __init rd_load_image(void);
#else
-
-static inline int rd_load_disk(int n) { return 0; }
-static inline int rd_load_image(char *from) { return 0; }
-
+static inline int rd_load_image(void) { return 0; }
#endif
#ifdef CONFIG_BLK_DEV_INITRD
-bool __init initrd_load(char *root_device_name);
+void __init initrd_load(void);
#else
-static inline bool initrd_load(char *root_device_name)
-{
- return false;
- }
-
+static inline void initrd_load(void) { }
#endif
/* Ensure that async file closing finished to prevent spurious errors. */
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index f6867bad0d78..d4f5f4c60a22 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -2,13 +2,7 @@
#include <linux/unistd.h>
#include <linux/kernel.h>
#include <linux/fs.h>
-#include <linux/minix_fs.h>
-#include <linux/romfs_fs.h>
#include <linux/initrd.h>
-#include <linux/sched.h>
-#include <linux/freezer.h>
-#include <linux/kmod.h>
-#include <uapi/linux/mount.h>
#include "do_mounts.h"
@@ -41,6 +35,7 @@ late_initcall(kernel_do_mounts_initrd_sysctls_init);
static int __init no_initrd(char *str)
{
+ pr_warn("noinitrd option is deprecated and will be removed soon\n");
mount_initrd = 0;
return 1;
}
@@ -70,85 +65,17 @@ static int __init early_initrd(char *p)
}
early_param("initrd", early_initrd);
-static int __init init_linuxrc(struct subprocess_info *info, struct cred *new)
-{
- ksys_unshare(CLONE_FS | CLONE_FILES);
- console_on_rootfs();
- /* move initrd over / and chdir/chroot in initrd root */
- init_chdir("/root");
- init_mount(".", "/", NULL, MS_MOVE, NULL);
- init_chroot(".");
- ksys_setsid();
- return 0;
-}
-
-static void __init handle_initrd(char *root_device_name)
-{
- struct subprocess_info *info;
- static char *argv[] = { "linuxrc", NULL, };
- extern char *envp_init[];
- int error;
-
- pr_warn("using deprecated initrd support, will be removed soon.\n");
-
- real_root_dev = new_encode_dev(ROOT_DEV);
- create_dev("/dev/root.old", Root_RAM0);
- /* mount initrd on rootfs' /root */
- mount_root_generic("/dev/root.old", root_device_name,
- root_mountflags & ~MS_RDONLY);
- init_mkdir("/old", 0700);
- init_chdir("/old");
-
- info = call_usermodehelper_setup("/linuxrc", argv, envp_init,
- GFP_KERNEL, init_linuxrc, NULL, NULL);
- if (!info)
- return;
- call_usermodehelper_exec(info, UMH_WAIT_PROC|UMH_FREEZABLE);
-
- /* move initrd to rootfs' /old */
- init_mount("..", ".", NULL, MS_MOVE, NULL);
- /* switch root and cwd back to / of rootfs */
- init_chroot("..");
-
- if (new_decode_dev(real_root_dev) == Root_RAM0) {
- init_chdir("/old");
- return;
- }
-
- init_chdir("/");
- ROOT_DEV = new_decode_dev(real_root_dev);
- mount_root(root_device_name);
-
- printk(KERN_NOTICE "Trying to move old root to /initrd ... ");
- error = init_mount("/old", "/root/initrd", NULL, MS_MOVE, NULL);
- if (!error)
- printk("okay\n");
- else {
- if (error == -ENOENT)
- printk("/initrd does not exist. Ignored.\n");
- else
- printk("failed\n");
- printk(KERN_NOTICE "Unmounting old root\n");
- init_umount("/old", MNT_DETACH);
- }
-}
-
-bool __init initrd_load(char *root_device_name)
+void __init initrd_load(void)
{
if (mount_initrd) {
create_dev("/dev/ram", Root_RAM0);
/*
- * Load the initrd data into /dev/ram0. Execute it as initrd
- * unless /dev/ram0 is supposed to be our actual root device,
- * in that case the ram disk is just set up here, and gets
- * mounted in the normal path.
+ * Load the initrd data into /dev/ram0.
*/
- if (rd_load_image("/initrd.image") && ROOT_DEV != Root_RAM0) {
- init_unlink("/initrd.image");
- handle_initrd(root_device_name);
- return true;
+ if (rd_load_image()) {
+ pr_warn("using deprecated initrd support, will be removed in September 2026; "
+ "use initramfs instead or (as a last resort) /sys/firmware/initrd.\n");
}
}
init_unlink("/initrd.image");
- return false;
}
diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
index 5311f2d7edc8..0a021bbcd501 100644
--- a/init/do_mounts_rd.c
+++ b/init/do_mounts_rd.c
@@ -22,6 +22,7 @@ int __initdata rd_image_start; /* starting block # of image */
static int __init ramdisk_start_setup(char *str)
{
+ pr_warn("ramdisk_start= option is deprecated and will be removed soon\n");
rd_image_start = simple_strtol(str,NULL,0);
return 1;
}
@@ -177,7 +178,7 @@ static unsigned long nr_blocks(struct file *file)
return i_size_read(inode) >> 10;
}
-int __init rd_load_image(char *from)
+int __init rd_load_image(void)
{
int res = 0;
unsigned long rd_blocks, devblocks, nr_disks;
@@ -191,7 +192,7 @@ int __init rd_load_image(char *from)
if (IS_ERR(out_file))
goto out;
- in_file = filp_open(from, O_RDONLY, 0);
+ in_file = filp_open("/initrd.image", O_RDONLY, 0);
if (IS_ERR(in_file))
goto noclose_input;
@@ -220,10 +221,7 @@ int __init rd_load_image(char *from)
/*
* OK, time to copy in the data
*/
- if (strcmp(from, "/initrd.image") == 0)
- devblocks = nblocks;
- else
- devblocks = nr_blocks(in_file);
+ devblocks = nblocks;
if (devblocks == 0) {
printk(KERN_ERR "RAMDISK: could not determine device size\n");
@@ -267,13 +265,6 @@ int __init rd_load_image(char *from)
return res;
}
-int __init rd_load_disk(int n)
-{
- create_dev("/dev/root", ROOT_DEV);
- create_dev("/dev/ram", MKDEV(RAMDISK_MAJOR, n));
- return rd_load_image("/dev/root");
-}
-
static int exit_code;
static int decompress_error;
--
2.47.3
^ permalink raw reply related
* [PATCH v2 1/3] init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
From: Askar Safin @ 2025-10-10 9:40 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel
Cc: Linus Torvalds, Greg Kroah-Hartman, Christian Brauner, Al Viro,
Jan Kara, Christoph Hellwig, Jens Axboe, Andy Shevchenko,
Aleksa Sarai, Thomas Weißschuh, Julian Stecklina, Gao Xiang,
Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
In-Reply-To: <20251010094047.3111495-1-safinaskar@gmail.com>
Both parameters do nothing. They were deprecated in the documentation by
6b99e6e6aa62 ("Documentation/admin-guide: blockdev/ramdisk: remove use of
"rdev"") and in kernel messages by c8376994c86c ("initrd: remove support
for multiple floppies").
Signed-off-by: Askar Safin <safinaskar@gmail.com>
---
Documentation/admin-guide/kernel-parameters.txt | 4 ----
arch/arm/configs/neponset_defconfig | 2 +-
init/do_mounts.c | 7 -------
init/do_mounts_rd.c | 7 -------
4 files changed, 1 insertion(+), 19 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e019db1633fd..521ab3425504 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3280,8 +3280,6 @@
If there are multiple matching configurations changing
the same attribute, the last one is used.
- load_ramdisk= [RAM] [Deprecated]
-
lockd.nlm_grace_period=P [NFS] Assign grace period.
Format: <integer>
@@ -5245,8 +5243,6 @@
Param: <number> - step/bucket size as a power of 2 for
statistical time based profiling.
- prompt_ramdisk= [RAM] [Deprecated]
-
prot_virt= [S390] enable hosting protected virtual machines
isolated from the hypervisor (if hardware supports
that). If enabled, the default kernel base address
diff --git a/arch/arm/configs/neponset_defconfig b/arch/arm/configs/neponset_defconfig
index 2227f86100ad..4d720001c12e 100644
--- a/arch/arm/configs/neponset_defconfig
+++ b/arch/arm/configs/neponset_defconfig
@@ -9,7 +9,7 @@ CONFIG_ASSABET_NEPONSET=y
CONFIG_ZBOOT_ROM_TEXT=0x80000
CONFIG_ZBOOT_ROM_BSS=0xc1000000
CONFIG_ZBOOT_ROM=y
-CONFIG_CMDLINE="console=ttySA0,38400n8 cpufreq=221200 rw root=/dev/mtdblock2 mtdparts=sa1100:512K(boot),1M(kernel),2560K(initrd),4M(root) load_ramdisk=1 prompt_ramdisk=0 mem=32M noinitrd initrd=0xc0800000,3M"
+CONFIG_CMDLINE="console=ttySA0,38400n8 cpufreq=221200 rw root=/dev/mtdblock2 mtdparts=sa1100:512K(boot),1M(kernel),2560K(initrd),4M(root) mem=32M noinitrd initrd=0xc0800000,3M"
CONFIG_FPE_NWFPE=y
CONFIG_PM=y
CONFIG_MODULES=y
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 6af29da8889e..0f2f44e6250c 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -34,13 +34,6 @@ static int root_wait;
dev_t ROOT_DEV;
-static int __init load_ramdisk(char *str)
-{
- pr_warn("ignoring the deprecated load_ramdisk= option\n");
- return 1;
-}
-__setup("load_ramdisk=", load_ramdisk);
-
static int __init readonly(char *str)
{
if (*str)
diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
index 19d9f33dcacf..5311f2d7edc8 100644
--- a/init/do_mounts_rd.c
+++ b/init/do_mounts_rd.c
@@ -18,13 +18,6 @@
static struct file *in_file, *out_file;
static loff_t in_pos, out_pos;
-static int __init prompt_ramdisk(char *str)
-{
- pr_warn("ignoring the deprecated prompt_ramdisk= option\n");
- return 1;
-}
-__setup("prompt_ramdisk=", prompt_ramdisk);
-
int __initdata rd_image_start; /* starting block # of image */
static int __init ramdisk_start_setup(char *str)
--
2.47.3
^ permalink raw reply related
* [PATCH v2 0/3] initrd: remove half of classic initrd support
From: Askar Safin @ 2025-10-10 9:40 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel
Cc: Linus Torvalds, Greg Kroah-Hartman, Christian Brauner, Al Viro,
Jan Kara, Christoph Hellwig, Jens Axboe, Andy Shevchenko,
Aleksa Sarai, Thomas Weißschuh, Julian Stecklina, Gao Xiang,
Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
Intro
====
This patchset removes half of the classic initrd (initial RAM disk) support,
i.e. the linuxrc code path, which was deprecated in 2020.
Initramfs stays, and the RAM disk itself (brd) stays.
The other half of initrd stays, too.
init/do_mounts* are listed in the VFS entry in
MAINTAINERS, so I think this patchset should go through the VFS tree.
I tested the patchset on 8 (!!!) archs in Qemu (see details below).
If you still use initrd, see below for a workaround.
In 2020 a deprecation notice was added to the linuxrc initrd code path.
In the previous version of this patchset I tried to remove initrd
fully, but Nicolas Schichan reported that he still uses the
other code path (the root=/dev/ram0 one) on a million devices [4].
The root=/dev/ram0 code path did not contain a deprecation notice.
So, in this version of the patchset I remove the deprecated code path,
i.e. the linuxrc one, while keeping the other one, i.e. root=/dev/ram0.
I also add a deprecation notice to the remaining code path, i.e.
the root=/dev/ram0 one. I plan to send patches for full removal
of initrd after one year, i.e. in September 2026 (of course,
initramfs will still work).
Also, I tried to keep this patchset small to make sure it
can be reverted easily. I plan to send cleanups later.
Details
====
Other user-visible changes:
- Removed the kernel command line parameters "load_ramdisk" and
"prompt_ramdisk", which did nothing and were deprecated
- Removed /proc/sys/kernel/real-root-dev. It was used
for initrd only
- The command line parameters "noinitrd" and "ramdisk_start=" are now deprecated
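To check whether a given boot configuration is affected by these changes,
the kernel command line can be scanned for the parameters named above.
A minimal sketch, assuming the command line is available as a string (for
example read from /proc/cmdline). The function names are hypothetical, and
quoting inside parameters is not handled:

```c
#include <stdbool.h>
#include <string.h>

static bool is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Return true if the whitespace-separated command line contains the
 * parameter `name`, either bare ("noinitrd") or with a value
 * ("ramdisk_start=0"). Pass `name` without a trailing '='.
 */
bool cmdline_has_param(const char *cmdline, const char *name)
{
    size_t len = strlen(name);
    const char *p = cmdline;

    while (*p) {
        /* skip separators between parameters */
        while (is_sep(*p))
            p++;
        if (strncmp(p, name, len) == 0 &&
            (p[len] == '\0' || p[len] == '=' || is_sep(p[len])))
            return true;
        /* advance to the end of this parameter */
        while (*p && !is_sep(*p))
            p++;
    }
    return false;
}

/* Parameters touched by this patchset, taken from the list above. */
static const char *const affected_params[] = {
    "load_ramdisk", "prompt_ramdisk",   /* removed */
    "noinitrd", "ramdisk_start",        /* now deprecated */
};

/* Count how many of the affected parameters appear on the command line. */
int count_affected_params(const char *cmdline)
{
    int n = 0;

    for (size_t i = 0; i < sizeof(affected_params) / sizeof(affected_params[0]); i++)
        if (cmdline_has_param(cmdline, affected_params[i]))
            n++;
    return n;
}
```

Matching only at the start of each parameter avoids false positives such as
"initrd=" being mistaken for "noinitrd".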
This patchset is based on current mainline (7f7072574127).
Testing
====
I tested my patchset on many architectures in Qemu using my Rust
program, heavily based on mkroot [1].
I used the following cross-compilers:
aarch64-linux-musleabi
armv4l-linux-musleabihf
armv5l-linux-musleabihf
armv7l-linux-musleabihf
i486-linux-musl
i686-linux-musl
mips-linux-musl
mips64-linux-musl
mipsel-linux-musl
powerpc-linux-musl
powerpc64-linux-musl
powerpc64le-linux-musl
riscv32-linux-musl
riscv64-linux-musl
s390x-linux-musl
sh4-linux-musl
sh4eb-linux-musl
x86_64-linux-musl
taken from this directory [2].
So, as you can see, there are 18 triplets, which correspond to 8 subdirs in arch/.
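The count can be double-checked mechanically. The sketch below maps the CPU
prefix of each triplet to an arch/ subdirectory and counts the distinct
results. The mapping table is an assumption based on the kernel's arch/
layout, it is not taken from the test program, and the function names are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* CPU prefix of a cross-compiler triplet -> arch/ subdirectory (assumed mapping). */
static const struct { const char *prefix; const char *arch; } arch_map[] = {
    { "aarch64", "arm64" }, { "arm", "arm" },
    { "i486", "x86" }, { "i686", "x86" }, { "x86_64", "x86" },
    { "mips", "mips" }, { "powerpc", "powerpc" }, { "riscv", "riscv" },
    { "s390", "s390" }, { "sh4", "sh" },
};

/* Return the arch/ subdirectory for a triplet, or NULL if it is unknown. */
const char *triplet_to_arch(const char *triplet)
{
    for (size_t i = 0; i < sizeof(arch_map) / sizeof(arch_map[0]); i++)
        if (strncmp(triplet, arch_map[i].prefix, strlen(arch_map[i].prefix)) == 0)
            return arch_map[i].arch;
    return NULL;
}

/* Count the distinct arch/ subdirectories covered by a list of triplets. */
size_t count_archs(const char *const *triplets, size_t n)
{
    size_t distinct = 0;

    for (size_t i = 0; i < n; i++) {
        const char *arch = triplet_to_arch(triplets[i]);
        int seen = 0;

        /* has an earlier triplet already mapped to the same directory? */
        for (size_t j = 0; j < i && !seen; j++) {
            const char *prev = triplet_to_arch(triplets[j]);
            seen = arch && prev && strcmp(arch, prev) == 0;
        }
        if (arch && !seen)
            distinct++;
    }
    return distinct;
}
```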
For every triplet I tested that:
- Initramfs still works (both builtin and external)
- Direct boot from disk still works
- Remaining initrd code path (root=/dev/ram0) still works
Workaround
====
If "retain_initrd" is passed to the kernel, then the initramfs/initrd
passed by the bootloader is retained and becomes available after boot
as the read-only magic file /sys/firmware/initrd [3].
No copies are involved, i.e. /sys/firmware/initrd is simply
a reference to the original blob passed by the bootloader.
This works even if the initrd/initramfs is not recognized by the kernel
in any way, i.e. even if it is neither a valid cpio archive nor
a fs image supported by classic initrd.
This works both with my patchset and without it.
This means that you can emulate classic initrd as follows:
link a builtin initramfs into the kernel; in /init in this initramfs,
copy /sys/firmware/initrd to some file in / and loop-mount it.
This is even better than classic initrd, because:
- You can use a fs not supported by classic initrd, for example erofs
- One copy is involved (from /sys/firmware/initrd to some file in /)
as opposed to two when using classic initrd
Still, I don't recommend using this workaround, because
I want everyone to migrate to a proper modern initramfs.
But you can use this workaround if you want.
Also: it is not possible to loop-mount /sys/firmware/initrd
directly. Theoretically the kernel can be changed
to allow this (and/or to make it writable), but I think nobody needs this,
and I don't want to implement it.
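The workaround described above could look roughly like this in a C /init.
This is a sketch under assumptions, not something taken from the patchset:
the destination path, mount point and filesystem type are placeholders,
/dev is assumed to be a mounted devtmpfs with /dev/loop-control present,
the mount point is assumed to exist, and error handling is minimal.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <unistd.h>
#include <linux/loop.h>

/* Copy src to dst (the single copy mentioned above). Returns 0 or -1. */
int copy_file(const char *src, const char *dst)
{
    char buf[65536];
    ssize_t n = 0;
    int ret = -1;
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (in < 0 || out < 0)
        goto done;
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        /* write() may be partial, so loop until the chunk is out */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, n - off);
            if (w < 0)
                goto done;
            off += w;
        }
    }
    if (n == 0)
        ret = 0;
done:
    if (in >= 0)
        close(in);
    if (out >= 0)
        close(out);
    return ret;
}

/* Attach image to a free loop device and mount it read-only. Needs root. */
int loop_mount(const char *image, const char *target, const char *fstype)
{
    char dev[32];
    int ret = -1;
    int loop = -1;
    int nr;
    int ctl = open("/dev/loop-control", O_RDWR);
    int img = open(image, O_RDONLY);

    if (ctl < 0 || img < 0)
        goto done;
    nr = ioctl(ctl, LOOP_CTL_GET_FREE);
    if (nr < 0)
        goto done;
    snprintf(dev, sizeof(dev), "/dev/loop%d", nr);
    loop = open(dev, O_RDONLY);
    if (loop < 0 || ioctl(loop, LOOP_SET_FD, img) < 0)
        goto done;
    ret = mount(dev, target, fstype, MS_RDONLY, NULL);
    if (ret < 0)
        ioctl(loop, LOOP_CLR_FD, 0);    /* detach again on failure */
done:
    if (loop >= 0)
        close(loop);
    if (img >= 0)
        close(img);
    if (ctl >= 0)
        close(ctl);
    return ret;
}
```

An /init built on this would call, for example,
copy_file("/sys/firmware/initrd", "/initrd.img") followed by
loop_mount("/initrd.img", "/mnt", "erofs"), where both paths and the
filesystem type are placeholders.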
On Qemu's -initrd and GRUB's initrd
====
Don't panic: this patchset doesn't remove initramfs
(which is used by nearly all Linux distros), and I have
no plans to remove it.
Qemu's -initrd option and GRUB's initrd command refer
to the initrd bootloader mechanism, which is used to
load both initrd and (external) initramfs.
So, if you use Qemu's -initrd or GRUB's initrd,
then you are likely using them to pass an initramfs, and thus
you are safe.
v1: https://lore.kernel.org/lkml/20250913003842.41944-1-safinaskar@gmail.com/
v1 -> v2 changes:
- A lot. I removed most patches; see the cover letter for details
[1] https://github.com/landley/toybox/tree/master/mkroot
[2] https://landley.net/toybox/downloads/binaries/toolchains/latest
[3] https://lore.kernel.org/all/20231207235654.16622-1-graf@amazon.com/
[4] https://lore.kernel.org/lkml/20250918152830.438554-1-nschichan@freebox.fr/
Askar Safin (3):
init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command
line parameters
initrd: remove deprecated code path (linuxrc)
init: remove /proc/sys/kernel/real-root-dev
.../admin-guide/kernel-parameters.txt | 8 +-
Documentation/admin-guide/sysctl/kernel.rst | 6 -
arch/arm/configs/neponset_defconfig | 2 +-
fs/init.c | 14 ---
include/linux/init_syscalls.h | 1 -
include/linux/initrd.h | 2 -
include/uapi/linux/sysctl.h | 1 -
init/do_mounts.c | 11 +-
init/do_mounts.h | 18 +--
init/do_mounts_initrd.c | 105 +-----------------
init/do_mounts_rd.c | 24 +---
11 files changed, 18 insertions(+), 174 deletions(-)
base-commit: 7f7072574127c9e971cad83a0274e86f6275c0d5
--
2.47.3
^ permalink raw reply
* Re: [PATCH v3 19/30] liveupdate: luo_sysfs: add sysfs state monitoring
From: Greg KH @ 2025-10-10 6:39 UTC (permalink / raw)
To: Pratyush Yadav
Cc: Yanjun.Zhu, Pasha Tatashin, jasonmiu, graf, changyuanl, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, tglx, mingo, bp, dave.hansen, x86, hpa,
rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu
In-Reply-To: <mafs0ecrbmzzh.fsf@kernel.org>
On Fri, Oct 10, 2025 at 01:12:18AM +0200, Pratyush Yadav wrote:
> On Thu, Oct 09 2025, Yanjun.Zhu wrote:
>
> > On 10/9/25 10:04 AM, Pasha Tatashin wrote:
> >> On Thu, Oct 9, 2025 at 11:35 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
> >>>
> >>> On 2025/10/9 5:01, Pasha Tatashin wrote:
> >>>>>> Because the window of a kernel live update is short, it is difficult to count
> >>>>>> how many times the kernel has been live updated.
> >>>>>>
> >>>>>> Is it possible to add a variable that counts the times the kernel has been live
> >>>>>> updated?
> >>>>> The kernel doesn't do the live update on its own. The process is driven
> >>>>> and sequenced by userspace. So if you want to keep statistics, you
> >>>>> should do it from your userspace (luod maybe?). I don't see any need for
> >>>>> this in the kernel.
> >>>>>
> >>>> One use case I can think of is including information in kdump or the
> >>>> backtrace warning/panic messages about how many times this machine has
> >>>> been live-updated. In the past, I've seen bugs (related to memory
> >>>> corruption) that occurred only after several kexecs, not on the first
> >>>> one. With live updates, especially while the code is being stabilized,
> >>>> I imagine we might have a similar situation. For that reason, it could
> >>>> be useful to have a count in the dmesg logs showing how many times
> >>>> this machine has been live-updated. While this information is also
> >>>> available in userspace, it would be simpler for kernel developers
> >>>> triaging these issues if everything were in one place.
>
> Hmm, good point.
>
> >>> I’m considering this issue from a system security perspective. After the
> >>> kernel is automatically updated, user-space applications are usually
> >>> unaware of the change. In one possible scenario, an attacker could
> >>> replace the kernel with a compromised version, while user-space
> >>> applications remain unaware of it — which poses a potential security risk.
>
> Wouldn't signing be the way to avoid that? Because if the kernel is
> compromised then it can very well fake the reboot count as well.
>
> >>>
> >>> To mitigate this, it would be useful to expose the number of kernel
> >>> updates through a sysfs interface, so that we can detect whether the
> >>> kernel has been updated and then collect information about the new
> >>> kernel to check for possible security issues.
> >>>
> >>> Of course, there are other ways to detect kernel updates — for example,
> >>> by using ftrace to monitor functions involved in live kernel updates —
> >>> but such approaches tend to have a higher performance overhead. In
> >>> contrast, adding a simple update counter to track live kernel updates
> >>> would provide similar monitoring capability with minimal overhead.
> >> Would a print during boot work for you, i.e. when we print that this
> >> kernel is live updating, we could include the number? Otherwise, we
> >> could export this number in debugfs.
> > Since I received a notification that my previous message was not sent
> > successfully, I am resending it.
> >
> > IMO, it would be better to export this number via debugfs. This approach reduces
> > the overhead involved in detecting a kernel live update.
> > If the number is printed in logs instead, the overhead would be higher compared
> > to using debugfs.
>
> Yeah, debugfs sounds fine. No ABI at least.
Do not provide any functionality in debugfs that userspace relies on at
all, as odds are, it will not be able to be accessed by most/all of
userspace on many systems. It is for debugging only.
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Christoph Hellwig @ 2025-10-10 5:27 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Christoph Hellwig, Pavel Emelyanov, linux-fsdevel,
Raphael S . Carvalho, linux-api, linux-xfs
In-Reply-To: <CALCETrW3iQWQTdMbB52R4=GztfuFYvN_8p52H1fopdS8uExQWg@mail.gmail.com>
On Wed, Oct 08, 2025 at 08:22:35AM -0700, Andy Lutomirski wrote:
> On Mon, Oct 6, 2025 at 10:08 PM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Sat, Oct 04, 2025 at 09:08:05AM -0700, Andy Lutomirski wrote:
> > > > Well, we'll need to look into that, including maybe non-blocking
> > > > timestamp updates.
> > > >
> > >
> > > It's been 12 years (!), but maybe it's time to reconsider this:
> > >
> > > https://lore.kernel.org/all/cover.1377193658.git.luto@amacapital.net/
> >
> > I don't see how that is relevant here. Also writes through shared
> > mmaps are problematic for so many reasons that I'm not sure we want
> > to encourage people to use that more.
> >
>
> Because the same exact issue exists in the normal non-mmap write path,
> and I can even quote you upthread :)
The thread that started this is about io_uring nonblock writes, aka
O_DIRECT. So there isn't any writeback to defer to.
^ permalink raw reply
* Re: [PATCH-RFC] init: simplify initrd code (was Re: [PATCH RESEND 00/62] initrd: remove classic initrd support).
From: Askar Safin @ 2025-10-10 4:57 UTC (permalink / raw)
To: nschichan
Cc: akpm, andy.shevchenko, axboe, brauner, cyphar, devicetree,
ecurtin, email2tema, graf, gregkh, hca, hch, hsiangkao, initramfs,
jack, julian.stecklina, kees, linux-acpi, linux-alpha, linux-api,
linux-arch, linux-arm-kernel, linux-block, linux-csky, linux-doc,
linux-efi, linux-ext4, linux-fsdevel, linux-hexagon, linux-kernel,
linux-m68k, linux-mips, linux-openrisc, linux-parisc, linux-riscv,
linux-s390, linux-sh, linux-snps-arc, linux-um, linuxppc-dev,
loongarch, mcgrof, mingo, monstr, mzxreary, patches, rob,
sparclinux, thomas.weissschuh, thorsten.blum, torvalds, tytso,
viro, x86
In-Reply-To: <20250925131055.3933381-1-nschichan@freebox.fr>
On Thu, Sep 25, 2025 at 4:12 PM <nschichan@freebox.fr> wrote:
> - drop prompt_ramdisk and ramdisk_start kernel parameters
> - drop compression support
> - drop image autodetection, the whole /initrd.image content is now
> copied into /dev/ram0
> - remove rd_load_disk() which doesn't seem to be used anywhere.
I welcome any initrd simplification!
> Hopefully my email config is now better and reaches gmail users
> correctly.
Yes, I got this email.
--
Askar Safin
^ permalink raw reply
* Re: [PATCH RESEND 00/62] initrd: remove classic initrd support
From: Askar Safin @ 2025-10-10 4:09 UTC (permalink / raw)
To: Jessica Clarke
Cc: linux-fsdevel, linux-kernel, Linus Torvalds, Greg Kroah-Hartman,
Christian Brauner, Al Viro, Jan Kara, Christoph Hellwig,
Jens Axboe, Andy Shevchenko, Aleksa Sarai, Thomas Weißschuh,
Julian Stecklina, Gao Xiang, Art Nikpal, Andrew Morton,
Eric Curtin, Alexander Graf, Rob Landley, Lennart Poettering,
linux-arch, linux-alpha, linux-snps-arc, linux-arm-kernel,
linux-csky, linux-hexagon, loongarch, linux-m68k, linux-mips,
linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv,
linux-s390, linux-sh, sparclinux, linux-um, x86, Ingo Molnar,
linux-block, initramfs, linux-api, linux-doc, linux-efi,
linux-ext4, Theodore Y . Ts'o, linux-acpi, Michal Simek,
devicetree, Luis Chamberlain, Kees Cook, Thorsten Blum,
Heiko Carstens, patches
In-Reply-To: <A08066E1-A57E-4980-B15A-8FB00AC747CC@jrtc27.com>
On Tue, Sep 16, 2025 at 8:08 PM Jessica Clarke <jrtc27@jrtc27.com> wrote:
> I strongly suggest picking different names given __builtin_foo is the
> naming scheme used for GNU C builtins/intrinsics. I leave you and
> others to bikeshed that one.
Thank you! I will fix this.
--
Askar Safin
^ permalink raw reply
* Re: [PATCH RESEND 28/62] init: alpha, arc, arm, arm64, csky, m68k, microblaze, mips, nios2, openrisc, parisc, powerpc, s390, sh, sparc, um, x86, xtensa: rename initrd_{start,end} to virt_external_initramfs_{start,end}
From: Askar Safin @ 2025-10-10 4:07 UTC (permalink / raw)
To: Rob Herring
Cc: linux-fsdevel, linux-kernel, Linus Torvalds, Greg Kroah-Hartman,
Christian Brauner, Al Viro, Jan Kara, Christoph Hellwig,
Jens Axboe, Andy Shevchenko, Aleksa Sarai, Thomas Weißschuh,
Julian Stecklina, Gao Xiang, Art Nikpal, Andrew Morton,
Eric Curtin, Alexander Graf, Rob Landley, Lennart Poettering,
linux-arch, linux-alpha, linux-snps-arc, linux-arm-kernel,
linux-csky, linux-hexagon, loongarch, linux-m68k, linux-mips,
linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv,
linux-s390, linux-sh, sparclinux, linux-um, x86, Ingo Molnar,
linux-block, initramfs, linux-api, linux-doc, linux-efi,
linux-ext4, Theodore Y . Ts'o, linux-acpi, Michal Simek,
devicetree, Luis Chamberlain, Kees Cook, Thorsten Blum,
Heiko Carstens, patches
In-Reply-To: <20250916030903.GA3598798-robh@kernel.org>
On Tue, Sep 16, 2025 at 6:09 AM Rob Herring <robh@kernel.org> wrote:
> There's not really any point in listing every arch in the subject.
Ok, I will fix this.
--
Askar Safin
^ permalink raw reply
* Re: [PATCH RESEND 02/62] init: remove deprecated "prompt_ramdisk" command line parameter, which does nothing
From: Askar Safin @ 2025-10-10 3:17 UTC (permalink / raw)
To: Christophe Leroy
Cc: linux-fsdevel, linux-kernel, Linus Torvalds, Greg Kroah-Hartman,
Christian Brauner, Al Viro, Jan Kara, Christoph Hellwig,
Jens Axboe, Andy Shevchenko, Aleksa Sarai, Thomas Weißschuh,
Julian Stecklina, Gao Xiang, Art Nikpal, Andrew Morton,
Eric Curtin, Alexander Graf, Rob Landley, Lennart Poettering,
linux-arch, linux-alpha, linux-snps-arc, linux-arm-kernel,
linux-csky, linux-hexagon, loongarch, linux-m68k, linux-mips,
linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv,
linux-s390, linux-sh, sparclinux, linux-um, x86, Ingo Molnar,
linux-block, initramfs, linux-api, linux-doc, linux-efi,
linux-ext4, Theodore Y . Ts'o, linux-acpi, Michal Simek,
devicetree, Luis Chamberlain, Kees Cook, Thorsten Blum,
Heiko Carstens, patches
In-Reply-To: <053f39a9-06dc-4fbd-ad1b-325f9d3f3f66@csgroup.eu>
On Mon, Sep 15, 2025 at 2:16 PM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
> Squash patch 1 and patch 2 together and say this is cleanup of two
> options deprecated by commit c8376994c86c ("initrd: remove support for
> multiple floppies") with the documentation by commit 6b99e6e6aa62
> ("Documentation/admin-guide: blockdev/ramdisk: remove use of "rdev"")
Will do in v2.
--
Askar Safin
^ permalink raw reply
* Re: [PATCH RESEND 21/62] init: remove all mentions of root=/dev/ram*
From: Askar Safin @ 2025-10-10 2:48 UTC (permalink / raw)
To: Krzysztof Kozlowski
Cc: linux-fsdevel, linux-kernel, Linus Torvalds, Greg Kroah-Hartman,
Christian Brauner, Al Viro, Jan Kara, Christoph Hellwig,
Jens Axboe, Andy Shevchenko, Aleksa Sarai, Thomas Weißschuh,
Julian Stecklina, Gao Xiang, Art Nikpal, Andrew Morton,
Eric Curtin, Alexander Graf, Rob Landley, Lennart Poettering,
linux-arch, linux-alpha, linux-snps-arc, linux-arm-kernel,
linux-csky, linux-hexagon, loongarch, linux-m68k, linux-mips,
linux-openrisc, linux-parisc, linuxppc-dev, linux-riscv,
linux-s390, linux-sh, sparclinux, linux-um, x86, Ingo Molnar,
linux-block, initramfs, linux-api, linux-doc, linux-efi,
linux-ext4, Theodore Y . Ts'o, linux-acpi, Michal Simek,
devicetree, Luis Chamberlain, Kees Cook, Thorsten Blum,
Heiko Carstens, patches
In-Reply-To: <a079375f-38c2-4f38-b2be-57737084fde8@kernel.org>
On Sun, Sep 14, 2025 at 1:06 PM Krzysztof Kozlowski <krzk@kernel.org> wrote:
> Please wrap commit message according to Linux coding style / submission
I will do this for v2
> To me your patchset is way too big bomb, too difficult to review. You
v2 will be small.
--
Askar Safin
^ permalink raw reply
* Re: [PATCH v4 00/30] Live Update Orchestrator
From: Pasha Tatashin @ 2025-10-09 23:50 UTC (permalink / raw)
To: Pratyush Yadav
Cc: jasonmiu, graf, changyuanl, rppt, dmatlack, rientjes, corbet,
rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl, masahiroy, akpm,
tj, yoann.congal, mmaurer, roman.gushchin, chenridong, axboe,
mark.rutland, jannh, vincent.guittot, hannes, dan.j.williams,
david, joel.granados, rostedt, anna.schumaker, song, zhangguopeng,
linux, linux-kernel, linux-doc, linux-mm, gregkh, tglx, mingo, bp,
dave.hansen, x86, hpa, rafael, dakr, bartosz.golaszewski,
cw00.choi, myungjoo.ham, yesanishhere, Jonathan.Cameron,
quic_zijuhu, aleksander.lobakin, ira.weiny, andriy.shevchenko,
leon, lukas, bhelgaas, wagi, djeffery, stuart.w.hayes, lennart,
brauner, linux-api, linux-fsdevel, saeedm, ajayachandra, jgg,
parav, leonro, witu, hughd, skhawaja, chrisl, steven.sistare
In-Reply-To: <mafs0ms5zn0nm.fsf@kernel.org>
On Thu, Oct 9, 2025 at 6:58 PM Pratyush Yadav <pratyush@kernel.org> wrote:
>
> On Tue, Oct 07 2025, Pasha Tatashin wrote:
>
> > On Sun, Sep 28, 2025 at 9:03 PM Pasha Tatashin
> > <pasha.tatashin@soleen.com> wrote:
> >>
> [...]
> > 4. New File-Lifecycle-Bound Global State
> > ----------------------------------------
> > A new mechanism for managing global state was proposed, designed to be
> > tied to the lifecycle of the preserved files themselves. This would
> > allow a file owner (e.g., the IOMMU subsystem) to save and retrieve
> > global state that is only relevant when one or more of its FDs are
> > being managed by LUO.
>
> Is this going to replace LUO subsystems? If yes, then why? The global
> state will likely need to have its own lifecycle just like the FDs, and
> subsystems are a simple and clean abstraction to control that. I get the
> idea of only "activating" a subsystem when one or more of its FDs are
> participating in LUO, but we can do that while keeping subsystems
> around.
Thanks for the feedback. The FLB Global State is not replacing the LUO
subsystems. On the contrary, it's a higher-level abstraction that is
itself implemented as a LUO subsystem. The goal is to provide a
solution for a pattern that emerged during the PCI and IOMMU
discussions.
You can see the WIP implementation here, which shows it registering as
a subsystem named "luo-fh-states-v1-struct":
https://github.com/soleen/linux/commit/94e191aab6b355d83633718bc4a1d27dda390001
The existing subsystem API is a low-level tool that provides for the
preservation of a raw 8-byte handle. It doesn't provide locking, nor
is it explicitly tied to the lifecycle of any higher-level object like
a file handler. The new API is designed to solve a more specific
problem: allowing global components (like IOMMU or PCI) to
automatically track when resources relevant to them are added to or
removed from preservation. If HugeTLB requires a subsystem, it can
still use it, but I suspect it might benefit from FLB Global State as
well.
> Here is how I imagine the proposed API would compare against subsystems
> with hugetlb as an example (hugetlb support is still WIP, so I'm still
> not clear on specifics, but this is how I imagine it will work):
>
> - Hugetlb subsystem needs to track its huge page pools and which pages
> are allocated and free. This is its global state. The pools get
> reconstructed after kexec. Post-kexec, the free pages are ready for
> allocation from other "regular" files and the pages used in LUO files
> are reserved.
>
> - Pre-kexec, when a hugetlb FD is preserved, it marks that as preserved
> in hugetlb's global data structure tracking this. This is runtime data
> (say xarray), and _not_ serialized data. Reason being, there are
> likely more FDs to come so no point in wasting time serializing just
> yet.
>
> This can look something like:
>
> hugetlb_luo_preserve_folio(folio, ...);
>
> Nice and simple.
>
> Compare this with the new proposed API:
>
> liveupdate_fh_global_state_get(h, &hugetlb_data);
> // This will have update serialized state now.
> hugetlb_luo_preserve_folio(hugetlb_data, folio, ...);
> liveupdate_fh_global_state_put(h);
>
> We do the same thing but in a very complicated way.
>
> - When the system-wide preserve happens, the hugetlb subsystem gets a
> callback to serialize. It converts its runtime global state to
> serialized state since now it knows no more FDs will be added.
>
> With the new API, this doesn't need to be done since each FD prepare
> already updates serialized state.
>
> - If there are no hugetlb FDs, then the hugetlb subsystem doesn't put
> anything in LUO. This is same as new API.
>
> - If some hugetlb FDs are not restored after liveupdate and the finish
> event is triggered, the subsystem gets its finish() handler called and
> it can free things up.
>
> I don't get how that would work with the new API.
The new API isn't more complicated; it codifies the common pattern of
"create on first use, destroy on last use" into a reusable helper,
saving each file handler from having to reinvent the same reference
counting and locking scheme. But, as you point out, subsystems provide
more control, specifically they handle full creation/free instead of
relying on file-handlers for that.
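As a rough userspace illustration of the "create on first use, destroy on last use" pattern (not the actual LUO code), the sketch below uses a mutex and a user count. The names global_state, global_state_get() and global_state_put() are hypothetical and are not the liveupdate_fh_global_state_*() API from the WIP branch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical illustration only; none of these names come from LUO. */
struct global_state {
    pthread_mutex_t lock;
    unsigned int users;   /* number of files currently using the state */
    void *data;           /* allocated on first use, freed on last use */
};

static struct global_state gs = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Take a reference; allocate the shared data if this is the first user. */
static void *global_state_get(struct global_state *s, size_t size)
{
    void *data;

    pthread_mutex_lock(&s->lock);
    if (s->users == 0) {
        s->data = calloc(1, size);
        if (!s->data) {
            pthread_mutex_unlock(&s->lock);
            return NULL;
        }
    }
    s->users++;
    data = s->data;
    pthread_mutex_unlock(&s->lock);
    return data;
}

/* Drop a reference; free the shared data when the last user goes away. */
static void global_state_put(struct global_state *s)
{
    pthread_mutex_lock(&s->lock);
    assert(s->users > 0);   /* a put without a matching get is a bug */
    if (--s->users == 0) {
        free(s->data);
        s->data = NULL;
    }
    pthread_mutex_unlock(&s->lock);
}
```

Each file handler would otherwise have to open-code this counting and locking itself, which is the duplication the helper is meant to remove.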
> My point is, I see subsystems working perfectly fine here and I don't
> get how the proposed API is any better.
>
> Am I missing something?
No, I don't think you are. Your analysis is correct that this is
achievable with subsystems. The goal of the new API is to make that
specific, common use case simpler.
Pasha
^ permalink raw reply
* Re: [PATCH v3 19/30] liveupdate: luo_sysfs: add sysfs state monitoring
From: Pratyush Yadav @ 2025-10-09 23:12 UTC (permalink / raw)
To: Yanjun.Zhu
Cc: Pasha Tatashin, Pratyush Yadav, jasonmiu, graf, changyuanl, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, jgg, parav, leonro, witu
In-Reply-To: <d09881f5-0e0b-4795-99bf-cd3711ee48ab@linux.dev>
On Thu, Oct 09 2025, Yanjun.Zhu wrote:
> On 10/9/25 10:04 AM, Pasha Tatashin wrote:
>> On Thu, Oct 9, 2025 at 11:35 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
>>>
>>> On 2025/10/9 5:01, Pasha Tatashin wrote:
>>>>>> Because the window of a kernel live update is short, it is difficult to
>>>>>> count how many times the kernel has been live updated.
>>>>>>
>>>>>> Is it possible to add a variable that counts the number of times the
>>>>>> kernel has been live updated?
>>>>> The kernel doesn't do the live update on its own. The process is driven
>>>>> and sequenced by userspace. So if you want to keep statistics, you
>>>>> should do it from your userspace (luod maybe?). I don't see any need for
>>>>> this in the kernel.
>>>>>
>>>> One use case I can think of is including information in kdump or the
>>>> backtrace warning/panic messages about how many times this machine has
>>>> been live-updated. In the past, I've seen bugs (related to memory
>>>> corruption) that occurred only after several kexecs, not on the first
>>>> one. With live updates, especially while the code is being stabilized,
>>>> I imagine we might have a similar situation. For that reason, it could
>>>> be useful to have a count in the dmesg logs showing how many times
>>>> this machine has been live-updated. While this information is also
>>>> available in userspace, it would be simpler for kernel developers
>>>> triaging these issues if everything were in one place.
Hmm, good point.
>>> I’m considering this issue from a system security perspective. After the
>>> kernel is automatically updated, user-space applications are usually
>>> unaware of the change. In one possible scenario, an attacker could
>>> replace the kernel with a compromised version, while user-space
>>> applications remain unaware of it — which poses a potential security risk.
Wouldn't signing be the way to avoid that? Because if the kernel is
compromised then it can very well fake the reboot count as well.
>>>
>>> To mitigate this, it would be useful to expose the number of kernel
>>> updates through a sysfs interface, so that we can detect whether the
>>> kernel has been updated and then collect information about the new
>>> kernel to check for possible security issues.
>>>
>>> Of course, there are other ways to detect kernel updates — for example,
>>> by using ftrace to monitor functions involved in live kernel updates —
>>> but such approaches tend to have a higher performance overhead. In
>>> contrast, adding a simple update counter to track live kernel updates
>>> would provide similar monitoring capability with minimal overhead.
>> Would a print during boot work for you, i.e. including the number when
>> we print that this kernel is live updating? Otherwise, we could export
>> this number in debugfs.
> Since I received a notification that my previous message was not sent
> successfully, I am resending it.
>
> IMO, it would be better to export this number via debugfs. This approach reduces
> the overhead involved in detecting a kernel live update.
> If the number is printed in logs instead, the overhead would be higher compared
> to using debugfs.
Yeah, debugfs sounds fine. No ABI at least.
--
Regards,
Pratyush Yadav
^ permalink raw reply