* Re: [PATCH v2 1/3] init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
From: Askar Safin @ 2025-10-13 6:05 UTC
To: Andy Shevchenko
Cc: linux-fsdevel, linux-kernel, Linus Torvalds, Greg Kroah-Hartman,
Christian Brauner, Al Viro, Jan Kara, Christoph Hellwig,
Jens Axboe, Aleksa Sarai, Thomas Weißschuh, Julian Stecklina,
Gao Xiang, Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
In-Reply-To: <CAHp75VeJM_OoCWDX20FhphRi6e7rG9Z4X6zkjx9vFF12n7Ef7A@mail.gmail.com>
On Fri, Oct 10, 2025 at 6:02 PM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> 1) often the last period is missing in the commit messages;
I will fix in v3.
> 2) in this change it's unclear for how long (years) the feature was
> deprecated, i.e. the other patch states that 2020 for something else.
> I wonder if this one has the similar order of oldness.
These two commits were done in 2020, too. I will fix in v3.
--
Askar Safin
* Re: [PATCH v6 1/5] Wire up lsm_config_self_policy and lsm_config_system_policy syscalls
From: kernel test robot @ 2025-10-11 12:07 UTC
To: Maxime Bélair, linux-security-module
Cc: oe-kbuild-all, john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
In-Reply-To: <20251010132610.12001-2-maxime.belair@canonical.com>
Hi Maxime,
kernel test robot noticed the following build errors:
[auto build test ERROR on 9c32cda43eb78f78c73aee4aa344b777714e259b]
url: https://github.com/intel-lab-lkp/linux/commits/Maxime-B-lair/Wire-up-lsm_config_self_policy-and-lsm_config_system_policy-syscalls/20251010-213606
base: 9c32cda43eb78f78c73aee4aa344b777714e259b
patch link: https://lore.kernel.org/r/20251010132610.12001-2-maxime.belair%40canonical.com
patch subject: [PATCH v6 1/5] Wire up lsm_config_self_policy and lsm_config_system_policy syscalls
config: sh-randconfig-001-20251011 (https://download.01.org/0day-ci/archive/20251011/202510111947.0ObJ6YUH-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 7.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251011/202510111947.0ObJ6YUH-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510111947.0ObJ6YUH-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from kernel/umh.c:9:0:
>> include/linux/syscalls.h:994:45: error: expected ';', ',' or ')' before 'u32'
u32 __user size, u32 common_flags u32 flags);
^~~
--
In file included from kernel/fork.c:56:0:
>> include/linux/syscalls.h:994:45: error: expected ';', ',' or ')' before 'u32'
u32 __user size, u32 common_flags u32 flags);
^~~
kernel/fork.c: In function '__do_sys_clone3':
kernel/fork.c:3135:2: warning: #warning clone3() entry point is missing, please fix [-Wcpp]
#warning clone3() entry point is missing, please fix
^~~~~~~
vim +994 include/linux/syscalls.h
817
818 /* CONFIG_MMU only */
819 asmlinkage long sys_swapon(const char __user *specialfile, int swap_flags);
820 asmlinkage long sys_swapoff(const char __user *specialfile);
821 asmlinkage long sys_mprotect(unsigned long start, size_t len,
822 unsigned long prot);
823 asmlinkage long sys_msync(unsigned long start, size_t len, int flags);
824 asmlinkage long sys_mlock(unsigned long start, size_t len);
825 asmlinkage long sys_munlock(unsigned long start, size_t len);
826 asmlinkage long sys_mlockall(int flags);
827 asmlinkage long sys_munlockall(void);
828 asmlinkage long sys_mincore(unsigned long start, size_t len,
829 unsigned char __user * vec);
830 asmlinkage long sys_madvise(unsigned long start, size_t len, int behavior);
831 asmlinkage long sys_process_madvise(int pidfd, const struct iovec __user *vec,
832 size_t vlen, int behavior, unsigned int flags);
833 asmlinkage long sys_process_mrelease(int pidfd, unsigned int flags);
834 asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
835 unsigned long prot, unsigned long pgoff,
836 unsigned long flags);
837 asmlinkage long sys_mseal(unsigned long start, size_t len, unsigned long flags);
838 asmlinkage long sys_mbind(unsigned long start, unsigned long len,
839 unsigned long mode,
840 const unsigned long __user *nmask,
841 unsigned long maxnode,
842 unsigned flags);
843 asmlinkage long sys_get_mempolicy(int __user *policy,
844 unsigned long __user *nmask,
845 unsigned long maxnode,
846 unsigned long addr, unsigned long flags);
847 asmlinkage long sys_set_mempolicy(int mode, const unsigned long __user *nmask,
848 unsigned long maxnode);
849 asmlinkage long sys_migrate_pages(pid_t pid, unsigned long maxnode,
850 const unsigned long __user *from,
851 const unsigned long __user *to);
852 asmlinkage long sys_move_pages(pid_t pid, unsigned long nr_pages,
853 const void __user * __user *pages,
854 const int __user *nodes,
855 int __user *status,
856 int flags);
857 asmlinkage long sys_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig,
858 siginfo_t __user *uinfo);
859 asmlinkage long sys_perf_event_open(
860 struct perf_event_attr __user *attr_uptr,
861 pid_t pid, int cpu, int group_fd, unsigned long flags);
862 asmlinkage long sys_accept4(int, struct sockaddr __user *, int __user *, int);
863 asmlinkage long sys_recvmmsg(int fd, struct mmsghdr __user *msg,
864 unsigned int vlen, unsigned flags,
865 struct __kernel_timespec __user *timeout);
866 asmlinkage long sys_recvmmsg_time32(int fd, struct mmsghdr __user *msg,
867 unsigned int vlen, unsigned flags,
868 struct old_timespec32 __user *timeout);
869 asmlinkage long sys_wait4(pid_t pid, int __user *stat_addr,
870 int options, struct rusage __user *ru);
871 asmlinkage long sys_prlimit64(pid_t pid, unsigned int resource,
872 const struct rlimit64 __user *new_rlim,
873 struct rlimit64 __user *old_rlim);
874 asmlinkage long sys_fanotify_init(unsigned int flags, unsigned int event_f_flags);
875 #if defined(CONFIG_ARCH_SPLIT_ARG64)
876 asmlinkage long sys_fanotify_mark(int fanotify_fd, unsigned int flags,
877 unsigned int mask_1, unsigned int mask_2,
878 int dfd, const char __user * pathname);
879 #else
880 asmlinkage long sys_fanotify_mark(int fanotify_fd, unsigned int flags,
881 u64 mask, int fd,
882 const char __user *pathname);
883 #endif
884 asmlinkage long sys_name_to_handle_at(int dfd, const char __user *name,
885 struct file_handle __user *handle,
886 void __user *mnt_id, int flag);
887 asmlinkage long sys_open_by_handle_at(int mountdirfd,
888 struct file_handle __user *handle,
889 int flags);
890 asmlinkage long sys_clock_adjtime(clockid_t which_clock,
891 struct __kernel_timex __user *tx);
892 asmlinkage long sys_clock_adjtime32(clockid_t which_clock,
893 struct old_timex32 __user *tx);
894 asmlinkage long sys_syncfs(int fd);
895 asmlinkage long sys_setns(int fd, int nstype);
896 asmlinkage long sys_pidfd_open(pid_t pid, unsigned int flags);
897 asmlinkage long sys_sendmmsg(int fd, struct mmsghdr __user *msg,
898 unsigned int vlen, unsigned flags);
899 asmlinkage long sys_process_vm_readv(pid_t pid,
900 const struct iovec __user *lvec,
901 unsigned long liovcnt,
902 const struct iovec __user *rvec,
903 unsigned long riovcnt,
904 unsigned long flags);
905 asmlinkage long sys_process_vm_writev(pid_t pid,
906 const struct iovec __user *lvec,
907 unsigned long liovcnt,
908 const struct iovec __user *rvec,
909 unsigned long riovcnt,
910 unsigned long flags);
911 asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
912 unsigned long idx1, unsigned long idx2);
913 asmlinkage long sys_finit_module(int fd, const char __user *uargs, int flags);
914 asmlinkage long sys_sched_setattr(pid_t pid,
915 struct sched_attr __user *attr,
916 unsigned int flags);
917 asmlinkage long sys_sched_getattr(pid_t pid,
918 struct sched_attr __user *attr,
919 unsigned int size,
920 unsigned int flags);
921 asmlinkage long sys_renameat2(int olddfd, const char __user *oldname,
922 int newdfd, const char __user *newname,
923 unsigned int flags);
924 asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
925 void __user *uargs);
926 asmlinkage long sys_getrandom(char __user *buf, size_t count,
927 unsigned int flags);
928 asmlinkage long sys_memfd_create(const char __user *uname_ptr, unsigned int flags);
929 asmlinkage long sys_bpf(int cmd, union bpf_attr __user *attr, unsigned int size);
930 asmlinkage long sys_execveat(int dfd, const char __user *filename,
931 const char __user *const __user *argv,
932 const char __user *const __user *envp, int flags);
933 asmlinkage long sys_userfaultfd(int flags);
934 asmlinkage long sys_membarrier(int cmd, unsigned int flags, int cpu_id);
935 asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
936 asmlinkage long sys_copy_file_range(int fd_in, loff_t __user *off_in,
937 int fd_out, loff_t __user *off_out,
938 size_t len, unsigned int flags);
939 asmlinkage long sys_preadv2(unsigned long fd, const struct iovec __user *vec,
940 unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
941 rwf_t flags);
942 asmlinkage long sys_pwritev2(unsigned long fd, const struct iovec __user *vec,
943 unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
944 rwf_t flags);
945 asmlinkage long sys_pkey_mprotect(unsigned long start, size_t len,
946 unsigned long prot, int pkey);
947 asmlinkage long sys_pkey_alloc(unsigned long flags, unsigned long init_val);
948 asmlinkage long sys_pkey_free(int pkey);
949 asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
950 unsigned mask, struct statx __user *buffer);
951 asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
952 int flags, uint32_t sig);
953 asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags);
954 asmlinkage long sys_open_tree_attr(int dfd, const char __user *path,
955 unsigned flags,
956 struct mount_attr __user *uattr,
957 size_t usize);
958 asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path,
959 int to_dfd, const char __user *to_path,
960 unsigned int ms_flags);
961 asmlinkage long sys_mount_setattr(int dfd, const char __user *path,
962 unsigned int flags,
963 struct mount_attr __user *uattr, size_t usize);
964 asmlinkage long sys_fsopen(const char __user *fs_name, unsigned int flags);
965 asmlinkage long sys_fsconfig(int fs_fd, unsigned int cmd, const char __user *key,
966 const void __user *value, int aux);
967 asmlinkage long sys_fsmount(int fs_fd, unsigned int flags, unsigned int ms_flags);
968 asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags);
969 asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
970 siginfo_t __user *info,
971 unsigned int flags);
972 asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
973 asmlinkage long sys_landlock_create_ruleset(const struct landlock_ruleset_attr __user *attr,
974 size_t size, __u32 flags);
975 asmlinkage long sys_landlock_add_rule(int ruleset_fd, enum landlock_rule_type rule_type,
976 const void __user *rule_attr, __u32 flags);
977 asmlinkage long sys_landlock_restrict_self(int ruleset_fd, __u32 flags);
978 asmlinkage long sys_memfd_secret(unsigned int flags);
979 asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long len,
980 unsigned long home_node,
981 unsigned long flags);
982 asmlinkage long sys_cachestat(unsigned int fd,
983 struct cachestat_range __user *cstat_range,
984 struct cachestat __user *cstat, unsigned int flags);
985 asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags);
986 asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
987 u32 __user *size, u32 flags);
988 asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
989 u32 size, u32 flags);
990 asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
991 asmlinkage long sys_lsm_config_self_policy(u32 lsm_id, u32 op, void __user *buf,
992 u32 __user size, u32 common_flags, u32 flags);
993 asmlinkage long sys_lsm_config_system_policy(u32 lsm_id, u32 op, void __user *buf,
> 994 u32 __user size, u32 common_flags u32 flags);
995
996
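The compiler is pointing at a missing comma between `u32 common_flags` and `u32 flags` on line 994; the otherwise identical declaration on line 992 has it. A minimal user-space sketch of the corrected declaration follows, assuming stand-in definitions for `u32`, `__user` and `asmlinkage` (hypothetical here, so the snippet builds outside the kernel tree) and a stub body that is not taken from the patch:

```c
#include <stdint.h>

/* Stand-ins so the sketch builds in user space; the kernel provides its own. */
typedef uint32_t u32;
#define __user
#define asmlinkage

/* Line 994 with the comma restored between common_flags and flags. */
asmlinkage long sys_lsm_config_system_policy(u32 lsm_id, u32 op, void __user *buf,
					     u32 __user size, u32 common_flags, u32 flags);

/* Stub body for illustration only; it does nothing and reports success. */
asmlinkage long sys_lsm_config_system_policy(u32 lsm_id, u32 op, void __user *buf,
					     u32 __user size, u32 common_flags, u32 flags)
{
	(void)lsm_id;
	(void)op;
	(void)buf;
	(void)size;
	(void)common_flags;
	(void)flags;
	return 0;
}
```

Only the comma is changed relative to the quoted line; the parameter types are reproduced as posted.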
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Andy Lutomirski @ 2025-10-11 4:04 UTC
To: Dave Chinner
Cc: Christoph Hellwig, Pavel Emelyanov, linux-fsdevel,
Raphael S . Carvalho, linux-api, linux-xfs
In-Reply-To: <aOm0WCB_woFgnv0v@dread.disaster.area>
On Fri, Oct 10, 2025 at 6:35 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Oct 08, 2025 at 02:51:14PM -0700, Andy Lutomirski wrote:
> > On Wed, Oct 8, 2025 at 2:27 PM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Wed, Oct 08, 2025 at 08:22:35AM -0700, Andy Lutomirski wrote:
> > > > On Mon, Oct 6, 2025 at 10:08 PM Christoph Hellwig <hch@infradead.org> wrote:
> > > > >
> > > > > On Sat, Oct 04, 2025 at 09:08:05AM -0700, Andy Lutomirski wrote:
> >
> > >
> > > You are conflating "synchronous update" with "blocking".
> > >
> > > Avoiding the need for synchronous timestamp updates is exactly what
> > > the lazytime mount option provides. i.e. lazytime degrades immediate
> > > consistency requirements to eventual consistency similar to how the
> > > default relatime behaviour defers atime updates for eventual
> > > writeback.
> > >
> > > IOWs, we've already largely addressed the synchronous c/mtime update
> > > problem but what we haven't done is made timestamp updates
> > > fully support non-blocking caller semantics. That's a separate
> > > problem...
> >
> > I'm probably missing something, but is this really different?
>
> Yes, and yes.
>
> > Either the mtime update can block or it can't block.
>
> Sure, but that's not the issue we have to deal with.
>
> In many filesystems and fs operations, we have to know if an
> operation is going to block -before- we start the operation. e.g.
> transactional changes cannot be rolled back once we've started the
> modification if they need to block to make progress (e.g. read in
> on-disk metadata).
>
> This foresight, in many cases, is -unknowable-. Even though the
> operation /likely/ won't block, we cannot *guarantee* ahead of time
> that any given instance of the operation will /not/ block. Hence
> the reliable non-blocking operation that users are asking for is not
> possible with unknowable implementation characteristics like this.
>
> IOWs, a timestamp update implementation can be synchronous and
> reliably non-blocking if it always knows when blocking will occur
> and can return -EAGAIN instead of blocking to complete the
> operation.
>
> If it can't know when/if blocking will occur, then lazytime allows
> us to defer the (potentially) blocking update operation to another
> context that can block. Queuing for async processing can easily be
> made non-blocking, and __mark_inode_dirty(I_DIRTY_TIME) does this
> for us.
>
> So, yeah, it should be pretty obvious at this point that non-blocking
> implementation is completely independent of whether the operation is
> performed synchronously or asynchronously. It's easier to make async
> operations non-blocking, but that doesn't mean "non_blocking" and
> "asynchronous execution" are interchangeable terms or behaviours.
>
> > I haven't dug all the
> > way into exactly what happens in __mark_inode_dirty(), but there is a
> > lot going on in there even in the I_DIRTY_TIME path.
>
> It's pretty simple, really. __mark_inode_dirty(I_DIRTY_TIME) is
> non-blocking and queues the inode on the wb->i_dirty_time queue
> for later processing.
>
First, I apologize if I'm off base here.
Second, I don't think I'm entirely nuts, and I'm moderately confident
that, ten-ish years ago, I tested lazytime in the hopes that it would
solve my old problem, and IIRC it didn't help. I was running a
production workload on ext4 on regrettably slow spinning rust backed
by a truly atrocious HPE controller. And I was running latencytop to
generate little traces when my task got blocked, and there was no form
of AIO involved. (And I don't really understand how AIO is wired up
internally... And yes, in retrospect I should not have been using
shared-writable mmaps or even file-backed things at all for what I was
doing, but I had unrealistic expectations of how mmap worked when I
wrote that code more like 20 years ago, and I wasn't even using Linux
at the time I wrote it.)
I'm looking at the code now, and I see what you're talking about, and
__mark_inode_dirty(inode, I_DIRTY_TIME) looks fairly polite and like
it won't block. But the relevant code seems to be:
int generic_update_time(struct inode *inode, int flags)
{
	int updated = inode_update_timestamps(inode, flags);
	int dirty_flags = 0;
	if (updated & (S_ATIME|S_MTIME|S_CTIME))
		dirty_flags = inode->i_sb->s_flags & SB_LAZYTIME ?
					I_DIRTY_TIME : I_DIRTY_SYNC;
	if (updated & S_VERSION)
		dirty_flags |= I_DIRTY_SYNC;
	__mark_inode_dirty(inode, dirty_flags);
...
inode_update_timestamps does this, where updated != 0 if the timestamp
actually changed (which is subject to some complex coarse-graining
logic so it may only happen some of the time):
	if (IS_I_VERSION(inode) &&
	    inode_maybe_inc_iversion(inode, updated))
		updated |= S_VERSION;
IS_I_VERSION seems to be unconditionally true on ext4.
inode_maybe_inc_iversion always returns true if updated is set, so
generic_update_time has a decent chance of doing
__mark_inode_dirty(inode, I_DIRTY_SYNC), which calls
s_op->dirty_inode, which calls ext4_journal_start, which, from my
recollection a decade ago, could easily block for a good second or so
on my delightful, now retired, HP/HPE system.
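The flag arithmetic described above can be modelled in user space. This is a minimal sketch, not kernel code: the bit values are hypothetical, `toy_dirty_flags` is an invented name, and the `lazytime` parameter stands in for the `SB_LAZYTIME` test in the excerpt.

```c
#include <stdbool.h>

/* Hypothetical bit values for this sketch only; the kernel's values differ. */
enum { S_ATIME = 1, S_MTIME = 2, S_CTIME = 4, S_VERSION = 8 };
enum { I_DIRTY_SYNC = 1, I_DIRTY_TIME = 2 };

/* Mirrors only the dirty_flags computation quoted from generic_update_time(). */
static int toy_dirty_flags(int updated, bool lazytime)
{
	int dirty_flags = 0;

	if (updated & (S_ATIME | S_MTIME | S_CTIME))
		dirty_flags = lazytime ? I_DIRTY_TIME : I_DIRTY_SYNC;
	if (updated & S_VERSION)
		dirty_flags |= I_DIRTY_SYNC;
	return dirty_flags;
}
```

In this model, `toy_dirty_flags(S_MTIME, true)` yields only `I_DIRTY_TIME`, while `toy_dirty_flags(S_MTIME | S_VERSION, true)` also carries `I_DIRTY_SYNC`, which is the case suspected above of reaching the journal even with lazytime enabled.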
In my case, I think this is the path that was blocking for me in lots
of do_wp_page calls that would otherwise not have blocked. I also
don't see any kiocb passed around or any mechanism by which this code
could know that it's supposed to be nonblocking, although I have
approximately no understanding of Linux AIO and I don't really know
what I should be looking for.
I could try to instrument the code a bit and test to see if I've
analyzed it right in a few days.
--Andy
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Dave Chinner @ 2025-10-11 1:35 UTC
To: Andy Lutomirski
Cc: Christoph Hellwig, Pavel Emelyanov, linux-fsdevel,
Raphael S . Carvalho, linux-api, linux-xfs
In-Reply-To: <CALCETrX-cs5MH3k369q2Fk5Q-pYQfEV6CW3va-4E9vD1CoCaGA@mail.gmail.com>
On Wed, Oct 08, 2025 at 02:51:14PM -0700, Andy Lutomirski wrote:
> On Wed, Oct 8, 2025 at 2:27 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Wed, Oct 08, 2025 at 08:22:35AM -0700, Andy Lutomirski wrote:
> > > On Mon, Oct 6, 2025 at 10:08 PM Christoph Hellwig <hch@infradead.org> wrote:
> > > >
> > > > On Sat, Oct 04, 2025 at 09:08:05AM -0700, Andy Lutomirski wrote:
>
> >
> > You are conflating "synchronous update" with "blocking".
> >
> > Avoiding the need for synchronous timestamp updates is exactly what
> > the lazytime mount option provides. i.e. lazytime degrades immediate
> > consistency requirements to eventual consistency similar to how the
> > default relatime behaviour defers atime updates for eventual
> > writeback.
> >
> > IOWs, we've already largely addressed the synchronous c/mtime update
> > problem but what we haven't done is made timestamp updates
> > fully support non-blocking caller semantics. That's a separate
> > problem...
>
> I'm probably missing something, but is this really different?
Yes, and yes.
> Either the mtime update can block or it can't block.
Sure, but that's not the issue we have to deal with.
In many filesystems and fs operations, we have to know if an
operation is going to block -before- we start the operation. e.g.
transactional changes cannot be rolled back once we've started the
modification if they need to block to make progress (e.g. read in
on-disk metadata).
This foresight, in many cases, is -unknowable-. Even though the
operation /likely/ won't block, we cannot *guarantee* ahead of time
that any given instance of the operation will /not/ block. Hence
the reliable non-blocking operation that users are asking for is not
possible with unknowable implementation characteristics like this.
IOWs, a timestamp update implementation can be synchronous and
reliably non-blocking if it always knows when blocking will occur
and can return -EAGAIN instead of blocking to complete the
operation.
If it can't know when/if blocking will occur, then lazytime allows
us to defer the (potentially) blocking update operation to another
context that can block. Queuing for async processing can easily be
made non-blocking, and __mark_inode_dirty(I_DIRTY_TIME) does this
for us.
So, yeah, it should be pretty obvious at this point that non-blocking
implementation is completely independent of whether the operation is
performed synchronously or asynchronously. It's easier to make async
operations non-blocking, but that doesn't mean "non_blocking" and
"asynchronous execution" are interchangeable terms or behaviours.
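The distinction can be sketched in user-space C. Everything here is a toy model under stated assumptions: `toy_inode`, `would_block` and all three functions are hypothetical names, and the `queued` flag merely stands in for an inode sitting on a deferred-work queue such as wb->i_dirty_time.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_inode {
	long mtime;        /* in-memory timestamp */
	long disk_mtime;   /* what a persisted copy would hold */
	bool would_block;  /* hypothetical: persisting right now would have to wait */
	bool queued;       /* hypothetical: queued for later processing */
};

/* Synchronous and non-blocking: persists immediately, or refuses with -EAGAIN. */
static int toy_update_sync_nowait(struct toy_inode *inode, long now)
{
	if (inode->would_block)
		return -EAGAIN;
	inode->mtime = now;
	inode->disk_mtime = now;
	return 0;
}

/* Asynchronous and non-blocking: update memory, defer the persistence step. */
static int toy_update_deferred(struct toy_inode *inode, long now)
{
	inode->mtime = now;
	inode->queued = true;
	return 0;
}

/* Runs later, in a context that is allowed to block. */
static void toy_writeback(struct toy_inode *inode)
{
	if (inode->queued) {
		inode->disk_mtime = inode->mtime;
		inode->queued = false;
	}
}
```

Neither update function ever waits, yet only the second one is asynchronous. The first works only because the model can tell in advance that it would block, which is exactly the knowledge that is often unavailable in practice.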
> I haven't dug all the
> way into exactly what happens in __mark_inode_dirty(), but there is a
> lot going on in there even in the I_DIRTY_TIME path.
It's pretty simple, really. __mark_inode_dirty(I_DIRTY_TIME) is
non-blocking and queues the inode on the wb->i_dirty_time queue
for later processing.
> And Pavel is
> saying that AIO and mtime updates don't play along well.
Again: this is exactly why lazytime was added to XFS *ten years
ago*. From 2015 (issue #3):
https://lore.kernel.org/linux-xfs/CAD-J=zZh1dtJsfrW_Gwxjg+qvkZMu7ED-QOXrMMO6B-G0HY2-A@mail.gmail.com/
(Oh, look, a discussion that starts from a user suggestion of
exposing FMODE_NOCMTIME to userspace apps! Sound familiar?)
> > IOWs, with lazytime, writeback already persists timestamp updates
> > when appropriate for best performance.
>
> I'm probably doing a bad job explaining myself.
No, I think Christoph and I both understand exactly what you
are trying to describe.
It seems to me that you haven't yet understood that lazytime already
does exactly what you are asking for. Hence you think we don't
understand the "lazytime" concept you are proposing and keep trying
to reinvent lazytime to convince us that we need "lazytime"
functionality in the kernel...
> > > Thinking out loud, to handle both write_iter and mmap, there might
> > > need to be two bits: one saying "the timestamp needs to be updated"
> > > and another saying "the timestamp has been updated in the in-memory
> > > inode, but the inode hasn't been dirtied yet".
> >
> > The flag that implements the latter is called I_DIRTY_TIME. We have
> > not implemented the former as that's a userspace visible change of
> > behaviour.
>
> Maybe that change should be done? Or not -- it wouldn't be terribly
> hard to have a pair of atomic timestamps in struct inode indicating
> what timestamps we want to write the next time we get around to it.
See, you just reinvented the lazytime mechanism. Again. :/
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v6 1/5] Wire up lsm_config_self_policy and lsm_config_system_policy syscalls
From: Casey Schaufler @ 2025-10-10 21:13 UTC
To: Song Liu, Maxime Bélair
Cc: linux-security-module, john.johansen, paul, jmorris, serge, mic,
kees, stephen.smalley.work, takedakn, penguin-kernel, rdunlap,
linux-api, apparmor, linux-kernel, Casey Schaufler
In-Reply-To: <CAHzjS_uBq8xGCSmHC_kBWi0j8DCdwsy4XtfkH2iH6NygCcChNw@mail.gmail.com>
On 10/10/2025 11:06 AM, Song Liu wrote:
> On Fri, Oct 10, 2025 at 6:27 AM Maxime Bélair
> <maxime.belair@canonical.com> wrote:
> [...]
>> --- a/security/lsm_syscalls.c
>> +++ b/security/lsm_syscalls.c
>> @@ -118,3 +118,15 @@ SYSCALL_DEFINE3(lsm_list_modules, u64 __user *, ids, u32 __user *, size,
>>
>> return lsm_active_cnt;
>> }
>> +
>> +SYSCALL_DEFINE6(lsm_config_self_policy, u32, lsm_id, u32, op, void __user *,
>> + buf, u32 __user, size, u32, common_flags, u32, flags)
>> +{
>> + return 0;
>> +}
>> +
>> +SYSCALL_DEFINE6(lsm_config_system_policy, u32, lsm_id, u32, op, void __user *,
>> + buf, u32 __user, size, u32, common_flags, u32, flags)
>> +{
>> + return 0;
>> +}
> These two APIs look the same. Why not just keep one API and use
> one bit in the flag to differentiate "self" vs. "system"?
I think that's a valid point.
>
> Thanks,
> Song
>
* Re: [PATCH v2 2/3] initrd: remove deprecated code path (linuxrc)
From: Randy Dunlap @ 2025-10-10 19:31 UTC
To: Askar Safin, linux-fsdevel, linux-kernel
Cc: Linus Torvalds, Greg Kroah-Hartman, Christian Brauner, Al Viro,
Jan Kara, Christoph Hellwig, Jens Axboe, Andy Shevchenko,
Aleksa Sarai, Thomas Weißschuh, Julian Stecklina, Gao Xiang,
Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
In-Reply-To: <20251010094047.3111495-3-safinaskar@gmail.com>
Hi,
On 10/10/25 2:40 AM, Askar Safin wrote:
> Remove linuxrc initrd code path, which was deprecated in 2020.
>
> Initramfs and (non-initial) RAM disks (i. e. brd) still work.
>
> Both built-in and bootloader-supplied initramfs still work.
>
> Non-linuxrc initrd code path (i. e. using /dev/ram as final root
> filesystem) still works, but I put deprecation message into it
>
> Signed-off-by: Askar Safin <safinaskar@gmail.com>
> ---
> .../admin-guide/kernel-parameters.txt | 4 +-
> fs/init.c | 14 ---
> include/linux/init_syscalls.h | 1 -
> include/linux/initrd.h | 2 -
> init/do_mounts.c | 4 +-
> init/do_mounts.h | 18 +---
> init/do_mounts_initrd.c | 85 ++-----------------
> init/do_mounts_rd.c | 17 +---
> 8 files changed, 17 insertions(+), 128 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 521ab3425504..24d8899d8a39 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4285,7 +4285,7 @@
> Note that this argument takes precedence over
> the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
>
> - noinitrd [RAM] Tells the kernel not to load any configured
> + noinitrd [Deprecated,RAM] Tells the kernel not to load any configured
> initial RAM disk.
>
> nointremap [X86-64,Intel-IOMMU,EARLY] Do not enable interrupt
> @@ -5299,7 +5299,7 @@
> ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
> See Documentation/admin-guide/blockdev/ramdisk.rst.
>
> - ramdisk_start= [RAM] RAM disk image start address
> + ramdisk_start= [Deprecated,RAM] RAM disk image start address
>
> random.trust_cpu=off
> [KNL,EARLY] Disable trusting the use of the CPU's
There are more places in Documentation/ that refer to "linuxrc".
Should those also be removed or fixed?
accounting/delay-accounting.rst
admin-guide/initrd.rst
driver-api/early-userspace/early_userspace_support.rst
power/swsusp-dmcrypt.rst
translations/zh_CN/accounting/delay-accounting.rst
Thanks.
* Re: [PATCH v6 1/5] Wire up lsm_config_self_policy and lsm_config_system_policy syscalls
From: Song Liu @ 2025-10-10 18:06 UTC
To: Maxime Bélair
Cc: linux-security-module, john.johansen, paul, jmorris, serge, mic,
kees, stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel
In-Reply-To: <20251010132610.12001-2-maxime.belair@canonical.com>
On Fri, Oct 10, 2025 at 6:27 AM Maxime Bélair
<maxime.belair@canonical.com> wrote:
[...]
> --- a/security/lsm_syscalls.c
> +++ b/security/lsm_syscalls.c
> @@ -118,3 +118,15 @@ SYSCALL_DEFINE3(lsm_list_modules, u64 __user *, ids, u32 __user *, size,
>
> return lsm_active_cnt;
> }
> +
> +SYSCALL_DEFINE6(lsm_config_self_policy, u32, lsm_id, u32, op, void __user *,
> + buf, u32 __user, size, u32, common_flags, u32, flags)
> +{
> + return 0;
> +}
> +
> +SYSCALL_DEFINE6(lsm_config_system_policy, u32, lsm_id, u32, op, void __user *,
> + buf, u32 __user, size, u32, common_flags, u32, flags)
> +{
> + return 0;
> +}
These two APIs look the same. Why not just keep one API and use
one bit in the flag to differentiate "self" vs. "system"?
Thanks,
Song
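A minimal sketch of the single-entry-point idea, assuming a hypothetical flag bit named `LSM_CONFIG_FLAG_SYSTEM` (not part of the series as posted) and a stub, `toy_lsm_config_policy`, that only reports which scope was selected:

```c
#include <errno.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical flag bit; the posted series defines no such constant. */
#define LSM_CONFIG_FLAG_SYSTEM	(1u << 0)
#define LSM_CONFIG_VALID_FLAGS	LSM_CONFIG_FLAG_SYSTEM

enum toy_scope { TOY_SCOPE_SELF = 1, TOY_SCOPE_SYSTEM = 2 };

/* Returns the selected scope, or -EINVAL if unknown flag bits are set. */
static int toy_lsm_config_policy(u32 lsm_id, u32 op, u32 common_flags)
{
	(void)lsm_id;
	(void)op;
	if (common_flags & ~LSM_CONFIG_VALID_FLAGS)
		return -EINVAL;
	return (common_flags & LSM_CONFIG_FLAG_SYSTEM) ?
		TOY_SCOPE_SYSTEM : TOY_SCOPE_SELF;
}
```

Rejecting unknown bits keeps the remaining flag space available for later extensions; whether one syscall or two is preferable for the real interface is the open question raised in the message above.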
* Re: [PATCH] fs: Propagate FMODE_NOCMTIME flag to user-facing O_NOCMTIME
From: Andy Lutomirski @ 2025-10-10 17:35 UTC
To: Christoph Hellwig
Cc: Pavel Emelyanov, linux-fsdevel, Raphael S . Carvalho, linux-api,
linux-xfs
In-Reply-To: <aOiZX9iqZnf9jUdQ@infradead.org>
On Thu, Oct 9, 2025 at 10:27 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Oct 08, 2025 at 08:22:35AM -0700, Andy Lutomirski wrote:
> > On Mon, Oct 6, 2025 at 10:08 PM Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > On Sat, Oct 04, 2025 at 09:08:05AM -0700, Andy Lutomirski wrote:
> > > > > > Well, we'll need to look into that, including maybe non-blocking
> > > > > timestamp updates.
> > > > >
> > > >
> > > > It's been 12 years (!), but maybe it's time to reconsider this:
> > > >
> > > > https://lore.kernel.org/all/cover.1377193658.git.luto@amacapital.net/
> > >
> > > I don't see how that is relevant here. Also writes through shared
> > > mmaps are problematic for so many reasons that I'm not sure we want
> > > to encourage people to use that more.
> > >
> >
> > Because the same exact issue exists in the normal non-mmap write path,
> > and I can even quote you upthread :)
>
> The thread that started this is about io_uring nonblock writes, aka
> O_DIRECT. So there isn't any writeback to defer to.
I haven't followed all the internal details, but RWF_DONTCACHE is
looking pretty good these days, and it does go through the writeback
path. I wonder if it's getting good enough that most or all O_DIRECT
users could switch to using it.
--Andy
* Re: [PATCH v6 5/5] Smack: add support for lsm_config_self_policy and lsm_config_system_policy
From: Casey Schaufler @ 2025-10-10 15:15 UTC
To: Maxime Bélair, linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, takedakn, penguin-kernel, song, rdunlap,
linux-api, apparmor, linux-kernel, Casey Schaufler
In-Reply-To: <20251010132610.12001-6-maxime.belair@canonical.com>
On 10/10/2025 6:25 AM, Maxime Bélair wrote:
> Enable users to manage Smack policies through the new hooks
> lsm_config_self_policy and lsm_config_system_policy.
>
> lsm_config_self_policy allows adding Smack policies for the current cred.
> For now it remains restricted to CAP_MAC_ADMIN.
>
> lsm_config_system_policy allows adding global Smack policies. This is
> restricted to CAP_MAC_ADMIN.
>
> Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
I will be reviewing these patches, but will not be able to do so
until early November. I know how frustrating review delays can be,
but it really can't be helped this time around. Thank you for your
patience.
> ---
> security/smack/smack.h | 8 +++++
> security/smack/smack_lsm.c | 73 ++++++++++++++++++++++++++++++++++++++
> security/smack/smackfs.c | 2 +-
> 3 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/security/smack/smack.h b/security/smack/smack.h
> index bf6a6ed3946c..3e3d30dfdcf7 100644
> --- a/security/smack/smack.h
> +++ b/security/smack/smack.h
> @@ -275,6 +275,14 @@ struct smk_audit_info {
> #endif
> };
>
> +/*
> + * This function is in smackfs.c
> + */
> +ssize_t smk_write_rules_list(struct file *file, const char __user *buf,
> + size_t count, loff_t *ppos,
> + struct list_head *rule_list,
> + struct mutex *rule_lock, int format);
> +
> /*
> * These functions are in smack_access.c
> */
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index 99833168604e..bf4bb2242768 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -5027,6 +5027,76 @@ static int smack_uring_cmd(struct io_uring_cmd *ioucmd)
>
> #endif /* CONFIG_IO_URING */
>
> +/**
> + * smack_lsm_config_system_policy - Configure a system smack policy
> + * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
> + * @buf: User-supplied buffer in the form "<fmt><policy>"
> + * <fmt> is the 1-byte format of <policy>
> + * <policy> is the policy to load
> + * @size: size of @buf
> + * @flags: reserved for future use; must be zero
> + *
> + * Returns: number of written rules on success, negative value on error
> + */
> +static int smack_lsm_config_system_policy(u32 op, void __user *buf, size_t size,
> + u32 flags)
> +{
> + loff_t pos = 0;
> + u8 fmt;
> +
> + if (op != LSM_POLICY_LOAD || flags)
> + return -EOPNOTSUPP;
> +
> + if (size < 2)
> + return -EINVAL;
> +
> + if (get_user(fmt, (uint8_t *)buf))
> + return -EFAULT;
> +
> + return smk_write_rules_list(NULL, buf + 1, size - 1, &pos, NULL, NULL, fmt);
> +}
> +
> +/**
> + * smack_lsm_config_self_policy - Configure a smack policy for the current cred
> + * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
> + * @buf: User-supplied buffer in the form "<fmt><policy>"
> + * <fmt> is the 1-byte format of <policy>
> + * <policy> is the policy to load
> + * @size: size of @buf
> + * @flags: reserved for future use; must be zero
> + *
> + * Returns: number of written rules on success, negative value on error
> + */
> +static int smack_lsm_config_self_policy(u32 op, void __user *buf, size_t size,
> + u32 flags)
> +{
> + loff_t pos = 0;
> + u8 fmt;
> + struct task_smack *tsp;
> +
> + if (op != LSM_POLICY_LOAD || flags)
> + return -EOPNOTSUPP;
> +
> + if (size < 2)
> + return -EINVAL;
> +
> + if (get_user(fmt, (uint8_t *)buf))
> + return -EFAULT;
> + /**
> + * smk_write_rules_list could be used to gain privileges.
> + * This function is thus restricted to CAP_MAC_ADMIN.
> + * TODO: Ensure that the new rule does not give extra privileges
> + * before dropping this CAP_MAC_ADMIN check.
> + */
> + if (!capable(CAP_MAC_ADMIN))
> + return -EPERM;
> +
> +
> + tsp = smack_cred(current_cred());
> + return smk_write_rules_list(NULL, buf + 1, size - 1, &pos, &tsp->smk_rules,
> + &tsp->smk_rules_lock, fmt);
> +}
> +
> struct lsm_blob_sizes smack_blob_sizes __ro_after_init = {
> .lbs_cred = sizeof(struct task_smack),
> .lbs_file = sizeof(struct smack_known *),
> @@ -5203,6 +5273,9 @@ static struct security_hook_list smack_hooks[] __ro_after_init = {
> LSM_HOOK_INIT(uring_sqpoll, smack_uring_sqpoll),
> LSM_HOOK_INIT(uring_cmd, smack_uring_cmd),
> #endif
> + LSM_HOOK_INIT(lsm_config_self_policy, smack_lsm_config_self_policy),
> + LSM_HOOK_INIT(lsm_config_system_policy, smack_lsm_config_system_policy),
> +
> };
>
>
> diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
> index 90a67e410808..ed1814588d56 100644
> --- a/security/smack/smackfs.c
> +++ b/security/smack/smackfs.c
> @@ -441,7 +441,7 @@ static ssize_t smk_parse_long_rule(char *data, struct smack_parsed_rule *rule,
> * "subject<whitespace>object<whitespace>
> * acc_enable<whitespace>acc_disable[<whitespace>...]"
> */
> -static ssize_t smk_write_rules_list(struct file *file, const char __user *buf,
> +ssize_t smk_write_rules_list(struct file *file, const char __user *buf,
> size_t count, loff_t *ppos,
> struct list_head *rule_list,
> struct mutex *rule_lock, int format)
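
The "<fmt><policy>" buffer layout described in the quoted kernel-doc can be illustrated from the userspace side. This is only a sketch under stated assumptions: smack_policy_buf() and smack_policy_split() are hypothetical helpers that do not appear in the patch, and the syscall itself is not invoked here because its number is not given in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace helper: lay out a buffer as "<fmt><policy>",
 * i.e. one format byte followed by the policy text. Returns a
 * malloc()ed buffer and stores its size in *out_size, or returns NULL
 * on invalid input or allocation failure.
 */
static uint8_t *smack_policy_buf(uint8_t fmt, const char *policy,
				 size_t policy_len, size_t *out_size)
{
	uint8_t *buf;

	assert(out_size != NULL);

	/* The quoted hooks reject size < 2, so require a non-empty policy. */
	if (!policy || policy_len == 0)
		return NULL;

	buf = malloc(policy_len + 1);
	if (!buf)
		return NULL;

	buf[0] = fmt;				/* <fmt>: 1 byte */
	memcpy(buf + 1, policy, policy_len);	/* <policy> */
	*out_size = policy_len + 1;
	return buf;
}

/* Mirror of the split done in the hooks: first byte is the format, the rest the policy. */
static int smack_policy_split(const uint8_t *buf, size_t size, uint8_t *fmt,
			      const uint8_t **policy, size_t *policy_len)
{
	if (!buf || size < 2)
		return -1;	/* the quoted hooks return -EINVAL here */
	*fmt = buf[0];
	*policy = buf + 1;
	*policy_len = size - 1;
	return 0;
}
```

The format byte is passed through unchanged as the last argument of smk_write_rules_list() in the quoted code, so a caller has to pick a value that function understands.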
^ permalink raw reply
* Re: [PATCH v2 2/3] initrd: remove deprecated code path (linuxrc)
From: Andy Shevchenko @ 2025-10-10 15:04 UTC (permalink / raw)
To: Askar Safin
Cc: linux-fsdevel, linux-kernel, Linus Torvalds, Greg Kroah-Hartman,
Christian Brauner, Al Viro, Jan Kara, Christoph Hellwig,
Jens Axboe, Aleksa Sarai, Thomas Weißschuh, Julian Stecklina,
Gao Xiang, Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
In-Reply-To: <20251010094047.3111495-3-safinaskar@gmail.com>
On Fri, Oct 10, 2025 at 12:42 PM Askar Safin <safinaskar@gmail.com> wrote:
>
> Remove linuxrc initrd code path, which was deprecated in 2020.
>
> Initramfs and (non-initial) RAM disks (i. e. brd) still work.
>
> Both built-in and bootloader-supplied initramfs still work.
>
> Non-linuxrc initrd code path (i. e. using /dev/ram as final root
> filesystem) still works, but I put deprecation message into it
...
> - noinitrd [RAM] Tells the kernel not to load any configured
> + noinitrd [Deprecated,RAM] Tells the kernel not to load any configured
> initial RAM disk.
How is one supposed to run this when just having a kernel is enough?
At least one (ex-)colleague of mine was a heavy user of this option for
testing kernel builds on real hardware.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH v2 1/3] init: remove deprecated "load_ramdisk" and "prompt_ramdisk" command line parameters
From: Andy Shevchenko @ 2025-10-10 15:02 UTC (permalink / raw)
To: Askar Safin
Cc: linux-fsdevel, linux-kernel, Linus Torvalds, Greg Kroah-Hartman,
Christian Brauner, Al Viro, Jan Kara, Christoph Hellwig,
Jens Axboe, Aleksa Sarai, Thomas Weißschuh, Julian Stecklina,
Gao Xiang, Art Nikpal, Andrew Morton, Alexander Graf, Rob Landley,
Lennart Poettering, linux-arch, linux-block, initramfs, linux-api,
linux-doc, Michal Simek, Luis Chamberlain, Kees Cook,
Thorsten Blum, Heiko Carstens, Arnd Bergmann, Dave Young,
Christophe Leroy, Krzysztof Kozlowski, Borislav Petkov,
Jessica Clarke, Nicolas Schichan, David Disseldorp, patches
In-Reply-To: <20251010094047.3111495-2-safinaskar@gmail.com>
On Fri, Oct 10, 2025 at 12:42 PM Askar Safin <safinaskar@gmail.com> wrote:
>
> ...which do nothing. They were deprecated (in documentation) in
> 6b99e6e6aa62 ("Documentation/admin-guide: blockdev/ramdisk: remove use of
> "rdev"") and in kernel messages in c8376994c86c ("initrd: remove support
> for multiple floppies")
With all due respect to the work and the series, I have noted the following:
1) the final period is often missing in the commit messages;
2) in this change it's unclear for how long (in years) the feature has been
deprecated, i.e. the other patch states 2020 for something else.
I wonder if this one is of a similar age.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH v4 00/30] Live Update Orchestrator
From: Jason Gunthorpe @ 2025-10-10 15:02 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Samiullah Khawaja, pratyush, jasonmiu, graf, changyuanl, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
chrisl, steven.sistare
In-Reply-To: <CA+CK2bBxMpb=jXy3-i19PdBHqxLoLrMMg1sOnditOYwNe1Fr+w@mail.gmail.com>
On Fri, Oct 10, 2025 at 10:58:00AM -0400, Pasha Tatashin wrote:
> With that, I would assume KVM itself would drive the live update and
> would make LUO calls to preserve the resources in an orderly fashion
> and then restore them in the same order during boot.
I don't think so, it should always be sequenced by userspace, and KVM
is not the thing linked to VFIO or IOMMUFD, that's backwards.
Jason
^ permalink raw reply
* Re: [PATCH v4 00/30] Live Update Orchestrator
From: Jason Gunthorpe @ 2025-10-10 15:01 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Pratyush Yadav, jasonmiu, graf, changyuanl, rppt, dmatlack,
rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
saeedm, ajayachandra, parav, leonro, witu, hughd, skhawaja,
chrisl, steven.sistare
In-Reply-To: <CA+CK2bB6F634HCw_N5z9E5r_LpbGJrucuFb_5fL4da5_W99e4Q@mail.gmail.com>
On Thu, Oct 09, 2025 at 07:50:12PM -0400, Pasha Tatashin wrote:
> > This can look something like:
> >
> > hugetlb_luo_preserve_folio(folio, ...);
> >
> > Nice and simple.
> >
> > Compare this with the new proposed API:
> >
> > liveupdate_fh_global_state_get(h, &hugetlb_data);
> > // This will have updated serialized state now.
> > hugetlb_luo_preserve_folio(hugetlb_data, folio, ...);
> > liveupdate_fh_global_state_put(h);
> >
> > We do the same thing but in a very complicated way.
> >
> > - When the system-wide preserve happens, the hugetlb subsystem gets a
> > callback to serialize. It converts its runtime global state to
> > serialized state since now it knows no more FDs will be added.
> >
> > With the new API, this doesn't need to be done since each FD prepare
> > already updates serialized state.
> >
> > - If there are no hugetlb FDs, then the hugetlb subsystem doesn't put
> > anything in LUO. This is same as new API.
> >
> > - If some hugetlb FDs are not restored after liveupdate and the finish
> > event is triggered, the subsystem gets its finish() handler called and
> > it can free things up.
> >
> > I don't get how that would work with the new API.
>
> The new API isn't more complicated; It codifies the common pattern of
> "create on first use, destroy on last use" into a reusable helper,
> saving each file handler from having to reinvent the same reference
> counting and locking scheme. But, as you point out, subsystems provide
> more control, specifically they handle full creation/free instead of
> relying on file-handlers for that.
I'd say hugetlb *should* be doing the more complicated thing. We
should not have global static data for luo floating around the kernel,
this is too easily abused in bad ways.
The above "complicated" sequence forces the caller to have a fd
session handle, and "hides" the global state inside luo so the
subsystem can't just randomly reach into it whenever it likes.
This is a deliberate and violent way to force clean coding practices
and good layering.
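
The "create on first use, destroy on last use" pattern under discussion can be sketched in plain C. This is an illustrative model only: struct state_holder, state_get() and state_put() are hypothetical names, not the proposed liveupdate_fh_global_state_get()/put() API, and the locking a kernel implementation would need is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a subsystem's serialized state. */
struct global_state {
	unsigned long nr_items;
};

/* Hypothetical holder; the caller never touches the state except via get/put. */
struct state_holder {
	struct global_state *state;	/* NULL while nobody uses it */
	unsigned int users;
};

/* Create the state on first use, then hand out a counted reference. */
static struct global_state *state_get(struct state_holder *h)
{
	if (h->users == 0) {
		h->state = calloc(1, sizeof(*h->state));
		if (!h->state)
			return NULL;
	}
	h->users++;
	return h->state;
}

/* Drop a reference; destroy the state when the last user goes away. */
static void state_put(struct state_holder *h)
{
	assert(h->users > 0);
	if (--h->users == 0) {
		free(h->state);
		h->state = NULL;
	}
}
```

Because the only way to reach the state is through a get/put pair, a subsystem cannot hold on to it outside a well-defined window, which is the layering point made above.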
Not sure why hugetlb pools would need another xarray??
1) Use a vmalloc and store a list of the PFNs in the pool. The pool becomes
frozen; PFNs can't be added or removed.
2) Require the users of hugetlb memory, like memfd, to
preserve/restore the folios they are using (using their hugetlb order).
3) Just before kexec, run over the PFN list and mark a bit for whether the
folio was preserved by KHO or not. Make sure everything gets KHO
preserved.
Restore puts the PFNs that were not preserved directly in the free
pool; the end user of the folio, like the memfd, restores the other folios
and eventually frees them normally.
It is simple and fits nicely into the infrastructure here, where the
first time you trigger a global state it does the pfn list and
freezing, and the lifecycle and locking for this operation is directly
managed by luo.
The memfd, when it knows it has hugetlb folios inside it, would
trigger this.
Jason
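
The three steps and the restore rule above can be modelled in a few lines of userspace C. This is a simplified sketch, not kernel code: struct pool_pfn, struct frozen_pool and both functions are hypothetical, a plain array stands in for the vmalloc'ed PFN list, and the KHO preservation itself is reduced to a single flag per entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one pool folio; not a kernel structure. */
struct pool_pfn {
	unsigned long pfn;
	bool preserved;		/* step 3: set if a user (e.g. a memfd) preserved it */
};

/* Hypothetical frozen pool: the PFN list may not grow or shrink. */
struct frozen_pool {
	struct pool_pfn *pfns;	/* step 1: list of PFNs in the pool */
	size_t nr;
};

/* Step 3: mark a PFN as preserved; returns false if it is not in the pool. */
static bool pool_mark_preserved(struct frozen_pool *pool, unsigned long pfn)
{
	for (size_t i = 0; i < pool->nr; i++) {
		if (pool->pfns[i].pfn == pfn) {
			pool->pfns[i].preserved = true;
			return true;
		}
	}
	return false;
}

/*
 * Restore: PFNs nobody preserved go straight back to the free pool.
 * Writes them to free_out (capacity >= pool->nr) and returns the count.
 * Preserved folios are left for their users to restore and free later.
 */
static size_t pool_restore_free(const struct frozen_pool *pool,
				unsigned long *free_out)
{
	size_t nr_free = 0;

	assert(free_out != NULL);
	for (size_t i = 0; i < pool->nr; i++) {
		if (!pool->pfns[i].preserved)
			free_out[nr_free++] = pool->pfns[i].pfn;
	}
	return nr_free;
}
```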
^ permalink raw reply
* Re: [PATCH v5 2/3] lsm: introduce security_lsm_config_*_policy hooks
From: Paul Moore @ 2025-10-10 14:59 UTC (permalink / raw)
To: Casey Schaufler
Cc: Mickaël Salaün, Maxime Bélair,
linux-security-module, john.johansen, jmorris, serge, kees,
stephen.smalley.work, takedakn, penguin-kernel, song, rdunlap,
linux-api, apparmor, linux-kernel
In-Reply-To: <0c7a19cb-d270-403f-9f97-354405aba746@schaufler-ca.com>
On Wed, Aug 20, 2025 at 11:30 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 8/20/2025 7:21 AM, Mickaël Salaün wrote:
> > On Wed, Jul 09, 2025 at 10:00:55AM +0200, Maxime Bélair wrote:
> >> Define two new LSM hooks: security_lsm_config_self_policy and
> >> security_lsm_config_system_policy and wire them into the corresponding
> >> lsm_config_*_policy() syscalls so that LSMs can register a unified
> >> interface for policy management. This initial, minimal implementation
> >> only supports the LSM_POLICY_LOAD operation to limit changes.
> >>
> >> Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
> >> ---
> >> include/linux/lsm_hook_defs.h | 4 +++
> >> include/linux/security.h | 20 ++++++++++++
> >> include/uapi/linux/lsm.h | 8 +++++
> >> security/lsm_syscalls.c | 17 ++++++++--
> >> security/security.c | 60 +++++++++++++++++++++++++++++++++++
> >> 5 files changed, 107 insertions(+), 2 deletions(-)
...
> >> diff --git a/include/uapi/linux/lsm.h b/include/uapi/linux/lsm.h
> >> index 938593dfd5da..2b9432a30cdc 100644
> >> --- a/include/uapi/linux/lsm.h
> >> +++ b/include/uapi/linux/lsm.h
> >> @@ -90,4 +90,12 @@ struct lsm_ctx {
> >> */
> >> #define LSM_FLAG_SINGLE 0x0001
> >>
> >> +/*
> >> + * LSM_POLICY_XXX definitions identify the different operations
> >> + * to configure LSM policies
> >> + */
> >> +
> >> +#define LSM_POLICY_UNDEF 0
> >> +#define LSM_POLICY_LOAD 100
> > Why the gap between 0 and 100?
>
> It's conventional in LSM syscalls to start identifiers at 100.
> No compelling reason other than to appease the LSM maintainer.
If you guys make me repeat all the reasons why, I'm going to get even
crankier than usual :-P
--
paul-moore.com
^ permalink raw reply
* Re: [PATCH v4 00/30] Live Update Orchestrator
From: Pasha Tatashin @ 2025-10-10 14:58 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Samiullah Khawaja, pratyush, jasonmiu, graf, changyuanl, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
chrisl, steven.sistare
In-Reply-To: <20251010144248.GB3901471@nvidia.com>
On Fri, Oct 10, 2025 at 10:42 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Oct 09, 2025 at 06:42:09PM -0400, Pasha Tatashin wrote:
> >
> > It looks like the combination of an enforced ordering:
> > Preservation: A->B->C->D
> > Un-preservation: D->C->B->A
> > Retrieval: A->B->C->D
> >
> > and the FLB Global State (where data is automatically created and
> > destroyed when a particular file type participates in a live update)
> > solves the need for this query mechanism. For example, the IOMMU
> > driver/core can add its data only when an iommufd is preserved and add
> > more data as more iommufds are added. The preserved data is also
> > automatically removed once the live update is finished or canceled.
>
> IDK I think we should try to be flexible on the restoration order.
It is easier to be inflexible at first and then relax the requirement
than the other way around. I think it is alright to enforce the order
for now, as it is driven only by userspace.
> Eg, if we project ahead to when we might need to preserve kvm and
> iommufd FDs as well, the order would likely be:
>
> Preservation: memfd -> kvm -> iommufd -> vfio
> Retrieval: iommu_domain (early boot) kvm -> iommufd -> vfio -> memfd
At some point, we will implement orphaned VMs, where a VM can run
without a VMM during the live-update period. This would allow us to
reduce the blackout time and later enable vCPUs to keep running even
during kexec.
With that, I would assume KVM itself would drive the live update and
would make LUO calls to preserve the resources in an orderly fashion
and then restore them in the same order during boot.
Pasha
^ permalink raw reply
* Re: [PATCH v6 4/5] SELinux: add support for lsm_config_system_policy
From: Paul Moore @ 2025-10-10 14:57 UTC (permalink / raw)
To: Stephen Smalley
Cc: Maxime Bélair, linux-security-module, john.johansen, jmorris,
serge, mic, kees, casey, takedakn, penguin-kernel, song, rdunlap,
linux-api, apparmor, linux-kernel, SElinux list, Ondrej Mosnacek
In-Reply-To: <CAEjxPJ6Xcwsic_zyLTPdHHaY9r7-ZTySzyELQ76aVZCFbh8FMQ@mail.gmail.com>
On Fri, Oct 10, 2025 at 9:59 AM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
>
> 2. The SELinux namespaces support [1], [2] is based on instantiating a
> separate selinuxfs instance for each namespace; you load a policy for
> a namespace by mounting a new selinuxfs instance after unsharing your
> SELinux namespace and then write to its /sys/fs/selinux/load
> interface, only affecting policy for the new namespace. Your interface
> doesn't appear to support such an approach and IIUC will currently
> always load the init SELinux namespace's policy rather than the
> current process' SELinux namespace.
I'm distracted by other things at the moment, but my current thinking
is that while policy loading and namespace management APIs are largely
separate, there is some minor overlap when it comes to loading policy
as others have mentioned. For that reason, I think we need to resolve
the namespace API first, keeping in mind the potential for a policy
load API, and then implement the policy loading API, if desired.
--
paul-moore.com
^ permalink raw reply
* Re: [PATCH v4 00/30] Live Update Orchestrator
From: Jason Gunthorpe @ 2025-10-10 14:42 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Samiullah Khawaja, pratyush, jasonmiu, graf, changyuanl, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
chrisl, steven.sistare
In-Reply-To: <CA+CK2bAe3yk4NocURmihcuTNPUcb2-K0JCaQQ5GJ4B58YLEwEw@mail.gmail.com>
On Thu, Oct 09, 2025 at 06:42:09PM -0400, Pasha Tatashin wrote:
>
> It looks like the combination of an enforced ordering:
> Preservation: A->B->C->D
> Un-preservation: D->C->B->A
> Retrieval: A->B->C->D
>
> and the FLB Global State (where data is automatically created and
> destroyed when a particular file type participates in a live update)
> solves the need for this query mechanism. For example, the IOMMU
> driver/core can add its data only when an iommufd is preserved and add
> more data as more iommufds are added. The preserved data is also
> automatically removed once the live update is finished or canceled.
IDK I think we should try to be flexible on the restoration order.
Eg, if we project ahead to when we might need to preserve kvm and
iommufd FDs as well, the order would likely be:
Preservation: memfd -> kvm -> iommufd -> vfio
Retrieval: iommu_domain (early boot) kvm -> iommufd -> vfio -> memfd
Just because of how the dependencies work, and the desire to push the
memfd as late as possible.
I don't see an issue with this, the kernel enforcing the ordering
should fall out naturally based on the sanity checks each step will
do.
i.e. I can't get back the KVM fd if luo says it is out of order.
Jason
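
A sanity check of the kind described above might look like the following. It is a toy model under stated assumptions: enum fd_kind, struct retrieve_session and session_retrieve() are hypothetical, and a fixed order with a cursor stands in for whatever dependency checks the real retrieval steps would perform.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum fd_kind { FD_MEMFD, FD_KVM, FD_IOMMUFD, FD_VFIO };

/* Hypothetical session: a fixed retrieval order plus a cursor. */
struct retrieve_session {
	const enum fd_kind *order;
	size_t nr;
	size_t next;	/* index of the next kind that may be retrieved */
};

/* Return 0 if kind is the next one allowed, -EINVAL if it is out of order. */
static int session_retrieve(struct retrieve_session *s, enum fd_kind kind)
{
	assert(s != NULL);
	if (s->next >= s->nr || s->order[s->next] != kind)
		return -EINVAL;
	s->next++;
	return 0;
}
```

With the order kvm -> iommufd -> vfio -> memfd, asking for the iommufd before the KVM fd fails, and the caller simply has to retry in the right sequence.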
^ permalink raw reply
* Re: [PATCH v6 4/5] SELinux: add support for lsm_config_system_policy
From: Stephen Smalley @ 2025-10-10 14:42 UTC (permalink / raw)
To: Maxime Bélair
Cc: linux-security-module, john.johansen, paul, jmorris, serge, mic,
kees, casey, takedakn, penguin-kernel, song, rdunlap, linux-api,
apparmor, linux-kernel, SElinux list, Ondrej Mosnacek
In-Reply-To: <CAEjxPJ6Xcwsic_zyLTPdHHaY9r7-ZTySzyELQ76aVZCFbh8FMQ@mail.gmail.com>
On Fri, Oct 10, 2025 at 9:58 AM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
>
> On Fri, Oct 10, 2025 at 9:27 AM Maxime Bélair
> <maxime.belair@canonical.com> wrote:
> >
> > Enable users to manage SELinux policies through the new hook
> > lsm_config_system_policy. This feature is restricted to CAP_MAC_ADMIN.
>
> (added selinux mailing list and Fedora/Red Hat SELinux kernel maintainer to cc)
>
> A couple of observations:
> 1. We do not currently require CAP_MAC_ADMIN for loading SELinux
> policy, since it was only added later for Smack and SELinux implements
> its own permission checks. When loading policy via selinuxfs, one
> requires uid-0 or CAP_DAC_OVERRIDE to write to /sys/fs/selinux/load
> plus the corresponding SELinux permissions, but this is just an
> artifact of the filesystem-based interface. I'm not opposed to using
> CAP_MAC_ADMIN for loading policy via the new system call but wanted to
> note it as a difference.
>
> 2. The SELinux namespaces support [1], [2] is based on instantiating a
> separate selinuxfs instance for each namespace; you load a policy for
> a namespace by mounting a new selinuxfs instance after unsharing your
> SELinux namespace and then write to its /sys/fs/selinux/load
> interface, only affecting policy for the new namespace. Your interface
> doesn't appear to support such an approach and IIUC will currently
> always load the init SELinux namespace's policy rather than the
> current process' SELinux namespace.
Actually, on second thought, checking CAP_MAC_ADMIN via capable() will
require the process to have that capability in the global/init
namespace, which IIUC would prevent systemd running in a non-init user
namespace from loading the SELinux policy at all. That's problematic
for a different reason, since it would prevent us from using this
system call to load the namespace policy.
>
> [1] https://github.com/stephensmalley/selinuxns
> [2] https://lore.kernel.org/selinux/20250814132637.1659-1-stephen.smalley.work@gmail.com/
>
> >
> > Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
> > ---
> > security/selinux/hooks.c | 27 +++++++++++++++++++++++++++
> > security/selinux/include/security.h | 7 +++++++
> > security/selinux/selinuxfs.c | 16 ++++++++++++----
> > 3 files changed, 46 insertions(+), 4 deletions(-)
> >
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index e7a7dcab81db..3d14d4e47937 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -7196,6 +7196,31 @@ static int selinux_uring_allowed(void)
> > }
> > #endif /* CONFIG_IO_URING */
> >
> > +/**
> > + * selinux_lsm_config_system_policy - Manage a LSM policy
> > + * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
> > + * @buf: User-supplied buffer
> > + * @size: size of @buf
> > + * @flags: reserved for future use; must be zero
> > + *
> > + * Returns: number of written rules on success, negative value on error
> > + */
> > +static int selinux_lsm_config_system_policy(u32 op, void __user *buf,
> > + size_t size, u32 flags)
> > +{
> > + loff_t pos = 0;
> > +
> > + if (op != LSM_POLICY_LOAD || flags)
> > + return -EOPNOTSUPP;
> > +
> > + if (!selinux_null.dentry || !selinux_null.dentry->d_sb ||
> > + !selinux_null.dentry->d_sb->s_fs_info)
> > + return -ENODEV;
> > +
> > + return __sel_write_load(selinux_null.dentry->d_sb->s_fs_info, buf, size,
> > + &pos);
> > +}
> > +
> > static const struct lsm_id selinux_lsmid = {
> > .name = "selinux",
> > .id = LSM_ID_SELINUX,
> > @@ -7499,6 +7524,8 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
> > #ifdef CONFIG_PERF_EVENTS
> > LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
> > #endif
> > + LSM_HOOK_INIT(lsm_config_system_policy, selinux_lsm_config_system_policy),
> > +
> > };
> >
> > static __init int selinux_init(void)
> > diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
> > index e7827ed7be5f..7b779ea43cc3 100644
> > --- a/security/selinux/include/security.h
> > +++ b/security/selinux/include/security.h
> > @@ -389,7 +389,14 @@ struct selinux_kernel_status {
> > extern void selinux_status_update_setenforce(bool enforcing);
> > extern void selinux_status_update_policyload(u32 seqno);
> > extern void selinux_complete_init(void);
> > +
> > +struct selinux_fs_info;
> > +
> > extern struct path selinux_null;
> > +extern ssize_t __sel_write_load(struct selinux_fs_info *fsi,
> > + const char __user *buf, size_t count,
> > + loff_t *ppos);
> > +
> > extern void selnl_notify_setenforce(int val);
> > extern void selnl_notify_policyload(u32 seqno);
> > extern int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm);
> > diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
> > index 47480eb2189b..1f7e611d8300 100644
> > --- a/security/selinux/selinuxfs.c
> > +++ b/security/selinux/selinuxfs.c
> > @@ -567,11 +567,11 @@ static int sel_make_policy_nodes(struct selinux_fs_info *fsi,
> > return ret;
> > }
> >
> > -static ssize_t sel_write_load(struct file *file, const char __user *buf,
> > - size_t count, loff_t *ppos)
> > +ssize_t __sel_write_load(struct selinux_fs_info *fsi,
> > + const char __user *buf, size_t count,
> > + loff_t *ppos)
> >
> > {
> > - struct selinux_fs_info *fsi;
> > struct selinux_load_state load_state;
> > ssize_t length;
> > void *data = NULL;
> > @@ -605,7 +605,6 @@ static ssize_t sel_write_load(struct file *file, const char __user *buf,
> > pr_warn_ratelimited("SELinux: failed to load policy\n");
> > goto out;
> > }
> > - fsi = file_inode(file)->i_sb->s_fs_info;
> > length = sel_make_policy_nodes(fsi, load_state.policy);
> > if (length) {
> > pr_warn_ratelimited("SELinux: failed to initialize selinuxfs\n");
> > @@ -626,6 +625,15 @@ static ssize_t sel_write_load(struct file *file, const char __user *buf,
> > return length;
> > }
> >
> > +static ssize_t sel_write_load(struct file *file, const char __user *buf,
> > + size_t count, loff_t *ppos)
> > +{
> > + struct selinux_fs_info *fsi = file_inode(file)->i_sb->s_fs_info;
> > +
> > + return __sel_write_load(fsi, buf, count, ppos);
> > +}
> > +
> > +
> > static const struct file_operations sel_load_ops = {
> > .write = sel_write_load,
> > .llseek = generic_file_llseek,
> > --
> > 2.48.1
> >
^ permalink raw reply
* Re: [PATCH v4 00/30] Live Update Orchestrator
From: Jason Gunthorpe @ 2025-10-10 14:35 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Samiullah Khawaja, pratyush, jasonmiu, graf, changyuanl, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu, hughd,
chrisl, steven.sistare
In-Reply-To: <CA+CK2bBtrkdos6YmCatggS19rwWYBXXDLwiUWmUrs2+ye23cXA@mail.gmail.com>
On Thu, Oct 09, 2025 at 02:37:44PM -0400, Pasha Tatashin wrote:
> On Thu, Oct 9, 2025 at 1:39 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Thu, Oct 09, 2025 at 11:01:25AM -0400, Pasha Tatashin wrote:
> > > In this case we can enforce strict
> > > ordering during retrieval. If "struct file" can be retrieved by
> > > anything within the kernel, then that could be any kernel process
> > > during boot, meaning that charging is not going to be properly applied
> > > when kernel allocations are performed.
> >
> > Ugh, yeah, OK that's irritating and might burn us, but we did decide
> > on that strategy.
> >
> > > > I would argue it should always cause a preservation...
> > > >
> > > > But this is still backwards, what we need is something like
> > > >
> > > > liveupdate_preserve_file(session, file, &token);
> > > > my_preserve_blob.file_token = token
> > >
> > > We cannot do that, the user should have already preserved that file
> > > and provided us with a token to use, if that file was not preserved by
> > > the user it is a bug. With this proposal, we would have to generate a
> > > token, and it was argued that the kernel should not do that.
> >
> > The token is the label used as ABI across the kexec. Each entity doing
> > a serialization can operate its labels however it needs.
> >
> > Here I am suggesting that when a kernel entity goes to record a struct
> > file in a kernel ABI structure it can get a kernel generated token for
> > it.
>
> Sure, we can consider allowing the kernel to preserve dependent FDs
> automatically in the future, but is there a compelling use case that
> requires it right now?
Right now for the three prototype series.. Hmm, yes, I think we can
avoid implementing this.
In the future I suspect iommufd will need to restore the KVM fd, since
stuff in KVM becomes entangled with the iommu in some cases on some
arches.
The issue here is not order, it is straight up 'what value does
iommufd write to its kexec ABI struct to refer to the KVM fd'.
Jason
^ permalink raw reply
* Re: [PATCH v6 4/5] SELinux: add support for lsm_config_system_policy
From: Stephen Smalley @ 2025-10-10 13:58 UTC (permalink / raw)
To: Maxime Bélair
Cc: linux-security-module, john.johansen, paul, jmorris, serge, mic,
kees, casey, takedakn, penguin-kernel, song, rdunlap, linux-api,
apparmor, linux-kernel, SElinux list, Ondrej Mosnacek
In-Reply-To: <20251010132610.12001-5-maxime.belair@canonical.com>
On Fri, Oct 10, 2025 at 9:27 AM Maxime Bélair
<maxime.belair@canonical.com> wrote:
>
> Enable users to manage SELinux policies through the new hook
> lsm_config_system_policy. This feature is restricted to CAP_MAC_ADMIN.
(added selinux mailing list and Fedora/Red Hat SELinux kernel maintainer to cc)
A couple of observations:
1. We do not currently require CAP_MAC_ADMIN for loading SELinux
policy, since it was only added later for Smack and SELinux implements
its own permission checks. When loading policy via selinuxfs, one
requires uid-0 or CAP_DAC_OVERRIDE to write to /sys/fs/selinux/load
plus the corresponding SELinux permissions, but this is just an
artifact of the filesystem-based interface. I'm not opposed to using
CAP_MAC_ADMIN for loading policy via the new system call but wanted to
note it as a difference.
2. The SELinux namespaces support [1], [2] is based on instantiating a
separate selinuxfs instance for each namespace; you load a policy for
a namespace by mounting a new selinuxfs instance after unsharing your
SELinux namespace and then writing to its /sys/fs/selinux/load
interface, which only affects policy for the new namespace. Your
interface doesn't appear to support such an approach and IIUC will
currently always load the init SELinux namespace's policy rather than
that of the current process' SELinux namespace.
[1] https://github.com/stephensmalley/selinuxns
[2] https://lore.kernel.org/selinux/20250814132637.1659-1-stephen.smalley.work@gmail.com/
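For reference, the selinuxfs path described in the two points above can
be sketched from user space roughly as follows. This is only an
illustration: the helper name selinuxfs_load_policy() is hypothetical,
and the load path is passed in by the caller so that a per-namespace
selinuxfs mount could be targeted instead of /sys/fs/selinux/load. It
assumes the whole policy image is handed over in a single write at
offset 0.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libselinux API): write a binary policy
 * image to a selinuxfs "load" node in one write(2).
 *
 * load_path is normally "/sys/fs/selinux/load"; with SELinux
 * namespaces it would be the "load" node of the selinuxfs instance
 * mounted for that namespace.
 *
 * Returns 0 on success or a negative errno value.
 */
int selinuxfs_load_policy(const char *load_path, const void *data, size_t len)
{
	ssize_t written;
	int err = 0;
	int fd;

	fd = open(load_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* One write, no retry loop: a short write is treated as an error. */
	written = write(fd, data, len);
	if (written < 0)
		err = -errno;
	else if ((size_t)written != len)
		err = -EIO;

	close(fd);
	return err;
}
```

The kernel-side permission checks (DAC on the file plus the SELinux
permissions) happen inside open() and write() here, which is the
"artifact of the filesystem-based interface" mentioned in point 1.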
>
> Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
> ---
> security/selinux/hooks.c | 27 +++++++++++++++++++++++++++
> security/selinux/include/security.h | 7 +++++++
> security/selinux/selinuxfs.c | 16 ++++++++++++----
> 3 files changed, 46 insertions(+), 4 deletions(-)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index e7a7dcab81db..3d14d4e47937 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -7196,6 +7196,31 @@ static int selinux_uring_allowed(void)
> }
> #endif /* CONFIG_IO_URING */
>
> +/**
> + * selinux_lsm_config_system_policy - Manage a LSM policy
> + * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
> + * @buf: User-supplied buffer
> + * @size: size of @buf
> + * @flags: reserved for future use; must be zero
> + *
> + * Returns: number of written rules on success, negative value on error
> + */
> +static int selinux_lsm_config_system_policy(u32 op, void __user *buf,
> + size_t size, u32 flags)
> +{
> + loff_t pos = 0;
> +
> + if (op != LSM_POLICY_LOAD || flags)
> + return -EOPNOTSUPP;
> +
> + if (!selinux_null.dentry || !selinux_null.dentry->d_sb ||
> + !selinux_null.dentry->d_sb->s_fs_info)
> + return -ENODEV;
> +
> + return __sel_write_load(selinux_null.dentry->d_sb->s_fs_info, buf, size,
> + &pos);
> +}
> +
> static const struct lsm_id selinux_lsmid = {
> .name = "selinux",
> .id = LSM_ID_SELINUX,
> @@ -7499,6 +7524,8 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
> #ifdef CONFIG_PERF_EVENTS
> LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
> #endif
> + LSM_HOOK_INIT(lsm_config_system_policy, selinux_lsm_config_system_policy),
> +
> };
>
> static __init int selinux_init(void)
> diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
> index e7827ed7be5f..7b779ea43cc3 100644
> --- a/security/selinux/include/security.h
> +++ b/security/selinux/include/security.h
> @@ -389,7 +389,14 @@ struct selinux_kernel_status {
> extern void selinux_status_update_setenforce(bool enforcing);
> extern void selinux_status_update_policyload(u32 seqno);
> extern void selinux_complete_init(void);
> +
> +struct selinux_fs_info;
> +
> extern struct path selinux_null;
> +extern ssize_t __sel_write_load(struct selinux_fs_info *fsi,
> + const char __user *buf, size_t count,
> + loff_t *ppos);
> +
> extern void selnl_notify_setenforce(int val);
> extern void selnl_notify_policyload(u32 seqno);
> extern int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm);
> diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
> index 47480eb2189b..1f7e611d8300 100644
> --- a/security/selinux/selinuxfs.c
> +++ b/security/selinux/selinuxfs.c
> @@ -567,11 +567,11 @@ static int sel_make_policy_nodes(struct selinux_fs_info *fsi,
> return ret;
> }
>
> -static ssize_t sel_write_load(struct file *file, const char __user *buf,
> - size_t count, loff_t *ppos)
> +ssize_t __sel_write_load(struct selinux_fs_info *fsi,
> + const char __user *buf, size_t count,
> + loff_t *ppos)
>
> {
> - struct selinux_fs_info *fsi;
> struct selinux_load_state load_state;
> ssize_t length;
> void *data = NULL;
> @@ -605,7 +605,6 @@ static ssize_t sel_write_load(struct file *file, const char __user *buf,
> pr_warn_ratelimited("SELinux: failed to load policy\n");
> goto out;
> }
> - fsi = file_inode(file)->i_sb->s_fs_info;
> length = sel_make_policy_nodes(fsi, load_state.policy);
> if (length) {
> pr_warn_ratelimited("SELinux: failed to initialize selinuxfs\n");
> @@ -626,6 +625,15 @@ static ssize_t sel_write_load(struct file *file, const char __user *buf,
> return length;
> }
>
> +static ssize_t sel_write_load(struct file *file, const char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + struct selinux_fs_info *fsi = file_inode(file)->i_sb->s_fs_info;
> +
> + return __sel_write_load(fsi, buf, count, ppos);
> +}
> +
> +
> static const struct file_operations sel_load_ops = {
> .write = sel_write_load,
> .llseek = generic_file_llseek,
> --
> 2.48.1
>
^ permalink raw reply
* [PATCH v6 0/5] lsm: introduce lsm_config_self_policy() and lsm_config_system_policy() syscalls
From: Maxime Bélair @ 2025-10-10 13:25 UTC (permalink / raw)
To: linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
This patchset introduces two new syscalls, lsm_config_self_policy() and
lsm_config_system_policy(), along with the associated Linux Security
Module hooks security_lsm_config_*_policy(), providing a unified
interface for loading and managing LSM policies. These syscalls
complement the existing per‑LSM pseudo‑filesystem mechanism and work even
when those filesystems are not mounted or available.
With these new syscalls, users and administrators may lock down access to
the pseudo‑filesystem yet still manage LSM policies. Two tightly-scoped
entry points then replace the many file operations exposed by those
filesystems, significantly reducing the attack surface. This is
particularly useful in containers or processes already confined by
Landlock, where these pseudo‑filesystems are typically unavailable.
Because they provide a logical and unified interface, these syscalls are
simpler to use than several heterogeneous pseudo‑filesystems and avoid
edge cases such as partially loaded policies. They also eliminate VFS
overhead, yielding performance gains, notably when many policies are
loaded, for instance at boot time.
This initial implementation is intentionally minimal to limit the scope
of changes. Currently, only policy loading is supported. This new LSM
hook is currently registered by AppArmor, SELinux and Smack. However, any
LSM can adopt this interface, and future patches could extend this
syscall to support more operations, such as replacing, removing, or
querying loaded policies.
Landlock already provides three Landlock‑specific syscalls (e.g.
landlock_add_rule()) to restrict ambient rights for sets of processes
without touching any pseudo-filesystem. lsm_config_*_policy() generalizes
that approach to the entire LSM layer, so any module can choose to
support either or both of these syscalls, expose its policy operations
through a uniform interface, and reap the advantages outlined above.
This patchset is available at [1], a minimal user space example
showing how to use lsm_config_system_policy with AppArmor is at [2] and a
performance benchmark of both syscalls is available at [3].
[1] https://github.com/emixam16/linux/tree/lsm_syscall_v6
[2] https://gitlab.com/emixam16/apparmor/tree/lsm_syscall_v6
[3] https://gitlab.com/-/snippets/4864908
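To make the calling convention concrete, here is a minimal user-space
sketch of raw wrappers for the two syscalls. It is not the example at
[2]. The wrapper names are hypothetical (no libc wrapper exists), the
syscall numbers 468 and 469 are the ones assigned by this series and are
not in any released uapi header, and LSM_POLICY_LOAD is the value
defined in patch 2/5. A kernel that does not carry this series may use
those numbers for unrelated syscalls, so this should only be run on a
kernel built from [1].

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Numbers assigned by this series; NOT part of any released kernel. */
#define NR_LSM_CONFIG_SELF_POLICY	468
#define NR_LSM_CONFIG_SYSTEM_POLICY	469

/* Operation code from patch 2/5 (include/uapi/linux/lsm.h). */
#define LSM_POLICY_LOAD			100

/*
 * Hypothetical wrappers. Argument order follows the SYSCALL_DEFINE6()
 * definitions in this series:
 *   lsm_id, op, buf, size, common_flags, flags
 * common_flags and flags are reserved and must currently be zero.
 * Return value: the raw syscall(2) result (-1 with errno set on error).
 */
static inline long lsm_config_self_policy(uint32_t lsm_id, uint32_t op,
					  const void *buf, uint32_t size,
					  uint32_t common_flags, uint32_t flags)
{
	return syscall(NR_LSM_CONFIG_SELF_POLICY, lsm_id, op, buf, size,
		       common_flags, flags);
}

static inline long lsm_config_system_policy(uint32_t lsm_id, uint32_t op,
					    const void *buf, uint32_t size,
					    uint32_t common_flags,
					    uint32_t flags)
{
	return syscall(NR_LSM_CONFIG_SYSTEM_POLICY, lsm_id, op, buf, size,
		       common_flags, flags);
}
```

A caller would pass one of the LSM_ID_* constants from
include/uapi/linux/lsm.h as lsm_id and LSM_POLICY_LOAD as op; the layout
of buf is defined by each LSM.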
---
Changes in v6
- Add support for SELinux and Smack
Changes in v5
- Improve syscall input verification
- Do not export security_lsm_config_*_policy symbols
Changes in v4
- Make the syscall's maximum buffer size defined per module
- Fix a memory leak
Changes in v3
- Fix typos
Changes in v2
- Split lsm_manage_policy() into two distinct syscalls:
lsm_config_self_policy() and lsm_config_system_policy()
- The LSM hook now calls only the appropriate LSM (and not all LSMs)
- Add a configuration variable to limit the buffer size of these
syscalls
- AppArmor now allows stacking policies through lsm_config_self_policy()
and loading policies in any namespace through
lsm_config_system_policy()
---
Maxime Bélair (5):
Wire up lsm_config_self_policy and lsm_config_system_policy syscalls
lsm: introduce security_lsm_config_*_policy hooks
AppArmor: add support for lsm_config_self_policy and
lsm_config_system_policy
SELinux: add support for lsm_config_system_policy
Smack: add support for lsm_config_self_policy and
lsm_config_system_policy
arch/alpha/kernel/syscalls/syscall.tbl | 2 +
arch/arm/tools/syscall.tbl | 2 +
arch/m68k/kernel/syscalls/syscall.tbl | 2 +
arch/microblaze/kernel/syscalls/syscall.tbl | 2 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 2 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +
arch/parisc/kernel/syscalls/syscall.tbl | 2 +
arch/powerpc/kernel/syscalls/syscall.tbl | 2 +
arch/s390/kernel/syscalls/syscall.tbl | 2 +
arch/sh/kernel/syscalls/syscall.tbl | 2 +
arch/sparc/kernel/syscalls/syscall.tbl | 2 +
arch/x86/entry/syscalls/syscall_32.tbl | 2 +
arch/x86/entry/syscalls/syscall_64.tbl | 2 +
arch/xtensa/kernel/syscalls/syscall.tbl | 2 +
include/linux/lsm_hook_defs.h | 4 +
include/linux/security.h | 20 +++++
include/linux/syscalls.h | 5 ++
include/uapi/asm-generic/unistd.h | 6 +-
include/uapi/linux/lsm.h | 8 ++
kernel/sys_ni.c | 2 +
security/apparmor/apparmorfs.c | 31 +++++++
security/apparmor/include/apparmor.h | 4 +
security/apparmor/include/apparmorfs.h | 3 +
security/apparmor/lsm.c | 84 +++++++++++++++++++
security/lsm_syscalls.c | 21 +++++
security/security.c | 60 +++++++++++++
security/selinux/hooks.c | 27 ++++++
security/selinux/include/security.h | 7 ++
security/selinux/selinuxfs.c | 16 +++-
security/smack/smack.h | 8 ++
security/smack/smack_lsm.c | 73 ++++++++++++++++
security/smack/smackfs.c | 2 +-
tools/include/uapi/asm-generic/unistd.h | 6 +-
.../arch/x86/entry/syscalls/syscall_64.tbl | 2 +
35 files changed, 412 insertions(+), 7 deletions(-)
base-commit: 9c32cda43eb78f78c73aee4aa344b777714e259b
--
2.48.1
^ permalink raw reply
* [PATCH v6 2/5] lsm: introduce security_lsm_config_*_policy hooks
From: Maxime Bélair @ 2025-10-10 13:25 UTC (permalink / raw)
To: linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
In-Reply-To: <20251010132610.12001-1-maxime.belair@canonical.com>
Define two new LSM hooks: security_lsm_config_self_policy and
security_lsm_config_system_policy and wire them into the corresponding
lsm_config_*_policy() syscalls so that LSMs can register a unified
interface for policy management. This initial, minimal implementation
only supports the LSM_POLICY_LOAD operation to limit changes.
Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
---
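The control flow added by this patch can be modelled in plain user-space
C. The sketch below is not kernel code: struct policy_hook, toy_load()
and dispatch_system_policy() are hypothetical stand-ins, and the
CAP_MAC_ADMIN check is reduced to a boolean argument. It only shows the
order of the checks (common_flags, capability, then the single hook
whose LSM id matches) and the -EINVAL default when no LSM matches.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define LSM_POLICY_LOAD	100	/* from this patch */
#define LSM_ID_APPARMOR	104	/* value from include/uapi/linux/lsm.h */

/* Hypothetical stand-in for one registered hook (not a kernel type). */
struct policy_hook {
	uint64_t lsm_id;
	int (*config)(uint32_t op, const void *buf, size_t size,
		      uint32_t flags);
};

/* Toy hook: accepts only LSM_POLICY_LOAD with no flags set. */
int toy_load(uint32_t op, const void *buf, size_t size, uint32_t flags)
{
	(void)buf;
	(void)size;
	if (op != LSM_POLICY_LOAD || flags)
		return -EOPNOTSUPP;
	return 0;
}

/*
 * Model of sys_lsm_config_system_policy() followed by
 * security_lsm_config_system_policy().
 */
int dispatch_system_policy(const struct policy_hook *hooks, size_t nhooks,
			   uint32_t lsm_id, uint32_t op, const void *buf,
			   size_t size, uint32_t common_flags, uint32_t flags,
			   int has_mac_admin)
{
	int rc = -EINVAL;	/* hook default when no LSM matches */
	size_t i;

	if (common_flags)	/* reserved for future use */
		return -EINVAL;
	if (!has_mac_admin)	/* models !capable(CAP_MAC_ADMIN) */
		return -EPERM;

	for (i = 0; i < nhooks; i++) {
		if (hooks[i].lsm_id == lsm_id) {
			/* Only the targeted LSM is called, then we stop. */
			rc = hooks[i].config(op, buf, size, flags);
			break;
		}
	}
	return rc;
}
```

The self-policy variant follows the same shape without the capability
check.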
include/linux/lsm_hook_defs.h | 4 +++
include/linux/security.h | 20 ++++++++++++
include/uapi/linux/lsm.h | 8 +++++
security/lsm_syscalls.c | 13 ++++++--
security/security.c | 60 +++++++++++++++++++++++++++++++++++
5 files changed, 103 insertions(+), 2 deletions(-)
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index bf3bbac4e02a..50b6e8aed787 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -464,3 +464,7 @@ LSM_HOOK(int, 0, bdev_alloc_security, struct block_device *bdev)
LSM_HOOK(void, LSM_RET_VOID, bdev_free_security, struct block_device *bdev)
LSM_HOOK(int, 0, bdev_setintegrity, struct block_device *bdev,
enum lsm_integrity_type type, const void *value, size_t size)
+LSM_HOOK(int, -EINVAL, lsm_config_self_policy, u32 op, void __user *buf,
+ size_t size, u32 flags)
+LSM_HOOK(int, -EINVAL, lsm_config_system_policy, u32 op,
+ void __user *buf, size_t size, u32 flags)
diff --git a/include/linux/security.h b/include/linux/security.h
index cc9b54d95d22..54acaee4a994 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -581,6 +581,11 @@ void security_bdev_free(struct block_device *bdev);
int security_bdev_setintegrity(struct block_device *bdev,
enum lsm_integrity_type type, const void *value,
size_t size);
+int security_lsm_config_self_policy(u32 lsm_id, u32 op, void __user *buf,
+ size_t size, u32 flags);
+int security_lsm_config_system_policy(u32 lsm_id, u32 op, void __user *buf,
+ size_t size, u32 flags);
+
#else /* CONFIG_SECURITY */
/**
@@ -1603,6 +1608,21 @@ static inline int security_bdev_setintegrity(struct block_device *bdev,
return 0;
}
+static inline int security_lsm_config_self_policy(u32 lsm_id, u32 op,
+ void __user *buf,
+ size_t size, u32 flags)
+{
+
+ return -EOPNOTSUPP;
+}
+
+static inline int security_lsm_config_system_policy(u32 lsm_id, u32 op,
+ void __user *buf,
+ size_t size, u32 flags)
+{
+
+ return -EOPNOTSUPP;
+}
#endif /* CONFIG_SECURITY */
#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/include/uapi/linux/lsm.h b/include/uapi/linux/lsm.h
index 938593dfd5da..2b9432a30cdc 100644
--- a/include/uapi/linux/lsm.h
+++ b/include/uapi/linux/lsm.h
@@ -90,4 +90,12 @@ struct lsm_ctx {
*/
#define LSM_FLAG_SINGLE 0x0001
+/*
+ * LSM_POLICY_XXX definitions identify the different operations
+ * to configure LSM policies
+ */
+
+#define LSM_POLICY_UNDEF 0
+#define LSM_POLICY_LOAD 100
+
#endif /* _UAPI_LINUX_LSM_H */
diff --git a/security/lsm_syscalls.c b/security/lsm_syscalls.c
index b02a7623dea6..0796673b6f19 100644
--- a/security/lsm_syscalls.c
+++ b/security/lsm_syscalls.c
@@ -122,11 +122,20 @@ SYSCALL_DEFINE3(lsm_list_modules, u64 __user *, ids, u32 __user *, size,
SYSCALL_DEFINE6(lsm_config_self_policy, u32, lsm_id, u32, op, void __user *,
buf, u32 __user, size, u32, common_flags, u32, flags)
{
- return 0;
+ if (common_flags) // Reserved for future use
+ return -EINVAL;
+
+ return security_lsm_config_self_policy(lsm_id, op, buf, size, flags);
}
SYSCALL_DEFINE6(lsm_config_system_policy, u32, lsm_id, u32, op, void __user *,
buf, u32 __user, size, u32, common_flags, u32, flags)
{
- return 0;
+ if (common_flags) // Reserved for future use
+ return -EINVAL;
+
+ if (!capable(CAP_MAC_ADMIN))
+ return -EPERM;
+
+ return security_lsm_config_system_policy(lsm_id, op, buf, size, flags);
}
diff --git a/security/security.c b/security/security.c
index fb57e8fddd91..eeb61b27cd56 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5883,6 +5883,66 @@ int security_bdev_setintegrity(struct block_device *bdev,
}
EXPORT_SYMBOL(security_bdev_setintegrity);
+/**
+ * security_lsm_config_self_policy() - Configure caller's LSM policies
+ * @lsm_id: id of the LSM to target
+ * @op: Operation to perform (one of the LSM_POLICY_XXX values)
+ * @buf: userspace pointer to policy data
+ * @size: size of @buf
+ * @flags: lsm policy configuration flags
+ *
+ * Configure the policies of a LSM for the current domain/user. This notably
+ * allows updating them even when the lsmfs is unavailable or restricted.
+ * Currently, only LSM_POLICY_LOAD is supported.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_lsm_config_self_policy(u32 lsm_id, u32 op, void __user *buf,
+ size_t size, u32 flags)
+{
+ int rc = LSM_RET_DEFAULT(lsm_config_self_policy);
+ struct lsm_static_call *scall;
+
+ lsm_for_each_hook(scall, lsm_config_self_policy) {
+ if ((scall->hl->lsmid->id) == lsm_id) {
+ rc = scall->hl->hook.lsm_config_self_policy(op, buf, size, flags);
+ break;
+ }
+ }
+
+ return rc;
+}
+
+/**
+ * security_lsm_config_system_policy() - Configure system LSM policies
+ * @lsm_id: id of the lsm to target
+ * @op: Operation to perform (one of the LSM_POLICY_XXX values)
+ * @buf: userspace pointer to policy data
+ * @size: size of @buf
+ * @flags: lsm policy configuration flags
+ *
+ * Configure the policies of a LSM for the whole system. This notably allows
+ * updating them even when the lsmfs is unavailable or restricted. Currently,
+ * only LSM_POLICY_LOAD is supported.
+ *
+ * Return: Returns 0 on success, error on failure.
+ */
+int security_lsm_config_system_policy(u32 lsm_id, u32 op, void __user *buf,
+ size_t size, u32 flags)
+{
+ int rc = LSM_RET_DEFAULT(lsm_config_system_policy);
+ struct lsm_static_call *scall;
+
+ lsm_for_each_hook(scall, lsm_config_system_policy) {
+ if ((scall->hl->lsmid->id) == lsm_id) {
+ rc = scall->hl->hook.lsm_config_system_policy(op, buf, size, flags);
+ break;
+ }
+ }
+
+ return rc;
+}
+
#ifdef CONFIG_PERF_EVENTS
/**
* security_perf_event_open() - Check if a perf event open is allowed
--
2.48.1
^ permalink raw reply related
* [PATCH v6 3/5] AppArmor: add support for lsm_config_self_policy and lsm_config_system_policy
From: Maxime Bélair @ 2025-10-10 13:25 UTC (permalink / raw)
To: linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
In-Reply-To: <20251010132610.12001-1-maxime.belair@canonical.com>
Enable users to manage AppArmor policies through the new hooks
lsm_config_self_policy and lsm_config_system_policy.
lsm_config_self_policy allows stacking existing policies in the kernel.
This ensures that it can only further restrict the caller and can never
be used to gain new privileges.
lsm_config_system_policy allows loading or replacing AppArmor policies in
any AppArmor namespace and is restricted to CAP_MAC_ADMIN.
Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
---
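The buffer expected by the AppArmor lsm_config_system_policy hook has
the form "<ns>\0<policy>". A minimal user-space sketch of packing such a
buffer is shown below. The helper name aa_pack_system_policy() is
hypothetical, the two size limits are copied from this patch, and
rejecting an empty policy is a choice made by this sketch rather than
something the patch itself enforces.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Limits copied from security/apparmor/include/apparmor.h in this patch. */
#define AA_PROFILE_NAME_MAX_SIZE	(1 << 9)
#define AA_PROFILE_MAX_SIZE		(1 << 28)

/*
 * Hypothetical helper: pack "<ns>\0<policy>" into @out.
 * @ns:     namespace name, "" for the root namespace
 * @policy: policy blob, @policy_len bytes
 * Returns the number of bytes used in @out, or a negative errno value.
 */
long aa_pack_system_policy(const char *ns, const void *policy,
			   size_t policy_len, void *out, size_t out_len)
{
	size_t ns_len = strlen(ns);
	size_t total = ns_len + 1 + policy_len;

	/* The hook rejects a name with no NUL inside its name buffer. */
	if (ns_len >= AA_PROFILE_NAME_MAX_SIZE)
		return -E2BIG;
	if (policy_len == 0)		/* sketch-level choice, see above */
		return -EINVAL;
	if (total > AA_PROFILE_MAX_SIZE)
		return -E2BIG;
	if (total > out_len)
		return -ENOSPC;

	memcpy(out, ns, ns_len + 1);	/* name plus its terminating NUL */
	memcpy((char *)out + ns_len + 1, policy, policy_len);
	return (long)total;
}
```

The returned length is what would be passed as the size argument of the
syscall, with the packed buffer as buf.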
security/apparmor/apparmorfs.c | 31 ++++++++++
security/apparmor/include/apparmor.h | 4 ++
security/apparmor/include/apparmorfs.h | 3 +
security/apparmor/lsm.c | 84 ++++++++++++++++++++++++++
4 files changed, 122 insertions(+)
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 6039afae4bfc..6df43299b045 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -439,6 +439,37 @@ static ssize_t policy_update(u32 mask, const char __user *buf, size_t size,
return error;
}
+/**
+ * aa_profile_load_ns_name - load a profile into the current namespace identified by name
+ * @name: The name of the namespace to load the policy in. "" for root_ns
+ * @name_size: size of @name. 0 for root ns
+ * @buf: buffer containing the user-provided policy
+ * @size: size of @buf
+ * @ppos: position pointer in the file
+ *
+ * Returns: 0 on success, negative value on error
+ */
+ssize_t aa_profile_load_ns_name(char *name, size_t name_size, const void __user *buf,
+ size_t size, loff_t *ppos)
+{
+ struct aa_ns *ns;
+
+ if (name_size == 0)
+ ns = aa_get_ns(root_ns);
+ else
+ ns = aa_lookupn_ns(root_ns, name, name_size);
+
+ if (!ns)
+ return -EINVAL;
+
+ int error = policy_update(AA_MAY_LOAD_POLICY | AA_MAY_REPLACE_POLICY,
+ buf, size, ppos, ns);
+
+ aa_put_ns(ns);
+
+ return error >= 0 ? 0 : error;
+}
+
/* .load file hook fn to load policy */
static ssize_t profile_load(struct file *f, const char __user *buf, size_t size,
loff_t *pos)
diff --git a/security/apparmor/include/apparmor.h b/security/apparmor/include/apparmor.h
index f83934913b0f..1d9a2881a8b9 100644
--- a/security/apparmor/include/apparmor.h
+++ b/security/apparmor/include/apparmor.h
@@ -62,5 +62,9 @@ extern unsigned int aa_g_path_max;
#define AA_DEFAULT_CLEVEL 0
#endif /* CONFIG_SECURITY_APPARMOR_EXPORT_BINARY */
+/* Syscall-related buffer size limits */
+
+#define AA_PROFILE_NAME_MAX_SIZE (1 << 9)
+#define AA_PROFILE_MAX_SIZE (1 << 28)
#endif /* __APPARMOR_H */
diff --git a/security/apparmor/include/apparmorfs.h b/security/apparmor/include/apparmorfs.h
index 1e94904f68d9..fd415afb7659 100644
--- a/security/apparmor/include/apparmorfs.h
+++ b/security/apparmor/include/apparmorfs.h
@@ -112,6 +112,9 @@ int __aafs_profile_mkdir(struct aa_profile *profile, struct dentry *parent);
void __aafs_ns_rmdir(struct aa_ns *ns);
int __aafs_ns_mkdir(struct aa_ns *ns, struct dentry *parent, const char *name,
struct dentry *dent);
+ssize_t aa_profile_load_ns_name(char *name, size_t name_len, const void __user *buf,
+ size_t size, loff_t *ppos);
+
struct aa_loaddata;
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 9b6c2f157f83..0c127f9dae19 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -1275,6 +1275,86 @@ static int apparmor_socket_shutdown(struct socket *sock, int how)
return aa_sock_perm(OP_SHUTDOWN, AA_MAY_SHUTDOWN, sock);
}
+/**
+ * apparmor_lsm_config_self_policy - Stack a profile
+ * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
+ * @buf: buffer containing the user-provided name of the profile to stack
+ * @size: size of @buf
+ * @flags: reserved for future use; must be zero
+ *
+ * Returns: 0 on success, negative value on error
+ */
+static int apparmor_lsm_config_self_policy(u32 op, void __user *buf,
+ size_t size, u32 flags)
+{
+ char *name;
+ long name_size;
+ int ret;
+
+ if (op != LSM_POLICY_LOAD || flags)
+ return -EOPNOTSUPP;
+ if (size == 0)
+ return -EINVAL;
+ if (size > AA_PROFILE_NAME_MAX_SIZE)
+ return -E2BIG;
+
+ name = kmalloc(size, GFP_KERNEL);
+ if (!name)
+ return -ENOMEM;
+
+ name_size = strncpy_from_user(name, buf, size);
+ if (name_size <= 0) {
+ kfree(name);
+ return name_size;
+ } else if (name_size == size) {
+ kfree(name);
+ return -E2BIG;
+ }
+
+ ret = aa_change_profile(name, AA_CHANGE_STACK);
+
+ kfree(name);
+
+ return ret;
+}
+
+/**
+ * apparmor_lsm_config_system_policy - Load or replace a system policy
+ * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
+ * @buf: user-supplied buffer in the form "<ns>\0<policy>"
+ * <ns> is the namespace to load the policy into (empty string for root)
+ * <policy> is the policy to load
+ * @size: size of @buf
+ * @flags: reserved for future uses; must be zero
+ *
+ * Returns: 0 on success, negative value on error
+ */
+static int apparmor_lsm_config_system_policy(u32 op, void __user *buf,
+ size_t size, u32 flags)
+{
+ loff_t pos = 0; // Partial writing is not currently supported
+ char ns_name[AA_PROFILE_NAME_MAX_SIZE];
+ long ns_size; /* signed: strncpy_from_user() may return a negative errno */
+ size_t max_ns_size = min(size, AA_PROFILE_NAME_MAX_SIZE);
+
+ if (op != LSM_POLICY_LOAD || flags)
+ return -EOPNOTSUPP;
+ if (size < 2)
+ return -EINVAL;
+ if (size > AA_PROFILE_MAX_SIZE)
+ return -E2BIG;
+
+ ns_size = strncpy_from_user(ns_name, buf, max_ns_size);
+ if (ns_size < 0)
+ return ns_size;
+ if (ns_size == max_ns_size)
+ return -E2BIG;
+
+ return aa_profile_load_ns_name(ns_name, ns_size, buf + ns_size + 1,
+ size - ns_size - 1, &pos);
+}
+
+
#ifdef CONFIG_NETWORK_SECMARK
/**
* apparmor_socket_sock_rcv_skb - check perms before associating skb to sk
@@ -1483,6 +1563,10 @@ static struct security_hook_list apparmor_hooks[] __ro_after_init = {
LSM_HOOK_INIT(socket_getsockopt, apparmor_socket_getsockopt),
LSM_HOOK_INIT(socket_setsockopt, apparmor_socket_setsockopt),
LSM_HOOK_INIT(socket_shutdown, apparmor_socket_shutdown),
+
+ LSM_HOOK_INIT(lsm_config_self_policy, apparmor_lsm_config_self_policy),
+ LSM_HOOK_INIT(lsm_config_system_policy,
+ apparmor_lsm_config_system_policy),
#ifdef CONFIG_NETWORK_SECMARK
LSM_HOOK_INIT(socket_sock_rcv_skb, apparmor_socket_sock_rcv_skb),
#endif
--
2.48.1
^ permalink raw reply related
* [PATCH v6 1/5] Wire up lsm_config_self_policy and lsm_config_system_policy syscalls
From: Maxime Bélair @ 2025-10-10 13:25 UTC (permalink / raw)
To: linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
In-Reply-To: <20251010132610.12001-1-maxime.belair@canonical.com>
Add support for the new lsm_config_self_policy and
lsm_config_system_policy syscalls, providing a unified API for loading
and modifying LSM policies, for the current user and for the entire
system, respectively, without requiring the LSM’s pseudo-filesystems.
Benefits:
- Works even if the LSM pseudo-filesystem isn’t mounted or available
(e.g. in containers)
- Offers a logical and unified interface rather than multiple
heterogeneous pseudo-filesystems
- Avoids the overhead of other kernel interfaces for better efficiency
Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
---
arch/alpha/kernel/syscalls/syscall.tbl | 2 ++
arch/arm/tools/syscall.tbl | 2 ++
arch/m68k/kernel/syscalls/syscall.tbl | 2 ++
arch/microblaze/kernel/syscalls/syscall.tbl | 2 ++
arch/mips/kernel/syscalls/syscall_n32.tbl | 2 ++
arch/mips/kernel/syscalls/syscall_n64.tbl | 2 ++
arch/mips/kernel/syscalls/syscall_o32.tbl | 2 ++
arch/parisc/kernel/syscalls/syscall.tbl | 2 ++
arch/powerpc/kernel/syscalls/syscall.tbl | 2 ++
arch/s390/kernel/syscalls/syscall.tbl | 2 ++
arch/sh/kernel/syscalls/syscall.tbl | 2 ++
arch/sparc/kernel/syscalls/syscall.tbl | 2 ++
arch/x86/entry/syscalls/syscall_32.tbl | 2 ++
arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
arch/xtensa/kernel/syscalls/syscall.tbl | 2 ++
include/linux/syscalls.h | 5 +++++
include/uapi/asm-generic/unistd.h | 6 +++++-
kernel/sys_ni.c | 2 ++
security/lsm_syscalls.c | 12 ++++++++++++
tools/include/uapi/asm-generic/unistd.h | 6 +++++-
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
21 files changed, 61 insertions(+), 2 deletions(-)
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 2dd6340de6b4..4fc75352220d 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -507,3 +507,5 @@
575 common listxattrat sys_listxattrat
576 common removexattrat sys_removexattrat
577 common open_tree_attr sys_open_tree_attr
+578 common lsm_config_self_policy sys_lsm_config_self_policy
+579 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 27c1d5ebcd91..326483cb94a4 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -482,3 +482,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 9fe47112c586..d37364df1cd7 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -467,3 +467,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 7b6e97828e55..9d58ebfcf967 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -473,3 +473,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index aa70e371bb54..8627b5f56280 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -406,3 +406,5 @@
465 n32 listxattrat sys_listxattrat
466 n32 removexattrat sys_removexattrat
467 n32 open_tree_attr sys_open_tree_attr
+468 n32 lsm_config_self_policy sys_lsm_config_self_policy
+469 n32 lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 1e8c44c7b614..813207b61f58 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -382,3 +382,5 @@
465 n64 listxattrat sys_listxattrat
466 n64 removexattrat sys_removexattrat
467 n64 open_tree_attr sys_open_tree_attr
+468 n64 lsm_config_self_policy sys_lsm_config_self_policy
+469 n64 lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 114a5a1a6230..9cd0946b4370 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -455,3 +455,5 @@
465 o32 listxattrat sys_listxattrat
466 o32 removexattrat sys_removexattrat
467 o32 open_tree_attr sys_open_tree_attr
+468 o32 lsm_config_self_policy sys_lsm_config_self_policy
+469 o32 lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 94df3cb957e9..9db01dd55793 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -466,3 +466,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 9a084bdb8926..97714acb39ab 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -558,3 +558,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index a4569b96ef06..d2b0f14fb516 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -470,3 +470,5 @@
465 common listxattrat sys_listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 52a7652fcff6..210d7118ce16 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -471,3 +471,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 83e45eb6c095..494417d80680 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -513,3 +513,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ac007ea00979..36c2c538e04f 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -473,3 +473,5 @@
465 i386 listxattrat sys_listxattrat
466 i386 removexattrat sys_removexattrat
467 i386 open_tree_attr sys_open_tree_attr
+468 i386 lsm_config_self_policy sys_lsm_config_self_policy
+469 i386 lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index cfb5ca41e30d..7eefbccfe531 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -391,6 +391,8 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
#
# Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index f657a77314f8..90d86a54a952 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -438,3 +438,5 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e5603cc91963..43b53fbd44be 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -988,6 +988,11 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *
asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
u32 size, u32 flags);
asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
+asmlinkage long sys_lsm_config_self_policy(u32 lsm_id, u32 op, void __user *buf,
+ u32 __user size, u32 common_flags, u32 flags);
+asmlinkage long sys_lsm_config_system_policy(u32 lsm_id, u32 op, void __user *buf,
+ u32 __user size, u32 common_flags, u32 flags);
+
/*
* Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 2892a45023af..021d0689c929 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -851,9 +851,13 @@ __SYSCALL(__NR_listxattrat, sys_listxattrat)
__SYSCALL(__NR_removexattrat, sys_removexattrat)
#define __NR_open_tree_attr 467
__SYSCALL(__NR_open_tree_attr, sys_open_tree_attr)
+#define __NR_lsm_config_self_policy 468
+__SYSCALL(__NR_lsm_config_self_policy, sys_lsm_config_self_policy)
+#define __NR_lsm_config_system_policy 469
+__SYSCALL(__NR_lsm_config_system_policy, sys_lsm_config_system_policy)
#undef __NR_syscalls
-#define __NR_syscalls 468
+#define __NR_syscalls 470
/*
* 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index c00a86931f8c..3ecebcd3fbe0 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -172,6 +172,8 @@ COND_SYSCALL_COMPAT(fadvise64_64);
COND_SYSCALL(lsm_get_self_attr);
COND_SYSCALL(lsm_set_self_attr);
COND_SYSCALL(lsm_list_modules);
+COND_SYSCALL(lsm_config_self_policy);
+COND_SYSCALL(lsm_config_system_policy);
/* CONFIG_MMU only */
COND_SYSCALL(swapon);
diff --git a/security/lsm_syscalls.c b/security/lsm_syscalls.c
index 8440948a690c..b02a7623dea6 100644
--- a/security/lsm_syscalls.c
+++ b/security/lsm_syscalls.c
@@ -118,3 +118,15 @@ SYSCALL_DEFINE3(lsm_list_modules, u64 __user *, ids, u32 __user *, size,
return lsm_active_cnt;
}
+
+SYSCALL_DEFINE6(lsm_config_self_policy, u32, lsm_id, u32, op, void __user *,
+ buf, u32 __user, size, u32, common_flags, u32, flags)
+{
+ return 0;
+}
+
+SYSCALL_DEFINE6(lsm_config_system_policy, u32, lsm_id, u32, op, void __user *,
+ buf, u32 __user, size, u32, common_flags, u32, flags)
+{
+ return 0;
+}
diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index 2892a45023af..021d0689c929 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -851,9 +851,13 @@ __SYSCALL(__NR_listxattrat, sys_listxattrat)
__SYSCALL(__NR_removexattrat, sys_removexattrat)
#define __NR_open_tree_attr 467
__SYSCALL(__NR_open_tree_attr, sys_open_tree_attr)
+#define __NR_lsm_config_self_policy 468
+__SYSCALL(__NR_lsm_config_self_policy, sys_lsm_config_self_policy)
+#define __NR_lsm_config_system_policy 469
+__SYSCALL(__NR_lsm_config_system_policy, sys_lsm_config_system_policy)
#undef __NR_syscalls
-#define __NR_syscalls 468
+#define __NR_syscalls 470
/*
* 32 bit systems traditionally used different
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index cfb5ca41e30d..7eefbccfe531 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -391,6 +391,8 @@
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
467 common open_tree_attr sys_open_tree_attr
+468 common lsm_config_self_policy sys_lsm_config_self_policy
+469 common lsm_config_system_policy sys_lsm_config_system_policy
#
# Due to a historical design error, certain syscalls are numbered differently
--
2.48.1
* [PATCH v6 4/5] SELinux: add support for lsm_config_system_policy
From: Maxime Bélair @ 2025-10-10 13:25 UTC (permalink / raw)
To: linux-security-module
Cc: john.johansen, paul, jmorris, serge, mic, kees,
stephen.smalley.work, casey, takedakn, penguin-kernel, song,
rdunlap, linux-api, apparmor, linux-kernel, Maxime Bélair
In-Reply-To: <20251010132610.12001-1-maxime.belair@canonical.com>
Enable users to load SELinux policies through the new
lsm_config_system_policy hook. Only LSM_POLICY_LOAD is supported, and
the feature is restricted to CAP_MAC_ADMIN.
Signed-off-by: Maxime Bélair <maxime.belair@canonical.com>
---
security/selinux/hooks.c | 27 +++++++++++++++++++++++++++
security/selinux/include/security.h | 7 +++++++
security/selinux/selinuxfs.c | 16 ++++++++++++----
3 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e7a7dcab81db..3d14d4e47937 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -7196,6 +7196,31 @@ static int selinux_uring_allowed(void)
}
#endif /* CONFIG_IO_URING */
+/**
+ * selinux_lsm_config_system_policy - Manage an LSM policy
+ * @op: operation to perform. Currently, only LSM_POLICY_LOAD is supported
+ * @buf: User-supplied buffer
+ * @size: size of @buf
+ * @flags: reserved for future use; must be zero
+ *
+ * Returns: number of bytes consumed from @buf on success, negative errno on error
+ */
+static int selinux_lsm_config_system_policy(u32 op, void __user *buf,
+ size_t size, u32 flags)
+{
+ loff_t pos = 0;
+
+ if (op != LSM_POLICY_LOAD || flags)
+ return -EOPNOTSUPP;
+
+ if (!selinux_null.dentry || !selinux_null.dentry->d_sb ||
+ !selinux_null.dentry->d_sb->s_fs_info)
+ return -ENODEV;
+
+ return __sel_write_load(selinux_null.dentry->d_sb->s_fs_info, buf, size,
+ &pos);
+}
+
static const struct lsm_id selinux_lsmid = {
.name = "selinux",
.id = LSM_ID_SELINUX,
@@ -7499,6 +7524,8 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
#ifdef CONFIG_PERF_EVENTS
LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),
#endif
+ LSM_HOOK_INIT(lsm_config_system_policy, selinux_lsm_config_system_policy),
+
};
static __init int selinux_init(void)
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index e7827ed7be5f..7b779ea43cc3 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -389,7 +389,14 @@ struct selinux_kernel_status {
extern void selinux_status_update_setenforce(bool enforcing);
extern void selinux_status_update_policyload(u32 seqno);
extern void selinux_complete_init(void);
+
+struct selinux_fs_info;
+
extern struct path selinux_null;
+extern ssize_t __sel_write_load(struct selinux_fs_info *fsi,
+ const char __user *buf, size_t count,
+ loff_t *ppos);
+
extern void selnl_notify_setenforce(int val);
extern void selnl_notify_policyload(u32 seqno);
extern int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm);
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 47480eb2189b..1f7e611d8300 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -567,11 +567,11 @@ static int sel_make_policy_nodes(struct selinux_fs_info *fsi,
return ret;
}
-static ssize_t sel_write_load(struct file *file, const char __user *buf,
- size_t count, loff_t *ppos)
+ssize_t __sel_write_load(struct selinux_fs_info *fsi,
+ const char __user *buf, size_t count,
+ loff_t *ppos)
{
- struct selinux_fs_info *fsi;
struct selinux_load_state load_state;
ssize_t length;
void *data = NULL;
@@ -605,7 +605,6 @@ static ssize_t sel_write_load(struct file *file, const char __user *buf,
pr_warn_ratelimited("SELinux: failed to load policy\n");
goto out;
}
- fsi = file_inode(file)->i_sb->s_fs_info;
length = sel_make_policy_nodes(fsi, load_state.policy);
if (length) {
pr_warn_ratelimited("SELinux: failed to initialize selinuxfs\n");
@@ -626,6 +625,15 @@ static ssize_t sel_write_load(struct file *file, const char __user *buf,
return length;
}
+static ssize_t sel_write_load(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct selinux_fs_info *fsi = file_inode(file)->i_sb->s_fs_info;
+
+ return __sel_write_load(fsi, buf, count, ppos);
+}
+
+
static const struct file_operations sel_load_ops = {
.write = sel_write_load,
.llseek = generic_file_llseek,
--
2.48.1
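
The argument checks in selinux_lsm_config_system_policy() can be modelled in userspace to make their order explicit. The following is only a sketch under stated assumptions, not code from the series: DEMO_LSM_POLICY_LOAD is a placeholder value (the real LSM_POLICY_LOAD is defined in another patch of the series), struct demo_fs_info stands in for the selinuxfs superblock info, and demo_write_load() stands in for __sel_write_load() and is assumed to report the whole buffer as consumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value; the real LSM_POLICY_LOAD is defined elsewhere in the series. */
#define DEMO_LSM_POLICY_LOAD 100u

/* Hypothetical stand-in for selinuxfs state; 'ready' models a non-NULL s_fs_info. */
struct demo_fs_info {
	int ready;
};

/*
 * Hypothetical stand-in for __sel_write_load(). It does not parse anything;
 * it only reports the whole buffer as consumed (an assumption of this sketch).
 */
static long demo_write_load(struct demo_fs_info *fsi, const void *buf, size_t size)
{
	assert(fsi != NULL);	/* callers must have validated fsi already */
	(void)buf;
	return (long)size;
}

/*
 * Same check order as the hook in the patch:
 *   1. unsupported op or non-zero flags -> -EOPNOTSUPP
 *   2. selinuxfs state not available    -> -ENODEV
 *   3. otherwise delegate to the shared loader
 */
static long demo_config_system_policy(struct demo_fs_info *fsi, uint32_t op,
				      const void *buf, size_t size, uint32_t flags)
{
	if (op != DEMO_LSM_POLICY_LOAD || flags)
		return -EOPNOTSUPP;

	if (!fsi || !fsi->ready)
		return -ENODEV;

	return demo_write_load(fsi, buf, size);
}
```

Because the op and flags checks come first, an unsupported request gets -EOPNOTSUPP even when the filesystem state is missing; -ENODEV is only reachable for an otherwise valid load request.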