* [PATCH] smack: simplify write handlers of sysfs entries
From: Dmitry Antipov @ 2026-03-20 11:31 UTC (permalink / raw)
To: Casey Schaufler, Paul Moore, James Morris, Serge E. Hallyn,
Konstantin Andreev
Cc: linux-security-module, Dmitry Antipov
Use the convenient 'kstrto{u,s}32_from_user()' to simplify write
handlers of /smack/{doi,direct,mapped,logging,ptrace} sysfs entries.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
---
security/smack/smackfs.c | 81 +++++++++++-----------------------------
1 file changed, 22 insertions(+), 59 deletions(-)
diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
index 6e62dcb36f74..f60d5469043e 100644
--- a/security/smack/smackfs.c
+++ b/security/smack/smackfs.c
@@ -1598,24 +1598,17 @@ static ssize_t smk_read_doi(struct file *filp, char __user *buf,
static ssize_t smk_write_doi(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
- char temp[80];
- unsigned long u;
+ int ret;
+ u32 u;
if (!smack_privileged(CAP_MAC_ADMIN))
return -EPERM;
- if (count >= sizeof(temp) || count == 0)
- return -EINVAL;
-
- if (copy_from_user(temp, buf, count) != 0)
- return -EFAULT;
-
- temp[count] = '\0';
+ ret = kstrtou32_from_user(buf, count, 10, &u);
+ if (unlikely(ret))
+ return ret;
- if (kstrtoul(temp, 10, &u))
- return -EINVAL;
-
- if (u == CIPSO_V4_DOI_UNKNOWN || u > U32_MAX)
+ if (u == CIPSO_V4_DOI_UNKNOWN)
return -EINVAL;
return smk_cipso_doi(u, GFP_KERNEL) ? : count;
@@ -1664,22 +1657,14 @@ static ssize_t smk_write_direct(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct smack_known *skp;
- char temp[80];
- int i;
+ int i, ret;
if (!smack_privileged(CAP_MAC_ADMIN))
return -EPERM;
- if (count >= sizeof(temp) || count == 0)
- return -EINVAL;
-
- if (copy_from_user(temp, buf, count) != 0)
- return -EFAULT;
-
- temp[count] = '\0';
-
- if (sscanf(temp, "%d", &i) != 1)
- return -EINVAL;
+ ret = kstrtos32_from_user(buf, count, 10, &i);
+ if (unlikely(ret))
+ return ret;
/*
* Don't do anything if the value hasn't actually changed.
@@ -1742,22 +1727,14 @@ static ssize_t smk_write_mapped(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct smack_known *skp;
- char temp[80];
- int i;
+ int i, ret;
if (!smack_privileged(CAP_MAC_ADMIN))
return -EPERM;
- if (count >= sizeof(temp) || count == 0)
- return -EINVAL;
-
- if (copy_from_user(temp, buf, count) != 0)
- return -EFAULT;
-
- temp[count] = '\0';
-
- if (sscanf(temp, "%d", &i) != 1)
- return -EINVAL;
+ ret = kstrtos32_from_user(buf, count, 10, &i);
+ if (unlikely(ret))
+ return ret;
/*
* Don't do anything if the value hasn't actually changed.
@@ -2179,22 +2156,15 @@ static ssize_t smk_read_logging(struct file *filp, char __user *buf,
static ssize_t smk_write_logging(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
- char temp[32];
- int i;
+ int i, ret;
if (!smack_privileged(CAP_MAC_ADMIN))
return -EPERM;
- if (count >= sizeof(temp) || count == 0)
- return -EINVAL;
-
- if (copy_from_user(temp, buf, count) != 0)
- return -EFAULT;
+ ret = kstrtos32_from_user(buf, count, 10, &i);
+ if (unlikely(ret))
+ return ret;
- temp[count] = '\0';
-
- if (sscanf(temp, "%d", &i) != 1)
- return -EINVAL;
if (i < 0 || i > 3)
return -EINVAL;
log_policy = i;
@@ -2838,22 +2808,15 @@ static ssize_t smk_read_ptrace(struct file *filp, char __user *buf,
static ssize_t smk_write_ptrace(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
- char temp[32];
- int i;
+ int i, ret;
if (!smack_privileged(CAP_MAC_ADMIN))
return -EPERM;
- if (*ppos != 0 || count >= sizeof(temp) || count == 0)
- return -EINVAL;
-
- if (copy_from_user(temp, buf, count) != 0)
- return -EFAULT;
+ ret = kstrtos32_from_user(buf, count, 10, &i);
+ if (unlikely(ret))
+ return ret;
- temp[count] = '\0';
-
- if (sscanf(temp, "%d", &i) != 1)
- return -EINVAL;
if (i < SMACK_PTRACE_DEFAULT || i > SMACK_PTRACE_MAX)
return -EINVAL;
smack_ptrace_rule = i;
--
2.53.0
^ permalink raw reply related
* Re: [PATCH] fs: allow vfs code to open an O_PATH file with negative dentry
From: Amir Goldstein @ 2026-03-20 12:15 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, Miklos Szeredi, Gao Xiang, linux-security-module,
selinux, linux-erofs, linux-fsdevel, linux-unionfs,
syzbot+f34aab278bf5d664e2be, Paul Moore
In-Reply-To: <CAOQ4uxinSLXKWjMgwyA2A_UU5e+6ZjbdsFUY-+f9DMfQcxH0qA@mail.gmail.com>
On Thu, Mar 19, 2026 at 7:30 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Mar 19, 2026 at 4:55 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Thu, Mar 19, 2026 at 10:50 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > On Thu, Mar 19, 2026 at 2:13 PM Christian Brauner <brauner@kernel.org> wrote:
> > > > On Thu, Mar 19, 2026 at 01:46:16PM +0100, Amir Goldstein wrote:
> > > > > The fields f_mapping, f_wb_err, f_sb_err are irrelevant for O_PATH file.
> > > > > Skip setting them for O_PATH file, so that the O_PATH file could be
> > > > > opened with a negative dentry.
> > > > >
> > > > > This is not something that a user should be able to do, but vfs code,
> > > > > such as ovl_tmpfile() can use this to open a backing O_PATH tmpfile
> > > > > before instantiating the dentry.
> > > > >
> > > > > Reported-by: syzbot+f34aab278bf5d664e2be@syzkaller.appspotmail.com
> > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > ---
> > > > >
> > > > > Christian,
> > > > >
> > > > > This patch fixes the syzbot report [1] that the
> > > > > backing_file_user_path_file() patch [2] introduces.
> > > > >
> > > > > This is not the only possible fix, but it is the cleanest one IMO.
> > > > > There is a small risk in introducing a state of an O_PATH file with
> > > > > NULL f_inode, but I (and the bots that I asked) did not find any
> > > > > obvious risk in this state.
> > > > >
> > > > > Note that specifically, the user path inode is accessed via d_inode()
> > > > > and not via file_inode(), which makes this safe for file_user_inode()
> > > > > callers.
> > > > >
> > > > > BTW, I missed this regression with the original patch because I
> > > > > only ran the quick overlayfs sanity test.
> > > > >
> > > > > Now I ran a full quick fstest cycle and verified that the O_TMPFILE
> > > > > test case is covered and that the bug is detected.
> > > > >
> > > > > WDYT?
> > > > >
> > > > > Thanks,
> > > > > Amir.
> > > > >
> > > > > [1] https://syzkaller.appspot.com/bug?extid=f34aab278bf5d664e2be
> > > > > [2] https://lore.kernel.org/linux-fsdevel/20260318131258.1457101-1-amir73il@gmail.com/
> > > > >
> > > > > fs/open.c | 7 ++++---
> > > > > 1 file changed, 4 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/fs/open.c b/fs/open.c
> > > > > index 91f1139591abe..2004a8c0d9c97 100644
> > > > > --- a/fs/open.c
> > > > > +++ b/fs/open.c
> > > > > @@ -893,9 +893,6 @@ static int do_dentry_open(struct file *f,
> > > > >
> > > > > path_get(&f->f_path);
> > > > > f->f_inode = inode;
> > > > > - f->f_mapping = inode->i_mapping;
> > > > > - f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
> > > > > - f->f_sb_err = file_sample_sb_err(f);
> > > > >
> > > > > if (unlikely(f->f_flags & O_PATH)) {
> > > > > f->f_mode = FMODE_PATH | FMODE_OPENED;
> > > > > @@ -904,6 +901,10 @@ static int do_dentry_open(struct file *f,
> > > > > return 0;
> > > > > }
> > > > >
> > > > > + f->f_mapping = inode->i_mapping;
> > > > > + f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
> > > > > + f->f_sb_err = file_sample_sb_err(f);
> > > > > +
> > > > > if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) {
> > > > > i_readcount_inc(inode);
> > > > > } else if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) {
> > > >
> > > > I think this is really ugly and I'm really unhappy that we should adjust
> > > > initialization of generic vfs code for this. My preference is to push
> > > > the pain into the backing file stuff. And my ultimate preference is for
> > > > this backing file stuff to be removed again for a simple struct path.
> > > > We're working around some more fundamental cleanup here imho.
> > >
> > > Fair enough, we can rip the entire thing from vfs if you don't like it.
> > > The user path file can be opened and stored internally by selinux
> > > without adding all the associated risks in vfs.
> > >
> > > Paul,
> > >
> > > Please see compile tested code at:
> > > https://github.com/amir73il/linux/commits/user_path_file/
> >
> > No. Definitely no. Ignoring the fact that there is no reason we
> > should pushing this into the LSM, doing it in this way means it is
> > very likely that each LSM wanting to provide mmap/mprotect controls on
> > overlayfs will have to create a new O_PATH file. No.
> >
> > ... and let me preemptively comment that this doesn't belong in the
> > LSM framework either.
> >
> > As Christian already mentioned, this really needs to be addressed in
> > the backing file code, please do it there.
> >
>
> OK, will give it another try.
>
Christian,
I pushed another version of the syzbot ovl O_TMPFILE crash fix to:
https://github.com/amir73il/linux/commits/user_path_file/
In a nutshell, created a helper kernel_path_file_open()
instead of modifying do_dentry_open(), with a bit more magic in
ovl_tmpfile().
It's not super pretty, but at least it does not touch any non-backing_file
code paths, so maybe you will be ok with it.
Thanks,
Amir.
^ permalink raw reply
* Re: [PATCH v6 3/9] landlock: Control pathname UNIX domain socket resolution by path
From: Günther Noack @ 2026-03-20 12:28 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Justin Suess, Sebastian Andrzej Siewior, John Johansen,
Tingmao Wang, Kuniyuki Iwashima, Jann Horn, linux-security-module,
Samasth Norway Ananda, Matthieu Buffet, Mikhail Ivanov,
konstantin.meskhidze, Demi Marie Obenour, Alyssa Ross,
Tahera Fahimi
In-Reply-To: <20260318.aequoaDaeb7h@digikod.net>
On Wed, Mar 18, 2026 at 06:52:57PM +0100, Mickaël Salaün wrote:
> On Wed, Mar 18, 2026 at 12:43:55PM -0400, Justin Suess wrote:
> > On Wed, Mar 18, 2026 at 05:26:20PM +0100, Mickaël Salaün wrote:
> > > On Wed, Mar 18, 2026 at 04:05:59PM +0100, Sebastian Andrzej Siewior wrote:
> > > > On 2026-03-18 10:14:52 [-0400], Justin Suess wrote:
> > > > > Sebastian,
> > > > Justin,
> > > >
> > > > > In short: dom_other is a pointer to a landlock-owned refcounted struct.
> > > > …
> > > > >
> > > > > But we copy the domain pointer, which points to a landlock allocated
> > > > > and controlled object.
> > > >
> > > > and this is not going away while we are here and preempted after
> > > > dropping the lock? (if the landlock policy is updated/ changed/ …)
> > >
> > > I agree with Sebastian, this is a bug, see my original proposal:
> > > https://lore.kernel.org/all/20260217.lievaS8eeng8@digikod.net/
> > Mickaël,
> >
> > Just to make sure we're speaking of the same thing (I spotted a bug
> > shortly after replying to Sebastian).
> >
> > This is a potential UAF if the dom_other is freed before the access
> > check takes place correct?
>
> Yes
>
> >
> > dom_other = landlock_cred(other->sk_socket->file->f_cred)->domain;
> > unix_state_unlock(other);
> >
> > unmask_scoped_access(subject->domain, dom_other, &layer_masks,
> > fs_resolve_unix.fs);
> >
> > If the dom_other->usage reaches zero, then the domain could be
> > freed after the unix_state_unlock while we're checking access??
> >
> > (I guess I assumed the sock_hold on the @other would prevent the task
> > @other belongs to from being freed.)
> >
> > Would it be better to move the access check under the unix_state_lock or
> > to acquire another reference to the ruleset (or something else)?
>
> Because the unmask_scoped_access() only read a bounded array, it's
> simpler to unlock just after.
>
> The other alternatives would be to use an RCU lock or a new reference
> but I don't think it's worth it.
Thank you, Sebastian, Mickaël and Justin for spotting this!
I agree, holding the existing lock across the unmask_scoped_access()
call seems like the simplest solution. This function only walks a
previously loaded bounded memory structure, so it does not seem worth
switching to a different lock for that.
I'm changing my WIP for V7 to hold the unix_state_lock across that
function call.
–Günther
^ permalink raw reply
* [PATCH v2] fs: allow backing file code to open an O_PATH file with negative dentry
From: Amir Goldstein @ 2026-03-20 12:29 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, Miklos Szeredi, Paul Moore, Gao Xiang,
linux-security-module, selinux, linux-erofs, linux-fsdevel,
linux-unionfs, syzbot+f34aab278bf5d664e2be
The fields f_mapping and f_wb_err are irrelevant for the O_PATH file
in backing_file_user_path_file().
Create a dedicated helper kernel_path_file_open(), which skips all the
generic code in do_dentry_open() and does only the essentials, so that
the internal O_PATH file could be opened with a negative dentry.
This is needed for backing_tmpfile_open() to open a backing O_PATH
tmpfile before instantiating the dentry.
The callers of backing_tmpfile_open() are responsible for calling
backing_tmpfile_finish() after making the path positive.
Reported-by: syzbot+f34aab278bf5d664e2be@syzkaller.appspotmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
Christian,
This patch fixes the syzbot report [1] that the
backing_file_user_path_file() patch [2] introduces.
Following your feedback on v1, this version makes an effort to stay
out of the way of main vfs execution paths and restrict the changes
to backing_file users.
This still introduced a temporary state of an O_PATH file with negative
path, but only for a short time and only for backing_file users and
ones that use backing_tmpfile_open() (i.e. only overlayfs), so the
risk is minimal.
WDYT?
Thanks,
Amir.
Changes since v1:
- Create helper for internal O_PATH open with negative path
- Create backing_tmpfile_finish() API to fixup the negative path
[1] https://syzkaller.appspot.com/bug?extid=f34aab278bf5d664e2be
[2] https://lore.kernel.org/linux-fsdevel/20260318131258.1457101-1-amir73il@gmail.com/
fs/backing-file.c | 6 ++++++
fs/file_table.c | 14 ++++++++++++--
fs/internal.h | 7 +++++++
fs/open.c | 34 ++++++++++++++++++++++++++++++----
fs/overlayfs/dir.c | 2 ++
include/linux/backing-file.h | 1 +
6 files changed, 58 insertions(+), 6 deletions(-)
diff --git a/fs/backing-file.c b/fs/backing-file.c
index d0a64c2103907..3357d624eac96 100644
--- a/fs/backing-file.c
+++ b/fs/backing-file.c
@@ -80,6 +80,12 @@ struct file *backing_tmpfile_open(const struct path *user_path,
}
EXPORT_SYMBOL(backing_tmpfile_open);
+void backing_tmpfile_finish(struct file *file)
+{
+ backing_file_set_user_path_inode(file);
+}
+EXPORT_SYMBOL_GPL(backing_tmpfile_finish);
+
struct backing_aio {
struct kiocb iocb;
refcount_t ref;
diff --git a/fs/file_table.c b/fs/file_table.c
index e8b4eb2bbff85..a4d1064d50896 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -69,11 +69,21 @@ EXPORT_SYMBOL_GPL(backing_file_user_path_file);
int backing_file_open_user_path(struct file *f, const struct path *path)
{
- /* open an O_PATH file to reference the user path - should not fail */
- return WARN_ON(vfs_open(path, &backing_file(f)->user_path_file));
+ if (WARN_ON(!(f->f_mode & FMODE_BACKING)))
+ return -EIO;
+ kernel_path_file_open(&backing_file(f)->user_path_file, path);
+ return 0;
}
EXPORT_SYMBOL_GPL(backing_file_open_user_path);
+void backing_file_set_user_path_inode(struct file *f)
+{
+ if (WARN_ON(!(f->f_mode & FMODE_BACKING)))
+ return;
+ file_set_d_inode(&backing_file(f)->user_path_file);
+}
+EXPORT_SYMBOL_GPL(backing_file_set_user_path_inode);
+
static void destroy_file(struct file *f)
{
security_file_free(f);
diff --git a/fs/internal.h b/fs/internal.h
index 7c44a58627ba3..4a9e5e00678d9 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -109,6 +109,13 @@ struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred);
struct file *alloc_empty_backing_file(int flags, const struct cred *cred,
const struct cred *user_cred);
int backing_file_open_user_path(struct file *f, const struct path *path);
+void backing_file_set_user_path_inode(struct file *f);
+void kernel_path_file_open(struct file *f, const struct path *path);
+
+static inline void file_set_d_inode(struct file *f)
+{
+ f->f_inode = d_inode(f->f_path.dentry);
+}
static inline void file_put_write_access(struct file *file)
{
diff --git a/fs/open.c b/fs/open.c
index 91f1139591abe..a7b3b04cd9ae7 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -884,10 +884,38 @@ static inline int file_get_write_access(struct file *f)
return error;
}
+static const struct file_operations empty_fops = {};
+
+static void do_path_file_open(struct file *f)
+{
+ f->f_mode = FMODE_PATH | FMODE_OPENED;
+ file_set_fsnotify_mode(f, FMODE_NONOTIFY);
+ f->f_op = &empty_fops;
+}
+
+/**
+ * kernel_path_file_open - open an O_PATH file for kernel internal use
+ * @f: pre-allocated file with f_flags and f_cred initialized
+ * @path: path to reference (may have a negative dentry)
+ *
+ * Open a minimal O_PATH file that only references a path.
+ * Unlike vfs_open(), this does not require a positive dentry and does not
+ * set up f_mapping and other fields not needed for O_PATH.
+ * If path is negative at the time of this call, the caller is responsible for
+ * callingn backing_file_set_user_path_inode() after making the path positive.
+
+ */
+void kernel_path_file_open(struct file *f, const struct path *path)
+{
+ f->__f_path = *path;
+ path_get(&f->f_path);
+ file_set_d_inode(f);
+ do_path_file_open(f);
+}
+
static int do_dentry_open(struct file *f,
int (*open)(struct inode *, struct file *))
{
- static const struct file_operations empty_fops = {};
struct inode *inode = f->f_path.dentry->d_inode;
int error;
@@ -898,9 +926,7 @@ static int do_dentry_open(struct file *f,
f->f_sb_err = file_sample_sb_err(f);
if (unlikely(f->f_flags & O_PATH)) {
- f->f_mode = FMODE_PATH | FMODE_OPENED;
- file_set_fsnotify_mode(f, FMODE_NONOTIFY);
- f->f_op = &empty_fops;
+ do_path_file_open(f);
return 0;
}
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 5fd32ccc134d2..4010c87e10196 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -1408,6 +1408,8 @@ static int ovl_create_tmpfile(struct file *file, struct dentry *dentry,
err = ovl_instantiate(dentry, inode, newdentry, false, file);
if (!err) {
file->private_data = of;
+ /* user_path_file was opened with a negative path */
+ backing_tmpfile_finish(realfile);
} else {
dput(newdentry);
ovl_file_free(of);
diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h
index 8afba93f3ce07..52ac51ada6ff9 100644
--- a/include/linux/backing-file.h
+++ b/include/linux/backing-file.h
@@ -47,6 +47,7 @@ struct file *backing_tmpfile_open(const struct path *user_path,
const struct cred *user_cred, int flags,
const struct path *real_parentpath,
umode_t mode, const struct cred *cred);
+void backing_tmpfile_finish(struct file *file);
ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter,
struct kiocb *iocb, int flags,
struct backing_file_ctx *ctx);
--
2.53.0
^ permalink raw reply related
* Re: [PATCH v2] fs: allow backing file code to open an O_PATH file with negative dentry
From: Amir Goldstein @ 2026-03-20 12:33 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, Miklos Szeredi, Paul Moore, Gao Xiang,
linux-security-module, selinux, linux-erofs, linux-fsdevel,
linux-unionfs, syzbot+f34aab278bf5d664e2be
In-Reply-To: <20260320122918.1726043-1-amir73il@gmail.com>
On Fri, Mar 20, 2026 at 1:29 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> The fields f_mapping and f_wb_err are irrelevant for the O_PATH file
> in backing_file_user_path_file().
>
> Create a dedicated helper kernel_path_file_open(), which skips all the
> generic code in do_dentry_open() and does only the essentials, so that
> the internal O_PATH file could be opened with a negative dentry.
>
> This is needed for backing_tmpfile_open() to open a backing O_PATH
> tmpfile before instantiating the dentry.
>
> The callers of backing_tmpfile_open() are responsible for calling
> backing_tmpfile_finish() after making the path positive.
>
> Reported-by: syzbot+f34aab278bf5d664e2be@syzkaller.appspotmail.com
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>
> Christian,
>
> This patch fixes the syzbot report [1] that the
> backing_file_user_path_file() patch [2] introduces.
Sorry, forgot to mention that this patch applies on top of [2]
so the vfs CI is likely to fail to apply it.
The previous patch + fix are also available on my github
https://github.com/amir73il/linux/commits/user_path_file/
Thanks,
Amir.
>
> Following your feedback on v1, this version makes an effort to stay
> out of the way of main vfs execution paths and restrict the changes
> to backing_file users.
>
> This still introduced a temporary state of an O_PATH file with negative
> path, but only for a short time and only for backing_file users and
> ones that use backing_tmpfile_open() (i.e. only overlayfs), so the
> risk is minimal.
>
> WDYT?
>
> Thanks,
> Amir.
>
> Changes since v1:
> - Create helper for internal O_PATH open with negative path
> - Create backing_tmpfile_finish() API to fixup the negative path
>
> [1] https://syzkaller.appspot.com/bug?extid=f34aab278bf5d664e2be
> [2] https://lore.kernel.org/linux-fsdevel/20260318131258.1457101-1-amir73il@gmail.com/
>
> fs/backing-file.c | 6 ++++++
> fs/file_table.c | 14 ++++++++++++--
> fs/internal.h | 7 +++++++
> fs/open.c | 34 ++++++++++++++++++++++++++++++----
> fs/overlayfs/dir.c | 2 ++
> include/linux/backing-file.h | 1 +
> 6 files changed, 58 insertions(+), 6 deletions(-)
>
> diff --git a/fs/backing-file.c b/fs/backing-file.c
> index d0a64c2103907..3357d624eac96 100644
> --- a/fs/backing-file.c
> +++ b/fs/backing-file.c
> @@ -80,6 +80,12 @@ struct file *backing_tmpfile_open(const struct path *user_path,
> }
> EXPORT_SYMBOL(backing_tmpfile_open);
>
> +void backing_tmpfile_finish(struct file *file)
> +{
> + backing_file_set_user_path_inode(file);
> +}
> +EXPORT_SYMBOL_GPL(backing_tmpfile_finish);
> +
> struct backing_aio {
> struct kiocb iocb;
> refcount_t ref;
> diff --git a/fs/file_table.c b/fs/file_table.c
> index e8b4eb2bbff85..a4d1064d50896 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -69,11 +69,21 @@ EXPORT_SYMBOL_GPL(backing_file_user_path_file);
>
> int backing_file_open_user_path(struct file *f, const struct path *path)
> {
> - /* open an O_PATH file to reference the user path - should not fail */
> - return WARN_ON(vfs_open(path, &backing_file(f)->user_path_file));
> + if (WARN_ON(!(f->f_mode & FMODE_BACKING)))
> + return -EIO;
> + kernel_path_file_open(&backing_file(f)->user_path_file, path);
> + return 0;
> }
> EXPORT_SYMBOL_GPL(backing_file_open_user_path);
>
> +void backing_file_set_user_path_inode(struct file *f)
> +{
> + if (WARN_ON(!(f->f_mode & FMODE_BACKING)))
> + return;
> + file_set_d_inode(&backing_file(f)->user_path_file);
> +}
> +EXPORT_SYMBOL_GPL(backing_file_set_user_path_inode);
> +
> static void destroy_file(struct file *f)
> {
> security_file_free(f);
> diff --git a/fs/internal.h b/fs/internal.h
> index 7c44a58627ba3..4a9e5e00678d9 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -109,6 +109,13 @@ struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred);
> struct file *alloc_empty_backing_file(int flags, const struct cred *cred,
> const struct cred *user_cred);
> int backing_file_open_user_path(struct file *f, const struct path *path);
> +void backing_file_set_user_path_inode(struct file *f);
> +void kernel_path_file_open(struct file *f, const struct path *path);
> +
> +static inline void file_set_d_inode(struct file *f)
> +{
> + f->f_inode = d_inode(f->f_path.dentry);
> +}
>
> static inline void file_put_write_access(struct file *file)
> {
> diff --git a/fs/open.c b/fs/open.c
> index 91f1139591abe..a7b3b04cd9ae7 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -884,10 +884,38 @@ static inline int file_get_write_access(struct file *f)
> return error;
> }
>
> +static const struct file_operations empty_fops = {};
> +
> +static void do_path_file_open(struct file *f)
> +{
> + f->f_mode = FMODE_PATH | FMODE_OPENED;
> + file_set_fsnotify_mode(f, FMODE_NONOTIFY);
> + f->f_op = &empty_fops;
> +}
> +
> +/**
> + * kernel_path_file_open - open an O_PATH file for kernel internal use
> + * @f: pre-allocated file with f_flags and f_cred initialized
> + * @path: path to reference (may have a negative dentry)
> + *
> + * Open a minimal O_PATH file that only references a path.
> + * Unlike vfs_open(), this does not require a positive dentry and does not
> + * set up f_mapping and other fields not needed for O_PATH.
> + * If path is negative at the time of this call, the caller is responsible for
> + * callingn backing_file_set_user_path_inode() after making the path positive.
> +
> + */
> +void kernel_path_file_open(struct file *f, const struct path *path)
> +{
> + f->__f_path = *path;
> + path_get(&f->f_path);
> + file_set_d_inode(f);
> + do_path_file_open(f);
> +}
> +
> static int do_dentry_open(struct file *f,
> int (*open)(struct inode *, struct file *))
> {
> - static const struct file_operations empty_fops = {};
> struct inode *inode = f->f_path.dentry->d_inode;
> int error;
>
> @@ -898,9 +926,7 @@ static int do_dentry_open(struct file *f,
> f->f_sb_err = file_sample_sb_err(f);
>
> if (unlikely(f->f_flags & O_PATH)) {
> - f->f_mode = FMODE_PATH | FMODE_OPENED;
> - file_set_fsnotify_mode(f, FMODE_NONOTIFY);
> - f->f_op = &empty_fops;
> + do_path_file_open(f);
> return 0;
> }
>
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 5fd32ccc134d2..4010c87e10196 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -1408,6 +1408,8 @@ static int ovl_create_tmpfile(struct file *file, struct dentry *dentry,
> err = ovl_instantiate(dentry, inode, newdentry, false, file);
> if (!err) {
> file->private_data = of;
> + /* user_path_file was opened with a negative path */
> + backing_tmpfile_finish(realfile);
> } else {
> dput(newdentry);
> ovl_file_free(of);
> diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h
> index 8afba93f3ce07..52ac51ada6ff9 100644
> --- a/include/linux/backing-file.h
> +++ b/include/linux/backing-file.h
> @@ -47,6 +47,7 @@ struct file *backing_tmpfile_open(const struct path *user_path,
> const struct cred *user_cred, int flags,
> const struct path *real_parentpath,
> umode_t mode, const struct cred *cred);
> +void backing_tmpfile_finish(struct file *file);
> ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter,
> struct kiocb *iocb, int flags,
> struct backing_file_ctx *ctx);
> --
> 2.53.0
>
^ permalink raw reply
* Re: [PATCH v3 3/3] ima: Add support for staging measurements for deletion
From: Mimi Zohar @ 2026-03-20 12:41 UTC (permalink / raw)
To: steven chen, Roberto Sassu, corbet, skhan, dmitry.kasatkin,
eric.snowberg, paul, jmorris, serge
Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
gregorylumen, nramas, Roberto Sassu
In-Reply-To: <25e0a273-9044-4e0d-9812-0171ec99e1b7@linux.microsoft.com>
On Thu, 2026-03-19 at 14:31 -0700, steven chen wrote:
> > - Support for deleting N measurement records (and pre-pending the remaining
> > measurement records)
>
> Is there any problem to bring work of "stage" step together to the
> deletion step?
>
> "Trim N" method does everything that "staged" method can do, right?
> what's the "stage" method can do but "trim N" method can't do?
>
> in user space, if in "staged" state, no other user space agent can
> access the IMA measure list, right?
>
> Could you explain the benefit of bringing the "stage" step?
The performance improvement is because "staging" the IMA measurement list takes
the lock in order to move the measurement list pointer and then releases it.
New measurements can then be appended to a new measurement list. Deleting
records is done without taking the lock to walk the staged measurement list.
Without staging the measurement list, walking the measurement list to trim N
records requires taking and holding the lock. The performance is dependent on
the size of the measurement list.
Your question isn't really about "staging" the measurement list records, but
requiring a userspace signal to delete them. To answer that question, deleting
N records (third patch) could imply staging all the measurement records and
immediately deleting N records without an explicit userspace signal.
I expect the requested "documentation" patch will provide the motivation for the
delayed deletion of the measurement list.
Mimi
^ permalink raw reply
* [PATCH v7] backing_file: store an internal O_PATH user_path_file
From: Amir Goldstein @ 2026-03-20 13:53 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, Miklos Szeredi, Paul Moore, Gao Xiang,
linux-security-module, selinux, linux-erofs, linux-fsdevel,
linux-unionfs
Instead of storing the user_path, store an internal O_PATH file,
embedded in the backing_file struct for the user_path with a copy of
the original user file creds and with an independent a security context.
This internal O_PATH file is going to be used by LSM mprotect hooks
to check the permissions to mprotect against a copy of the original
mmaped file credentials.
The internal user_path_file is only exported as a const pointer and
its refcnt is initialized to FILE_REF_DEAD, because it is not an actual
refcounted object. The file_ref_init() helper was changed to accept
the FILE_REF_ constant instead of the fake +1 integer count.
The internal O_PATH file is opened using a new kernel helper
kernel_path_file_open(), which skips all the generic code in
do_dentry_open() and does only the essentials, so that the internal
O_PATH file could be opened with a negative dentry.
This is needed for backing_tmpfile_open() to open a backing O_PATH
tmpfile before instantiating the dentry.
The callers of backing_tmpfile_open() are responsible for calling
backing_tmpfile_finish() after making the path positive.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
Christian,
7th time is a charm (?).
Here is another try to introduce backing_file_user_path_file(),
after dealing with the the syzbot report [1] v6 [2] introduces.
Thanks,
Amir.
Changes since v6:
- Create helper for internal O_PATH open with negative path
- Create backing_tmpfile_finish() API to fixup the negative path
[1] https://syzkaller.appspot.com/bug?extid=f34aab278bf5d664e2be
[2] https://lore.kernel.org/linux-fsdevel/20260318131258.1457101-1-amir73il@gmail.com/
fs/backing-file.c | 32 +++++++++++-------
fs/erofs/ishare.c | 13 ++++++--
fs/file_table.c | 63 +++++++++++++++++++++++++++++-------
fs/fuse/passthrough.c | 3 +-
fs/internal.h | 12 +++++--
fs/open.c | 34 ++++++++++++++++---
fs/overlayfs/dir.c | 5 ++-
fs/overlayfs/file.c | 1 +
include/linux/backing-file.h | 30 +++++++++++++++--
include/linux/file_ref.h | 4 +--
10 files changed, 159 insertions(+), 38 deletions(-)
diff --git a/fs/backing-file.c b/fs/backing-file.c
index 45da8600d5644..77935c93333ab 100644
--- a/fs/backing-file.c
+++ b/fs/backing-file.c
@@ -11,6 +11,7 @@
#include <linux/fs.h>
#include <linux/backing-file.h>
#include <linux/splice.h>
+#include <linux/uio.h>
#include <linux/mm.h>
#include "internal.h"
@@ -18,9 +19,10 @@
/**
* backing_file_open - open a backing file for kernel internal use
* @user_path: path that the user reuqested to open
+ * @user_cred: credentials that the user used for open
* @flags: open flags
* @real_path: path of the backing file
- * @cred: credentials for open
+ * @cred: credentials for open of the backing file
*
* Open a backing file for a stackable filesystem (e.g., overlayfs).
* @user_path may be on the stackable filesystem and @real_path on the
@@ -29,20 +31,21 @@
* returned file into a container structure that also stores the stacked
* file's path, which can be retrieved using backing_file_user_path().
*/
-struct file *backing_file_open(const struct path *user_path, int flags,
+struct file *backing_file_open(const struct path *user_path,
+ const struct cred *user_cred, int flags,
const struct path *real_path,
const struct cred *cred)
{
struct file *f;
int error;
- f = alloc_empty_backing_file(flags, cred);
+ f = alloc_empty_backing_file(flags, cred, user_cred);
if (IS_ERR(f))
return f;
- path_get(user_path);
- backing_file_set_user_path(f, user_path);
- error = vfs_open(real_path, f);
+ error = backing_file_open_user_path(f, user_path);
+ if (!error)
+ error = vfs_open(real_path, f);
if (error) {
fput(f);
f = ERR_PTR(error);
@@ -52,7 +55,8 @@ struct file *backing_file_open(const struct path *user_path, int flags,
}
EXPORT_SYMBOL_GPL(backing_file_open);
-struct file *backing_tmpfile_open(const struct path *user_path, int flags,
+struct file *backing_tmpfile_open(const struct path *user_path,
+ const struct cred *user_cred, int flags,
const struct path *real_parentpath,
umode_t mode, const struct cred *cred)
{
@@ -60,13 +64,13 @@ struct file *backing_tmpfile_open(const struct path *user_path, int flags,
struct file *f;
int error;
- f = alloc_empty_backing_file(flags, cred);
+ f = alloc_empty_backing_file(flags, cred, user_cred);
if (IS_ERR(f))
return f;
- path_get(user_path);
- backing_file_set_user_path(f, user_path);
- error = vfs_tmpfile(real_idmap, real_parentpath, f, mode);
+ error = backing_file_open_user_path(f, user_path);
+ if (!error)
+ error = vfs_tmpfile(real_idmap, real_parentpath, f, mode);
if (error) {
fput(f);
f = ERR_PTR(error);
@@ -75,6 +79,12 @@ struct file *backing_tmpfile_open(const struct path *user_path, int flags,
}
EXPORT_SYMBOL(backing_tmpfile_open);
+void backing_tmpfile_finish(struct file *file)
+{
+ backing_file_set_user_path_inode(file);
+}
+EXPORT_SYMBOL_GPL(backing_tmpfile_finish);
+
struct backing_aio {
struct kiocb iocb;
refcount_t ref;
diff --git a/fs/erofs/ishare.c b/fs/erofs/ishare.c
index 829d50d5c717d..f3a5fb0bffaf0 100644
--- a/fs/erofs/ishare.c
+++ b/fs/erofs/ishare.c
@@ -103,18 +103,25 @@ static int erofs_ishare_file_open(struct inode *inode, struct file *file)
{
struct inode *sharedinode = EROFS_I(inode)->sharedinode;
struct file *realfile;
+ int err;
if (file->f_flags & O_DIRECT)
return -EINVAL;
- realfile = alloc_empty_backing_file(O_RDONLY|O_NOATIME, current_cred());
+ realfile = alloc_empty_backing_file(O_RDONLY|O_NOATIME, current_cred(),
+ file->f_cred);
if (IS_ERR(realfile))
return PTR_ERR(realfile);
+
+ err = backing_file_open_user_path(realfile, &file->f_path);
+ if (err) {
+ fput(realfile);
+ return err;
+ }
+
ihold(sharedinode);
realfile->f_op = &erofs_file_fops;
realfile->f_inode = sharedinode;
realfile->f_mapping = sharedinode->i_mapping;
- path_get(&file->f_path);
- backing_file_set_user_path(realfile, &file->f_path);
file_ra_state_init(&realfile->f_ra, file->f_mapping);
realfile->private_data = EROFS_I(inode);
diff --git a/fs/file_table.c b/fs/file_table.c
index aaa5faaace1e9..a4d1064d50896 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -27,6 +27,7 @@
#include <linux/task_work.h>
#include <linux/swap.h>
#include <linux/kmemleak.h>
+#include <linux/backing-file.h>
#include <linux/atomic.h>
@@ -43,11 +44,11 @@ static struct kmem_cache *bfilp_cachep __ro_after_init;
static struct percpu_counter nr_files __cacheline_aligned_in_smp;
-/* Container for backing file with optional user path */
+/* Container for backing file with optional user path file */
struct backing_file {
struct file file;
union {
- struct path user_path;
+ struct file user_path_file;
freeptr_t bf_freeptr;
};
};
@@ -56,24 +57,54 @@ struct backing_file {
const struct path *backing_file_user_path(const struct file *f)
{
- return &backing_file(f)->user_path;
+ return &backing_file(f)->user_path_file.f_path;
}
EXPORT_SYMBOL_GPL(backing_file_user_path);
-void backing_file_set_user_path(struct file *f, const struct path *path)
+const struct file *backing_file_user_path_file(const struct file *f)
{
- backing_file(f)->user_path = *path;
+ return &backing_file(f)->user_path_file;
}
-EXPORT_SYMBOL_GPL(backing_file_set_user_path);
+EXPORT_SYMBOL_GPL(backing_file_user_path_file);
-static inline void file_free(struct file *f)
+int backing_file_open_user_path(struct file *f, const struct path *path)
+{
+ if (WARN_ON(!(f->f_mode & FMODE_BACKING)))
+ return -EIO;
+ kernel_path_file_open(&backing_file(f)->user_path_file, path);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(backing_file_open_user_path);
+
+void backing_file_set_user_path_inode(struct file *f)
+{
+ if (WARN_ON(!(f->f_mode & FMODE_BACKING)))
+ return;
+ file_set_d_inode(&backing_file(f)->user_path_file);
+}
+EXPORT_SYMBOL_GPL(backing_file_set_user_path_inode);
+
+static void destroy_file(struct file *f)
{
security_file_free(f);
+ put_cred(f->f_cred);
+}
+
+static inline void file_free(struct file *f)
+{
+ destroy_file(f);
if (likely(!(f->f_mode & FMODE_NOACCOUNT)))
percpu_counter_dec(&nr_files);
- put_cred(f->f_cred);
if (unlikely(f->f_mode & FMODE_BACKING)) {
- path_put(backing_file_user_path(f));
+ struct file *user_path_file = &backing_file(f)->user_path_file;
+
+ /*
+ * no refcount on the user_path_file - they die together,
+ * so __fput() is not called for user_path_file. path_put()
+ * is the only relevant cleanup from __fput().
+ */
+ destroy_file(user_path_file);
+ path_put(&user_path_file->__f_path);
kmem_cache_free(bfilp_cachep, backing_file(f));
} else {
kmem_cache_free(filp_cachep, f);
@@ -201,7 +232,7 @@ static int init_file(struct file *f, int flags, const struct cred *cred)
* fget-rcu pattern users need to be able to handle spurious
* refcount bumps we should reinitialize the reused file first.
*/
- file_ref_init(&f->f_ref, 1);
+ file_ref_init(&f->f_ref, FILE_REF_ONEREF);
return 0;
}
@@ -290,7 +321,8 @@ struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred)
* This is only for kernel internal use, and the allocate file must not be
* installed into file tables or such.
*/
-struct file *alloc_empty_backing_file(int flags, const struct cred *cred)
+struct file *alloc_empty_backing_file(int flags, const struct cred *cred,
+ const struct cred *user_cred)
{
struct backing_file *ff;
int error;
@@ -305,6 +337,15 @@ struct file *alloc_empty_backing_file(int flags, const struct cred *cred)
return ERR_PTR(error);
}
+ error = init_file(&ff->user_path_file, O_PATH, user_cred);
+ /* user_path_file is not refcounterd - it dies with the backing file */
+ file_ref_init(&ff->user_path_file.f_ref, FILE_REF_DEAD);
+ if (unlikely(error)) {
+ destroy_file(&ff->file);
+ kmem_cache_free(bfilp_cachep, ff);
+ return ERR_PTR(error);
+ }
+
ff->file.f_mode |= FMODE_BACKING | FMODE_NOACCOUNT;
return &ff->file;
}
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 72de97c03d0ee..60018c6359342 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -10,6 +10,7 @@
#include <linux/file.h>
#include <linux/backing-file.h>
#include <linux/splice.h>
+#include <linux/uio.h>
static void fuse_file_accessed(struct file *file)
{
@@ -167,7 +168,7 @@ struct fuse_backing *fuse_passthrough_open(struct file *file, int backing_id)
goto out;
/* Allocate backing file per fuse file to store fuse path */
- backing_file = backing_file_open(&file->f_path, file->f_flags,
+ backing_file = backing_file_open(&file->f_path, file->f_cred, file->f_flags,
&fb->file->f_path, fb->cred);
err = PTR_ERR(backing_file);
if (IS_ERR(backing_file)) {
diff --git a/fs/internal.h b/fs/internal.h
index cbc384a1aa096..4a9e5e00678d9 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -106,8 +106,16 @@ extern void chroot_fs_refs(const struct path *, const struct path *);
*/
struct file *alloc_empty_file(int flags, const struct cred *cred);
struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred);
-struct file *alloc_empty_backing_file(int flags, const struct cred *cred);
-void backing_file_set_user_path(struct file *f, const struct path *path);
+struct file *alloc_empty_backing_file(int flags, const struct cred *cred,
+ const struct cred *user_cred);
+int backing_file_open_user_path(struct file *f, const struct path *path);
+void backing_file_set_user_path_inode(struct file *f);
+void kernel_path_file_open(struct file *f, const struct path *path);
+
+static inline void file_set_d_inode(struct file *f)
+{
+ f->f_inode = d_inode(f->f_path.dentry);
+}
static inline void file_put_write_access(struct file *file)
{
diff --git a/fs/open.c b/fs/open.c
index 91f1139591abe..a7b3b04cd9ae7 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -884,10 +884,38 @@ static inline int file_get_write_access(struct file *f)
return error;
}
+static const struct file_operations empty_fops = {};
+
+static void do_path_file_open(struct file *f)
+{
+ f->f_mode = FMODE_PATH | FMODE_OPENED;
+ file_set_fsnotify_mode(f, FMODE_NONOTIFY);
+ f->f_op = &empty_fops;
+}
+
+/**
+ * kernel_path_file_open - open an O_PATH file for kernel internal use
+ * @f: pre-allocated file with f_flags and f_cred initialized
+ * @path: path to reference (may have a negative dentry)
+ *
+ * Open a minimal O_PATH file that only references a path.
+ * Unlike vfs_open(), this does not require a positive dentry and does not
+ * set up f_mapping and other fields not needed for O_PATH.
+ * If path is negative at the time of this call, the caller is responsible for
+ * callingn backing_file_set_user_path_inode() after making the path positive.
+
+ */
+void kernel_path_file_open(struct file *f, const struct path *path)
+{
+ f->__f_path = *path;
+ path_get(&f->f_path);
+ file_set_d_inode(f);
+ do_path_file_open(f);
+}
+
static int do_dentry_open(struct file *f,
int (*open)(struct inode *, struct file *))
{
- static const struct file_operations empty_fops = {};
struct inode *inode = f->f_path.dentry->d_inode;
int error;
@@ -898,9 +926,7 @@ static int do_dentry_open(struct file *f,
f->f_sb_err = file_sample_sb_err(f);
if (unlikely(f->f_flags & O_PATH)) {
- f->f_mode = FMODE_PATH | FMODE_OPENED;
- file_set_fsnotify_mode(f, FMODE_NONOTIFY);
- f->f_op = &empty_fops;
+ do_path_file_open(f);
return 0;
}
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index ff3dbd1ca61f2..24e961f165a78 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -1374,7 +1374,8 @@ static int ovl_create_tmpfile(struct file *file, struct dentry *dentry,
return PTR_ERR(cred);
ovl_path_upper(dentry->d_parent, &realparentpath);
- realfile = backing_tmpfile_open(&file->f_path, flags, &realparentpath,
+ realfile = backing_tmpfile_open(&file->f_path, file->f_cred,
+ flags, &realparentpath,
mode, current_cred());
err = PTR_ERR_OR_ZERO(realfile);
pr_debug("tmpfile/open(%pd2, 0%o) = %i\n", realparentpath.dentry, mode, err);
@@ -1392,6 +1393,8 @@ static int ovl_create_tmpfile(struct file *file, struct dentry *dentry,
err = ovl_instantiate(dentry, inode, newdentry, false, file);
if (!err) {
file->private_data = of;
+ /* user_path_file was opened with a negative path */
+ backing_tmpfile_finish(realfile);
} else {
dput(newdentry);
ovl_file_free(of);
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 97bed2286030d..767c128407fcc 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -49,6 +49,7 @@ static struct file *ovl_open_realfile(const struct file *file,
flags &= ~O_NOATIME;
realfile = backing_file_open(file_user_path(file),
+ file_user_cred(file),
flags, realpath, current_cred());
}
}
diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h
index 1476a6ed1bfd7..52ac51ada6ff9 100644
--- a/include/linux/backing-file.h
+++ b/include/linux/backing-file.h
@@ -9,21 +9,45 @@
#define _LINUX_BACKING_FILE_H
#include <linux/file.h>
-#include <linux/uio.h>
#include <linux/fs.h>
+/*
+ * When mmapping a file on a stackable filesystem (e.g., overlayfs), the file
+ * stored in ->vm_file is a backing file whose f_inode is on the underlying
+ * filesystem.
+ *
+ * LSM can use file_user_path_file() to store context related to the user path
+ * that was opened and mmaped.
+ */
+const struct file *backing_file_user_path_file(const struct file *f);
+
+static inline const struct file *file_user_path_file(const struct file *f)
+{
+ if (f && unlikely(f->f_mode & FMODE_BACKING))
+ return backing_file_user_path_file(f);
+ return f;
+}
+
+static inline const struct cred *file_user_cred(const struct file *f)
+{
+ return file_user_path_file(f)->f_cred;
+}
+
struct backing_file_ctx {
const struct cred *cred;
void (*accessed)(struct file *file);
void (*end_write)(struct kiocb *iocb, ssize_t);
};
-struct file *backing_file_open(const struct path *user_path, int flags,
+struct file *backing_file_open(const struct path *user_path,
+ const struct cred *user_cred, int flags,
const struct path *real_path,
const struct cred *cred);
-struct file *backing_tmpfile_open(const struct path *user_path, int flags,
+struct file *backing_tmpfile_open(const struct path *user_path,
+ const struct cred *user_cred, int flags,
const struct path *real_parentpath,
umode_t mode, const struct cred *cred);
+void backing_tmpfile_finish(struct file *file);
ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter,
struct kiocb *iocb, int flags,
struct backing_file_ctx *ctx);
diff --git a/include/linux/file_ref.h b/include/linux/file_ref.h
index 31551e4cb8f34..c7512ce70f9c4 100644
--- a/include/linux/file_ref.h
+++ b/include/linux/file_ref.h
@@ -54,11 +54,11 @@ typedef struct {
/**
* file_ref_init - Initialize a file reference count
* @ref: Pointer to the reference count
- * @cnt: The initial reference count typically '1'
+ * @cnt: The initial reference count typically FILE_REF_ONEREF
*/
static inline void file_ref_init(file_ref_t *ref, unsigned long cnt)
{
- atomic_long_set(&ref->refcnt, cnt - 1);
+ atomic_long_set(&ref->refcnt, cnt);
}
bool __file_ref_put(file_ref_t *ref, unsigned long cnt);
--
2.53.0
^ permalink raw reply related
* Re: [PATCH v6 3/9] landlock: Control pathname UNIX domain socket resolution by path
From: Günther Noack @ 2026-03-20 16:15 UTC (permalink / raw)
To: Mickaël Salaün
Cc: John Johansen, Tingmao Wang, Justin Suess,
Sebastian Andrzej Siewior, Kuniyuki Iwashima, Jann Horn,
linux-security-module, Samasth Norway Ananda, Matthieu Buffet,
Mikhail Ivanov, konstantin.meskhidze, Demi Marie Obenour,
Alyssa Ross, Tahera Fahimi
In-Reply-To: <20260318.peecoo2Ooyep@digikod.net>
Hello!
On Wed, Mar 18, 2026 at 05:52:48PM +0100, Mickaël Salaün wrote:
> On Sun, Mar 15, 2026 at 11:21:44PM +0100, Günther Noack wrote:
> > * Add a new access right LANDLOCK_ACCESS_FS_RESOLVE_UNIX, which
> > controls the look up operations for named UNIX domain sockets. The
>
> lookup
Done.
> > resolution happens during connect() and sendmsg() (depending on
> > socket type).
> > * Hook into the path lookup in unix_find_bsd() in af_unix.c, using a
> > LSM hook. Make policy decisions based on the new access rights
> > * Increment the Landlock ABI version.
> > * Minor test adaptions to keep the tests working.
>
> adaptations
Done.
> > * Document the design rationale for scoped access rights,
> > and cross-reference it from the header documentation.
> >
> > With this access right, access is granted if either of the following
> > conditions is met:
> >
> > * The target socket's filesystem path was allow-listed using a
> > LANDLOCK_RULE_PATH_BENEATH rule, *or*:
> > * The target socket was created in the same Landlock domain in which
> > LANDLOCK_ACCESS_FS_RESOLVE_UNIX was restricted.
> >
> > In case of a denial, connect() and sendmsg() return EACCES, which is
> > the same error as it is returned if the user does not have the write
> > bit in the traditional Unix file system permissions of that file.
>
> UNIX
DONE
> > Document the (possible future) interaction between scoped flags and
> > other access rights in struct landlock_ruleset_attr, and summarize the
> > rationale, as discussed in code review leading up to [2].
> >
> > This feature was created with substantial discussion and input from
> > Justin Suess, Tingmao Wang and Mickaël Salaün.
> >
> > Cc: Tingmao Wang <m@maowtm.org>
> > Cc: Justin Suess <utilityemal77@gmail.com>
> > Cc: Mickaël Salaün <mic@digikod.net>
> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Cc: Kuniyuki Iwashima <kuniyu@google.com>
> > Suggested-by: Jann Horn <jannh@google.com>
> > Link[1]: https://github.com/landlock-lsm/linux/issues/36
> > Link[2]: https://lore.kernel.org/all/20260205.8531e4005118@gnoack.org/
> > Signed-off-by: Günther Noack <gnoack3000@gmail.com>
> > ---
> > Documentation/security/landlock.rst | 40 +++++++
> > include/uapi/linux/landlock.h | 19 ++++
> > security/landlock/access.h | 2 +-
> > security/landlock/audit.c | 1 +
> > security/landlock/fs.c | 110 ++++++++++++++++++-
> > security/landlock/limits.h | 2 +-
> > security/landlock/syscalls.c | 2 +-
> > tools/testing/selftests/landlock/base_test.c | 2 +-
> > tools/testing/selftests/landlock/fs_test.c | 5 +-
> > 9 files changed, 176 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> > index 3e4d4d04cfae..4bbe250a6829 100644
> > --- a/Documentation/security/landlock.rst
> > +++ b/Documentation/security/landlock.rst
> > @@ -89,6 +89,46 @@ this is required to keep access controls consistent over the whole system, and
> > this avoids unattended bypasses through file descriptor passing (i.e. confused
> > deputy attack).
> >
> > +.. _scoped-flags-interaction:
> > +
> > +Interaction between scoped flags and other access rights
> > +--------------------------------------------------------
> > +
> > +The ``scoped`` flags in ``struct landlock_ruleset_attr`` restrict the
> > +use of *outgoing* IPC from the created Landlock domain, while they
> > +permit reaching out to IPC endpoints *within* the created Landlock
> > +domain.
> > +
> > +In the future, scoped flags *may* interact with other access rights,
> > +e.g. so that abstract UNIX sockets can be allow-listed by name, or so
> > +that signals can be allow-listed by signal number or target process.
> > +
> > +When introducing ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``, we defined it to
> > +implicitly have the same scoping semantics as a
> > +``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` flag would have: connecting to
> > +UNIX sockets within the same domain (where
> > +``LANDLOCK_ACCESS_FS_RESOLVE_UNIX`` is used) is unconditionally
> > +allowed.
> > +
> > +The reasoning is:
> > +
> > +* Like other IPC mechanisms, connecting to named UNIX sockets in the
> > + same domain should be expected and harmless. (If needed, users can
> > + further refine their Landlock policies with nested domains or by
> > + restricting ``LANDLOCK_ACCESS_FS_MAKE_SOCK``.)
> > +* We reserve the option to still introduce
> > + ``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` in the future. (This would
> > + be useful if we wanted to have a Landlock rule to permit IPC access
> > + to other Landlock domains.)
> > +* But we can postpone the point in time when users have to deal with
> > + two interacting flags visible in the userspace API. (In particular,
> > + it is possible that it won't be needed in practice, in which case we
> > + can avoid the second flag altogether.)
> > +* If we *do* introduce ``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` in the
> > + future, setting this scoped flag in a ruleset does *not reduce* the
> > + restrictions, because access within the same scope is already
> > + allowed based on ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``.
> > +
> > Tests
> > =====
> >
> > diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> > index f88fa1f68b77..751e3c143cba 100644
> > --- a/include/uapi/linux/landlock.h
> > +++ b/include/uapi/linux/landlock.h
> > @@ -248,6 +248,24 @@ struct landlock_net_port_attr {
> > *
> > * This access right is available since the fifth version of the Landlock
> > * ABI.
> > + * - %LANDLOCK_ACCESS_FS_RESOLVE_UNIX: Look up pathname UNIX domain sockets
> > + * (:manpage:`unix(7)`). On UNIX domain sockets, this restricts both calls to
> > + * :manpage:`connect(2)` as well as calls to :manpage:`sendmsg(2)` with an
> > + * explicit recipient address.
> > + *
> > + * This access right only applies to connections to UNIX server sockets which
> > + * were created outside of the newly created Landlock domain (e.g. from within
> > + * a parent domain or from an unrestricted process). Newly created UNIX
> > + * servers within the same Landlock domain continue to be accessible. In this
> > + * regard, %LANDLOCK_ACCESS_RESOLVE_UNIX has the same semantics as the
>
> LANDLOCK_ACCESS_FS_RESOLVE_UNIX
Whoops, done.
> > + * ``LANDLOCK_SCOPE_*`` flags.
> > + *
> > + * If a resolve attempt is denied, the operation returns an ``EACCES`` error,
> > + * in line with other filesystem access rights (but different to denials for
> > + * abstract UNIX domain sockets).
>
> This access right is available since the ninth version of the Landlock ABI.
Thanks, added.
> > + *
> > + * The rationale for this design is described in
> > + * :ref:`Documentation/security/landlock.rst <scoped-flags-interaction>`.
> > *
> > * Whether an opened file can be truncated with :manpage:`ftruncate(2)` or used
> > * with `ioctl(2)` is determined during :manpage:`open(2)`, in the same way as
> > @@ -333,6 +351,7 @@ struct landlock_net_port_attr {
> > #define LANDLOCK_ACCESS_FS_REFER (1ULL << 13)
> > #define LANDLOCK_ACCESS_FS_TRUNCATE (1ULL << 14)
> > #define LANDLOCK_ACCESS_FS_IOCTL_DEV (1ULL << 15)
> > +#define LANDLOCK_ACCESS_FS_RESOLVE_UNIX (1ULL << 16)
> > /* clang-format on */
> >
> > /**
> > diff --git a/security/landlock/access.h b/security/landlock/access.h
> > index 42c95747d7bd..89dc8e7b93da 100644
> > --- a/security/landlock/access.h
> > +++ b/security/landlock/access.h
> > @@ -34,7 +34,7 @@
> > LANDLOCK_ACCESS_FS_IOCTL_DEV)
> > /* clang-format on */
> >
> > -typedef u16 access_mask_t;
> > +typedef u32 access_mask_t;
>
> This change and the underlying implications are not explained in the
> commit message, especially regarding the stack delta.
Thanks, will add it.
> > /* Makes sure all filesystem access rights can be stored. */
> > static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
> > diff --git a/security/landlock/audit.c b/security/landlock/audit.c
> > index 60ff217ab95b..8d0edf94037d 100644
> > --- a/security/landlock/audit.c
> > +++ b/security/landlock/audit.c
> > @@ -37,6 +37,7 @@ static const char *const fs_access_strings[] = {
> > [BIT_INDEX(LANDLOCK_ACCESS_FS_REFER)] = "fs.refer",
> > [BIT_INDEX(LANDLOCK_ACCESS_FS_TRUNCATE)] = "fs.truncate",
> > [BIT_INDEX(LANDLOCK_ACCESS_FS_IOCTL_DEV)] = "fs.ioctl_dev",
> > + [BIT_INDEX(LANDLOCK_ACCESS_FS_RESOLVE_UNIX)] = "fs.resolve_unix",
> > };
> >
> > static_assert(ARRAY_SIZE(fs_access_strings) == LANDLOCK_NUM_ACCESS_FS);
> > diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> > index 97065d51685a..0486f5ab06c9 100644
> > --- a/security/landlock/fs.c
> > +++ b/security/landlock/fs.c
> > @@ -27,6 +27,7 @@
> > #include <linux/lsm_hooks.h>
> > #include <linux/mount.h>
> > #include <linux/namei.h>
> > +#include <linux/net.h>
> > #include <linux/path.h>
> > #include <linux/pid.h>
> > #include <linux/rcupdate.h>
> > @@ -36,6 +37,7 @@
> > #include <linux/types.h>
> > #include <linux/wait_bit.h>
> > #include <linux/workqueue.h>
> > +#include <net/af_unix.h>
> > #include <uapi/linux/fiemap.h>
> > #include <uapi/linux/landlock.h>
> >
> > @@ -314,7 +316,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
> > LANDLOCK_ACCESS_FS_WRITE_FILE | \
> > LANDLOCK_ACCESS_FS_READ_FILE | \
> > LANDLOCK_ACCESS_FS_TRUNCATE | \
> > - LANDLOCK_ACCESS_FS_IOCTL_DEV)
> > + LANDLOCK_ACCESS_FS_IOCTL_DEV | \
> > + LANDLOCK_ACCESS_FS_RESOLVE_UNIX)
> > /* clang-format on */
> >
> > /*
> > @@ -1557,6 +1560,110 @@ static int hook_path_truncate(const struct path *const path)
> > return current_check_access_path(path, LANDLOCK_ACCESS_FS_TRUNCATE);
> > }
> >
> > +/**
> > + * unmask_scoped_access - Remove access right bits in @masks in all layers
> > + * where @client and @server have the same domain
> > + *
> > + * This does the same as domain_is_scoped(), but unmasks bits in @masks.
> > + * It can not return early as domain_is_scoped() does.
>
> I'd like a summary of your previous excellent explanation of
> unmask_scoped_access() in this comment.
Adding:
A scoped access for a given access right bit is allowed iff, for all
layer depths where the access bit is set, the client and server
domain are the same. This function clears the access rights @access
in @masks at all layer depths where the client and server domain are
the same, so that, when they are all cleared, the access is allowed.
It's not as detailed as drawing a picture in the other mail, but I
hope it helps.
> > + * @client: Client domain
> > + * @server: Server domain
> > + * @masks: Layer access masks to unmask
> > + * @access: Access bit that controls scoping
> > + */
> > +static void unmask_scoped_access(const struct landlock_ruleset *const client,
> > + const struct landlock_ruleset *const server,
> > + struct layer_access_masks *const masks,
> > + const access_mask_t access)
> > +{
> > + int client_layer, server_layer;
> > + const struct landlock_hierarchy *client_walker, *server_walker;
> > +
> > + /* This should not happen. */
> > + if (WARN_ON_ONCE(!client))
> > + return;
> > +
> > + /* Server has no Landlock domain; nothing to clear. */
> > + if (!server)
> > + return;
> > +
>
> Please also copy the BUILD_BUG_ON() from domain_is_scoped().
I don't understand what this check is good for. It says:
/*
* client_layer must be a signed integer with greater capacity
* than client->num_layers to ensure the following loop stops.
*/
BUILD_BUG_ON(sizeof(client_layer) > sizeof(client->num_layers));
For the loop to terminate, in my understanding, client_layer must be
able to store client->num_layers - 1 down to - 1, but that is anyway a
given since num_layers can't exceed 16 and client_layer is signed. It
seems that the termination of this would anyway be caught in our tests
as well?
Could you please clarify what this BUILD_BUG_ON() is trying to assert?
> > + client_layer = client->num_layers - 1;
> > + client_walker = client->hierarchy;
> > + server_layer = server->num_layers - 1;
> > + server_walker = server->hierarchy;
> > +
> > + /*
> > + * Clears the access bits at all layers where the client domain is the
> > + * same as the server domain. We start the walk at min(client_layer,
> > + * server_layer). The layer bits until there can not be cleared because
> > + * either the client or the server domain is missing.
> > + */
> > + for (; client_layer > server_layer; client_layer--)
> > + client_walker = client_walker->parent;
> > +
> > + for (; server_layer > client_layer; server_layer--)
> > + server_walker = server_walker->parent;
> > +
> > + for (; client_layer >= 0; client_layer--) {
> > + if (masks->access[client_layer] & access &&
> > + client_walker == server_walker)
> > + masks->access[client_layer] &= ~access;
> > +
> > + client_walker = client_walker->parent;
> > + server_walker = server_walker->parent;
> > + }
> > +}
> > +
> > +static int hook_unix_find(const struct path *const path, struct sock *other,
> > + int flags)
> > +{
> > + const struct landlock_ruleset *dom_other;
> > + const struct landlock_cred_security *subject;
> > + struct layer_access_masks layer_masks;
> > + struct landlock_request request = {};
> > + static const struct access_masks fs_resolve_unix = {
> > + .fs = LANDLOCK_ACCESS_FS_RESOLVE_UNIX,
> > + };
> > +
> > + /* Lookup for the purpose of saving coredumps is OK. */
> > + if (unlikely(flags & SOCK_COREDUMP))
> > + return 0;
> > +
> > + /* Access to the same (or a lower) domain is always allowed. */
>
> This comment is related to the unmask_scoped_access() call.
Thanks, I moved it down.
> > + subject = landlock_get_applicable_subject(current_cred(),
> > + fs_resolve_unix, NULL);
> > +
> > + if (!subject)
> > + return 0;
> > +
> > + if (!landlock_init_layer_masks(subject->domain, fs_resolve_unix.fs,
> > + &layer_masks, LANDLOCK_KEY_INODE))
>
> This case is not possible because landlock_get_applicable_subject()
> already check it. Other hooks just ignore the returned value in this
> case.
Hm, fair enough. I added a comment to explain why we are ignoring the
return value, as it wasn't as obvious to me. In the other places, we
are using the result of the landlock_init_layer_masks() function
(because in the generic case, it can be a subset of the original
access rights).
> > + return 0;
> > +
> > + /* Checks the layers in which we are connecting within the same domain. */
> > + unix_state_lock(other);
> > + if (unlikely(sock_flag(other, SOCK_DEAD) || !other->sk_socket ||
> > + !other->sk_socket->file)) {
> > + unix_state_unlock(other);
> > + return 0;
>
> Is it safe to not return -ECONNREFUSED?
Yes. My reasoning is:
In all three places where this gets called in af_unix.c (stream
connect, dgram connect, dgram send), these functions check for socket
death shortly after, and if they find the socket to be SOCK_DEAD, they
will *retry* the UNIX lookup. The code commentary about this says
that this is for a race condition where the VFS has "overslept" the
socket death, so I presume that the retry aims at getting a race-free
sitation on the next attempt.
Since sock_orphan() is a one-way teardown operation, when we observe
SOCK_DEAD in our hook, we can be sure that the caller will see it as
well when it does the same check a bit later after our hook.
If we *were* to return -ECONNREFUSED, the caller would immediately
return an error though, and it would not retry as it normally does
when it encounters this race condition. So we have to return 0 here.
> > + }
> > + dom_other = landlock_cred(other->sk_socket->file->f_cred)->domain;
> > + unix_state_unlock(other);
> > +
> > + unmask_scoped_access(subject->domain, dom_other, &layer_masks,
> > + fs_resolve_unix.fs);
>
> dom_other is not safe to use without the lock.
Thanks, fixed by extending the lock scope across that function call,
as discussed in other thread in more detail.
> > + /* Checks the connections to allow-listed paths. */
> > + if (is_access_to_paths_allowed(subject->domain, path,
> > + fs_resolve_unix.fs, &layer_masks,
> > + &request, NULL, 0, NULL, NULL, NULL))
> > + return 0;
> > +
> > + landlock_log_denial(subject, &request);
> > + return -EACCES;
> > +}
> > +
> > /* File hooks */
> >
> > /**
> > @@ -1834,6 +1941,7 @@ static struct security_hook_list landlock_hooks[] __ro_after_init = {
> > LSM_HOOK_INIT(path_unlink, hook_path_unlink),
> > LSM_HOOK_INIT(path_rmdir, hook_path_rmdir),
> > LSM_HOOK_INIT(path_truncate, hook_path_truncate),
> > + LSM_HOOK_INIT(unix_find, hook_unix_find),
> >
> > LSM_HOOK_INIT(file_alloc_security, hook_file_alloc_security),
> > LSM_HOOK_INIT(file_open, hook_file_open),
> > diff --git a/security/landlock/limits.h b/security/landlock/limits.h
> > index eb584f47288d..b454ad73b15e 100644
> > --- a/security/landlock/limits.h
> > +++ b/security/landlock/limits.h
> > @@ -19,7 +19,7 @@
> > #define LANDLOCK_MAX_NUM_LAYERS 16
> > #define LANDLOCK_MAX_NUM_RULES U32_MAX
> >
> > -#define LANDLOCK_LAST_ACCESS_FS LANDLOCK_ACCESS_FS_IOCTL_DEV
> > +#define LANDLOCK_LAST_ACCESS_FS LANDLOCK_ACCESS_FS_RESOLVE_UNIX
> > #define LANDLOCK_MASK_ACCESS_FS ((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
> > #define LANDLOCK_NUM_ACCESS_FS __const_hweight64(LANDLOCK_MASK_ACCESS_FS)
> >
> > diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> > index 3b33839b80c7..a6e23657f3ce 100644
> > --- a/security/landlock/syscalls.c
> > +++ b/security/landlock/syscalls.c
> > @@ -166,7 +166,7 @@ static const struct file_operations ruleset_fops = {
> > * If the change involves a fix that requires userspace awareness, also update
> > * the errata documentation in Documentation/userspace-api/landlock.rst .
> > */
> > -const int landlock_abi_version = 8;
> > +const int landlock_abi_version = 9;
> >
> > /**
> > * sys_landlock_create_ruleset - Create a new ruleset
> > diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
> > index 0fea236ef4bd..30d37234086c 100644
> > --- a/tools/testing/selftests/landlock/base_test.c
> > +++ b/tools/testing/selftests/landlock/base_test.c
> > @@ -76,7 +76,7 @@ TEST(abi_version)
> > const struct landlock_ruleset_attr ruleset_attr = {
> > .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
> > };
> > - ASSERT_EQ(8, landlock_create_ruleset(NULL, 0,
> > + ASSERT_EQ(9, landlock_create_ruleset(NULL, 0,
> > LANDLOCK_CREATE_RULESET_VERSION));
> >
> > ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
> > diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
> > index 968a91c927a4..b318627e7561 100644
> > --- a/tools/testing/selftests/landlock/fs_test.c
> > +++ b/tools/testing/selftests/landlock/fs_test.c
> > @@ -575,9 +575,10 @@ TEST_F_FORK(layout1, inval)
> > LANDLOCK_ACCESS_FS_WRITE_FILE | \
> > LANDLOCK_ACCESS_FS_READ_FILE | \
> > LANDLOCK_ACCESS_FS_TRUNCATE | \
> > - LANDLOCK_ACCESS_FS_IOCTL_DEV)
> > + LANDLOCK_ACCESS_FS_IOCTL_DEV | \
> > + LANDLOCK_ACCESS_FS_RESOLVE_UNIX)
> >
> > -#define ACCESS_LAST LANDLOCK_ACCESS_FS_IOCTL_DEV
> > +#define ACCESS_LAST LANDLOCK_ACCESS_FS_RESOLVE_UNIX
> >
> > #define ACCESS_ALL ( \
> > ACCESS_FILE | \
> > --
> > 2.53.0
> >
> >
–Günther
^ permalink raw reply
* Re: [PATCH v6 7/9] landlock/selftests: Check that coredump sockets stay unrestricted
From: Günther Noack @ 2026-03-20 16:44 UTC (permalink / raw)
To: Mickaël Salaün
Cc: John Johansen, linux-security-module, Tingmao Wang, Justin Suess,
Samasth Norway Ananda, Matthieu Buffet, Mikhail Ivanov,
konstantin.meskhidze, Demi Marie Obenour, Alyssa Ross, Jann Horn,
Tahera Fahimi, Sebastian Andrzej Siewior, Kuniyuki Iwashima
In-Reply-To: <20260318.bahDieJohf1u@digikod.net>
On Wed, Mar 18, 2026 at 05:53:59PM +0100, Mickaël Salaün wrote:
> On Sun, Mar 15, 2026 at 11:21:48PM +0100, Günther Noack wrote:
> > Even when a process is restricted with the new
> > LANDLOCK_ACCESS_FS_RESOLVE_SOCKET right, the kernel can continue
>
> LANDLOCK_ACCESS_FS_RESOLVE_UNIX (twice)
Thanks, well spotted! Fixed.
–Günther
^ permalink raw reply
* Re: [PATCH v3 3/3] ima: Add support for staging measurements for deletion
From: steven chen @ 2026-03-20 16:58 UTC (permalink / raw)
To: Mimi Zohar, Roberto Sassu, corbet, skhan, dmitry.kasatkin,
eric.snowberg, paul, jmorris, serge
Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
gregorylumen, nramas, Roberto Sassu, steven chen
In-Reply-To: <ffe1d4645a66a690892163be8e16c4b5d24a690d.camel@linux.ibm.com>
On 3/20/2026 5:41 AM, Mimi Zohar wrote:
> On Thu, 2026-03-19 at 14:31 -0700, steven chen wrote:
>
>>> - Support for deleting N measurement records (and pre-pending the remaining
>>> measurement records)
>> Is there any problem to bring work of "stage" step together to the
>> deletion step?
>>
>> "Trim N" method does everything that "staged" method can do, right?
>> what's the "stage" method can do but "trim N" method can't do?
>>
>> in user space, if in "staged" state, no other user space agent can
>> access the IMA measure list, right?
>>
>> Could you explain the benefit of bringing the "stage" step?
> The performance improvement is because "staging" the IMA measurement list takes
> the lock in order to move the measurement list pointer and then releases it.
> New measurements can then be appended to a new measurement list. Deleting
> records is done without taking the lock to walk the staged measurement list.
>
> Without staging the measurement list, walking the measurement list to trim N
> records requires taking and holding the lock. The performance is dependent on
> the size of the measurement list.
>
> Your question isn't really about "staging" the measurement list records, but
> requiring a userspace signal to delete them. To answer that question, deleting
> N records (third patch) could imply staging all the measurement records and
> immediately deleting N records without an explicit userspace signal.
>
> I expect the requested "documentation" patch will provide the motivation for the
> delayed deletion of the measurement list.
>
> Mimi
"Staging" is great on reducing kernel IMA measurement list locking time.
How about just do "stage N" entries and then delete the staged list in
one shot?
It means merge two APIs into one API
int ima_queue_stage(void)
int ima_queue_delete_staged(unsigned long req_value)
The kernel lock time will be the same. And user space lock time will be
reduced.
Thanks,
Steven
>
>
>
>
>
^ permalink raw reply
* Re: [PATCH v6 9/9] landlock: Document FS access right for pathname UNIX sockets
From: Günther Noack @ 2026-03-20 17:04 UTC (permalink / raw)
To: Mickaël Salaün
Cc: John Johansen, Justin Suess, linux-security-module, Tingmao Wang,
Samasth Norway Ananda, Matthieu Buffet, Mikhail Ivanov,
konstantin.meskhidze, Demi Marie Obenour, Alyssa Ross, Jann Horn,
Tahera Fahimi, Sebastian Andrzej Siewior, Kuniyuki Iwashima
In-Reply-To: <20260318.geom7Taxahpi@digikod.net>
On Wed, Mar 18, 2026 at 05:54:19PM +0100, Mickaël Salaün wrote:
> Please always add some minimal description.
Done.
> Also, as already requested, could you run the check-linux.sh all on each
> patch? That would avoid me to fix things like the date (which would now
> be OK because of the new patch in my next branch, but still).
Will do.
> On Sun, Mar 15, 2026 at 11:21:50PM +0100, Günther Noack wrote:
> > Cc: Justin Suess <utilityemal77@gmail.com>
> > Cc: Mickaël Salaün <mic@digikod.net>
> > Signed-off-by: Günther Noack <gnoack3000@gmail.com>
> > ---
> > Documentation/userspace-api/landlock.rst | 15 ++++++++++++++-
> > 1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> > index 13134bccdd39..e60ebd07c5cc 100644
> > --- a/Documentation/userspace-api/landlock.rst
> > +++ b/Documentation/userspace-api/landlock.rst
> > @@ -77,7 +77,8 @@ to be explicit about the denied-by-default access rights.
> > LANDLOCK_ACCESS_FS_MAKE_SYM |
> > LANDLOCK_ACCESS_FS_REFER |
> > LANDLOCK_ACCESS_FS_TRUNCATE |
> > - LANDLOCK_ACCESS_FS_IOCTL_DEV,
> > + LANDLOCK_ACCESS_FS_IOCTL_DEV |
> > + LANDLOCK_ACCESS_FS_RESOLVE_UNIX,
> > .handled_access_net =
> > LANDLOCK_ACCESS_NET_BIND_TCP |
> > LANDLOCK_ACCESS_NET_CONNECT_TCP,
> > @@ -127,6 +128,11 @@ version, and only use the available subset of access rights:
> > /* Removes LANDLOCK_SCOPE_* for ABI < 6 */
> > ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> > LANDLOCK_SCOPE_SIGNAL);
> > + __attribute__((fallthrough));
>
> Case 6 should be handled too:
>
> case 6 ... 8:
Thank you, good catch!
>
> > + case 7:
> > + case 8:
> > + /* Removes LANDLOCK_ACCESS_FS_RESOLVE_UNIX for ABI < 9 */
> > + ruleset_attr.handled_access_fs &= ~LANDLOCK_ACCESS_FS_RESOLVE_UNIX;
> > }
> >
-Günther
^ permalink raw reply
* Re: [PATCH v3 3/3] ima: Add support for staging measurements for deletion
From: steven chen @ 2026-03-20 17:24 UTC (permalink / raw)
To: Roberto Sassu, Mimi Zohar, corbet, skhan, dmitry.kasatkin,
eric.snowberg, paul, jmorris, serge
Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
gregorylumen, nramas, Roberto Sassu, steven chen
In-Reply-To: <332fc1447c03893988620189a40501cccaa8b4c5.camel@huaweicloud.com>
On 3/20/2026 10:10 AM, Roberto Sassu wrote:
> On Fri, 2026-03-20 at 09:58 -0700, steven chen wrote:
>> On 3/20/2026 5:41 AM, Mimi Zohar wrote:
>>> On Thu, 2026-03-19 at 14:31 -0700, steven chen wrote:
>>>
>>>>> - Support for deleting N measurement records (and pre-pending the remaining
>>>>> measurement records)
>>>> Is there any problem to bring work of "stage" step together to the
>>>> deletion step?
>>>>
>>>> "Trim N" method does everything that "staged" method can do, right?
>>>> what's the "stage" method can do but "trim N" method can't do?
>>>>
>>>> in user space, if in "staged" state, no other user space agent can
>>>> access the IMA measure list, right?
>>>>
>>>> Could you explain the benefit of bringing the "stage" step?
>>> The performance improvement is because "staging" the IMA measurement list takes
>>> the lock in order to move the measurement list pointer and then releases it.
>>> New measurements can then be appended to a new measurement list. Deleting
>>> records is done without taking the lock to walk the staged measurement list.
>>>
>>> Without staging the measurement list, walking the measurement list to trim N
>>> records requires taking and holding the lock. The performance is dependent on
>>> the size of the measurement list.
>>>
>>> Your question isn't really about "staging" the measurement list records, but
>>> requiring a userspace signal to delete them. To answer that question, deleting
>>> N records (third patch) could imply staging all the measurement records and
>>> immediately deleting N records without an explicit userspace signal.
>>>
>>> I expect the requested "documentation" patch will provide the motivation for the
>>> delayed deletion of the measurement list.
>>>
>>> Mimi
>> "Staging" is great on reducing kernel IMA measurement list locking time.
>>
>> How about just do "stage N" entries and then delete the staged list in
>> one shot?
>> It means merge two APIs into one API
>> int ima_queue_stage(void)
>> int ima_queue_delete_staged(unsigned long req_value)
>>
>> The kernel lock time will be the same. And user space lock time will be
>> reduced.
> It is not the same. The walk on the staged list is done without holding
> ima_extend_list_mutex.
>
> Roberto
Is it possible to merge two APIs work into one API?
int ima_queue_stage(void)
int ima_queue_delete_staged(unsigned long req_value)
Thank,
Steven
^ permalink raw reply
* Re: [PATCH v3 3/3] ima: Add support for staging measurements for deletion
From: Roberto Sassu @ 2026-03-20 17:10 UTC (permalink / raw)
To: steven chen, Mimi Zohar, corbet, skhan, dmitry.kasatkin,
eric.snowberg, paul, jmorris, serge
Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
gregorylumen, nramas, Roberto Sassu
In-Reply-To: <c9258708-2db2-4c08-998f-e67a681781da@linux.microsoft.com>
On Fri, 2026-03-20 at 09:58 -0700, steven chen wrote:
> On 3/20/2026 5:41 AM, Mimi Zohar wrote:
> > On Thu, 2026-03-19 at 14:31 -0700, steven chen wrote:
> >
> > > > - Support for deleting N measurement records (and pre-pending the remaining
> > > > measurement records)
> > > Is there any problem to bring work of "stage" step together to the
> > > deletion step?
> > >
> > > "Trim N" method does everything that "staged" method can do, right?
> > > what's the "stage" method can do but "trim N" method can't do?
> > >
> > > in user space, if in "staged" state, no other user space agent can
> > > access the IMA measure list, right?
> > >
> > > Could you explain the benefit of bringing the "stage" step?
> > The performance improvement is because "staging" the IMA measurement list takes
> > the lock in order to move the measurement list pointer and then releases it.
> > New measurements can then be appended to a new measurement list. Deleting
> > records is done without taking the lock to walk the staged measurement list.
> >
> > Without staging the measurement list, walking the measurement list to trim N
> > records requires taking and holding the lock. The performance is dependent on
> > the size of the measurement list.
> >
> > Your question isn't really about "staging" the measurement list records, but
> > requiring a userspace signal to delete them. To answer that question, deleting
> > N records (third patch) could imply staging all the measurement records and
> > immediately deleting N records without an explicit userspace signal.
> >
> > I expect the requested "documentation" patch will provide the motivation for the
> > delayed deletion of the measurement list.
> >
> > Mimi
>
> "Staging" is great on reducing kernel IMA measurement list locking time.
>
> How about just do "stage N" entries and then delete the staged list in
> one shot?
> It means merge two APIs into one API
> int ima_queue_stage(void)
> int ima_queue_delete_staged(unsigned long req_value)
>
> The kernel lock time will be the same. And user space lock time will be
> reduced.
It is not the same. The walk on the staged list is done without holding
ima_extend_list_mutex.
Roberto
^ permalink raw reply
* Re: [PATCH v3 3/3] ima: Add support for staging measurements for deletion
From: steven chen @ 2026-03-20 17:40 UTC (permalink / raw)
To: Roberto Sassu, Mimi Zohar, corbet, skhan, dmitry.kasatkin,
eric.snowberg, paul, jmorris, serge
Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
gregorylumen, nramas, Roberto Sassu
In-Reply-To: <a523c0cf05e10838bf08e4d2e9a05df402f4c9b0.camel@huaweicloud.com>
On 3/20/2026 10:26 AM, Roberto Sassu wrote:
> On Fri, 2026-03-20 at 10:24 -0700, steven chen wrote:
>> On 3/20/2026 10:10 AM, Roberto Sassu wrote:
>>> On Fri, 2026-03-20 at 09:58 -0700, steven chen wrote:
>>>> On 3/20/2026 5:41 AM, Mimi Zohar wrote:
>>>>> On Thu, 2026-03-19 at 14:31 -0700, steven chen wrote:
>>>>>
>>>>>>> - Support for deleting N measurement records (and pre-pending the remaining
>>>>>>> measurement records)
>>>>>> Is there any problem to bring work of "stage" step together to the
>>>>>> deletion step?
>>>>>>
>>>>>> "Trim N" method does everything that "staged" method can do, right?
>>>>>> what's the "stage" method can do but "trim N" method can't do?
>>>>>>
>>>>>> in user space, if in "staged" state, no other user space agent can
>>>>>> access the IMA measure list, right?
>>>>>>
>>>>>> Could you explain the benefit of bringing the "stage" step?
>>>>> The performance improvement is because "staging" the IMA measurement list takes
>>>>> the lock in order to move the measurement list pointer and then releases it.
>>>>> New measurements can then be appended to a new measurement list. Deleting
>>>>> records is done without taking the lock to walk the staged measurement list.
>>>>>
>>>>> Without staging the measurement list, walking the measurement list to trim N
>>>>> records requires taking and holding the lock. The performance is dependent on
>>>>> the size of the measurement list.
>>>>>
>>>>> Your question isn't really about "staging" the measurement list records, but
>>>>> requiring a userspace signal to delete them. To answer that question, deleting
>>>>> N records (third patch) could imply staging all the measurement records and
>>>>> immediately deleting N records without an explicit userspace signal.
>>>>>
>>>>> I expect the requested "documentation" patch will provide the motivation for the
>>>>> delayed deletion of the measurement list.
>>>>>
>>>>> Mimi
>>>> "Staging" is great on reducing kernel IMA measurement list locking time.
>>>>
>>>> How about just do "stage N" entries and then delete the staged list in
>>>> one shot?
>>>> It means merge two APIs into one API
>>>> int ima_queue_stage(void)
>>>> int ima_queue_delete_staged(unsigned long req_value)
>>>>
>>>> The kernel lock time will be the same. And user space lock time will be
>>>> reduced.
>>> It is not the same. The walk on the staged list is done without holding
>>> ima_extend_list_mutex.
>>>
>>> Roberto
>> Is it possible to merge two APIs work into one API?
>> int ima_queue_stage(void)
>> int ima_queue_delete_staged(unsigned long req_value)
> It will be done transparently for the user. IMA will call both
> functions for the same securityfs write.
>
> Roberto
If merge two APIs into one API, it will reduce user space measurement
list lock time.
Thanks,
Steven
^ permalink raw reply
* Re: [PATCH v3 3/3] ima: Add support for staging measurements for deletion
From: Roberto Sassu @ 2026-03-20 17:26 UTC (permalink / raw)
To: steven chen, Mimi Zohar, corbet, skhan, dmitry.kasatkin,
eric.snowberg, paul, jmorris, serge
Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
gregorylumen, nramas, Roberto Sassu
In-Reply-To: <8f66014c-d7c8-4a33-be7b-cfd945af4a3a@linux.microsoft.com>
On Fri, 2026-03-20 at 10:24 -0700, steven chen wrote:
> On 3/20/2026 10:10 AM, Roberto Sassu wrote:
> > On Fri, 2026-03-20 at 09:58 -0700, steven chen wrote:
> > > On 3/20/2026 5:41 AM, Mimi Zohar wrote:
> > > > On Thu, 2026-03-19 at 14:31 -0700, steven chen wrote:
> > > >
> > > > > > - Support for deleting N measurement records (and pre-pending the remaining
> > > > > > measurement records)
> > > > > Is there any problem to bring work of "stage" step together to the
> > > > > deletion step?
> > > > >
> > > > > "Trim N" method does everything that "staged" method can do, right?
> > > > > what's the "stage" method can do but "trim N" method can't do?
> > > > >
> > > > > in user space, if in "staged" state, no other user space agent can
> > > > > access the IMA measure list, right?
> > > > >
> > > > > Could you explain the benefit of bringing the "stage" step?
> > > > The performance improvement is because "staging" the IMA measurement list takes
> > > > the lock in order to move the measurement list pointer and then releases it.
> > > > New measurements can then be appended to a new measurement list. Deleting
> > > > records is done without taking the lock to walk the staged measurement list.
> > > >
> > > > Without staging the measurement list, walking the measurement list to trim N
> > > > records requires taking and holding the lock. The performance is dependent on
> > > > the size of the measurement list.
> > > >
> > > > Your question isn't really about "staging" the measurement list records, but
> > > > requiring a userspace signal to delete them. To answer that question, deleting
> > > > N records (third patch) could imply staging all the measurement records and
> > > > immediately deleting N records without an explicit userspace signal.
> > > >
> > > > I expect the requested "documentation" patch will provide the motivation for the
> > > > delayed deletion of the measurement list.
> > > >
> > > > Mimi
> > > "Staging" is great on reducing kernel IMA measurement list locking time.
> > >
> > > How about just do "stage N" entries and then delete the staged list in
> > > one shot?
> > > It means merge two APIs into one API
> > > int ima_queue_stage(void)
> > > int ima_queue_delete_staged(unsigned long req_value)
> > >
> > > The kernel lock time will be the same. And user space lock time will be
> > > reduced.
> > It is not the same. The walk on the staged list is done without holding
> > ima_extend_list_mutex.
> >
> > Roberto
>
> Is it possible to merge two APIs work into one API?
> int ima_queue_stage(void)
> int ima_queue_delete_staged(unsigned long req_value)
It will be done transparently for the user. IMA will call both
functions for the same securityfs write.
Roberto
^ permalink raw reply
* Re: [PATCH v6 3/9] landlock: Control pathname UNIX domain socket resolution by path
From: Mickaël Salaün @ 2026-03-20 17:51 UTC (permalink / raw)
To: Günther Noack
Cc: John Johansen, Tingmao Wang, Justin Suess,
Sebastian Andrzej Siewior, Kuniyuki Iwashima, Jann Horn,
linux-security-module, Samasth Norway Ananda, Matthieu Buffet,
Mikhail Ivanov, konstantin.meskhidze, Demi Marie Obenour,
Alyssa Ross, Tahera Fahimi
In-Reply-To: <20260320.f59cddcb6c6b@gnoack.org>
On Fri, Mar 20, 2026 at 05:15:40PM +0100, Günther Noack wrote:
> Hello!
>
> On Wed, Mar 18, 2026 at 05:52:48PM +0100, Mickaël Salaün wrote:
> > On Sun, Mar 15, 2026 at 11:21:44PM +0100, Günther Noack wrote:
> > > * Add a new access right LANDLOCK_ACCESS_FS_RESOLVE_UNIX, which
> > > controls the look up operations for named UNIX domain sockets. The
> >
> > lookup
>
> Done.
>
>
> > > resolution happens during connect() and sendmsg() (depending on
> > > socket type).
> > > * Hook into the path lookup in unix_find_bsd() in af_unix.c, using a
> > > LSM hook. Make policy decisions based on the new access rights
> > > * Increment the Landlock ABI version.
> > > * Minor test adaptions to keep the tests working.
> >
> > adaptations
>
> Done.
>
>
> > > * Document the design rationale for scoped access rights,
> > > and cross-reference it from the header documentation.
> > >
> > > With this access right, access is granted if either of the following
> > > conditions is met:
> > >
> > > * The target socket's filesystem path was allow-listed using a
> > > LANDLOCK_RULE_PATH_BENEATH rule, *or*:
> > > * The target socket was created in the same Landlock domain in which
> > > LANDLOCK_ACCESS_FS_RESOLVE_UNIX was restricted.
> > >
> > > In case of a denial, connect() and sendmsg() return EACCES, which is
> > > the same error as it is returned if the user does not have the write
> > > bit in the traditional Unix file system permissions of that file.
> >
> > UNIX
>
> DONE
>
>
> > > Document the (possible future) interaction between scoped flags and
> > > other access rights in struct landlock_ruleset_attr, and summarize the
> > > rationale, as discussed in code review leading up to [2].
> > >
> > > This feature was created with substantial discussion and input from
> > > Justin Suess, Tingmao Wang and Mickaël Salaün.
> > >
> > > Cc: Tingmao Wang <m@maowtm.org>
> > > Cc: Justin Suess <utilityemal77@gmail.com>
> > > Cc: Mickaël Salaün <mic@digikod.net>
> > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > Cc: Kuniyuki Iwashima <kuniyu@google.com>
> > > Suggested-by: Jann Horn <jannh@google.com>
> > > Link[1]: https://github.com/landlock-lsm/linux/issues/36
> > > Link[2]: https://lore.kernel.org/all/20260205.8531e4005118@gnoack.org/
> > > Signed-off-by: Günther Noack <gnoack3000@gmail.com>
> > > ---
> > > Documentation/security/landlock.rst | 40 +++++++
> > > include/uapi/linux/landlock.h | 19 ++++
> > > security/landlock/access.h | 2 +-
> > > security/landlock/audit.c | 1 +
> > > security/landlock/fs.c | 110 ++++++++++++++++++-
> > > security/landlock/limits.h | 2 +-
> > > security/landlock/syscalls.c | 2 +-
> > > tools/testing/selftests/landlock/base_test.c | 2 +-
> > > tools/testing/selftests/landlock/fs_test.c | 5 +-
> > > 9 files changed, 176 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> > > index 3e4d4d04cfae..4bbe250a6829 100644
> > > --- a/Documentation/security/landlock.rst
> > > +++ b/Documentation/security/landlock.rst
> > > @@ -89,6 +89,46 @@ this is required to keep access controls consistent over the whole system, and
> > > this avoids unattended bypasses through file descriptor passing (i.e. confused
> > > deputy attack).
> > >
> > > +.. _scoped-flags-interaction:
> > > +
> > > +Interaction between scoped flags and other access rights
> > > +--------------------------------------------------------
> > > +
> > > +The ``scoped`` flags in ``struct landlock_ruleset_attr`` restrict the
> > > +use of *outgoing* IPC from the created Landlock domain, while they
> > > +permit reaching out to IPC endpoints *within* the created Landlock
> > > +domain.
> > > +
> > > +In the future, scoped flags *may* interact with other access rights,
> > > +e.g. so that abstract UNIX sockets can be allow-listed by name, or so
> > > +that signals can be allow-listed by signal number or target process.
> > > +
> > > +When introducing ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``, we defined it to
> > > +implicitly have the same scoping semantics as a
> > > +``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` flag would have: connecting to
> > > +UNIX sockets within the same domain (where
> > > +``LANDLOCK_ACCESS_FS_RESOLVE_UNIX`` is used) is unconditionally
> > > +allowed.
> > > +
> > > +The reasoning is:
> > > +
> > > +* Like other IPC mechanisms, connecting to named UNIX sockets in the
> > > + same domain should be expected and harmless. (If needed, users can
> > > + further refine their Landlock policies with nested domains or by
> > > + restricting ``LANDLOCK_ACCESS_FS_MAKE_SOCK``.)
> > > +* We reserve the option to still introduce
> > > + ``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` in the future. (This would
> > > + be useful if we wanted to have a Landlock rule to permit IPC access
> > > + to other Landlock domains.)
> > > +* But we can postpone the point in time when users have to deal with
> > > + two interacting flags visible in the userspace API. (In particular,
> > > + it is possible that it won't be needed in practice, in which case we
> > > + can avoid the second flag altogether.)
> > > +* If we *do* introduce ``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` in the
> > > + future, setting this scoped flag in a ruleset does *not reduce* the
> > > + restrictions, because access within the same scope is already
> > > + allowed based on ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``.
> > > +
> > > Tests
> > > =====
> > >
> > > diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> > > index f88fa1f68b77..751e3c143cba 100644
> > > --- a/include/uapi/linux/landlock.h
> > > +++ b/include/uapi/linux/landlock.h
> > > @@ -248,6 +248,24 @@ struct landlock_net_port_attr {
> > > *
> > > * This access right is available since the fifth version of the Landlock
> > > * ABI.
> > > + * - %LANDLOCK_ACCESS_FS_RESOLVE_UNIX: Look up pathname UNIX domain sockets
> > > + * (:manpage:`unix(7)`). On UNIX domain sockets, this restricts both calls to
> > > + * :manpage:`connect(2)` as well as calls to :manpage:`sendmsg(2)` with an
> > > + * explicit recipient address.
> > > + *
> > > + * This access right only applies to connections to UNIX server sockets which
> > > + * were created outside of the newly created Landlock domain (e.g. from within
> > > + * a parent domain or from an unrestricted process). Newly created UNIX
> > > + * servers within the same Landlock domain continue to be accessible. In this
> > > + * regard, %LANDLOCK_ACCESS_RESOLVE_UNIX has the same semantics as the
> >
> > LANDLOCK_ACCESS_FS_RESOLVE_UNIX
>
> Whoops, done.
>
>
> > > + * ``LANDLOCK_SCOPE_*`` flags.
> > > + *
> > > + * If a resolve attempt is denied, the operation returns an ``EACCES`` error,
> > > + * in line with other filesystem access rights (but different to denials for
> > > + * abstract UNIX domain sockets).
> >
> > This access right is available since the ninth version of the Landlock ABI.
>
> Thanks, added.
>
>
> > > + *
> > > + * The rationale for this design is described in
> > > + * :ref:`Documentation/security/landlock.rst <scoped-flags-interaction>`.
> > > *
> > > * Whether an opened file can be truncated with :manpage:`ftruncate(2)` or used
> > > * with `ioctl(2)` is determined during :manpage:`open(2)`, in the same way as
> > > @@ -333,6 +351,7 @@ struct landlock_net_port_attr {
> > > #define LANDLOCK_ACCESS_FS_REFER (1ULL << 13)
> > > #define LANDLOCK_ACCESS_FS_TRUNCATE (1ULL << 14)
> > > #define LANDLOCK_ACCESS_FS_IOCTL_DEV (1ULL << 15)
> > > +#define LANDLOCK_ACCESS_FS_RESOLVE_UNIX (1ULL << 16)
> > > /* clang-format on */
> > >
> > > /**
> > > diff --git a/security/landlock/access.h b/security/landlock/access.h
> > > index 42c95747d7bd..89dc8e7b93da 100644
> > > --- a/security/landlock/access.h
> > > +++ b/security/landlock/access.h
> > > @@ -34,7 +34,7 @@
> > > LANDLOCK_ACCESS_FS_IOCTL_DEV)
> > > /* clang-format on */
> > >
> > > -typedef u16 access_mask_t;
> > > +typedef u32 access_mask_t;
> >
> > This change and the underlying implications are not explained in the
> > commit message, especially regarding the stack delta.
>
> Thanks, will add it.
>
>
> > > /* Makes sure all filesystem access rights can be stored. */
> > > static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
> > > diff --git a/security/landlock/audit.c b/security/landlock/audit.c
> > > index 60ff217ab95b..8d0edf94037d 100644
> > > --- a/security/landlock/audit.c
> > > +++ b/security/landlock/audit.c
> > > @@ -37,6 +37,7 @@ static const char *const fs_access_strings[] = {
> > > [BIT_INDEX(LANDLOCK_ACCESS_FS_REFER)] = "fs.refer",
> > > [BIT_INDEX(LANDLOCK_ACCESS_FS_TRUNCATE)] = "fs.truncate",
> > > [BIT_INDEX(LANDLOCK_ACCESS_FS_IOCTL_DEV)] = "fs.ioctl_dev",
> > > + [BIT_INDEX(LANDLOCK_ACCESS_FS_RESOLVE_UNIX)] = "fs.resolve_unix",
> > > };
> > >
> > > static_assert(ARRAY_SIZE(fs_access_strings) == LANDLOCK_NUM_ACCESS_FS);
> > > diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> > > index 97065d51685a..0486f5ab06c9 100644
> > > --- a/security/landlock/fs.c
> > > +++ b/security/landlock/fs.c
> > > @@ -27,6 +27,7 @@
> > > #include <linux/lsm_hooks.h>
> > > #include <linux/mount.h>
> > > #include <linux/namei.h>
> > > +#include <linux/net.h>
> > > #include <linux/path.h>
> > > #include <linux/pid.h>
> > > #include <linux/rcupdate.h>
> > > @@ -36,6 +37,7 @@
> > > #include <linux/types.h>
> > > #include <linux/wait_bit.h>
> > > #include <linux/workqueue.h>
> > > +#include <net/af_unix.h>
> > > #include <uapi/linux/fiemap.h>
> > > #include <uapi/linux/landlock.h>
> > >
> > > @@ -314,7 +316,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
> > > LANDLOCK_ACCESS_FS_WRITE_FILE | \
> > > LANDLOCK_ACCESS_FS_READ_FILE | \
> > > LANDLOCK_ACCESS_FS_TRUNCATE | \
> > > - LANDLOCK_ACCESS_FS_IOCTL_DEV)
> > > + LANDLOCK_ACCESS_FS_IOCTL_DEV | \
> > > + LANDLOCK_ACCESS_FS_RESOLVE_UNIX)
> > > /* clang-format on */
> > >
> > > /*
> > > @@ -1557,6 +1560,110 @@ static int hook_path_truncate(const struct path *const path)
> > > return current_check_access_path(path, LANDLOCK_ACCESS_FS_TRUNCATE);
> > > }
> > >
> > > +/**
> > > + * unmask_scoped_access - Remove access right bits in @masks in all layers
> > > + * where @client and @server have the same domain
> > > + *
> > > + * This does the same as domain_is_scoped(), but unmasks bits in @masks.
> > > + * It can not return early as domain_is_scoped() does.
> >
> > I'd like a summary of your previous excellent explanation of
> > unmask_scoped_access() in this comment.
>
> Adding:
>
> A scoped access for a given access right bit is allowed iff, for all
> layer depths where the access bit is set, the client and server
> domain are the same. This function clears the access rights @access
> in @masks at all layer depths where the client and server domain are
> the same, so that, when they are all cleared, the access is allowed.
>
> It's not as detailed as drawing a picture in the other mail, but I
> hope it helps.
Good
>
>
> > > + * @client: Client domain
> > > + * @server: Server domain
> > > + * @masks: Layer access masks to unmask
> > > + * @access: Access bit that controls scoping
> > > + */
> > > +static void unmask_scoped_access(const struct landlock_ruleset *const client,
> > > + const struct landlock_ruleset *const server,
> > > + struct layer_access_masks *const masks,
> > > + const access_mask_t access)
> > > +{
> > > + int client_layer, server_layer;
> > > + const struct landlock_hierarchy *client_walker, *server_walker;
> > > +
> > > + /* This should not happen. */
> > > + if (WARN_ON_ONCE(!client))
> > > + return;
> > > +
> > > + /* Server has no Landlock domain; nothing to clear. */
> > > + if (!server)
> > > + return;
> > > +
> >
> > Please also copy the BUILD_BUG_ON() from domain_is_scoped().
>
> I don't understand what this check is good for. It says:
>
> /*
> * client_layer must be a signed integer with greater capacity
> * than client->num_layers to ensure the following loop stops.
> */
> BUILD_BUG_ON(sizeof(client_layer) > sizeof(client->num_layers));
>
> For the loop to terminate, in my understanding, client_layer must be
> able to store client->num_layers - 1 down to - 1, but that is anyway a
> given since num_layers can't exceed 16 and client_layer is signed. It
> seems that the termination of this would anyway be caught in our tests
> as well?
>
> Could you please clarify what this BUILD_BUG_ON() is trying to assert?
The intent was to make sure client_layer is indeed an int and not an
unsigned int for instance. Hopefully tests catch that but using a
build-time assert catch the potential issue and document it. Also, it
would be weird to not have the same checks in both copies of code.
>
>
> > > + client_layer = client->num_layers - 1;
> > > + client_walker = client->hierarchy;
> > > + server_layer = server->num_layers - 1;
> > > + server_walker = server->hierarchy;
> > > +
> > > + /*
> > > + * Clears the access bits at all layers where the client domain is the
> > > + * same as the server domain. We start the walk at min(client_layer,
> > > + * server_layer). The layer bits until there can not be cleared because
> > > + * either the client or the server domain is missing.
> > > + */
> > > + for (; client_layer > server_layer; client_layer--)
> > > + client_walker = client_walker->parent;
> > > +
> > > + for (; server_layer > client_layer; server_layer--)
> > > + server_walker = server_walker->parent;
> > > +
> > > + for (; client_layer >= 0; client_layer--) {
> > > + if (masks->access[client_layer] & access &&
> > > + client_walker == server_walker)
> > > + masks->access[client_layer] &= ~access;
> > > +
> > > + client_walker = client_walker->parent;
> > > + server_walker = server_walker->parent;
> > > + }
> > > +}
> > > +
> > > +static int hook_unix_find(const struct path *const path, struct sock *other,
> > > + int flags)
> > > +{
> > > + const struct landlock_ruleset *dom_other;
> > > + const struct landlock_cred_security *subject;
> > > + struct layer_access_masks layer_masks;
> > > + struct landlock_request request = {};
> > > + static const struct access_masks fs_resolve_unix = {
> > > + .fs = LANDLOCK_ACCESS_FS_RESOLVE_UNIX,
> > > + };
> > > +
> > > + /* Lookup for the purpose of saving coredumps is OK. */
> > > + if (unlikely(flags & SOCK_COREDUMP))
> > > + return 0;
> > > +
> > > + /* Access to the same (or a lower) domain is always allowed. */
> >
> > This comment is related to the unmask_scoped_access() call.
>
> Thanks, I moved it down.
>
>
> > > + subject = landlock_get_applicable_subject(current_cred(),
> > > + fs_resolve_unix, NULL);
> > > +
> > > + if (!subject)
> > > + return 0;
> > > +
> > > + if (!landlock_init_layer_masks(subject->domain, fs_resolve_unix.fs,
> > > + &layer_masks, LANDLOCK_KEY_INODE))
> >
> > This case is not possible because landlock_get_applicable_subject()
> > already check it. Other hooks just ignore the returned value in this
> > case.
>
> Hm, fair enough. I added a comment to explain why we are ignoring the
> return value, as it wasn't as obvious to me. In the other places, we
> are using the result of the landlock_init_layer_masks() function
> (because in the generic case, it can be a subset of the original
> access rights).
Another way to document it would be to use a WARN_ON_ONCE(), but that
would not be very useful in this case and add code which cannot be
tested/covered.
>
>
> > > + return 0;
> > > +
> > > + /* Checks the layers in which we are connecting within the same domain. */
> > > + unix_state_lock(other);
> > > + if (unlikely(sock_flag(other, SOCK_DEAD) || !other->sk_socket ||
> > > + !other->sk_socket->file)) {
> > > + unix_state_unlock(other);
> > > + return 0;
> >
> > Is it safe to not return -ECONNREFUSED?
>
> Yes. My reasoning is:
>
> In all three places where this gets called in af_unix.c (stream
> connect, dgram connect, dgram send), these functions check for socket
> death shortly after, and if they find the socket to be SOCK_DEAD, they
> will *retry* the UNIX lookup. The code commentary about this says
> that this is for a race condition where the VFS has "overslept" the
> socket death, so I presume that the retry aims at getting a race-free
> sitation on the next attempt.
>
> Since sock_orphan() is a one-way teardown operation, when we observe
> SOCK_DEAD in our hook, we can be sure that the caller will see it as
> well when it does the same check a bit later after our hook.
>
> If we *were* to return -ECONNREFUSED, the caller would immediately
> return an error though, and it would not retry as it normally does
> when it encounters this race condition. So we have to return 0 here.
OK, sound good.
>
>
> > > + }
> > > + dom_other = landlock_cred(other->sk_socket->file->f_cred)->domain;
> > > + unix_state_unlock(other);
> > > +
> > > + unmask_scoped_access(subject->domain, dom_other, &layer_masks,
> > > + fs_resolve_unix.fs);
> >
> > dom_other is not safe to use without the lock.
>
> Thanks, fixed by extending the lock scope across that function call,
> as discussed in other thread in more detail.
>
>
> > > + /* Checks the connections to allow-listed paths. */
> > > + if (is_access_to_paths_allowed(subject->domain, path,
> > > + fs_resolve_unix.fs, &layer_masks,
> > > + &request, NULL, 0, NULL, NULL, NULL))
> > > + return 0;
> > > +
> > > + landlock_log_denial(subject, &request);
> > > + return -EACCES;
> > > +}
> > > +
> > > /* File hooks */
> > >
> > > /**
> > > @@ -1834,6 +1941,7 @@ static struct security_hook_list landlock_hooks[] __ro_after_init = {
> > > LSM_HOOK_INIT(path_unlink, hook_path_unlink),
> > > LSM_HOOK_INIT(path_rmdir, hook_path_rmdir),
> > > LSM_HOOK_INIT(path_truncate, hook_path_truncate),
> > > + LSM_HOOK_INIT(unix_find, hook_unix_find),
> > >
> > > LSM_HOOK_INIT(file_alloc_security, hook_file_alloc_security),
> > > LSM_HOOK_INIT(file_open, hook_file_open),
> > > diff --git a/security/landlock/limits.h b/security/landlock/limits.h
> > > index eb584f47288d..b454ad73b15e 100644
> > > --- a/security/landlock/limits.h
> > > +++ b/security/landlock/limits.h
> > > @@ -19,7 +19,7 @@
> > > #define LANDLOCK_MAX_NUM_LAYERS 16
> > > #define LANDLOCK_MAX_NUM_RULES U32_MAX
> > >
> > > -#define LANDLOCK_LAST_ACCESS_FS LANDLOCK_ACCESS_FS_IOCTL_DEV
> > > +#define LANDLOCK_LAST_ACCESS_FS LANDLOCK_ACCESS_FS_RESOLVE_UNIX
> > > #define LANDLOCK_MASK_ACCESS_FS ((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
> > > #define LANDLOCK_NUM_ACCESS_FS __const_hweight64(LANDLOCK_MASK_ACCESS_FS)
> > >
> > > diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> > > index 3b33839b80c7..a6e23657f3ce 100644
> > > --- a/security/landlock/syscalls.c
> > > +++ b/security/landlock/syscalls.c
> > > @@ -166,7 +166,7 @@ static const struct file_operations ruleset_fops = {
> > > * If the change involves a fix that requires userspace awareness, also update
> > > * the errata documentation in Documentation/userspace-api/landlock.rst .
> > > */
> > > -const int landlock_abi_version = 8;
> > > +const int landlock_abi_version = 9;
> > >
> > > /**
> > > * sys_landlock_create_ruleset - Create a new ruleset
> > > diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
> > > index 0fea236ef4bd..30d37234086c 100644
> > > --- a/tools/testing/selftests/landlock/base_test.c
> > > +++ b/tools/testing/selftests/landlock/base_test.c
> > > @@ -76,7 +76,7 @@ TEST(abi_version)
> > > const struct landlock_ruleset_attr ruleset_attr = {
> > > .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
> > > };
> > > - ASSERT_EQ(8, landlock_create_ruleset(NULL, 0,
> > > + ASSERT_EQ(9, landlock_create_ruleset(NULL, 0,
> > > LANDLOCK_CREATE_RULESET_VERSION));
> > >
> > > ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
> > > diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
> > > index 968a91c927a4..b318627e7561 100644
> > > --- a/tools/testing/selftests/landlock/fs_test.c
> > > +++ b/tools/testing/selftests/landlock/fs_test.c
> > > @@ -575,9 +575,10 @@ TEST_F_FORK(layout1, inval)
> > > LANDLOCK_ACCESS_FS_WRITE_FILE | \
> > > LANDLOCK_ACCESS_FS_READ_FILE | \
> > > LANDLOCK_ACCESS_FS_TRUNCATE | \
> > > - LANDLOCK_ACCESS_FS_IOCTL_DEV)
> > > + LANDLOCK_ACCESS_FS_IOCTL_DEV | \
> > > + LANDLOCK_ACCESS_FS_RESOLVE_UNIX)
> > >
> > > -#define ACCESS_LAST LANDLOCK_ACCESS_FS_IOCTL_DEV
> > > +#define ACCESS_LAST LANDLOCK_ACCESS_FS_RESOLVE_UNIX
> > >
> > > #define ACCESS_ALL ( \
> > > ACCESS_FILE | \
> > > --
> > > 2.53.0
> > >
> > >
>
> –Günther
>
^ permalink raw reply
* Re: [PATCH v3 0/8] module: Move 'struct module_signature' to UAPI
From: Nicolas Schier @ 2026-03-20 20:06 UTC (permalink / raw)
To: Thomas Weißschuh
Cc: David Howells, David Woodhouse, Luis Chamberlain, Petr Pavlu,
Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Heiko Carstens,
Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Mimi Zohar, Roberto Sassu, Dmitry Kasatkin,
Eric Snowberg, Paul Moore, James Morris, Serge E. Hallyn,
Nathan Chancellor, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Shuah Khan, keyrings, linux-kernel,
linux-modules, linux-s390, linux-integrity, linux-security-module,
linux-kbuild, bpf, linux-kselftest
In-Reply-To: <20260305-module-signature-uapi-v3-0-92f45ea6028c@linutronix.de>
[-- Attachment #1: Type: text/plain, Size: 2929 bytes --]
On Thu, Mar 05, 2026 at 10:31:36AM +0100, Thomas Weißschuh wrote:
> This structure definition is used outside the kernel proper.
> For example in kmod and the kernel build environment.
>
> To allow reuse, move it to a new UAPI header.
>
> While it is not a true UAPI, it is a common practice to have
> non-UAPI interface definitions in the kernel's UAPI headers.
>
> This came up as part of my CONFIG_MODULE_HASHES series [0].
> But it is useful on its own and so we get it out of the way.
>
> [0] https://lore.kernel.org/lkml/aZ3OfJJSJgfOb0rJ@levanger/
>
> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> ---
> Changes in v3:
> - Also adapt the include path for the custom sign-file rule in the bpf selftests.
> (My manual run of BPF CI still fails, due to an BUG() on s390,
> I don't see how this is due to this patch)
> - Link to v2: https://lore.kernel.org/r/20260305-module-signature-uapi-v2-0-dc4d81129dee@linutronix.de
>
> Changes in v2:
> - Drop spurious definition of MODULE_SIGNATURE_TYPE_MERKLE.
> - s/modules/module/ in two patch subjects.
> - Pick up review tags.
> - Link to v1: https://lore.kernel.org/r/20260302-module-signature-uapi-v1-0-207d955e0d69@linutronix.de
>
> ---
> Thomas Weißschuh (8):
> extract-cert: drop unused definition of PKEY_ID_PKCS7
> module: Drop unused signature types
> module: Give 'enum pkey_id_type' a more specific name
> module: Give MODULE_SIG_STRING a more descriptive name
> module: Move 'struct module_signature' to UAPI
> tools uapi headers: add linux/module_signature.h
> sign-file: use 'struct module_signature' from the UAPI headers
> selftests/bpf: verify_pkcs7_sig: Use 'struct module_signature' from the UAPI headers
>
> arch/s390/kernel/machine_kexec_file.c | 6 ++--
> certs/extract-cert.c | 2 --
> include/linux/module_signature.h | 30 +---------------
> include/uapi/linux/module_signature.h | 41 ++++++++++++++++++++++
> kernel/module/signing.c | 4 +--
> kernel/module_signature.c | 2 +-
> scripts/Makefile | 1 +
> scripts/sign-file.c | 19 +++-------
> security/integrity/ima/ima_modsig.c | 6 ++--
> tools/include/uapi/linux/module_signature.h | 41 ++++++++++++++++++++++
> tools/testing/selftests/bpf/Makefile | 1 +
> .../selftests/bpf/prog_tests/verify_pkcs7_sig.c | 28 ++-------------
> 12 files changed, 101 insertions(+), 80 deletions(-)
> ---
> base-commit: 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f
> change-id: 20260302-module-signature-uapi-61fa80b1e2bb
>
Thanks for these patches!
For the whole series:
Reviewed-by: Nicolas Schier <nsc@kernel.org>
--
Nicolas
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH v6 3/9] landlock: Control pathname UNIX domain socket resolution by path
From: Günther Noack @ 2026-03-20 22:25 UTC (permalink / raw)
To: Mickaël Salaün
Cc: John Johansen, Tingmao Wang, Justin Suess,
Sebastian Andrzej Siewior, Kuniyuki Iwashima, Jann Horn,
linux-security-module, Samasth Norway Ananda, Matthieu Buffet,
Mikhail Ivanov, konstantin.meskhidze, Demi Marie Obenour,
Alyssa Ross, Tahera Fahimi
In-Reply-To: <20260320.eez3sheeThul@digikod.net>
On Fri, Mar 20, 2026 at 06:51:13PM +0100, Mickaël Salaün wrote:
> On Fri, Mar 20, 2026 at 05:15:40PM +0100, Günther Noack wrote:
> > On Wed, Mar 18, 2026 at 05:52:48PM +0100, Mickaël Salaün wrote:
> > > On Sun, Mar 15, 2026 at 11:21:44PM +0100, Günther Noack wrote:
> > > > + * @client: Client domain
> > > > + * @server: Server domain
> > > > + * @masks: Layer access masks to unmask
> > > > + * @access: Access bit that controls scoping
> > > > + */
> > > > +static void unmask_scoped_access(const struct landlock_ruleset *const client,
> > > > + const struct landlock_ruleset *const server,
> > > > + struct layer_access_masks *const masks,
> > > > + const access_mask_t access)
> > > > +{
> > > > + int client_layer, server_layer;
> > > > + const struct landlock_hierarchy *client_walker, *server_walker;
> > > > +
> > > > + /* This should not happen. */
> > > > + if (WARN_ON_ONCE(!client))
> > > > + return;
> > > > +
> > > > + /* Server has no Landlock domain; nothing to clear. */
> > > > + if (!server)
> > > > + return;
> > > > +
> > >
> > > Please also copy the BUILD_BUG_ON() from domain_is_scoped().
> >
> > I don't understand what this check is good for. It says:
> >
> > /*
> > * client_layer must be a signed integer with greater capacity
> > * than client->num_layers to ensure the following loop stops.
> > */
> > BUILD_BUG_ON(sizeof(client_layer) > sizeof(client->num_layers));
> >
> > For the loop to terminate, in my understanding, client_layer must be
> > able to store client->num_layers - 1 down to - 1, but that is anyway a
> > given since num_layers can't exceed 16 and client_layer is signed. It
> > seems that the termination of this would anyway be caught in our tests
> > as well?
> >
> > Could you please clarify what this BUILD_BUG_ON() is trying to assert?
>
> The intent was to make sure client_layer is indeed an int and not an
> unsigned int for instance. Hopefully tests catch that but using a
> build-time assert catch the potential issue and document it. Also, it
> would be weird to not have the same checks in both copies of code.
It isn't clear to me why the BUILD_BUG_ON is checking based on the
sizeof() of these variables then. The number of bytes in an integer
type is independent of its signedness, after all. Should we rather do
this maybe?
/*
* client_layer must be a signed integer to ensure the following
* loop stops.
*/
BUILD_BUG_ON(!is_signed_type(typeof(client_layer)));
(Although that is also a bit of a triviality given that the same
variable is being defined as a signed integer just a few lines above,
but at least it is very explicit about it.)
I'd drop the remark about the capacity, as even a signed 8-bit integer
is large enough to hold the layer indices and the -1.
Does that sound reasonable? I can do it in the other function as well
if you want to keep things consistent.
–Günther
^ permalink raw reply
* Re: [PATCH v6 3/9] landlock: Control pathname UNIX domain socket resolution by path
From: Mickaël Salaün @ 2026-03-21 9:09 UTC (permalink / raw)
To: Günther Noack
Cc: John Johansen, Tingmao Wang, Justin Suess,
Sebastian Andrzej Siewior, Kuniyuki Iwashima, Jann Horn,
linux-security-module, Samasth Norway Ananda, Matthieu Buffet,
Mikhail Ivanov, konstantin.meskhidze, Demi Marie Obenour,
Alyssa Ross, Tahera Fahimi, Kees Cook
In-Reply-To: <20260320.ee35c8125f6f@gnoack.org>
On Fri, Mar 20, 2026 at 11:25:04PM +0100, Günther Noack wrote:
> On Fri, Mar 20, 2026 at 06:51:13PM +0100, Mickaël Salaün wrote:
> > On Fri, Mar 20, 2026 at 05:15:40PM +0100, Günther Noack wrote:
> > > On Wed, Mar 18, 2026 at 05:52:48PM +0100, Mickaël Salaün wrote:
> > > > On Sun, Mar 15, 2026 at 11:21:44PM +0100, Günther Noack wrote:
> > > > > + * @client: Client domain
> > > > > + * @server: Server domain
> > > > > + * @masks: Layer access masks to unmask
> > > > > + * @access: Access bit that controls scoping
> > > > > + */
> > > > > +static void unmask_scoped_access(const struct landlock_ruleset *const client,
> > > > > + const struct landlock_ruleset *const server,
> > > > > + struct layer_access_masks *const masks,
> > > > > + const access_mask_t access)
> > > > > +{
> > > > > + int client_layer, server_layer;
> > > > > + const struct landlock_hierarchy *client_walker, *server_walker;
> > > > > +
> > > > > + /* This should not happen. */
> > > > > + if (WARN_ON_ONCE(!client))
> > > > > + return;
> > > > > +
> > > > > + /* Server has no Landlock domain; nothing to clear. */
> > > > > + if (!server)
> > > > > + return;
> > > > > +
> > > >
> > > > Please also copy the BUILD_BUG_ON() from domain_is_scoped().
> > >
> > > I don't understand what this check is good for. It says:
> > >
> > > /*
> > > * client_layer must be a signed integer with greater capacity
> > > * than client->num_layers to ensure the following loop stops.
> > > */
> > > BUILD_BUG_ON(sizeof(client_layer) > sizeof(client->num_layers));
> > >
> > > For the loop to terminate, in my understanding, client_layer must be
> > > able to store client->num_layers - 1 down to - 1, but that is anyway a
> > > given since num_layers can't exceed 16 and client_layer is signed. It
> > > seems that the termination of this would anyway be caught in our tests
> > > as well?
> > >
> > > Could you please clarify what this BUILD_BUG_ON() is trying to assert?
> >
> > The intent was to make sure client_layer is indeed an int and not an
> > unsigned int for instance. Hopefully tests catch that but using a
> > build-time assert catch the potential issue and document it. Also, it
> > would be weird to not have the same checks in both copies of code.
>
> It isn't clear to me why the BUILD_BUG_ON is checking based on the
> sizeof() of these variables then. The number of bytes in an integer
> type is independent of its signedness, after all. Should we rather do
> this maybe?
>
> /*
> * client_layer must be a signed integer to ensure the following
> * loop stops.
> */
> BUILD_BUG_ON(!is_signed_type(typeof(client_layer)));
I didn't know about this macro, looks good.
>
> (Although that is also a bit of a triviality given that the same
> variable is being defined as a signed integer just a few lines above,
> but at least it is very explicit about it.)
Yeah, I know, but I added this canary check because I was bitten by a
bug. Even if it might be trivial now, it might help when working on
other parts, and it's just a build-time check that also serves as doc.
If in doubt, let's keep build-time checks that were most likely added
for a good reason. I prefer to have build-time errors than run-time
ones.
>
> I'd drop the remark about the capacity, as even a signed 8-bit integer
> is large enough to hold the layer indices and the -1.
Large enough for now... All these checks are useful to easily spot
issues if/when the current invariant change (e.g. if one day we extend
the number of max layers). Integer C types can silently cast to other
integer types (with different capacity or sign/unsigned), which might be
the source of bugs. So yeah, please keep the current capacity check and
add the sign one, it won't do any harm.
>
> Does that sound reasonable? I can do it in the other function as well
> if you want to keep things consistent.
Yes, please update the other function as well.
^ permalink raw reply
* Re: [PATCH 6/7] tomoyo: Convert from sb_mount to granular mount hooks
From: Tetsuo Handa @ 2026-03-21 12:54 UTC (permalink / raw)
To: Song Liu, viro
Cc: paul, jmorris, serge, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn, herton,
kernel-team, selinux, apparmor, linux-fsdevel,
linux-security-module
In-Reply-To: <20260318184400.3502908-7-song@kernel.org>
On 2026/03/19 3:43, Song Liu wrote:
> Replace tomoyo_sb_mount() with granular mount hooks. Each hook
> reconstructs the MS_* flags expected by tomoyo_mount_permission()
> using the original flags parameter where available.
>
> Key changes:
> - mount_bind: passes the pre-resolved source path to
> tomoyo_mount_acl() via a new dev_path parameter, instead of
> re-resolving dev_name via kern_path(). This eliminates a TOCTOU
> vulnerability.
> - mount_new, mount_remount, mount_reconfigure: use the original
> mount(2) flags for policy matching.
> - mount_move: passes pre-resolved paths for both source and
> destination.
> - mount_change_type: passes raw ms_flags directly.
>
> Also removes the unused data_page parameter from
> tomoyo_mount_permission().
>
> Code generated with the assistance of Claude, reviewed by human.
>
> Signed-off-by: Song Liu <song@kernel.org>
Basically OK. One question to Al Viro.
Do you still refuse the idea of resolving dev_path argument before calling LSM hooks
(the proposal you NAKed at https://lkml.kernel.org/r/20250709102410.GU1880847@ZenIV )
despite this series removes security_sb_mount() and security_move_mount() hooks?
> diff --git a/security/tomoyo/mount.c b/security/tomoyo/mount.c
> index 322dfd188ada..82ffe7d02814 100644
> --- a/security/tomoyo/mount.c
> +++ b/security/tomoyo/mount.c
> @@ -70,6 +70,7 @@ static bool tomoyo_check_mount_acl(struct tomoyo_request_info *r,
> * @dir: Pointer to "struct path".
> * @type: Name of filesystem type.
> * @flags: Mount options.
> + * @dev_path: Pre-resolved device/source path. Maybe NULL.
I guess that we can avoid passing maybe-NULL dev_name if Al Viro can accept
resolving maybe-NULL dev_path argument before calling LSM hooks.
^ permalink raw reply
* Re: [PATCH 6/7] tomoyo: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-03-22 1:06 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Song Liu, viro@zeniv.linux.org.uk, paul@paul-moore.com,
jmorris@namei.org, serge@hallyn.com, brauner@kernel.org,
jack@suse.cz, john.johansen@canonical.com,
stephen.smalley.work@gmail.com, omosnace@redhat.com,
mic@digikod.net, gnoack@google.com, takedakn@nttdata.co.jp,
herton@canonical.com, Kernel Team, selinux@vger.kernel.org,
apparmor@lists.ubuntu.com, linux-fsdevel@vger.kernel.org,
linux-security-module@vger.kernel.org
In-Reply-To: <b720f521-e930-4f35-9505-1bfdf9e2818c@I-love.SAKURA.ne.jp>
> On Mar 21, 2026, at 5:54 AM, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>
> On 2026/03/19 3:43, Song Liu wrote:
>> Replace tomoyo_sb_mount() with granular mount hooks. Each hook
>> reconstructs the MS_* flags expected by tomoyo_mount_permission()
>> using the original flags parameter where available.
>>
>> Key changes:
>> - mount_bind: passes the pre-resolved source path to
>> tomoyo_mount_acl() via a new dev_path parameter, instead of
>> re-resolving dev_name via kern_path(). This eliminates a TOCTOU
>> vulnerability.
>> - mount_new, mount_remount, mount_reconfigure: use the original
>> mount(2) flags for policy matching.
>> - mount_move: passes pre-resolved paths for both source and
>> destination.
>> - mount_change_type: passes raw ms_flags directly.
>>
>> Also removes the unused data_page parameter from
>> tomoyo_mount_permission().
>>
>> Code generated with the assistance of Claude, reviewed by human.
>>
>> Signed-off-by: Song Liu <song@kernel.org>
>
> Basically OK. One question to Al Viro.
>
> Do you still refuse the idea of resolving dev_path argument before calling LSM hooks
> (the proposal you NAKed at https://lkml.kernel.org/r/20250709102410.GU1880847@ZenIV )
> despite this series removes security_sb_mount() and security_move_mount() hooks?
>
>> diff --git a/security/tomoyo/mount.c b/security/tomoyo/mount.c
>> index 322dfd188ada..82ffe7d02814 100644
>> --- a/security/tomoyo/mount.c
>> +++ b/security/tomoyo/mount.c
>> @@ -70,6 +70,7 @@ static bool tomoyo_check_mount_acl(struct tomoyo_request_info *r,
>> * @dir: Pointer to "struct path".
>> * @type: Name of filesystem type.
>> * @flags: Mount options.
>> + * @dev_path: Pre-resolved device/source path. Maybe NULL.
>
> I guess that we can avoid passing maybe-NULL dev_name if Al Viro can accept
> resolving maybe-NULL dev_path argument before calling LSM hooks.
If I understand the code correctly, tomoyo records requested_dev_name for
new mount. namespace.c, OTOH, does NOT do kern_path() for new mount. This
is why we keep the maybe-NULL dev_name here. I personally think this is
the best approach without changing tomoyo behavior.
Thanks,
Song
^ permalink raw reply
* Re: [PATCH 6/7] tomoyo: Convert from sb_mount to granular mount hooks
From: Tetsuo Handa @ 2026-03-22 10:46 UTC (permalink / raw)
To: Song Liu, viro@zeniv.linux.org.uk
Cc: Song Liu, paul@paul-moore.com, jmorris@namei.org,
serge@hallyn.com, brauner@kernel.org, jack@suse.cz,
john.johansen@canonical.com, stephen.smalley.work@gmail.com,
omosnace@redhat.com, mic@digikod.net, gnoack@google.com,
takedakn@nttdata.co.jp, herton@canonical.com, Kernel Team,
selinux@vger.kernel.org, apparmor@lists.ubuntu.com,
linux-fsdevel@vger.kernel.org,
linux-security-module@vger.kernel.org
In-Reply-To: <4DF5C4A8-7C92-4F76-9B34-2262089E7289@meta.com>
On 2026/03/22 10:06, Song Liu wrote:
>>> @@ -70,6 +70,7 @@ static bool tomoyo_check_mount_acl(struct tomoyo_request_info *r,
>>> * @dir: Pointer to "struct path".
>>> * @type: Name of filesystem type.
>>> * @flags: Mount options.
>>> + * @dev_path: Pre-resolved device/source path. Maybe NULL.
>>
>> I guess that we can avoid passing maybe-NULL dev_name if Al Viro can accept
>> resolving maybe-NULL dev_path argument before calling LSM hooks.
>
> If I understand the code correctly, tomoyo records requested_dev_name for
> new mount. namespace.c, OTOH, does NOT do kern_path() for new mount. This
> is why we keep the maybe-NULL dev_name here. I personally think this is
> the best approach without changing tomoyo behavior.
Well, namespace.c does kern_path() for new mount, but it happens a bit later after
security_mount_new() was called.
do_new_mount_fc() => fc_mount() => vfs_get_tree() => fc->ops->get_tree() == e.g. ext4_get_tree()
=> get_tree_bdev() => get_tree_bdev_flags() => lookup_bdev() => kern_path()
@@ -3835,6 +3855,9 @@ static int do_new_mount(const struct path *path, const char *fstype,
err = parse_monolithic_mount_data(fc, data);
if (!err && !mount_capable(fc))
err = -EPERM;
+
+ if (!err)
+ err = security_mount_new(fc, path, mnt_flags, flags, data);
if (!err)
err = do_new_mount_fc(fc, path, mnt_flags);
Since not all filesystems need to resolve dev_name argument, conversion from dev_name
to "struct path" is up to individual filesystem. If we can use a flag like FS_REQUIRES_DEV
that tells whether that filesystem will resolve dev_name argument, I think we can resolve
the dev_name argument before security_mount_new() is called (and can avoid TOCTOU).
^ permalink raw reply
* [PATCH] ima: abort file hash computation on fatal signal
From: Shigeru Yoshida @ 2026-03-22 11:10 UTC (permalink / raw)
To: Mimi Zohar, Roberto Sassu, Dmitry Kasatkin, Eric Snowberg,
Paul Moore, James Morris, Serge E. Hallyn
Cc: Shigeru Yoshida, syzbot+6ed94e81a1492fe1d512, linux-integrity,
linux-security-module, linux-kernel
ima_calc_file_hash_atfm() and ima_calc_file_hash_tfm() compute a hash
over the entire file contents without checking for pending fatal
signals. When a very large file is being hashed during mmap (via
ima_file_mmap), the computation can take an extended period. If a
coredump is initiated by another thread in the same thread group during
this time, the dumper thread waits in coredump_wait() for all other
threads to exit. However, the hashing thread cannot exit until the hash
loop completes, resulting in a hung task.
Add fatal_signal_pending() checks to both the ahash and shash file
hashing loops so that the computation is aborted promptly when SIGKILL
is received.
Fixes: 3323eec921ef ("integrity: IMA as an integrity service provider")
Reported-by: syzbot+6ed94e81a1492fe1d512@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6ed94e81a1492fe1d512
Tested-by: syzbot+6ed94e81a1492fe1d512@syzkaller.appspotmail.com
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
---
security/integrity/ima/ima_crypto.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c
index aff61643415d..7b721b9c944f 100644
--- a/security/integrity/ima/ima_crypto.c
+++ b/security/integrity/ima/ima_crypto.c
@@ -17,6 +17,7 @@
#include <linux/crypto.h>
#include <linux/scatterlist.h>
#include <linux/err.h>
+#include <linux/sched/signal.h>
#include <linux/slab.h>
#include <crypto/hash.h>
@@ -416,6 +417,12 @@ static int ima_calc_file_hash_atfm(struct file *file,
if (rbuf[1])
active = !active; /* swap buffers, if we use two */
+
+ if (fatal_signal_pending(current)) {
+ ahash_wait(ahash_rc, &wait);
+ rc = -EINTR;
+ goto out3;
+ }
}
/* wait for the last update request to complete */
rc = ahash_wait(ahash_rc, &wait);
@@ -491,6 +498,10 @@ static int ima_calc_file_hash_tfm(struct file *file,
rc = crypto_shash_update(shash, rbuf, rbuf_len);
if (rc)
break;
+ if (fatal_signal_pending(current)) {
+ rc = -EINTR;
+ break;
+ }
}
kfree(rbuf);
out:
--
2.52.0
^ permalink raw reply related
* Re: [PATCH] ima: abort file hash computation on fatal signal
From: Eric Biggers @ 2026-03-22 14:17 UTC (permalink / raw)
To: Shigeru Yoshida
Cc: Mimi Zohar, Roberto Sassu, Dmitry Kasatkin, Eric Snowberg,
Paul Moore, James Morris, Serge E. Hallyn,
syzbot+6ed94e81a1492fe1d512, linux-integrity,
linux-security-module, linux-kernel
In-Reply-To: <20260322111019.2815601-1-syoshida@redhat.com>
On Sun, Mar 22, 2026 at 08:10:19PM +0900, Shigeru Yoshida wrote:
> ima_calc_file_hash_atfm() and ima_calc_file_hash_tfm() compute a hash
> over the entire file contents without checking for pending fatal
> signals. When a very large file is being hashed during mmap (via
> ima_file_mmap), the computation can take an extended period. If a
> coredump is initiated by another thread in the same thread group during
> this time, the dumper thread waits in coredump_wait() for all other
> threads to exit. However, the hashing thread cannot exit until the hash
> loop completes, resulting in a hung task.
>
> Add fatal_signal_pending() checks to both the ahash and shash file
> hashing loops so that the computation is aborted promptly when SIGKILL
> is received.
>
> Fixes: 3323eec921ef ("integrity: IMA as an integrity service provider")
> Reported-by: syzbot+6ed94e81a1492fe1d512@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=6ed94e81a1492fe1d512
> Tested-by: syzbot+6ed94e81a1492fe1d512@syzkaller.appspotmail.com
> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
> ---
> security/integrity/ima/ima_crypto.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c
> index aff61643415d..7b721b9c944f 100644
> --- a/security/integrity/ima/ima_crypto.c
> +++ b/security/integrity/ima/ima_crypto.c
> @@ -17,6 +17,7 @@
> #include <linux/crypto.h>
> #include <linux/scatterlist.h>
> #include <linux/err.h>
> +#include <linux/sched/signal.h>
> #include <linux/slab.h>
> #include <crypto/hash.h>
>
> @@ -416,6 +417,12 @@ static int ima_calc_file_hash_atfm(struct file *file,
>
> if (rbuf[1])
> active = !active; /* swap buffers, if we use two */
> +
> + if (fatal_signal_pending(current)) {
> + ahash_wait(ahash_rc, &wait);
> + rc = -EINTR;
> + goto out3;
> + }
I think you'll need to rebase onto
https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git/log/?h=next-integrity
since there is a patch queued up that removes ima_calc_file_hash_atfm().
So only ima_calc_file_hash_tfm() will need to be updated.
- Eric
^ permalink raw reply
* Re: [PATCH 6/7] tomoyo: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-03-23 3:32 UTC (permalink / raw)
To: Tetsuo Handa
Cc: viro@zeniv.linux.org.uk, Song Liu, paul@paul-moore.com,
jmorris@namei.org, serge@hallyn.com, brauner@kernel.org,
jack@suse.cz, john.johansen@canonical.com,
stephen.smalley.work@gmail.com, omosnace@redhat.com,
mic@digikod.net, gnoack@google.com, takedakn@nttdata.co.jp,
herton@canonical.com, Kernel Team, selinux@vger.kernel.org,
apparmor@lists.ubuntu.com, linux-fsdevel@vger.kernel.org,
linux-security-module@vger.kernel.org
In-Reply-To: <33abcf34-13e2-4a37-83f3-78bb27ecbc11@I-love.SAKURA.ne.jp>
> On Mar 22, 2026, at 3:46 AM, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>
> On 2026/03/22 10:06, Song Liu wrote:
>>>> @@ -70,6 +70,7 @@ static bool tomoyo_check_mount_acl(struct tomoyo_request_info *r,
>>>> * @dir: Pointer to "struct path".
>>>> * @type: Name of filesystem type.
>>>> * @flags: Mount options.
>>>> + * @dev_path: Pre-resolved device/source path. Maybe NULL.
>>>
>>> I guess that we can avoid passing maybe-NULL dev_name if Al Viro can accept
>>> resolving maybe-NULL dev_path argument before calling LSM hooks.
>>
>> If I understand the code correctly, tomoyo records requested_dev_name for
>> new mount. namespace.c, OTOH, does NOT do kern_path() for new mount. This
>> is why we keep the maybe-NULL dev_name here. I personally think this is
>> the best approach without changing tomoyo behavior.
>
> Well, namespace.c does kern_path() for new mount, but it happens a bit later after
> security_mount_new() was called.
>
> do_new_mount_fc() => fc_mount() => vfs_get_tree() => fc->ops->get_tree() == e.g. ext4_get_tree()
> => get_tree_bdev() => get_tree_bdev_flags() => lookup_bdev() => kern_path()
>
> @@ -3835,6 +3855,9 @@ static int do_new_mount(const struct path *path, const char *fstype,
> err = parse_monolithic_mount_data(fc, data);
> if (!err && !mount_capable(fc))
> err = -EPERM;
> +
> + if (!err)
> + err = security_mount_new(fc, path, mnt_flags, flags, data);
> if (!err)
> err = do_new_mount_fc(fc, path, mnt_flags);
>
>
> Since not all filesystems need to resolve dev_name argument, conversion from dev_name
> to "struct path" is up to individual filesystem. If we can use a flag like FS_REQUIRES_DEV
> that tells whether that filesystem will resolve dev_name argument, I think we can resolve
> the dev_name argument before security_mount_new() is called (and can avoid TOCTOU).
I guess we can add dev_path to fs_context?
Thanks,
Song
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox