Linux Security Modules development
* Re: [PATCH v6.1] apparmor: fix unprivileged local user can do privileged policy management
From: Greg KH @ 2026-04-02  6:01 UTC (permalink / raw)
  To: Keerthana K
  Cc: stable, john.johansen, paul, jmorris, serge, georgia.garcia,
	cengiz.can, sashal, apparmor, linux-security-module, linux-kernel,
	ajay.kaher, alexey.makhalov, vamsi-krishna.brahmajosyula,
	yin.ding, tapas.kundu, Qualys Security Advisory,
	Salvatore Bonaccorso
In-Reply-To: <20260402054700.2798707-1-keerthana.kalyanasundaram@broadcom.com>

On Thu, Apr 02, 2026 at 05:47:00AM +0000, Keerthana K wrote:
> From: John Johansen <john.johansen@canonical.com>
> 
> commit 6601e13e82841879406bf9f369032656f441a425 upstream.

<snip>

Does your group/company/whatever actually use apparmor?  If so, this
isn't the only commit that needs to be backported.  I'm waiting on a
"correct" set of 6.1.y patches from John before applying all of them to
6.1.y and then I can take the patch series that he gave me for 5.10.y
and 5.15.y and will queue them up.

So thanks for this backport, but it's not going to help resolve all of
the recent fixes that went in as part of this series by just applying
one of them.

thanks,

greg k-h


* Re: [PATCH v6.1] apparmor: fix unprivileged local user can do privileged policy management
From: Keerthana Kalyanasundaram @ 2026-04-02  8:03 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, john.johansen, paul, jmorris, serge, georgia.garcia,
	cengiz.can, sashal, apparmor, linux-security-module, linux-kernel,
	ajay.kaher, alexey.makhalov, vamsi-krishna.brahmajosyula,
	yin.ding, tapas.kundu, Qualys Security Advisory,
	Salvatore Bonaccorso
In-Reply-To: <2026040249-fable-sasquatch-4864@gregkh>



On Thu, Apr 2, 2026 at 11:31 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> On Thu, Apr 02, 2026 at 05:47:00AM +0000, Keerthana K wrote:
> > From: John Johansen <john.johansen@canonical.com>
> >
> > commit 6601e13e82841879406bf9f369032656f441a425 upstream.
>
> <snip>
>
> Does your group/company/whatever actually use apparmor?  If so, this
> isn't the only commit that needs to be backported.  I'm waiting on a
> "correct" set of 6.1.y patches from John before applying all of them to
> 6.1.y and then I can take the patch series that he gave me for 5.10.y
> and 5.15.y and will queue them up.
>
> So thanks for this backport, but it's not going to help resolve all of
> the recent fixes that went in as part of this series by just applying
> one of them.
>
> thanks,
>
> greg k-h

Thanks for the update, Greg. We will wait for John to queue and apply the
complete series of patches to the stable branches.



* Re: [PATCH v8 04/12] landlock: Control pathname UNIX domain socket resolution by path
From: Sebastian Andrzej Siewior @ 2026-04-02  9:51 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, John Johansen, Tingmao Wang,
	Justin Suess, Kuniyuki Iwashima, Jann Horn, linux-security-module,
	Samasth Norway Ananda, Matthieu Buffet, Mikhail Ivanov,
	konstantin.meskhidze, Demi Marie Obenour, Alyssa Ross,
	Tahera Fahimi, Georgia Garcia
In-Reply-To: <20260327164838.38231-5-gnoack3000@gmail.com>

On 2026-03-27 17:48:29 [+0100], Günther Noack wrote:
> * Add a new access right LANDLOCK_ACCESS_FS_RESOLVE_UNIX, which
>   controls the lookup operations for named UNIX domain sockets.  The
>   resolution happens during connect() and sendmsg() (depending on
>   socket type).
> Cc: Tingmao Wang <m@maowtm.org>
> Cc: Justin Suess <utilityemal77@gmail.com>
> Cc: Mickaël Salaün <mic@digikod.net>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Kuniyuki Iwashima <kuniyu@google.com>
> Suggested-by: Jann Horn <jannh@google.com>
> Link[1]: https://github.com/landlock-lsm/linux/issues/36
> Link[2]: https://lore.kernel.org/all/20260205.8531e4005118@gnoack.org/
> Signed-off-by: Günther Noack <gnoack3000@gmail.com>

The unix bits look okay to me,

Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian


* Re: LSM namespacing API
From: Dr. Greg @ 2026-04-02 10:59 UTC (permalink / raw)
  To: Paul Moore
  Cc: Stephen Smalley, Ondrej Mosnacek, linux-security-module, selinux,
	John Johansen
In-Reply-To: <CAHC9VhR1R7TcC2a2wZ9-G8dmXTuhcDK1YedDduq0sFgPC8QxFw@mail.gmail.com>

On Sun, Mar 29, 2026 at 08:56:37PM -0400, Paul Moore wrote:

Good morning, hopefully the week is going well for everyone.

> On Sun, Mar 29, 2026 at 12:09 PM Dr. Greg <greg@enjellic.com> wrote:
> > On Tue, Mar 24, 2026 at 05:31:09PM -0400, Paul Moore wrote:
> > > On Tue, Mar 3, 2026 at 11:46 AM Paul Moore <paul@paul-moore.com> wrote:
> > > >
> > > > I'd really like to hear from some of the other LSMs before we start
> > > > diving into the code.  It may sound funny, but from my perspective
> > > > doing the work to get the API definition "right" is far more important
> > > > than implementing it.
> >
> > > It's been three weeks now, and I haven't seen any strong arguments for
> > > supporting the clone() API at this time, so we can leave that out for
> > > now and stick with just the unshare() API for an initial attempt.  We
> > > can always add a clone() API at a later date if needed; going small
> > > and expanding over time is usually a better decision anyway.
> > >
> > > So to quickly summarize, here is where I think the discussion landed:
> > >
> > > * Implement the lsm_unshare() syscall
> > >
> > > I expect it would look something like 'lsm_unshare(struct lsm_ctx
> > > *ctx, u32 size, u32 flags)' with @ctx specifying the particular LSM
> > > being unshared, and @flags being 0/unused at this point in time
> > > (unless we can think of something we want to specify here).  Like
> > > lsm_set_self_attr(), only one @ctx can be specified at a time, so you
> > > can only unshare one LSM at a time.
> >
> > Unless we miss something, it would seem that there needs to be
> > additional thought as to how a process moves, atomically, from one
> > effective security configuration to the next.
> >
> > At a minimum, if we restrict ourselves to the model of simply changing
> > the namespace for a single LSM, there would seem to be a need to have
> > a 2-step process in order to atomically transition from one security
> > model/policy to the next.

> That depends on the individual LSMs, they are free to interpret the
> unshare request and handle it however they like.

No argument there.

An LSM will obviously need to allocate an LSM namespace specific
security 'blob' in order to hold the security context for the new
namespace.

Christian had proposed patches for a generic mechanism to create
LSM security namespace blobs; is implementation of that in scope for
this effort?

> > The interim between the first and second steps would allow an
> > orchestrator to configure the new namespace and load new namespace
> > specific policy into the security namespace ...

> As discussed previously, the LSM policy load syscalls might include
> some LSM namespace options. However, I first want to focus on
> finalizing the most basic namespace API, which on Linux is arguably
> the unshare() syscall concept.

Unfortunately, without considering all the implications and
requirements of the various LSMs, we may end up with lsm_unshare2() and
beyond.

See below.

> > It would seem that the flags variable might be a good option to use to
> > handle this 2-stage transition, for example LSM_NS_INIT and
> > LSM_NS_CHANGE, respectively, to specify the initialization and
> > execution phases of the transition.

> No.  The lsm_unshare() syscall is intended to mimic the existing
> unshare() syscall as a single step process from a user's
> perspective.  If it returns successfully the caller will be in a new
> LSM namespace as defined by the individual LSM specified in the
> syscall.

OK, we can reason forward with that paradigm.

An orchestrator issues the unshare call for an LSM namespace and upon
return from the system call the calling task is in a new namespace for
that particular LSM, the goal of which is presumably to implement a
security policy/model different than what had been in force
previously.

So the process is in a new LSM specific namespace, but still
implementing the policy from the previous namespace, until the
orchestrator can load the new policy and then trigger the LSM to
change from its previous policy to the newly loaded policy.

Is this consistent with your vision as to how all of this will work?
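
To make the sequence in question concrete, here is a toy userspace model of
it. This is only a sketch under stated assumptions: the struct, its fields and
all three helper names are hypothetical, nothing here corresponds to a real
kernel structure or to the proposed syscall's implementation, and whether the
previous policy really stays in force after the unshare is exactly what is
being asked, not something the model establishes.

```c
#include <assert.h>

/* Toy model only: neither field maps to a real kernel structure. */
struct task_model {
	int lsm_ns;	/* identifier of the LSM namespace the task is in */
	int policy;	/* identifier of the policy currently enforced */
};

/* Models the single-step unshare: new namespace, policy left untouched. */
static void model_unshare(struct task_model *t, int new_ns)
{
	t->lsm_ns = new_ns;
}

/* Models a later, LSM-specific policy load into the current namespace. */
static void model_load_policy(struct task_model *t, int new_policy)
{
	t->policy = new_policy;
}

/*
 * Returns 1 while the task has left old_ns but still enforces old_policy,
 * i.e. during the interval between the unshare and the policy load.
 */
static int in_transition_window(const struct task_model *t, int old_ns,
				int old_policy)
{
	return t->lsm_ns != old_ns && t->policy == old_policy;
}
```

In this model, a task that starts as { .lsm_ns = 0, .policy = 100 } is inside
the window right after model_unshare(&t, 1) and leaves it only once
model_load_policy(&t, 200) has run.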

> > The other unanswered issue, or perhaps we missed it, are the security
> > controls that should be associated with the unshare call.

> Each LSM is free to implement whatever access controls it deems
> necessary in its lsm_unshare() callback.

Just to be clear.

When you refer to 'lsm_unshare() callback' are you referring to a new
LSM security hook to be implemented that will allow all of the
active LSMs to pass judgement on whether or not the unshare should be
allowed to complete successfully?

See below.

> > Will there be a new LSM hook that allows other LSM's to veto the
> > creation of a namespace either for itself or for another LSM?
> 
> I would expect the lsm_unshare() syscall to operate similarly to the
> lsm_set_self_attr() syscall in this regard.

The reference to handling this like lsm_set_self_attr() is unclear.

With lsm_set_self_attr() there is no reason for another LSM to deny
setting what is an LSM specific attribute, as you note above, each LSM
gets to decide if the request to set an attribute for the LSM should
be accepted or denied.

Since lsm_unshare() is changing the overall platform security state,
it seems consistent with the design of the LSM for other LSM's to be
able to veto this action.

Once again, this seems like an action that would be consistent with
the notion of the lockdown LSM.

> > Is there a need to have yet another kernel command-line parameter that
> > would completely deny the ability to create security namespaces?

> No, at least not at this point in time.

This would seem to reinforce issues in the previous discussion.

Given that distributions are 'kitchen sink' implementations, it seems
likely that system security architects would want a lockdown option to
ensure that the platform security configuration cannot be changed.

> Individual LSMs can decide how they want to gate their own namespace
> functionality, if they implement namespaces at all.
> 
> > Is CAP_MAC_ADMIN appropriate as the required capability to create a
> > new namespace or does there need to be, for security rigor, a specific
> > capability (CAP_LSM_NS?) that gates the ability to execute whatever
> > form of the system call is adopted?

> Once again, this is up to the individual LSMs, not the framework
> layer.

Fair enough.

That still leaves the question of whether or not CAP_MAC_ADMIN is
appropriate for gating the creation of a new security namespace.

> > Should there be an option to completely compile LSM namespaces out of
> > the kernel?

> That doesn't belong in the LSM framework layer, that is up to the
> individual LSMs.

You noted above the desire for lsm_unshare to be consistent with other
namespaces.

The current kernel paradigm is to allow classes of namespace
resources, i.e. CONFIG_UTS_NS, CONFIG_TIME_NS et al., to be compiled in
or out of the kernel.

It seems that CONFIG_LSM_NS would be consistent with that model.

> > > * Implement /proc/pid/ns/lsm and setns(CLONE_NEWLSM)
> > >
> > > As discussed previously, this allows us to move a process into an
> > > existing, established LSM namespace set.  The caller cannot
> > > selectively choose which individual LSM namespaces they join from the
> > > given LSM namespace set, they receive the same LSM namespace
> > > configuration as the target process.
> >
> > As an initial aside.  It would be assumed that a positive result of a
> > setns call would be to cause the calling process to atomically change
> > its security namespace set.  This would further suggest the need to
> > have the security namespace creation process also execute atomically
> > in a multi-LSM namespace change environment.

> In the setns case no new LSM namespaces should be created, the process
> simply joins an existing set of LSM namespaces.

The issue isn't about new namespaces being created; the issue is the
atomicity of a change to a new set of security policies.

With setns an atomic transition is implemented.

The proposed lsm_unshare() behavior results in a period of time when
multiple and varying security policies are active, depending on
various race issues in the orchestrator implementation.

This opens the door to a raft of potential security issues that we can
have a new acronym for, Time Of Implementation Time Of Use (TOITOU).

> > ... That is the concept of whether or not a setns
> > call, for any resource namespace, should also force a security
> > namespace change if the security namespace of the calling process
> > differs from that of the target process.

> That decision is left to the individual LSMs.

That is reasonable.

In order to support that model, there would seem to be a need to have
a new LSM call in the setns code that allows LSM's to determine
whether or not a change in the active security namespace set should be
forced, correct?

If so, is implementation of this in scope for the lsm_unshare()
infrastructure?

To close, at the risk of being the devil's advocate.

Given that the sentiment is to force almost all of these
issues/decisions into the individual LSM's, what is the advantage of
having a common lsm_unshare() system call?

In the proposed model, a resource orchestrator is going to need to
have extensive knowledge of the mechanics of all the LSMs that
implement namespace functionality.  At a very minimum, intrinsic to
the concept of security namespaces, there will be a need to load a new
policy or model into the namespace, an action that will be deeply LSM
specific.

At this point, the only common functionality may be the allocation of
a new LSM namespace 'blob'.  An argument for not doing that in
lsm_unshare() is that it precludes the ability of an orchestrator to
implement an atomic policy change, as that would require an
orchestrator to somehow load a policy/model before lsm_unshare() is
called, which in turn would require a new security context to be
allocated prior to the unshare operation.

All of this tends to be an issue with integrity or measurement based
namespaces, which are important with respect to supporting
confidential computing initiatives.  Without a two-stage namespace
transition, you stumble into subtle problems associated with
'Heisenberg dilemma' issues.

> paul-moore.com

Hopefully this will assist in defining the requirements for the
effort.

Have a good remainder of the week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
              https://github.com/Quixote-Project


* Re: [PATCH v8 04/12] landlock: Control pathname UNIX domain socket resolution by path
From: Kuniyuki Iwashima @ 2026-04-02 18:09 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, John Johansen, Tingmao Wang,
	Justin Suess, Sebastian Andrzej Siewior, Jann Horn,
	linux-security-module, Samasth Norway Ananda, Matthieu Buffet,
	Mikhail Ivanov, konstantin.meskhidze, Demi Marie Obenour,
	Alyssa Ross, Tahera Fahimi, Georgia Garcia
In-Reply-To: <20260327164838.38231-5-gnoack3000@gmail.com>

On Fri, Mar 27, 2026 at 9:49 AM Günther Noack <gnoack3000@gmail.com> wrote:
>
> * Add a new access right LANDLOCK_ACCESS_FS_RESOLVE_UNIX, which
>   controls the lookup operations for named UNIX domain sockets.  The
>   resolution happens during connect() and sendmsg() (depending on
>   socket type).
> * Change access_mask_t from u16 to u32 (see below)
> * Hook into the path lookup in unix_find_bsd() in af_unix.c, using a
>   LSM hook.  Make policy decisions based on the new access rights
> * Increment the Landlock ABI version.
> * Minor test adaptations to keep the tests working.
> * Document the design rationale for scoped access rights,
>   and cross-reference it from the header documentation.
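
Because the quoted patch ties the new right to an ABI bump, userspace that
wants to run on older kernels would presumably have to drop the bit when the
reported ABI is too low. The helper below is a minimal sketch of that
best-effort pattern. The bit value (1ULL << 16) and the version number 9 are
taken from the quoted patch; the macro and function names are hypothetical,
and a real program would obtain the ABI version from
landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION) as the
selftest in the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Bit value as defined in the quoted include/uapi/linux/landlock.h hunk. */
#define ACCESS_FS_RESOLVE_UNIX	(1ULL << 16)
/* ABI version that introduces the right, according to the quoted patch. */
#define RESOLVE_UNIX_MIN_ABI	9

/*
 * Hypothetical helper: returns the handled_access_fs mask to request,
 * with the new right removed when the kernel's ABI predates it.
 */
static uint64_t handled_fs_for_abi(uint64_t wanted, int abi)
{
	if (abi < RESOLVE_UNIX_MIN_ABI)
		wanted &= ~ACCESS_FS_RESOLVE_UNIX;
	return wanted;
}
```

On an ABI 8 kernel the helper strips only the new bit and leaves every other
requested right in place; on ABI 9 or later the mask is returned unchanged.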
>
> With this access right, access is granted if either of the following
> conditions is met:
>
> * The target socket's filesystem path was allow-listed using a
>   LANDLOCK_RULE_PATH_BENEATH rule, *or*:
> * The target socket was created in the same Landlock domain in which
>   LANDLOCK_ACCESS_FS_RESOLVE_UNIX was restricted.
>
> In case of a denial, connect() and sendmsg() return EACCES, which is
> the same error as it is returned if the user does not have the write
> bit in the traditional UNIX file system permissions of that file.
>
> The access_mask_t type grows from u16 to u32 to make space for the new
> access right.  This also doubles the size of struct layer_access_masks
> from 32 byte to 64 byte.
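
The size arithmetic in the paragraph above can be checked in userspace. This
is a sketch, not kernel code: the constants are copied from the quoted hunks
(bit 16 for the new right, 16 layers), while the two struct definitions are
hypothetical stand-ins that assume struct layer_access_masks is an array of
one access_mask_t per layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the quoted patch. */
#define LAST_ACCESS_FS	(1ULL << 16)	/* LANDLOCK_ACCESS_FS_RESOLVE_UNIX */
#define MASK_ACCESS_FS	((LAST_ACCESS_FS << 1) - 1)
#define MAX_NUM_LAYERS	16		/* LANDLOCK_MAX_NUM_LAYERS */

/* Hypothetical stand-ins for the old and new struct layer_access_masks. */
struct masks_u16 { uint16_t access[MAX_NUM_LAYERS]; };
struct masks_u32 { uint32_t access[MAX_NUM_LAYERS]; };

/* Counts set bits, which is what __const_hweight64() computes. */
static unsigned int count_bits(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Number of filesystem access rights once bit 16 is the last one. */
static unsigned int num_access_fs(void)
{
	return count_bits(MASK_ACCESS_FS);
}
```

With bit 16 in use there are 17 rights, one more than a u16 can hold, which is
why the static_assert in access.h would fail without the type change.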
>
> Document the (possible future) interaction between scoped flags and
> other access rights in struct landlock_ruleset_attr, and summarize the
> rationale, as discussed in code review leading up to [2].
>
> This feature was created with substantial discussion and input from
> Justin Suess, Tingmao Wang and Mickaël Salaün.
>
> Cc: Tingmao Wang <m@maowtm.org>
> Cc: Justin Suess <utilityemal77@gmail.com>
> Cc: Mickaël Salaün <mic@digikod.net>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Kuniyuki Iwashima <kuniyu@google.com>
> Suggested-by: Jann Horn <jannh@google.com>
> Link[1]: https://github.com/landlock-lsm/linux/issues/36
> Link[2]: https://lore.kernel.org/all/20260205.8531e4005118@gnoack.org/
> Signed-off-by: Günther Noack <gnoack3000@gmail.com>
> ---
>  Documentation/security/landlock.rst          |  42 +++++-
>  Documentation/userspace-api/landlock.rst     |   2 +-
>  include/uapi/linux/landlock.h                |  21 +++
>  security/landlock/access.h                   |   2 +-
>  security/landlock/audit.c                    |   1 +
>  security/landlock/fs.c                       | 130 ++++++++++++++++++-
>  security/landlock/limits.h                   |   2 +-
>  security/landlock/syscalls.c                 |   2 +-
>  tools/testing/selftests/landlock/base_test.c |   2 +-
>  tools/testing/selftests/landlock/fs_test.c   |   5 +-
>  10 files changed, 200 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> index 3e4d4d04cfae..c3f8f43073a7 100644
> --- a/Documentation/security/landlock.rst
> +++ b/Documentation/security/landlock.rst
> @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
>  ==================================
>
>  :Author: Mickaël Salaün
> -:Date: September 2025
> +:Date: March 2026
>
>  Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
>  harden a whole system, this feature should be available to any process,
> @@ -89,6 +89,46 @@ this is required to keep access controls consistent over the whole system, and
>  this avoids unattended bypasses through file descriptor passing (i.e. confused
>  deputy attack).
>
> +.. _scoped-flags-interaction:
> +
> +Interaction between scoped flags and other access rights
> +--------------------------------------------------------
> +
> +The ``scoped`` flags in ``struct landlock_ruleset_attr`` restrict the
> +use of *outgoing* IPC from the created Landlock domain, while they
> +permit reaching out to IPC endpoints *within* the created Landlock
> +domain.
> +
> +In the future, scoped flags *may* interact with other access rights,
> +e.g. so that abstract UNIX sockets can be allow-listed by name, or so
> +that signals can be allow-listed by signal number or target process.
> +
> +When introducing ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``, we defined it to
> +implicitly have the same scoping semantics as a
> +``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` flag would have: connecting to
> +UNIX sockets within the same domain (where
> +``LANDLOCK_ACCESS_FS_RESOLVE_UNIX`` is used) is unconditionally
> +allowed.
> +
> +The reasoning is:
> +
> +* Like other IPC mechanisms, connecting to named UNIX sockets in the
> +  same domain should be expected and harmless.  (If needed, users can
> +  further refine their Landlock policies with nested domains or by
> +  restricting ``LANDLOCK_ACCESS_FS_MAKE_SOCK``.)
> +* We reserve the option to still introduce
> +  ``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` in the future.  (This would
> +  be useful if we wanted to have a Landlock rule to permit IPC access
> +  to other Landlock domains.)
> +* But we can postpone the point in time when users have to deal with
> +  two interacting flags visible in the userspace API.  (In particular,
> +  it is possible that it won't be needed in practice, in which case we
> +  can avoid the second flag altogether.)
> +* If we *do* introduce ``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` in the
> +  future, setting this scoped flag in a ruleset does *not reduce* the
> +  restrictions, because access within the same scope is already
> +  allowed based on ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``.
> +
>  Tests
>  =====
>
> diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> index 13134bccdd39..1490f879f621 100644
> --- a/Documentation/userspace-api/landlock.rst
> +++ b/Documentation/userspace-api/landlock.rst
> @@ -8,7 +8,7 @@ Landlock: unprivileged access control
>  =====================================
>
>  :Author: Mickaël Salaün
> -:Date: January 2026
> +:Date: March 2026
>
>  The goal of Landlock is to enable restriction of ambient rights (e.g. global
>  filesystem or network access) for a set of processes.  Because Landlock
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index f88fa1f68b77..3157d257555b 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -248,6 +248,26 @@ struct landlock_net_port_attr {
>   *
>   *   This access right is available since the fifth version of the Landlock
>   *   ABI.
> + * - %LANDLOCK_ACCESS_FS_RESOLVE_UNIX: Look up pathname UNIX domain sockets
> + *   (:manpage:`unix(7)`).  On UNIX domain sockets, this restricts both calls to
> + *   :manpage:`connect(2)` as well as calls to :manpage:`sendmsg(2)` with an
> + *   explicit recipient address.
> + *
> + *   This access right only applies to connections to UNIX server sockets which
> + *   were created outside of the newly created Landlock domain (e.g. from within
> + *   a parent domain or from an unrestricted process).  Newly created UNIX
> + *   servers within the same Landlock domain continue to be accessible.  In this
> + *   regard, %LANDLOCK_ACCESS_FS_RESOLVE_UNIX has the same semantics as the
> + *   ``LANDLOCK_SCOPE_*`` flags.
> + *
> + *   If a resolve attempt is denied, the operation returns an ``EACCES`` error,
> + *   in line with other filesystem access rights (but different to denials for
> + *   abstract UNIX domain sockets).
> + *
> + *   This access right is available since the ninth version of the Landlock ABI.
> + *
> + *   The rationale for this design is described in
> + *   :ref:`Documentation/security/landlock.rst <scoped-flags-interaction>`.
>   *
>   * Whether an opened file can be truncated with :manpage:`ftruncate(2)` or used
>   * with `ioctl(2)` is determined during :manpage:`open(2)`, in the same way as
> @@ -333,6 +353,7 @@ struct landlock_net_port_attr {
>  #define LANDLOCK_ACCESS_FS_REFER                       (1ULL << 13)
>  #define LANDLOCK_ACCESS_FS_TRUNCATE                    (1ULL << 14)
>  #define LANDLOCK_ACCESS_FS_IOCTL_DEV                   (1ULL << 15)
> +#define LANDLOCK_ACCESS_FS_RESOLVE_UNIX                        (1ULL << 16)
>  /* clang-format on */
>
>  /**
> diff --git a/security/landlock/access.h b/security/landlock/access.h
> index 277b6ed7f7bb..99c709f7979e 100644
> --- a/security/landlock/access.h
> +++ b/security/landlock/access.h
> @@ -34,7 +34,7 @@
>         LANDLOCK_ACCESS_FS_IOCTL_DEV)
>  /* clang-format on */
>
> -typedef u16 access_mask_t;
> +typedef u32 access_mask_t;
>
>  /* Makes sure all filesystem access rights can be stored. */
>  static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
> diff --git a/security/landlock/audit.c b/security/landlock/audit.c
> index 60ff217ab95b..8d0edf94037d 100644
> --- a/security/landlock/audit.c
> +++ b/security/landlock/audit.c
> @@ -37,6 +37,7 @@ static const char *const fs_access_strings[] = {
>         [BIT_INDEX(LANDLOCK_ACCESS_FS_REFER)] = "fs.refer",
>         [BIT_INDEX(LANDLOCK_ACCESS_FS_TRUNCATE)] = "fs.truncate",
>         [BIT_INDEX(LANDLOCK_ACCESS_FS_IOCTL_DEV)] = "fs.ioctl_dev",
> +       [BIT_INDEX(LANDLOCK_ACCESS_FS_RESOLVE_UNIX)] = "fs.resolve_unix",
>  };
>
>  static_assert(ARRAY_SIZE(fs_access_strings) == LANDLOCK_NUM_ACCESS_FS);
> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 97065d51685a..fcf69b3d734d 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -27,6 +27,7 @@
>  #include <linux/lsm_hooks.h>
>  #include <linux/mount.h>
>  #include <linux/namei.h>
> +#include <linux/net.h>
>  #include <linux/path.h>
>  #include <linux/pid.h>
>  #include <linux/rcupdate.h>
> @@ -36,6 +37,7 @@
>  #include <linux/types.h>
>  #include <linux/wait_bit.h>
>  #include <linux/workqueue.h>
> +#include <net/af_unix.h>
>  #include <uapi/linux/fiemap.h>
>  #include <uapi/linux/landlock.h>
>
> @@ -314,7 +316,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
>         LANDLOCK_ACCESS_FS_WRITE_FILE | \
>         LANDLOCK_ACCESS_FS_READ_FILE | \
>         LANDLOCK_ACCESS_FS_TRUNCATE | \
> -       LANDLOCK_ACCESS_FS_IOCTL_DEV)
> +       LANDLOCK_ACCESS_FS_IOCTL_DEV | \
> +       LANDLOCK_ACCESS_FS_RESOLVE_UNIX)
>  /* clang-format on */
>
>  /*
> @@ -1557,6 +1560,130 @@ static int hook_path_truncate(const struct path *const path)
>         return current_check_access_path(path, LANDLOCK_ACCESS_FS_TRUNCATE);
>  }
>
> +/**
> + * unmask_scoped_access - Remove access right bits in @masks in all layers
> + *                        where @client and @server have the same domain
> + *
> + * This does the same as domain_is_scoped(), but unmasks bits in @masks.
> + * It can not return early as domain_is_scoped() does.
> + *
> + * A scoped access for a given access right bit is allowed iff, for all layer
> + * depths where the access bit is set, the client and server domain are the
> + * same.  This function clears the access rights @access in @masks at all layer
> + * depths where the client and server domain are the same, so that, when they
> + * are all cleared, the access is allowed.
> + *
> + * @client: Client domain
> + * @server: Server domain
> + * @masks: Layer access masks to unmask
> + * @access: Access bits that control scoping
> + */
> +static void unmask_scoped_access(const struct landlock_ruleset *const client,
> +                                const struct landlock_ruleset *const server,
> +                                struct layer_access_masks *const masks,
> +                                const access_mask_t access)
> +{
> +       int client_layer, server_layer;
> +       const struct landlock_hierarchy *client_walker, *server_walker;
> +
> +       /* This should not happen. */
> +       if (WARN_ON_ONCE(!client))
> +               return;
> +
> +       /* Server has no Landlock domain; nothing to clear. */
> +       if (!server)
> +               return;
> +
> +       /*
> +        * client_layer must be a signed integer with greater capacity
> +        * than client->num_layers to ensure the following loop stops.
> +        */
> +       BUILD_BUG_ON(sizeof(client_layer) > sizeof(client->num_layers));
> +
> +       client_layer = client->num_layers - 1;
> +       client_walker = client->hierarchy;
> +       server_layer = server->num_layers - 1;
> +       server_walker = server->hierarchy;
> +
> +       /*
> +        * Clears the access bits at all layers where the client domain is the
> +        * same as the server domain.  We start the walk at min(client_layer,
> +        * server_layer).  The layer bits until there can not be cleared because
> +        * either the client or the server domain is missing.
> +        */
> +       for (; client_layer > server_layer; client_layer--)
> +               client_walker = client_walker->parent;
> +
> +       for (; server_layer > client_layer; server_layer--)
> +               server_walker = server_walker->parent;
> +
> +       for (; client_layer >= 0; client_layer--) {
> +               if (masks->access[client_layer] & access &&
> +                   client_walker == server_walker)
> +                       masks->access[client_layer] &= ~access;
> +
> +               client_walker = client_walker->parent;
> +               server_walker = server_walker->parent;
> +       }
> +}
> +
> +static int hook_unix_find(const struct path *const path, struct sock *other,
> +                         int flags)
> +{
> +       const struct landlock_ruleset *dom_other;
> +       const struct landlock_cred_security *subject;
> +       struct layer_access_masks layer_masks;
> +       struct landlock_request request = {};
> +       static const struct access_masks fs_resolve_unix = {
> +               .fs = LANDLOCK_ACCESS_FS_RESOLVE_UNIX,
> +       };
> +
> +       /* Lookup for the purpose of saving coredumps is OK. */
> +       if (unlikely(flags & SOCK_COREDUMP))
> +               return 0;
> +
> +       subject = landlock_get_applicable_subject(current_cred(),
> +                                                 fs_resolve_unix, NULL);
> +
> +       if (!subject)
> +               return 0;
> +
> +       /*
> +        * Ignoring return value: that the domains apply was already checked in
> +        * landlock_get_applicable_subject() above.
> +        */
> +       landlock_init_layer_masks(subject->domain, fs_resolve_unix.fs,
> +                                 &layer_masks, LANDLOCK_KEY_INODE);
> +
> +       /* Checks the layers in which we are connecting within the same domain. */
> +       unix_state_lock(other);
> +       if (unlikely(sock_flag(other, SOCK_DEAD) || !other->sk_socket ||
> +                    !other->sk_socket->file)) {

When will the latter two conditions be true when !SOCK_DEAD ?

unix_find_bsd() should not find embryo sockets.
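
For readers following the quoted hunk, the layer walk in
unmask_scoped_access() can be modeled in userspace. The sketch below mirrors
the loop structure of the patch but is not the kernel code: the struct
definitions and the name unmask_scoped() are hypothetical stand-ins, and the
WARN_ON_ONCE() and BUILD_BUG_ON() checks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LAYERS 16

/* Hypothetical userspace stand-ins for the kernel structures. */
struct hierarchy { const struct hierarchy *parent; };
struct domain { int num_layers; const struct hierarchy *hierarchy; };
struct layer_masks { uint32_t access[MAX_LAYERS]; };

/* Clears @access in every layer where client and server share a node. */
static void unmask_scoped(const struct domain *client,
			  const struct domain *server,
			  struct layer_masks *masks, uint32_t access)
{
	int cl, sl;
	const struct hierarchy *cw, *sw;

	/* No client or no server domain: nothing can be cleared. */
	if (!client || !server)
		return;

	cl = client->num_layers - 1;
	cw = client->hierarchy;
	sl = server->num_layers - 1;
	sw = server->hierarchy;

	/* Bring both walkers up to the same depth. */
	for (; cl > sl; cl--)
		cw = cw->parent;
	for (; sl > cl; sl--)
		sw = sw->parent;

	/* Walk towards the root, comparing hierarchy nodes layer by layer. */
	for (; cl >= 0; cl--) {
		if ((masks->access[cl] & access) && cw == sw)
			masks->access[cl] &= ~access;
		cw = cw->parent;
		sw = sw->parent;
	}
}
```

With a two-layer client nested inside a one-layer server domain, only layer 0
is cleared; the client's own extra layer keeps the bit set, so in this model
the access would still need a path rule for that layer.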


> +               unix_state_unlock(other);
> +               /*
> +                * We rely on the caller to catch the (non-reversible) SOCK_DEAD
> +                * condition and retry the lookup.  If we returned an error
> +                * here, the lookup would not get retried.
> +                */
> +               return 0;
> +       }
> +       dom_other = landlock_cred(other->sk_socket->file->f_cred)->domain;
> +
> +       /* Access to the same (or a lower) domain is always allowed. */
> +       unmask_scoped_access(subject->domain, dom_other, &layer_masks,
> +                            fs_resolve_unix.fs);
> +       unix_state_unlock(other);
> +
> +       /* Checks the connections to allow-listed paths. */
> +       if (is_access_to_paths_allowed(subject->domain, path,
> +                                      fs_resolve_unix.fs, &layer_masks,
> +                                      &request, NULL, 0, NULL, NULL, NULL))
> +               return 0;
> +
> +       landlock_log_denial(subject, &request);
> +       return -EACCES;
> +}
> +
>  /* File hooks */
>
>  /**
> @@ -1834,6 +1961,7 @@ static struct security_hook_list landlock_hooks[] __ro_after_init = {
>         LSM_HOOK_INIT(path_unlink, hook_path_unlink),
>         LSM_HOOK_INIT(path_rmdir, hook_path_rmdir),
>         LSM_HOOK_INIT(path_truncate, hook_path_truncate),
> +       LSM_HOOK_INIT(unix_find, hook_unix_find),
>
>         LSM_HOOK_INIT(file_alloc_security, hook_file_alloc_security),
>         LSM_HOOK_INIT(file_open, hook_file_open),
> diff --git a/security/landlock/limits.h b/security/landlock/limits.h
> index eb584f47288d..b454ad73b15e 100644
> --- a/security/landlock/limits.h
> +++ b/security/landlock/limits.h
> @@ -19,7 +19,7 @@
>  #define LANDLOCK_MAX_NUM_LAYERS                16
>  #define LANDLOCK_MAX_NUM_RULES         U32_MAX
>
> -#define LANDLOCK_LAST_ACCESS_FS                LANDLOCK_ACCESS_FS_IOCTL_DEV
> +#define LANDLOCK_LAST_ACCESS_FS                LANDLOCK_ACCESS_FS_RESOLVE_UNIX
>  #define LANDLOCK_MASK_ACCESS_FS                ((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
>  #define LANDLOCK_NUM_ACCESS_FS         __const_hweight64(LANDLOCK_MASK_ACCESS_FS)
>
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index 3b33839b80c7..a6e23657f3ce 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> @@ -166,7 +166,7 @@ static const struct file_operations ruleset_fops = {
>   * If the change involves a fix that requires userspace awareness, also update
>   * the errata documentation in Documentation/userspace-api/landlock.rst .
>   */
> -const int landlock_abi_version = 8;
> +const int landlock_abi_version = 9;
>
>  /**
>   * sys_landlock_create_ruleset - Create a new ruleset
> diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
> index 0fea236ef4bd..30d37234086c 100644
> --- a/tools/testing/selftests/landlock/base_test.c
> +++ b/tools/testing/selftests/landlock/base_test.c
> @@ -76,7 +76,7 @@ TEST(abi_version)
>         const struct landlock_ruleset_attr ruleset_attr = {
>                 .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
>         };
> -       ASSERT_EQ(8, landlock_create_ruleset(NULL, 0,
> +       ASSERT_EQ(9, landlock_create_ruleset(NULL, 0,
>                                              LANDLOCK_CREATE_RULESET_VERSION));
>
>         ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
> diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
> index 968a91c927a4..b318627e7561 100644
> --- a/tools/testing/selftests/landlock/fs_test.c
> +++ b/tools/testing/selftests/landlock/fs_test.c
> @@ -575,9 +575,10 @@ TEST_F_FORK(layout1, inval)
>         LANDLOCK_ACCESS_FS_WRITE_FILE | \
>         LANDLOCK_ACCESS_FS_READ_FILE | \
>         LANDLOCK_ACCESS_FS_TRUNCATE | \
> -       LANDLOCK_ACCESS_FS_IOCTL_DEV)
> +       LANDLOCK_ACCESS_FS_IOCTL_DEV | \
> +       LANDLOCK_ACCESS_FS_RESOLVE_UNIX)
>
> -#define ACCESS_LAST LANDLOCK_ACCESS_FS_IOCTL_DEV
> +#define ACCESS_LAST LANDLOCK_ACCESS_FS_RESOLVE_UNIX
>
>  #define ACCESS_ALL ( \
>         ACCESS_FILE | \
> --
> 2.53.0
>


* Re: [PATCH] landlock: Document fallocate(2) as another truncation corner case
From: Mickaël Salaün @ 2026-04-02 18:16 UTC (permalink / raw)
  To: Günther Noack; +Cc: linux-security-module
In-Reply-To: <ac1SP3cGuEeIZFmM@google.com>

On Wed, Apr 01, 2026 at 07:13:35PM +0200, Günther Noack wrote:
> On Wed, Apr 01, 2026 at 06:30:28PM +0200, Mickaël Salaün wrote:
> > On Wed, Apr 01, 2026 at 05:09:10PM +0200, Günther Noack wrote:
> > > Reinforce the already stated policy that LANDLOCK_ACCESS_FS_TRUNCATE should
> > > always go hand in hand with LANDLOCK_ACCESS_FS_WRITE_FILE, as their
> > > meanings and enforcement overlap in counterintuitive ways.
> > > 
> > > On many common file systems, fallocate(2) offers a way to shorten files as
> > > long as the file is opened for writing, side-stepping the
> > > LANDLOCK_ACCESS_FS_TRUNCATE right.
> > > 
> > > Assisted-by: Gemini-CLI:gemini-3.1
> > > Signed-off-by: Günther Noack <gnoack@google.com>
> > > ---
> > >  Documentation/userspace-api/landlock.rst | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> > > index 7f86d7a37dc2..d5691ec136cc 100644
> > > --- a/Documentation/userspace-api/landlock.rst
> > > +++ b/Documentation/userspace-api/landlock.rst
> > > @@ -378,8 +378,8 @@ Truncating files
> > >  
> > >  The operations covered by ``LANDLOCK_ACCESS_FS_WRITE_FILE`` and
> > >  ``LANDLOCK_ACCESS_FS_TRUNCATE`` both change the contents of a file and sometimes
> > > -overlap in non-intuitive ways.  It is recommended to always specify both of
> > > -these together.
> > > +overlap in non-intuitive ways.  It is strongly recommended to always specify
> > > +both of these together (either granting both, or granting none).
> > >  
> > >  A particularly surprising example is :manpage:`creat(2)`.  The name suggests
> > >  that this system call requires the rights to create and write files.  However,
> > > @@ -391,6 +391,10 @@ It should also be noted that truncating files does not require the
> > >  system call, this can also be done through :manpage:`open(2)` with the flags
> > >  ``O_RDONLY | O_TRUNC``.
> > >  
> > > +At the same time, on some filesystems, :manpage:`fallocate(2)` offers a way to
> > > +shorten file contents with ``FALLOC_FL_COLLAPSE_RANGE`` when the file is opened
> > > +for writing, sidestepping the ``LANDLOCK_ACCESS_FS_TRUNCATE`` right.
> > 
> > Interesting, which filesystems?  Shouldn't it be fixed in the code
> > instead?
> 
> It works on ext4, and I also see mentions of FALLOC_FL_COLLAPSE_RANGE
> in XFS, F2FS, SMB and NTFS3.
> 
> I should mention, it is not *exactly* the same as a truncation, but
> you can remove a chunk of the file from the middle, which also leads
> to a shorter file.  For example, assuming a block size of 1024:
> 
>   1. Make a file with 2*1024 bytes: 1024*'A', then 1024*'B'
>   2. fallocate(collapse range, 0, 1024)
> 
> Resulting file is 1024*'B', and the file is shortened to 1024 bytes.
> 
> So this is not *exactly* a truncation.  (The man page says that an
> attempt to remove the end of a file results in EINVAL, so you have to
> take it from the middle, and it needs to align with block boundaries.)
> 
> But it's quite similar, also shortens the file, and it does not
> require the Landlock truncation access right.
> 
> I agree, another way would potentially be to call the LSM ftruncate
> hook.  I suspect this would stay compatible with other LSMs, because
> the LSM ftruncate hook is a relatively recent addition (but I have not
> checked in detail).
> 
> The implementation of fallocate is vfs_fallocate() in fs/open.c - I
> only had a tentative look now; it checks that the file->f_mode is open
> for writing and calls security_file_permission() with MAY_WRITE.
> 
> I always saw LANDLOCK_ACCESS_FS_WRITE_FILE and
> LANDLOCK_ACCESS_FS_TRUNCATE as rights that should always go together,
> so I suspect that it does not make a big difference in practice, and
> that is why I am suggesting to just document it more clearly for now.

OK, I agree, I'll take this patch. Thanks!

> 
> —Günther
> 
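
For readers who want to try the two-step recipe quoted above, here is a
minimal sketch.  It is not part of the patch.  The helper name
shorten_with_collapse_range() is made up for illustration, and using
st_blksize as the alignment unit is an assumption that holds on common
ext4 setups but is not guaranteed everywhere.  Filesystems without
collapse-range support are expected to fail with an error such as
-EOPNOTSUPP.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for fallocate() and FALLOC_FL_COLLAPSE_RANGE */
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes one block of 'A' followed by one block of 'B'
 * to fd, then removes the first block with FALLOC_FL_COLLAPSE_RANGE.
 * Returns the resulting file size, or -errno on failure.
 */
static long shorten_with_collapse_range(int fd)
{
	struct stat st;
	size_t blk;
	char *buf;

	if (fstat(fd, &st))
		return -errno;
	/* Assumption: st_blksize is a multiple of the filesystem block size. */
	blk = (size_t)st.st_blksize;

	buf = malloc(2 * blk);
	if (!buf)
		return -ENOMEM;
	memset(buf, 'A', blk);
	memset(buf + blk, 'B', blk);
	if (pwrite(fd, buf, 2 * blk, 0) != (ssize_t)(2 * blk)) {
		free(buf);
		return -EIO;
	}
	free(buf);

	/* Only needs a descriptor opened for writing; no truncate call. */
	if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, (off_t)blk))
		return -errno;

	if (fstat(fd, &st))
		return -errno;
	return (long)st.st_size;
}
```

When the call succeeds, the file is one block long and holds only the 'B'
bytes, which is the shortening effect discussed in this thread.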


* Re: LSM namespacing API
From: Casey Schaufler @ 2026-04-02 17:49 UTC (permalink / raw)
  To: Dr. Greg, Paul Moore
  Cc: Stephen Smalley, Ondrej Mosnacek, linux-security-module, selinux,
	John Johansen, Casey Schaufler
In-Reply-To: <ac5MKr4lFQhc44i6@wind.enjellic.com>

On 4/2/2026 3:59 AM, Dr. Greg wrote:
> That still leaves the question of whether or not CAP_MAC_ADMIN is
> appropriate for gating the creation of a new security namespace.

That will have to be up to the individual LSMs. Not all LSMs implement
Mandatory Access Controls. It would be inappropriate for an LSM that
provides finer-grained privilege than capabilities do to be gated by
CAP_MAC_ADMIN. An LSM that implements a novel access control list scheme
would fall under CAP_DAC_SOMETHING, not CAP_MAC_ADMIN. A time-of-day
access scheme might require CAP_MAC_ADMIN, or it might not. Implying that all
LSMs enforce a MAC policy is not a good idea.



* [PATCH v3 1/5] selftests/landlock: Fix snprintf truncation checks in audit helpers
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

snprintf() returns the number of characters that would have been
written, excluding the terminating NUL byte.  When the output is
truncated, this return value equals or exceeds the buffer size.  Fix
matches_log_domain_allocated() and matches_log_domain_deallocated() to
detect truncation with ">=" instead of ">".
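
The boundary case can be illustrated with a small standalone sketch (not
taken from audit.h; the helper name format_pid() and the "pid=%d" format
are made up for this example).  When the formatted string is exactly as
long as the buffer, snprintf() returns a value equal to the buffer size
and the last character is lost to the NUL byte, so only ">=" catches it:

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats "pid=<pid>" into buf.  Returns the string
 * length on success, or -E2BIG if the output (including the terminating
 * NUL byte) did not fit.
 */
static int format_pid(char *buf, size_t size, int pid)
{
	int len = snprintf(buf, size, "pid=%d", pid);

	if (len < 0)
		return -EIO;
	/* len == size means the NUL byte displaced the last character. */
	if ((size_t)len >= size)
		return -E2BIG;
	return len;
}
```

With a 6-byte buffer and pid 12, the string "pid=12" needs 7 bytes, and
snprintf() returns 6.  A ">" comparison would accept the truncated
"pid=1" result, while ">=" rejects it.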

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Reviewed-by: Günther Noack <gnoack@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
- New patch (split from the drain fix).
---
 tools/testing/selftests/landlock/audit.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
index 44eb433e9666..1049a0582af5 100644
--- a/tools/testing/selftests/landlock/audit.h
+++ b/tools/testing/selftests/landlock/audit.h
@@ -309,7 +309,7 @@ static int __maybe_unused matches_log_domain_allocated(int audit_fd, pid_t pid,
 
 	log_match_len =
 		snprintf(log_match, sizeof(log_match), log_template, pid);
-	if (log_match_len > sizeof(log_match))
+	if (log_match_len >= sizeof(log_match))
 		return -E2BIG;
 
 	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
@@ -326,7 +326,7 @@ static int __maybe_unused matches_log_domain_deallocated(
 
 	log_match_len = snprintf(log_match, sizeof(log_match), log_template,
 				 num_denials);
-	if (log_match_len > sizeof(log_match))
+	if (log_match_len >= sizeof(log_match))
 		return -E2BIG;
 
 	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
-- 
2.53.0



* [PATCH v3 4/5] selftests/landlock: Skip stale records in audit_match_record()
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

Domain deallocation records are emitted asynchronously from kworker
threads (via free_ruleset_work()).  Stale deallocation records from a
previous test can arrive during the current test's deallocation read
loop and be picked up by audit_match_record() instead of the expected
record, causing a domain ID mismatch.  The audit.layers test (which
creates 16 nested domains) is particularly vulnerable because it reads
16 deallocation records in sequence, providing a large window for stale
records to interleave.

The same issue affects audit_flags.signal, where deallocation records
from a previous test (audit.layers) can leak into the next test and be
picked up by audit_match_record() instead of the expected record.

Fix this by continuing to read records when the type matches but the
content pattern does not.  Stale records are silently consumed, and the
loop only stops when both type and pattern match (or the socket times
out with -EAGAIN).
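
The control flow can be sketched without a netlink socket.  The following
is an illustration only: struct record, find_record() and the in-memory
array are stand-ins invented here for audit_recv() and the audit socket,
and running out of records plays the role of the -EAGAIN timeout.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for a received audit message. */
struct record {
	unsigned int type;
	const char *data;
};

/*
 * Returns the index of the first record whose type and content both match,
 * or -1 if the pattern does not compile or the records run out.  Records
 * with the right type but the wrong content are skipped, not treated as
 * failures.
 */
static int find_record(const struct record *recs, size_t num,
		       unsigned int type, const char *pattern)
{
	regex_t regex;
	int found = -1;
	size_t i;

	if (regcomp(&regex, pattern, 0))
		return -1;

	for (i = 0; i < num; i++) {
		if (recs[i].type != type)
			continue;
		if (regexec(&regex, recs[i].data, 0, NULL, 0) == 0) {
			found = (int)i;
			break;
		}
		/* Stale record: same type, different content.  Keep reading. */
	}

	regfree(&regex);
	return found;
}
```

The previous behavior corresponds to stopping at the first record of the
right type and reporting a mismatch, which is what made a stale
deallocation record fatal.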

Additionally, extend matches_log_domain_deallocated() with an
expected_domain_id parameter.  When set, the regex pattern includes the
specific domain ID as a literal hex value, so that deallocation records
for a different domain do not match the pattern at all.  This handles
the case where the stale record has the same denial count as the
expected one (e.g. both have denials=1), which the type+pattern loop
alone cannot distinguish.  Callers that already know the expected domain
ID (from a prior denial or allocation record) now pass it to filter
precisely.

When expected_domain_id is set, matches_log_domain_deallocated() also
temporarily increases the socket timeout to audit_tv_dom_drop (1 second)
to wait for the asynchronous kworker deallocation, and restores
audit_tv_default afterward.  This removes the need for callers to manage
the timeout switch manually.

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v2:
https://lore.kernel.org/r/20260401161503.1136946-1-mic@digikod.net
- Fix __u64 format warnings on powerpc64 (cast to unsigned long long
  for %llx).

Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
- New patch.
---
 tools/testing/selftests/landlock/audit.h      | 82 ++++++++++++++-----
 tools/testing/selftests/landlock/audit_test.c | 34 ++++----
 2 files changed, 77 insertions(+), 39 deletions(-)

diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
index 74e1c3d763be..834005b2b0f0 100644
--- a/tools/testing/selftests/landlock/audit.h
+++ b/tools/testing/selftests/landlock/audit.h
@@ -249,9 +249,9 @@ static __maybe_unused char *regex_escape(const char *const src, char *dst,
 static int audit_match_record(int audit_fd, const __u16 type,
 			      const char *const pattern, __u64 *domain_id)
 {
-	struct audit_message msg;
+	struct audit_message msg, last_mismatch = {};
 	int ret, err = 0;
-	bool matches_record = !type;
+	int num_type_match = 0;
 	regmatch_t matches[2];
 	regex_t regex;
 
@@ -259,21 +259,35 @@ static int audit_match_record(int audit_fd, const __u16 type,
 	if (ret)
 		return -EINVAL;
 
-	do {
+	/*
+	 * Reads records until one matches both the expected type and the
+	 * pattern.  Type-matching records with non-matching content are
+	 * silently consumed, which handles stale domain deallocation records
+	 * from a previous test emitted asynchronously by kworker threads.
+	 */
+	while (true) {
 		memset(&msg, 0, sizeof(msg));
 		err = audit_recv(audit_fd, &msg);
-		if (err)
+		if (err) {
+			if (num_type_match) {
+				printf("DATA: %s\n", last_mismatch.data);
+				printf("ERROR: %d record(s) matched type %u"
+				       " but not pattern: %s\n",
+				       num_type_match, type, pattern);
+			}
 			goto out;
+		}
 
-		if (msg.header.nlmsg_type == type)
-			matches_record = true;
-	} while (!matches_record);
+		if (type && msg.header.nlmsg_type != type)
+			continue;
 
-	ret = regexec(&regex, msg.data, ARRAY_SIZE(matches), matches, 0);
-	if (ret) {
-		printf("DATA: %s\n", msg.data);
-		printf("ERROR: no match for pattern: %s\n", pattern);
-		err = -ENOENT;
+		ret = regexec(&regex, msg.data, ARRAY_SIZE(matches), matches,
+			      0);
+		if (!ret)
+			break;
+
+		num_type_match++;
+		last_mismatch = msg;
 	}
 
 	if (domain_id) {
@@ -316,21 +330,49 @@ static int __maybe_unused matches_log_domain_allocated(int audit_fd, pid_t pid,
 				  domain_id);
 }
 
-static int __maybe_unused matches_log_domain_deallocated(
-	int audit_fd, unsigned int num_denials, __u64 *domain_id)
+/*
+ * Matches a domain deallocation record.  When expected_domain_id is non-zero,
+ * the pattern includes the specific domain ID so that stale deallocation
+ * records from a previous test (with a different domain ID) are skipped by
+ * audit_match_record(), and the socket timeout is temporarily increased to
+ * audit_tv_dom_drop to wait for the asynchronous kworker deallocation.
+ */
+static int __maybe_unused
+matches_log_domain_deallocated(int audit_fd, unsigned int num_denials,
+			       __u64 expected_domain_id, __u64 *domain_id)
 {
 	static const char log_template[] = REGEX_LANDLOCK_PREFIX
 		" status=deallocated denials=%u$";
-	char log_match[sizeof(log_template) + 10];
-	int log_match_len;
+	static const char log_template_with_id[] =
+		"^audit([0-9.:]\\+): domain=\\(%llx\\)"
+		" status=deallocated denials=%u$";
+	char log_match[sizeof(log_template_with_id) + 32];
+	int log_match_len, err;
+
+	if (expected_domain_id)
+		log_match_len = snprintf(log_match, sizeof(log_match),
+					 log_template_with_id,
+					 (unsigned long long)expected_domain_id,
+					 num_denials);
+	else
+		log_match_len = snprintf(log_match, sizeof(log_match),
+					 log_template, num_denials);
 
-	log_match_len = snprintf(log_match, sizeof(log_match), log_template,
-				 num_denials);
 	if (log_match_len >= sizeof(log_match))
 		return -E2BIG;
 
-	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
-				  domain_id);
+	if (expected_domain_id)
+		setsockopt(audit_fd, SOL_SOCKET, SO_RCVTIMEO,
+			   &audit_tv_dom_drop, sizeof(audit_tv_dom_drop));
+
+	err = audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
+				 domain_id);
+
+	if (expected_domain_id)
+		setsockopt(audit_fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
+			   sizeof(audit_tv_default));
+
+	return err;
 }
 
 struct audit_records {
diff --git a/tools/testing/selftests/landlock/audit_test.c b/tools/testing/selftests/landlock/audit_test.c
index f92ba6774faa..60de97bd0153 100644
--- a/tools/testing/selftests/landlock/audit_test.c
+++ b/tools/testing/selftests/landlock/audit_test.c
@@ -139,23 +139,24 @@ TEST_F(audit, layers)
 	    WEXITSTATUS(status) != EXIT_SUCCESS)
 		_metadata->exit_code = KSFT_FAIL;
 
-	/* Purges log from deallocated domains. */
-	EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-				&audit_tv_dom_drop, sizeof(audit_tv_dom_drop)));
+	/*
+	 * Purges log from deallocated domains.  Records arrive in LIFO order
+	 * (innermost domain first) because landlock_put_hierarchy() walks the
+	 * chain sequentially in a single kworker context.
+	 */
 	for (i = ARRAY_SIZE(*domain_stack) - 1; i >= 0; i--) {
 		__u64 deallocated_dom = 2;
 
 		EXPECT_EQ(0, matches_log_domain_deallocated(self->audit_fd, 1,
+							    (*domain_stack)[i],
 							    &deallocated_dom));
 		EXPECT_EQ((*domain_stack)[i], deallocated_dom)
 		{
 			TH_LOG("Failed to match domain %llx (#%d)",
-			       (*domain_stack)[i], i);
+			       (unsigned long long)(*domain_stack)[i], i);
 		}
 	}
 	EXPECT_EQ(0, munmap(domain_stack, sizeof(*domain_stack)));
-	EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-				&audit_tv_default, sizeof(audit_tv_default)));
 	EXPECT_EQ(0, close(ruleset_fd));
 }
 
@@ -270,13 +271,9 @@ TEST_F(audit, thread)
 	EXPECT_EQ(0, close(pipe_parent[1]));
 	ASSERT_EQ(0, pthread_join(thread, NULL));
 
-	EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-				&audit_tv_dom_drop, sizeof(audit_tv_dom_drop)));
-	EXPECT_EQ(0, matches_log_domain_deallocated(self->audit_fd, 1,
-						    &deallocated_dom));
+	EXPECT_EQ(0, matches_log_domain_deallocated(
+			     self->audit_fd, 1, denial_dom, &deallocated_dom));
 	EXPECT_EQ(denial_dom, deallocated_dom);
-	EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-				&audit_tv_default, sizeof(audit_tv_default)));
 }
 
 FIXTURE(audit_flags)
@@ -432,22 +429,21 @@ TEST_F(audit_flags, signal)
 
 	if (variant->restrict_flags &
 	    LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF) {
+		/*
+		 * No deallocation record: denials=0 never matches a real
+		 * record.
+		 */
 		EXPECT_EQ(-EAGAIN,
-			  matches_log_domain_deallocated(self->audit_fd, 0,
+			  matches_log_domain_deallocated(self->audit_fd, 0, 0,
 							 &deallocated_dom));
 		EXPECT_EQ(deallocated_dom, 2);
 	} else {
-		EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-					&audit_tv_dom_drop,
-					sizeof(audit_tv_dom_drop)));
 		EXPECT_EQ(0, matches_log_domain_deallocated(self->audit_fd, 2,
+							    *self->domain_id,
 							    &deallocated_dom));
 		EXPECT_NE(deallocated_dom, 2);
 		EXPECT_NE(deallocated_dom, 0);
 		EXPECT_EQ(deallocated_dom, *self->domain_id);
-		EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-					&audit_tv_default,
-					sizeof(audit_tv_default)));
 	}
 }
 
-- 
2.53.0



* [PATCH v3 5/5] selftests/landlock: Fix format warning for __u64 in net_test
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang, stable, kernel test robot
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

On architectures where __u64 is unsigned long (e.g. powerpc64), using
%llx to format a __u64 triggers a -Wformat warning because %llx expects
unsigned long long.  Cast the argument to unsigned long long.
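
As a standalone illustration of the same pattern (the helper name
format_access() is made up here and does not exist in the selftests):

```c
#include <linux/types.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats a __u64 as hex.  The explicit cast keeps
 * %llx correct whether __u64 is defined as unsigned long or as
 * unsigned long long on the target architecture.
 */
static int format_access(char *buf, size_t size, __u64 access)
{
	return snprintf(buf, size, "0x%llx", (unsigned long long)access);
}
```

The cast is a no-op where __u64 is already unsigned long long, so it is
safe to apply unconditionally.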

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: a549d055a22e ("selftests/landlock: Add network tests")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202604020206.62zgOTeP-lkp@intel.com/
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v2:
- New patch.
---
 tools/testing/selftests/landlock/net_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
index b34b139b3f89..4c528154ea92 100644
--- a/tools/testing/selftests/landlock/net_test.c
+++ b/tools/testing/selftests/landlock/net_test.c
@@ -1356,7 +1356,7 @@ TEST_F(mini, network_access_rights)
 					    &net_port, 0))
 		{
 			TH_LOG("Failed to add rule with access 0x%llx: %s",
-			       access, strerror(errno));
+			       (unsigned long long)access, strerror(errno));
 		}
 	}
 	EXPECT_EQ(0, close(ruleset_fd));
-- 
2.53.0



* [PATCH v3 0/5] Fix Landlock audit test flakiness
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang

This series fixes two classes of audit selftest failures plus two minor
bugs in the audit test helpers.

The main issue is that domain deallocation audit records are emitted
asynchronously from kworker threads and can arrive after a previous
test's socket has been closed.  This causes two distinct failure modes:

- audit_match_record() picks up a stale deallocation record from a
  previous test instead of the expected one, causing a domain ID
  mismatch.  The audit.layers test (which reads 16 deallocation records
  in sequence) is particularly vulnerable because the large read window
  allows stale records to interleave.  Patch 4 fixes this by filtering
  deallocation records by domain ID and skipping type-matching records
  with wrong content patterns.

- audit_count_records() counts stale deallocation records from a
  previous test, incrementing records.domain from the expected 0 to 1.
  Patch 3 fixes this by draining stale records at audit_init() time and
  removing records.domain == 0 checks that are not preceded by
  audit_match_record() calls (which would consume stale records).

These races are more likely to manifest when additional instrumentation
changes kworker timing in the deallocation path (e.g. with the upcoming
Landlock tracepoints work).

The two minor fixes (patches 1-2) correct a snprintf truncation check
off-by-one and socket file descriptor leaks on error paths in
audit_init(), audit_init_with_exe_filter(), and audit_cleanup().
Patch 5 fixes a __u64 format warning reported by the kbuild bot on
powerpc64.

Patch 1 is an exact subset of the v1 combined patch, which is why it
carries the Reviewed-by tag.  Patches 2 and 3 extend beyond what was in
v1, so the Reviewed-by is not carried.  Patches 4 and 5 are new.

Changes since v2:
https://lore.kernel.org/r/20260401161503.1136946-1-mic@digikod.net
- Patches 4-5: fix __u64 format warnings on powerpc64 (cast to unsigned
  long long for %llx).  Patch 5 is new.

Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
- Split the combined drain fix into four separate patches.
- Patch 2: extend fd leak fix to audit_init_with_exe_filter() and
  audit_cleanup().
- Patch 3: also remove domain checks from audit.trace and
  scoped_audit.connect_to_child, document constraint, explain why a
  longer drain timeout was rejected.
- Patch 4: new, add domain ID filtering and timeout management to
  matches_log_domain_deallocated(), skip stale records in
  audit_match_record().

Mickaël Salaün (5):
  selftests/landlock: Fix snprintf truncation checks in audit helpers
  selftests/landlock: Fix socket file descriptor leaks in audit helpers
  selftests/landlock: Drain stale audit records on init
  selftests/landlock: Skip stale records in audit_match_record()
  selftests/landlock: Fix format warning for __u64 in net_test

 tools/testing/selftests/landlock/audit.h      | 133 ++++++++++++++----
 tools/testing/selftests/landlock/audit_test.c |  36 ++---
 tools/testing/selftests/landlock/net_test.c   |   2 +-
 .../testing/selftests/landlock/ptrace_test.c  |   1 -
 .../landlock/scoped_abstract_unix_test.c      |   1 -
 5 files changed, 119 insertions(+), 54 deletions(-)

-- 
2.53.0



* [PATCH v3 3/5] selftests/landlock: Drain stale audit records on init
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

Non-audit Landlock tests generate audit records as side effects when
audit_enabled is non-zero (e.g. from boot configuration).  These records
accumulate in the kernel audit backlog while no audit daemon socket is
open.  When the next test opens a new netlink socket and registers as
the audit daemon, the stale backlog is delivered, causing baseline
record count checks to fail spuriously.

Fix this by draining all pending records in audit_init() right after
setting the receive timeout.  The 1-usec SO_RCVTIMEO causes audit_recv()
to return -EAGAIN once the backlog is empty, naturally terminating the
drain loop.
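
The drain idiom itself is generic and can be sketched with a plain datagram
socket instead of the audit netlink socket.  In this illustration,
drain_socket() is a made-up name and recv() stands in for audit_recv();
the sketch also sets the timeout itself, whereas audit_init() has already
set it at that point.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper: discards every datagram already queued on fd and
 * returns how many were dropped, or -errno on an unexpected error.  With a
 * very short SO_RCVTIMEO, recv() fails with EAGAIN (or EWOULDBLOCK) once
 * the queue is empty, which ends the loop.
 */
static int drain_socket(int fd)
{
	const struct timeval tv = { .tv_sec = 0, .tv_usec = 1 };
	char buf[256];
	int count = 0;

	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
		return -errno;

	while (recv(fd, buf, sizeof(buf), 0) >= 0)
		count++;

	if (errno != EAGAIN && errno != EWOULDBLOCK)
		return -errno;
	return count;
}
```

Records that are queued after the loop has seen EAGAIN are not covered by
this approach, which is the limitation the next paragraph addresses.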

Domain deallocation records are emitted asynchronously from a work
queue, so they may still arrive after the drain.  Remove records.domain
== 0 checks that are not preceded by audit_match_record() calls, which
would otherwise consume stale records before the count.  Document this
constraint above audit_count_records().

Increasing the drain timeout to catch in-flight deallocation records was
considered but rejected: a longer timeout adds latency to every
audit_init() call even when no stale record is pending, and any fixed
timeout is still not guaranteed to catch all records under load.
Removing the unprotected checks is simpler and avoids the spurious
failures.

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
- Also remove domain checks from audit.trace and
  scoped_audit.connect_to_child.
- Document records.domain == 0 constraint above
  audit_count_records().
- Explain why a longer drain timeout was rejected.
- Drop Reviewed-by (new code comment not in v1).
- Split snprintf and fd leak fixes into separate patches.
---
 tools/testing/selftests/landlock/audit.h      | 19 +++++++++++++++++++
 tools/testing/selftests/landlock/audit_test.c |  2 --
 .../testing/selftests/landlock/ptrace_test.c  |  1 -
 .../landlock/scoped_abstract_unix_test.c      |  1 -
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
index 6422943fc69e..74e1c3d763be 100644
--- a/tools/testing/selftests/landlock/audit.h
+++ b/tools/testing/selftests/landlock/audit.h
@@ -338,6 +338,15 @@ struct audit_records {
 	size_t domain;
 };
 
+/*
+ * WARNING: Do not assert records.domain == 0 without a preceding
+ * audit_match_record() call.  Domain deallocation records are emitted
+ * asynchronously from kworker threads and can arrive after the drain in
+ * audit_init(), corrupting the domain count.  A preceding audit_match_record()
+ * call consumes stale records while scanning, making the assertion safe in
+ * practice because stale deallocation records arrive before the expected access
+ * records.
+ */
 static int audit_count_records(int audit_fd, struct audit_records *records)
 {
 	struct audit_message msg;
@@ -393,6 +402,16 @@ static int audit_init(void)
 		goto err_close;
 	}
 
+	/*
+	 * Drains stale audit records that accumulated in the kernel backlog
+	 * while no audit daemon socket was open.  This happens when non-audit
+	 * Landlock tests generate records while audit_enabled is non-zero (e.g.
+	 * from boot configuration), or when domain deallocation records arrive
+	 * asynchronously after a previous test's socket was closed.
+	 */
+	while (audit_recv(fd, NULL) == 0)
+		;
+
 	return fd;
 
 err_close:
diff --git a/tools/testing/selftests/landlock/audit_test.c b/tools/testing/selftests/landlock/audit_test.c
index 46d02d49835a..f92ba6774faa 100644
--- a/tools/testing/selftests/landlock/audit_test.c
+++ b/tools/testing/selftests/landlock/audit_test.c
@@ -412,7 +412,6 @@ TEST_F(audit_flags, signal)
 		} else {
 			EXPECT_EQ(1, records.access);
 		}
-		EXPECT_EQ(0, records.domain);
 
 		/* Updates filter rules to match the drop record. */
 		set_cap(_metadata, CAP_AUDIT_CONTROL);
@@ -601,7 +600,6 @@ TEST_F(audit_exec, signal_and_open)
 	/* Tests that there was no denial until now. */
 	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
 	EXPECT_EQ(0, records.access);
-	EXPECT_EQ(0, records.domain);
 
 	/*
 	 * Wait for the child to do a first denied action by layer1 and
diff --git a/tools/testing/selftests/landlock/ptrace_test.c b/tools/testing/selftests/landlock/ptrace_test.c
index 4f64c90583cd..1b6c8b53bf33 100644
--- a/tools/testing/selftests/landlock/ptrace_test.c
+++ b/tools/testing/selftests/landlock/ptrace_test.c
@@ -342,7 +342,6 @@ TEST_F(audit, trace)
 	/* Makes sure there is no superfluous logged records. */
 	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
 	EXPECT_EQ(0, records.access);
-	EXPECT_EQ(0, records.domain);
 
 	yama_ptrace_scope = get_yama_ptrace_scope();
 	ASSERT_LE(0, yama_ptrace_scope);
diff --git a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
index 72f97648d4a7..c47491d2d1c1 100644
--- a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
+++ b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
@@ -312,7 +312,6 @@ TEST_F(scoped_audit, connect_to_child)
 	/* Makes sure there is no superfluous logged records. */
 	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
 	EXPECT_EQ(0, records.access);
-	EXPECT_EQ(0, records.domain);
 
 	ASSERT_EQ(0, pipe2(pipe_child, O_CLOEXEC));
 	ASSERT_EQ(0, pipe2(pipe_parent, O_CLOEXEC));
-- 
2.53.0



* Re: LSM namespacing API
From: Paul Moore @ 2026-04-02 19:31 UTC (permalink / raw)
  To: Dr. Greg
  Cc: Casey Schaufler, Stephen Smalley, Ondrej Mosnacek,
	linux-security-module, selinux, John Johansen
In-Reply-To: <5e210223-f9a4-4613-8c4b-bea5eea7f8c0@schaufler-ca.com>

On Thu, Apr 2, 2026 at 1:49 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 4/2/2026 3:59 AM, Dr. Greg wrote:
> > That still leaves the question of whether or not CAP_MAC_ADMIN is
> > appropriate for gating the creation of a new security namespace.
>
> That will have to be up to the individual LSMs.

Yes, exactly.

> Not all LSMs implement Mandatory Access Controls.

... and not all LSMs that implement mandatory access controls rely on
CAP_MAC_ADMIN to gate configuration changes.

-- 
paul-moore.com


* [PATCH v3 2/5] selftests/landlock: Fix socket file descriptor leaks in audit helpers
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

audit_init() opens a netlink socket and configures it, but leaks the
file descriptor if audit_set_status() or setsockopt() fails.  Fix this
by jumping to an error path that closes the socket before returning.

Apply the same fix to audit_init_with_exe_filter(), which leaks the file
descriptor from audit_init() if audit_init_filter_exe() or
audit_filter_exe() fails, and to audit_cleanup(), which leaks it if
audit_init_filter_exe() fails in FIXTURE_TEARDOWN_PARENT().

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
- New patch (split from the drain fix, extended to
  audit_init_with_exe_filter() and audit_cleanup()).
---
 tools/testing/selftests/landlock/audit.h | 26 +++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
index 1049a0582af5..6422943fc69e 100644
--- a/tools/testing/selftests/landlock/audit.h
+++ b/tools/testing/selftests/landlock/audit.h
@@ -379,19 +379,25 @@ static int audit_init(void)
 
 	err = audit_set_status(fd, AUDIT_STATUS_ENABLED, 1);
 	if (err)
-		return err;
+		goto err_close;
 
 	err = audit_set_status(fd, AUDIT_STATUS_PID, getpid());
 	if (err)
-		return err;
+		goto err_close;
 
 	/* Sets a timeout for negative tests. */
 	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
 			 sizeof(audit_tv_default));
-	if (err)
-		return -errno;
+	if (err) {
+		err = -errno;
+		goto err_close;
+	}
 
 	return fd;
+
+err_close:
+	close(fd);
+	return err;
 }
 
 static int audit_init_filter_exe(struct audit_filter *filter, const char *path)
@@ -441,8 +447,10 @@ static int audit_cleanup(int audit_fd, struct audit_filter *filter)
 
 		filter = &new_filter;
 		err = audit_init_filter_exe(filter, NULL);
-		if (err)
+		if (err) {
+			close(audit_fd);
 			return err;
+		}
 	}
 
 	/* Filters might not be in place. */
@@ -468,11 +476,15 @@ static int audit_init_with_exe_filter(struct audit_filter *filter)
 
 	err = audit_init_filter_exe(filter, NULL);
 	if (err)
-		return err;
+		goto err_close;
 
 	err = audit_filter_exe(fd, filter, AUDIT_ADD_RULE);
 	if (err)
-		return err;
+		goto err_close;
 
 	return fd;
+
+err_close:
+	close(fd);
+	return err;
 }
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH v3 5/5] selftests/landlock: Fix format warning for __u64 in net_test
From: Günther Noack @ 2026-04-02 20:21 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang, stable, kernel test robot
In-Reply-To: <20260402192608.1458252-6-mic@digikod.net>

On Thu, Apr 02, 2026 at 09:26:06PM +0200, Mickaël Salaün wrote:
> On architectures where __u64 is unsigned long (e.g. powerpc64), using
> %llx to format a __u64 triggers a -Wformat warning because %llx expects
> unsigned long long.  Cast the argument to unsigned long long.
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: stable@vger.kernel.org
> Fixes: a549d055a22e ("selftests/landlock: Add network tests")
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/r/202604020206.62zgOTeP-lkp@intel.com/
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v2:
> - New patch.
> ---
>  tools/testing/selftests/landlock/net_test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
> index b34b139b3f89..4c528154ea92 100644
> --- a/tools/testing/selftests/landlock/net_test.c
> +++ b/tools/testing/selftests/landlock/net_test.c
> @@ -1356,7 +1356,7 @@ TEST_F(mini, network_access_rights)
>  					    &net_port, 0))
>  		{
>  			TH_LOG("Failed to add rule with access 0x%llx: %s",
> -			       access, strerror(errno));
> +			       (unsigned long long)access, strerror(errno));
>  		}
>  	}
>  	EXPECT_EQ(0, close(ruleset_fd));
> -- 
> 2.53.0
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

^ permalink raw reply
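
The cast in the patch above can be tried in isolation.  What follows is
a minimal sketch, not code from the selftests: format_access() is a
hypothetical helper, and uint64_t stands in for __u64 so that the
example builds without kernel headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Formats an access mask as hexadecimal text.  Returns what snprintf()
 * returns: the length the full output needs, excluding the NUL.
 * format_access() is a hypothetical helper used only for illustration.
 */
static int format_access(char *buf, size_t size, uint64_t access)
{
	/*
	 * A 64-bit type may be unsigned long or unsigned long long
	 * depending on the ABI.  The cast makes the argument match %llx
	 * on both, which is what silences -Wformat.
	 */
	return snprintf(buf, size, "0x%llx", (unsigned long long)access);
}
```

Calling format_access(buf, sizeof(buf), 0x1ff) with a 32-byte buffer
should leave "0x1ff" in buf.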

* Re: [PATCH v3 2/5] selftests/landlock: Fix socket file descriptor leaks in audit helpers
From: Günther Noack @ 2026-04-02 20:25 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-3-mic@digikod.net>

On Thu, Apr 02, 2026 at 09:26:03PM +0200, Mickaël Salaün wrote:
> audit_init() opens a netlink socket and configures it, but leaks the
> file descriptor if audit_set_status() or setsockopt() fails.  Fix this
> by jumping to an error path that closes the socket before returning.
> 
> Apply the same fix to audit_init_with_exe_filter(), which leaks the file
> descriptor from audit_init() if audit_init_filter_exe() or
> audit_filter_exe() fails, and to audit_cleanup(), which leaks it if
> audit_init_filter_exe() fails in FIXTURE_TEARDOWN_PARENT().
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: stable@vger.kernel.org
> Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
> - New patch (split from the drain fix, extended to
>   audit_init_with_exe_filter() and audit_cleanup()).
> ---
>  tools/testing/selftests/landlock/audit.h | 26 +++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
> index 1049a0582af5..6422943fc69e 100644
> --- a/tools/testing/selftests/landlock/audit.h
> +++ b/tools/testing/selftests/landlock/audit.h
> @@ -379,19 +379,25 @@ static int audit_init(void)
>  
>  	err = audit_set_status(fd, AUDIT_STATUS_ENABLED, 1);
>  	if (err)
> -		return err;
> +		goto err_close;
>  
>  	err = audit_set_status(fd, AUDIT_STATUS_PID, getpid());
>  	if (err)
> -		return err;
> +		goto err_close;
>  
>  	/* Sets a timeout for negative tests. */
>  	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
>  			 sizeof(audit_tv_default));
> -	if (err)
> -		return -errno;
> +	if (err) {
> +		err = -errno;
> +		goto err_close;
> +	}
>  
>  	return fd;
> +
> +err_close:
> +	close(fd);
> +	return err;
>  }
>  
>  static int audit_init_filter_exe(struct audit_filter *filter, const char *path)
> @@ -441,8 +447,10 @@ static int audit_cleanup(int audit_fd, struct audit_filter *filter)
>  
>  		filter = &new_filter;
>  		err = audit_init_filter_exe(filter, NULL);
> -		if (err)
> +		if (err) {
> +			close(audit_fd);
>  			return err;
> +		}
>  	}
>  
>  	/* Filters might not be in place. */
> @@ -468,11 +476,15 @@ static int audit_init_with_exe_filter(struct audit_filter *filter)
>  
>  	err = audit_init_filter_exe(filter, NULL);
>  	if (err)
> -		return err;
> +		goto err_close;
>  
>  	err = audit_filter_exe(fd, filter, AUDIT_ADD_RULE);
>  	if (err)
> -		return err;
> +		goto err_close;
>  
>  	return fd;
> +
> +err_close:
> +	close(fd);
> +	return err;
>  }
> -- 
> 2.53.0
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

^ permalink raw reply

* Re: [PATCH v3 3/5] selftests/landlock: Drain stale audit records on init
From: Günther Noack @ 2026-04-02 20:28 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-4-mic@digikod.net>

On Thu, Apr 02, 2026 at 09:26:04PM +0200, Mickaël Salaün wrote:
> Non-audit Landlock tests generate audit records as side effects when
> audit_enabled is non-zero (e.g. from boot configuration).  These records
> accumulate in the kernel audit backlog while no audit daemon socket is
> open.  When the next test opens a new netlink socket and registers as
> the audit daemon, the stale backlog is delivered, causing baseline
> record count checks to fail spuriously.
> 
> Fix this by draining all pending records in audit_init() right after
> setting the receive timeout.  The 1-usec SO_RCVTIMEO causes audit_recv()
> to return -EAGAIN once the backlog is empty, naturally terminating the
> drain loop.
> 
> Domain deallocation records are emitted asynchronously from a work
> queue, so they may still arrive after the drain.  Remove records.domain
> == 0 checks that are not preceded by audit_match_record() calls, which
> would otherwise consume stale records before the count.  Document this
> constraint above audit_count_records().
> 
> Increasing the drain timeout to catch in-flight deallocation records was
> considered but rejected: a longer timeout adds latency to every
> audit_init() call even when no stale record is pending, and any fixed
> timeout is still not guaranteed to catch all records under load.
> Removing the unprotected checks is simpler and avoids the spurious
> failures.
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: stable@vger.kernel.org
> Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
> - Also remove domain checks from audit.trace and
>   scoped_audit.connect_to_child.
> - Document records.domain == 0 constraint above
>   audit_count_records().
> - Explain why a longer drain timeout was rejected.
> - Drop Reviewed-by (new code comment not in v1).
> - Split snprintf and fd leak fixes into separate patches.
> ---
>  tools/testing/selftests/landlock/audit.h      | 19 +++++++++++++++++++
>  tools/testing/selftests/landlock/audit_test.c |  2 --
>  .../testing/selftests/landlock/ptrace_test.c  |  1 -
>  .../landlock/scoped_abstract_unix_test.c      |  1 -
>  4 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
> index 6422943fc69e..74e1c3d763be 100644
> --- a/tools/testing/selftests/landlock/audit.h
> +++ b/tools/testing/selftests/landlock/audit.h
> @@ -338,6 +338,15 @@ struct audit_records {
>  	size_t domain;
>  };
>  
> +/*
> + * WARNING: Do not assert records.domain == 0 without a preceding
> + * audit_match_record() call.  Domain deallocation records are emitted
> + * asynchronously from kworker threads and can arrive after the drain in
> + * audit_init(), corrupting the domain count.  A preceding audit_match_record()
> + * call consumes stale records while scanning, making the assertion safe in
> + * practice because stale deallocation records arrive before the expected access
> + * records.
> + */
>  static int audit_count_records(int audit_fd, struct audit_records *records)
>  {
>  	struct audit_message msg;
> @@ -393,6 +402,16 @@ static int audit_init(void)
>  		goto err_close;
>  	}
>  
> +	/*
> +	 * Drains stale audit records that accumulated in the kernel backlog
> +	 * while no audit daemon socket was open.  This happens when non-audit
> +	 * Landlock tests generate records while audit_enabled is non-zero (e.g.
> +	 * from boot configuration), or when domain deallocation records arrive
> +	 * asynchronously after a previous test's socket was closed.
> +	 */
> +	while (audit_recv(fd, NULL) == 0)
> +		;
> +
>  	return fd;
>  
>  err_close:
> diff --git a/tools/testing/selftests/landlock/audit_test.c b/tools/testing/selftests/landlock/audit_test.c
> index 46d02d49835a..f92ba6774faa 100644
> --- a/tools/testing/selftests/landlock/audit_test.c
> +++ b/tools/testing/selftests/landlock/audit_test.c
> @@ -412,7 +412,6 @@ TEST_F(audit_flags, signal)
>  		} else {
>  			EXPECT_EQ(1, records.access);
>  		}
> -		EXPECT_EQ(0, records.domain);
>  
>  		/* Updates filter rules to match the drop record. */
>  		set_cap(_metadata, CAP_AUDIT_CONTROL);
> @@ -601,7 +600,6 @@ TEST_F(audit_exec, signal_and_open)
>  	/* Tests that there was no denial until now. */
>  	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
>  	EXPECT_EQ(0, records.access);
> -	EXPECT_EQ(0, records.domain);
>  
>  	/*
>  	 * Wait for the child to do a first denied action by layer1 and
> diff --git a/tools/testing/selftests/landlock/ptrace_test.c b/tools/testing/selftests/landlock/ptrace_test.c
> index 4f64c90583cd..1b6c8b53bf33 100644
> --- a/tools/testing/selftests/landlock/ptrace_test.c
> +++ b/tools/testing/selftests/landlock/ptrace_test.c
> @@ -342,7 +342,6 @@ TEST_F(audit, trace)
>  	/* Makes sure there is no superfluous logged records. */
>  	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
>  	EXPECT_EQ(0, records.access);
> -	EXPECT_EQ(0, records.domain);
>  
>  	yama_ptrace_scope = get_yama_ptrace_scope();
>  	ASSERT_LE(0, yama_ptrace_scope);
> diff --git a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
> index 72f97648d4a7..c47491d2d1c1 100644
> --- a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
> +++ b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
> @@ -312,7 +312,6 @@ TEST_F(scoped_audit, connect_to_child)
>  	/* Makes sure there is no superfluous logged records. */
>  	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
>  	EXPECT_EQ(0, records.access);
> -	EXPECT_EQ(0, records.domain);
>  
>  	ASSERT_EQ(0, pipe2(pipe_child, O_CLOEXEC));
>  	ASSERT_EQ(0, pipe2(pipe_parent, O_CLOEXEC));
> -- 
> 2.53.0
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

^ permalink raw reply
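
The drain loop described in the commit message can be illustrated
outside the audit helpers.  The following is a rough sketch under
stated assumptions: drain_socket() is a hypothetical name, it works on
any datagram socket rather than on the audit netlink socket, and it
calls recv() directly where the selftests call audit_recv().

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reads and discards pending datagrams until recv() times out.
 * Returns the number of datagrams discarded, or -errno on an
 * unexpected error.  Intended for datagram sockets only.
 * drain_socket() is a hypothetical helper, not part of audit.h.
 */
static int drain_socket(int fd)
{
	const struct timeval tv = { .tv_sec = 0, .tv_usec = 1 };
	char buf[256];
	int count = 0;

	/* A very short receive timeout bounds the wait on an empty queue. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
		return -errno;

	for (;;) {
		ssize_t len = recv(fd, buf, sizeof(buf), 0);

		if (len >= 0) {
			count++;
			continue;
		}
		if (errno == EINTR)
			continue;
		/* The timeout expired with nothing queued: drain is done. */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return count;
		return -errno;
	}
}
```

As the commit message notes, a drain like this only removes what is
already queued; records emitted later are not covered by it.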

* Re: [PATCH v3 1/5] selftests/landlock: Fix snprintf truncation checks in audit helpers
From: Günther Noack @ 2026-04-02 20:30 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-2-mic@digikod.net>

On Thu, Apr 02, 2026 at 09:26:02PM +0200, Mickaël Salaün wrote:
> snprintf() returns the number of characters that would have been
> written, excluding the terminating NUL byte.  When the output is
> truncated, this return value equals or exceeds the buffer size.  Fix
> matches_log_domain_allocated() and matches_log_domain_deallocated() to
> detect truncation with ">=" instead of ">".
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: stable@vger.kernel.org
> Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
> Reviewed-by: Günther Noack <gnoack@google.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
> - New patch (split from the drain fix).
> ---
>  tools/testing/selftests/landlock/audit.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
> index 44eb433e9666..1049a0582af5 100644
> --- a/tools/testing/selftests/landlock/audit.h
> +++ b/tools/testing/selftests/landlock/audit.h
> @@ -309,7 +309,7 @@ static int __maybe_unused matches_log_domain_allocated(int audit_fd, pid_t pid,
>  
>  	log_match_len =
>  		snprintf(log_match, sizeof(log_match), log_template, pid);
> -	if (log_match_len > sizeof(log_match))
> +	if (log_match_len >= sizeof(log_match))
>  		return -E2BIG;
>  
>  	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
> @@ -326,7 +326,7 @@ static int __maybe_unused matches_log_domain_deallocated(
>  
>  	log_match_len = snprintf(log_match, sizeof(log_match), log_template,
>  				 num_denials);
> -	if (log_match_len > sizeof(log_match))
> +	if (log_match_len >= sizeof(log_match))
>  		return -E2BIG;
>  
>  	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
> -- 
> 2.53.0
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

(I noticed the Reviewed-by tag was already there, re-sending to
confirm that this also applies to this subset of the original patch)

–Günther

^ permalink raw reply
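
The off-by-one fixed by this patch is easy to reproduce in a few
lines.  This is a minimal sketch: format_pid() and its "pid=%d"
template are made up for illustration and do not appear in audit.h.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats "pid=<pid>" into buf.  Returns the string length on success,
 * or -E2BIG if the output (including the terminating NUL) did not fit.
 * format_pid() is a hypothetical helper, not part of audit.h.
 */
static int format_pid(char *buf, size_t size, int pid)
{
	int len = snprintf(buf, size, "pid=%d", pid);

	if (len < 0)
		return -EINVAL;
	/*
	 * len excludes the NUL byte, so len == size already means the
	 * last character was dropped.  Checking with ">" would miss it.
	 */
	if ((size_t)len >= size)
		return -E2BIG;
	return len;
}
```

With pid 42 the output "pid=42" is 6 characters long, so a 6-byte
buffer is reported as too small while a 7-byte buffer succeeds.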

* Re: [PATCH v3 0/5] Fix Landlock audit test flakiness
From: Günther Noack @ 2026-04-02 20:52 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

Hello!

On Thu, Apr 02, 2026 at 09:26:01PM +0200, Mickaël Salaün wrote:
> This series fixes two classes of audit selftest failures plus two minor
> bugs in the audit test helpers.
> 
> The main issue is that domain deallocation audit records are emitted
> asynchronously from kworker threads and can arrive after a previous
> test's socket has been closed.  This causes two distinct failure modes:
> 
> - audit_match_record() picks up a stale deallocation record from a
>   previous test instead of the expected one, causing a domain ID
>   mismatch.  The audit.layers test (which reads 16 deallocation records
>   in sequence) is particularly vulnerable because the large read window
>   allows stale records to interleave.  Patch 4 fixes this by filtering
>   deallocation records by domain ID and skipping type-matching records
>   with wrong content patterns.
> 
> - audit_count_records() counts stale deallocation records from a
>   previous test, incrementing records.domain from the expected 0 to 1.
>   Patch 3 fixes this by draining stale records at audit_init() time and
>   removing records.domain == 0 checks that are not preceded by
>   audit_match_record() calls (which would consume stale records).
> 
> These races are more likely to manifest when additional instrumentation
> changes kworker timing in the deallocation path (e.g. with the upcoming
> Landlock tracepoints work).
> 
> The two minor fixes (patches 1-2) correct a snprintf truncation check
> off-by-one and socket file descriptor leaks on error paths in
> audit_init(), audit_init_with_exe_filter(), and audit_cleanup().
> Patch 5 fixes a __u64 format warning reported by the kbuild bot on
> powerpc64.
> 
> Patch 1 is an exact subset of the v1 combined patch, which is why it
> carries the Reviewed-by tag.  Patches 2 and 3 extend beyond what was in
> v1, so the Reviewed-by is not carried.  Patches 4 and 5 are new.
> 
> Changes since v2:
> https://lore.kernel.org/r/20260401161503.1136946-1-mic@digikod.net
> - Patches 4-5: fix __u64 format warnings on powerpc64 (cast to unsigned
>   long long for %llx).  Patch 5 is new.
> 
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
> - Split the combined drain fix into four separate patches.
> - Patch 2: extend fd leak fix to audit_init_with_exe_filter() and
>   audit_cleanup().
> - Patch 3: also remove domain checks from audit.trace and
>   scoped_audit.connect_to_child, document constraint, explain why a
>   longer drain timeout was rejected.
> - Patch 4: new, add domain ID filtering and timeout management to
>   matches_log_domain_deallocated(), skip stale records in
>   audit_match_record().
> 
> Mickaël Salaün (5):
>   selftests/landlock: Fix snprintf truncation checks in audit helpers
>   selftests/landlock: Fix socket file descriptor leaks in audit helpers
>   selftests/landlock: Drain stale audit records on init
>   selftests/landlock: Skip stale records in audit_match_record()
>   selftests/landlock: Fix format warning for __u64 in net_test
> 
>  tools/testing/selftests/landlock/audit.h      | 133 ++++++++++++++----
>  tools/testing/selftests/landlock/audit_test.c |  36 ++---
>  tools/testing/selftests/landlock/net_test.c   |   2 +-
>  .../testing/selftests/landlock/ptrace_test.c  |   1 -
>  .../landlock/scoped_abstract_unix_test.c      |   1 -
>  5 files changed, 119 insertions(+), 54 deletions(-)
> 
> -- 
> 2.53.0
> 

I am afraid I am still getting flaky audit tests even with these
patches.  Which of the tests fails varies from run to run, for
example:

#  RUN           audit_layout1.remove_dir ...
# fs_test.c:7281:remove_dir:Expected 0 (0) == matches_log_fs(_metadata, self->audit_fd, "fs\\.remove_dir", dir_s1d2) (-11)
# remove_dir: Test failed
#          ❌ FAIL  audit_layout1.remove_dir
not ok 191 audit_layout1.remove_dir
#  RUN           audit_layout1.read_dir ...
#            ✅ OK  audit_layout1.read_dir
ok 192 audit_layout1.read_dir
#  RUN           audit_layout1.read_file ...
#            ✅ OK  audit_layout1.read_file
ok 193 audit_layout1.read_file
#  RUN           audit_layout1.write_file ...
# fs_test.c:7221:write_file:Expected 0 (0) == matches_log_fs(_metadata, self->audit_fd, "fs\\.write_file", file1_s1d1) (-11)
# fs_test.c:7224:write_file:Expected 0 (0) == records.access (1)
# write_file: Test failed
#          ❌ FAIL  audit_layout1.write_file
not ok 194 audit_layout1.write_file

My kernel config is this:

    make defconfig
    make kvm_guest.config
    KCONFIG_CONFIG="${KBUILD_OUTPUT}/.config" ./scripts/kconfig/merge_config.sh "${KBUILD_OUTPUT}/.config" tools/testing/selftests/landlock/config
    make debug.config
    echo "CONFIG_RANDOMIZE_BASE=n" >> "${KBUILD_OUTPUT}/.config"
    make olddefconfig

and then I run the selftests in Qemu with these flags:

qemu-system-x86_64 \
    -nographic \
    -m 4G \
    -enable-kvm \
    -append "console=ttyS0 lsm=landlock no_hash_pointers" \
    -kernel "${KBUILD_OUTPUT}/arch/x86/boot/bzImage" \
    -initrd "${INITRAMFS}"

This is using my own selftest runner scripts, which build an initramfs
with the statically linked selftests.

Do you have a hunch what might be missing there?  In the test run
above, I have applied your V4 patch set on top of the current master,
5619b098e2fbf3a23bf13d91897056a1fe238c6d ("Merge tag 'for-7.0-rc6-tag'
of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux").

–Günther

^ permalink raw reply

* Re: [PATCH v3 0/5] Fix Landlock audit test flakiness
From: Günther Noack @ 2026-04-02 20:57 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang
In-Reply-To: <20260402.eb5c4e85f472@gnoack.org>

On Thu, Apr 02, 2026 at 10:52:46PM +0200, Günther Noack wrote:
> My kernel config is this:
> 
>     make defconfig
>     make kvm_guest.config
>     KCONFIG_CONFIG="${KBUILD_OUTPUT}/.config" ./scripts/kconfig/merge_config.sh "${KBUILD_OUTPUT}/.config" tools/testing/selftests/landlock/config
>     make debug.config
>     echo "CONFIG_RANDOMIZE_BASE=n" >> "${KBUILD_OUTPUT}/.config"
>     make olddefconfig

P.S.: I should point out that every time I have observed these
flakiness problems with the audit tests, it was in this debug config.
I suspect that it adds delays in a way that makes them more likely.

–Günther

^ permalink raw reply

* Re: LSM namespacing API
From: Paul Moore @ 2026-04-02 21:04 UTC (permalink / raw)
  To: Dr. Greg
  Cc: Stephen Smalley, Ondrej Mosnacek, linux-security-module, selinux,
	John Johansen
In-Reply-To: <ac5MKr4lFQhc44i6@wind.enjellic.com>

On Thu, Apr 2, 2026 at 7:00 AM Dr. Greg <greg@enjellic.com> wrote:
> On Sun, Mar 29, 2026 at 08:56:37PM -0400, Paul Moore wrote:
> > On Sun, Mar 29, 2026 at 12:09 PM Dr. Greg <greg@enjellic.com> wrote:
> > > On Tue, Mar 24, 2026 at 05:31:09PM -0400, Paul Moore wrote:
> > > > On Tue, Mar 3, 2026 at 11:46 AM Paul Moore <paul@paul-moore.com> wrote:

...

> Christian had proposed patches for a generic mechanism to create
> LSM security namespace blobs, is implementation of that in scope for
> this effort?

That isn't what Christian proposed, although I can understand how a
quick glance at the patchset would lead you to believe that (I had the
same misunderstanding while skimming my inbox on my phone while
traveling).  I suggest reviewing Christian's post again as well as the
related Landlock patchset which is the first to use the hooks
Christian proposed.

> > > It would seem that the flags variable might be a good option to use to
> > > handle this 2-stage transition, for example LSM_NS_INIT and
> > > LSM_NS_CHANGE, respectively, to specify the initialization and
> > > execution phases of the transition.
>
> > No.  The lsm_unshare() syscall is intended to mimic the existing
> > unshare() syscall as a single step process from a user's
> > perspective.  If it returns successfully the caller will be in a new
> > LSM namespace as defined by the individual LSM specified in the
> > syscall.
>
> OK, we can reason forward with that paradigm.
>
> An orchestrator issues the unshare call for an LSM namespace and upon
> return from the system call the calling task is in a new namespace for
> that particular LSM ...

Yes.

> ... the goal of which is presumably to implement a
> security policy/model different than what had been in force
> previously.

Maybe.  That is dependent on the individual LSM, I don't want to
encode any assumptions on this at the LSM framework layer.

> So the process is in a new LSM specific namespace, but still
> implementing the policy from the previous namespace, until the
> orchestrator can load the new policy and then trigger the LSM to
> change from its previous policy to the newly loaded policy.
>
> Is this consistent with your vision as to how all of this will work?

No.  What an individual LSM does upon creation of a new namespace via
lsm_unshare() is entirely up to that LSM.  The LSM may choose to bound
the new namespace by the parent's policy, or it may choose a
non-hierarchical relationship where the new namespace remains entirely
separate from the parent.  The LSM may start the new namespace in an
uninitialized state (similar to early boot), initialized with a
default policy, initialized with the parent's policy, or something
else.

> > > The other unanswered issue, or perhaps we missed it, are the security
> > > controls that should be associated with the unshare call.
>
> > Each LSM is free to implement whatever access controls it deems
> > necessary in its lsm_unshare() callback.
>
> Just to be clear.
>
> When you refer to 'lsm_unshare() callback' are you referring to a new
> LSM security hook to be implemented that will allow all of the
> active LSM's to pass judgement on whether or not the unshare should be
> allowed to complete successfully?

No.  The lsm_unshare() callback is the individual LSM provided
function that the LSM framework calls when the lsm_unshare() syscall
is invoked.  Put another way, the lsm_unshare() callback is the
function specified by a LSM, using the LSM_HOOK_INIT() macro, that is
called by the lsm_unshare() syscall.

> > > Will there be a new LSM hook that allows other LSM's to veto the
> > > creation of a namespace either for itself or for another LSM?
> >
> > I would expect the lsm_unshare() syscall to operate similarly to the
> > lsm_set_self_attr() syscall in this regard.
>
> The reference to handling this like lsm_set_self_attr() is unclear.
>
> With lsm_set_self_attr() there is no reason for another LSM to deny
> setting what is an LSM specific attribute, as you note above, each LSM
> gets to decide if the request to set an attribute for the LSM should
> be accepted or denied.

No.  LSM "A" gets to decide if LSM "A" can create a new namespace
using the lsm_unshare() syscall, LSM "B" does not get to enforce any
policy on LSM "A"'s decision.

> Since lsm_unshare() is changing the overall platform security state,
> it seems consistent with the design of the LSM for other LSM's to be
> able to veto this action.

No.  This is not consistent with either the design or general
conventions associated with LSM development.

> Once again, this seems like an action that would be consistent with
> the notion of the lockdown LSM,

No.

> > > Should there be an option to completely compile LSM namespaces out of
> > > the kernel?
>
> > That doesn't belong in the LSM framework layer, that is up to the
> > individual LSMs.
>
> You noted above the desire for lsm_unshare to be consistent with other
> namespaces.
>
> The current kernel paradigm is to allow classes of namespace
> resources, i.e. CONFIG_UTS_NS, CONFIG_TIME_NS et al., to be compiled in
> or out of the kernel.
>
> It seems that CONFIG_LSM_NS would be consistent with that model.

CONFIG_UTS_NS does not have multiple radically different
implementations underneath it.  Comparing any of the existing Kconfig
namespace knobs to what we are attempting to do with the LSM framework
is going to be difficult due to some inherent differences between the
two things.

The lsm_unshare() syscall is simply an API abstraction intended to
make it easier for userspace to interact with the individual LSMs;
instead of dealing with multiple different namespacing APIs, one for
each LSM, lsm_unshare() provides a single interface to make app devs'
lives easier.

If an individual LSM wants to provide a Kconfig knob to toggle its
namespace support, it is welcome to do so; lsm_unshare() should
exist regardless and return an error code if the desired LSM does not
implement namespace support in the particular kernel build.

> > > > * Implement /proc/pid/ns/lsm and setns(CLONE_NEWLSM)
> > > >
> > > > As discussed previously, this allows us to move a process into an
> > > > existing, established LSM namespace set.  The caller cannot
> > > > selectively choose which individual LSM namespaces they join from the
> > > > given LSM namespace set, they receive the same LSM namespace
> > > > configuration as the target process.
> > >
> > > As an initial aside.  It would be assumed that a positive result of a
> > > setns call would be to cause the calling process to atomically change
> > > its security namespace set.  This would further suggest the need to
> > > have the security namespace creation process also execute atomically
> > > in a multi-LSM namespace change environment.
>
> > In the setns case no new LSM namespaces should be created, the process
> > simply joins an existing set of LSM namespaces.
>
> The issue isn't about new namespaces being created, the issue is
> atomicity of a change to a new set of security policies.
>
> With setns an atomic transition is implemented.
>
> The proposed lsm_unshare() behavior results in a period of time when
> multiple and varying security policies are active, depending on
> various race issues in the orchestrator implementation.
>
> This opens the door to a raft of potential security issues that we can
> have a new acronym for, Time Of Implementation Time Of Use (TOITOU).

I would expect that any LSM implementing namespaces would have
sufficient protections/locking in place to ensure that processes and
namespaces remain in a consistent state outside of the
protected/locked regions.  It is reasonable for one process to attempt
the creation of a new namespace while another attempts to join the
namespace of the process creating the new namespace.  This is not
really a new problem in systems programming, and is one reason why
synchronization mechanisms exist.  Once again, we do not want to force
any particular solution at the LSM framework layer as the
synchronization mechanisms will likely be very LSM dependent.

> > > ... That is the concept of whether or not a setns
> > > call, for any resource namespace, should also force a security
> > > namespace change if the security namespace of the calling process
> > > differs from that of the target process.
>
> > That decision is left to the individual LSMs.
>
> That is reasonable.
>
> In order to support that model, there would seem to be a need to have
> a new LSM call in the setns code that allows LSM's to determine
> whether or not a change in the active security namespace set should be
> forced, correct?

Possibly.  I think we need to see some RFC code to see how this would
look, but I think the LSM implementation inside the setns() syscall
would need to be done in two stages: the first to "prepare" the join
operation where permissions checks are performed (if desired by the
individual LSM) and any operations that could fail are done; the
second stage would be very basic and simply finish the join operation
without any risk of failure.  An individual LSM could fail the join
operation for a variety of reasons in stage 1, causing the entire
setns() operation to fail, but once we progress to stage 2 the
operation should succeed.

At this point I'm not too bothered by how we do this as it is an
implementation detail buried within the setns() implementation and not
really an API issue.  We could create a single LSM hook that is called
within sys_setns(), or we could leverage the existing two-stage
process within sys_setns() and implement the two LSM stages as two LSM
hooks.  The first option would be more complicated from a LSM
perspective, but cleaner from a nsproxy.c perspective (that alone
could make it the preferable option).  The latter option would
result in cleaner, thinner LSM hooks, but it would likely add
complexity to ns_common and/or nsset.  As I said earlier, this is a
decision that will likely be decided by how the code ends up looking.

> If so, is implementation of this in scope for the lsm_unshare()
> infrastructure?

No.  The lsm_unshare() syscall would only operate on one LSM at a time
so a two stage process isn't needed at the LSM framework layer.  It is
possible that an individual LSM may want to implement a two-stage
transaction in their lsm_unshare() callback, but that is their
decision.

> To close, at the risk of being the devil's advocate.
>
> Given that the sentiment is to force almost all of these
> issues/decisions into the individual LSM's, what is the advantage of
> having a common lsm_unshare() system call?

A single uniform API for userspace applications that wish to make use
of LSM namespaces.  Ideally we want to leverage the existing kernel
APIs, e.g. procfs and setns(), but others, e.g. clone(), remain
impractical due to a combination of technical and political reasons
(we've already discussed some of the former, the latter is a rathole
discussion I'm not going to engage in at the moment).

> In the proposed model, a resource orchestrator is going to need to
> have extensive knowledge over the mechanics of all the LSM's that
> implement namespace functionality.

Maybe.  I don't think orchestrators will need to have "extensive"
knowledge of the individual LSMs, although this largely depends on
what you define as "extensive".

I also want to get ahead of this and say that I have absolutely zero
desire to debate this point with you at the moment.  It's an argument
without end and the discussion is unlikely to yield anything specific
enough to be helpful.

> At a very minimum, intrinsic to
> the concept of security namespaces, there will be a need to load a new
> policy or model into the namespace, an action that will be deeply LSM
> specific.

Possibly, as this is once again very LSM dependent.  Some LSMs may not
need a new policy loaded when they create a new namespace.

I will also, once again, point you at the LSM policy loading syscall
ideas.  While on hold, we've already discussed that they should be
namespace aware and potentially have the ability to trigger new LSM
namespace creation.

> At this point, the only common functionality may be the allocation of
> a new LSM namespace 'blob'.

Now you are starting to get it.  The LSM framework exists primarily as
a multiplexing layer hidden beneath an API.  Originally the API was
only for internal kernel users, but recently we started providing a
userspace syscall API.
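
As a very rough illustration of what "multiplexing layer" means here,
consider the userspace sketch below.  It is a simplification and not the
kernel's implementation; the hook type, the two toy LSM callbacks, and
demo_security_file_permission() are hypothetical names made up for this
example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook signature: 0 grants access, negative errno denies. */
typedef int (*file_perm_hook)(int mask);

/* Two toy "LSMs"; the second denies any request with bit 1 set. */
static int lsm_a_file_perm(int mask)
{
	(void)mask;
	return 0;
}

static int lsm_b_file_perm(int mask)
{
	return (mask & 2) ? -EACCES : 0;
}

/* Registered callbacks, in call order. */
static const file_perm_hook hooks[] = { lsm_a_file_perm, lsm_b_file_perm };

/* The single entry point callers see; individual LSMs stay behind it. */
static int demo_security_file_permission(int mask)
{
	size_t i;

	for (i = 0; i < sizeof(hooks) / sizeof(hooks[0]); i++) {
		int rc = hooks[i](mask);

		if (rc)
			return rc;	/* the first denial wins */
	}
	return 0;
}
```

The caller only ever sees the one function; which LSMs are present, and
what each of them decides, is hidden behind it.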

-- 
paul-moore.com


* [PATCH v4 0/3] Fix incorrect overlayfs mmap() and mprotect() LSM access controls
From: Paul Moore @ 2026-04-03  3:08 UTC (permalink / raw)
  To: linux-security-module, selinux, linux-fsdevel, linux-unionfs,
	linux-erofs
  Cc: Amir Goldstein, Gao Xiang, Christian Brauner

Another week, another revision to this patchset.  The v3 revision can be
found at the lore[1] link below.

The revision still takes the same basic approach introduced in v2, with
the most significant change in v4 being the change to make the backing
file LSM blob conditional on CONFIG_SECURITY.  This requires a number of
other changes to ensure that all accesses of the LSM blob go through a
set of accessor functions which can be converted into dummy functions
when !CONFIG_SECURITY.
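
For anyone unfamiliar with the pattern, a minimal userspace sketch of
the accessor approach is below.  This is only an illustration, not the
code in the patches; DEMO_CONFIG_SECURITY stands in for the real Kconfig
symbol and the demo_* names are hypothetical.

```c
#include <stddef.h>

/* Toggle to mimic CONFIG_SECURITY; a stand-in for the Kconfig symbol. */
#define DEMO_CONFIG_SECURITY 1

struct demo_backing_file {
	int flags;
#if DEMO_CONFIG_SECURITY
	void *security;		/* only present when the feature is built */
#endif
};

#if DEMO_CONFIG_SECURITY
static inline void *demo_get_security(const struct demo_backing_file *bf)
{
	return bf->security;
}

static inline void demo_set_security(struct demo_backing_file *bf, void *blob)
{
	bf->security = blob;
}
#else
/* Dummy versions: callers compile unchanged, the field does not exist. */
static inline void *demo_get_security(const struct demo_backing_file *bf)
{
	(void)bf;
	return NULL;
}

static inline void demo_set_security(struct demo_backing_file *bf, void *blob)
{
	(void)bf;
	(void)blob;
}
#endif
```

Because no caller touches the field directly, the field can disappear
from the struct without any #ifdef at the call sites.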

While the changes between v3 and v4 were fairly straightforward, there
were enough of them that it felt wrong to preserve the ACKs from previous
revisions.  It would be appreciated if those of you who had previously
ACK'd a patch could take a second look and renew your ACK (or comment on
the problem preventing you from ACK'ing).

Thanks all.

[1] https://lore.kernel.org/linux-security-module/20260327220446.353103-4-paul@paul-moore.com/

--
CHANGELOG:
v4:
- added fs prep patch (Amir)
- added CONFIG_SECURITY conditional code (Amir)
v3:
- fix the LSM hook stubs (kernel robot, Ryan Lee)
- fix the lsm_backing_file_cache allocation size (Ryan Lee)
- minor style, simplicity tweaks to the SELinux patch
v2:
- remove the user O_PATH file patch from Amir
- add the backing_file LSM blob and lifecycle hooks
- update the SELinux code to reflect the other changes
v1:
- initial version

--
Amir Goldstein (1):
      fs: prepare for adding LSM blob to backing_file

Paul Moore (2):
      lsm: add backing_file LSM hooks
      selinux: fix overlayfs mmap() and mprotect() access checks

 fs/backing-file.c                 |   18 +-
 fs/erofs/ishare.c                 |   10 +
 fs/file_table.c                   |   43 ++++-
 fs/fuse/passthrough.c             |    2 
 fs/internal.h                     |    3 
 fs/overlayfs/dir.c                |    2 
 fs/overlayfs/file.c               |    2 
 include/linux/backing-file.h      |    4 
 include/linux/fs.h                |   13 +
 include/linux/lsm_audit.h         |    2 
 include/linux/lsm_hook_defs.h     |    5 
 include/linux/lsm_hooks.h         |    1 
 include/linux/security.h          |   22 ++
 security/lsm.h                    |    1 
 security/lsm_init.c               |    9 +
 security/security.c               |  102 +++++++++++
 security/selinux/hooks.c          |  256 +++++++++++++++++++++---------
 security/selinux/include/objsec.h |   11 +
 18 files changed, 419 insertions(+), 87 deletions(-)



* [PATCH v4 1/3] fs: prepare for adding LSM blob to backing_file
From: Paul Moore @ 2026-04-03  3:08 UTC (permalink / raw)
  To: linux-security-module, selinux, linux-fsdevel, linux-unionfs,
	linux-erofs
  Cc: Amir Goldstein, Gao Xiang, Christian Brauner
In-Reply-To: <20260403030848.731867-5-paul@paul-moore.com>

From: Amir Goldstein <amir73il@gmail.com>

In preparation for adding a LSM blob to the backing_file struct, factor
out the helpers init_backing_file() and backing_file_free().

Cc: stable@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: linux-erofs@lists.ozlabs.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
[PM: use the term "LSM blob", fix comment style to match file]
Signed-off-by: Paul Moore <paul@paul-moore.com>
---
 fs/file_table.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index aaa5faaace1e..3b3792903185 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -66,6 +66,12 @@ void backing_file_set_user_path(struct file *f, const struct path *path)
 }
 EXPORT_SYMBOL_GPL(backing_file_set_user_path);
 
+static inline void backing_file_free(struct backing_file *ff)
+{
+	path_put(&ff->user_path);
+	kmem_cache_free(bfilp_cachep, ff);
+}
+
 static inline void file_free(struct file *f)
 {
 	security_file_free(f);
@@ -73,8 +79,7 @@ static inline void file_free(struct file *f)
 		percpu_counter_dec(&nr_files);
 	put_cred(f->f_cred);
 	if (unlikely(f->f_mode & FMODE_BACKING)) {
-		path_put(backing_file_user_path(f));
-		kmem_cache_free(bfilp_cachep, backing_file(f));
+		backing_file_free(backing_file(f));
 	} else {
 		kmem_cache_free(filp_cachep, f);
 	}
@@ -283,6 +288,12 @@ struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred)
 	return f;
 }
 
+static int init_backing_file(struct backing_file *ff)
+{
+	memset(&ff->user_path, 0, sizeof(ff->user_path));
+	return 0;
+}
+
 /*
  * Variant of alloc_empty_file() that allocates a backing_file container
  * and doesn't check and modify nr_files.
@@ -305,7 +316,14 @@ struct file *alloc_empty_backing_file(int flags, const struct cred *cred)
 		return ERR_PTR(error);
 	}
 
+	/* The f_mode flags must be set before fput(). */
 	ff->file.f_mode |= FMODE_BACKING | FMODE_NOACCOUNT;
+	error = init_backing_file(ff);
+	if (unlikely(error)) {
+		fput(&ff->file);
+		return ERR_PTR(error);
+	}
+
 	return &ff->file;
 }
 EXPORT_SYMBOL_GPL(alloc_empty_backing_file);
-- 
2.53.0



* [PATCH v4 2/3] lsm: add backing_file LSM hooks
From: Paul Moore @ 2026-04-03  3:08 UTC (permalink / raw)
  To: linux-security-module, selinux, linux-fsdevel, linux-unionfs,
	linux-erofs
  Cc: Amir Goldstein, Gao Xiang, Christian Brauner
In-Reply-To: <20260403030848.731867-5-paul@paul-moore.com>

Stacked filesystems such as overlayfs do not currently provide the
necessary mechanisms for LSMs to properly enforce access controls on the
mmap() and mprotect() operations.  In order to resolve this gap, a LSM
security blob is being added to the backing_file struct and the following
new LSM hooks are being created:

 security_backing_file_alloc()
 security_backing_file_free()
 security_mmap_backing_file()

The first two hooks are to manage the lifecycle of the LSM security blob
in the backing_file struct, while the third provides a new mmap() access
control point for the underlying backing file.  It is also expected that
LSMs will likely want to update their security_file_mprotect() callback
to address issues with their mprotect() controls, but that does not
require a change to the security_file_mprotect() LSM hook.
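
The blob lifecycle can be pictured with the userspace sketch below.  It
is only an approximation of the alloc/free pairing described above, not
the kernel code; calloc()/free() stand in for the slab cache and the
demo_* names and the should_fail parameter are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_file {
	void *blob;	/* stands in for the backing_file LSM blob */
};

/* Hypothetical LSM init callback; fails on request, for demonstration. */
static int demo_lsm_init(struct demo_file *f, int should_fail)
{
	(void)f;
	return should_fail ? -EPERM : 0;
}

/* Release the blob and clear the pointer so a second free is harmless. */
static void demo_blob_free(struct demo_file *f)
{
	free(f->blob);
	f->blob = NULL;
}

/* Allocate a zeroed blob, then let the "LSM" initialize it. */
static int demo_blob_alloc(struct demo_file *f, size_t size, int should_fail)
{
	int rc;

	f->blob = calloc(1, size);
	if (!f->blob)
		return -ENOMEM;

	rc = demo_lsm_init(f, should_fail);
	if (rc)
		demo_blob_free(f);	/* do not leak the blob on failure */
	return rc;
}
```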

There are three other small changes to support these new LSM hooks:
* Pass the user file associated with a backing file down to
alloc_empty_backing_file() so it can be included in the
security_backing_file_alloc() hook.
* Add getter and setter functions for the backing_file struct LSM blob
as the backing_file struct remains private to fs/file_table.c.
* Constify the file struct field in the LSM common_audit_data struct to
better support LSMs that need to pass a const file struct pointer into
the common LSM audit code.

Thanks to Arnd Bergmann for identifying the missing EXPORT_SYMBOL_GPL()
and supplying a fixup.

Cc: stable@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: linux-erofs@lists.ozlabs.org
Signed-off-by: Paul Moore <paul@paul-moore.com>
---
 fs/backing-file.c             |  18 ++++--
 fs/erofs/ishare.c             |  10 +++-
 fs/file_table.c               |  27 +++++++--
 fs/fuse/passthrough.c         |   2 +-
 fs/internal.h                 |   3 +-
 fs/overlayfs/dir.c            |   2 +-
 fs/overlayfs/file.c           |   2 +-
 include/linux/backing-file.h  |   4 +-
 include/linux/fs.h            |  13 +++++
 include/linux/lsm_audit.h     |   2 +-
 include/linux/lsm_hook_defs.h |   5 ++
 include/linux/lsm_hooks.h     |   1 +
 include/linux/security.h      |  22 ++++++++
 security/lsm.h                |   1 +
 security/lsm_init.c           |   9 +++
 security/security.c           | 102 ++++++++++++++++++++++++++++++++++
 16 files changed, 206 insertions(+), 17 deletions(-)

diff --git a/fs/backing-file.c b/fs/backing-file.c
index 45da8600d564..1f3bbfc75882 100644
--- a/fs/backing-file.c
+++ b/fs/backing-file.c
@@ -12,6 +12,7 @@
 #include <linux/backing-file.h>
 #include <linux/splice.h>
 #include <linux/mm.h>
+#include <linux/security.h>
 
 #include "internal.h"
 
@@ -29,14 +30,15 @@
  * returned file into a container structure that also stores the stacked
  * file's path, which can be retrieved using backing_file_user_path().
  */
-struct file *backing_file_open(const struct path *user_path, int flags,
+struct file *backing_file_open(const struct file *user_file, int flags,
 			       const struct path *real_path,
 			       const struct cred *cred)
 {
+	const struct path *user_path = &user_file->f_path;
 	struct file *f;
 	int error;
 
-	f = alloc_empty_backing_file(flags, cred);
+	f = alloc_empty_backing_file(flags, cred, user_file);
 	if (IS_ERR(f))
 		return f;
 
@@ -52,15 +54,16 @@ struct file *backing_file_open(const struct path *user_path, int flags,
 }
 EXPORT_SYMBOL_GPL(backing_file_open);
 
-struct file *backing_tmpfile_open(const struct path *user_path, int flags,
+struct file *backing_tmpfile_open(const struct file *user_file, int flags,
 				  const struct path *real_parentpath,
 				  umode_t mode, const struct cred *cred)
 {
 	struct mnt_idmap *real_idmap = mnt_idmap(real_parentpath->mnt);
+	const struct path *user_path = &user_file->f_path;
 	struct file *f;
 	int error;
 
-	f = alloc_empty_backing_file(flags, cred);
+	f = alloc_empty_backing_file(flags, cred, user_file);
 	if (IS_ERR(f))
 		return f;
 
@@ -336,8 +339,13 @@ int backing_file_mmap(struct file *file, struct vm_area_struct *vma,
 
 	vma_set_file(vma, file);
 
-	scoped_with_creds(ctx->cred)
+	scoped_with_creds(ctx->cred) {
+		ret = security_mmap_backing_file(vma, file, user_file);
+		if (ret)
+			return ret;
+
 		ret = vfs_mmap(vma->vm_file, vma);
+	}
 
 	if (ctx->accessed)
 		ctx->accessed(user_file);
diff --git a/fs/erofs/ishare.c b/fs/erofs/ishare.c
index ec433bacc592..6ed66b17359b 100644
--- a/fs/erofs/ishare.c
+++ b/fs/erofs/ishare.c
@@ -4,6 +4,7 @@
  */
 #include <linux/xxhash.h>
 #include <linux/mount.h>
+#include <linux/security.h>
 #include "internal.h"
 #include "xattr.h"
 
@@ -106,7 +107,8 @@ static int erofs_ishare_file_open(struct inode *inode, struct file *file)
 
 	if (file->f_flags & O_DIRECT)
 		return -EINVAL;
-	realfile = alloc_empty_backing_file(O_RDONLY|O_NOATIME, current_cred());
+	realfile = alloc_empty_backing_file(O_RDONLY|O_NOATIME, current_cred(),
+					    file);
 	if (IS_ERR(realfile))
 		return PTR_ERR(realfile);
 	ihold(sharedinode);
@@ -150,8 +152,14 @@ static ssize_t erofs_ishare_file_read_iter(struct kiocb *iocb,
 static int erofs_ishare_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct file *realfile = file->private_data;
+	int err;
 
 	vma_set_file(vma, realfile);
+
+	err = security_mmap_backing_file(vma, realfile, file);
+	if (err)
+		return err;
+
 	return generic_file_readonly_mmap(file, vma);
 }
 
diff --git a/fs/file_table.c b/fs/file_table.c
index 3b3792903185..d19d879b6efc 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -50,6 +50,9 @@ struct backing_file {
 		struct path user_path;
 		freeptr_t bf_freeptr;
 	};
+#ifdef CONFIG_SECURITY
+	void *security;
+#endif
 };
 
 #define backing_file(f) container_of(f, struct backing_file, file)
@@ -66,8 +69,21 @@ void backing_file_set_user_path(struct file *f, const struct path *path)
 }
 EXPORT_SYMBOL_GPL(backing_file_set_user_path);
 
+#ifdef CONFIG_SECURITY
+void *backing_file_security(const struct file *f)
+{
+	return backing_file(f)->security;
+}
+
+void backing_file_set_security(struct file *f, void *security)
+{
+	backing_file(f)->security = security;
+}
+#endif /* CONFIG_SECURITY */
+
 static inline void backing_file_free(struct backing_file *ff)
 {
+	security_backing_file_free(&ff->file);
 	path_put(&ff->user_path);
 	kmem_cache_free(bfilp_cachep, ff);
 }
@@ -288,10 +304,12 @@ struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred)
 	return f;
 }
 
-static int init_backing_file(struct backing_file *ff)
+static int init_backing_file(struct backing_file *ff,
+			     const struct file *user_file)
 {
 	memset(&ff->user_path, 0, sizeof(ff->user_path));
-	return 0;
+	backing_file_set_security(&ff->file, NULL);
+	return security_backing_file_alloc(&ff->file, user_file);
 }
 
 /*
@@ -301,7 +319,8 @@ static int init_backing_file(struct backing_file *ff)
  * This is only for kernel internal use, and the allocate file must not be
  * installed into file tables or such.
  */
-struct file *alloc_empty_backing_file(int flags, const struct cred *cred)
+struct file *alloc_empty_backing_file(int flags, const struct cred *cred,
+				      const struct file *user_file)
 {
 	struct backing_file *ff;
 	int error;
@@ -318,7 +337,7 @@ struct file *alloc_empty_backing_file(int flags, const struct cred *cred)
 
 	/* The f_mode flags must be set before fput(). */
 	ff->file.f_mode |= FMODE_BACKING | FMODE_NOACCOUNT;
-	error = init_backing_file(ff);
+	error = init_backing_file(ff, user_file);
 	if (unlikely(error)) {
 		fput(&ff->file);
 		return ERR_PTR(error);
diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
index 72de97c03d0e..f2d08ac2459b 100644
--- a/fs/fuse/passthrough.c
+++ b/fs/fuse/passthrough.c
@@ -167,7 +167,7 @@ struct fuse_backing *fuse_passthrough_open(struct file *file, int backing_id)
 		goto out;
 
 	/* Allocate backing file per fuse file to store fuse path */
-	backing_file = backing_file_open(&file->f_path, file->f_flags,
+	backing_file = backing_file_open(file, file->f_flags,
 					 &fb->file->f_path, fb->cred);
 	err = PTR_ERR(backing_file);
 	if (IS_ERR(backing_file)) {
diff --git a/fs/internal.h b/fs/internal.h
index cbc384a1aa09..77e90e4124e0 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -106,7 +106,8 @@ extern void chroot_fs_refs(const struct path *, const struct path *);
  */
 struct file *alloc_empty_file(int flags, const struct cred *cred);
 struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred);
-struct file *alloc_empty_backing_file(int flags, const struct cred *cred);
+struct file *alloc_empty_backing_file(int flags, const struct cred *cred,
+				      const struct file *user_file);
 void backing_file_set_user_path(struct file *f, const struct path *path);
 
 static inline void file_put_write_access(struct file *file)
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index ff3dbd1ca61f..f2f20a611af3 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -1374,7 +1374,7 @@ static int ovl_create_tmpfile(struct file *file, struct dentry *dentry,
 				return PTR_ERR(cred);
 
 			ovl_path_upper(dentry->d_parent, &realparentpath);
-			realfile = backing_tmpfile_open(&file->f_path, flags, &realparentpath,
+			realfile = backing_tmpfile_open(file, flags, &realparentpath,
 							mode, current_cred());
 			err = PTR_ERR_OR_ZERO(realfile);
 			pr_debug("tmpfile/open(%pd2, 0%o) = %i\n", realparentpath.dentry, mode, err);
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 97bed2286030..27cc07738f33 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -48,7 +48,7 @@ static struct file *ovl_open_realfile(const struct file *file,
 			if (!inode_owner_or_capable(real_idmap, realinode))
 				flags &= ~O_NOATIME;
 
-			realfile = backing_file_open(file_user_path(file),
+			realfile = backing_file_open(file,
 						     flags, realpath, current_cred());
 		}
 	}
diff --git a/include/linux/backing-file.h b/include/linux/backing-file.h
index 1476a6ed1bfd..c939cd222730 100644
--- a/include/linux/backing-file.h
+++ b/include/linux/backing-file.h
@@ -18,10 +18,10 @@ struct backing_file_ctx {
 	void (*end_write)(struct kiocb *iocb, ssize_t);
 };
 
-struct file *backing_file_open(const struct path *user_path, int flags,
+struct file *backing_file_open(const struct file *user_file, int flags,
 			       const struct path *real_path,
 			       const struct cred *cred);
-struct file *backing_tmpfile_open(const struct path *user_path, int flags,
+struct file *backing_tmpfile_open(const struct file *user_file, int flags,
 				  const struct path *real_parentpath,
 				  umode_t mode, const struct cred *cred);
 ssize_t backing_file_read_iter(struct file *file, struct iov_iter *iter,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8b3dd145b25e..d0d0e8f55589 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2475,6 +2475,19 @@ struct file *dentry_create(struct path *path, int flags, umode_t mode,
 			   const struct cred *cred);
 const struct path *backing_file_user_path(const struct file *f);
 
+#ifdef CONFIG_SECURITY
+void *backing_file_security(const struct file *f);
+void backing_file_set_security(struct file *f, void *security);
+#else
+static inline void *backing_file_security(const struct file *f)
+{
+	return NULL;
+}
+static inline void backing_file_set_security(struct file *f, void *security)
+{
+}
+#endif /* CONFIG_SECURITY */
+
 /*
  * When mmapping a file on a stackable filesystem (e.g., overlayfs), the file
  * stored in ->vm_file is a backing file whose f_inode is on the underlying
diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 382c56a97bba..584db296e43b 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -94,7 +94,7 @@ struct common_audit_data {
 #endif
 		char *kmod_name;
 		struct lsm_ioctlop_audit *op;
-		struct file *file;
+		const struct file *file;
 		struct lsm_ibpkey_audit *ibpkey;
 		struct lsm_ibendport_audit *ibendport;
 		int reason;
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 8c42b4bde09c..b4958167e381 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -191,6 +191,9 @@ LSM_HOOK(int, 0, file_permission, struct file *file, int mask)
 LSM_HOOK(int, 0, file_alloc_security, struct file *file)
 LSM_HOOK(void, LSM_RET_VOID, file_release, struct file *file)
 LSM_HOOK(void, LSM_RET_VOID, file_free_security, struct file *file)
+LSM_HOOK(int, 0, backing_file_alloc, struct file *backing_file,
+	 const struct file *user_file)
+LSM_HOOK(void, LSM_RET_VOID, backing_file_free, struct file *backing_file)
 LSM_HOOK(int, 0, file_ioctl, struct file *file, unsigned int cmd,
 	 unsigned long arg)
 LSM_HOOK(int, 0, file_ioctl_compat, struct file *file, unsigned int cmd,
@@ -198,6 +201,8 @@ LSM_HOOK(int, 0, file_ioctl_compat, struct file *file, unsigned int cmd,
 LSM_HOOK(int, 0, mmap_addr, unsigned long addr)
 LSM_HOOK(int, 0, mmap_file, struct file *file, unsigned long reqprot,
 	 unsigned long prot, unsigned long flags)
+LSM_HOOK(int, 0, mmap_backing_file, struct vm_area_struct *vma,
+	 struct file *backing_file, struct file *user_file)
 LSM_HOOK(int, 0, file_mprotect, struct vm_area_struct *vma,
 	 unsigned long reqprot, unsigned long prot)
 LSM_HOOK(int, 0, file_lock, struct file *file, unsigned int cmd)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d48bf0ad26f4..b4f8cad53ddb 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -104,6 +104,7 @@ struct security_hook_list {
 struct lsm_blob_sizes {
 	unsigned int lbs_cred;
 	unsigned int lbs_file;
+	unsigned int lbs_backing_file;
 	unsigned int lbs_ib;
 	unsigned int lbs_inode;
 	unsigned int lbs_sock;
diff --git a/include/linux/security.h b/include/linux/security.h
index ee88dd2d2d1f..8d2d4856934e 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -472,11 +472,17 @@ int security_file_permission(struct file *file, int mask);
 int security_file_alloc(struct file *file);
 void security_file_release(struct file *file);
 void security_file_free(struct file *file);
+int security_backing_file_alloc(struct file *backing_file,
+				const struct file *user_file);
+void security_backing_file_free(struct file *backing_file);
 int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 int security_file_ioctl_compat(struct file *file, unsigned int cmd,
 			       unsigned long arg);
 int security_mmap_file(struct file *file, unsigned long prot,
 			unsigned long flags);
+int security_mmap_backing_file(struct vm_area_struct *vma,
+			       struct file *backing_file,
+			       struct file *user_file);
 int security_mmap_addr(unsigned long addr);
 int security_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot,
 			   unsigned long prot);
@@ -1141,6 +1147,15 @@ static inline void security_file_release(struct file *file)
 static inline void security_file_free(struct file *file)
 { }
 
+static inline int security_backing_file_alloc(struct file *backing_file,
+					      const struct file *user_file)
+{
+	return 0;
+}
+
+static inline void security_backing_file_free(struct file *backing_file)
+{ }
+
 static inline int security_file_ioctl(struct file *file, unsigned int cmd,
 				      unsigned long arg)
 {
@@ -1160,6 +1175,13 @@ static inline int security_mmap_file(struct file *file, unsigned long prot,
 	return 0;
 }
 
+static inline int security_mmap_backing_file(struct vm_area_struct *vma,
+					     struct file *backing_file,
+					     struct file *user_file)
+{
+	return 0;
+}
+
 static inline int security_mmap_addr(unsigned long addr)
 {
 	return cap_mmap_addr(addr);
diff --git a/security/lsm.h b/security/lsm.h
index db77cc83e158..32f808ad4335 100644
--- a/security/lsm.h
+++ b/security/lsm.h
@@ -29,6 +29,7 @@ extern struct lsm_blob_sizes blob_sizes;
 
 /* LSM blob caches */
 extern struct kmem_cache *lsm_file_cache;
+extern struct kmem_cache *lsm_backing_file_cache;
 extern struct kmem_cache *lsm_inode_cache;
 
 /* LSM blob allocators */
diff --git a/security/lsm_init.c b/security/lsm_init.c
index 573e2a7250c4..7c0fd17f1601 100644
--- a/security/lsm_init.c
+++ b/security/lsm_init.c
@@ -293,6 +293,8 @@ static void __init lsm_prepare(struct lsm_info *lsm)
 	blobs = lsm->blobs;
 	lsm_blob_size_update(&blobs->lbs_cred, &blob_sizes.lbs_cred);
 	lsm_blob_size_update(&blobs->lbs_file, &blob_sizes.lbs_file);
+	lsm_blob_size_update(&blobs->lbs_backing_file,
+			     &blob_sizes.lbs_backing_file);
 	lsm_blob_size_update(&blobs->lbs_ib, &blob_sizes.lbs_ib);
 	/* inode blob gets an rcu_head in addition to LSM blobs. */
 	if (blobs->lbs_inode && blob_sizes.lbs_inode == 0)
@@ -441,6 +443,8 @@ int __init security_init(void)
 	if (lsm_debug) {
 		lsm_pr("blob(cred) size %d\n", blob_sizes.lbs_cred);
 		lsm_pr("blob(file) size %d\n", blob_sizes.lbs_file);
+		lsm_pr("blob(backing_file) size %d\n",
+		       blob_sizes.lbs_backing_file);
 		lsm_pr("blob(ib) size %d\n", blob_sizes.lbs_ib);
 		lsm_pr("blob(inode) size %d\n", blob_sizes.lbs_inode);
 		lsm_pr("blob(ipc) size %d\n", blob_sizes.lbs_ipc);
@@ -462,6 +466,11 @@ int __init security_init(void)
 		lsm_file_cache = kmem_cache_create("lsm_file_cache",
 						   blob_sizes.lbs_file, 0,
 						   SLAB_PANIC, NULL);
+	if (blob_sizes.lbs_backing_file)
+		lsm_backing_file_cache = kmem_cache_create(
+						   "lsm_backing_file_cache",
+						   blob_sizes.lbs_backing_file,
+						   0, SLAB_PANIC, NULL);
 	if (blob_sizes.lbs_inode)
 		lsm_inode_cache = kmem_cache_create("lsm_inode_cache",
 						    blob_sizes.lbs_inode, 0,
diff --git a/security/security.c b/security/security.c
index a26c1474e2e4..048560ef6a1a 100644
--- a/security/security.c
+++ b/security/security.c
@@ -82,6 +82,7 @@ const struct lsm_id *lsm_idlist[MAX_LSM_COUNT];
 struct lsm_blob_sizes blob_sizes;
 
 struct kmem_cache *lsm_file_cache;
+struct kmem_cache *lsm_backing_file_cache;
 struct kmem_cache *lsm_inode_cache;
 
 #define SECURITY_HOOK_ACTIVE_KEY(HOOK, IDX) security_hook_active_##HOOK##_##IDX
@@ -173,6 +174,30 @@ static int lsm_file_alloc(struct file *file)
 	return 0;
 }
 
+/**
+ * lsm_backing_file_alloc - allocate a composite backing file blob
+ * @backing_file: the backing file
+ *
+ * Allocate the backing file blob for all the modules.
+ *
+ * Returns 0, or -ENOMEM if memory can't be allocated.
+ */
+static int lsm_backing_file_alloc(struct file *backing_file)
+{
+	void *blob;
+
+	if (!lsm_backing_file_cache) {
+		backing_file_set_security(backing_file, NULL);
+		return 0;
+	}
+
+	blob = kmem_cache_zalloc(lsm_backing_file_cache, GFP_KERNEL);
+	backing_file_set_security(backing_file, blob);
+	if (!blob)
+		return -ENOMEM;
+	return 0;
+}
+
 /**
  * lsm_blob_alloc - allocate a composite blob
  * @dest: the destination for the blob
@@ -2418,6 +2443,57 @@ void security_file_free(struct file *file)
 	}
 }
 
+/**
+ * security_backing_file_alloc() - Allocate and setup a backing file blob
+ * @backing_file: the backing file
+ * @user_file: the associated user visible file
+ *
+ * Allocate a backing file LSM blob and perform any necessary initialization of
+ * the LSM blob.  There will be some operations where the LSM will not have
+ * access to @user_file after this point, so any important state associated
+ * with @user_file that is important to the LSM should be captured in the
+ * backing file's LSM blob.
+ *
+ * LSM's should avoid taking a reference to @user_file in this hook as it will
+ * result in problems later when the system attempts to drop/put the file
+ * references due to a circular dependency.
+ *
+ * Return: Return 0 if the hook is successful, negative values otherwise.
+ */
+int security_backing_file_alloc(struct file *backing_file,
+				const struct file *user_file)
+{
+	int rc;
+
+	rc = lsm_backing_file_alloc(backing_file);
+	if (rc)
+		return rc;
+	rc = call_int_hook(backing_file_alloc, backing_file, user_file);
+	if (unlikely(rc))
+		security_backing_file_free(backing_file);
+
+	return rc;
+}
+
+/**
+ * security_backing_file_free() - Free a backing file blob
+ * @backing_file: the backing file
+ *
+ * Free any LSM state associated with a backing file's LSM blob, including
+ * the blob itself.
+ */
+void security_backing_file_free(struct file *backing_file)
+{
+	void *blob = backing_file_security(backing_file);
+
+	call_void_hook(backing_file_free, backing_file);
+
+	if (blob) {
+		backing_file_set_security(backing_file, NULL);
+		kmem_cache_free(lsm_backing_file_cache, blob);
+	}
+}
+
 /**
  * security_file_ioctl() - Check if an ioctl is allowed
  * @file: associated file
@@ -2506,6 +2582,32 @@ int security_mmap_file(struct file *file, unsigned long prot,
 			     flags);
 }
 
+/**
+ * security_mmap_backing_file - Check if mmap'ing a backing file is allowed
+ * @vma: the vm_area_struct for the mmap'd region
+ * @backing_file: the backing file being mmap'd
+ * @user_file: the user file being mmap'd
+ *
+ * Check permissions for a mmap operation on a stacked filesystem.  This hook
+ * is called after the security_mmap_file() and is responsible for authorizing
+ * the mmap on @backing_file.  It is important to note that the mmap operation
+ * on @user_file has already been authorized and the @vma->vm_file has been
+ * set to @backing_file.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mmap_backing_file(struct vm_area_struct *vma,
+			       struct file *backing_file,
+			       struct file *user_file)
+{
+	/* recommended by the stackable filesystem devs */
+	if (WARN_ON_ONCE(!(backing_file->f_mode & FMODE_BACKING)))
+		return -EIO;
+
+	return call_int_hook(mmap_backing_file, vma, backing_file, user_file);
+}
+EXPORT_SYMBOL_GPL(security_mmap_backing_file);
+
 /**
  * security_mmap_addr() - Check if mmap'ing an address is allowed
  * @addr: address
-- 
2.53.0



* [PATCH v4 3/3] selinux: fix overlayfs mmap() and mprotect() access checks
From: Paul Moore @ 2026-04-03  3:08 UTC (permalink / raw)
  To: linux-security-module, selinux, linux-fsdevel, linux-unionfs,
	linux-erofs
  Cc: Amir Goldstein, Gao Xiang, Christian Brauner
In-Reply-To: <20260403030848.731867-5-paul@paul-moore.com>

The existing SELinux security model for overlayfs is to allow access if
the current task is able to access the top level file (the "user" file)
and the mounter's credentials are sufficient to access the lower
level file (the "backing" file).  Unfortunately, the current code does
not properly enforce these access controls for both mmap() and mprotect()
operations on overlayfs filesystems.

This patch makes use of the newly created security_mmap_backing_file()
LSM hook to provide the missing backing file enforcement for mmap()
operations, and leverages the backing file API and new LSM blob to
provide the necessary information to properly enforce the mprotect()
access controls.
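
Stripped of all detail, the security model described above amounts to
two independent checks that must both pass.  The userspace sketch below
is a toy illustration only; it is not the SELinux code, and the integer
"labels", the demo_allowed() policy, and demo_overlay_mmap_check() are
hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy policy: a subject may map objects whose id does not exceed its own. */
static bool demo_allowed(int subject, int object)
{
	return object <= subject;
}

/*
 * Both checks must pass: the current task against the user file, and
 * the mounter's credentials against the backing file.
 */
static int demo_overlay_mmap_check(int task, int user_file,
				   int mounter, int backing_file)
{
	if (!demo_allowed(task, user_file))
		return -EACCES;
	if (!demo_allowed(mounter, backing_file))
		return -EACCES;
	return 0;
}
```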

Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <paul@paul-moore.com>
---
 security/selinux/hooks.c          | 256 +++++++++++++++++++++---------
 security/selinux/include/objsec.h |  11 ++
 2 files changed, 196 insertions(+), 71 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index d8224ea113d1..76e0fb7dcb36 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1745,6 +1745,60 @@ static inline int file_path_has_perm(const struct cred *cred,
 static int bpf_fd_pass(const struct file *file, u32 sid);
 #endif
 
+static int __file_has_perm(const struct cred *cred, const struct file *file,
+			   u32 av, bool bf_user_file)
+
+{
+	struct common_audit_data ad;
+	struct inode *inode;
+	u32 ssid = cred_sid(cred);
+	u32 tsid_fd;
+	int rc;
+
+	if (bf_user_file) {
+		struct backing_file_security_struct *bfsec;
+		const struct path *path;
+
+		if (WARN_ON(!(file->f_mode & FMODE_BACKING)))
+			return -EIO;
+
+		bfsec = selinux_backing_file(file);
+		path = backing_file_user_path(file);
+		tsid_fd = bfsec->uf_sid;
+		inode = d_inode(path->dentry);
+
+		ad.type = LSM_AUDIT_DATA_PATH;
+		ad.u.path = *path;
+	} else {
+		struct file_security_struct *fsec = selinux_file(file);
+
+		tsid_fd = fsec->sid;
+		inode = file_inode(file);
+
+		ad.type = LSM_AUDIT_DATA_FILE;
+		ad.u.file = file;
+	}
+
+	if (ssid != tsid_fd) {
+		rc = avc_has_perm(ssid, tsid_fd, SECCLASS_FD, FD__USE, &ad);
+		if (rc)
+			return rc;
+	}
+
+#ifdef CONFIG_BPF_SYSCALL
+	/* regardless of backing vs user file, use the underlying file here */
+	rc = bpf_fd_pass(file, ssid);
+	if (rc)
+		return rc;
+#endif
+
+	/* av is zero if only checking access to the descriptor. */
+	if (av)
+		return inode_has_perm(cred, inode, av, &ad);
+
+	return 0;
+}
+
 /* Check whether a task can use an open file descriptor to
    access an inode in a given way.  Check access to the
    descriptor itself, and then use dentry_has_perm to
@@ -1753,41 +1807,10 @@ static int bpf_fd_pass(const struct file *file, u32 sid);
    has the same SID as the process.  If av is zero, then
    access to the file is not checked, e.g. for cases
    where only the descriptor is affected like seek. */
-static int file_has_perm(const struct cred *cred,
-			 struct file *file,
-			 u32 av)
+static inline int file_has_perm(const struct cred *cred,
+				const struct file *file, u32 av)
 {
-	struct file_security_struct *fsec = selinux_file(file);
-	struct inode *inode = file_inode(file);
-	struct common_audit_data ad;
-	u32 sid = cred_sid(cred);
-	int rc;
-
-	ad.type = LSM_AUDIT_DATA_FILE;
-	ad.u.file = file;
-
-	if (sid != fsec->sid) {
-		rc = avc_has_perm(sid, fsec->sid,
-				  SECCLASS_FD,
-				  FD__USE,
-				  &ad);
-		if (rc)
-			goto out;
-	}
-
-#ifdef CONFIG_BPF_SYSCALL
-	rc = bpf_fd_pass(file, cred_sid(cred));
-	if (rc)
-		return rc;
-#endif
-
-	/* av is zero if only checking access to the descriptor. */
-	rc = 0;
-	if (av)
-		rc = inode_has_perm(cred, inode, av, &ad);
-
-out:
-	return rc;
+	return __file_has_perm(cred, file, av, false);
 }
 
 /*
@@ -3825,6 +3848,17 @@ static int selinux_file_alloc_security(struct file *file)
 	return 0;
 }
 
+static int selinux_backing_file_alloc(struct file *backing_file,
+				      const struct file *user_file)
+{
+	struct backing_file_security_struct *bfsec;
+
+	bfsec = selinux_backing_file(backing_file);
+	bfsec->uf_sid = selinux_file(user_file)->sid;
+
+	return 0;
+}
+
 /*
  * Check whether a task has the ioctl permission and cmd
  * operation to an inode.
@@ -3942,42 +3976,55 @@ static int selinux_file_ioctl_compat(struct file *file, unsigned int cmd,
 
 static int default_noexec __ro_after_init;
 
-static int file_map_prot_check(struct file *file, unsigned long prot, int shared)
+static int __file_map_prot_check(const struct cred *cred,
+				 const struct file *file, unsigned long prot,
+				 bool shared, bool bf_user_file)
 {
-	const struct cred *cred = current_cred();
-	u32 sid = cred_sid(cred);
-	int rc = 0;
+	struct inode *inode = NULL;
+	bool prot_exec = prot & PROT_EXEC;
+	bool prot_write = prot & PROT_WRITE;
+
+	if (file) {
+		if (bf_user_file)
+			inode = d_inode(backing_file_user_path(file)->dentry);
+		else
+			inode = file_inode(file);
+	}
+
+	if (default_noexec && prot_exec &&
+	    (!file || IS_PRIVATE(inode) || (!shared && prot_write))) {
+		int rc;
+		u32 sid = cred_sid(cred);
 
-	if (default_noexec &&
-	    (prot & PROT_EXEC) && (!file || IS_PRIVATE(file_inode(file)) ||
-				   (!shared && (prot & PROT_WRITE)))) {
 		/*
-		 * We are making executable an anonymous mapping or a
-		 * private file mapping that will also be writable.
-		 * This has an additional check.
+		 * We are making executable an anonymous mapping or a private
+		 * file mapping that will also be writable.
 		 */
-		rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,
-				  PROCESS__EXECMEM, NULL);
+		rc = avc_has_perm(sid, sid, SECCLASS_PROCESS, PROCESS__EXECMEM,
+				  NULL);
 		if (rc)
-			goto error;
+			return rc;
 	}
 
 	if (file) {
-		/* read access is always possible with a mapping */
+		/* "read" always possible, "write" only if shared */
 		u32 av = FILE__READ;
-
-		/* write access only matters if the mapping is shared */
-		if (shared && (prot & PROT_WRITE))
+		if (shared && prot_write)
 			av |= FILE__WRITE;
-
-		if (prot & PROT_EXEC)
+		if (prot_exec)
 			av |= FILE__EXECUTE;
 
-		return file_has_perm(cred, file, av);
+		return __file_has_perm(cred, file, av, bf_user_file);
 	}
 
-error:
-	return rc;
+	return 0;
+}
+
+static inline int file_map_prot_check(const struct cred *cred,
+				      const struct file *file,
+				      unsigned long prot, bool shared)
+{
+	return __file_map_prot_check(cred, file, prot, shared, false);
 }
 
 static int selinux_mmap_addr(unsigned long addr)
@@ -3993,36 +4040,80 @@ static int selinux_mmap_addr(unsigned long addr)
 	return rc;
 }
 
-static int selinux_mmap_file(struct file *file,
-			     unsigned long reqprot __always_unused,
-			     unsigned long prot, unsigned long flags)
+static int selinux_mmap_file_common(const struct cred *cred, struct file *file,
+				    unsigned long prot, bool shared)
 {
-	struct common_audit_data ad;
-	int rc;
-
 	if (file) {
+		int rc;
+		struct common_audit_data ad;
+
 		ad.type = LSM_AUDIT_DATA_FILE;
 		ad.u.file = file;
-		rc = inode_has_perm(current_cred(), file_inode(file),
-				    FILE__MAP, &ad);
+		rc = inode_has_perm(cred, file_inode(file), FILE__MAP, &ad);
 		if (rc)
 			return rc;
 	}
 
-	return file_map_prot_check(file, prot,
-				   (flags & MAP_TYPE) == MAP_SHARED);
+	return file_map_prot_check(cred, file, prot, shared);
+}
+
+static int selinux_mmap_file(struct file *file,
+			     unsigned long reqprot __always_unused,
+			     unsigned long prot, unsigned long flags)
+{
+	return selinux_mmap_file_common(current_cred(), file, prot,
+					(flags & MAP_TYPE) == MAP_SHARED);
+}
+
+/**
+ * selinux_mmap_backing_file - Check mmap permissions on a backing file
+ * @vma: memory region
+ * @backing_file: stacked filesystem backing file
+ * @user_file: user visible file
+ *
+ * This is called after selinux_mmap_file() on stacked filesystems, and it
+ * is this function's responsibility to verify access to @backing_file and
+ * set up the SELinux state for possible later use in the mprotect() code path.
+ *
+ * By the time this function is called, mmap() access to @user_file has already
+ * been authorized and @vma->vm_file has been set to point to @backing_file.
+ *
+ * Return zero on success, negative values otherwise.
+ */
+static int selinux_mmap_backing_file(struct vm_area_struct *vma,
+				     struct file *backing_file,
+				     struct file *user_file __always_unused)
+{
+	unsigned long prot = 0;
+
+	/* translate vma->vm_flags perms into PROT perms */
+	if (vma->vm_flags & VM_READ)
+		prot |= PROT_READ;
+	if (vma->vm_flags & VM_WRITE)
+		prot |= PROT_WRITE;
+	if (vma->vm_flags & VM_EXEC)
+		prot |= PROT_EXEC;
+
+	return selinux_mmap_file_common(backing_file->f_cred, backing_file,
+					prot, vma->vm_flags & VM_SHARED);
 }
 
 static int selinux_file_mprotect(struct vm_area_struct *vma,
 				 unsigned long reqprot __always_unused,
 				 unsigned long prot)
 {
+	int rc;
 	const struct cred *cred = current_cred();
 	u32 sid = cred_sid(cred);
+	const struct file *file = vma->vm_file;
+	bool backing_file;
+	bool shared = vma->vm_flags & VM_SHARED;
+
+	/* check if we need to trigger the "backing files are awful" mode */
+	backing_file = file && (file->f_mode & FMODE_BACKING);
 
 	if (default_noexec &&
 	    (prot & PROT_EXEC) && !(vma->vm_flags & VM_EXEC)) {
-		int rc = 0;
 		/*
 		 * We don't use the vma_is_initial_heap() helper as it has
 		 * a history of problems and is currently broken on systems
@@ -4036,11 +4127,15 @@ static int selinux_file_mprotect(struct vm_area_struct *vma,
 		    vma->vm_end <= vma->vm_mm->brk) {
 			rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,
 					  PROCESS__EXECHEAP, NULL);
-		} else if (!vma->vm_file && (vma_is_initial_stack(vma) ||
+			if (rc)
+				return rc;
+		} else if (!file && (vma_is_initial_stack(vma) ||
 			    vma_is_stack_for_current(vma))) {
 			rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,
 					  PROCESS__EXECSTACK, NULL);
-		} else if (vma->vm_file && vma->anon_vma) {
+			if (rc)
+				return rc;
+		} else if (file && vma->anon_vma) {
 			/*
 			 * We are making executable a file mapping that has
 			 * had some COW done. Since pages might have been
@@ -4048,13 +4143,29 @@ static int selinux_file_mprotect(struct vm_area_struct *vma,
 			 * modified content.  This typically should only
 			 * occur for text relocations.
 			 */
-			rc = file_has_perm(cred, vma->vm_file, FILE__EXECMOD);
+			rc = __file_has_perm(cred, file, FILE__EXECMOD,
+					     backing_file);
+			if (rc)
+				return rc;
+			if (backing_file) {
+				rc = file_has_perm(file->f_cred, file,
+						   FILE__EXECMOD);
+				if (rc)
+					return rc;
+			}
 		}
+	}
+
+	rc = __file_map_prot_check(cred, file, prot, shared, backing_file);
+	if (rc)
+		return rc;
+	if (backing_file) {
+		rc = file_map_prot_check(file->f_cred, file, prot, shared);
 		if (rc)
 			return rc;
 	}
 
-	return file_map_prot_check(vma->vm_file, prot, vma->vm_flags&VM_SHARED);
+	return 0;
 }
 
 static int selinux_file_lock(struct file *file, unsigned int cmd)
@@ -7393,6 +7504,7 @@ struct lsm_blob_sizes selinux_blob_sizes __ro_after_init = {
 	.lbs_cred = sizeof(struct cred_security_struct),
 	.lbs_task = sizeof(struct task_security_struct),
 	.lbs_file = sizeof(struct file_security_struct),
+	.lbs_backing_file = sizeof(struct backing_file_security_struct),
 	.lbs_inode = sizeof(struct inode_security_struct),
 	.lbs_ipc = sizeof(struct ipc_security_struct),
 	.lbs_key = sizeof(struct key_security_struct),
@@ -7498,9 +7610,11 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 
 	LSM_HOOK_INIT(file_permission, selinux_file_permission),
 	LSM_HOOK_INIT(file_alloc_security, selinux_file_alloc_security),
+	LSM_HOOK_INIT(backing_file_alloc, selinux_backing_file_alloc),
 	LSM_HOOK_INIT(file_ioctl, selinux_file_ioctl),
 	LSM_HOOK_INIT(file_ioctl_compat, selinux_file_ioctl_compat),
 	LSM_HOOK_INIT(mmap_file, selinux_mmap_file),
+	LSM_HOOK_INIT(mmap_backing_file, selinux_mmap_backing_file),
 	LSM_HOOK_INIT(mmap_addr, selinux_mmap_addr),
 	LSM_HOOK_INIT(file_mprotect, selinux_file_mprotect),
 	LSM_HOOK_INIT(file_lock, selinux_file_lock),
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index 5bddd28ea5cb..b19e5d978e82 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -88,6 +88,10 @@ struct file_security_struct {
 	u32 pseqno; /* Policy seqno at the time of file open */
 };
 
+struct backing_file_security_struct {
+	u32 uf_sid; /* associated user file fsec->sid */
+};
+
 struct superblock_security_struct {
 	u32 sid; /* SID of file system superblock */
 	u32 def_sid; /* default SID for labeling */
@@ -195,6 +199,13 @@ static inline struct file_security_struct *selinux_file(const struct file *file)
 	return file->f_security + selinux_blob_sizes.lbs_file;
 }
 
+static inline struct backing_file_security_struct *
+selinux_backing_file(const struct file *backing_file)
+{
+	void *blob = backing_file_security(backing_file);
+	return blob + selinux_blob_sizes.lbs_backing_file;
+}
+
 static inline struct inode_security_struct *
 selinux_inode(const struct inode *inode)
 {
-- 
2.53.0

