Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: LSM namespacing API
From: Paul Moore @ 2026-04-02 21:04 UTC (permalink / raw)
  To: Dr. Greg
  Cc: Stephen Smalley, Ondrej Mosnacek, linux-security-module, selinux,
	John Johansen
In-Reply-To: <ac5MKr4lFQhc44i6@wind.enjellic.com>

On Thu, Apr 2, 2026 at 7:00 AM Dr. Greg <greg@enjellic.com> wrote:
> On Sun, Mar 29, 2026 at 08:56:37PM -0400, Paul Moore wrote:
> > On Sun, Mar 29, 2026 at 12:09???PM Dr. Greg <greg@enjellic.com> wrote:
> > > On Tue, Mar 24, 2026 at 05:31:09PM -0400, Paul Moore wrote:
> > > > On Tue, Mar 3, 2026 at 11:46???AM Paul Moore <paul@paul-moore.com> wrote:

...

> Christian had proposed patches for a generic mechanism to create
> LSM security namespace blobs, is implementation of that in scope for
> this effort?

That isn't what Christian proposed, although I can understand how a
quick glance at the patchset would lead you to believe that (I had the
same misunderstanding while skimming my inbox on my phone while
traveling).  I suggest reviewing Christian's post again as well as the
related Landlock patchset which is the first to use the hooks
Christian proposed.

> > > It would seem that the flags variable might be a good option to use to
> > > handle this 2-stage transition, for example LSM_NS_INIT and
> > > LSM_NS_CHANGE, respectively, to specify the initialization and
> > > execution phases of the transition.
>
> > No.  The lsm_unshare() syscall is intended to mimic the existing
> > unshare() syscall as a single step process from a user's
> > perspective.  If it returns successfully the caller will be in a new
> > LSM namespace as defined by the individual LSM specified in the
> > syscall.
>
> OK, we can reason forward with that paradigm.
>
> An orchestrator issues the unshare call for an LSM namespace and upon
> return from the system call the calling task is in a new namespace for
> that particular LSM ...

Yes.

> ... the goal of which is presumably to implement a
> security policy/model different than what had been in force
> previously.

Maybe.  That is dependent on the individual LSM, I don't want to
encode any assumptions on this at the LSM framework layer.

> So the process is in a new LSM specific namespace, but still
> implementing the policy from the previous namespace, until the
> orchestrator can load the new policy and then trigger the LSM to
> change from its previous policy to the newly loaded policy.
>
> Is this consistent with your vision as to how all of this will work?

No.  What an individual LSM does upon creation of a new namespace via
lsm_unshare() is entirely up to that LSM.  The LSM may choose to bound
the new namespace by the parent's policy, or it may choose a
non-hierarchical relationship where the new namespace remains entirely
separate from the parent.  The LSM may start the new namespace in an
uninitialized state (similar to early boot), initialized with a
default policy, initialized with the parent's policy, or something
else.

> > > The other unanswered issue, or perhaps we missed it, are the security
> > > controls that should be associated with the unshare call.
>
> > Each LSM is free to implement whatever access controls it deems
> > necessary in its lsm_unshare() callback.
>
> Just to be clear.
>
> When you refer to 'lsm_unshare() callback' are you referring to a new
> LSM security hook to be be implemented that will allow all of the
> active LSM's to pass judgement on whether or not the unshare should be
> allowed to complete successfully?

No.  The lsm_unshare() callback is the individual LSM provided
function that the LSM framework calls when the lsm_unshare() syscall
is invoked.  Put another way, the lsm_unshare() callback is the
function specified by a LSM, using the LSM_HOOK_INIT() macro, that is
called by the lsm_unshare() syscall.

> > > Will there be a new LSM hook that allows other LSM's to veto the
> > > creation of a namespace either for itself or for another LSM?
> >
> > I would expect the lsm_unshare() syscall to operate similarly to the
> > lsm_set_self_attr() syscall in this regard.
>
> The reference to handling this like lsm_set_self_attr() is unclear.
>
> With lsm_set_self_attr() there is no reason for another LSM to deny
> setting what is an LSM specific attribute, as you note above, each LSM
> gets to decide if the request to set an attribute for the LSM should
> be accepted or denied.

No.  LSM "A" gets to decide if LSM "A" can create a new namespace
using the lsm_unshare() syscall, LSM "B" does not get to enforce any
policy on LSM "A"'s decision.

> Since lsm_unshare() is changing the overall platform security state,
> it seems consistent with the design of the LSM for other LSM's to be
> able to veto this action.

No.  This is not consistent with either the design or general
conventions associated with LSM development.

> Once again, this seems like an action that would be consistent with
> the notion of the lockdown LSM,

No.

> > > Should there be an option to completely compile LSM namespaces out of
> > > the kernel?
>
> > That doesn't belong in the LSM framework layer, that is up to the
> > individual LSMs.
>
> You noted above the desire for lsm_unshare to be consistent with other
> namespaces.
>
> The current kernel paradigm is to allow classes of namespace
> resources, ie. CONFIG_UTS_NS, CONFIG_TIME_NS et.al., to be compiled in
> our out of the kernel.
>
> It seems that CONFIG_LSM_NS would be consistent with that model.

CONFIG_UTS_NS does not have multiple radically different
implementations underneath it.  Comparing any of the existing Kconfig
namespace knobs to what we are attempting to do with the LSM framework
is going to be difficult due to some inherent differences between the
two things.

The lsm_unshare() syscall is simply an API abstraction intended to
make it easier for userspace to interact with the individual LSMs;
instead of dealing with multiple different namespacing APIs, one for
each LSM, lsm_unshare() provides a single interface to make app devs'
lives easier.

If a individual LSM wants to provide a Kconfig knob to toggle their
namespace support they are welcome to do so, lsm_unshare() should
exist regardless and return an error code if the desired LSM does not
implement namespace support in the particular kernel build.

> > > > * Implement /proc/pid/ns/lsm and setns(CLONE_NEWLSM)
> > > >
> > > > As discussed previously, this allows us to move a process into an
> > > > existing, established LSM namespace set.  The caller cannot
> > > > selectively choose which individual LSM namespaces they join from the
> > > > given LSM namespace set, they receive the same LSM namespace
> > > > configuration as the target process.
> > >
> > > As an initial aside.  It would be assumed that a positive result of a
> > > setns call would be to cause the calling process to atomically change
> > > its security namespace set.  This would further suggest the need to
> > > have the security namespace creation process also execute atomically
> > > in a multi-LSM namespace change environment.
>
> > In the setns case no new LSM namespaces should be created, the process
> > simply joins an existing set of LSM namespaces.
>
> The issue isn't about new namespaces being created, the issue is
> atomicity of a change to a new set of security policies.
>
> With setns an atomic transition is implemented.
>
> The proposed lsm_unshare() behavior results in a period of time when
> multiple and varying security policies are active, depending on
> various race issues in the orchestrator implementation.
>
> This opens the door to a raft of potential security issues that we can
> have a new acronym for, Time Of Implementation Time Of Use (TOITOU).

I would expect that any LSM implementing namespaces would have
sufficient protections/locking in place to ensure that processes and
namespaces remain in a consistent state outside of the
protected/locked regions.  It is reasonable for one process to attempt
the creation of a new namespace while another attempts to join the
namespace of the process creating the new namespace.  This is not
really a new problem in systems programming, and is one reason why
synchronization mechanisms exist.  Once again, we do not want to force
any particular solution at the LSM framework layer as the
synchonization mechanisms will likely be very LSM dependent.

> > > ... That is the concept of whether or not a setns
> > > call, for any resource namespace, should also force a security
> > > namespace change if the security namespace of the calling process
> > > differs from that of the target process.
>
> > That decision is left to the individual LSMs.
>
> That is reasonable.
>
> In order to support that model, there would seem to be a need to have
> a new LSM call in the setns code that allows LSM's to determine
> whether or not a change in the active security namespace set should be
> forced, correct?

Possibly.  I think we need to see some RFC code to see how this would
look, but I think the LSM implementation inside the setns() syscall
would need to be done in two stages: the first to "prepare" the join
operation where permissions checks are performed (if desired by the
individual LSM) and any operations that could fail are done; the
second stage would be very basic and simply finish the join operation
without any risk of failure.  An individual LSM could fail the join
operation for a variety of reasons in stage 1, causing the entire
setns() operation to fail, but once we progress to stage 2 the
operation should succeed.

At this point I'm not too bothered by how we do this as it is an
implementation detail buried within the setns() implementation and not
really an API issue.  We could create a single LSM hook that is called
within sys_setns(), or we could leverage the existing two-stage
process within sys_setns() and implement the two LSM stages as two LSM
hooks.  The first option would be more complicated from a LSM
perspective, but cleaner from a nsproxy.c perspective (that alone
could make it the more preferable option).  The latter option would
result in cleaner, thinner LSM hooks, but it would likley add
complexity to ns_common and/or nsset.  As I said earlier, this is a
decision that will likely be decided by how the code ends up looking.

> If so, is implementation of this in scope for the lsm_unshare()
> infrastructure?

No.  The lsm_unshare() syscall would only operate on one LSM at a time
so a two stage process isn't needed at the LSM framework layer.  It is
possible that an individual LSM may want to implement a two-stage
transaction in their lsm_unshare() callback, but that is their
decision.

> To close, at the risk of being the devils advocate.
>
> Given that the sentiment is to force almost all of these
> issues/decisions into the individual LSM's, what is the advantage of
> having a common lsm_unshare() system call?

A single uniform API for userspace applications that wish to make use
of LSM namespaces.  Ideally we want to leverage the existing kernel
APIs, e.g. procfs and setns(), but others, e.g. clone(), remain
impractical due to a combination of technical and political reasons
(we've already discussed some of the former, the latter is a rathole
discussion I'm not going to engage in at the moment).

> In the proposed model, a resource orchestrator is going to need to
> have extensive knowledge over the mechanics of all the LSM's that
> implement namespace functionality.

Maybe.  I don't think orchestrators will need to have "extensive"
knowledge of the individual LSMs, although this largely depends on
what you define as "extensive".

I also want to get ahead of this and say that I have absolutely zero
desire to debate this point with you at the moment.  It's an argument
without end and the discussion is unlikely to yield anything specific
enough to be helpful.

> At a very minimum, intrinsic to
> the concept of security namespaces, there will be a need to load a new
> policy or model into the namespace, an action that will be deeply LSM
> specific.

Possibly, as this is once again very LSM dependent.  Some LSMs may not
need a new policy loaded when they create a new namespace.

I will also, once again, point you at the LSM policy loading syscall
ideas.  While on hold, we've already discussed that they should be
namespace aware and potentially have the ability to trigger new LSM
namespace creation.

> At this point, the only common functionality may be the allocation of
> a new LSM namespace 'blob'.

Now you are starting to get it.  The LSM framework exists primarily as
a multiplexing layer hidden beneath an API.  Originally the API was
only for internal kernel users, but recently we started providing a
userspace syscall API.

-- 
paul-moore.com

^ permalink raw reply

* Re: [PATCH v3 0/5] Fix Landlock audit test flakiness
From: Günther Noack @ 2026-04-02 20:57 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang
In-Reply-To: <20260402.eb5c4e85f472@gnoack.org>

On Thu, Apr 02, 2026 at 10:52:46PM +0200, Günther Noack wrote:
> My kernel config is this:
> 
>     make defconfig
>     make kvm_guest.config
>     KCONFIG_CONFIG="${KBUILD_OUTPUT}/.config" ./scripts/kconfig/merge_config.sh "${KBUILD_OUTPUT}/.config" tools/testing/selftests/landlock/config
>     make debug.config
>     echo "CONFIG_RANDOMIZE_BASE=n" >> "${KBUILD_OUTPUT}/.config"
>     make olddefconfig

P.S.: I should point out, everytime that I have observed these
flakiness problems with the audit tests, it was in this debug config.
I suspect that it adds delays in a way that makes it more likely.

–Günther

^ permalink raw reply

* Re: [PATCH v3 0/5] Fix Landlock audit test flakiness
From: Günther Noack @ 2026-04-02 20:52 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

Hello!

On Thu, Apr 02, 2026 at 09:26:01PM +0200, Mickaël Salaün wrote:
> This series fixes two classes of audit selftest failures plus two minor
> bugs in the audit test helpers.
> 
> The main issue is that domain deallocation audit records are emitted
> asynchronously from kworker threads and can arrive after a previous
> test's socket has been closed.  This causes two distinct failure modes:
> 
> - audit_match_record() picks up a stale deallocation record from a
>   previous test instead of the expected one, causing a domain ID
>   mismatch.  The audit.layers test (which reads 16 deallocation records
>   in sequence) is particularly vulnerable because the large read window
>   allows stale records to interleave.  Patch 4 fixes this by filtering
>   deallocation records by domain ID and skipping type-matching records
>   with wrong content patterns.
> 
> - audit_count_records() counts stale deallocation records from a
>   previous test, incrementing records.domain from the expected 0 to 1.
>   Patch 3 fixes this by draining stale records at audit_init() time and
>   removing records.domain == 0 checks that are not preceded by
>   audit_match_record() calls (which would consume stale records).
> 
> These races are more likely to manifest when additional instrumentation
> changes kworker timing in the deallocation path (e.g. with the upcoming
> Landlock tracepoints work).
> 
> The two minor fixes (patches 1-2) correct a snprintf truncation check
> off-by-one and socket file descriptor leaks on error paths in
> audit_init(), audit_init_with_exe_filter(), and audit_cleanup().
> Patch 5 fixes a __u64 format warning reported by the kbuild bot on
> powerpc64.
> 
> Patch 1 is an exact subset of the v1 combined patch, which is why it
> carries the Reviewed-by tag.  Patches 2 and 3 extend beyond what was in
> v1, so the Reviewed-by is not carried.  Patches 4 and 5 are new.
> 
> Changes since v2:
> https://lore.kernel.org/r/20260401161503.1136946-1-mic@digikod.net
> - Patches 4-5: fix __u64 format warnings on powerpc64 (cast to unsigned
>   long long for %llx).  Patch 5 is new.
> 
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
> - Split the combined drain fix into four separate patches.
> - Patch 2: extend fd leak fix to audit_init_with_exe_filter() and
>   audit_cleanup().
> - Patch 3: also remove domain checks from audit.trace and
>   scoped_audit.connect_to_child, document constraint, explain why a
>   longer drain timeout was rejected.
> - Patch 4: new, add domain ID filtering and timeout management to
>   matches_log_domain_deallocated(), skip stale records in
>   audit_match_record().
> 
> Mickaël Salaün (5):
>   selftests/landlock: Fix snprintf truncation checks in audit helpers
>   selftests/landlock: Fix socket file descriptor leaks in audit helpers
>   selftests/landlock: Drain stale audit records on init
>   selftests/landlock: Skip stale records in audit_match_record()
>   selftests/landlock: Fix format warning for __u64 in net_test
> 
>  tools/testing/selftests/landlock/audit.h      | 133 ++++++++++++++----
>  tools/testing/selftests/landlock/audit_test.c |  36 ++---
>  tools/testing/selftests/landlock/net_test.c   |   2 +-
>  .../testing/selftests/landlock/ptrace_test.c  |   1 -
>  .../landlock/scoped_abstract_unix_test.c      |   1 -
>  5 files changed, 119 insertions(+), 54 deletions(-)
> 
> -- 
> 2.53.0
> 

I am still getting flaky audit tests even with these patches, I am
afraid.  It differs which of these tests is flaking, some of them
still do, for example:

#  RUN           audit_layout1.remove_dir ...
# fs_test.c:7281:remove_dir:Expected 0 (0) == matches_log_fs(_metadata, self->audit_fd, "fs\\.remove_dir", dir_s1d2) (-11)
# remove_dir: Test failed
#          ❌ FAIL  audit_layout1.remove_dir
not ok 191 audit_layout1.remove_dir
#  RUN           audit_layout1.read_dir ...
#            ✅ OK  audit_layout1.read_dir
ok 192 audit_layout1.read_dir
#  RUN           audit_layout1.read_file ...
#            ✅ OK  audit_layout1.read_file
ok 193 audit_layout1.read_file
#  RUN           audit_layout1.write_file ...
# fs_test.c:7221:write_file:Expected 0 (0) == matches_log_fs(_metadata, self->audit_fd, "fs\\.write_file", file1_s1d1) (-11)
# fs_test.c:7224:write_file:Expected 0 (0) == records.access (1)
# write_file: Test failed
#          ❌ FAIL  audit_layout1.write_file
not ok 194 audit_layout1.write_file

My kernel config is this:

    make defconfig
    make kvm_guest.config
    KCONFIG_CONFIG="${KBUILD_OUTPUT}/.config" ./scripts/kconfig/merge_config.sh "${KBUILD_OUTPUT}/.config" tools/testing/selftests/landlock/config
    make debug.config
    echo "CONFIG_RANDOMIZE_BASE=n" >> "${KBUILD_OUTPUT}/.config"
    make olddefconfig

and then I run the selftests in Qemu with these flags:

qemu-system-x86_64 \
    -nographic \
    -m 4G \
    -enable-kvm \
    -append "console=ttyS0 lsm=landlock no_hash_pointers" \
    -kernel "${KBUILD_OUTPUT}/arch/x86/boot/bzImage" \
    -initrd "${INITRAMFS}"

This is using my own selftest runner scripts which builds an initramfs
with the statically linked selftests.

Do you have a hunch what might be missing there?  In the test run
above, I have applied your V4 patch set on top of the current master,
5619b098e2fbf3a23bf13d91897056a1fe238c6d ("Merge tag 'for-7.0-rc6-tag'
of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux").

–Günther

^ permalink raw reply

* Re: [PATCH v3 1/5] selftests/landlock: Fix snprintf truncation checks in audit helpers
From: Günther Noack @ 2026-04-02 20:30 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-2-mic@digikod.net>

On Thu, Apr 02, 2026 at 09:26:02PM +0200, Mickaël Salaün wrote:
> snprintf() returns the number of characters that would have been
> written, excluding the terminating NUL byte.  When the output is
> truncated, this return value equals or exceeds the buffer size.  Fix
> matches_log_domain_allocated() and matches_log_domain_deallocated() to
> detect truncation with ">=" instead of ">".
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: stable@vger.kernel.org
> Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
> Reviewed-by: Günther Noack <gnoack@google.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
> - New patch (split from the drain fix).
> ---
>  tools/testing/selftests/landlock/audit.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
> index 44eb433e9666..1049a0582af5 100644
> --- a/tools/testing/selftests/landlock/audit.h
> +++ b/tools/testing/selftests/landlock/audit.h
> @@ -309,7 +309,7 @@ static int __maybe_unused matches_log_domain_allocated(int audit_fd, pid_t pid,
>  
>  	log_match_len =
>  		snprintf(log_match, sizeof(log_match), log_template, pid);
> -	if (log_match_len > sizeof(log_match))
> +	if (log_match_len >= sizeof(log_match))
>  		return -E2BIG;
>  
>  	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
> @@ -326,7 +326,7 @@ static int __maybe_unused matches_log_domain_deallocated(
>  
>  	log_match_len = snprintf(log_match, sizeof(log_match), log_template,
>  				 num_denials);
> -	if (log_match_len > sizeof(log_match))
> +	if (log_match_len >= sizeof(log_match))
>  		return -E2BIG;
>  
>  	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
> -- 
> 2.53.0
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

(I noticed the Reviewed-by tag was already there, re-sending to
confirm that this also applies to this subset of the original patch)

–Günther

^ permalink raw reply

* Re: [PATCH v3 3/5] selftests/landlock: Drain stale audit records on init
From: Günther Noack @ 2026-04-02 20:28 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-4-mic@digikod.net>

On Thu, Apr 02, 2026 at 09:26:04PM +0200, Mickaël Salaün wrote:
> Non-audit Landlock tests generate audit records as side effects when
> audit_enabled is non-zero (e.g. from boot configuration).  These records
> accumulate in the kernel audit backlog while no audit daemon socket is
> open.  When the next test opens a new netlink socket and registers as
> the audit daemon, the stale backlog is delivered, causing baseline
> record count checks to fail spuriously.
> 
> Fix this by draining all pending records in audit_init() right after
> setting the receive timeout.  The 1-usec SO_RCVTIMEO causes audit_recv()
> to return -EAGAIN once the backlog is empty, naturally terminating the
> drain loop.
> 
> Domain deallocation records are emitted asynchronously from a work
> queue, so they may still arrive after the drain.  Remove records.domain
> == 0 checks that are not preceded by audit_match_record() calls, which
> would otherwise consume stale records before the count.  Document this
> constraint above audit_count_records().
> 
> Increasing the drain timeout to catch in-flight deallocation records was
> considered but rejected: a longer timeout adds latency to every
> audit_init() call even when no stale record is pending, and any fixed
> timeout is still not guaranteed to catch all records under load.
> Removing the unprotected checks is simpler and avoids the spurious
> failures.
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: stable@vger.kernel.org
> Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
> - Also remove domain checks from audit.trace and
>   scoped_audit.connect_to_child.
> - Document records.domain == 0 constraint above
>   audit_count_records().
> - Explain why a longer drain timeout was rejected.
> - Drop Reviewed-by (new code comment not in v1).
> - Split snprintf and fd leak fixes into separate patches.
> ---
>  tools/testing/selftests/landlock/audit.h      | 19 +++++++++++++++++++
>  tools/testing/selftests/landlock/audit_test.c |  2 --
>  .../testing/selftests/landlock/ptrace_test.c  |  1 -
>  .../landlock/scoped_abstract_unix_test.c      |  1 -
>  4 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
> index 6422943fc69e..74e1c3d763be 100644
> --- a/tools/testing/selftests/landlock/audit.h
> +++ b/tools/testing/selftests/landlock/audit.h
> @@ -338,6 +338,15 @@ struct audit_records {
>  	size_t domain;
>  };
>  
> +/*
> + * WARNING: Do not assert records.domain == 0 without a preceding
> + * audit_match_record() call.  Domain deallocation records are emitted
> + * asynchronously from kworker threads and can arrive after the drain in
> + * audit_init(), corrupting the domain count.  A preceding audit_match_record()
> + * call consumes stale records while scanning, making the assertion safe in
> + * practice because stale deallocation records arrive before the expected access
> + * records.
> + */
>  static int audit_count_records(int audit_fd, struct audit_records *records)
>  {
>  	struct audit_message msg;
> @@ -393,6 +402,16 @@ static int audit_init(void)
>  		goto err_close;
>  	}
>  
> +	/*
> +	 * Drains stale audit records that accumulated in the kernel backlog
> +	 * while no audit daemon socket was open.  This happens when non-audit
> +	 * Landlock tests generate records while audit_enabled is non-zero (e.g.
> +	 * from boot configuration), or when domain deallocation records arrive
> +	 * asynchronously after a previous test's socket was closed.
> +	 */
> +	while (audit_recv(fd, NULL) == 0)
> +		;
> +
>  	return fd;
>  
>  err_close:
> diff --git a/tools/testing/selftests/landlock/audit_test.c b/tools/testing/selftests/landlock/audit_test.c
> index 46d02d49835a..f92ba6774faa 100644
> --- a/tools/testing/selftests/landlock/audit_test.c
> +++ b/tools/testing/selftests/landlock/audit_test.c
> @@ -412,7 +412,6 @@ TEST_F(audit_flags, signal)
>  		} else {
>  			EXPECT_EQ(1, records.access);
>  		}
> -		EXPECT_EQ(0, records.domain);
>  
>  		/* Updates filter rules to match the drop record. */
>  		set_cap(_metadata, CAP_AUDIT_CONTROL);
> @@ -601,7 +600,6 @@ TEST_F(audit_exec, signal_and_open)
>  	/* Tests that there was no denial until now. */
>  	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
>  	EXPECT_EQ(0, records.access);
> -	EXPECT_EQ(0, records.domain);
>  
>  	/*
>  	 * Wait for the child to do a first denied action by layer1 and
> diff --git a/tools/testing/selftests/landlock/ptrace_test.c b/tools/testing/selftests/landlock/ptrace_test.c
> index 4f64c90583cd..1b6c8b53bf33 100644
> --- a/tools/testing/selftests/landlock/ptrace_test.c
> +++ b/tools/testing/selftests/landlock/ptrace_test.c
> @@ -342,7 +342,6 @@ TEST_F(audit, trace)
>  	/* Makes sure there is no superfluous logged records. */
>  	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
>  	EXPECT_EQ(0, records.access);
> -	EXPECT_EQ(0, records.domain);
>  
>  	yama_ptrace_scope = get_yama_ptrace_scope();
>  	ASSERT_LE(0, yama_ptrace_scope);
> diff --git a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
> index 72f97648d4a7..c47491d2d1c1 100644
> --- a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
> +++ b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
> @@ -312,7 +312,6 @@ TEST_F(scoped_audit, connect_to_child)
>  	/* Makes sure there is no superfluous logged records. */
>  	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
>  	EXPECT_EQ(0, records.access);
> -	EXPECT_EQ(0, records.domain);
>  
>  	ASSERT_EQ(0, pipe2(pipe_child, O_CLOEXEC));
>  	ASSERT_EQ(0, pipe2(pipe_parent, O_CLOEXEC));
> -- 
> 2.53.0
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

^ permalink raw reply

* Re: [PATCH v3 2/5] selftests/landlock: Fix socket file descriptor leaks in audit helpers
From: Günther Noack @ 2026-04-02 20:25 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-3-mic@digikod.net>

On Thu, Apr 02, 2026 at 09:26:03PM +0200, Mickaël Salaün wrote:
> audit_init() opens a netlink socket and configures it, but leaks the
> file descriptor if audit_set_status() or setsockopt() fails.  Fix this
> by jumping to an error path that closes the socket before returning.
> 
> Apply the same fix to audit_init_with_exe_filter(), which leaks the file
> descriptor from audit_init() if audit_init_filter_exe() or
> audit_filter_exe() fails, and to audit_cleanup(), which leaks it if
> audit_init_filter_exe() fails in FIXTURE_TEARDOWN_PARENT().
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: stable@vger.kernel.org
> Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
> - New patch (split from the drain fix, extended to
>   audit_init_with_exe_filter() and audit_cleanup()).
> ---
>  tools/testing/selftests/landlock/audit.h | 26 +++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
> index 1049a0582af5..6422943fc69e 100644
> --- a/tools/testing/selftests/landlock/audit.h
> +++ b/tools/testing/selftests/landlock/audit.h
> @@ -379,19 +379,25 @@ static int audit_init(void)
>  
>  	err = audit_set_status(fd, AUDIT_STATUS_ENABLED, 1);
>  	if (err)
> -		return err;
> +		goto err_close;
>  
>  	err = audit_set_status(fd, AUDIT_STATUS_PID, getpid());
>  	if (err)
> -		return err;
> +		goto err_close;
>  
>  	/* Sets a timeout for negative tests. */
>  	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
>  			 sizeof(audit_tv_default));
> -	if (err)
> -		return -errno;
> +	if (err) {
> +		err = -errno;
> +		goto err_close;
> +	}
>  
>  	return fd;
> +
> +err_close:
> +	close(fd);
> +	return err;
>  }
>  
>  static int audit_init_filter_exe(struct audit_filter *filter, const char *path)
> @@ -441,8 +447,10 @@ static int audit_cleanup(int audit_fd, struct audit_filter *filter)
>  
>  		filter = &new_filter;
>  		err = audit_init_filter_exe(filter, NULL);
> -		if (err)
> +		if (err) {
> +			close(audit_fd);
>  			return err;
> +		}
>  	}
>  
>  	/* Filters might not be in place. */
> @@ -468,11 +476,15 @@ static int audit_init_with_exe_filter(struct audit_filter *filter)
>  
>  	err = audit_init_filter_exe(filter, NULL);
>  	if (err)
> -		return err;
> +		goto err_close;
>  
>  	err = audit_filter_exe(fd, filter, AUDIT_ADD_RULE);
>  	if (err)
> -		return err;
> +		goto err_close;
>  
>  	return fd;
> +
> +err_close:
> +	close(fd);
> +	return err;
>  }
> -- 
> 2.53.0
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

^ permalink raw reply

* Re: [PATCH v3 5/5] selftests/landlock: Fix format warning for __u64 in net_test
From: Günther Noack @ 2026-04-02 20:21 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang, stable, kernel test robot
In-Reply-To: <20260402192608.1458252-6-mic@digikod.net>

On Thu, Apr 02, 2026 at 09:26:06PM +0200, Mickaël Salaün wrote:
> On architectures where __u64 is unsigned long (e.g. powerpc64), using
> %llx to format a __u64 triggers a -Wformat warning because %llx expects
> unsigned long long.  Cast the argument to unsigned long long.
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: stable@vger.kernel.org
> Fixes: a549d055a22e ("selftests/landlock: Add network tests")
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/r/202604020206.62zgOTeP-lkp@intel.com/
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v2:
> - New patch.
> ---
>  tools/testing/selftests/landlock/net_test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
> index b34b139b3f89..4c528154ea92 100644
> --- a/tools/testing/selftests/landlock/net_test.c
> +++ b/tools/testing/selftests/landlock/net_test.c
> @@ -1356,7 +1356,7 @@ TEST_F(mini, network_access_rights)
>  					    &net_port, 0))
>  		{
>  			TH_LOG("Failed to add rule with access 0x%llx: %s",
> -			       access, strerror(errno));
> +			       (unsigned long long)access, strerror(errno));
>  		}
>  	}
>  	EXPECT_EQ(0, close(ruleset_fd));
> -- 
> 2.53.0
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

^ permalink raw reply

* [PATCH v3 2/5] selftests/landlock: Fix socket file descriptor leaks in audit helpers
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

audit_init() opens a netlink socket and configures it, but leaks the
file descriptor if audit_set_status() or setsockopt() fails.  Fix this
by jumping to an error path that closes the socket before returning.

Apply the same fix to audit_init_with_exe_filter(), which leaks the file
descriptor from audit_init() if audit_init_filter_exe() or
audit_filter_exe() fails, and to audit_cleanup(), which leaks it if
audit_init_filter_exe() fails in FIXTURE_TEARDOWN_PARENT().

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
- New patch (split from the drain fix, extended to
  audit_init_with_exe_filter() and audit_cleanup()).
---
 tools/testing/selftests/landlock/audit.h | 26 +++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
index 1049a0582af5..6422943fc69e 100644
--- a/tools/testing/selftests/landlock/audit.h
+++ b/tools/testing/selftests/landlock/audit.h
@@ -379,19 +379,25 @@ static int audit_init(void)
 
 	err = audit_set_status(fd, AUDIT_STATUS_ENABLED, 1);
 	if (err)
-		return err;
+		goto err_close;
 
 	err = audit_set_status(fd, AUDIT_STATUS_PID, getpid());
 	if (err)
-		return err;
+		goto err_close;
 
 	/* Sets a timeout for negative tests. */
 	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
 			 sizeof(audit_tv_default));
-	if (err)
-		return -errno;
+	if (err) {
+		err = -errno;
+		goto err_close;
+	}
 
 	return fd;
+
+err_close:
+	close(fd);
+	return err;
 }
 
 static int audit_init_filter_exe(struct audit_filter *filter, const char *path)
@@ -441,8 +447,10 @@ static int audit_cleanup(int audit_fd, struct audit_filter *filter)
 
 		filter = &new_filter;
 		err = audit_init_filter_exe(filter, NULL);
-		if (err)
+		if (err) {
+			close(audit_fd);
 			return err;
+		}
 	}
 
 	/* Filters might not be in place. */
@@ -468,11 +476,15 @@ static int audit_init_with_exe_filter(struct audit_filter *filter)
 
 	err = audit_init_filter_exe(filter, NULL);
 	if (err)
-		return err;
+		goto err_close;
 
 	err = audit_filter_exe(fd, filter, AUDIT_ADD_RULE);
 	if (err)
-		return err;
+		goto err_close;
 
 	return fd;
+
+err_close:
+	close(fd);
+	return err;
 }
-- 
2.53.0


^ permalink raw reply related

* Re: LSM namespacing API
From: Paul Moore @ 2026-04-02 19:31 UTC (permalink / raw)
  To: Dr. Greg
  Cc: Casey Schaufler, Stephen Smalley, Ondrej Mosnacek,
	linux-security-module, selinux, John Johansen
In-Reply-To: <5e210223-f9a4-4613-8c4b-bea5eea7f8c0@schaufler-ca.com>

On Thu, Apr 2, 2026 at 1:49 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 4/2/2026 3:59 AM, Dr. Greg wrote:
> > That still leaves the question of whether or not CAP_MAC_ADMIN is
> > appropriate for gating the creation of a new security namespace.
>
> That will have to be up to the individual LSMs.

Yes, exactly.

> Not all LSMs implement Mandatory Access Controls.

... and not all LSMs that implement mandatory access controls rely on
CAP_MAC_ADMIN to gate configuration changes.

-- 
paul-moore.com

^ permalink raw reply

* [PATCH v3 3/5] selftests/landlock: Drain stale audit records on init
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

Non-audit Landlock tests generate audit records as side effects when
audit_enabled is non-zero (e.g. from boot configuration).  These records
accumulate in the kernel audit backlog while no audit daemon socket is
open.  When the next test opens a new netlink socket and registers as
the audit daemon, the stale backlog is delivered, causing baseline
record count checks to fail spuriously.

Fix this by draining all pending records in audit_init() right after
setting the receive timeout.  The 1-usec SO_RCVTIMEO causes audit_recv()
to return -EAGAIN once the backlog is empty, naturally terminating the
drain loop.

Domain deallocation records are emitted asynchronously from a work
queue, so they may still arrive after the drain.  Remove records.domain
== 0 checks that are not preceded by audit_match_record() calls, which
would otherwise consume stale records before the count.  Document this
constraint above audit_count_records().

Increasing the drain timeout to catch in-flight deallocation records was
considered but rejected: a longer timeout adds latency to every
audit_init() call even when no stale record is pending, and any fixed
timeout is still not guaranteed to catch all records under load.
Removing the unprotected checks is simpler and avoids the spurious
failures.

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
- Also remove domain checks from audit.trace and
  scoped_audit.connect_to_child.
- Document records.domain == 0 constraint above
  audit_count_records().
- Explain why a longer drain timeout was rejected.
- Drop Reviewed-by (new code comment not in v1).
- Split snprintf and fd leak fixes into separate patches.
---
 tools/testing/selftests/landlock/audit.h      | 19 +++++++++++++++++++
 tools/testing/selftests/landlock/audit_test.c |  2 --
 .../testing/selftests/landlock/ptrace_test.c  |  1 -
 .../landlock/scoped_abstract_unix_test.c      |  1 -
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
index 6422943fc69e..74e1c3d763be 100644
--- a/tools/testing/selftests/landlock/audit.h
+++ b/tools/testing/selftests/landlock/audit.h
@@ -338,6 +338,15 @@ struct audit_records {
 	size_t domain;
 };
 
+/*
+ * WARNING: Do not assert records.domain == 0 without a preceding
+ * audit_match_record() call.  Domain deallocation records are emitted
+ * asynchronously from kworker threads and can arrive after the drain in
+ * audit_init(), corrupting the domain count.  A preceding audit_match_record()
+ * call consumes stale records while scanning, making the assertion safe in
+ * practice because stale deallocation records arrive before the expected access
+ * records.
+ */
 static int audit_count_records(int audit_fd, struct audit_records *records)
 {
 	struct audit_message msg;
@@ -393,6 +402,16 @@ static int audit_init(void)
 		goto err_close;
 	}
 
+	/*
+	 * Drains stale audit records that accumulated in the kernel backlog
+	 * while no audit daemon socket was open.  This happens when non-audit
+	 * Landlock tests generate records while audit_enabled is non-zero (e.g.
+	 * from boot configuration), or when domain deallocation records arrive
+	 * asynchronously after a previous test's socket was closed.
+	 */
+	while (audit_recv(fd, NULL) == 0)
+		;
+
 	return fd;
 
 err_close:
diff --git a/tools/testing/selftests/landlock/audit_test.c b/tools/testing/selftests/landlock/audit_test.c
index 46d02d49835a..f92ba6774faa 100644
--- a/tools/testing/selftests/landlock/audit_test.c
+++ b/tools/testing/selftests/landlock/audit_test.c
@@ -412,7 +412,6 @@ TEST_F(audit_flags, signal)
 		} else {
 			EXPECT_EQ(1, records.access);
 		}
-		EXPECT_EQ(0, records.domain);
 
 		/* Updates filter rules to match the drop record. */
 		set_cap(_metadata, CAP_AUDIT_CONTROL);
@@ -601,7 +600,6 @@ TEST_F(audit_exec, signal_and_open)
 	/* Tests that there was no denial until now. */
 	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
 	EXPECT_EQ(0, records.access);
-	EXPECT_EQ(0, records.domain);
 
 	/*
 	 * Wait for the child to do a first denied action by layer1 and
diff --git a/tools/testing/selftests/landlock/ptrace_test.c b/tools/testing/selftests/landlock/ptrace_test.c
index 4f64c90583cd..1b6c8b53bf33 100644
--- a/tools/testing/selftests/landlock/ptrace_test.c
+++ b/tools/testing/selftests/landlock/ptrace_test.c
@@ -342,7 +342,6 @@ TEST_F(audit, trace)
 	/* Makes sure there is no superfluous logged records. */
 	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
 	EXPECT_EQ(0, records.access);
-	EXPECT_EQ(0, records.domain);
 
 	yama_ptrace_scope = get_yama_ptrace_scope();
 	ASSERT_LE(0, yama_ptrace_scope);
diff --git a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
index 72f97648d4a7..c47491d2d1c1 100644
--- a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
+++ b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
@@ -312,7 +312,6 @@ TEST_F(scoped_audit, connect_to_child)
 	/* Makes sure there is no superfluous logged records. */
 	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
 	EXPECT_EQ(0, records.access);
-	EXPECT_EQ(0, records.domain);
 
 	ASSERT_EQ(0, pipe2(pipe_child, O_CLOEXEC));
 	ASSERT_EQ(0, pipe2(pipe_parent, O_CLOEXEC));
-- 
2.53.0


^ permalink raw reply related

* [PATCH v3 0/5] Fix Landlock audit test flakiness
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang

This series fixes two classes of audit selftest failures plus two minor
bugs in the audit test helpers.

The main issue is that domain deallocation audit records are emitted
asynchronously from kworker threads and can arrive after a previous
test's socket has been closed.  This causes two distinct failure modes:

- audit_match_record() picks up a stale deallocation record from a
  previous test instead of the expected one, causing a domain ID
  mismatch.  The audit.layers test (which reads 16 deallocation records
  in sequence) is particularly vulnerable because the large read window
  allows stale records to interleave.  Patch 4 fixes this by filtering
  deallocation records by domain ID and skipping type-matching records
  with wrong content patterns.

- audit_count_records() counts stale deallocation records from a
  previous test, incrementing records.domain from the expected 0 to 1.
  Patch 3 fixes this by draining stale records at audit_init() time and
  removing records.domain == 0 checks that are not preceded by
  audit_match_record() calls (which would consume stale records).

These races are more likely to manifest when additional instrumentation
changes kworker timing in the deallocation path (e.g. with the upcoming
Landlock tracepoints work).

The two minor fixes (patches 1-2) correct a snprintf truncation check
off-by-one and socket file descriptor leaks on error paths in
audit_init(), audit_init_with_exe_filter(), and audit_cleanup().
Patch 5 fixes a __u64 format warning reported by the kbuild bot on
powerpc64.

Patch 1 is an exact subset of the v1 combined patch, which is why it
carries the Reviewed-by tag.  Patches 2 and 3 extend beyond what was in
v1, so the Reviewed-by is not carried.  Patches 4 and 5 are new.

Changes since v2:
https://lore.kernel.org/r/20260401161503.1136946-1-mic@digikod.net
- Patches 4-5: fix __u64 format warnings on powerpc64 (cast to unsigned
  long long for %llx).  Patch 5 is new.

Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
- Split the combined drain fix into four separate patches.
- Patch 2: extend fd leak fix to audit_init_with_exe_filter() and
  audit_cleanup().
- Patch 3: also remove domain checks from audit.trace and
  scoped_audit.connect_to_child, document constraint, explain why a
  longer drain timeout was rejected.
- Patch 4: new, add domain ID filtering and timeout management to
  matches_log_domain_deallocated(), skip stale records in
  audit_match_record().

Mickaël Salaün (5):
  selftests/landlock: Fix snprintf truncation checks in audit helpers
  selftests/landlock: Fix socket file descriptor leaks in audit helpers
  selftests/landlock: Drain stale audit records on init
  selftests/landlock: Skip stale records in audit_match_record()
  selftests/landlock: Fix format warning for __u64 in net_test

 tools/testing/selftests/landlock/audit.h      | 133 ++++++++++++++----
 tools/testing/selftests/landlock/audit_test.c |  36 ++---
 tools/testing/selftests/landlock/net_test.c   |   2 +-
 .../testing/selftests/landlock/ptrace_test.c  |   1 -
 .../landlock/scoped_abstract_unix_test.c      |   1 -
 5 files changed, 119 insertions(+), 54 deletions(-)

-- 
2.53.0


^ permalink raw reply

* [PATCH v3 5/5] selftests/landlock: Fix format warning for __u64 in net_test
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang, stable, kernel test robot
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

On architectures where __u64 is unsigned long (e.g. powerpc64), using
%llx to format a __u64 triggers a -Wformat warning because %llx expects
unsigned long long.  Cast the argument to unsigned long long.

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: a549d055a22e ("selftests/landlock: Add network tests")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202604020206.62zgOTeP-lkp@intel.com/
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v2:
- New patch.
---
 tools/testing/selftests/landlock/net_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
index b34b139b3f89..4c528154ea92 100644
--- a/tools/testing/selftests/landlock/net_test.c
+++ b/tools/testing/selftests/landlock/net_test.c
@@ -1356,7 +1356,7 @@ TEST_F(mini, network_access_rights)
 					    &net_port, 0))
 		{
 			TH_LOG("Failed to add rule with access 0x%llx: %s",
-			       access, strerror(errno));
+			       (unsigned long long)access, strerror(errno));
 		}
 	}
 	EXPECT_EQ(0, close(ruleset_fd));
-- 
2.53.0


^ permalink raw reply related

* [PATCH v3 4/5] selftests/landlock: Skip stale records in audit_match_record()
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

Domain deallocation records are emitted asynchronously from kworker
threads (via free_ruleset_work()).  Stale deallocation records from a
previous test can arrive during the current test's deallocation read
loop and be picked up by audit_match_record() instead of the expected
record, causing a domain ID mismatch.  The audit.layers test (which
creates 16 nested domains) is particularly vulnerable because it reads
16 deallocation records in sequence, providing a large window for stale
records to interleave.

The same issue affects audit_flags.signal, where deallocation records
from a previous test (audit.layers) can leak into the next test and be
picked up by audit_match_record() instead of the expected record.

Fix this by continuing to read records when the type matches but the
content pattern does not.  Stale records are silently consumed, and the
loop only stops when both type and pattern match (or the socket times
out with -EAGAIN).

Additionally, extend matches_log_domain_deallocated() with an
expected_domain_id parameter.  When set, the regex pattern includes the
specific domain ID as a literal hex value, so that deallocation records
for a different domain do not match the pattern at all.  This handles
the case where the stale record has the same denial count as the
expected one (e.g. both have denials=1), which the type+pattern loop
alone cannot distinguish.  Callers that already know the expected domain
ID (from a prior denial or allocation record) now pass it to filter
precisely.

When expected_domain_id is set, matches_log_domain_deallocated() also
temporarily increases the socket timeout to audit_tv_dom_drop (1 second)
to wait for the asynchronous kworker deallocation, and restores
audit_tv_default afterward.  This removes the need for callers to manage
the timeout switch manually.

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v2:
https://lore.kernel.org/r/20260401161503.1136946-1-mic@digikod.net
- Fix __u64 format warnings on powerpc64 (cast to unsigned long long
  for %llx).

Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
- New patch.
---
 tools/testing/selftests/landlock/audit.h      | 82 ++++++++++++++-----
 tools/testing/selftests/landlock/audit_test.c | 34 ++++----
 2 files changed, 77 insertions(+), 39 deletions(-)

diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
index 74e1c3d763be..834005b2b0f0 100644
--- a/tools/testing/selftests/landlock/audit.h
+++ b/tools/testing/selftests/landlock/audit.h
@@ -249,9 +249,9 @@ static __maybe_unused char *regex_escape(const char *const src, char *dst,
 static int audit_match_record(int audit_fd, const __u16 type,
 			      const char *const pattern, __u64 *domain_id)
 {
-	struct audit_message msg;
+	struct audit_message msg, last_mismatch = {};
 	int ret, err = 0;
-	bool matches_record = !type;
+	int num_type_match = 0;
 	regmatch_t matches[2];
 	regex_t regex;
 
@@ -259,21 +259,35 @@ static int audit_match_record(int audit_fd, const __u16 type,
 	if (ret)
 		return -EINVAL;
 
-	do {
+	/*
+	 * Reads records until one matches both the expected type and the
+	 * pattern.  Type-matching records with non-matching content are
+	 * silently consumed, which handles stale domain deallocation records
+	 * from a previous test emitted asynchronously by kworker threads.
+	 */
+	while (true) {
 		memset(&msg, 0, sizeof(msg));
 		err = audit_recv(audit_fd, &msg);
-		if (err)
+		if (err) {
+			if (num_type_match) {
+				printf("DATA: %s\n", last_mismatch.data);
+				printf("ERROR: %d record(s) matched type %u"
+				       " but not pattern: %s\n",
+				       num_type_match, type, pattern);
+			}
 			goto out;
+		}
 
-		if (msg.header.nlmsg_type == type)
-			matches_record = true;
-	} while (!matches_record);
+		if (type && msg.header.nlmsg_type != type)
+			continue;
 
-	ret = regexec(&regex, msg.data, ARRAY_SIZE(matches), matches, 0);
-	if (ret) {
-		printf("DATA: %s\n", msg.data);
-		printf("ERROR: no match for pattern: %s\n", pattern);
-		err = -ENOENT;
+		ret = regexec(&regex, msg.data, ARRAY_SIZE(matches), matches,
+			      0);
+		if (!ret)
+			break;
+
+		num_type_match++;
+		last_mismatch = msg;
 	}
 
 	if (domain_id) {
@@ -316,21 +330,49 @@ static int __maybe_unused matches_log_domain_allocated(int audit_fd, pid_t pid,
 				  domain_id);
 }
 
-static int __maybe_unused matches_log_domain_deallocated(
-	int audit_fd, unsigned int num_denials, __u64 *domain_id)
+/*
+ * Matches a domain deallocation record.  When expected_domain_id is non-zero,
+ * the pattern includes the specific domain ID so that stale deallocation
+ * records from a previous test (with a different domain ID) are skipped by
+ * audit_match_record(), and the socket timeout is temporarily increased to
+ * audit_tv_dom_drop to wait for the asynchronous kworker deallocation.
+ */
+static int __maybe_unused
+matches_log_domain_deallocated(int audit_fd, unsigned int num_denials,
+			       __u64 expected_domain_id, __u64 *domain_id)
 {
 	static const char log_template[] = REGEX_LANDLOCK_PREFIX
 		" status=deallocated denials=%u$";
-	char log_match[sizeof(log_template) + 10];
-	int log_match_len;
+	static const char log_template_with_id[] =
+		"^audit([0-9.:]\\+): domain=\\(%llx\\)"
+		" status=deallocated denials=%u$";
+	char log_match[sizeof(log_template_with_id) + 32];
+	int log_match_len, err;
+
+	if (expected_domain_id)
+		log_match_len = snprintf(log_match, sizeof(log_match),
+					 log_template_with_id,
+					 (unsigned long long)expected_domain_id,
+					 num_denials);
+	else
+		log_match_len = snprintf(log_match, sizeof(log_match),
+					 log_template, num_denials);
 
-	log_match_len = snprintf(log_match, sizeof(log_match), log_template,
-				 num_denials);
 	if (log_match_len >= sizeof(log_match))
 		return -E2BIG;
 
-	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
-				  domain_id);
+	if (expected_domain_id)
+		setsockopt(audit_fd, SOL_SOCKET, SO_RCVTIMEO,
+			   &audit_tv_dom_drop, sizeof(audit_tv_dom_drop));
+
+	err = audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
+				 domain_id);
+
+	if (expected_domain_id)
+		setsockopt(audit_fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
+			   sizeof(audit_tv_default));
+
+	return err;
 }
 
 struct audit_records {
diff --git a/tools/testing/selftests/landlock/audit_test.c b/tools/testing/selftests/landlock/audit_test.c
index f92ba6774faa..60de97bd0153 100644
--- a/tools/testing/selftests/landlock/audit_test.c
+++ b/tools/testing/selftests/landlock/audit_test.c
@@ -139,23 +139,24 @@ TEST_F(audit, layers)
 	    WEXITSTATUS(status) != EXIT_SUCCESS)
 		_metadata->exit_code = KSFT_FAIL;
 
-	/* Purges log from deallocated domains. */
-	EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-				&audit_tv_dom_drop, sizeof(audit_tv_dom_drop)));
+	/*
+	 * Purges log from deallocated domains.  Records arrive in LIFO order
+	 * (innermost domain first) because landlock_put_hierarchy() walks the
+	 * chain sequentially in a single kworker context.
+	 */
 	for (i = ARRAY_SIZE(*domain_stack) - 1; i >= 0; i--) {
 		__u64 deallocated_dom = 2;
 
 		EXPECT_EQ(0, matches_log_domain_deallocated(self->audit_fd, 1,
+							    (*domain_stack)[i],
 							    &deallocated_dom));
 		EXPECT_EQ((*domain_stack)[i], deallocated_dom)
 		{
 			TH_LOG("Failed to match domain %llx (#%d)",
-			       (*domain_stack)[i], i);
+			       (unsigned long long)(*domain_stack)[i], i);
 		}
 	}
 	EXPECT_EQ(0, munmap(domain_stack, sizeof(*domain_stack)));
-	EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-				&audit_tv_default, sizeof(audit_tv_default)));
 	EXPECT_EQ(0, close(ruleset_fd));
 }
 
@@ -270,13 +271,9 @@ TEST_F(audit, thread)
 	EXPECT_EQ(0, close(pipe_parent[1]));
 	ASSERT_EQ(0, pthread_join(thread, NULL));
 
-	EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-				&audit_tv_dom_drop, sizeof(audit_tv_dom_drop)));
-	EXPECT_EQ(0, matches_log_domain_deallocated(self->audit_fd, 1,
-						    &deallocated_dom));
+	EXPECT_EQ(0, matches_log_domain_deallocated(
+			     self->audit_fd, 1, denial_dom, &deallocated_dom));
 	EXPECT_EQ(denial_dom, deallocated_dom);
-	EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-				&audit_tv_default, sizeof(audit_tv_default)));
 }
 
 FIXTURE(audit_flags)
@@ -432,22 +429,21 @@ TEST_F(audit_flags, signal)
 
 	if (variant->restrict_flags &
 	    LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF) {
+		/*
+		 * No deallocation record: denials=0 never matches a real
+		 * record.
+		 */
 		EXPECT_EQ(-EAGAIN,
-			  matches_log_domain_deallocated(self->audit_fd, 0,
+			  matches_log_domain_deallocated(self->audit_fd, 0, 0,
 							 &deallocated_dom));
 		EXPECT_EQ(deallocated_dom, 2);
 	} else {
-		EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-					&audit_tv_dom_drop,
-					sizeof(audit_tv_dom_drop)));
 		EXPECT_EQ(0, matches_log_domain_deallocated(self->audit_fd, 2,
+							    *self->domain_id,
 							    &deallocated_dom));
 		EXPECT_NE(deallocated_dom, 2);
 		EXPECT_NE(deallocated_dom, 0);
 		EXPECT_EQ(deallocated_dom, *self->domain_id);
-		EXPECT_EQ(0, setsockopt(self->audit_fd, SOL_SOCKET, SO_RCVTIMEO,
-					&audit_tv_default,
-					sizeof(audit_tv_default)));
 	}
 }
 
-- 
2.53.0


^ permalink raw reply related

* [PATCH v3 1/5] selftests/landlock: Fix snprintf truncation checks in audit helpers
From: Mickaël Salaün @ 2026-04-02 19:26 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Justin Suess,
	Tingmao Wang, stable
In-Reply-To: <20260402192608.1458252-1-mic@digikod.net>

snprintf() returns the number of characters that would have been
written, excluding the terminating NUL byte.  When the output is
truncated, this return value equals or exceeds the buffer size.  Fix
matches_log_domain_allocated() and matches_log_domain_deallocated() to
detect truncation with ">=" instead of ">".

Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Reviewed-by: Günther Noack <gnoack@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
- New patch (split from the drain fix).
---
 tools/testing/selftests/landlock/audit.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
index 44eb433e9666..1049a0582af5 100644
--- a/tools/testing/selftests/landlock/audit.h
+++ b/tools/testing/selftests/landlock/audit.h
@@ -309,7 +309,7 @@ static int __maybe_unused matches_log_domain_allocated(int audit_fd, pid_t pid,
 
 	log_match_len =
 		snprintf(log_match, sizeof(log_match), log_template, pid);
-	if (log_match_len > sizeof(log_match))
+	if (log_match_len >= sizeof(log_match))
 		return -E2BIG;
 
 	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
@@ -326,7 +326,7 @@ static int __maybe_unused matches_log_domain_deallocated(
 
 	log_match_len = snprintf(log_match, sizeof(log_match), log_template,
 				 num_denials);
-	if (log_match_len > sizeof(log_match))
+	if (log_match_len >= sizeof(log_match))
 		return -E2BIG;
 
 	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
-- 
2.53.0


^ permalink raw reply related

* Re: LSM namespacing API
From: Casey Schaufler @ 2026-04-02 17:49 UTC (permalink / raw)
  To: Dr. Greg, Paul Moore
  Cc: Stephen Smalley, Ondrej Mosnacek, linux-security-module, selinux,
	John Johansen, Casey Schaufler
In-Reply-To: <ac5MKr4lFQhc44i6@wind.enjellic.com>

On 4/2/2026 3:59 AM, Dr. Greg wrote:
> That still leaves the question of whether or not CAP_MAC_ADMIN is
> appropriate for gating the creation of a new security namespace.

That will have to be up to the individual LSMs. Not all LSMs implement
Mandatory Access Controls. It would be inappropriate for an LSM that
provides finer grain privilege than capabilities do to be gated by
CAP_MAC_ADMIN. An LSM that implements a novel access control list scheme
would fall under CAP_DAC_SOMETHING, not CAP_MAC_ADMIN. While a time-of-day
access scheme might require CAP_MAC_ADMIN, it might not. Implying that all
LSMs enforce a MAC policy is not a good idea.

^ permalink raw reply

* Re: [PATCH] landlock: Document fallocate(2) as another truncation corner case
From: Mickaël Salaün @ 2026-04-02 18:16 UTC (permalink / raw)
  To: Günther Noack; +Cc: linux-security-module
In-Reply-To: <ac1SP3cGuEeIZFmM@google.com>

On Wed, Apr 01, 2026 at 07:13:35PM +0200, Günther Noack wrote:
> On Wed, Apr 01, 2026 at 06:30:28PM +0200, Mickaël Salaün wrote:
> > On Wed, Apr 01, 2026 at 05:09:10PM +0200, Günther Noack wrote:
> > > Reinforce the already stated policy that LANDLOCK_ACCESS_FS_TRUNCATE should
> > > always go hand in hand with LANDLOCK_ACCESS_FS_WRITE_FILE, as their
> > > meanings and enforcement overlap in counterintuitive ways.
> > > 
> > > On many common file systems, fallocate(2) offers a way to shorten files as
> > > long as the file is opened for writing, side-stepping the
> > > LANDLOCK_ACCESS_FS_TRUNCATE right.
> > > 
> > > Assisted-by: Gemini-CLI:gemini-3.1
> > > Signed-off-by: Günther Noack <gnoack@google.com>
> > > ---
> > >  Documentation/userspace-api/landlock.rst | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> > > index 7f86d7a37dc2..d5691ec136cc 100644
> > > --- a/Documentation/userspace-api/landlock.rst
> > > +++ b/Documentation/userspace-api/landlock.rst
> > > @@ -378,8 +378,8 @@ Truncating files
> > >  
> > >  The operations covered by ``LANDLOCK_ACCESS_FS_WRITE_FILE`` and
> > >  ``LANDLOCK_ACCESS_FS_TRUNCATE`` both change the contents of a file and sometimes
> > > -overlap in non-intuitive ways.  It is recommended to always specify both of
> > > -these together.
> > > +overlap in non-intuitive ways.  It is strongly recommended to always specify
> > > +both of these together (either granting both, or granting none).
> > >  
> > >  A particularly surprising example is :manpage:`creat(2)`.  The name suggests
> > >  that this system call requires the rights to create and write files.  However,
> > > @@ -391,6 +391,10 @@ It should also be noted that truncating files does not require the
> > >  system call, this can also be done through :manpage:`open(2)` with the flags
> > >  ``O_RDONLY | O_TRUNC``.
> > >  
> > > +At the same time, on some filesystems, :manpage:`fallocate(2)` offers a way to
> > > +shorten file contents with ``FALLOC_FL_COLLAPSE_RANGE`` when the file is opened
> > > +for writing, sidestepping the ``LANDLOCK_ACCESS_FS_TRUNCATE`` right.
> > 
> > Interesting, which filesystems?  Shouldn't it be fixed in the code
> > instead?
> 
> It works on ext4, and I also see mentions of FALLOC_FL_COLLAPSE_RANGE
> in XFS, F2FS, SMB and NTFS3.
> 
> I should mention, it is not *exactly* the same as a truncation, but
> you can remove a chunk of the file from the middle, which also leads
> to a shorter file.  For example, assuming a block size of 1024:
> 
>   1. Make a file with 2*1024 bytes: 1024*'A', then 1024*'B'
>   2. fallocate(collapse range, 0, 1024)
> 
> Resulting file is 1024*'B', and the file is shortened to 1024 bytes.
> 
> So this is not *exactly* a truncation.  (The man page says that an
> attempt to remove the end of a file results in EINVAL, so you have to
> take it from the middle, and it needs to align with block boundaries.)
> 
> But it's quite similar, also shortens the file, and it does not
> require the Landlock truncation access right.
> 
> I agree, another way would potentially be to call the LSM ftruncate
> hook.  I suspect this would stay compatible with other LSMs, because
> the LSM ftruncate hook is a relatively recent addition (but have not
> checked in detail).
> 
> The implementation of fallocate is vfs_fallocate() in fs/open.c - I
> only had a tentative look now; it checks that the file->f_mode is open
> for writing and calls security_file_permission() with MAY_WRITE.
> 
> I always saw LANDLOCK_ACCESS_FS_WRITE_FILE and
> LANDLOCK_ACCESS_FS_TRUNCATE as rights that should always go together,
> so I suspect that it does not make a big difference in practice, and
> that is why I am suggesting to just document it more clearly for now.

OK, I agree, I'll take this patch. Thanks!

> 
> —Günther
> 

^ permalink raw reply

* Re: [PATCH v8 04/12] landlock: Control pathname UNIX domain socket resolution by path
From: Kuniyuki Iwashima @ 2026-04-02 18:09 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, John Johansen, Tingmao Wang,
	Justin Suess, Sebastian Andrzej Siewior, Jann Horn,
	linux-security-module, Samasth Norway Ananda, Matthieu Buffet,
	Mikhail Ivanov, konstantin.meskhidze, Demi Marie Obenour,
	Alyssa Ross, Tahera Fahimi, Georgia Garcia
In-Reply-To: <20260327164838.38231-5-gnoack3000@gmail.com>

On Fri, Mar 27, 2026 at 9:49 AM Günther Noack <gnoack3000@gmail.com> wrote:
>
> * Add a new access right LANDLOCK_ACCESS_FS_RESOLVE_UNIX, which
>   controls the lookup operations for named UNIX domain sockets.  The
>   resolution happens during connect() and sendmsg() (depending on
>   socket type).
> * Change access_mask_t from u16 to u32 (see below)
> * Hook into the path lookup in unix_find_bsd() in af_unix.c, using a
>   LSM hook.  Make policy decisions based on the new access rights
> * Increment the Landlock ABI version.
> * Minor test adaptations to keep the tests working.
> * Document the design rationale for scoped access rights,
>   and cross-reference it from the header documentation.
>
> With this access right, access is granted if either of the following
> conditions is met:
>
> * The target socket's filesystem path was allow-listed using a
>   LANDLOCK_RULE_PATH_BENEATH rule, *or*:
> * The target socket was created in the same Landlock domain in which
>   LANDLOCK_ACCESS_FS_RESOLVE_UNIX was restricted.
>
> In case of a denial, connect() and sendmsg() return EACCES, which is
> the same error as it is returned if the user does not have the write
> bit in the traditional UNIX file system permissions of that file.
>
> The access_mask_t type grows from u16 to u32 to make space for the new
> access right.  This also doubles the size of struct layer_access_masks
> from 32 byte to 64 byte.
>
> Document the (possible future) interaction between scoped flags and
> other access rights in struct landlock_ruleset_attr, and summarize the
> rationale, as discussed in code review leading up to [2].
>
> This feature was created with substantial discussion and input from
> Justin Suess, Tingmao Wang and Mickaël Salaün.
>
> Cc: Tingmao Wang <m@maowtm.org>
> Cc: Justin Suess <utilityemal77@gmail.com>
> Cc: Mickaël Salaün <mic@digikod.net>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Kuniyuki Iwashima <kuniyu@google.com>
> Suggested-by: Jann Horn <jannh@google.com>
> Link[1]: https://github.com/landlock-lsm/linux/issues/36
> Link[2]: https://lore.kernel.org/all/20260205.8531e4005118@gnoack.org/
> Signed-off-by: Günther Noack <gnoack3000@gmail.com>
> ---
>  Documentation/security/landlock.rst          |  42 +++++-
>  Documentation/userspace-api/landlock.rst     |   2 +-
>  include/uapi/linux/landlock.h                |  21 +++
>  security/landlock/access.h                   |   2 +-
>  security/landlock/audit.c                    |   1 +
>  security/landlock/fs.c                       | 130 ++++++++++++++++++-
>  security/landlock/limits.h                   |   2 +-
>  security/landlock/syscalls.c                 |   2 +-
>  tools/testing/selftests/landlock/base_test.c |   2 +-
>  tools/testing/selftests/landlock/fs_test.c   |   5 +-
>  10 files changed, 200 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> index 3e4d4d04cfae..c3f8f43073a7 100644
> --- a/Documentation/security/landlock.rst
> +++ b/Documentation/security/landlock.rst
> @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
>  ==================================
>
>  :Author: Mickaël Salaün
> -:Date: September 2025
> +:Date: March 2026
>
>  Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
>  harden a whole system, this feature should be available to any process,
> @@ -89,6 +89,46 @@ this is required to keep access controls consistent over the whole system, and
>  this avoids unattended bypasses through file descriptor passing (i.e. confused
>  deputy attack).
>
> +.. _scoped-flags-interaction:
> +
> +Interaction between scoped flags and other access rights
> +--------------------------------------------------------
> +
> +The ``scoped`` flags in ``struct landlock_ruleset_attr`` restrict the
> +use of *outgoing* IPC from the created Landlock domain, while they
> +permit reaching out to IPC endpoints *within* the created Landlock
> +domain.
> +
> +In the future, scoped flags *may* interact with other access rights,
> +e.g. so that abstract UNIX sockets can be allow-listed by name, or so
> +that signals can be allow-listed by signal number or target process.
> +
> +When introducing ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``, we defined it to
> +implicitly have the same scoping semantics as a
> +``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` flag would have: connecting to
> +UNIX sockets within the same domain (where
> +``LANDLOCK_ACCESS_FS_RESOLVE_UNIX`` is used) is unconditionally
> +allowed.
> +
> +The reasoning is:
> +
> +* Like other IPC mechanisms, connecting to named UNIX sockets in the
> +  same domain should be expected and harmless.  (If needed, users can
> +  further refine their Landlock policies with nested domains or by
> +  restricting ``LANDLOCK_ACCESS_FS_MAKE_SOCK``.)
> +* We reserve the option to still introduce
> +  ``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` in the future.  (This would
> +  be useful if we wanted to have a Landlock rule to permit IPC access
> +  to other Landlock domains.)
> +* But we can postpone the point in time when users have to deal with
> +  two interacting flags visible in the userspace API.  (In particular,
> +  it is possible that it won't be needed in practice, in which case we
> +  can avoid the second flag altogether.)
> +* If we *do* introduce ``LANDLOCK_SCOPE_PATHNAME_UNIX_SOCKET`` in the
> +  future, setting this scoped flag in a ruleset does *not reduce* the
> +  restrictions, because access within the same scope is already
> +  allowed based on ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``.
> +
>  Tests
>  =====
>
> diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> index 13134bccdd39..1490f879f621 100644
> --- a/Documentation/userspace-api/landlock.rst
> +++ b/Documentation/userspace-api/landlock.rst
> @@ -8,7 +8,7 @@ Landlock: unprivileged access control
>  =====================================
>
>  :Author: Mickaël Salaün
> -:Date: January 2026
> +:Date: March 2026
>
>  The goal of Landlock is to enable restriction of ambient rights (e.g. global
>  filesystem or network access) for a set of processes.  Because Landlock
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index f88fa1f68b77..3157d257555b 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -248,6 +248,26 @@ struct landlock_net_port_attr {
>   *
>   *   This access right is available since the fifth version of the Landlock
>   *   ABI.
> + * - %LANDLOCK_ACCESS_FS_RESOLVE_UNIX: Look up pathname UNIX domain sockets
> + *   (:manpage:`unix(7)`).  On UNIX domain sockets, this restricts both calls to
> + *   :manpage:`connect(2)` as well as calls to :manpage:`sendmsg(2)` with an
> + *   explicit recipient address.
> + *
> + *   This access right only applies to connections to UNIX server sockets which
> + *   were created outside of the newly created Landlock domain (e.g. from within
> + *   a parent domain or from an unrestricted process).  Newly created UNIX
> + *   servers within the same Landlock domain continue to be accessible.  In this
> + *   regard, %LANDLOCK_ACCESS_FS_RESOLVE_UNIX has the same semantics as the
> + *   ``LANDLOCK_SCOPE_*`` flags.
> + *
> + *   If a resolve attempt is denied, the operation returns an ``EACCES`` error,
> + *   in line with other filesystem access rights (but different to denials for
> + *   abstract UNIX domain sockets).
> + *
> + *   This access right is available since the ninth version of the Landlock ABI.
> + *
> + *   The rationale for this design is described in
> + *   :ref:`Documentation/security/landlock.rst <scoped-flags-interaction>`.
>   *
>   * Whether an opened file can be truncated with :manpage:`ftruncate(2)` or used
>   * with `ioctl(2)` is determined during :manpage:`open(2)`, in the same way as
> @@ -333,6 +353,7 @@ struct landlock_net_port_attr {
>  #define LANDLOCK_ACCESS_FS_REFER                       (1ULL << 13)
>  #define LANDLOCK_ACCESS_FS_TRUNCATE                    (1ULL << 14)
>  #define LANDLOCK_ACCESS_FS_IOCTL_DEV                   (1ULL << 15)
> +#define LANDLOCK_ACCESS_FS_RESOLVE_UNIX                        (1ULL << 16)
>  /* clang-format on */
>
>  /**
> diff --git a/security/landlock/access.h b/security/landlock/access.h
> index 277b6ed7f7bb..99c709f7979e 100644
> --- a/security/landlock/access.h
> +++ b/security/landlock/access.h
> @@ -34,7 +34,7 @@
>         LANDLOCK_ACCESS_FS_IOCTL_DEV)
>  /* clang-format on */
>
> -typedef u16 access_mask_t;
> +typedef u32 access_mask_t;
>
>  /* Makes sure all filesystem access rights can be stored. */
>  static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
> diff --git a/security/landlock/audit.c b/security/landlock/audit.c
> index 60ff217ab95b..8d0edf94037d 100644
> --- a/security/landlock/audit.c
> +++ b/security/landlock/audit.c
> @@ -37,6 +37,7 @@ static const char *const fs_access_strings[] = {
>         [BIT_INDEX(LANDLOCK_ACCESS_FS_REFER)] = "fs.refer",
>         [BIT_INDEX(LANDLOCK_ACCESS_FS_TRUNCATE)] = "fs.truncate",
>         [BIT_INDEX(LANDLOCK_ACCESS_FS_IOCTL_DEV)] = "fs.ioctl_dev",
> +       [BIT_INDEX(LANDLOCK_ACCESS_FS_RESOLVE_UNIX)] = "fs.resolve_unix",
>  };
>
>  static_assert(ARRAY_SIZE(fs_access_strings) == LANDLOCK_NUM_ACCESS_FS);
> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 97065d51685a..fcf69b3d734d 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -27,6 +27,7 @@
>  #include <linux/lsm_hooks.h>
>  #include <linux/mount.h>
>  #include <linux/namei.h>
> +#include <linux/net.h>
>  #include <linux/path.h>
>  #include <linux/pid.h>
>  #include <linux/rcupdate.h>
> @@ -36,6 +37,7 @@
>  #include <linux/types.h>
>  #include <linux/wait_bit.h>
>  #include <linux/workqueue.h>
> +#include <net/af_unix.h>
>  #include <uapi/linux/fiemap.h>
>  #include <uapi/linux/landlock.h>
>
> @@ -314,7 +316,8 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
>         LANDLOCK_ACCESS_FS_WRITE_FILE | \
>         LANDLOCK_ACCESS_FS_READ_FILE | \
>         LANDLOCK_ACCESS_FS_TRUNCATE | \
> -       LANDLOCK_ACCESS_FS_IOCTL_DEV)
> +       LANDLOCK_ACCESS_FS_IOCTL_DEV | \
> +       LANDLOCK_ACCESS_FS_RESOLVE_UNIX)
>  /* clang-format on */
>
>  /*
> @@ -1557,6 +1560,130 @@ static int hook_path_truncate(const struct path *const path)
>         return current_check_access_path(path, LANDLOCK_ACCESS_FS_TRUNCATE);
>  }
>
> +/**
> + * unmask_scoped_access - Remove access right bits in @masks in all layers
> + *                        where @client and @server have the same domain
> + *
> + * This does the same as domain_is_scoped(), but unmasks bits in @masks.
> + * It can not return early as domain_is_scoped() does.
> + *
> + * A scoped access for a given access right bit is allowed iff, for all layer
> + * depths where the access bit is set, the client and server domain are the
> + * same.  This function clears the access rights @access in @masks at all layer
> + * depths where the client and server domain are the same, so that, when they
> + * are all cleared, the access is allowed.
> + *
> + * @client: Client domain
> + * @server: Server domain
> + * @masks: Layer access masks to unmask
> + * @access: Access bits that control scoping
> + */
> +static void unmask_scoped_access(const struct landlock_ruleset *const client,
> +                                const struct landlock_ruleset *const server,
> +                                struct layer_access_masks *const masks,
> +                                const access_mask_t access)
> +{
> +       int client_layer, server_layer;
> +       const struct landlock_hierarchy *client_walker, *server_walker;
> +
> +       /* This should not happen. */
> +       if (WARN_ON_ONCE(!client))
> +               return;
> +
> +       /* Server has no Landlock domain; nothing to clear. */
> +       if (!server)
> +               return;
> +
> +       /*
> +        * client_layer must be a signed integer with greater capacity
> +        * than client->num_layers to ensure the following loop stops.
> +        */
> +       BUILD_BUG_ON(sizeof(client_layer) > sizeof(client->num_layers));
> +
> +       client_layer = client->num_layers - 1;
> +       client_walker = client->hierarchy;
> +       server_layer = server->num_layers - 1;
> +       server_walker = server->hierarchy;
> +
> +       /*
> +        * Clears the access bits at all layers where the client domain is the
> +        * same as the server domain.  We start the walk at min(client_layer,
> +        * server_layer).  The layer bits until there can not be cleared because
> +        * either the client or the server domain is missing.
> +        */
> +       for (; client_layer > server_layer; client_layer--)
> +               client_walker = client_walker->parent;
> +
> +       for (; server_layer > client_layer; server_layer--)
> +               server_walker = server_walker->parent;
> +
> +       for (; client_layer >= 0; client_layer--) {
> +               if (masks->access[client_layer] & access &&
> +                   client_walker == server_walker)
> +                       masks->access[client_layer] &= ~access;
> +
> +               client_walker = client_walker->parent;
> +               server_walker = server_walker->parent;
> +       }
> +}
> +
> +static int hook_unix_find(const struct path *const path, struct sock *other,
> +                         int flags)
> +{
> +       const struct landlock_ruleset *dom_other;
> +       const struct landlock_cred_security *subject;
> +       struct layer_access_masks layer_masks;
> +       struct landlock_request request = {};
> +       static const struct access_masks fs_resolve_unix = {
> +               .fs = LANDLOCK_ACCESS_FS_RESOLVE_UNIX,
> +       };
> +
> +       /* Lookup for the purpose of saving coredumps is OK. */
> +       if (unlikely(flags & SOCK_COREDUMP))
> +               return 0;
> +
> +       subject = landlock_get_applicable_subject(current_cred(),
> +                                                 fs_resolve_unix, NULL);
> +
> +       if (!subject)
> +               return 0;
> +
> +       /*
> +        * Ignoring return value: that the domains apply was already checked in
> +        * landlock_get_applicable_subject() above.
> +        */
> +       landlock_init_layer_masks(subject->domain, fs_resolve_unix.fs,
> +                                 &layer_masks, LANDLOCK_KEY_INODE);
> +
> +       /* Checks the layers in which we are connecting within the same domain. */
> +       unix_state_lock(other);
> +       if (unlikely(sock_flag(other, SOCK_DEAD) || !other->sk_socket ||
> +                    !other->sk_socket->file)) {

When will the latter two condition be true when !SOCK_DEAD ?

unix_find_bsd() should not find embryo sockets.


> +               unix_state_unlock(other);
> +               /*
> +                * We rely on the caller to catch the (non-reversible) SOCK_DEAD
> +                * condition and retry the lookup.  If we returned an error
> +                * here, the lookup would not get retried.
> +                */
> +               return 0;
> +       }
> +       dom_other = landlock_cred(other->sk_socket->file->f_cred)->domain;
> +
> +       /* Access to the same (or a lower) domain is always allowed. */
> +       unmask_scoped_access(subject->domain, dom_other, &layer_masks,
> +                            fs_resolve_unix.fs);
> +       unix_state_unlock(other);
> +
> +       /* Checks the connections to allow-listed paths. */
> +       if (is_access_to_paths_allowed(subject->domain, path,
> +                                      fs_resolve_unix.fs, &layer_masks,
> +                                      &request, NULL, 0, NULL, NULL, NULL))
> +               return 0;
> +
> +       landlock_log_denial(subject, &request);
> +       return -EACCES;
> +}
> +
>  /* File hooks */
>
>  /**
> @@ -1834,6 +1961,7 @@ static struct security_hook_list landlock_hooks[] __ro_after_init = {
>         LSM_HOOK_INIT(path_unlink, hook_path_unlink),
>         LSM_HOOK_INIT(path_rmdir, hook_path_rmdir),
>         LSM_HOOK_INIT(path_truncate, hook_path_truncate),
> +       LSM_HOOK_INIT(unix_find, hook_unix_find),
>
>         LSM_HOOK_INIT(file_alloc_security, hook_file_alloc_security),
>         LSM_HOOK_INIT(file_open, hook_file_open),
> diff --git a/security/landlock/limits.h b/security/landlock/limits.h
> index eb584f47288d..b454ad73b15e 100644
> --- a/security/landlock/limits.h
> +++ b/security/landlock/limits.h
> @@ -19,7 +19,7 @@
>  #define LANDLOCK_MAX_NUM_LAYERS                16
>  #define LANDLOCK_MAX_NUM_RULES         U32_MAX
>
> -#define LANDLOCK_LAST_ACCESS_FS                LANDLOCK_ACCESS_FS_IOCTL_DEV
> +#define LANDLOCK_LAST_ACCESS_FS                LANDLOCK_ACCESS_FS_RESOLVE_UNIX
>  #define LANDLOCK_MASK_ACCESS_FS                ((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
>  #define LANDLOCK_NUM_ACCESS_FS         __const_hweight64(LANDLOCK_MASK_ACCESS_FS)
>
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index 3b33839b80c7..a6e23657f3ce 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> @@ -166,7 +166,7 @@ static const struct file_operations ruleset_fops = {
>   * If the change involves a fix that requires userspace awareness, also update
>   * the errata documentation in Documentation/userspace-api/landlock.rst .
>   */
> -const int landlock_abi_version = 8;
> +const int landlock_abi_version = 9;
>
>  /**
>   * sys_landlock_create_ruleset - Create a new ruleset
> diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
> index 0fea236ef4bd..30d37234086c 100644
> --- a/tools/testing/selftests/landlock/base_test.c
> +++ b/tools/testing/selftests/landlock/base_test.c
> @@ -76,7 +76,7 @@ TEST(abi_version)
>         const struct landlock_ruleset_attr ruleset_attr = {
>                 .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
>         };
> -       ASSERT_EQ(8, landlock_create_ruleset(NULL, 0,
> +       ASSERT_EQ(9, landlock_create_ruleset(NULL, 0,
>                                              LANDLOCK_CREATE_RULESET_VERSION));
>
>         ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
> diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
> index 968a91c927a4..b318627e7561 100644
> --- a/tools/testing/selftests/landlock/fs_test.c
> +++ b/tools/testing/selftests/landlock/fs_test.c
> @@ -575,9 +575,10 @@ TEST_F_FORK(layout1, inval)
>         LANDLOCK_ACCESS_FS_WRITE_FILE | \
>         LANDLOCK_ACCESS_FS_READ_FILE | \
>         LANDLOCK_ACCESS_FS_TRUNCATE | \
> -       LANDLOCK_ACCESS_FS_IOCTL_DEV)
> +       LANDLOCK_ACCESS_FS_IOCTL_DEV | \
> +       LANDLOCK_ACCESS_FS_RESOLVE_UNIX)
>
> -#define ACCESS_LAST LANDLOCK_ACCESS_FS_IOCTL_DEV
> +#define ACCESS_LAST LANDLOCK_ACCESS_FS_RESOLVE_UNIX
>
>  #define ACCESS_ALL ( \
>         ACCESS_FILE | \
> --
> 2.53.0
>

^ permalink raw reply

* Re: LSM namespacing API
From: Dr. Greg @ 2026-04-02 10:59 UTC (permalink / raw)
  To: Paul Moore
  Cc: Stephen Smalley, Ondrej Mosnacek, linux-security-module, selinux,
	John Johansen
In-Reply-To: <CAHC9VhR1R7TcC2a2wZ9-G8dmXTuhcDK1YedDduq0sFgPC8QxFw@mail.gmail.com>

On Sun, Mar 29, 2026 at 08:56:37PM -0400, Paul Moore wrote:

Good morning, hopefully the week is going well for everyone.

> On Sun, Mar 29, 2026 at 12:09???PM Dr. Greg <greg@enjellic.com> wrote:
> > On Tue, Mar 24, 2026 at 05:31:09PM -0400, Paul Moore wrote:
> > > On Tue, Mar 3, 2026 at 11:46???AM Paul Moore <paul@paul-moore.com> wrote:
> > > >
> > > > I'd really like to hear from some of the other LSMs before we start
> > > > diving into the code.  It may sound funny, but from my perspective
> > > > doing the work to get the API definition "right" is far more important
> > > > than implementing it.
> >
> > > It's been three weeks now, and I haven't seen any strong arguments for
> > > supporting the clone() API at this time, so we can leave that out for
> > > now and stick with just the unshare() API for an initial attempt.  We
> > > can always add a clone() API at a later date if needed; going small
> > > and expanding over time is usually a better decision anyway.
> > >
> > > So to quickly summarize, here is where I think the discussion landed:
> > >
> > > * Implement the lsm_unshare() syscall
> > >
> > > I expect it would look something like 'lsm_unshare(struct lsm_ctx
> > > *ctx, u32 size, u32 flags)' with @ctx specifying the particular LSM
> > > being unshared, and @flags being 0/unused at this point in time
> > > (unless we can think of something we want to specify here).  Like
> > > lsm_set_self_attr(), only one @ctx can be specified at a time, so you
> > > can only unshare one LSM at a time.
> >
> > Unless we miss something, it would seem that there needs to be
> > additional thought as to how a process moves, atomically, from one
> > effective security configuration to the next.
> >
> > At a minimum, if we restrict ourselves to the model of simply changing
> > the namespace for a single LSM, there would seem to be a need to have
> > a 2-step process in order to atomically transition from one security
> > model/policy to the next.

> That depends on the individual LSMs, they are free to interpret the
> unshare request and handle it however they like.

No argument there.

An LSM will obviously need to allocate an LSM namespace specific
security 'blob' in order to hold the security context for the new
namespace.

Christian had proposed patches for a generic mechanism to create
LSM security namespace blobs, is implementation of that in scope for
this effort?

> > The interim between the first and second steps would allow an
> > orchestrator to configure the new namespace and load new namespace
> > specific policy into the security namespace ...

> As discussed previously, the LSM policy load syscalls might include
> some LSM namespace options. However, I first want to focus on
> finalizing the most basic namespace API, which on Linux is arguably
> the unshare() syscall concept.

Unfortunately, without considering all the implications and
requirements of various LSM's we may end up with lsm_share2() and
beyond.

See below.

> > It would seem that the flags variable might be a good option to use to
> > handle this 2-stage transition, for example LSM_NS_INIT and
> > LSM_NS_CHANGE, respectively, to specify the initialization and
> > execution phases of the transition.

> No.  The lsm_unshare() syscall is intended to mimic the existing
> unshare() syscall as a single step process from a user's
> perspective.  If it returns successfully the caller will be in a new
> LSM namespace as defined by the individual LSM specified in the
> syscall.

OK, we can reason forward with that paradigm.

An orchestrator issues the unshare call for an LSM namespace and upon
return from the system call the calling task is in a new namespace for
that particular LSM, the goal of which is presumably to implement a
security policy/model different than what had been in force
previously.

So the process is in a new LSM specific namespace, but still
implementing the policy from the previous namespace, until the
orchestrator can load the new policy and then trigger the LSM to
change from its previous policy to the newly loaded policy.

Is this consistent with your vision as to how all of this will work?

> > The other unanswered issue, or perhaps we missed it, are the security
> > controls that should be associated with the unshare call.

> Each LSM is free to implement whatever access controls it deems
> necessary in its lsm_unshare() callback.

Just to be clear.

When you refer to 'lsm_unshare() callback' are you referring to a new
LSM security hook to be be implemented that will allow all of the
active LSM's to pass judgement on whether or not the unshare should be
allowed to complete successfully?

See below.

> > Will there be a new LSM hook that allows other LSM's to veto the
> > creation of a namespace either for itself or for another LSM?
> 
> I would expect the lsm_unshare() syscall to operate similarly to the
> lsm_set_self_attr() syscall in this regard.

The reference to handling this like lsm_set_self_attr() is unclear.

With lsm_set_self_attr() there is no reason for another LSM to deny
setting what is an LSM specific attribute, as you note above, each LSM
gets to decide if the request to set an attribute for the LSM should
be accepted or denied.

Since lsm_unshare() is changing the overall platform security state,
it seems consistent with the design of the LSM for other LSM's to be
able to veto this action.

Once again, this seems like an action that would be consistent with
the notion of the lockdown LSM,

> > Is there a need to have yet another kernel command-line parameter that
> > would completely deny the ability to create security namespaces?

> No, at least not at this point in time.

This would seem to reinforce issues in the previous discussion.

Given that distributions are 'kitchen sink' implementations it would
seem desirable that system security architects would want to use a
lockdown option to insure that the platform security configuration
cannot be changed.

> Individual LSMs can decide how they want to gate their own namespace
> functionality, if they implement namespaces at all.
> 
> > Is CAP_MAC_ADMIN appropriate as the required capability to create a
> > new namespace or does there need to be, for security rigor, a specific
> > capability (CAP_LSM_NS?) that gates the ability to execute whatever
> > form of the system call is adopted?

> Once again, this is up to the individual LSMs, not the framework
> layer.

Fair enough.

That still leaves the question of whether or not CAP_MAC_ADMIN is
appropriate for gating the creation of a new security namespace.

> > Should there be an option to completely compile LSM namespaces out of
> > the kernel?

> That doesn't belong in the LSM framework layer, that is up to the
> individual LSMs.

You noted above the desire for lsm_unshare to be consistent with other
namespaces.

The current kernel paradigm is to allow classes of namespace
resources, ie. CONFIG_UTS_NS, CONFIG_TIME_NS et.al., to be compiled in
our out of the kernel.

It seems that CONFIG_LSM_NS would be consistent with that model.

> > > * Implement /proc/pid/ns/lsm and setns(CLONE_NEWLSM)
> > >
> > > As discussed previously, this allows us to move a process into an
> > > existing, established LSM namespace set.  The caller cannot
> > > selectively choose which individual LSM namespaces they join from the
> > > given LSM namespace set, they receive the same LSM namespace
> > > configuration as the target process.
> >
> > As an initial aside.  It would be assumed that a positive result of a
> > setns call would be to cause the calling process to atomically change
> > its security namespace set.  This would further suggest the need to
> > have the security namespace creation process also execute atomically
> > in a multi-LSM namespace change environment.

> In the setns case no new LSM namespaces should be created, the process
> simply joins an existing set of LSM namespaces.

The issue isn't about new namespaces being created, the issue is
atomicity of a change to a new set of security policies.

With setns an atomic transition is implemented.

The proposed lsm_unshare() behavior results in a period of time when
multiple and varying security policies are active, depending on
various race issues in the orchestrator implementation.

This opens the door to a raft of potential security issues that we can
have a new acronym for, Time Of Implementation Time Of Use (TOITOU).

> > ... That is the concept of whether or not a setns
> > call, for any resource namespace, should also force a security
> > namespace change if the security namespace of the calling process
> > differs from that of the target process.

> That decision is left to the individual LSMs.

That is reasonable.

In order to support that model, there would seem to be a need to have
a new LSM call in the setns code that allows LSM's to determine
whether or not a change in the active security namespace set should be
forced, correct?

If so, is implementation of this in scope for the lsm_unshare()
infrastructure?

To close, at the risk of being the devils advocate.

Given that the sentiment is to force almost all of these
issues/decisions into the individual LSM's, what is the advantage of
having a common lsm_unshare() system call?

In the proposed model, a resource orchestrator is going to need to
have extensive knowledge over the mechanics of all the LSM's that
implement namespace functionality.  At a very minimum, intrinsic to
the concept of security namespaces, there will be a need to load a new
policy or model into the namespace, an action that will be deeply LSM
specific.

At this point, the only common functionality may be the allocation of
a new LSM namespace 'blob'.  An argument for not doing that in
lsm_unshare() is that it precludes the ability of an orchestrator to
implement an atomic policy change, as that would require an
orchestrator to somehow load a policy/model before lsm_unshare() is
called, which in turn would require a new security context to be
allocated prior to the unshare operation.

All of this tends to be an issue with integrity or measurement based
namespaces, which are important with respect to supporting
confidential computing initiatives.  Without two stage namespace
transition, you stumble into subtle problems associated with
'Heisenberg dilemma' issues.

> paul-moore.com

Hopefully all of this will assist in defining the requirements for all
of this.

Have a good remainder of the week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
              https://github.com/Quixote-Project

^ permalink raw reply

* Re: [PATCH v8 04/12] landlock: Control pathname UNIX domain socket resolution by path
From: Sebastian Andrzej Siewior @ 2026-04-02  9:51 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, John Johansen, Tingmao Wang,
	Justin Suess, Kuniyuki Iwashima, Jann Horn, linux-security-module,
	Samasth Norway Ananda, Matthieu Buffet, Mikhail Ivanov,
	konstantin.meskhidze, Demi Marie Obenour, Alyssa Ross,
	Tahera Fahimi, Georgia Garcia
In-Reply-To: <20260327164838.38231-5-gnoack3000@gmail.com>

On 2026-03-27 17:48:29 [+0100], Günther Noack wrote:
> * Add a new access right LANDLOCK_ACCESS_FS_RESOLVE_UNIX, which
>   controls the lookup operations for named UNIX domain sockets.  The
>   resolution happens during connect() and sendmsg() (depending on
>   socket type).
…
> Cc: Tingmao Wang <m@maowtm.org>
> Cc: Justin Suess <utilityemal77@gmail.com>
> Cc: Mickaël Salaün <mic@digikod.net>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Kuniyuki Iwashima <kuniyu@google.com>
> Suggested-by: Jann Horn <jannh@google.com>
> Link[1]: https://github.com/landlock-lsm/linux/issues/36
> Link[2]: https://lore.kernel.org/all/20260205.8531e4005118@gnoack.org/
> Signed-off-by: Günther Noack <gnoack3000@gmail.com>

The unix bits look okay to me,

Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian

^ permalink raw reply

* Re: [PATCH v6.1] apparmor: fix unprivileged local user can do privileged policy management
From: Keerthana Kalyanasundaram @ 2026-04-02  8:03 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, john.johansen, paul, jmorris, serge, georgia.garcia,
	cengiz.can, sashal, apparmor, linux-security-module, linux-kernel,
	ajay.kaher, alexey.makhalov, vamsi-krishna.brahmajosyula,
	yin.ding, tapas.kundu, Qualys Security Advisory,
	Salvatore Bonaccorso
In-Reply-To: <2026040249-fable-sasquatch-4864@gregkh>


[-- Attachment #1.1: Type: text/plain, Size: 942 bytes --]

On Thu, Apr 2, 2026 at 11:31 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> On Thu, Apr 02, 2026 at 05:47:00AM +0000, Keerthana K wrote:
> > From: John Johansen <john.johansen@canonical.com>
> >
> > commit 6601e13e82841879406bf9f369032656f441a425 upstream.
>
> <snip>
>
> Does your group/company/whatever actually use apparmor?  If so, this
> isn't the only commit that needs to be backported.  I'm waiting on a
> "correct" set of 6.1.y patches from John before applying all of them to
> 6.1.y and then I can take the patch series that he gave me for 5.10.y
> and 5.15.y and will queue them up.
>
> So thanks for this backport, but it's not going to help resolve all of
> the recent fixes that went in as part of this series by just applying
> one of them.
>
> Thanks for the update, Greg. We will wait for John to queue and apply the
complete series of patches to the stable branches.

 thanks,
>
> greg k-h
>

[-- Attachment #1.2: Type: text/html, Size: 1696 bytes --]

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5459 bytes --]

^ permalink raw reply

* Re: [PATCH v6.1] apparmor: fix unprivileged local user can do privileged policy management
From: Greg KH @ 2026-04-02  6:01 UTC (permalink / raw)
  To: Keerthana K
  Cc: stable, john.johansen, paul, jmorris, serge, georgia.garcia,
	cengiz.can, sashal, apparmor, linux-security-module, linux-kernel,
	ajay.kaher, alexey.makhalov, vamsi-krishna.brahmajosyula,
	yin.ding, tapas.kundu, Qualys Security Advisory,
	Salvatore Bonaccorso
In-Reply-To: <20260402054700.2798707-1-keerthana.kalyanasundaram@broadcom.com>

On Thu, Apr 02, 2026 at 05:47:00AM +0000, Keerthana K wrote:
> From: John Johansen <john.johansen@canonical.com>
> 
> commit 6601e13e82841879406bf9f369032656f441a425 upstream.

<snip>

Does your group/company/whatever actually use apparmor?  If so, this
isn't the only commit that needs to be backported.  I'm waiting on a
"correct" set of 6.1.y patches from John before applying all of them to
6.1.y and then I can take the patch series that he gave me for 5.10.y
and 5.15.y and will queue them up.

So thanks for this backport, but it's not going to help resolve all of
the recent fixes that went in as part of this series by just applying
one of them.

thanks,

greg k-h

^ permalink raw reply

* [PATCH v5.10-v5.15] apparmor: fix unprivileged local user can do privileged policy management
From: Keerthana K @ 2026-04-02  5:47 UTC (permalink / raw)
  To: stable, gregkh
  Cc: john.johansen, paul, jmorris, serge, georgia.garcia, cengiz.can,
	sashal, apparmor, linux-security-module, linux-kernel, ajay.kaher,
	alexey.makhalov, vamsi-krishna.brahmajosyula, yin.ding,
	tapas.kundu, Qualys Security Advisory, Salvatore Bonaccorso,
	Keerthana K

From: John Johansen <john.johansen@canonical.com>

commit 6601e13e82841879406bf9f369032656f441a425 upstream.

An unprivileged local user can load, replace, and remove profiles by
opening the apparmorfs interfaces, via a confused deputy attack, by
passing the opened fd to a privileged process, and getting the
privileged process to write to the interface.

This does require a privileged target that can be manipulated to do
the write for the unprivileged process, but once such access is
achieved full policy management is possible and all the possible
implications that implies: removing confinement, DoS of system or
target applications by denying all execution, by-passing the
unprivileged user namespace restriction, to exploiting kernel bugs for
a local privilege escalation.

The policy management interface can not have its permissions simply
changed from 0666 to 0600 because non-root processes need to be able
to load policy to different policy namespaces.

Instead ensure the task writing the interface has privileges that
are a subset of the task that opened the interface. This is already
done via policy for confined processes, but unconfined can delegate
access to the opened fd, by-passing the usual policy check.

Fixes: b7fd2c0340eac ("apparmor: add per policy ns .load, .replace, .remove interface files")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Keerthana: aa_may_manage_policy() does not take a subj_cred
parameter (added in 90c436a64a6e, merged in v6.7). Pass current_cred()
directly to is_subset_of_obj_privilege() in place of subj_cred, which
is equivalent since all call sites pass current_cred() as subj_cred.]
Signed-off-by: Keerthana K <keerthana.kalyanasundaram@broadcom.com>
---
 security/apparmor/apparmorfs.c     | 16 ++++++++------
 security/apparmor/include/policy.h |  2 +-
 security/apparmor/policy.c         | 35 +++++++++++++++++++++++++++++-
 3 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index e736936f4f0b..3053e5731b02 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -409,7 +409,8 @@ static struct aa_loaddata *aa_simple_write_to_buffer(const char __user *userbuf,
 }
 
 static ssize_t policy_update(u32 mask, const char __user *buf, size_t size,
-			     loff_t *pos, struct aa_ns *ns)
+			     loff_t *pos, struct aa_ns *ns,
+			     const struct cred *ocred)
 {
 	struct aa_loaddata *data;
 	struct aa_label *label;
@@ -420,7 +421,7 @@ static ssize_t policy_update(u32 mask, const char __user *buf, size_t size,
 	/* high level check about policy management - fine grained in
 	 * below after unpack
 	 */
-	error = aa_may_manage_policy(label, ns, mask);
+	error = aa_may_manage_policy(label, ns, ocred, mask);
 	if (error)
 		goto end_section;
 
@@ -441,7 +442,8 @@ static ssize_t profile_load(struct file *f, const char __user *buf, size_t size,
 			    loff_t *pos)
 {
 	struct aa_ns *ns = aa_get_ns(f->f_inode->i_private);
-	int error = policy_update(AA_MAY_LOAD_POLICY, buf, size, pos, ns);
+	int error = policy_update(AA_MAY_LOAD_POLICY, buf, size, pos, ns,
+				  f->f_cred);
 
 	aa_put_ns(ns);
 
@@ -459,7 +461,7 @@ static ssize_t profile_replace(struct file *f, const char __user *buf,
 {
 	struct aa_ns *ns = aa_get_ns(f->f_inode->i_private);
 	int error = policy_update(AA_MAY_LOAD_POLICY | AA_MAY_REPLACE_POLICY,
-				  buf, size, pos, ns);
+				  buf, size, pos, ns, f->f_cred);
 	aa_put_ns(ns);
 
 	return error;
@@ -483,7 +485,7 @@ static ssize_t profile_remove(struct file *f, const char __user *buf,
 	/* high level check about policy management - fine grained in
 	 * below after unpack
 	 */
-	error = aa_may_manage_policy(label, ns, AA_MAY_REMOVE_POLICY);
+	error = aa_may_manage_policy(label, ns, f->f_cred, AA_MAY_REMOVE_POLICY);
 	if (error)
 		goto out;
 
@@ -1796,7 +1798,7 @@ static int ns_mkdir_op(struct inode *dir, struct dentry *dentry, umode_t mode)
 	int error;
 
 	label = begin_current_label_crit_section();
-	error = aa_may_manage_policy(label, NULL, AA_MAY_LOAD_POLICY);
+	error = aa_may_manage_policy(label, NULL, NULL, AA_MAY_LOAD_POLICY);
 	end_current_label_crit_section(label);
 	if (error)
 		return error;
@@ -1845,7 +1847,7 @@ static int ns_rmdir_op(struct inode *dir, struct dentry *dentry)
 	int error;
 
 	label = begin_current_label_crit_section();
-	error = aa_may_manage_policy(label, NULL, AA_MAY_LOAD_POLICY);
+	error = aa_may_manage_policy(label, NULL, NULL, AA_MAY_LOAD_POLICY);
 	end_current_label_crit_section(label);
 	if (error)
 		return error;
diff --git a/security/apparmor/include/policy.h b/security/apparmor/include/policy.h
index b5aa4231af68..f6682a31df23 100644
--- a/security/apparmor/include/policy.h
+++ b/security/apparmor/include/policy.h
@@ -304,6 +304,6 @@ static inline int AUDIT_MODE(struct aa_profile *profile)
 bool policy_view_capable(struct aa_ns *ns);
 bool policy_admin_capable(struct aa_ns *ns);
 int aa_may_manage_policy(struct aa_label *label, struct aa_ns *ns,
-			 u32 mask);
+			 const struct cred *ocred, u32 mask);
 
 #endif /* __AA_POLICY_H */
diff --git a/security/apparmor/policy.c b/security/apparmor/policy.c
index e59bdb750ef0..f2bc865bc7b6 100644
--- a/security/apparmor/policy.c
+++ b/security/apparmor/policy.c
@@ -671,14 +671,42 @@ bool policy_admin_capable(struct aa_ns *ns)
 	return policy_view_capable(ns) && capable && !aa_g_lock_policy;
 }
 
+static bool is_subset_of_obj_privilege(const struct cred *cred,
+				       struct aa_label *label,
+				       const struct cred *ocred)
+{
+	if (cred == ocred)
+		return true;
+
+	if (!aa_label_is_subset(label, cred_label(ocred)))
+		return false;
+	/* don't allow crossing userns for now */
+	if (cred->user_ns != ocred->user_ns)
+		return false;
+	if (!cap_issubset(cred->cap_inheritable, ocred->cap_inheritable))
+		return false;
+	if (!cap_issubset(cred->cap_permitted, ocred->cap_permitted))
+		return false;
+	if (!cap_issubset(cred->cap_effective, ocred->cap_effective))
+		return false;
+	if (!cap_issubset(cred->cap_bset, ocred->cap_bset))
+		return false;
+	if (!cap_issubset(cred->cap_ambient, ocred->cap_ambient))
+		return false;
+	return true;
+}
+
+
 /**
  * aa_may_manage_policy - can the current task manage policy
  * @label: label to check if it can manage policy
+ * @ocred: object cred if request is coming from an open object
  * @op: the policy manipulation operation being done
  *
  * Returns: 0 if the task is allowed to manipulate policy else error
  */
-int aa_may_manage_policy(struct aa_label *label, struct aa_ns *ns, u32 mask)
+int aa_may_manage_policy(struct aa_label *label, struct aa_ns *ns,
+			 const struct cred *ocred, u32 mask)
 {
 	const char *op;
 
@@ -694,6 +722,11 @@ int aa_may_manage_policy(struct aa_label *label, struct aa_ns *ns, u32 mask)
 		return audit_policy(label, op, NULL, NULL, "policy_locked",
 				    -EACCES);
 
+	if (ocred && !is_subset_of_obj_privilege(current_cred(), label, ocred))
+		return audit_policy(label, op, NULL, NULL,
+				    "not privileged for target profile",
+				    -EACCES);
+
 	if (!policy_admin_capable(ns))
 		return audit_policy(label, op, NULL, NULL, "not policy admin",
 				    -EACCES);
-- 
2.43.7


^ permalink raw reply related

* [PATCH v6.1] apparmor: fix unprivileged local user can do privileged policy management
From: Keerthana K @ 2026-04-02  5:47 UTC (permalink / raw)
  To: stable, gregkh
  Cc: john.johansen, paul, jmorris, serge, georgia.garcia, cengiz.can,
	sashal, apparmor, linux-security-module, linux-kernel, ajay.kaher,
	alexey.makhalov, vamsi-krishna.brahmajosyula, yin.ding,
	tapas.kundu, Qualys Security Advisory, Salvatore Bonaccorso,
	Keerthana K

From: John Johansen <john.johansen@canonical.com>

commit 6601e13e82841879406bf9f369032656f441a425 upstream.

An unprivileged local user can load, replace, and remove profiles by
opening the apparmorfs interfaces, via a confused deputy attack, by
passing the opened fd to a privileged process, and getting the
privileged process to write to the interface.

This does require a privileged target that can be manipulated to do
the write for the unprivileged process, but once such access is
achieved full policy management is possible and all the possible
implications that implies: removing confinement, DoS of system or
target applications by denying all execution, by-passing the
unprivileged user namespace restriction, to exploiting kernel bugs for
a local privilege escalation.

The policy management interface can not have its permissions simply
changed from 0666 to 0600 because non-root processes need to be able
to load policy to different policy namespaces.

Instead ensure the task writing the interface has privileges that
are a subset of the task that opened the interface. This is already
done via policy for confined processes, but unconfined can delegate
access to the opened fd, by-passing the usual policy check.

Fixes: b7fd2c0340eac ("apparmor: add per policy ns .load, .replace, .remove interface files")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Keerthana: aa_may_manage_policy() does not take a subj_cred
parameter (added in 90c436a64a6e, merged in v6.7). Pass current_cred()
directly to is_subset_of_obj_privilege() in place of subj_cred, which
is equivalent since all call sites pass current_cred() as subj_cred.]
Signed-off-by: Keerthana K <keerthana.kalyanasundaram@broadcom.com>
---
 security/apparmor/apparmorfs.c     | 16 ++++++++------
 security/apparmor/include/policy.h |  2 +-
 security/apparmor/policy.c         | 35 +++++++++++++++++++++++++++++-
 3 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index fa518cd82366..fa4a6f20f58e 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -412,7 +412,8 @@ static struct aa_loaddata *aa_simple_write_to_buffer(const char __user *userbuf,
 }
 
 static ssize_t policy_update(u32 mask, const char __user *buf, size_t size,
-			     loff_t *pos, struct aa_ns *ns)
+			     loff_t *pos, struct aa_ns *ns,
+			     const struct cred *ocred)
 {
 	struct aa_loaddata *data;
 	struct aa_label *label;
@@ -423,7 +424,7 @@ static ssize_t policy_update(u32 mask, const char __user *buf, size_t size,
 	/* high level check about policy management - fine grained in
 	 * below after unpack
 	 */
-	error = aa_may_manage_policy(label, ns, mask);
+	error = aa_may_manage_policy(label, ns, ocred, mask);
 	if (error)
 		goto end_section;
 
@@ -444,7 +445,8 @@ static ssize_t profile_load(struct file *f, const char __user *buf, size_t size,
 			    loff_t *pos)
 {
 	struct aa_ns *ns = aa_get_ns(f->f_inode->i_private);
-	int error = policy_update(AA_MAY_LOAD_POLICY, buf, size, pos, ns);
+	int error = policy_update(AA_MAY_LOAD_POLICY, buf, size, pos, ns,
+				  f->f_cred);
 
 	aa_put_ns(ns);
 
@@ -462,7 +464,7 @@ static ssize_t profile_replace(struct file *f, const char __user *buf,
 {
 	struct aa_ns *ns = aa_get_ns(f->f_inode->i_private);
 	int error = policy_update(AA_MAY_LOAD_POLICY | AA_MAY_REPLACE_POLICY,
-				  buf, size, pos, ns);
+				  buf, size, pos, ns, f->f_cred);
 	aa_put_ns(ns);
 
 	return error;
@@ -486,7 +488,7 @@ static ssize_t profile_remove(struct file *f, const char __user *buf,
 	/* high level check about policy management - fine grained in
 	 * below after unpack
 	 */
-	error = aa_may_manage_policy(label, ns, AA_MAY_REMOVE_POLICY);
+	error = aa_may_manage_policy(label, ns, f->f_cred, AA_MAY_REMOVE_POLICY);
 	if (error)
 		goto out;
 
@@ -1808,7 +1810,7 @@ static int ns_mkdir_op(struct user_namespace *mnt_userns, struct inode *dir,
 	int error;
 
 	label = begin_current_label_crit_section();
-	error = aa_may_manage_policy(label, NULL, AA_MAY_LOAD_POLICY);
+	error = aa_may_manage_policy(label, NULL, NULL, AA_MAY_LOAD_POLICY);
 	end_current_label_crit_section(label);
 	if (error)
 		return error;
@@ -1857,7 +1859,7 @@ static int ns_rmdir_op(struct inode *dir, struct dentry *dentry)
 	int error;
 
 	label = begin_current_label_crit_section();
-	error = aa_may_manage_policy(label, NULL, AA_MAY_LOAD_POLICY);
+	error = aa_may_manage_policy(label, NULL, NULL, AA_MAY_LOAD_POLICY);
 	end_current_label_crit_section(label);
 	if (error)
 		return error;
diff --git a/security/apparmor/include/policy.h b/security/apparmor/include/policy.h
index 639b5b248e63..3f776f5e8de4 100644
--- a/security/apparmor/include/policy.h
+++ b/security/apparmor/include/policy.h
@@ -308,7 +308,7 @@ static inline int AUDIT_MODE(struct aa_profile *profile)
 bool aa_policy_view_capable(struct aa_label *label, struct aa_ns *ns);
 bool aa_policy_admin_capable(struct aa_label *label, struct aa_ns *ns);
 int aa_may_manage_policy(struct aa_label *label, struct aa_ns *ns,
-			 u32 mask);
+			 const struct cred *ocred, u32 mask);
 bool aa_current_policy_view_capable(struct aa_ns *ns);
 bool aa_current_policy_admin_capable(struct aa_ns *ns);
 
diff --git a/security/apparmor/policy.c b/security/apparmor/policy.c
index 4ee5a450d118..e7412a221551 100644
--- a/security/apparmor/policy.c
+++ b/security/apparmor/policy.c
@@ -712,14 +712,42 @@ bool aa_current_policy_admin_capable(struct aa_ns *ns)
 	return res;
 }
 
+static bool is_subset_of_obj_privilege(const struct cred *cred,
+				       struct aa_label *label,
+				       const struct cred *ocred)
+{
+	if (cred == ocred)
+		return true;
+
+	if (!aa_label_is_subset(label, cred_label(ocred)))
+		return false;
+	/* don't allow crossing userns for now */
+	if (cred->user_ns != ocred->user_ns)
+		return false;
+	if (!cap_issubset(cred->cap_inheritable, ocred->cap_inheritable))
+		return false;
+	if (!cap_issubset(cred->cap_permitted, ocred->cap_permitted))
+		return false;
+	if (!cap_issubset(cred->cap_effective, ocred->cap_effective))
+		return false;
+	if (!cap_issubset(cred->cap_bset, ocred->cap_bset))
+		return false;
+	if (!cap_issubset(cred->cap_ambient, ocred->cap_ambient))
+		return false;
+	return true;
+}
+
+
 /**
  * aa_may_manage_policy - can the current task manage policy
  * @label: label to check if it can manage policy
+ * @ocred: object cred if request is coming from an open object
  * @op: the policy manipulation operation being done
  *
  * Returns: 0 if the task is allowed to manipulate policy else error
  */
-int aa_may_manage_policy(struct aa_label *label, struct aa_ns *ns, u32 mask)
+int aa_may_manage_policy(struct aa_label *label, struct aa_ns *ns,
+			 const struct cred *ocred, u32 mask)
 {
 	const char *op;
 
@@ -735,6 +763,11 @@ int aa_may_manage_policy(struct aa_label *label, struct aa_ns *ns, u32 mask)
 		return audit_policy(label, op, NULL, NULL, "policy_locked",
 				    -EACCES);
 
+	if (ocred && !is_subset_of_obj_privilege(current_cred(), label, ocred))
+		return audit_policy(label, op, NULL, NULL,
+				    "not privileged for target profile",
+				    -EACCES);
+
 	if (!aa_policy_admin_capable(label, ns))
 		return audit_policy(label, op, NULL, NULL, "not policy admin",
 				    -EACCES);
-- 
2.43.7


^ permalink raw reply related

* Re: [RFC PATCH v1 02/11] security: Add LSM_AUDIT_DATA_NS for namespace audit records
From: Mickaël Salaün @ 2026-04-01 18:48 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Günther Noack, Paul Moore, Serge E . Hallyn, Justin Suess,
	Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
	Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-kernel, linux-security-module
In-Reply-To: <20260401.XohYeim7tah8@digikod.net>

On Wed, Apr 01, 2026 at 06:38:34PM +0200, Mickaël Salaün wrote:
> On Wed, Mar 25, 2026 at 01:32:42PM +0100, Christian Brauner wrote:
> > On Thu, Mar 12, 2026 at 11:04:35AM +0100, Mickaël Salaün wrote:
> > > Add a new LSM audit data type LSM_AUDIT_DATA_NS that logs namespace
> > > information in audit records.  Two fields are provided, matching the
> > > field names of struct ns_common:
> > > 
> > > - ns_type: the CLONE_NEW* flag identifying the namespace type, logged in
> > >   hexadecimal.
> > > 
> > > - inum: the proc inode number identifying a specific namespace instance.
> > >   Namespace inode numbers are allocated by proc_alloc_inum() via
> > >   ida_alloc_max() bounded to UINT_MAX, so the value always fits in 32
> > >   bits.
> > > 
> > > A new audit data type is needed because no existing LSM_AUDIT_DATA_*
> > > type carries namespace information.  The closest alternatives (e.g.
> > > LSM_AUDIT_DATA_TASK or LSM_AUDIT_DATA_NONE with custom strings) would
> > > either lose the namespace type or require ad-hoc formatting that
> > > bypasses the structured audit data union.
> > > 
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Günther Noack <gnoack@google.com>
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > ---
> > >  include/linux/lsm_audit.h | 5 +++++
> > >  security/lsm_audit.c      | 4 ++++
> > >  2 files changed, 9 insertions(+)
> > > 
> > > diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
> > > index 382c56a97bba..6e20a56b8c22 100644
> > > --- a/include/linux/lsm_audit.h
> > > +++ b/include/linux/lsm_audit.h
> > > @@ -78,6 +78,7 @@ struct common_audit_data {
> > >  #define LSM_AUDIT_DATA_NOTIFICATION 16
> > >  #define LSM_AUDIT_DATA_ANONINODE	17
> > >  #define LSM_AUDIT_DATA_NLMSGTYPE	18
> > > +#define LSM_AUDIT_DATA_NS		19
> > >  	union 	{
> > >  		struct path path;
> > >  		struct dentry *dentry;
> > > @@ -100,6 +101,10 @@ struct common_audit_data {
> > >  		int reason;
> > >  		const char *anonclass;
> > >  		u16 nlmsg_type;
> > > +		struct {
> > > +			u32 ns_type;
> > > +			unsigned int inum;
> > 
> > fwiw, you might want to start the 64-bit namespace id as well.
> > But either way:
> 
> Right now these numbers are generated by ida_alloc_max(), which return
> an int.  Is there an ongoing patch series for this change?

OK, we should not use the inode's number (32-bit) but the namespace ID
(64-bit) which is readable with the NS_GET_ID IOCTL on the namespace
FDs.  I'll use that with ns_id instead of inum.  I'll also update the
Landlock code and tests accordingly.

^ permalink raw reply

* Re: [PATCH v8 03/12] landlock: Replace union access_masks_all with helper functions
From: Mickaël Salaün @ 2026-04-01 17:57 UTC (permalink / raw)
  To: Günther Noack
  Cc: John Johansen, kernel test robot, linux-security-module,
	Tingmao Wang, Justin Suess, Samasth Norway Ananda,
	Matthieu Buffet, Mikhail Ivanov, konstantin.meskhidze,
	Demi Marie Obenour, Alyssa Ross, Jann Horn, Tahera Fahimi,
	Sebastian Andrzej Siewior, Kuniyuki Iwashima, Georgia Garcia
In-Reply-To: <20260330.6229e57c9563@gnoack.org>

On Mon, Mar 30, 2026 at 09:00:31PM +0200, Günther Noack wrote:
> On Mon, Mar 30, 2026 at 12:53:21PM +0200, Mickaël Salaün wrote:
> > On Mon, Mar 30, 2026 at 11:56:40AM +0200, Mickaël Salaün wrote:
> > > On Fri, Mar 27, 2026 at 05:48:28PM +0100, Günther Noack wrote:
> > > > * Stop using a union for access_masks_all.
> > > > * Expose helper functions for intersection checks and union operations.
> > > > 
> > > > The memory layout of bitfields is only loosely defined by the C
> > > > standard, so our static assertion that expects a fixed size was
> > > > brittle, and it broke on some compilers when we attempted to add a
> > > > 17th file system access right.
> > > > 
> > > > Reported-by: kernel test robot <lkp@intel.com>
> > > > Closes: https://lore.kernel.org/oe-kbuild-all/202603261438.jBx2DGNe-lkp@intel.com/
> > > > Signed-off-by: Günther Noack <gnoack3000@gmail.com>
> > > > ---
> > > >  security/landlock/access.h  | 21 ++++++++++++++-------
> > > >  security/landlock/cred.h    | 10 ++--------
> > > >  security/landlock/ruleset.h | 13 ++++---------
> > > >  3 files changed, 20 insertions(+), 24 deletions(-)
> > > 
> > > I'd prefer this approach:
> > > 
> > > diff --git a/security/landlock/access.h b/security/landlock/access.h
> > > index 89dc8e7b93da..bc9efbb5c900 100644
> > > --- a/security/landlock/access.h
> > > +++ b/security/landlock/access.h
> > > @@ -50,7 +50,7 @@ struct access_masks {
> > >         access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
> > >         access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
> > >         access_mask_t scope : LANDLOCK_NUM_SCOPE;
> > > -};
> > > +} __packed;
> > 
> > Actually, we can just use '__packed __aligned(sizeof(u32))' and avoid
> > the static_assert change.  That would have no impact on x86, but pack it
> > on m68k.
> 
> Thanks, good catch (and thanks for pushing it to mic-next).
> Fingers crossed that this works on m68k.

So, this works!  I did some experiments with m68k and this architecture
is very special: it packs bitfields at byte granularity, not at
storage-unit granularity, except when the size of a bitfield is a
multiple of 8, in which case it aligns on this size.

I also look at the past versions of Landlock (in the stable branches),
and they are good because struct access_masks (and the related assert)
was introduced in v6.11 and fs was exactly 16 bits, which makes m68k
aligns on 2 bytes and then the size of the struct was 4 bytes.
Switching fs to 17 bits removes this optimization (I guess) and pack
(back) to 3 bytes, so recording more bits can take less space!

That's why we don't need a standalone fix to backport...

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox