Linux Trace Kernel
* [RESEND PATCH v16 0/5] ring-buffer: Making persistent ring buffers robust
From: Masami Hiramatsu (Google) @ 2026-04-07  1:12 UTC (permalink / raw)
  To: Steven Rostedt, Catalin Marinas, Will Deacon
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel, Ian Rogers, linux-arm-kernel

[Resend this series with base-commit tag so that bot can apply this correctly]

Hi,

Here is the 16th version of improvement patches for making persistent
ring buffers robust to failures.
The previous version is here:

https://lore.kernel.org/all/177494615421.71933.3679132057004156013.stgit@mhiramat.tok.corp.google.com/

This version adds Catalin's Ack to [1/5] and updates the description and
documentation in [4/5] and [5/5]. It is also rebased on ring-buffer/for-next.

Thank you,

Masami Hiramatsu (Google) (5):
      ring-buffer: Flush and stop persistent ring buffer on panic
      ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
      ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
      ring-buffer: Add persistent ring buffer invalid-page inject test
      ring-buffer: Show commit numbers in buffer_meta file


 arch/alpha/include/asm/Kbuild        |    1 
 arch/arc/include/asm/Kbuild          |    1 
 arch/arm/include/asm/Kbuild          |    1 
 arch/arm64/include/asm/ring_buffer.h |   10 +
 arch/csky/include/asm/Kbuild         |    1 
 arch/hexagon/include/asm/Kbuild      |    1 
 arch/loongarch/include/asm/Kbuild    |    1 
 arch/m68k/include/asm/Kbuild         |    1 
 arch/microblaze/include/asm/Kbuild   |    1 
 arch/mips/include/asm/Kbuild         |    1 
 arch/nios2/include/asm/Kbuild        |    1 
 arch/openrisc/include/asm/Kbuild     |    1 
 arch/parisc/include/asm/Kbuild       |    1 
 arch/powerpc/include/asm/Kbuild      |    1 
 arch/riscv/include/asm/Kbuild        |    1 
 arch/s390/include/asm/Kbuild         |    1 
 arch/sh/include/asm/Kbuild           |    1 
 arch/sparc/include/asm/Kbuild        |    1 
 arch/um/include/asm/Kbuild           |    1 
 arch/x86/include/asm/Kbuild          |    1 
 arch/xtensa/include/asm/Kbuild       |    1 
 include/asm-generic/ring_buffer.h    |   13 ++
 include/linux/ring_buffer.h          |    1 
 kernel/trace/Kconfig                 |   34 ++++
 kernel/trace/ring_buffer.c           |  258 ++++++++++++++++++++++++++--------
 kernel/trace/trace.c                 |    4 +
 26 files changed, 276 insertions(+), 64 deletions(-)
 create mode 100644 arch/arm64/include/asm/ring_buffer.h
 create mode 100644 include/asm-generic/ring_buffer.h


base-commit: 3515572dd068895ffd241b8a69399a0ebfac7593
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [RFC PATCH 3/4] livepatch: Add "replaceable" attribute to klp_patch
From: Joe Lawrence @ 2026-04-06 21:12 UTC (permalink / raw)
  To: Song Liu
  Cc: Yafang Shao, Dylan Hatch, jpoimboe, jikos, mbenes, pmladek,
	rostedt, mhiramat, mathieu.desnoyers, kpsingh, mattbobrowski,
	jolsa, ast, daniel, andrii, martin.lau, eddyz87, memxor,
	yonghong.song, live-patching, linux-kernel, linux-trace-kernel,
	bpf
In-Reply-To: <CAPhsuW4B00-grg9XJa+AO3xgGwM_u8FC+GH3JrkYZOJx4PuV8Q@mail.gmail.com>

On Mon, Apr 06, 2026 at 11:11:27AM -0700, Song Liu wrote:
> On Mon, Apr 6, 2026 at 4:08 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Sat, Apr 4, 2026 at 5:36 AM Song Liu <song@kernel.org> wrote:
> > >
> > > On Fri, Apr 3, 2026 at 1:55 PM Dylan Hatch <dylanbhatch@google.com> wrote:
> > > [...]
> > > > > IIRC, the use case for this change is when multiple users load various
> > > > > livepatch modules on the same system. I still don't believe this is the
> > > > > right way to manage livepatches. That said, I won't really NACK this
> > > > > if other folks think this is a useful option.
> > > >
> > > > In our production fleet, we apply exactly one cumulative livepatch
> > > > module, and we use per-kernel build "livepatch release" branches to
> > > > track the contents of these cumulative livepatches. This model has
> > > > worked relatively well for us, but there are some painpoints.
> > > >
> > > > We are often under pressure to selectively deploy a livepatch fix to
> > > > certain subpopulations of production. If the subpopulation is running
> > > > the same build of everything else, this would require us to introduce
> > > > another branching factor to the "livepatch release" branches --
> > > > something we do not support due to the added toil and complexity.
> > > >
> > > > However, if we had the ability to build "off-band" livepatch modules
> > > > that were marked as non-replaceable, we could support these selective
> > > > patches without the additional branching factor. I will have to
> > > > circulate the idea internally, but to me this seems like a very useful
> > > > option to have in certain cases.
> > >
> > >  IIUC, the plan is:
> > >
> > > - The regular livepatches are cumulative, have the replace flag; and
> > >   are replaceable.
> > > - The occasional "off-band" livepatches do not have the replace flag,
> > >   and are not replaceable.
> > >
> > > With this setup, for systems with off-band livepatches loaded, we can
> > > still release a cumulative livepatch to replace the previous cumulative
> > > livepatch. Is this the expected use case?
> >
> > That matches our expected use case.
> 
> If we really want to serve use cases like this, I think we can introduce
> some replace tag concept: Each livepatch will have a tag, u32 number.
> Newly loaded livepatch will only replace existing livepatch with the
> same tag. We can even reuse the existing "bool replace" in klp_patch,
> and make it u32: replace=0 means no replace; replace > 0 are the
> replace tag.
> 
> For current users of cumulative patches, all the livepatch will have the
> same tag, say 1. For your use case, you can assign each user a
> unique tag. Then all these users can do atomic upgrades of their
> own livepatches.
> 
> We may also need to check whether two livepatches of different tags
> touch the same kernel function. When that happens, the later
> livepatch should fail to load.
> 
> Does this make sense?
> 

I haven't been following the thread carefully, but could the Livepatch
system state API (see Documentation/livepatch/system-state.rst) be
leveraged somehow instead of adding further replace semantics?

--
Joe



* Re: [RFC PATCH bpf-next v5 1/2] tracing: Prefer vmlinux symbols over module symbols for unqualified kprobes
From: bot+bpf-ci @ 2026-04-06 20:15 UTC (permalink / raw)
  To: andrey.grodzovsky, bpf, linux-trace-kernel
  Cc: ast, daniel, andrii, jolsa, rostedt, mhiramat, ihor.solodrai,
	emil, linux-open-source, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260406193158.754498-2-andrey.grodzovsky@crowdstrike.com>


> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
> Suggested-by: Ihor Solodrai <ihor.solodrai@linux.dev>

This commit fixes a bug introduced when module symbol counting was
added to number_of_same_symbols(). Would it be worth adding a
Fixes: tag?

Suggested:
Fixes: 926fe783c8a6 ("tracing/kprobes: Fix symbol counting logic by looking at modules as well")

That commit added module_kallsyms_on_each_symbol() to
number_of_same_symbols(), which caused unqualified kprobe targets
matching both vmlinux and a module to return count > 1 and fail
with -EADDRNOTAVAIL.


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24047967861


* Re: [PATCH bpf v3 2/2] selftests/bpf: Add test to ensure kprobe_multi is not sleepable
From: Kumar Kartikeya Dwivedi @ 2026-04-06 20:11 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Varun R Mallya, bpf, ast, daniel, yonghong.song, rostedt,
	mhiramat, linux-kernel, linux-trace-kernel
In-Reply-To: <ac47BIEUBBkTch31@krava>

On Thu, 2 Apr 2026 at 11:46, Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Apr 02, 2026 at 12:50:10AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Wed, 1 Apr 2026 at 21:11, Varun R Mallya <varunrmallya@gmail.com> wrote:
> > >
> > > Add a selftest to ensure that kprobe_multi programs cannot be attached
> > > using the BPF_F_SLEEPABLE flag. This test succeeds when the kernel
> > > rejects attachment of kprobe_multi when the BPF_F_SLEEPABLE flag is set.
> > >
> > > Signed-off-by: Varun R Mallya <varunrmallya@gmail.com>
> > > ---
> >
> > Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> >
> > >  .../bpf/prog_tests/kprobe_multi_test.c        | 41 +++++++++++++++++++
> > >  .../bpf/progs/kprobe_multi_sleepable.c        | 13 ++++++
> > >  2 files changed, 54 insertions(+)
> > >  create mode 100644 tools/testing/selftests/bpf/progs/kprobe_multi_sleepable.c
> > >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
> > > index 78c974d4ea33..f02fec2b6fda 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
> > > @@ -10,6 +10,7 @@
> > >  #include "kprobe_multi_session_cookie.skel.h"
> > >  #include "kprobe_multi_verifier.skel.h"
> > >  #include "kprobe_write_ctx.skel.h"
> > > +#include "kprobe_multi_sleepable.skel.h"
> > >  #include "bpf/libbpf_internal.h"
> > >  #include "bpf/hashmap.h"
> > >
> > > @@ -633,6 +634,44 @@ static void test_attach_write_ctx(void)
> > >  }
> > >  #endif
> > >
> > > +static void test_attach_multi_sleepable(void)
> > > +{
> > > +       struct kprobe_multi_sleepable *skel;
> > > +       int err;
> > > +
> > > +       skel = kprobe_multi_sleepable__open();
> > > +       if (!ASSERT_OK_PTR(skel, "kprobe_multi_sleepable__open"))
> > > +               return;
> > > +
> > > +       err = bpf_program__set_flags(skel->progs.handle_kprobe_multi_sleepable,
> > > +                                    BPF_F_SLEEPABLE);
> > > +       if (!ASSERT_OK(err, "bpf_program__set_flags"))
> > > +               goto cleanup;
> > > +
> > > +       /* Load should succeed even with BPF_F_SLEEPABLE for KPROBE types */
> > > +       err = kprobe_multi_sleepable__load(skel);
> > > +       if (!ASSERT_OK(err, "kprobe_multi_sleepable__load"))
> > > +               goto cleanup;
> > > +
> > > +       /* Attachment must fail for kprobe.multi + BPF_F_SLEEPABLE.
> > > +        * Also chosen a stable symbol to send into opts
> > > +        */
> > > +       LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
> > > +       const char *sym = "vfs_read";
> > > +
> > > +       opts.syms = &sym;
> > > +       opts.cnt = 1;
> > > +
> > > +       skel->links.handle_kprobe_multi_sleepable =
> > > +               bpf_program__attach_kprobe_multi_opts(skel->progs.handle_kprobe_multi_sleepable,
> > > +                                                     NULL, &opts);
> > > +       ASSERT_ERR_PTR(skel->links.handle_kprobe_multi_sleepable,
> > > +                      "bpf_program__attach_kprobe_multi_opts");
> >
> > Nit: While vfs_read will likely remain stable, the check could
> > probably be stronger to distinguish an attach error from -EINVAL?
> > I added a typo to vfs_read and it still passed, because it failed to
> > attach instead of getting rejected on unfixed kernel.
> > May not be a big deal since vfs_read is unlikely to break.
> > I verified it works by adding bpf_copy_from_user to the program and
> > attaching to SYS_PREFIX sys_getpid and invoking the splat though, so
> > LGTM otherwise.
>
> why not use bpf_fentry_test2 ? you could also put it in pattern argument
> and bypass opts completely (up to you)
>
> also there's test_attach_api_fails test, please move it over there
>

Varun, the selftest is still not applied, only the fix. Please follow
up and target bpf-next tree this time.
Thanks.


* [RFC PATCH bpf-next v5 0/2] tracing: Fix kprobe attachment when module shadows vmlinux symbol
From: Andrey Grodzovsky @ 2026-04-06 19:31 UTC (permalink / raw)
  To: bpf, linux-trace-kernel
  Cc: ast, daniel, andrii, jolsa, rostedt, mhiramat, ihor.solodrai,
	emil, linux-open-source

When a kernel module exports a symbol with the same name as an existing
vmlinux symbol, kprobe attachment fails with -EADDRNOTAVAIL because
number_of_same_symbols() counts matches across both vmlinux and all
loaded modules, returning a count greater than 1.

This series takes a different approach from v1-v4, which implemented a
libbpf-side fallback parsing /proc/kallsyms and retrying with the
absolute address. That approach was rejected (Andrii Nakryiko, Ihor
Solodrai) because ambiguous symbol resolution does not belong in libbpf,
and because it did not cover the kprobe_multi path.

Following Ihor's suggestion, this series fixes the root cause in the
kernel: when an unqualified symbol name is given and the symbol is found
in vmlinux, prefer the vmlinux symbol and do not scan loaded modules.
This makes the skeleton auto-attach path work transparently with no
libbpf changes needed.

Patch 1: Kernel fix - return vmlinux-only count from
         number_of_same_symbols() when the symbol is found in vmlinux,
         preventing module shadows from causing -EADDRNOTAVAIL.
Patch 2: Selftests with bpf_testmod_dup_sym.ko test module validating
         kprobe attachment across all four attach modes with a duplicate
         symbol present. Unchanged from v4.

Changes since v4 [1]:
  - Completely rework the approach: move fix from libbpf to the kernel
    (number_of_same_symbols() in trace_kprobe.c) as suggested by Ihor
    Solodrai. No libbpf changes needed.
  - When mod==NULL and vmlinux contains the symbol (count > 0), return
    the vmlinux-only count immediately, skipping module scan entirely.
  - Preserves all existing semantics: MOD:SYM qualification unchanged,
    module-only symbols unchanged, vmlinux-ambiguous symbols unchanged.
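
The counting rule described above can be modelled in user space. The
following is only a minimal sketch, not the kernel code: the demo_sym
table, the demo_* helpers and the "demo_mod", "only_in_module" and
"dup_static" names are hypothetical stand-ins for the kallsyms walkers
used by number_of_same_symbols().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol-table entry; owner == NULL stands for vmlinux. */
struct demo_sym {
	const char *owner;	/* module name, or NULL for vmlinux */
	const char *name;
};

/* Illustrative table: one symbol duplicated by a module, one module-only
 * symbol, and one name that appears twice inside vmlinux itself.
 */
const struct demo_sym demo_tab[] = {
	{ NULL,                  "__x64_sys_nanosleep" },
	{ "bpf_testmod_dup_sym", "__x64_sys_nanosleep" },
	{ "demo_mod",            "only_in_module" },
	{ NULL,                  "dup_static" },
	{ NULL,                  "dup_static" },
};
const size_t demo_tab_len = sizeof(demo_tab) / sizeof(demo_tab[0]);

/* Count vmlinux entries with the given name. */
unsigned int demo_count_vmlinux(const char *name)
{
	unsigned int count = 0;
	size_t i;

	for (i = 0; i < demo_tab_len; i++)
		if (!demo_tab[i].owner && !strcmp(demo_tab[i].name, name))
			count++;
	return count;
}

/* Count module entries with the given name; mod == NULL means any module. */
unsigned int demo_count_modules(const char *mod, const char *name)
{
	unsigned int count = 0;
	size_t i;

	for (i = 0; i < demo_tab_len; i++) {
		if (!demo_tab[i].owner)
			continue;	/* vmlinux entry, not a module */
		if (mod && strcmp(demo_tab[i].owner, mod))
			continue;	/* different module than requested */
		if (!strcmp(demo_tab[i].name, name))
			count++;
	}
	return count;
}

/* Model of the patched counting logic. */
unsigned int demo_number_of_same_symbols(const char *mod, const char *name)
{
	unsigned int count = 0;

	assert(name != NULL);

	/* Unqualified name: look in vmlinux first. */
	if (!mod)
		count = demo_count_vmlinux(name);

	/* Early return added by the fix: a vmlinux hit wins. */
	if (!mod && count > 0)
		return count;

	/* Module-only symbols and MOD:SYM targets take the old path. */
	return count + demo_count_modules(mod, name);
}
```

In this model an unqualified lookup of the duplicated nanosleep symbol
counts only the vmlinux entry, while a name that is ambiguous inside
vmlinux itself is still reported with a count greater than 1.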

[1] https://lore.kernel.org/bpf/20260302210532.381083-1-andrey.grodzovsky@crowdstrike.com/

Andrey Grodzovsky (2):
  tracing: Prefer vmlinux symbols over module symbols for unqualified
    kprobes
  selftests/bpf: Add tests for duplicate kprobe symbol handling

 kernel/trace/trace_kprobe.c                   |  7 +++
 tools/testing/selftests/bpf/Makefile          |  2 +-
 .../selftests/bpf/prog_tests/attach_probe.c   | 63 +++++++++++++++++++
 .../testing/selftests/bpf/test_kmods/Makefile |  2 +-
 .../bpf/test_kmods/bpf_testmod_dup_sym.c      | 48 ++++++++++++++
 5 files changed, 120 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_kmods/bpf_testmod_dup_sym.c

-- 
2.34.1



* Re: [syzbot] [block?] [trace?] INFO: task hung in blk_trace_startstop
From: syzbot @ 2026-04-06 19:55 UTC (permalink / raw)
  To: axboe, linux-block, linux-kernel, linux-trace-kernel,
	mathieu.desnoyers, mhiramat, rostedt, syzkaller-bugs
In-Reply-To: <691367ae.a70a0220.22f260.0141.GAE@google.com>

syzbot has found a reproducer for the following issue on:

HEAD commit:    591cd656a1bf Linux 7.0-rc7
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=129136ba580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6754c86e8d9e4c91
dashboard link: https://syzkaller.appspot.com/bug?extid=774863666ef5b025c9d0
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1268ad4e580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1580b3da580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/6382829d7cc5/disk-591cd656.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/17a325d524d5/vmlinux-591cd656.xz
kernel image: https://storage.googleapis.com/syzbot-assets/0a06ea295210/bzImage-591cd656.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+774863666ef5b025c9d0@syzkaller.appspotmail.com

INFO: task syz.2.19:6128 blocked for more than 143 seconds.
      Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz.2.19        state:D stack:27960 pid:6128  tgid:6125  ppid:5955   task_flags:0x400040 flags:0x00080002
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5298 [inline]
 __schedule+0x15dd/0x52d0 kernel/sched/core.c:6911
 __schedule_loop kernel/sched/core.c:6993 [inline]
 schedule+0x164/0x360 kernel/sched/core.c:7008
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7065
 __mutex_lock_common kernel/locking/mutex.c:692 [inline]
 __mutex_lock+0x7fe/0x1300 kernel/locking/mutex.c:776
 blk_debugfs_lock_nomemsave block/blk.h:740 [inline]
 blk_trace_startstop+0x8f/0x610 kernel/trace/blktrace.c:903
 blk_trace_ioctl+0x314/0x920 kernel/trace/blktrace.c:949
 blkdev_common_ioctl+0x13a7/0x3250 block/ioctl.c:724
 blkdev_ioctl+0x528/0x740 block/ioctl.c:798
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7f6ab9c819
RSP: 002b:00007f7f6baad028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f7f6ae16090 RCX: 00007f7f6ab9c819
RDX: 0000000000000000 RSI: 0000000000001274 RDI: 0000000000000003
RBP: 00007f7f6ac32c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.


* [RFC PATCH bpf-next v5 2/2] selftests/bpf: Add tests for duplicate kprobe symbol handling
From: Andrey Grodzovsky @ 2026-04-06 19:31 UTC (permalink / raw)
  To: bpf, linux-trace-kernel
  Cc: ast, daniel, andrii, jolsa, rostedt, mhiramat, ihor.solodrai,
	emil, linux-open-source
In-Reply-To: <20260406193158.754498-1-andrey.grodzovsky@crowdstrike.com>

Add bpf_testmod_dup_sym.ko test module that creates a duplicate
nanosleep symbol to test kprobe attachment when a module exports
a symbol with the same name as a vmlinux symbol.

Add test_attach_probe_dup_sym() to the attach_probe tests. It loads
the duplicate symbol module and validates that kprobe attachment succeeds
across all four attach modes: default, legacy, perf_event_open, and
link. The test relies on the kernel fix that prefers the vmlinux symbol
when resolving unqualified symbol names.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
---
 tools/testing/selftests/bpf/Makefile          |  2 +-
 .../selftests/bpf/prog_tests/attach_probe.c   | 63 +++++++++++++++++++
 .../testing/selftests/bpf/test_kmods/Makefile |  2 +-
 .../bpf/test_kmods/bpf_testmod_dup_sym.c      | 48 ++++++++++++++
 4 files changed, 113 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_kmods/bpf_testmod_dup_sym.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index f75c4f52c028..cceb3fcc97a2 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -121,7 +121,7 @@ TEST_PROGS_EXTENDED := \
 	ima_setup.sh verify_sig_setup.sh
 
 TEST_KMODS := bpf_testmod.ko bpf_test_no_cfi.ko bpf_test_modorder_x.ko \
-	bpf_test_modorder_y.ko bpf_test_rqspinlock.ko
+	bpf_test_modorder_y.ko bpf_test_rqspinlock.ko bpf_testmod_dup_sym.ko
 TEST_KMOD_TARGETS = $(addprefix $(OUTPUT)/,$(TEST_KMODS))
 
 # Compile but not part of 'make run_tests'
diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
index 12a841afda68..04b177ee3adf 100644
--- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
+++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
@@ -4,6 +4,7 @@
 #include "test_attach_probe_manual.skel.h"
 #include "test_attach_probe.skel.h"
 #include "kprobe_write_ctx.skel.h"
+#include "testing_helpers.h"
 
 /* this is how USDT semaphore is actually defined, except volatile modifier */
 volatile unsigned short uprobe_ref_ctr __attribute__((unused)) __attribute((section(".probes")));
@@ -197,6 +198,59 @@ static void test_attach_kprobe_legacy_by_addr_reject(void)
 	test_attach_probe_manual__destroy(skel);
 }
 
+/* Test kprobe attachment with duplicate symbols.
+ * This test loads bpf_testmod_dup_sym.ko which creates a duplicate
+ * __x64_sys_nanosleep symbol. The kernel fix should prefer the vmlinux
+ * symbol over the module symbol when attaching kprobes.
+ */
+static void test_attach_probe_dup_sym(enum probe_attach_mode attach_mode)
+{
+	DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, kprobe_opts);
+	struct bpf_link *kprobe_link, *kretprobe_link;
+	struct test_attach_probe_manual *skel;
+	int err;
+
+	/* Load module with duplicate symbol */
+	err = load_module("bpf_testmod_dup_sym.ko", false);
+	if (!ASSERT_OK(err, "load_bpf_testmod_dup_sym")) {
+		test__skip();
+		return;
+	}
+
+	skel = test_attach_probe_manual__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_dup_sym_open_and_load"))
+		goto unload_module;
+
+	/* manual-attach kprobe/kretprobe with duplicate symbol present */
+	kprobe_opts.attach_mode = attach_mode;
+	kprobe_opts.retprobe = false;
+	kprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kprobe,
+						      SYS_NANOSLEEP_KPROBE_NAME,
+						      &kprobe_opts);
+	if (!ASSERT_OK_PTR(kprobe_link, "attach_kprobe_dup_sym"))
+		goto cleanup;
+	skel->links.handle_kprobe = kprobe_link;
+
+	kprobe_opts.retprobe = true;
+	kretprobe_link = bpf_program__attach_kprobe_opts(skel->progs.handle_kretprobe,
+							 SYS_NANOSLEEP_KPROBE_NAME,
+							 &kprobe_opts);
+	if (!ASSERT_OK_PTR(kretprobe_link, "attach_kretprobe_dup_sym"))
+		goto cleanup;
+	skel->links.handle_kretprobe = kretprobe_link;
+
+	/* trigger & validate kprobe && kretprobe */
+	usleep(1);
+
+	ASSERT_EQ(skel->bss->kprobe_res, 1, "check_kprobe_dup_sym_res");
+	ASSERT_EQ(skel->bss->kretprobe_res, 2, "check_kretprobe_dup_sym_res");
+
+cleanup:
+	test_attach_probe_manual__destroy(skel);
+unload_module:
+	unload_module("bpf_testmod_dup_sym", false);
+}
+
 /* attach uprobe/uretprobe long event name testings */
 static void test_attach_uprobe_long_event_name(void)
 {
@@ -559,6 +613,15 @@ void test_attach_probe(void)
 	if (test__start_subtest("kprobe-legacy-by-addr-reject"))
 		test_attach_kprobe_legacy_by_addr_reject();
 
+	if (test__start_subtest("dup-sym-default"))
+		test_attach_probe_dup_sym(PROBE_ATTACH_MODE_DEFAULT);
+	if (test__start_subtest("dup-sym-legacy"))
+		test_attach_probe_dup_sym(PROBE_ATTACH_MODE_LEGACY);
+	if (test__start_subtest("dup-sym-perf"))
+		test_attach_probe_dup_sym(PROBE_ATTACH_MODE_PERF);
+	if (test__start_subtest("dup-sym-link"))
+		test_attach_probe_dup_sym(PROBE_ATTACH_MODE_LINK);
+
 	if (test__start_subtest("auto"))
 		test_attach_probe_auto(skel);
 	if (test__start_subtest("kprobe-sleepable"))
diff --git a/tools/testing/selftests/bpf/test_kmods/Makefile b/tools/testing/selftests/bpf/test_kmods/Makefile
index 63c4d3f6a12f..938c462a103b 100644
--- a/tools/testing/selftests/bpf/test_kmods/Makefile
+++ b/tools/testing/selftests/bpf/test_kmods/Makefile
@@ -8,7 +8,7 @@ Q = @
 endif
 
 MODULES = bpf_testmod.ko bpf_test_no_cfi.ko bpf_test_modorder_x.ko \
-	bpf_test_modorder_y.ko bpf_test_rqspinlock.ko
+	bpf_test_modorder_y.ko bpf_test_rqspinlock.ko bpf_testmod_dup_sym.ko
 
 $(foreach m,$(MODULES),$(eval obj-m += $(m:.ko=.o)))
 
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod_dup_sym.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod_dup_sym.c
new file mode 100644
index 000000000000..0e12f68afe3a
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod_dup_sym.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 CrowdStrike */
+/* Test module for duplicate kprobe symbol handling */
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+
+/* Duplicate symbol to test kprobe attachment with duplicate symbols.
+ * This creates a duplicate of the syscall wrapper used in attach_probe tests.
+ * The kernel fix should handle this by preferring the vmlinux symbol.
+ * This function should NEVER be called - kprobes should attach to vmlinux version.
+ */
+#ifdef __x86_64__
+int __x64_sys_nanosleep(void);
+noinline int __x64_sys_nanosleep(void)
+#elif defined(__s390x__)
+int __s390x_sys_nanosleep(void);
+noinline int __s390x_sys_nanosleep(void)
+#elif defined(__aarch64__)
+int __arm64_sys_nanosleep(void);
+noinline int __arm64_sys_nanosleep(void)
+#elif defined(__riscv)
+int __riscv_sys_nanosleep(void);
+noinline int __riscv_sys_nanosleep(void)
+#else
+int sys_nanosleep(void);
+noinline int sys_nanosleep(void)
+#endif
+{
+	WARN_ONCE(1, "bpf_testmod_dup_sym: dummy nanosleep symbol called - this should never execute!\n");
+	return -EINVAL;
+}
+
+static int __init bpf_testmod_dup_sym_init(void)
+{
+	return 0;
+}
+
+static void __exit bpf_testmod_dup_sym_exit(void)
+{
+}
+
+module_init(bpf_testmod_dup_sym_init);
+module_exit(bpf_testmod_dup_sym_exit);
+
+MODULE_AUTHOR("Andrey Grodzovsky");
+MODULE_DESCRIPTION("BPF selftest duplicate symbol module");
+MODULE_LICENSE("GPL");
-- 
2.34.1



* [RFC PATCH bpf-next v5 1/2] tracing: Prefer vmlinux symbols over module symbols for unqualified kprobes
From: Andrey Grodzovsky @ 2026-04-06 19:31 UTC (permalink / raw)
  To: bpf, linux-trace-kernel
  Cc: ast, daniel, andrii, jolsa, rostedt, mhiramat, ihor.solodrai,
	emil, linux-open-source
In-Reply-To: <20260406193158.754498-1-andrey.grodzovsky@crowdstrike.com>

When an unqualified kprobe target exists in both vmlinux and a loaded
module, number_of_same_symbols() returns a count greater than 1,
causing kprobe attachment to fail with -EADDRNOTAVAIL even though the
vmlinux symbol is unambiguous.

When no module qualifier is given and the symbol is found in vmlinux,
return the vmlinux-only count without scanning loaded modules. This
preserves the existing behavior for all other cases:
- Symbol only in a module: vmlinux count is 0, falls through to module
  scan as before.
- Symbol qualified with MOD:SYM: mod != NULL, unchanged path.
- Symbol ambiguous within vmlinux itself: count > 1 is returned as-is.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Suggested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
---
 kernel/trace/trace_kprobe.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index a5dbb72528e0..99c41ea8b6d7 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -765,6 +765,13 @@ static unsigned int number_of_same_symbols(const char *mod, const char *func_nam
 	if (!mod)
 		kallsyms_on_each_match_symbol(count_symbols, func_name, &ctx.count);
 
+	/* If the symbol is found in vmlinux, use vmlinux resolution only.
+	 * This prevents module symbols from shadowing vmlinux symbols
+	 * and causing -EADDRNOTAVAIL for unqualified kprobe targets.
+	 */
+	if (!mod && ctx.count > 0)
+		return ctx.count;
+
 	module_kallsyms_on_each_symbol(mod, count_mod_symbols, &ctx);
 
 	return ctx.count;
-- 
2.34.1



* Re: [RFC PATCH 0/4] trace, livepatch: Allow kprobe return overriding for livepatched functions
From: Song Liu @ 2026-04-06 18:26 UTC (permalink / raw)
  To: Yafang Shao
  Cc: jpoimboe, jikos, mbenes, pmladek, joe.lawrence, rostedt, mhiramat,
	mathieu.desnoyers, kpsingh, mattbobrowski, jolsa, ast, daniel,
	andrii, martin.lau, eddyz87, memxor, yonghong.song, live-patching,
	linux-kernel, linux-trace-kernel, bpf
In-Reply-To: <CALOAHbDG8=eUV53kF+xn=izs2rpydCk=a9RznU-EEOzmkB8mQg@mail.gmail.com>

On Mon, Apr 6, 2026 at 3:55 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sat, Apr 4, 2026 at 12:07 AM Song Liu <song@kernel.org> wrote:
> >
> > Hi Yafang,
> >
> > On Thu, Apr 2, 2026 at 2:26 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > Livepatching allows for rapid experimentation with new kernel features
> > > without interrupting production workloads. However, static livepatches lack
> > > the flexibility required to tune features based on task-specific attributes,
> > > such as cgroup membership, which is critical in multi-tenant k8s
> > > environments. Furthermore, hardcoding logic into a livepatch prevents
> > > dynamic adjustments based on the runtime environment.
> > >
> > > To address this, we propose a hybrid approach using BPF. Our production use
> > > case involves:
> > >
> > > 1. Deploying a Livepatch function to serve as a stable BPF hook.
> > >
> > > 2. Utilizing bpf_override_return() to dynamically modify the return value
> > >    of that hook based on the current task's context.
> >
> > Could you please provide a specific use case that can benefit from this?
> > AFAICT, livepatch is more flexible but risky (may cause crash); while
> > BPF is safe, but less flexible. The combination you are proposing seems
> > to get the worse of the two sides. Maybe it can indeed get the benefit of
> > both sides in some cases, but I cannot think of such examples.
> >
>
> Here is an example we recently deployed on our production servers:
>
>   https://lore.kernel.org/bpf/CALOAHbDnNba_w_nWH3-S9GAXw0+VKuLTh1gy5hy9Yqgeo4C0iA@mail.gmail.com/
>
> In one of our specific clusters, we needed to send BGP traffic out
> through specific NICs based on the destination IP. To achieve this
> without interrupting service, we live-patched
> bond_xmit_3ad_xor_slave_get(), added a new hook called
> bond_get_slave_hook(), and then ran a BPF program attached to that
> hook to select the outgoing NIC from the SKB. This allowed us to
> rapidly deploy the feature with zero downtime.

I guess the idea here is: keep the risky part simple and implement
it in a module/livepatch, then use BPF for the flexible and programmable
part, which is safe.

Can we use struct_ops instead of bpf_override_return for this case?
This should make the solution more flexible.

Thanks,
Song


* Re: [RFC PATCH 3/4] livepatch: Add "replaceable" attribute to klp_patch
From: Song Liu @ 2026-04-06 18:11 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Dylan Hatch, jpoimboe, jikos, mbenes, pmladek, joe.lawrence,
	rostedt, mhiramat, mathieu.desnoyers, kpsingh, mattbobrowski,
	jolsa, ast, daniel, andrii, martin.lau, eddyz87, memxor,
	yonghong.song, live-patching, linux-kernel, linux-trace-kernel,
	bpf
In-Reply-To: <CALOAHbCbcw2jpjk9JD9yyf+SMpQ-s9FAonSaz7Gs4XUeP+w+2g@mail.gmail.com>

On Mon, Apr 6, 2026 at 4:08 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sat, Apr 4, 2026 at 5:36 AM Song Liu <song@kernel.org> wrote:
> >
> > On Fri, Apr 3, 2026 at 1:55 PM Dylan Hatch <dylanbhatch@google.com> wrote:
> > [...]
> > > > IIRC, the use case for this change is when multiple users load various
> > > > livepatch modules on the same system. I still don't believe this is the
> > > > right way to manage livepatches. That said, I won't really NACK this
> > > > if other folks think this is a useful option.
> > >
> > > In our production fleet, we apply exactly one cumulative livepatch
> > > module, and we use per-kernel build "livepatch release" branches to
> > > track the contents of these cumulative livepatches. This model has
> > > worked relatively well for us, but there are some painpoints.
> > >
> > > We are often under pressure to selectively deploy a livepatch fix to
> > > certain subpopulations of production. If the subpopulation is running
> > > the same build of everything else, this would require us to introduce
> > > another branching factor to the "livepatch release" branches --
> > > something we do not support due to the added toil and complexity.
> > >
> > > However, if we had the ability to build "off-band" livepatch modules
> > > that were marked as non-replaceable, we could support these selective
> > > patches without the additional branching factor. I will have to
> > > circulate the idea internally, but to me this seems like a very useful
> > > option to have in certain cases.
> >
> >  IIUC, the plan is:
> >
> > - The regular livepatches are cumulative, have the replace flag; and
> >   are replaceable.
> > - The occasional "off-band" livepatches do not have the replace flag,
> >   and are not replaceable.
> >
> > With this setup, for systems with off-band livepatches loaded, we can
> > still release a cumulative livepatch to replace the previous cumulative
> > livepatch. Is this the expected use case?
>
> That matches our expected use case.

If we really want to serve use cases like this, I think we can introduce
a replace tag concept: each livepatch will have a tag, a u32 number.
A newly loaded livepatch will only replace existing livepatches with the
same tag. We can even reuse the existing "bool replace" in klp_patch
and make it a u32: replace=0 means no replace; replace > 0 is the
replace tag.

For current users of cumulative patches, all the livepatches will have the
same tag, say 1. For your use case, you can assign each user a
unique tag. Then all these users can do atomic upgrades of their
own livepatches.

We may also need to check whether two livepatches of different tags
touch the same kernel function. When that happens, the later
livepatch should fail to load.

Does this make sense?

Thanks,
Song

^ permalink raw reply

* [PATCH] tracing: preserve module tracepoint strings
From: Cao Ruichuang @ 2026-04-06 17:09 UTC (permalink / raw)
  To: rostedt
  Cc: mhiramat, mathieu.desnoyers, mcgrof, petr.pavlu, da.gomez,
	samitolvanen, atomlin, linux-kernel, linux-trace-kernel,
	linux-modules

tracepoint_string() is documented as exporting constant strings
through printk_formats, including when it is used from modules.
That currently does not work.

A small test module that calls
tracepoint_string("tracepoint_string_test_module_string") loads
successfully and gets a pointer back, but the string never appears
in /sys/kernel/tracing/printk_formats. The loader only collects
__trace_printk_fmt from modules and ignores __tracepoint_str.

Collect module __tracepoint_str entries too, copy them to stable
tracing-managed storage like module trace_printk formats, and let
trace_is_tracepoint_string() recognize those copied strings. This
makes module tracepoint strings visible through printk_formats and
keeps them accepted by the trace string safety checks.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217196
Signed-off-by: Cao Ruichuang <create0818@163.com>
---
 include/linux/module.h      |  2 ++
 kernel/module/main.c        |  4 +++
 kernel/trace/trace_printk.c | 63 ++++++++++++++++++++++++++++---------
 3 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 14f391b18..e475466a7 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -515,6 +515,8 @@ struct module {
 #ifdef CONFIG_TRACING
 	unsigned int num_trace_bprintk_fmt;
 	const char **trace_bprintk_fmt_start;
+	unsigned int num_tracepoint_strings;
+	const char **tracepoint_strings_start;
 #endif
 #ifdef CONFIG_EVENT_TRACING
 	struct trace_event_call **trace_events;
diff --git a/kernel/module/main.c b/kernel/module/main.c
index c3ce106c7..d7d890138 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2672,6 +2672,10 @@ static int find_module_sections(struct module *mod, struct load_info *info)
 	mod->trace_bprintk_fmt_start = section_objs(info, "__trace_printk_fmt",
 					 sizeof(*mod->trace_bprintk_fmt_start),
 					 &mod->num_trace_bprintk_fmt);
+	mod->tracepoint_strings_start =
+		section_objs(info, "__tracepoint_str",
+			     sizeof(*mod->tracepoint_strings_start),
+			     &mod->num_tracepoint_strings);
 #endif
 #ifdef CONFIG_DYNAMIC_FTRACE
 	/* sechdrs[0].sh_size is always zero */
diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
index 5ea5e0d76..9f67ce42e 100644
--- a/kernel/trace/trace_printk.c
+++ b/kernel/trace/trace_printk.c
@@ -22,8 +22,9 @@
 #ifdef CONFIG_MODULES
 
 /*
- * modules trace_printk()'s formats are autosaved in struct trace_bprintk_fmt
- * which are queued on trace_bprintk_fmt_list.
+ * modules trace_printk() formats and tracepoint_string() strings are
+ * autosaved in struct trace_bprintk_fmt, which are queued on
+ * trace_bprintk_fmt_list.
  */
 static LIST_HEAD(trace_bprintk_fmt_list);
 
@@ -33,8 +34,12 @@ static DEFINE_MUTEX(btrace_mutex);
 struct trace_bprintk_fmt {
 	struct list_head list;
 	const char *fmt;
+	unsigned int type;
 };
 
+#define TRACE_BPRINTK_TYPE		BIT(0)
+#define TRACE_TRACEPOINT_TYPE		BIT(1)
+
 static inline struct trace_bprintk_fmt *lookup_format(const char *fmt)
 {
 	struct trace_bprintk_fmt *pos;
@@ -49,22 +54,24 @@ static inline struct trace_bprintk_fmt *lookup_format(const char *fmt)
 	return NULL;
 }
 
-static
-void hold_module_trace_bprintk_format(const char **start, const char **end)
+static void hold_module_trace_format(const char **start, const char **end,
+				     unsigned int type)
 {
 	const char **iter;
 	char *fmt;
 
 	/* allocate the trace_printk per cpu buffers */
-	if (start != end)
+	if ((type & TRACE_BPRINTK_TYPE) && start != end)
 		trace_printk_init_buffers();
 
 	mutex_lock(&btrace_mutex);
 	for (iter = start; iter < end; iter++) {
 		struct trace_bprintk_fmt *tb_fmt = lookup_format(*iter);
 		if (tb_fmt) {
-			if (!IS_ERR(tb_fmt))
+			if (!IS_ERR(tb_fmt)) {
+				tb_fmt->type |= type;
 				*iter = tb_fmt->fmt;
+			}
 			continue;
 		}
 
@@ -76,6 +83,7 @@ void hold_module_trace_bprintk_format(const char **start, const char **end)
 				list_add_tail(&tb_fmt->list, &trace_bprintk_fmt_list);
 				strcpy(fmt, *iter);
 				tb_fmt->fmt = fmt;
+				tb_fmt->type = type;
 			} else
 				kfree(tb_fmt);
 		}
@@ -85,17 +93,28 @@ void hold_module_trace_bprintk_format(const char **start, const char **end)
 	mutex_unlock(&btrace_mutex);
 }
 
-static int module_trace_bprintk_format_notify(struct notifier_block *self,
-		unsigned long val, void *data)
+static int module_trace_format_notify(struct notifier_block *self,
+				      unsigned long val, void *data)
 {
 	struct module *mod = data;
+
+	if (val != MODULE_STATE_COMING)
+		return NOTIFY_OK;
+
 	if (mod->num_trace_bprintk_fmt) {
 		const char **start = mod->trace_bprintk_fmt_start;
 		const char **end = start + mod->num_trace_bprintk_fmt;
 
-		if (val == MODULE_STATE_COMING)
-			hold_module_trace_bprintk_format(start, end);
+		hold_module_trace_format(start, end, TRACE_BPRINTK_TYPE);
+	}
+
+	if (mod->num_tracepoint_strings) {
+		const char **start = mod->tracepoint_strings_start;
+		const char **end = start + mod->num_tracepoint_strings;
+
+		hold_module_trace_format(start, end, TRACE_TRACEPOINT_TYPE);
 	}
+
 	return NOTIFY_OK;
 }
 
@@ -171,8 +190,8 @@ static void format_mod_stop(void)
 
 #else /* !CONFIG_MODULES */
 __init static int
-module_trace_bprintk_format_notify(struct notifier_block *self,
-		unsigned long val, void *data)
+module_trace_format_notify(struct notifier_block *self,
+			   unsigned long val, void *data)
 {
 	return NOTIFY_OK;
 }
@@ -193,8 +212,8 @@ void trace_printk_control(bool enabled)
 }
 
 __initdata_or_module static
-struct notifier_block module_trace_bprintk_format_nb = {
-	.notifier_call = module_trace_bprintk_format_notify,
+struct notifier_block module_trace_format_nb = {
+	.notifier_call = module_trace_format_notify,
 };
 
 int __trace_bprintk(unsigned long ip, const char *fmt, ...)
@@ -254,11 +273,25 @@ EXPORT_SYMBOL_GPL(__ftrace_vprintk);
 bool trace_is_tracepoint_string(const char *str)
 {
 	const char **ptr = __start___tracepoint_str;
+#ifdef CONFIG_MODULES
+	struct trace_bprintk_fmt *tb_fmt;
+#endif
 
 	for (ptr = __start___tracepoint_str; ptr < __stop___tracepoint_str; ptr++) {
 		if (str == *ptr)
 			return true;
 	}
+
+#ifdef CONFIG_MODULES
+	mutex_lock(&btrace_mutex);
+	list_for_each_entry(tb_fmt, &trace_bprintk_fmt_list, list) {
+		if ((tb_fmt->type & TRACE_TRACEPOINT_TYPE) && str == tb_fmt->fmt) {
+			mutex_unlock(&btrace_mutex);
+			return true;
+		}
+	}
+	mutex_unlock(&btrace_mutex);
+#endif
 	return false;
 }
 
@@ -824,7 +857,7 @@ fs_initcall(init_trace_printk_function_export);
 
 static __init int init_trace_printk(void)
 {
-	return register_module_notifier(&module_trace_bprintk_format_nb);
+	return register_module_notifier(&module_trace_format_nb);
 }
 
 early_initcall(init_trace_printk);
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related

* [PATCH v2] ring-buffer: report header_page overwrite as char
From: Cao Ruichuang @ 2026-04-06 16:53 UTC (permalink / raw)
  To: rostedt; +Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel
In-Reply-To: <20260406162843.41592-1-create0818@163.com>

The header_page tracefs metadata currently reports overwrite as an
int field with size 1. That makes parsers warn about a type and
size mismatch even though the field is only used as a one-byte flag
within commit.

Keep the shared offset with commit as-is, but report overwrite as
char so the declared type matches the hardcoded size. The signedness
is already carried separately by the emitted signed field.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216999
Signed-off-by: Cao Ruichuang <create0818@163.com>
---
 kernel/trace/ring_buffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 170170bd8..6811dfffa 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -627,11 +627,11 @@ int ring_buffer_print_page_header(struct trace_buffer *buffer, struct trace_seq
 			 (unsigned int)sizeof(field.commit),
 			 (unsigned int)is_signed_type(long));
 
-	trace_seq_printf(s, "\tfield: int overwrite;\t"
+	trace_seq_printf(s, "\tfield: char overwrite;\t"
 			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
 			 (unsigned int)offsetof(typeof(field), commit),
 			 1,
-			 (unsigned int)is_signed_type(long));
+			 (unsigned int)is_signed_type(char));
 
 	trace_seq_printf(s, "\tfield: char data;\t"
 			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related

* Re: [PATCH] ring-buffer: report header_page overwrite as signed char
From: Steven Rostedt @ 2026-04-06 16:45 UTC (permalink / raw)
  To: CaoRuichuang
  Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel
In-Reply-To: <20260406162843.41592-1-create0818@163.com>

On Tue,  7 Apr 2026 00:28:43 +0800
CaoRuichuang <create0818@163.com> wrote:

> The header_page tracefs metadata currently reports overwrite as an
> int field with size 1. That makes parsers warn about a type and
> size mismatch even though the field is only used as a one-byte flag
> within commit.
> 
> Keep the shared offset with commit as-is, but report overwrite as
> signed char so the declared type matches the hardcoded size and the
> emitted signedness.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216999
> Signed-off-by: CaoRuichuang <create0818@163.com>
> ---
>  kernel/trace/ring_buffer.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 170170bd8..c4c2361b0 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -627,11 +627,11 @@ int ring_buffer_print_page_header(struct trace_buffer *buffer, struct trace_seq
>  			 (unsigned int)sizeof(field.commit),
>  			 (unsigned int)is_signed_type(long));
>  
> -	trace_seq_printf(s, "\tfield: int overwrite;\t"
> +	trace_seq_printf(s, "\tfield: signed char overwrite;\t"

From the Bugzilla, the issue was with the Rust parser. Would this still
avoid the warning if "int" were switched to "char" instead of "signed char"?
The "signed" is redundant, as it is already specified in the fields.

-- Steve


>  			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
>  			 (unsigned int)offsetof(typeof(field), commit),
>  			 1,
> -			 (unsigned int)is_signed_type(long));
> +			 (unsigned int)is_signed_type(signed char));
>  
>  	trace_seq_printf(s, "\tfield: char data;\t"
>  			 "offset:%u;\tsize:%u;\tsigned:%u;\n",


^ permalink raw reply

* [PATCH] ring-buffer: report header_page overwrite as signed char
From: CaoRuichuang @ 2026-04-06 16:28 UTC (permalink / raw)
  To: rostedt; +Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel

The header_page tracefs metadata currently reports overwrite as an
int field with size 1. That makes parsers warn about a type and
size mismatch even though the field is only used as a one-byte flag
within commit.

Keep the shared offset with commit as-is, but report overwrite as
signed char so the declared type matches the hardcoded size and the
emitted signedness.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216999
Signed-off-by: CaoRuichuang <create0818@163.com>
---
 kernel/trace/ring_buffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 170170bd8..c4c2361b0 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -627,11 +627,11 @@ int ring_buffer_print_page_header(struct trace_buffer *buffer, struct trace_seq
 			 (unsigned int)sizeof(field.commit),
 			 (unsigned int)is_signed_type(long));
 
-	trace_seq_printf(s, "\tfield: int overwrite;\t"
+	trace_seq_printf(s, "\tfield: signed char overwrite;\t"
 			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
 			 (unsigned int)offsetof(typeof(field), commit),
 			 1,
-			 (unsigned int)is_signed_type(long));
+			 (unsigned int)is_signed_type(signed char));
 
 	trace_seq_printf(s, "\tfield: char data;\t"
 			 "offset:%u;\tsize:%u;\tsigned:%u;\n",
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related

* [PATCH] tracing/ipi: report ipi_raise target CPUs as cpumask
From: CaoRuichuang @ 2026-04-06 16:24 UTC (permalink / raw)
  To: rostedt; +Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel

Bugzilla 217447 points out that ftrace bitmask fields still use the
legacy dynamic-array format, which makes trace consumers treat them
as unsigned long arrays instead of bitmaps.

This is visible in the ipi events today: ipi_send_cpumask already
reports its CPU mask as '__data_loc cpumask_t', but ipi_raise still
exposes target_cpus as '__data_loc unsigned long[]'.

Switch ipi_raise to __cpumask() and the matching helpers so its
tracefs format matches the existing cpumask representation used by
the other ipi event. The underlying storage size stays the same, but
trace data consumers can now recognize the field as a cpumask
directly.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217447
Signed-off-by: CaoRuichuang <create0818@163.com>
---
 include/trace/events/ipi.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/trace/events/ipi.h b/include/trace/events/ipi.h
index 9912f0ded..fae4f8eac 100644
--- a/include/trace/events/ipi.h
+++ b/include/trace/events/ipi.h
@@ -68,16 +68,16 @@ TRACE_EVENT(ipi_raise,
 	TP_ARGS(mask, reason),
 
 	TP_STRUCT__entry(
-		__bitmask(target_cpus, nr_cpumask_bits)
+		__cpumask(target_cpus)
 		__field(const char *, reason)
 	),
 
 	TP_fast_assign(
-		__assign_bitmask(target_cpus, cpumask_bits(mask), nr_cpumask_bits);
+		__assign_cpumask(target_cpus, cpumask_bits(mask));
 		__entry->reason = reason;
 	),
 
-	TP_printk("target_mask=%s (%s)", __get_bitmask(target_cpus), __entry->reason)
+	TP_printk("target_mask=%s (%s)", __get_cpumask(target_cpus), __entry->reason)
 );
 
 DECLARE_EVENT_CLASS(ipi_handler,
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related

* Re: [PATCH v5 0/3] PCI Controller event and LTSSM tracepoint support
From: Steven Rostedt @ 2026-04-06 15:08 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Shawn Lin, Bjorn Helgaas, linux-rockchip, linux-pci,
	linux-trace-kernel, linux-doc
In-Reply-To: <u2dh2os5qyuuv636uwzttvohfyics7tvqiobheftjzdnuegq33@n77svn2nlqu2>

On Sat, 4 Apr 2026 22:23:32 +0530
Manivannan Sadhasivam <mani@kernel.org> wrote:

> On Wed, Mar 25, 2026 at 09:58:29AM +0800, Shawn Lin wrote:
> > 
> > This patch-set adds new pci controller event and LTSSM tracepoint used by host drivers
> > which provide LTSSM trace functionality. The first user is pcie-dw-rockchip with a 256
> > Bytes FIFO for recording LTSSM transition.
> >   
> 
> Steve, could you please take a look at the tracing part?

I already have but didn't say anything because I didn't find anything ;-)

Anyway, for the tracing part:

Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply

* Re: [PATCH v2 12/17] landlock: Add tracepoints for ptrace and scope denials
From: Steven Rostedt @ 2026-04-06 15:01 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Christian Brauner, Günther Noack, Jann Horn, Jeff Xu,
	Justin Suess, Kees Cook, Masami Hiramatsu, Mathieu Desnoyers,
	Matthieu Buffet, Mikhail Ivanov, Tingmao Wang, kernel-team,
	linux-fsdevel, linux-security-module, linux-trace-kernel
In-Reply-To: <20260406143717.1815792-13-mic@digikod.net>

On Mon,  6 Apr 2026 16:37:10 +0200
Mickaël Salaün <mic@digikod.net> wrote:

> ---
>  include/trace/events/landlock.h | 135 ++++++++++++++++++++++++++++++++
>  security/landlock/log.c         |  20 +++++
>  2 files changed, 155 insertions(+)
> 
> diff --git a/include/trace/events/landlock.h b/include/trace/events/landlock.h
> index 1afab091efba..9f96c9897f44 100644
> --- a/include/trace/events/landlock.h
> +++ b/include/trace/events/landlock.h
> @@ -11,6 +11,7 @@
>  #define _TRACE_LANDLOCK_H
>  
>  #include <linux/tracepoint.h>
> +#include <net/af_unix.h>
>  
>  struct dentry;
>  struct landlock_domain;
> @@ -19,6 +20,7 @@ struct landlock_rule;
>  struct landlock_ruleset;
>  struct path;
>  struct sock;
> +struct task_struct;
>  
>  /**
>   * DOC: Landlock trace events
> @@ -433,6 +435,139 @@ TRACE_EVENT(
>  		__entry->log_new_exec, __entry->blockers, __entry->sport,
>  		__entry->dport));
>  
> +/**
> + * landlock_deny_ptrace - ptrace access denied
> + * @hierarchy: Hierarchy node that blocked the access (never NULL)
> + * @same_exec: Whether the current task is the same executable that called
> + *             landlock_restrict_self() for the denying hierarchy node
> + * @tracee: Target task (never NULL); eBPF can read pid, comm, cred,
> + *          namespaces, and cgroup via BTF
> + */
> +TRACE_EVENT(
> +	landlock_deny_ptrace,
> +
> +	TP_PROTO(const struct landlock_hierarchy *hierarchy, bool same_exec,
> +		 const struct task_struct *tracee),
> +
> +	TP_ARGS(hierarchy, same_exec, tracee),
> +
> +	TP_STRUCT__entry(
> +		__field(__u64, domain_id) __field(bool, same_exec)
> +			__field(u32, log_same_exec) __field(u32, log_new_exec)
> +				__field(pid_t, tracee_pid)
> +					__string(tracee_comm, tracee->comm)),

Event formats are different than normal macro formatting. Please use the
event formatting. The above is a defined structure that is being created
for use. Keep it looking like a structure:

	TP_STRUCT__entry(
		__field(	__u64,		domain_id)
		__field(	bool,		same_exec)
		__field(	u32,		log_same_exec)
		__field(	u32,		log_new_exec)
		__field(	pid_t,		tracee_pid)
		__string(	tracee_comm,	tracee->comm)
	),

See how the above resembles:

struct entry {
	__u64		domain_id;
	bool		same_exec;
	u32		log_same_exec;
	u32		log_new_exec;
	pid_t		tracee_pid;
	string		tracee_comm;
};

Because that's pretty much what the trace event TP_STRUCT__entry() is going
to do with it. (The string will obviously be something else).

This way it's also easy to spot holes in the structure that is written
into the ring buffer. The "same_exec" field, being a bool followed by two
u32 fields, is going to cause a hole. Move it to between tracee_pid and
tracee_comm.

Please fix the other events too.

-- Steve


> +
> +	TP_fast_assign(__entry->domain_id = hierarchy->id;
> +		       __entry->same_exec = same_exec;
> +		       __entry->log_same_exec = hierarchy->log_same_exec;
> +		       __entry->log_new_exec = hierarchy->log_new_exec;
> +		       __entry->tracee_pid =
> +			       task_tgid_nr((struct task_struct *)tracee);
> +		       __assign_str(tracee_comm);),
> +
> +	TP_printk(
> +		"domain=%llx same_exec=%d log_same_exec=%u log_new_exec=%u tracee_pid=%d comm=%s",
> +		__entry->domain_id, __entry->same_exec, __entry->log_same_exec,
> +		__entry->log_new_exec, __entry->tracee_pid,
> +		__print_untrusted_str(tracee_comm)));
> +
>

^ permalink raw reply

* [PATCH v2 15/17] selftests/landlock: Add network tracepoint tests
From: Mickaël Salaün @ 2026-04-06 14:37 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Steven Rostedt
  Cc: Mickaël Salaün, Jann Horn, Jeff Xu, Justin Suess,
	Kees Cook, Masami Hiramatsu, Mathieu Desnoyers, Matthieu Buffet,
	Mikhail Ivanov, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-security-module, linux-trace-kernel
In-Reply-To: <20260406143717.1815792-1-mic@digikod.net>

Add trace tests for the landlock_deny_access_net tracepoint: denied
bind, allowed bind (no event), denied connect, bind field verification,
connect-after-bind field verification, and an unsandboxed baseline.

Cc: Günther Noack <gnoack@google.com>
Cc: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
- New patch.
---
 tools/testing/selftests/landlock/net_test.c | 547 +++++++++++++++++++-
 1 file changed, 546 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
index 4c528154ea92..4fe41425995c 100644
--- a/tools/testing/selftests/landlock/net_test.c
+++ b/tools/testing/selftests/landlock/net_test.c
@@ -10,11 +10,12 @@
 #include <arpa/inet.h>
 #include <errno.h>
 #include <fcntl.h>
-#include <linux/landlock.h>
 #include <linux/in.h>
+#include <linux/landlock.h>
 #include <sched.h>
 #include <stdint.h>
 #include <string.h>
+#include <sys/mount.h>
 #include <sys/prctl.h>
 #include <sys/socket.h>
 #include <sys/syscall.h>
@@ -22,6 +23,9 @@
 
 #include "audit.h"
 #include "common.h"
+#include "trace.h"
+
+#define TRACE_TASK "net_test"
 
 const short sock_port_start = (1 << 10);
 
@@ -2026,4 +2030,545 @@ TEST_F(audit, connect)
 	EXPECT_EQ(0, close(sock_fd));
 }
 
+/* Trace tests */
+
+/* clang-format off */
+FIXTURE(trace_net) {
+	/* clang-format on */
+	int tracefs_ok;
+};
+
+FIXTURE_SETUP(trace_net)
+{
+	int ret;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, unshare(CLONE_NEWNS));
+	ASSERT_EQ(0, mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL));
+
+	ret = tracefs_fixture_setup();
+	if (ret) {
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+		self->tracefs_ok = 0;
+		SKIP(return, "tracefs not available");
+	}
+	self->tracefs_ok = 1;
+
+	ASSERT_EQ(0,
+		  tracefs_enable_event(TRACEFS_DENY_ACCESS_NET_ENABLE, true));
+	ASSERT_EQ(0, tracefs_clear());
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+FIXTURE_TEARDOWN(trace_net)
+{
+	if (!self->tracefs_ok)
+		return;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	tracefs_enable_event(TRACEFS_DENY_ACCESS_NET_ENABLE, false);
+	tracefs_fixture_teardown();
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * Baseline: verifies that without Landlock, the bind succeeds and no
+ * deny_access_net trace event fires.
+ */
+/* clang-format off */
+FIXTURE_VARIANT(trace_net)
+{
+	/* clang-format on */
+	bool sandbox;
+	int bind_port_offset; /* 0 = allowed port, 1 = denied port */
+	int expect_denied;
+};
+
+/* Unsandboxed: no Landlock, bind should succeed with no events. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(trace_net, unsandboxed) {
+	/* clang-format on */
+	.sandbox = false,
+	.bind_port_offset = 0,
+	.expect_denied = 0,
+};
+
+/* Denied: sandboxed, bind to port not in ruleset. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(trace_net, bind_denied) {
+	/* clang-format on */
+	.sandbox = true,
+	.bind_port_offset = 1,
+	.expect_denied = 1,
+};
+
+/* Allowed: sandboxed, bind to port in ruleset. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(trace_net, bind_allowed) {
+	/* clang-format on */
+	.sandbox = true,
+	.bind_port_offset = 0,
+	.expect_denied = 0,
+};
+
+TEST_F(trace_net, deny_access_net_bind)
+{
+	char *buf;
+	int count, status;
+	pid_t child;
+
+	if (!self->tracefs_ok)
+		SKIP(return, "tracefs not available");
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	child = fork();
+	ASSERT_LE(0, child);
+
+	if (child == 0) {
+		struct sockaddr_in addr = {
+			.sin_family = AF_INET,
+			.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+		};
+		int sock_fd;
+
+		if (variant->sandbox) {
+			struct landlock_ruleset_attr ruleset_attr = {
+				.handled_access_net =
+					LANDLOCK_ACCESS_NET_BIND_TCP,
+			};
+			struct landlock_net_port_attr port_attr = {
+				.allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP,
+				.port = sock_port_start,
+			};
+			int ruleset_fd;
+
+			ruleset_fd = landlock_create_ruleset(
+				&ruleset_attr, sizeof(ruleset_attr), 0);
+			if (ruleset_fd < 0)
+				_exit(1);
+
+			if (landlock_add_rule(ruleset_fd,
+					      LANDLOCK_RULE_NET_PORT,
+					      &port_attr, 0)) {
+				close(ruleset_fd);
+				_exit(1);
+			}
+
+			prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+			if (landlock_restrict_self(ruleset_fd, 0)) {
+				close(ruleset_fd);
+				_exit(1);
+			}
+			close(ruleset_fd);
+		}
+
+		sock_fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
+		if (sock_fd < 0)
+			_exit(1);
+
+		addr.sin_port =
+			htons(sock_port_start + variant->bind_port_offset);
+		if (variant->expect_denied) {
+			/* Bind should be denied. */
+			if (bind(sock_fd, (struct sockaddr *)&addr,
+				 sizeof(addr)) == 0) {
+				close(sock_fd);
+				_exit(2);
+			}
+			if (errno != EACCES) {
+				close(sock_fd);
+				_exit(3);
+			}
+		} else {
+			/* Bind should succeed. */
+			if (bind(sock_fd, (struct sockaddr *)&addr,
+				 sizeof(addr))) {
+				close(sock_fd);
+				_exit(2);
+			}
+		}
+		close(sock_fd);
+		_exit(0);
+	}
+
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	count = tracefs_count_matches(buf, REGEX_DENY_ACCESS_NET(TRACE_TASK));
+	if (variant->expect_denied) {
+		EXPECT_LE(variant->expect_denied, count)
+		{
+			TH_LOG("Expected deny_access_net event, got %d\n%s",
+			       count, buf);
+		}
+	} else {
+		EXPECT_EQ(0, count)
+		{
+			TH_LOG("Expected 0 deny_access_net events, "
+			       "got %d\n%s",
+			       count, buf);
+		}
+	}
+
+	free(buf);
+}
+
+/* Connect and field-check tests use a separate fixture without variants. */
+
+/* clang-format off */
+FIXTURE(trace_net_connect) {
+	/* clang-format on */
+	int tracefs_ok;
+};
+
+FIXTURE_SETUP(trace_net_connect)
+{
+	int ret;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, unshare(CLONE_NEWNS));
+	ASSERT_EQ(0, mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL));
+
+	ret = tracefs_fixture_setup();
+	if (ret) {
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+		self->tracefs_ok = 0;
+		SKIP(return, "tracefs not available");
+	}
+	self->tracefs_ok = 1;
+
+	ASSERT_EQ(0,
+		  tracefs_enable_event(TRACEFS_DENY_ACCESS_NET_ENABLE, true));
+	ASSERT_EQ(0, tracefs_clear());
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+FIXTURE_TEARDOWN(trace_net_connect)
+{
+	if (!self->tracefs_ok)
+		return;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	tracefs_enable_event(TRACEFS_DENY_ACCESS_NET_ENABLE, false);
+	tracefs_fixture_teardown();
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * Verifies that a denied connect emits a deny_access_net trace event with
+ * sport=0 and dport=<denied_port>.
+ */
+TEST_F(trace_net_connect, deny_access_net_connect_denied)
+{
+	pid_t child;
+	int status;
+	char *buf;
+	char field[64], expected[16];
+
+	if (!self->tracefs_ok)
+		SKIP(return, "tracefs not available");
+
+	child = fork();
+	ASSERT_LE(0, child);
+
+	if (child == 0) {
+		struct landlock_ruleset_attr ruleset_attr = {
+			.handled_access_net = LANDLOCK_ACCESS_NET_CONNECT_TCP,
+		};
+		struct landlock_net_port_attr port_attr = {
+			.allowed_access = LANDLOCK_ACCESS_NET_CONNECT_TCP,
+			.port = sock_port_start,
+		};
+		struct sockaddr_in addr = {
+			.sin_family = AF_INET,
+			.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+		};
+		int ruleset_fd, sock_fd;
+
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
+				      &port_attr, 0)) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(ruleset_fd, 0)) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+		close(ruleset_fd);
+
+		/* Connect to denied port. */
+		sock_fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
+		if (sock_fd < 0)
+			_exit(1);
+
+		addr.sin_port = htons(sock_port_start + 1);
+		if (connect(sock_fd, (struct sockaddr *)&addr, sizeof(addr)) ==
+		    0) {
+			close(sock_fd);
+			_exit(2);
+		}
+		if (errno != EACCES) {
+			close(sock_fd);
+			_exit(3);
+		}
+		close(sock_fd);
+		_exit(0);
+	}
+
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_LE(1, tracefs_count_matches(buf,
+					   REGEX_DENY_ACCESS_NET(TRACE_TASK)));
+
+	/*
+	 * Verify dport is the denied port and sport is 0.  The port
+	 * value must be in host endianness, matching the UAPI convention
+	 * (landlock_net_port_attr.port).  On little-endian,
+	 * htons(sock_port_start + 1) would produce a different decimal
+	 * value, so this comparison also catches byte-order bugs.
+	 */
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_NET(TRACE_TASK),
+					"sport", field, sizeof(field)));
+	EXPECT_STREQ("0", field);
+
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_NET(TRACE_TASK),
+					"dport", field, sizeof(field)));
+	snprintf(expected, sizeof(expected), "%llu",
+		 (unsigned long long)(sock_port_start + 1));
+	EXPECT_STREQ(expected, field);
+
+	free(buf);
+}
+
+/* Verifies that a denied bind emits sport=<port> dport=0. */
+TEST_F(trace_net_connect, deny_access_net_bind_fields)
+{
+	pid_t child;
+	int status;
+	char *buf;
+	char field[64], expected[16];
+
+	if (!self->tracefs_ok)
+		SKIP(return, "tracefs not available");
+
+	child = fork();
+	ASSERT_LE(0, child);
+
+	if (child == 0) {
+		struct landlock_ruleset_attr ruleset_attr = {
+			.handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP,
+		};
+		struct landlock_net_port_attr port_attr = {
+			.allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP,
+			.port = sock_port_start,
+		};
+		struct sockaddr_in addr = {
+			.sin_family = AF_INET,
+			.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+		};
+		int ruleset_fd, sock_fd;
+
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
+				      &port_attr, 0)) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(ruleset_fd, 0)) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+		close(ruleset_fd);
+
+		/* Bind to denied port. */
+		sock_fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
+		if (sock_fd < 0)
+			_exit(1);
+
+		addr.sin_port = htons(sock_port_start + 1);
+		if (bind(sock_fd, (struct sockaddr *)&addr, sizeof(addr)) ==
+		    0) {
+			close(sock_fd);
+			_exit(2);
+		}
+		if (errno != EACCES) {
+			close(sock_fd);
+			_exit(3);
+		}
+		close(sock_fd);
+		_exit(0);
+	}
+
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_LE(1, tracefs_count_matches(buf,
+					   REGEX_DENY_ACCESS_NET(TRACE_TASK)));
+
+	/* Verify sport is the denied port and dport is 0. */
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_NET(TRACE_TASK),
+					"dport", field, sizeof(field)));
+	EXPECT_STREQ("0", field);
+
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_NET(TRACE_TASK),
+					"sport", field, sizeof(field)));
+	snprintf(expected, sizeof(expected), "%llu",
+		 (unsigned long long)(sock_port_start + 1));
+	EXPECT_STREQ(expected, field);
+
+	free(buf);
+}
+
+/*
+ * Verifies that a denied connect after a successful bind shows sport=0 and
+ * dport=<denied_port>.  The bind succeeds (allowed port), then the connect is
+ * denied.  sport=0 because the denied operation is connect, not bind.
+ */
+TEST_F(trace_net_connect, deny_access_net_connect_after_bind)
+{
+	pid_t child;
+	int status;
+	char *buf;
+	char field[64], expected[16];
+
+	if (!self->tracefs_ok)
+		SKIP(return, "tracefs not available");
+
+	child = fork();
+	ASSERT_LE(0, child);
+
+	if (child == 0) {
+		struct landlock_ruleset_attr ruleset_attr = {
+			.handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP |
+					      LANDLOCK_ACCESS_NET_CONNECT_TCP,
+		};
+		struct landlock_net_port_attr port_attr;
+		struct sockaddr_in bind_addr = {
+			.sin_family = AF_INET,
+			.sin_port = htons(sock_port_start),
+			.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+		};
+		struct sockaddr_in conn_addr = {
+			.sin_family = AF_INET,
+			.sin_port = htons(sock_port_start + 1),
+			.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+		};
+		int ruleset_fd, sock_fd, optval = 1;
+
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		/* Allow bind and connect on sock_port_start only. */
+		port_attr.allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP |
+					   LANDLOCK_ACCESS_NET_CONNECT_TCP;
+		port_attr.port = sock_port_start;
+		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
+				      &port_attr, 0)) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(ruleset_fd, 0)) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+		close(ruleset_fd);
+
+		sock_fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
+		if (sock_fd < 0)
+			_exit(1);
+		setsockopt(sock_fd, SOL_SOCKET, SO_REUSEADDR, &optval,
+			   sizeof(optval));
+
+		/* Bind to allowed port (succeeds, no trace event). */
+		if (bind(sock_fd, (struct sockaddr *)&bind_addr,
+			 sizeof(bind_addr))) {
+			close(sock_fd);
+			_exit(1);
+		}
+
+		/* Connect to denied port (fails, emits trace event). */
+		if (connect(sock_fd, (struct sockaddr *)&conn_addr,
+			    sizeof(conn_addr)) == 0) {
+			close(sock_fd);
+			_exit(2);
+		}
+		if (errno != EACCES) {
+			close(sock_fd);
+			_exit(3);
+		}
+		close(sock_fd);
+		_exit(0);
+	}
+
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_LE(1, tracefs_count_matches(buf,
+					   REGEX_DENY_ACCESS_NET(TRACE_TASK)));
+
+	/*
+	 * The denied operation is connect, so sport=0 and dport=<denied_port>,
+	 * regardless of the prior bind.
+	 */
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_NET(TRACE_TASK),
+					"sport", field, sizeof(field)));
+	EXPECT_STREQ("0", field);
+
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_NET(TRACE_TASK),
+					"dport", field, sizeof(field)));
+	snprintf(expected, sizeof(expected), "%llu",
+		 (unsigned long long)(sock_port_start + 1));
+	EXPECT_STREQ(expected, field);
+
+	free(buf);
+}
+
+/*
+ * IPv6 network trace tests are intentionally elided.  IPv6 hook dispatch uses
+ * the same current_check_access_socket() code path as IPv4, validated by the
+ * audit tests in this file.  The trace events use the same blockers/sport/dport
+ * fields regardless of address family.
+ */
+
 TEST_HARNESS_MAIN
-- 
2.53.0



* [PATCH v2 13/17] selftests/landlock: Add trace event test infrastructure and tests
From: Mickaël Salaün @ 2026-04-06 14:37 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Steven Rostedt
  Cc: Mickaël Salaün, Jann Horn, Jeff Xu, Justin Suess,
	Kees Cook, Masami Hiramatsu, Mathieu Desnoyers, Matthieu Buffet,
	Mikhail Ivanov, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-security-module, linux-trace-kernel
In-Reply-To: <20260406143717.1815792-1-mic@digikod.net>

Add tracefs test infrastructure in trace.h: helpers for mounting
tracefs, enabling/disabling events, reading the trace buffer, counting
regex matches, and extracting field values.  Add per-event regex
patterns for matching trace lines.

The TRACE_PREFIX macro matches the ftrace trace-file line format with
either the expected task name (truncated to TASK_COMM_LEN - 1) or
"<...>" (for evicted comm cache entries).  All regex patterns are
anchored with ^ and $, verify every TP_printk field, and use no
unescaped dot characters.
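
As an illustration of the anchoring described above, here is a minimal
sketch of how such a prefix pattern can be checked against one trace line.
EXAMPLE_PREFIX and example_line_matches() are hypothetical names used only
for this example; the pattern relies on the "\+" and "\|" operators, which
are GNU extensions to POSIX basic regular expressions, so glibc is assumed.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a prefix pattern with the task name
 * "trace_test".  Accepts either the task name or "<...>", followed by
 * "-<pid> [<cpu>] <flags> <timestamp>: ".
 */
#define EXAMPLE_PREFIX                      \
	"^ *\\(<\\.\\.\\.>\\|trace_test\\)" \
	"-[0-9]\\+ *\\[[0-9]\\+\\] [^ ]\\+ \\+[0-9]\\+\\.[0-9]\\+: "

/* Returns true if @line matches the basic regular expression @pattern. */
static bool example_line_matches(const char *line, const char *pattern)
{
	regex_t regex;
	bool matched;

	assert(line && pattern);

	/* cflags == 0 selects basic (not extended) regular expressions. */
	if (regcomp(&regex, pattern, 0) != 0)
		return false;

	matched = regexec(&regex, line, 0, NULL, 0) == 0;
	regfree(&regex);
	return matched;
}
```

Because the pattern starts with ^, a line emitted by a differently named
task does not match, even if "trace_test" appears later in the line.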

Extend the existing true helper to open its working directory before
exiting, which triggers a read_dir denial when executed inside a
sandbox.  The exec-based tests use this to verify same_exec=0 and log
flag behavior after exec.

Add trace_test.c with the trace fixture (setup enables all available
events with a PID filter, teardown disables and clears) and lifecycle
and API tests: no_trace_when_disabled, create_ruleset, ruleset_version,
restrict_self, restrict_self_nested, restrict_self_invalid,
add_rule_invalid_fd, add_rule_net_fields, free_domain,
free_ruleset_on_close.

Add denial field and log flag tests: deny_access_fs_fields,
same_exec_before_exec, same_exec_after_exec, log_flags_same_exec_off,
log_flags_new_exec_on, log_flags_subdomains_off,
non_audit_visible_denial_counting.

Move regex_escape() from audit.h to common.h for shared use by both
audit and trace tests.
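
To show why escaping matters when a literal string (a path, for instance)
is embedded in a pattern, here is a small sketch of a cursor-style escaping
helper.  example_escape() is a hypothetical, simplified variant written for
this illustration, not the regex_escape() moved by this patch: it returns
NULL when the buffer is too small, whereas the helper in the diff reports
that case through an error-encoded pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies @src to @dst, prefixing each BRE
 * metacharacter ($ * . [ \ ] ^) with a backslash.  Returns a pointer to
 * the terminating NUL in @dst so calls can be chained, or NULL if @dst is
 * too small.
 */
static char *example_escape(const char *src, char *dst, size_t dst_size)
{
	char *d = dst;
	char *last;

	assert(src && dst);
	if (dst_size == 0)
		return NULL;

	/* Last usable slot, reserved for the terminating NUL. */
	last = dst + dst_size - 1;

	for (; *src; src++) {
		if (strchr("$*.[\\]^", *src)) {
			if (d >= last)
				return NULL;
			*d++ = '\\';
		}
		if (d >= last)
			return NULL;
		*d++ = *src;
	}
	*d = '\0';
	return d;
}
```

Returning the end cursor lets a caller append further pattern fragments
directly after the escaped text without calling strlen() again.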

Enable CONFIG_FTRACE_SYSCALLS alongside CONFIG_FTRACE in the selftest
config because CONFIG_FTRACE alone only enables the tracer menu without
activating any tracer.  CONFIG_FTRACE_SYSCALLS is the lightest tracer
option that selects GENERIC_TRACER, TRACING, and TRACEPOINTS, which are
required for tracefs and Landlock trace events.  Both UML and x86_64
provide the required HAVE_SYSCALL_TRACEPOINTS.  When CONFIG_FTRACE is
disabled, CONFIG_FTRACE_SYSCALLS is gated by the FTRACE menu and cannot
be set, so TRACEPOINTS is correctly disabled.

Cc: Günther Noack <gnoack@google.com>
Cc: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
- New patch.
---
 tools/testing/selftests/landlock/audit.h      |   35 +-
 tools/testing/selftests/landlock/common.h     |   47 +
 tools/testing/selftests/landlock/config       |    2 +
 tools/testing/selftests/landlock/trace.h      |  640 +++++++++
 tools/testing/selftests/landlock/trace_test.c | 1168 +++++++++++++++++
 tools/testing/selftests/landlock/true.c       |   10 +
 6 files changed, 1868 insertions(+), 34 deletions(-)
 create mode 100644 tools/testing/selftests/landlock/trace.h
 create mode 100644 tools/testing/selftests/landlock/trace_test.c

diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
index 834005b2b0f0..84bb8f34bc83 100644
--- a/tools/testing/selftests/landlock/audit.h
+++ b/tools/testing/selftests/landlock/audit.h
@@ -206,40 +206,7 @@ static int audit_set_status(int fd, __u32 key, __u32 val)
 	return audit_request(fd, &msg, NULL);
 }
 
-/* Returns a pointer to the last filled character of @dst, which is `\0`.  */
-static __maybe_unused char *regex_escape(const char *const src, char *dst,
-					 size_t dst_size)
-{
-	char *d = dst;
-
-	for (const char *s = src; *s; s++) {
-		switch (*s) {
-		case '$':
-		case '*':
-		case '.':
-		case '[':
-		case '\\':
-		case ']':
-		case '^':
-			if (d >= dst + dst_size - 2)
-				return (char *)-ENOMEM;
-
-			*d++ = '\\';
-			*d++ = *s;
-			break;
-		default:
-			if (d >= dst + dst_size - 1)
-				return (char *)-ENOMEM;
-
-			*d++ = *s;
-		}
-	}
-	if (d >= dst + dst_size - 1)
-		return (char *)-ENOMEM;
-
-	*d = '\0';
-	return d;
-}
+/* regex_escape() is defined in common.h */
 
 /*
  * @domain_id: The domain ID extracted from the audit message (if the first part
diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/selftests/landlock/common.h
index 90551650299c..dfc0df543e56 100644
--- a/tools/testing/selftests/landlock/common.h
+++ b/tools/testing/selftests/landlock/common.h
@@ -251,3 +251,50 @@ static void __maybe_unused set_unix_address(struct service_fixture *const srv,
 	srv->unix_addr_len = SUN_LEN(&srv->unix_addr);
 	srv->unix_addr.sun_path[0] = '\0';
 }
+
+/**
+ * regex_escape - Escape BRE metacharacters in a string
+ *
+ * @src: Source string to escape.
+ * @dst: Destination buffer for the escaped string.
+ * @dst_size: Size of the destination buffer.
+ *
+ * Escapes characters that have special meaning in POSIX Basic Regular
+ * Expressions: $ * . [ \ ] ^
+ *
+ * Returns a pointer to the NUL terminator in @dst (cursor-style API for
+ * chaining), or (char *)-ENOMEM if the buffer is too small.
+ */
+static __maybe_unused char *regex_escape(const char *const src, char *dst,
+					 size_t dst_size)
+{
+	char *d = dst;
+
+	for (const char *s = src; *s; s++) {
+		switch (*s) {
+		case '$':
+		case '*':
+		case '.':
+		case '[':
+		case '\\':
+		case ']':
+		case '^':
+			if (d >= dst + dst_size - 2)
+				return (char *)-ENOMEM;
+
+			*d++ = '\\';
+			*d++ = *s;
+			break;
+		default:
+			if (d >= dst + dst_size - 1)
+				return (char *)-ENOMEM;
+
+			*d++ = *s;
+		}
+	}
+	if (d >= dst + dst_size - 1)
+		return (char *)-ENOMEM;
+
+	*d = '\0';
+	return d;
+}
diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
index 8fe9b461b1fd..acfa31670c44 100644
--- a/tools/testing/selftests/landlock/config
+++ b/tools/testing/selftests/landlock/config
@@ -2,6 +2,8 @@ CONFIG_AF_UNIX_OOB=y
 CONFIG_AUDIT=y
 CONFIG_CGROUPS=y
 CONFIG_CGROUP_SCHED=y
+CONFIG_FTRACE=y
+CONFIG_FTRACE_SYSCALLS=y
 CONFIG_INET=y
 CONFIG_IPV6=y
 CONFIG_KEYS=y
diff --git a/tools/testing/selftests/landlock/trace.h b/tools/testing/selftests/landlock/trace.h
new file mode 100644
index 000000000000..d8a4eb0906f0
--- /dev/null
+++ b/tools/testing/selftests/landlock/trace.h
@@ -0,0 +1,640 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Landlock trace test helpers
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <regex.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "kselftest_harness.h"
+
+#define TRACEFS_ROOT "/sys/kernel/tracing"
+#define TRACEFS_LANDLOCK_DIR TRACEFS_ROOT "/events/landlock"
+#define TRACEFS_CREATE_RULESET_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_create_ruleset/enable"
+#define TRACEFS_RESTRICT_SELF_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_restrict_self/enable"
+#define TRACEFS_ADD_RULE_FS_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_add_rule_fs/enable"
+#define TRACEFS_ADD_RULE_NET_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_add_rule_net/enable"
+#define TRACEFS_CHECK_RULE_FS_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_check_rule_fs/enable"
+#define TRACEFS_CHECK_RULE_NET_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_check_rule_net/enable"
+#define TRACEFS_DENY_ACCESS_FS_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_deny_access_fs/enable"
+#define TRACEFS_DENY_ACCESS_NET_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_deny_access_net/enable"
+#define TRACEFS_DENY_PTRACE_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_deny_ptrace/enable"
+#define TRACEFS_DENY_SCOPE_SIGNAL_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_deny_scope_signal/enable"
+#define TRACEFS_DENY_SCOPE_ABSTRACT_UNIX_SOCKET_ENABLE \
+	TRACEFS_LANDLOCK_DIR                           \
+	"/landlock_deny_scope_abstract_unix_socket/enable"
+#define TRACEFS_FREE_DOMAIN_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_free_domain/enable"
+#define TRACEFS_FREE_RULESET_ENABLE \
+	TRACEFS_LANDLOCK_DIR "/landlock_free_ruleset/enable"
+#define TRACEFS_TRACE TRACEFS_ROOT "/trace"
+#define TRACEFS_SET_EVENT_PID TRACEFS_ROOT "/set_event_pid"
+#define TRACEFS_OPTIONS_EVENT_FORK TRACEFS_ROOT "/options/event-fork"
+
+#define TRACE_BUFFER_SIZE (64 * 1024)
+
+/*
+ * Trace line prefix: matches the ftrace "trace" file format.  Format:
+ * "<task>-<pid> [<cpu>] <flags> <timestamp>: "
+ *
+ * The task parameter must be a string literal truncated to 15 chars
+ * (TASK_COMM_LEN - 1), matching what the kernel stores in task->comm.  The
+ * pattern accepts either the expected task name or "<...>" because the ftrace
+ * comm cache may evict short-lived processes (e.g., forked children that exit
+ * before the trace buffer is read).
+ *
+ * No unescaped '.' in any REGEX macro; literal dots use '\\.'.
+ */
+/* clang-format off */
+#define TRACE_PREFIX(task)                                                     \
+	"^ *\\(<\\.\\.\\.>"                                                    \
+	"\\|" task "\\)"                                                       \
+	"-[0-9]\\+ *\\[[0-9]\\+\\] [^ ]\\+ \\+[0-9]\\+\\.[0-9]\\+: "
+
+/*
+ * Task name for events emitted by kworker threads (e.g., free_domain fires from
+ * a work queue, not from the test process).
+ */
+#define KWORKER_TASK "kworker/[0-9]\\+:[0-9]\\+"
+
+#define REGEX_ADD_RULE_FS(task)            \
+	TRACE_PREFIX(task)                 \
+	"landlock_add_rule_fs: "           \
+	"ruleset=[0-9a-f]\\+\\.[0-9]\\+ " \
+	"access_rights=0x[0-9a-f]\\+ "    \
+	"dev=[0-9]\\+:[0-9]\\+ "          \
+	"ino=[0-9]\\+ "                   \
+	"path=[^ ]\\+$"
+
+#define REGEX_ADD_RULE_NET(task)            \
+	TRACE_PREFIX(task)                  \
+	"landlock_add_rule_net: "           \
+	"ruleset=[0-9a-f]\\+\\.[0-9]\\+ " \
+	"access_rights=0x[0-9a-f]\\+ "    \
+	"port=[0-9]\\+$"
+
+#define REGEX_CREATE_RULESET(task)          \
+	TRACE_PREFIX(task)                  \
+	"landlock_create_ruleset: "         \
+	"ruleset=[0-9a-f]\\+\\.[0-9]\\+ " \
+	"handled_fs=0x[0-9a-f]\\+ "       \
+	"handled_net=0x[0-9a-f]\\+ "      \
+	"scoped=0x[0-9a-f]\\+$"
+
+#define REGEX_RESTRICT_SELF(task)           \
+	TRACE_PREFIX(task)                  \
+	"landlock_restrict_self: "          \
+	"ruleset=[0-9a-f]\\+\\.[0-9]\\+ " \
+	"domain=[0-9a-f]\\+ "             \
+	"parent=[0-9a-f]\\+$"
+
+#define REGEX_CHECK_RULE_FS(task)    \
+	TRACE_PREFIX(task)           \
+	"landlock_check_rule_fs: "   \
+	"domain=[0-9a-f]\\+ "       \
+	"request=0x[0-9a-f]\\+ "    \
+	"dev=[0-9]\\+:[0-9]\\+ "    \
+	"ino=[0-9]\\+ "             \
+	"allowed={[0-9a-fx, ]*}$"
+
+#define REGEX_CHECK_RULE_NET(task)    \
+	TRACE_PREFIX(task)            \
+	"landlock_check_rule_net: "   \
+	"domain=[0-9a-f]\\+ "        \
+	"request=0x[0-9a-f]\\+ "     \
+	"port=[0-9]\\+ "             \
+	"allowed={[0-9a-fx, ]*}$"
+
+#define REGEX_DENY_ACCESS_FS(task)    \
+	TRACE_PREFIX(task)              \
+	"landlock_deny_access_fs: "    \
+	"domain=[0-9a-f]\\+ "          \
+	"same_exec=[01] "              \
+	"log_same_exec=[01] "          \
+	"log_new_exec=[01] "           \
+	"blockers=0x[0-9a-f]\\+ "      \
+	"dev=[0-9]\\+:[0-9]\\+ "       \
+	"ino=[0-9]\\+ "                \
+	"path=[^ ]*$"
+
+#define REGEX_DENY_ACCESS_NET(task)    \
+	TRACE_PREFIX(task)              \
+	"landlock_deny_access_net: "   \
+	"domain=[0-9a-f]\\+ "          \
+	"same_exec=[01] "              \
+	"log_same_exec=[01] "          \
+	"log_new_exec=[01] "           \
+	"blockers=0x[0-9a-f]\\+ "      \
+	"sport=[0-9]\\+ "              \
+	"dport=[0-9]\\+$"
+
+#define REGEX_DENY_PTRACE(task)    \
+	TRACE_PREFIX(task)           \
+	"landlock_deny_ptrace: "    \
+	"domain=[0-9a-f]\\+ "       \
+	"same_exec=[01] "           \
+	"log_same_exec=[01] "       \
+	"log_new_exec=[01] "        \
+	"tracee_pid=[0-9]\\+ "      \
+	"comm=[^ ]*$"
+
+#define REGEX_DENY_SCOPE_SIGNAL(task)        \
+	TRACE_PREFIX(task)                    \
+	"landlock_deny_scope_signal: "       \
+	"domain=[0-9a-f]\\+ "                \
+	"same_exec=[01] "                    \
+	"log_same_exec=[01] "                \
+	"log_new_exec=[01] "                 \
+	"target_pid=[0-9]\\+ "               \
+	"comm=[^ ]*$"
+
+#define REGEX_DENY_SCOPE_ABSTRACT_UNIX_SOCKET(task) \
+	TRACE_PREFIX(task)                           \
+	"landlock_deny_scope_abstract_unix_socket: " \
+	"domain=[0-9a-f]\\+ "                       \
+	"same_exec=[01] "                           \
+	"log_same_exec=[01] "                       \
+	"log_new_exec=[01] "                        \
+	"peer_pid=[0-9]\\+ "                        \
+	"sun_path=[^ ]*$"
+
+#define REGEX_FREE_DOMAIN(task)    \
+	TRACE_PREFIX(task)         \
+	"landlock_free_domain: "   \
+	"domain=[0-9a-f]\\+ "     \
+	"denials=[0-9]\\+$"
+
+#define REGEX_FREE_RULESET(task)    \
+	TRACE_PREFIX(task)          \
+	"landlock_free_ruleset: "   \
+	"ruleset=[0-9a-f]\\+\\.[0-9]\\+$"
+/* clang-format on */
+
+static int __maybe_unused tracefs_write(const char *path, const char *value)
+{
+	int fd;
+	ssize_t ret;
+	size_t len = strlen(value);
+
+	fd = open(path, O_WRONLY | O_TRUNC | O_CLOEXEC);
+	if (fd < 0)
+		return -errno;
+
+	ret = write(fd, value, len);
+	close(fd);
+	if (ret < 0)
+		return -errno;
+	if ((size_t)ret != len)
+		return -EIO;
+
+	return 0;
+}
+
+static int __maybe_unused tracefs_write_int(const char *path, int value)
+{
+	char buf[32];
+
+	snprintf(buf, sizeof(buf), "%d", value);
+	return tracefs_write(path, buf);
+}
+
+static int __maybe_unused tracefs_setup(void)
+{
+	struct stat st;
+
+	/* Mount tracefs if not already mounted. */
+	if (stat(TRACEFS_ROOT, &st) != 0) {
+		int ret = mount("tracefs", TRACEFS_ROOT, "tracefs", 0, NULL);
+
+		if (ret)
+			return -errno;
+	}
+
+	/* Verify landlock events are available. */
+	if (stat(TRACEFS_LANDLOCK_DIR, &st) != 0)
+		return -ENOENT;
+
+	return 0;
+}
+
+/*
+ * Set up PID-based event filtering so only events from the current process and
+ * its children are recorded.  This is analogous to audit's AUDIT_EXE filter: it
+ * prevents events from unrelated processes from polluting the trace buffer.
+ */
+static int __maybe_unused tracefs_set_pid_filter(pid_t pid)
+{
+	int ret;
+
+	/* Enable event-fork so children inherit the PID filter. */
+	ret = tracefs_write(TRACEFS_OPTIONS_EVENT_FORK, "1");
+	if (ret)
+		return ret;
+
+	return tracefs_write_int(TRACEFS_SET_EVENT_PID, pid);
+}
+
+/* Clear the PID filter to stop filtering by PID. */
+static int __maybe_unused tracefs_clear_pid_filter(void)
+{
+	return tracefs_write(TRACEFS_SET_EVENT_PID, "");
+}
+
+static int __maybe_unused tracefs_enable_event(const char *enable_path,
+					       bool enable)
+{
+	return tracefs_write(enable_path, enable ? "1" : "0");
+}
+
+static int __maybe_unused tracefs_clear(void)
+{
+	return tracefs_write(TRACEFS_TRACE, "");
+}
+
+/*
+ * Reads the trace buffer content into a newly allocated buffer.  The caller is
+ * responsible for freeing the returned buffer.  Returns NULL on error.
+ */
+static char __maybe_unused *tracefs_read_trace(void)
+{
+	char *buf;
+	int fd;
+	ssize_t total = 0, ret;
+
+	buf = malloc(TRACE_BUFFER_SIZE);
+	if (!buf)
+		return NULL;
+
+	fd = open(TRACEFS_TRACE, O_RDONLY | O_CLOEXEC);
+	if (fd < 0) {
+		free(buf);
+		return NULL;
+	}
+
+	while (total < TRACE_BUFFER_SIZE - 1) {
+		ret = read(fd, buf + total, TRACE_BUFFER_SIZE - 1 - total);
+		if (ret <= 0)
+			break;
+		total += ret;
+	}
+	close(fd);
+	buf[total] = '\0';
+	return buf;
+}
+
+/* Counts the number of lines in @buf matching the basic regex @pattern. */
+static int __maybe_unused tracefs_count_matches(const char *buf,
+						const char *pattern)
+{
+	regex_t regex;
+	int count = 0;
+	const char *line, *end;
+
+	if (regcomp(&regex, pattern, 0) != 0)
+		return -EINVAL;
+
+	line = buf;
+	while (*line) {
+		end = strchr(line, '\n');
+		if (!end)
+			end = line + strlen(line);
+
+		/* Create a temporary null-terminated line. */
+		size_t len = end - line;
+		char *tmp = malloc(len + 1);
+
+		if (tmp) {
+			memcpy(tmp, line, len);
+			tmp[len] = '\0';
+			if (regexec(&regex, tmp, 0, NULL, 0) == 0)
+				count++;
+			free(tmp);
+		}
+
+		if (*end == '\n')
+			line = end + 1;
+		else
+			break;
+	}
+
+	regfree(&regex);
+	return count;
+}
+
+/*
+ * Extracts the value of a named field from a trace line in @buf.  Searches for
+ * the first line matching @line_pattern, then extracts the value after
+ * "@field_name=" into @out.  Stops at space or newline.
+ *
+ * Returns 0 on success, -ENOENT if no match.
+ */
+static int __maybe_unused tracefs_extract_field(const char *buf,
+						const char *line_pattern,
+						const char *field_name,
+						char *out, size_t out_size)
+{
+	regex_t regex;
+	const char *line, *end;
+
+	if (regcomp(&regex, line_pattern, 0) != 0)
+		return -EINVAL;
+
+	line = buf;
+	while (*line) {
+		end = strchr(line, '\n');
+		if (!end)
+			end = line + strlen(line);
+
+		size_t len = end - line;
+		char *tmp = malloc(len + 1);
+
+		if (tmp) {
+			const char *field, *val_start;
+			size_t field_len, val_len;
+
+			memcpy(tmp, line, len);
+			tmp[len] = '\0';
+
+			if (regexec(&regex, tmp, 0, NULL, 0) != 0) {
+				free(tmp);
+				goto next;
+			}
+
+			/*
+			 * Find "field_name=" in the line, ensuring a word
+			 * boundary before the field name to avoid substring
+			 * matches (e.g., "port" in "sport").
+			 */
+			field_len = strlen(field_name);
+			field = tmp;
+			while ((field = strstr(field, field_name))) {
+				if (field[field_len] == '=' &&
+				    (field == tmp || field[-1] == ' '))
+					break;
+				field++;
+			}
+			if (!field) {
+				free(tmp);
+				regfree(&regex);
+				return -ENOENT;
+			}
+
+			val_start = field + field_len + 1;
+			val_len = 0;
+			while (val_start[val_len] &&
+			       val_start[val_len] != ' ' &&
+			       val_start[val_len] != '\n')
+				val_len++;
+
+			if (val_len >= out_size)
+				val_len = out_size - 1;
+			memcpy(out, val_start, val_len);
+			out[val_len] = '\0';
+
+			free(tmp);
+			regfree(&regex);
+			return 0;
+		}
+next:
+		if (*end == '\n')
+			line = end + 1;
+		else
+			break;
+	}
+
+	regfree(&regex);
+	return -ENOENT;
+}
+
+/*
+ * Common fixture setup for trace tests.  Mounts tracefs if needed and
+ * sets a PID filter.  The caller must create a mount namespace first
+ * (unshare(CLONE_NEWNS) + mount(MS_REC | MS_PRIVATE)) to isolate
+ * tracefs state.
+ *
+ * Returns 0 on success, -errno on failure (caller should SKIP).
+ */
+static int __maybe_unused tracefs_fixture_setup(void)
+{
+	int ret;
+
+	ret = tracefs_setup();
+	if (ret)
+		return ret;
+
+	return tracefs_set_pid_filter(getpid());
+}
+
+static void __maybe_unused tracefs_fixture_teardown(void)
+{
+	tracefs_clear_pid_filter();
+}
+
+/*
+ * Temporarily raises CAP_SYS_ADMIN effective capability, calls @func, then
+ * drops the capability.  Returns the value from @func, or -EPERM if the
+ * capability manipulation fails.
+ */
+static int __maybe_unused tracefs_priv_call(int (*func)(void))
+{
+	const cap_value_t admin = CAP_SYS_ADMIN;
+	cap_t cap_p;
+	int ret;
+
+	cap_p = cap_get_proc();
+	if (!cap_p)
+		return -EPERM;
+
+	if (cap_set_flag(cap_p, CAP_EFFECTIVE, 1, &admin, CAP_SET) ||
+	    cap_set_proc(cap_p)) {
+		cap_free(cap_p);
+		return -EPERM;
+	}
+
+	ret = func();
+
+	cap_set_flag(cap_p, CAP_EFFECTIVE, 1, &admin, CAP_CLEAR);
+	cap_set_proc(cap_p);
+	cap_free(cap_p);
+	return ret;
+}
+
+/* Read the trace buffer with elevated privileges.  Returns NULL on failure. */
+static char __maybe_unused *tracefs_read_buf(void)
+{
+	/* Cannot use tracefs_priv_call() because the return type is char *. */
+	cap_t cap_p;
+	char *buf;
+	const cap_value_t admin = CAP_SYS_ADMIN;
+
+	cap_p = cap_get_proc();
+	if (!cap_p)
+		return NULL;
+
+	if (cap_set_flag(cap_p, CAP_EFFECTIVE, 1, &admin, CAP_SET) ||
+	    cap_set_proc(cap_p)) {
+		cap_free(cap_p);
+		return NULL;
+	}
+
+	buf = tracefs_read_trace();
+
+	cap_set_flag(cap_p, CAP_EFFECTIVE, 1, &admin, CAP_CLEAR);
+	cap_set_proc(cap_p);
+	cap_free(cap_p);
+	return buf;
+}
+
+/* Clear the trace buffer with elevated privileges.  Returns 0 on success. */
+static int __maybe_unused tracefs_clear_buf(void)
+{
+	return tracefs_priv_call(tracefs_clear);
+}
+
+/*
+ * Forks a child that creates a Landlock sandbox and performs an FS access.  The
+ * parent waits for the child, then reads the trace buffer.
+ *
+ * Requires common.h and wrappers.h to be included before trace.h.
+ */
+static void __maybe_unused sandbox_child_fs_access(
+	struct __test_metadata *const _metadata, const char *rule_path,
+	__u64 handled_access, __u64 allowed_access, const char *access_path)
+{
+	pid_t pid;
+	int status;
+
+	pid = fork();
+	ASSERT_LE(0, pid);
+
+	if (pid == 0) {
+		struct landlock_ruleset_attr ruleset_attr = {
+			.handled_access_fs = handled_access,
+		};
+		struct landlock_path_beneath_attr path_beneath = {
+			.allowed_access = allowed_access,
+		};
+		int ruleset_fd, fd;
+
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		path_beneath.parent_fd =
+			open(rule_path, O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd < 0) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+
+		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				      &path_beneath, 0)) {
+			close(path_beneath.parent_fd);
+			close(ruleset_fd);
+			_exit(1);
+		}
+		close(path_beneath.parent_fd);
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(ruleset_fd, 0)) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+		close(ruleset_fd);
+
+		fd = open(access_path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+		if (fd >= 0)
+			close(fd);
+
+		_exit(0);
+	}
+
+	ASSERT_EQ(pid, waitpid(pid, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
+/*
+ * Forks a child that creates a Landlock sandbox allowing execute+read_dir for
+ * /usr and execute-only for ".", then execs ./true.  The true binary opens "."
+ * on startup, triggering a read_dir denial with same_exec=0.  The parent waits
+ * for the child to exit.
+ */
+static void __maybe_unused sandbox_child_exec_true(
+	struct __test_metadata *const _metadata, __u32 restrict_flags)
+{
+	pid_t pid;
+	int status;
+
+	pid = fork();
+	ASSERT_LE(0, pid);
+
+	if (pid == 0) {
+		struct landlock_ruleset_attr attr = {
+			.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR |
+					     LANDLOCK_ACCESS_FS_EXECUTE,
+		};
+		struct landlock_path_beneath_attr path_beneath = {
+			.allowed_access = LANDLOCK_ACCESS_FS_EXECUTE |
+					  LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		int ruleset_fd;
+
+		ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		path_beneath.parent_fd =
+			open("/usr", O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd >= 0) {
+			landlock_add_rule(ruleset_fd,
+					  LANDLOCK_RULE_PATH_BENEATH,
+					  &path_beneath, 0);
+			close(path_beneath.parent_fd);
+		}
+
+		path_beneath.allowed_access = LANDLOCK_ACCESS_FS_EXECUTE;
+		path_beneath.parent_fd =
+			open(".", O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd >= 0) {
+			landlock_add_rule(ruleset_fd,
+					  LANDLOCK_RULE_PATH_BENEATH,
+					  &path_beneath, 0);
+			close(path_beneath.parent_fd);
+		}
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(ruleset_fd, restrict_flags))
+			_exit(1);
+		close(ruleset_fd);
+
+		execl("./true", "./true", NULL);
+		_exit(1);
+	}
+
+	ASSERT_EQ(pid, waitpid(pid, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+}
diff --git a/tools/testing/selftests/landlock/trace_test.c b/tools/testing/selftests/landlock/trace_test.c
new file mode 100644
index 000000000000..0256383489fe
--- /dev/null
+++ b/tools/testing/selftests/landlock/trace_test.c
@@ -0,0 +1,1168 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Tracepoints
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <sched.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "common.h"
+#include "trace.h"
+
+#define TRACE_TASK "trace_test"
+
+/* clang-format off */
+FIXTURE(trace) {
+	/* clang-format on */
+	int tracefs_ok;
+};
+
+FIXTURE_SETUP(trace)
+{
+	int ret;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, unshare(CLONE_NEWNS));
+	ASSERT_EQ(0, mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL));
+
+	ret = tracefs_fixture_setup();
+	if (ret) {
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+		self->tracefs_ok = 0;
+		SKIP(return, "tracefs not available");
+	}
+	self->tracefs_ok = 1;
+
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_CREATE_RULESET_ENABLE, true));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_RESTRICT_SELF_ENABLE, true));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_ADD_RULE_FS_ENABLE, true));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_ADD_RULE_NET_ENABLE, true));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_CHECK_RULE_FS_ENABLE, true));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_CHECK_RULE_NET_ENABLE, true));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_DENY_ACCESS_FS_ENABLE, true));
+	ASSERT_EQ(0,
+		  tracefs_enable_event(TRACEFS_DENY_ACCESS_NET_ENABLE, true));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_FREE_DOMAIN_ENABLE, true));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_FREE_RULESET_ENABLE, true));
+	ASSERT_EQ(0, tracefs_clear());
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+FIXTURE_TEARDOWN(trace)
+{
+	if (!self->tracefs_ok)
+		return;
+
+	/* Disables landlock events and clears PID filter. */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	tracefs_enable_event(TRACEFS_CREATE_RULESET_ENABLE, false);
+	tracefs_enable_event(TRACEFS_RESTRICT_SELF_ENABLE, false);
+	tracefs_enable_event(TRACEFS_ADD_RULE_FS_ENABLE, false);
+	tracefs_enable_event(TRACEFS_ADD_RULE_NET_ENABLE, false);
+	tracefs_enable_event(TRACEFS_CHECK_RULE_FS_ENABLE, false);
+	tracefs_enable_event(TRACEFS_CHECK_RULE_NET_ENABLE, false);
+	tracefs_enable_event(TRACEFS_DENY_ACCESS_FS_ENABLE, false);
+	tracefs_enable_event(TRACEFS_DENY_ACCESS_NET_ENABLE, false);
+	tracefs_enable_event(TRACEFS_FREE_DOMAIN_ENABLE, false);
+	tracefs_enable_event(TRACEFS_FREE_RULESET_ENABLE, false);
+	tracefs_clear_pid_filter();
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/*
+	 * The mount namespace is cleaned up automatically when the test process
+	 * (harness child) exits.
+	 */
+}
+
+/*
+ * Verifies that no trace events are emitted when the tracepoints are disabled.
+ */
+TEST_F(trace, no_trace_when_disabled)
+{
+	char *buf;
+
+	/* Disable all landlock events. */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0,
+		  tracefs_enable_event(TRACEFS_CREATE_RULESET_ENABLE, false));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_RESTRICT_SELF_ENABLE, false));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_ADD_RULE_FS_ENABLE, false));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_ADD_RULE_NET_ENABLE, false));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_CHECK_RULE_FS_ENABLE, false));
+	ASSERT_EQ(0,
+		  tracefs_enable_event(TRACEFS_CHECK_RULE_NET_ENABLE, false));
+	ASSERT_EQ(0,
+		  tracefs_enable_event(TRACEFS_DENY_ACCESS_FS_ENABLE, false));
+	ASSERT_EQ(0,
+		  tracefs_enable_event(TRACEFS_DENY_ACCESS_NET_ENABLE, false));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_DENY_PTRACE_ENABLE, false));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_DENY_SCOPE_SIGNAL_ENABLE,
+					  false));
+	ASSERT_EQ(0, tracefs_enable_event(
+			     TRACEFS_DENY_SCOPE_ABSTRACT_UNIX_SOCKET_ENABLE,
+			     false));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_FREE_DOMAIN_ENABLE, false));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_FREE_RULESET_ENABLE, false));
+	ASSERT_EQ(0, tracefs_clear());
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/*
+	 * Trigger both allowed and denied accesses to verify neither check_rule
+	 * nor deny_access events fire when disabled.
+	 */
+	sandbox_child_fs_access(_metadata, "/usr", LANDLOCK_ACCESS_FS_READ_DIR,
+				LANDLOCK_ACCESS_FS_READ_DIR, "/tmp");
+
+	/* Read trace buffer and verify no landlock events at all. */
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_EQ(0, tracefs_count_matches(buf, "landlock_"))
+	{
+		TH_LOG("Expected 0 landlock events when disabled\n%s", buf);
+	}
+
+	free(buf);
+}
+
+/*
+ * Verifies that landlock_create_ruleset emits a trace event with the correct
+ * handled access masks.
+ */
+TEST_F(trace, create_ruleset)
+{
+	struct landlock_ruleset_attr ruleset_attr = {
+		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
+		.handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP,
+	};
+	int ruleset_fd;
+	char *buf, *dot;
+	char field[64];
+	char expected[32];
+
+	ruleset_fd =
+		landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+	ASSERT_EQ(0, close(ruleset_fd));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_EQ(1,
+		  tracefs_count_matches(buf, REGEX_CREATE_RULESET(TRACE_TASK)))
+	{
+		TH_LOG("Expected 1 create_ruleset event\n%s", buf);
+	}
+
+	/* Verify handled_fs matches what we requested. */
+	snprintf(expected, sizeof(expected), "0x%x",
+		 (unsigned int)LANDLOCK_ACCESS_FS_READ_FILE);
+	EXPECT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_CREATE_RULESET(TRACE_TASK),
+					"handled_fs", field, sizeof(field)));
+	EXPECT_STREQ(expected, field);
+
+	/* Verify handled_net matches. */
+	snprintf(expected, sizeof(expected), "0x%x",
+		 (unsigned int)LANDLOCK_ACCESS_NET_BIND_TCP);
+	EXPECT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_CREATE_RULESET(TRACE_TASK),
+					"handled_net", field, sizeof(field)));
+	EXPECT_STREQ(expected, field);
+
+	/* Verify version is 0 at creation (no rules added yet). */
+	EXPECT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_CREATE_RULESET(TRACE_TASK),
+					"ruleset", field, sizeof(field)));
+	/* Format is <hex>.<dec>; version is after the dot. */
+	dot = strchr(field, '.');
+	ASSERT_NE(0, !!dot);
+	EXPECT_STREQ("0", dot + 1);
+
+	free(buf);
+}
+
+/*
+ * Verifies that the ruleset version increments with each add_rule call and that
+ * restrict_self records the correct version.
+ */
+TEST_F(trace, ruleset_version)
+{
+	pid_t pid;
+	int status;
+	char *buf;
+	const char *dot;
+	char field[64];
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	pid = fork();
+	ASSERT_LE(0, pid);
+
+	if (pid == 0) {
+		struct landlock_ruleset_attr ruleset_attr = {
+			.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		struct landlock_path_beneath_attr path_beneath = {
+			.allowed_access = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		int ruleset_fd;
+
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		/* First rule: version becomes 1. */
+		path_beneath.parent_fd =
+			open("/usr", O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd < 0)
+			_exit(1);
+		landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				  &path_beneath, 0);
+		close(path_beneath.parent_fd);
+
+		/* Second rule: version becomes 2. */
+		path_beneath.parent_fd =
+			open("/tmp", O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd < 0)
+			_exit(1);
+		landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				  &path_beneath, 0);
+		close(path_beneath.parent_fd);
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(ruleset_fd, 0))
+			_exit(1);
+		close(ruleset_fd);
+		_exit(0);
+	}
+
+	ASSERT_EQ(pid, waitpid(pid, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	/* Verify create_ruleset has version=0. */
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_CREATE_RULESET(TRACE_TASK),
+					"ruleset", field, sizeof(field)));
+	dot = strchr(field, '.');
+	ASSERT_NE(0, !!dot);
+	EXPECT_STREQ("0", dot + 1);
+
+	/* Verify 2 add_rule_fs events were emitted. */
+	EXPECT_EQ(2, tracefs_count_matches(buf, REGEX_ADD_RULE_FS(TRACE_TASK)))
+	{
+		TH_LOG("Expected 2 add_rule_fs events\n%s", buf);
+	}
+
+	/*
+	 * Verify restrict_self records version=2 (after 2 add_rule calls).  The
+	 * ruleset field format is <hex_id>.<dec_version>.
+	 */
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_RESTRICT_SELF(TRACE_TASK),
+					   "ruleset", field, sizeof(field)));
+	dot = strchr(field, '.');
+	ASSERT_NE(0, !!dot);
+	EXPECT_STREQ("2", dot + 1);
+
+	free(buf);
+}
+
+/*
+ * Verifies that landlock_restrict_self emits a trace event linking the ruleset
+ * ID to the new domain ID.
+ */
+TEST_F(trace, restrict_self)
+{
+	pid_t pid;
+	int status, check_count;
+	char *buf;
+	char ruleset_id[64], domain_id[64], check_domain[64];
+
+	/* Clear before the sandboxed child. */
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	pid = fork();
+	ASSERT_LE(0, pid);
+
+	if (pid == 0) {
+		struct landlock_ruleset_attr ruleset_attr = {
+			.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		struct landlock_path_beneath_attr path_beneath = {
+			.allowed_access = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		int ruleset_fd, fd;
+
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		path_beneath.parent_fd =
+			open("/usr", O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd < 0)
+			_exit(1);
+
+		landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				  &path_beneath, 0);
+		close(path_beneath.parent_fd);
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(ruleset_fd, 0))
+			_exit(1);
+		close(ruleset_fd);
+
+		/* Trigger a check_rule to verify domain_id correlation. */
+		fd = open("/usr", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+		if (fd >= 0)
+			close(fd);
+
+		_exit(0);
+	}
+
+	ASSERT_EQ(pid, waitpid(pid, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	/* Verify restrict_self event exists. */
+	EXPECT_EQ(1,
+		  tracefs_count_matches(buf, REGEX_RESTRICT_SELF(TRACE_TASK)))
+	{
+		TH_LOG("Expected 1 restrict_self event\n%s", buf);
+	}
+
+	/* Extract the domain ID from restrict_self. */
+	EXPECT_EQ(0, tracefs_extract_field(buf, REGEX_RESTRICT_SELF(TRACE_TASK),
+					   "domain", domain_id,
+					   sizeof(domain_id)));
+
+	/* Extract the ruleset ID from restrict_self. */
+	EXPECT_EQ(0, tracefs_extract_field(buf, REGEX_RESTRICT_SELF(TRACE_TASK),
+					   "ruleset", ruleset_id,
+					   sizeof(ruleset_id)));
+
+	/* Verify domain ID is non-zero. */
+	EXPECT_NE(0, strcmp(domain_id, "0"));
+
+	/*
+	 * Verify parent=0 (first restriction, no prior domain).  This reuses
+	 * the ruleset_id buffer, whose previous content is no longer needed.
+	 */
+	EXPECT_EQ(0, tracefs_extract_field(buf, REGEX_RESTRICT_SELF(TRACE_TASK),
+					   "parent", ruleset_id,
+					   sizeof(ruleset_id)));
+	EXPECT_STREQ("0", ruleset_id);
+
+	/*
+	 * Verify the same domain ID appears in the check_rule event, confirming
+	 * end-to-end correlation.
+	 */
+	check_count =
+		tracefs_count_matches(buf, REGEX_CHECK_RULE_FS(TRACE_TASK));
+	ASSERT_LE(1, check_count)
+	{
+		TH_LOG("Expected check_rule_fs events\n%s", buf);
+	}
+
+	EXPECT_EQ(0, tracefs_extract_field(buf, REGEX_CHECK_RULE_FS(TRACE_TASK),
+					   "domain", check_domain,
+					   sizeof(check_domain)));
+	EXPECT_STREQ(domain_id, check_domain);
+
+	free(buf);
+}
+
+/*
+ * Verifies that nested landlock_restrict_self calls produce trace events with
+ * correct parent domain IDs: the second restrict_self's parent should be the
+ * first domain's ID.
+ */
+TEST_F(trace, restrict_self_nested)
+{
+	pid_t pid;
+	int status;
+	char *buf;
+	const char *after_first;
+	char first_domain[64], first_parent[64], second_parent[64];
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	pid = fork();
+	ASSERT_LE(0, pid);
+
+	if (pid == 0) {
+		struct landlock_ruleset_attr ruleset_attr = {
+			.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		struct landlock_path_beneath_attr path_beneath = {
+			.allowed_access = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		int ruleset_fd;
+
+		/* First restriction. */
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		path_beneath.parent_fd =
+			open("/usr", O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd < 0)
+			_exit(1);
+		landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				  &path_beneath, 0);
+		close(path_beneath.parent_fd);
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(ruleset_fd, 0))
+			_exit(1);
+		close(ruleset_fd);
+
+		/* Second restriction (nested). */
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		path_beneath.parent_fd =
+			open("/usr", O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd < 0)
+			_exit(1);
+		landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				  &path_beneath, 0);
+		close(path_beneath.parent_fd);
+
+		if (landlock_restrict_self(ruleset_fd, 0))
+			_exit(1);
+		close(ruleset_fd);
+
+		_exit(0);
+	}
+
+	ASSERT_EQ(pid, waitpid(pid, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	/* Should have 2 restrict_self events. */
+	EXPECT_EQ(2,
+		  tracefs_count_matches(buf, REGEX_RESTRICT_SELF(TRACE_TASK)))
+	{
+		TH_LOG("Expected 2 restrict_self events\n%s", buf);
+	}
+
+	/*
+	 * Extract domain and parent from each restrict_self event.  The first
+	 * event (parent=0) is the outer domain; the second (parent!=0) is the
+	 * nested domain whose parent should match the first domain's ID.
+	 */
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_RESTRICT_SELF(TRACE_TASK),
+					   "domain", first_domain,
+					   sizeof(first_domain)));
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_RESTRICT_SELF(TRACE_TASK),
+					   "parent", first_parent,
+					   sizeof(first_parent)));
+	EXPECT_STREQ("0", first_parent);
+
+	/*
+	 * tracefs_extract_field() only returns the first line matching the
+	 * regex, so skip past the first restrict_self line and search the
+	 * rest of the buffer to reach the second event.
+	 */
+	after_first = strstr(buf, "landlock_restrict_self:");
+	ASSERT_NE(NULL, after_first);
+	after_first = strchr(after_first, '\n');
+	ASSERT_NE(NULL, after_first);
+
+	ASSERT_EQ(0, tracefs_extract_field(
+			     after_first + 1, REGEX_RESTRICT_SELF(TRACE_TASK),
+			     "parent", second_parent, sizeof(second_parent)));
+
+	/* The second domain's parent should be the first domain's ID. */
+	EXPECT_STREQ(first_domain, second_parent);
+
+	free(buf);
+}
+
+/*
+ * Verifies that landlock_add_rule does not emit a trace event when the syscall
+ * fails (e.g., invalid ruleset fd).
+ */
+TEST_F(trace, add_rule_invalid_fd)
+{
+	struct landlock_path_beneath_attr path_beneath = {
+		.allowed_access = LANDLOCK_ACCESS_FS_READ_FILE,
+	};
+	char *buf;
+
+	path_beneath.parent_fd = open("/usr", O_PATH | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, path_beneath.parent_fd);
+
+	/* Invalid ruleset fd (-1). */
+	ASSERT_EQ(-1, landlock_add_rule(-1, LANDLOCK_RULE_PATH_BENEATH,
+					&path_beneath, 0));
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_EQ(0, tracefs_count_matches(buf, REGEX_ADD_RULE_FS(TRACE_TASK)))
+	{
+		TH_LOG("No add_rule_fs event expected on invalid fd\n%s", buf);
+	}
+
+	free(buf);
+}
+
+/*
+ * Verifies that landlock_restrict_self does not emit a trace event when the
+ * syscall fails (e.g., invalid ruleset fd or unknown flags).
+ */
+TEST_F(trace, restrict_self_invalid)
+{
+	struct landlock_ruleset_attr ruleset_attr = {
+		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR,
+	};
+	int ruleset_fd;
+	char *buf;
+
+	ruleset_fd =
+		landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+
+	/* Clear the trace buffer after create_ruleset event. */
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	/* Invalid fd. */
+	ASSERT_EQ(-1, landlock_restrict_self(-1, 0));
+
+	/* Unknown flags. */
+	ASSERT_EQ(-1, landlock_restrict_self(ruleset_fd, -1));
+
+	ASSERT_EQ(0, close(ruleset_fd));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_EQ(0,
+		  tracefs_count_matches(buf, REGEX_RESTRICT_SELF(TRACE_TASK)))
+	{
+		TH_LOG("No restrict_self event expected on error\n%s", buf);
+	}
+
+	free(buf);
+}
+
+/*
+ * Verifies that trace_landlock_free_domain fires when a domain is deallocated,
+ * with the correct denials count.
+ */
+TEST_F(trace, free_domain)
+{
+	char *buf;
+	int count;
+	char denials_field[32];
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	/*
+	 * The domain is freed via a work queue (kworker), so the free_domain
+	 * trace event is emitted from a different PID.  Clear the PID filter
+	 * BEFORE the child exits, so the kworker event passes the filter when
+	 * it fires.
+	 */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	tracefs_clear_pid_filter();
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	sandbox_child_fs_access(_metadata, "/usr", LANDLOCK_ACCESS_FS_READ_DIR,
+				LANDLOCK_ACCESS_FS_READ_DIR, "/tmp");
+
+	/*
+	 * Wait for the deferred deallocation work to run.  The domain is freed
+	 * asynchronously from a kworker; poll until the event appears or a
+	 * timeout is reached.
+	 */
+	for (int retry = 0; retry < 10; retry++) {
+		/* TODO: Improve */
+		usleep(100000);
+
+		set_cap(_metadata, CAP_SYS_ADMIN);
+		buf = tracefs_read_trace();
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+		ASSERT_NE(NULL, buf);
+
+		count = tracefs_count_matches(buf,
+					      REGEX_FREE_DOMAIN(KWORKER_TASK));
+		if (count >= 1)
+			break;
+		free(buf);
+		buf = NULL;
+	}
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, tracefs_set_pid_filter(getpid()));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	ASSERT_NE(NULL, buf);
+	EXPECT_LE(1, count)
+	{
+		TH_LOG("Expected free_domain event, got %d\n%s", count, buf);
+	}
+
+	/* Verify denials count matches the single denial we triggered. */
+	EXPECT_EQ(0, tracefs_extract_field(buf, REGEX_FREE_DOMAIN(KWORKER_TASK),
+					   "denials", denials_field,
+					   sizeof(denials_field)));
+	EXPECT_STREQ("1", denials_field);
+
+	free(buf);
+}
+
+/*
+ * Verifies that deny_access_fs includes the enriched fields: same_exec,
+ * log_same_exec, log_new_exec.
+ */
+TEST_F(trace, deny_access_fs_fields)
+{
+	char *buf;
+	char field_buf[64];
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	/* Trigger a denial: rule for /usr, access /tmp. */
+	sandbox_child_fs_access(_metadata, "/usr", LANDLOCK_ACCESS_FS_READ_DIR,
+				LANDLOCK_ACCESS_FS_READ_DIR, "/tmp");
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	/* Verify the enriched fields are present and have valid values. */
+	ASSERT_EQ(0, tracefs_extract_field(
+			     buf, REGEX_DENY_ACCESS_FS(TRACE_TASK), "same_exec",
+			     field_buf, sizeof(field_buf)));
+	/* Child is the same exec that restricted itself. */
+	EXPECT_STREQ("1", field_buf);
+
+	/* Default: log_same_exec=1 (not disabled). */
+	ASSERT_EQ(0, tracefs_extract_field(
+			     buf, REGEX_DENY_ACCESS_FS(TRACE_TASK),
+			     "log_same_exec", field_buf, sizeof(field_buf)));
+	EXPECT_STREQ("1", field_buf);
+
+	/* Default: log_new_exec=0 (not enabled). */
+	ASSERT_EQ(0, tracefs_extract_field(
+			     buf, REGEX_DENY_ACCESS_FS(TRACE_TASK),
+			     "log_new_exec", field_buf, sizeof(field_buf)));
+	EXPECT_STREQ("0", field_buf);
+
+	free(buf);
+}
+
+/*
+ * Verifies that same_exec is 1 (true) for denials from the same executable that
+ * called landlock_restrict_self().
+ */
+TEST_F(trace, same_exec_before_exec)
+{
+	pid_t pid;
+	int status;
+	char *buf;
+	char field[64];
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	pid = fork();
+	ASSERT_LE(0, pid);
+
+	if (pid == 0) {
+		struct landlock_ruleset_attr attr = {
+			.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		int ruleset_fd, dir_fd;
+
+		ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		/* No rules: all read_dir access is denied. */
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(ruleset_fd, 0))
+			_exit(1);
+		close(ruleset_fd);
+
+		/* Trigger denial without exec (same executable). */
+		dir_fd = open(".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+		if (dir_fd >= 0)
+			close(dir_fd);
+		_exit(0);
+	}
+
+	ASSERT_EQ(pid, waitpid(pid, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	/* Should have at least one deny_access_fs denial. */
+	EXPECT_LE(1,
+		  tracefs_count_matches(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK)));
+
+	/* Verify same_exec=1 (same executable, no exec). */
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK),
+					"same_exec", field, sizeof(field)));
+	EXPECT_STREQ("1", field);
+
+	/* Verify default log flags. */
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK),
+					"log_same_exec", field, sizeof(field)));
+	EXPECT_STREQ("1", field);
+
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK),
+					"log_new_exec", field, sizeof(field)));
+	EXPECT_STREQ("0", field);
+
+	free(buf);
+}
+
+/*
+ * Verifies that same_exec is 0 (false) for denials from a process that has
+ * exec'd a new binary after landlock_restrict_self().  The sandboxed child
+ * exec's true which opens "." and triggers a read_dir denial.  Also verifies
+ * the default log flags (log_same_exec=1, log_new_exec=0) and covers the
+ * "trace-only" visibility condition: same_exec=0 AND log_new_exec=0 means audit
+ * suppresses the denial, but trace still fires.
+ */
+TEST_F(trace, same_exec_after_exec)
+{
+	char *buf;
+	char field[64];
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	sandbox_child_exec_true(_metadata, 0);
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_LE(1, tracefs_count_matches(buf, REGEX_DENY_ACCESS_FS("true")));
+
+	/* Verify same_exec=0 (different executable after exec). */
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS("true"),
+					   "same_exec", field, sizeof(field)));
+	EXPECT_STREQ("0", field);
+
+	/* Default log flags should still be the same. */
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS("true"),
+					"log_same_exec", field, sizeof(field)));
+	EXPECT_STREQ("1", field);
+
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS("true"),
+					"log_new_exec", field, sizeof(field)));
+	EXPECT_STREQ("0", field);
+
+	free(buf);
+}
+
+/*
+ * Verifies that LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF disables log_same_exec
+ * in the trace event.
+ */
+TEST_F(trace, log_flags_same_exec_off)
+{
+	pid_t pid;
+	int status;
+	char *buf;
+	char field[64];
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	pid = fork();
+	ASSERT_LE(0, pid);
+
+	if (pid == 0) {
+		struct landlock_ruleset_attr attr = {
+			.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		int ruleset_fd, dir_fd;
+
+		ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(
+			    ruleset_fd,
+			    LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF))
+			_exit(1);
+		close(ruleset_fd);
+
+		dir_fd = open(".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+		if (dir_fd >= 0)
+			close(dir_fd);
+		_exit(0);
+	}
+
+	ASSERT_EQ(pid, waitpid(pid, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_LE(1,
+		  tracefs_count_matches(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK)));
+
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK),
+					"log_same_exec", field, sizeof(field)));
+	EXPECT_STREQ("0", field);
+
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK),
+					"log_new_exec", field, sizeof(field)));
+	EXPECT_STREQ("0", field);
+
+	free(buf);
+}
+
+/*
+ * Verifies that LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON enables log_new_exec in
+ * the trace event.  The child exec's true so that the denial comes from a new
+ * executable (same_exec=0).
+ */
+TEST_F(trace, log_flags_new_exec_on)
+{
+	char *buf;
+	char field[64];
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	sandbox_child_exec_true(_metadata,
+				LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON);
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_LE(1, tracefs_count_matches(buf, REGEX_DENY_ACCESS_FS("true")));
+
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS("true"),
+					   "same_exec", field, sizeof(field)));
+	EXPECT_STREQ("0", field);
+
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS("true"),
+					"log_same_exec", field, sizeof(field)));
+	EXPECT_STREQ("1", field);
+
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS("true"),
+					"log_new_exec", field, sizeof(field)));
+	EXPECT_STREQ("1", field);
+
+	free(buf);
+}
+
+/*
+ * Verifies that denials suppressed by audit log flags are still counted in
+ * num_denials.  The child restricts itself with default flags (log_same_exec=1,
+ * log_new_exec=0), then execs true which attempts to read a denied directory.
+ * After exec, same_exec=0 and log_new_exec=0, so audit suppresses the denial.
+ * But the trace event fires unconditionally and free_domain must report the
+ * correct denials count.
+ */
+TEST_F(trace, non_audit_visible_denial_counting)
+{
+	char *buf = NULL;
+	char denials_field[32];
+	int count;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, tracefs_clear());
+	tracefs_clear_pid_filter();
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	sandbox_child_exec_true(_metadata, 0);
+
+	/* Wait for free_domain event with retry. */
+	for (int retry = 0; retry < 10; retry++) {
+		usleep(100000);
+
+		set_cap(_metadata, CAP_SYS_ADMIN);
+		buf = tracefs_read_trace();
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+		if (!buf)
+			break;
+
+		count = tracefs_count_matches(buf,
+					      REGEX_FREE_DOMAIN(KWORKER_TASK));
+		if (count >= 1)
+			break;
+		free(buf);
+		buf = NULL;
+	}
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, tracefs_set_pid_filter(getpid()));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/*
+	 * The denial happened after exec (same_exec=0), so audit would suppress
+	 * it.  But num_denials counts all denials regardless.
+	 */
+	ASSERT_NE(NULL, buf)
+	{
+		TH_LOG("free_domain event not found after 10 retries");
+	}
+	EXPECT_EQ(0, tracefs_extract_field(buf, REGEX_FREE_DOMAIN(KWORKER_TASK),
+					   "denials", denials_field,
+					   sizeof(denials_field)));
+	EXPECT_STREQ("1", denials_field);
+
+	free(buf);
+}
+
+/*
+ * Verifies that landlock_add_rule_net emits a trace event with the correct port
+ * and allowed access mask fields.
+ */
+TEST_F(trace, add_rule_net_fields)
+{
+	struct landlock_ruleset_attr ruleset_attr = {
+		.handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP,
+	};
+	struct landlock_net_port_attr net_port = {
+		.allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP,
+		.port = 8080,
+	};
+	int ruleset_fd;
+	char *buf;
+	char field[64], expected[32];
+
+	ruleset_fd =
+		landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
+				       &net_port, 0));
+	close(ruleset_fd);
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_EQ(1, tracefs_count_matches(buf, REGEX_ADD_RULE_NET(TRACE_TASK)))
+	{
+		TH_LOG("Expected 1 add_rule_net event\n%s", buf);
+	}
+
+	/*
+	 * Verify the port is in host endianness, matching the UAPI
+	 * convention (landlock_net_port_attr.port).  On little-endian,
+	 * htons(8080) is 36895, so this comparison catches byte-order
+	 * bugs.
+	 */
+	EXPECT_EQ(0, tracefs_extract_field(buf, REGEX_ADD_RULE_NET(TRACE_TASK),
+					   "port", field, sizeof(field)));
+	EXPECT_STREQ("8080", field);
+	/*
+	 * The allowed mask is the absolute value after transformation:
+	 * the user-requested BIND_TCP plus all unhandled access rights
+	 * (CONNECT_TCP is unhandled because the ruleset only handles
+	 * BIND_TCP).
+	 */
+	snprintf(expected, sizeof(expected), "0x%x",
+		 (unsigned int)(LANDLOCK_ACCESS_NET_BIND_TCP |
+				LANDLOCK_ACCESS_NET_CONNECT_TCP));
+	EXPECT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_ADD_RULE_NET(TRACE_TASK),
+					"access_rights", field, sizeof(field)));
+	EXPECT_STREQ(expected, field);
+
+	free(buf);
+}
+
+/*
+ * Verifies that LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF disables audit
+ * logging for child domains but trace events still fire.  The parent creates a
+ * domain with LOG_SUBDOMAINS_OFF, then the child creates a sub-domain and
+ * triggers a denial.  The trace event should fire (tracing is unconditional)
+ * with log_same_exec=1 and log_new_exec=0 (the child's default flags).
+ */
+TEST_F(trace, log_flags_subdomains_off)
+{
+	pid_t pid;
+	int status;
+	char *buf;
+	char field[64];
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	pid = fork();
+	ASSERT_LE(0, pid);
+
+	if (pid == 0) {
+		struct landlock_ruleset_attr attr = {
+			.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		int parent_fd, child_fd, dir_fd;
+
+		/* Parent domain with LOG_SUBDOMAINS_OFF. */
+		parent_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+		if (parent_fd < 0)
+			_exit(1);
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(
+			    parent_fd,
+			    LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF))
+			_exit(1);
+		close(parent_fd);
+
+		/* Child sub-domain with default flags. */
+		child_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+		if (child_fd < 0)
+			_exit(1);
+
+		if (landlock_restrict_self(child_fd, 0))
+			_exit(1);
+		close(child_fd);
+
+		/* Trigger a denial from the child domain. */
+		dir_fd = open(".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+		if (dir_fd >= 0)
+			close(dir_fd);
+		_exit(0);
+	}
+
+	ASSERT_EQ(pid, waitpid(pid, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	/*
+	 * Trace fires unconditionally even though audit is disabled for the
+	 * child domain (parent had LOG_SUBDOMAINS_OFF).
+	 */
+	EXPECT_LE(1,
+		  tracefs_count_matches(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK)))
+	{
+		TH_LOG("Expected deny_access_fs event despite "
+		       "LOG_SUBDOMAINS_OFF\n%s",
+		       buf);
+	}
+
+	/* The child domain's own flags: log_same_exec=1 (default). */
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK),
+					"log_same_exec", field, sizeof(field)));
+	EXPECT_STREQ("1", field);
+
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK),
+					"log_new_exec", field, sizeof(field)));
+	EXPECT_STREQ("0", field);
+
+	free(buf);
+}
+
+/* Verifies that landlock_free_ruleset fires when a ruleset FD is closed. */
+TEST_F(trace, free_ruleset_on_close)
+{
+	struct landlock_ruleset_attr ruleset_attr = {
+		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR,
+	};
+	int ruleset_fd;
+	char *buf;
+
+	ruleset_fd =
+		landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	/* Closing the FD should trigger free_ruleset. */
+	close(ruleset_fd);
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_EQ(1, tracefs_count_matches(buf, REGEX_FREE_RULESET(TRACE_TASK)))
+	{
+		TH_LOG("Expected 1 free_ruleset event\n%s", buf);
+	}
+
+	free(buf);
+}
+
+/*
+ * The following tests are intentionally elided because the underlying kernel
+ * mechanisms are already validated by audit tests:
+ *
+ * - Domain ID monotonicity: validated by audit_test.c:layers.  The same
+ *   landlock_get_id_range() function serves both audit and trace.
+ *
+ * - Domain deallocation order (LIFO): validated by audit_test.c:layers.  Trace
+ *   events fire from the same free_domain_work() code path.
+ *
+ * - Max-layer stacking (16 domains): validated by audit_test.c:layers.
+ *
+ * - IPv6 network tests: IPv6 hook dispatch uses the same
+ *   current_check_access_socket() as IPv4, validated by net_test.c:audit tests.
+ *
+ * - Per-access-right full matrix (all 16 FS rights): hook dispatch is validated
+ *   by fs_test.c:audit tests.  Trace tests verify representative samples to
+ *   ensure bitmask encoding is correct.
+ *
+ * - Combined log flag variants (e.g., LOG_SUBDOMAINS_OFF + LOG_NEW_EXEC_ON):
+ *   individual flag tests above cover each flag's effect on trace fields.  Flag
+ *   combination logic is validated by audit_test.c:audit_flags tests.
+ *
+ * - fs.refer multi-record denials and fs.change_topology (mount):
+ *   trace_denial() uses the same code path for all FS request types.  The
+ *   DENTRY union member fix (C1) is validated by the deny_access_fs_fields
+ *   test.  Audit tests in fs_test.c cover refer and mount denial specifics.
+ *
+ * - Ptrace TRACEME direction: the tracepoint fires from the same
+ *   hook_ptrace_access_check() for both ATTACH and TRACEME.  Audit tests in
+ *   ptrace_test.c cover both directions.
+ *
+ * - check_rule_net field verification: the tracepoint uses the same
+ *   landlock_unmask_layers() as check_rule_fs, just with a different key type.
+ *   The FS path is validated by trace_fs_test.c tests.
+ */
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/true.c b/tools/testing/selftests/landlock/true.c
index 3f9ccbf52783..1e39b664512d 100644
--- a/tools/testing/selftests/landlock/true.c
+++ b/tools/testing/selftests/landlock/true.c
@@ -1,5 +1,15 @@
 // SPDX-License-Identifier: GPL-2.0
+/*
+ * Minimal helper for Landlock selftests.  Opens its own working directory
+ * before exiting, which may trigger access denials depending on the sandbox
+ * configuration.
+ */
+
+#include <fcntl.h>
+#include <unistd.h>
+
 int main(void)
 {
+	close(open(".", O_RDONLY | O_DIRECTORY | O_CLOEXEC));
 	return 0;
 }
-- 
2.53.0

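Note on the "ruleset" field checked by the create_ruleset and
ruleset_version tests above: they locate the '.' with strchr() and compare
the text after it.  A stricter check could parse both halves.  What follows
is a minimal sketch, assuming the ID is printed in hexadecimal and the
version in decimal as the test comments state.  parse_ruleset_field() is a
hypothetical helper and is not part of this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: splits a "ruleset" trace field of the form
 * <hex_id>.<dec_version> into its two parts.
 * Returns 0 on success, -1 if the field is malformed.
 */
int parse_ruleset_field(const char *field, unsigned long long *id,
			unsigned long *version)
{
	char *end;

	assert(field && id && version);

	/* The ID is everything before the dot, read as hexadecimal. */
	errno = 0;
	*id = strtoull(field, &end, 16);
	if (errno || end == field || *end != '.')
		return -1;

	/* Reject an empty or signed version (e.g. "1a." or "1a.-2"). */
	field = end + 1;
	if (*field < '0' || *field > '9')
		return -1;

	/* The version is the decimal number after the dot. */
	errno = 0;
	*version = strtoul(field, &end, 10);
	if (errno || *end != '\0')
		return -1;

	return 0;
}
```

With such a helper, a test could compare the numeric version directly
instead of comparing strings, at the cost of a few more lines per check.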

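The free_domain and non_audit_visible_denial_counting tests above both poll
the trace buffer in a bounded loop (10 attempts, 100 ms apart) because the
domain is freed from a kworker.  The pattern could be factored out roughly
as follows.  This is only a sketch: poll_until() and countdown_reached() are
hypothetical names, and the selftests call usleep() inline rather than
using a helper like this.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#endif

#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: sleeps delay_ms milliseconds, then calls check(ctx),
 * repeating up to max_tries times.  Returns true as soon as check() succeeds,
 * false if every attempt failed.
 */
bool poll_until(bool (*check)(void *ctx), void *ctx, int max_tries,
		long delay_ms)
{
	const struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};

	assert(check != NULL);

	for (int i = 0; i < max_tries; i++) {
		nanosleep(&delay, NULL);
		if (check(ctx))
			return true;
	}
	return false;
}

/* Example predicate: succeeds once the counter behind ctx reaches zero. */
bool countdown_reached(void *ctx)
{
	int *remaining = ctx;

	return --(*remaining) <= 0;
}
```

In the tests, the predicate would read the trace buffer and count
REGEX_FREE_DOMAIN(KWORKER_TASK) matches, and poll_until(check, ctx, 10, 100)
would keep the current budget of 10 attempts at 100 ms each.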

* [PATCH v2 17/17] landlock: Document tracepoints
From: Mickaël Salaün @ 2026-04-06 14:37 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Steven Rostedt
  Cc: Mickaël Salaün, Jann Horn, Jeff Xu, Justin Suess,
	Kees Cook, Masami Hiramatsu, Mathieu Desnoyers, Matthieu Buffet,
	Mikhail Ivanov, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-security-module, linux-trace-kernel
In-Reply-To: <20260406143717.1815792-1-mic@digikod.net>

Add tracepoint documentation to the kernel security documentation.
Describe the complete lifecycle of trace events (create, deny, free),
the enriched denial fields (same_exec, log_same_exec, log_new_exec), and
the design for both stateful (eBPF) and stateless (ftrace) consumers.

Cc: Günther Noack <gnoack@google.com>
Cc: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
- New patch.
---
 Documentation/admin-guide/LSM/landlock.rst | 210 ++++++++++++++++++++-
 Documentation/security/landlock.rst        |  35 +++-
 Documentation/trace/events-landlock.rst    | 160 ++++++++++++++++
 Documentation/trace/index.rst              |   1 +
 Documentation/userspace-api/landlock.rst   |  11 +-
 5 files changed, 412 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/trace/events-landlock.rst

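For readers who want to toggle an event from a program rather than from the
shell commands documented in this patch, a minimal C sketch follows.
set_trace_enable() is a hypothetical helper (not a kernel or libtracefs
API).  It assumes tracefs is mounted at /sys/kernel/tracing and that the
caller is allowed to write there.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for O_CLOEXEC */
#endif

#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes "1" or "0" to an existing tracefs "enable"
 * file.  Returns 0 on success, -1 on error (errno is left as set by the
 * failing call).
 */
int set_trace_enable(const char *enable_path, bool on)
{
	int fd, ret = 0;

	assert(enable_path != NULL);

	/* No O_CREAT: the enable file is expected to exist already. */
	fd = open(enable_path, O_WRONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	if (write(fd, on ? "1" : "0", 1) != 1)
		ret = -1;

	if (close(fd))
		ret = -1;
	return ret;
}
```

A caller would pass, for example,
"/sys/kernel/tracing/events/landlock/landlock_deny_access_fs/enable" to
enable only filesystem denial events, or
"/sys/kernel/tracing/events/landlock/enable" for all Landlock events.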
diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
index 9923874e2156..cad5845b6ec7 100644
--- a/Documentation/admin-guide/LSM/landlock.rst
+++ b/Documentation/admin-guide/LSM/landlock.rst
@@ -1,12 +1,13 @@
 .. SPDX-License-Identifier: GPL-2.0
 .. Copyright © 2025 Microsoft Corporation
+.. Copyright © 2026 Cloudflare
 
 ================================
 Landlock: system-wide management
 ================================
 
 :Author: Mickaël Salaün
-:Date: January 2026
+:Date: April 2026
 
 Landlock can leverage the audit framework to log events.
 
@@ -176,11 +177,218 @@ filters to limit noise with two complementary ways:
   programs,
 - or with audit rules (see :manpage:`auditctl(8)`).
 
+Tracepoints
+===========
+
+Landlock also provides tracepoints as an alternative to audit for
+debugging and observability.  Tracepoints fire unconditionally,
+independent of audit configuration, ``audit_enabled``, and domain log
+flags.  This makes them suitable for always-on monitoring with eBPF or
+for ad-hoc debugging with ``trace_pipe``.  See
+:doc:`/trace/events-landlock` for the complete event reference.
+
+Enabling tracepoints
+--------------------
+
+Enable individual Landlock tracepoints via tracefs::
+
+  # Enable filesystem denial tracing:
+  echo 1 > /sys/kernel/tracing/events/landlock/landlock_deny_access_fs/enable
+
+  # Enable all Landlock events:
+  echo 1 > /sys/kernel/tracing/events/landlock/enable
+
+  # Read the trace output:
+  cat /sys/kernel/tracing/trace_pipe
+
+Available events
+----------------
+
+**Policy setup events:**
+
+- ``landlock_create_ruleset`` -- emitted when a ruleset is created.
+  Fields: ``ruleset`` (ID and version), ``handled_fs``, ``handled_net``,
+  ``scoped``.
+
+- ``landlock_add_rule_fs``, ``landlock_add_rule_net`` -- emitted when a
+  rule is added.  Fields: ``ruleset`` (ID and version),
+  ``access_rights`` (access mask),
+  target identifier (``dev:ino`` and ``path`` for FS, ``port`` for net).
+
+- ``landlock_restrict_self`` -- emitted when a task restricts itself.
+  Fields: ``ruleset`` (ID and version), ``domain`` (new domain ID),
+  ``parent`` (parent domain ID or 0).
+
+**Access check events (hot path):**
+
+- ``landlock_check_rule_fs``, ``landlock_check_rule_net`` -- emitted
+  when a rule matches during an access check.  Fires for every matching
+  rule in the pathwalk, regardless of the final outcome (allowed or
+  denied).
+
+**Denial events:**
+
+- ``landlock_deny_access_fs``, ``landlock_deny_access_net`` -- emitted
+  when a filesystem or network access is denied.
+- ``landlock_deny_ptrace``, ``landlock_deny_scope_signal``,
+  ``landlock_deny_scope_abstract_unix_socket`` -- emitted when a scope
+  check denies access.
+
+  Common fields include:
+
+  - ``domain`` -- the denying domain's ID
+  - ``blockers`` -- the denied access rights (bitmask,
+    ``deny_access_fs`` and ``deny_access_net`` only)
+  - ``same_exec`` -- whether the task is the same executable that
+    called ``landlock_restrict_self()`` for the denying domain
+  - ``log_same_exec``, ``log_new_exec`` -- the domain's configured log
+    flags (useful for filtering expected denials)
+  - Type-specific fields: ``path`` (FS), ``sport``/``dport`` (net),
+    ``tracee_pid``/``comm`` (ptrace), ``target_pid``/``comm`` (signal),
+    ``peer_pid``/``sun_path`` (abstract unix socket)
+
+**Lifecycle events:**
+
+- ``landlock_free_domain`` -- emitted when a domain is deallocated.
+  Fields: ``domain`` (ID), ``denials`` (total denial count).
+- ``landlock_free_ruleset`` -- emitted when a ruleset is freed.
+  Fields: ``ruleset`` (ID and version).
+
+Event samples
+-------------
+
+A sandboxed program tries to read ``/etc/passwd`` with only ``/tmp``
+writable::
+
+  $ echo 1 > /sys/kernel/tracing/events/landlock/enable
+  $ LL_FS_RO=/ LL_FS_RW=/tmp ./sandboxer cat /etc/passwd &
+  $ cat /sys/kernel/tracing/trace_pipe
+  sandboxer-286  landlock_create_ruleset: ruleset=10b556c58.0 handled_fs=0xdfff handled_net=0x0 scoped=0x0
+  sandboxer-286  landlock_restrict_self: ruleset=10b556c58.3 domain=10b556c61 parent=0
+  cat-287        landlock_deny_access_fs: domain=10b556c61 same_exec=0 log_same_exec=1 log_new_exec=0 blockers=0x4 dev=254:2 ino=143821 path=/etc/passwd
+  kworker/0:1-12 landlock_free_domain: domain=10b556c61 denials=1
+
+Unlike audit, tracepoints fire for all denials regardless of the
+domain's log flags.  This means ``deny_access_*`` events appear even
+when ``LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF`` would suppress the
+corresponding audit record.
+
+Filtering with ftrace
+---------------------
+
+Use ftrace filter expressions to select specific events::
+
+  # Only show denials that audit would also log:
+  echo '(same_exec == 1 && log_same_exec == 1) || (same_exec == 0 && log_new_exec == 1)' > \
+    /sys/kernel/tracing/events/landlock/landlock_deny_access_fs/filter
+
+Using eBPF
+----------
+
+eBPF programs can attach to Landlock tracepoints to build custom
+monitoring.  A stateful eBPF program observes the full event stream and
+maintains per-domain state in BPF maps:
+
+1. On ``landlock_restrict_self``: record the domain ID, parent, flags.
+2. On ``landlock_deny_access_*``: look up the domain, decide whether
+   to count, alert, or ignore the denial based on custom policy.
+3. On ``landlock_free_domain``: clean up the per-domain state, log
+   final statistics.
+
+This approach requires no kernel modification and no Landlock-specific
+BPF helpers.  The Landlock IDs serve as correlation keys across events.
+
+.. _landlock_observability:
+
+When to use tracing vs audit
+-----------------------------
+
+Audit and tracing both help diagnose Landlock policy issues:
+
+**Audit** records denied accesses with the blockers, domain, and object
+identification (path, port).  Audit is the standard Linux mechanism for
+security events, with a stable record format that is well established
+and already supported by log management systems, SIEM platforms, and EDR
+solutions.  Audit is always active (when ``CONFIG_AUDIT`` is set),
+filtered by log flags to reduce noise in production, and designed for
+long-term security monitoring and compliance.
+
+**Tracing** provides deeper introspection for policy debugging.  In
+addition to denied accesses, trace events cover the complete lifecycle
+of Landlock objects (rulesets, domains) and intermediate rule matching
+during access checks.  Trace events are disabled by default (zero
+overhead) and, once enabled, fire regardless of log flags.  eBPF
+programs attached to trace events can access the full kernel context
+(ruleset rules, domain hierarchy, process credentials) via BTF, enabling
+richer analysis than the flat fields in audit records.  For example, an
+eBPF-based live monitoring tool can correlate creation, rule-addition,
+and denial events to build a real-time view of all active Landlock
+domains and their policies.  However, BTF-based access depends on
+internal kernel struct layouts which have no stability guarantee.  CO-RE
+(Compile Once, Run Everywhere) provides best-effort field relocation.
+The ftrace printk format is also not a stable ABI, but is
+self-describing via the per-event ``format`` file, allowing tools to
+adapt dynamically.
+
+Observability guarantees and limitations
+-----------------------------------------
+
+Both audit records and trace events are emitted for every denied access,
+with these exceptions:
+
+- **Log flags** (audit only): ``LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF``,
+  ``LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON``, and
+  ``LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF`` control which denials
+  generate audit records.  Trace events fire regardless of these flags.
+
+- **NOAUDIT hooks**: Some LSM hooks suppress logging for speculative
+  permission probes (e.g., reading ``/proc/<pid>/status`` uses
+  ``PTRACE_MODE_NOAUDIT``).  When NOAUDIT is set, neither audit records
+  nor trace events are emitted, and the denial is not counted in
+  ``denials``.  The denial is still enforced.  This avoids performance
+  overhead and noise from speculative probes that test permissions
+  without performing an actual access.
+
+- **Audit rate limiting**: The audit subsystem may silently drop records
+  when the audit queue is full.  Trace events are not rate-limited.
+
+- **Tracepoint disabled**: When a trace event is disabled (the default
+  state), the tracepoint is a no-op with zero overhead.
+
+When both audit and tracing are active, every logged denial produces both
+an audit record (subject to log flags) and a trace event.  The
+``denials`` count in ``free_domain`` events reflects the total number of
+logged denials, which may be lower than the actual number of enforced
+denials due to NOAUDIT hooks.
+
+.. _landlock_observability_security:
+
+Observability security considerations
+---------------------------------------
+
+Both audit records and trace events expose information about all
+Landlock-sandboxed processes on the system, including filesystem paths
+being accessed, network ports, and process identities.  System
+administrators must ensure that access to audit logs (controlled by the
+audit subsystem configuration) and to trace events (requiring
+``CAP_SYS_ADMIN`` or ``CAP_BPF`` + ``CAP_PERFMON``) is restricted to
+trusted users.
+
+eBPF programs attached to Landlock trace events have access to the full
+kernel context of each event (ruleset rules, domain hierarchy, process
+credentials) via BTF.  This level of access is comparable to
+``CAP_SYS_ADMIN`` and must be treated accordingly.
+
+Audit logs and kernel trace events require elevated privileges and are
+system-wide; they are not designed for per-sandbox unprivileged
+monitoring.
+
 Additional documentation
 ========================
 
 * `Linux Audit Documentation`_
 * Documentation/userspace-api/landlock.rst
+* Documentation/trace/events-landlock.rst
 * Documentation/security/landlock.rst
 * https://landlock.io
 
diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
index c5186526e76f..5ef0164fbafb 100644
--- a/Documentation/security/landlock.rst
+++ b/Documentation/security/landlock.rst
@@ -1,13 +1,14 @@
 .. SPDX-License-Identifier: GPL-2.0
 .. Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
 .. Copyright © 2019-2020 ANSSI
+.. Copyright © 2026 Cloudflare
 
 ==================================
 Landlock LSM: kernel documentation
 ==================================
 
 :Author: Mickaël Salaün
-:Date: March 2026
+:Date: April 2026
 
 Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
 harden a whole system, this feature should be available to any process,
@@ -177,11 +178,43 @@ makes the reasoning much easier and helps avoid pitfalls.
 .. kernel-doc:: security/landlock/domain.h
     :identifiers:
 
+Denial logging
+==============
+
+Access denials are logged through two independent channels: audit
+records and tracepoints.  Both are managed by the common denial
+framework in ``log.c``, compiled under ``CONFIG_SECURITY_LANDLOCK_LOG``
+(automatically selected by ``CONFIG_AUDIT`` or ``CONFIG_TRACEPOINTS``).
+
+Audit records respect audit configuration, domain log flags, and
+``LANDLOCK_LOG_DISABLED``.  Tracepoints fire unconditionally,
+independent of audit configuration and domain log flags.  The denial
+counter (``num_denials``) is always incremented regardless of logging
+configuration.
+
+See Documentation/admin-guide/LSM/landlock.rst for audit record format,
+tracepoint usage, and filtering examples.
+
+.. kernel-doc:: security/landlock/log.h
+    :identifiers:
+
+Trace events
+------------
+
+See :doc:`/trace/events-landlock` for trace event usage and format details.
+
+.. kernel-doc:: include/trace/events/landlock.h
+    :doc: Landlock trace events
+
+.. kernel-doc:: include/trace/events/landlock.h
+    :internal:
+
 Additional documentation
 ========================
 
 * Documentation/userspace-api/landlock.rst
 * Documentation/admin-guide/LSM/landlock.rst
+* Documentation/trace/events-landlock.rst
 * https://landlock.io
 
 .. Links
diff --git a/Documentation/trace/events-landlock.rst b/Documentation/trace/events-landlock.rst
new file mode 100644
index 000000000000..802df09259ce
--- /dev/null
+++ b/Documentation/trace/events-landlock.rst
@@ -0,0 +1,160 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. Copyright © 2026 Cloudflare
+
+=====================
+Landlock Trace Events
+=====================
+
+:Date: April 2026
+
+Landlock emits trace events for sandbox lifecycle operations and access
+denials.  These events can be consumed by ftrace (for human-readable
+trace output and filtering) and by eBPF programs (for programmatic
+introspection via BTF).
+
+See Documentation/security/landlock.rst for Landlock kernel internals and
+Documentation/admin-guide/LSM/landlock.rst for system administration.
+
+.. warning::
+
+   Landlock trace events, like audit records, expose sensitive
+   information about all sandboxed processes on the system.  See
+   :ref:`landlock_observability_security` for security considerations
+   and privilege requirements.
+
+See Documentation/userspace-api/landlock.rst for the userspace API.
+
+Event overview
+==============
+
+Landlock trace events are organized in four categories:
+
+**Syscall events** are emitted during Landlock system calls:
+
+- ``landlock_create_ruleset``: a new ruleset is created
+- ``landlock_add_rule_fs``: a filesystem rule is added to a ruleset
+- ``landlock_add_rule_net``: a network port rule is added to a ruleset
+- ``landlock_restrict_self``: a new domain is created from a ruleset
+
+**Denial events** are emitted when an access is denied:
+
+- ``landlock_deny_access_fs``: filesystem access denied
+- ``landlock_deny_access_net``: network access denied
+- ``landlock_deny_ptrace``: ptrace access denied
+- ``landlock_deny_scope_signal``: signal delivery denied
+- ``landlock_deny_scope_abstract_unix_socket``: abstract unix socket
+  access denied
+
+**Rule evaluation events** are emitted during rule matching:
+
+- ``landlock_check_rule_fs``: a filesystem rule is evaluated
+- ``landlock_check_rule_net``: a network port rule is evaluated
+
+**Lifecycle events**:
+
+- ``landlock_free_domain``: a domain is freed
+- ``landlock_free_ruleset``: a ruleset is freed
+
+Enabling events
+===============
+
+Enable all Landlock events::
+
+    echo 1 > /sys/kernel/tracing/events/landlock/enable
+
+Enable a specific event::
+
+    echo 1 > /sys/kernel/tracing/events/landlock/landlock_deny_access_fs/enable
+
+Read the trace output::
+
+    cat /sys/kernel/tracing/trace_pipe
+
+Differences from audit records
+==============================
+
+Tracepoints and audit records both log Landlock denials, but differ
+in some field formats:
+
+- **Paths**: Tracepoints use ``d_absolute_path()`` (namespace-independent
+  absolute paths).  Audit uses ``d_path()`` (relative to the process's
+  chroot).  Tracepoint paths are deterministic regardless of the tracer's
+  mount namespace.
+
+- **Device names**: Tracepoints use numeric ``dev=<major>:<minor>``.
+  Audit uses string ``dev="<s_id>"``.  Numeric format is more precise
+  for machine parsing.
+
+- **Denied access field**: The ``deny_access_fs`` and ``deny_access_net``
+  tracepoints use the ``blockers=`` field name (same as audit).
+  Audit uses human-readable access right names (e.g.,
+  ``blockers=fs.read_file``), while tracepoints use a hex bitmask
+  (e.g., ``blockers=0x4``).  Scope and ptrace tracepoints omit
+  ``blockers`` because the event name identifies the denial type.
+
+- **Scope target names**: Tracepoints use role-specific field names
+  (``tracee_pid``, ``target_pid``, ``peer_pid``) that reflect the
+  semantics of each event.  Audit uses generic names (``opid``, ``ocomm``)
+  because the audit log format is not event-type-specific.
+
+- **Process name**: Scope tracepoints include ``comm=`` in the printk
+  output for stateless consumers.  eBPF consumers can read ``comm``
+  directly from the task_struct via BTF.  The ``comm`` value is treated
+  as untrusted input (escaped via ``__print_untrusted_str``).
+
+Ruleset versioning
+==================
+
+Syscall events include a ruleset version (``ruleset=<hex_id>.<version>``)
+that tracks the number of rules added to the ruleset.  The version is
+incremented on each ``landlock_add_rule()`` call and frozen at
+``landlock_restrict_self()`` time.  This enables trace consumers to
+correlate a domain with the exact set of rules it was created from.
+
+eBPF access
+===========
+
+eBPF programs attached via ``BPF_RAW_TRACEPOINT`` can access the
+tracepoint arguments directly through BTF.  The arguments include both
+standard kernel objects and Landlock-internal objects:
+
+- Standard kernel objects (``struct task_struct``, ``struct sock``,
+  ``struct path``, ``struct dentry``) can be used with existing BPF
+  helpers.
+- Landlock-internal objects (``struct landlock_domain``,
+  ``struct landlock_ruleset``, ``struct landlock_rule``,
+  ``struct landlock_hierarchy``) can be read via ``BPF_CORE_READ``.
+  Internal struct layouts may change between kernel versions; use CO-RE
+  for field relocation.
+
+All pointer arguments in the tracepoint prototypes are guaranteed
+non-NULL.
+
+Audit filtering equivalence
+============================
+
+Denial events include ``same_exec``, ``log_same_exec``, and
+``log_new_exec`` fields.  These allow both stateless (ftrace filter)
+and stateful (eBPF) consumers to replicate the audit subsystem's
+filtering logic::
+
+    # Show only denials that audit would also log:
+    echo '(same_exec==1 && log_same_exec==1) || (same_exec==0 && log_new_exec==1)' > \
+        /sys/kernel/tracing/events/landlock/landlock_deny_access_fs/filter
+
+Event reference
+===============
+
+.. kernel-doc:: include/trace/events/landlock.h
+    :doc: Landlock trace events
+
+.. kernel-doc:: include/trace/events/landlock.h
+    :internal:
+
+Additional documentation
+========================
+
+* Documentation/userspace-api/landlock.rst
+* Documentation/admin-guide/LSM/landlock.rst
+* Documentation/security/landlock.rst
+* https://landlock.io
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 338bc4d7cfab..d60e010e042b 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -54,6 +54,7 @@ applications.
    events-power
    events-nmi
    events-msr
+   events-landlock
    events-pci
    boottime-trace
    histogram
diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
index fd8b78c31f2f..e65370212aa1 100644
--- a/Documentation/userspace-api/landlock.rst
+++ b/Documentation/userspace-api/landlock.rst
@@ -8,7 +8,7 @@ Landlock: unprivileged access control
 =====================================
 
 :Author: Mickaël Salaün
-:Date: March 2026
+:Date: April 2026
 
 The goal of Landlock is to enable restriction of ambient rights (e.g. global
 filesystem or network access) for a set of processes.  Because Landlock
@@ -698,8 +698,12 @@ Starting with the Landlock ABI version 7, it is possible to control logging of
 Landlock audit events with the ``LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF``,
 ``LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON``, and
 ``LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF`` flags passed to
-sys_landlock_restrict_self().  See Documentation/admin-guide/LSM/landlock.rst
-for more details on audit.
+sys_landlock_restrict_self().  These flags control audit record generation.
+Landlock tracepoints are not affected by these flags and always fire when
+enabled, providing an alternative observability channel for debugging and
+monitoring.  See :doc:`/admin-guide/LSM/landlock` for more details
+on audit and tracepoints, and :doc:`/trace/events-landlock` for the
+complete trace event reference.
 
 Thread synchronization (ABI < 8)
 --------------------------------
@@ -814,6 +818,7 @@ Additional documentation
 ========================
 
 * Documentation/admin-guide/LSM/landlock.rst
+* Documentation/trace/events-landlock.rst
 * Documentation/security/landlock.rst
 * https://landlock.io
 
-- 
2.53.0



* [PATCH v2 16/17] selftests/landlock: Add scope and ptrace tracepoint tests
From: Mickaël Salaün @ 2026-04-06 14:37 UTC
  To: Christian Brauner, Günther Noack, Steven Rostedt
  Cc: Mickaël Salaün, Jann Horn, Jeff Xu, Justin Suess,
	Kees Cook, Masami Hiramatsu, Mathieu Desnoyers, Matthieu Buffet,
	Mikhail Ivanov, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-security-module, linux-trace-kernel
In-Reply-To: <20260406143717.1815792-1-mic@digikod.net>

Add trace tests for the landlock_deny_ptrace,
landlock_deny_scope_signal, and landlock_deny_scope_abstract_unix_socket
tracepoints, following the audit test pattern of placing tests alongside
the functional tests for each subsystem.

The ptrace trace test verifies that the landlock_deny_ptrace event fires
when a sandboxed child attempts to ptrace an unsandboxed parent.  The
signal and unix socket tests verify that the corresponding scope
tracepoints fire on denied operations.

Cc: Günther Noack <gnoack@google.com>
Cc: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
- New patch.
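
Note for reviewers: the tests below read key=value fields out of trace
lines through tracefs_extract_field(), which comes from trace.h and is
not shown in this patch.  As a rough illustration of that kind of
parsing, and not the selftest helper's actual implementation, a
standalone sketch could look like the following.  The name
extract_trace_field(), its single-line interface, and the sample line
layout used to exercise it are all assumptions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the selftest's tracefs_extract_field()).
 * Copies the value of "key=" from one trace line into out.
 * Returns 0 on success, -1 if the key is missing or out is too small.
 */
int extract_trace_field(const char *line, const char *key, char *out,
			size_t out_size)
{
	size_t key_len = strlen(key);
	const char *p = line;

	if (key_len == 0)
		return -1;

	while ((p = strstr(p, key)) != NULL) {
		/*
		 * Only accept a match that starts a word and is directly
		 * followed by '=', so "pid" does not match "tracee_pid".
		 */
		int at_word_start = (p == line) || (p[-1] == ' ');

		if (at_word_start && p[key_len] == '=') {
			const char *val = p + key_len + 1;
			/* The value runs up to the next space or newline. */
			size_t len = strcspn(val, " \n");

			if (len + 1 > out_size)
				return -1;
			memcpy(out, val, len);
			out[len] = '\0';
			return 0;
		}
		p++;
	}
	return -1;
}
```

The word-boundary check matters for these events because several field
names share a suffix (tracee_pid, target_pid, peer_pid).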
---
 .../testing/selftests/landlock/ptrace_test.c  | 164 +++++++++++++++
 .../landlock/scoped_abstract_unix_test.c      | 195 ++++++++++++++++++
 .../selftests/landlock/scoped_signal_test.c   | 150 ++++++++++++++
 3 files changed, 509 insertions(+)

diff --git a/tools/testing/selftests/landlock/ptrace_test.c b/tools/testing/selftests/landlock/ptrace_test.c
index 1b6c8b53bf33..a72035d1c27b 100644
--- a/tools/testing/selftests/landlock/ptrace_test.c
+++ b/tools/testing/selftests/landlock/ptrace_test.c
@@ -11,7 +11,9 @@
 #include <errno.h>
 #include <fcntl.h>
 #include <linux/landlock.h>
+#include <sched.h>
 #include <signal.h>
+#include <sys/mount.h>
 #include <sys/prctl.h>
 #include <sys/ptrace.h>
 #include <sys/types.h>
@@ -20,6 +22,7 @@
 
 #include "audit.h"
 #include "common.h"
+#include "trace.h"
 
 /* Copied from security/yama/yama_lsm.c */
 #define YAMA_SCOPE_DISABLED 0
@@ -429,4 +432,165 @@ TEST_F(audit, trace)
 	EXPECT_EQ(0, records.domain);
 }
 
+/* Trace tests */
+
+/* clang-format off */
+FIXTURE(trace_ptrace) {
+	/* clang-format on */
+	int tracefs_ok;
+};
+
+FIXTURE_SETUP(trace_ptrace)
+{
+	int ret;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, unshare(CLONE_NEWNS));
+	ASSERT_EQ(0, mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL));
+
+	ret = tracefs_fixture_setup();
+	if (ret) {
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+		self->tracefs_ok = 0;
+		SKIP(return, "tracefs not available");
+	}
+	self->tracefs_ok = 1;
+
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_DENY_PTRACE_ENABLE, true));
+	ASSERT_EQ(0, tracefs_clear());
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+FIXTURE_TEARDOWN(trace_ptrace)
+{
+	if (!self->tracefs_ok)
+		return;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	tracefs_enable_event(TRACEFS_DENY_PTRACE_ENABLE, false);
+	tracefs_fixture_teardown();
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/* clang-format off */
+FIXTURE_VARIANT(trace_ptrace)
+{
+	/* clang-format on */
+	bool sandbox;
+	int expect_denied;
+};
+
+/* Denied: sandboxed child ptraces unsandboxed parent. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(trace_ptrace, denied) {
+	/* clang-format on */
+	.sandbox = true,
+	.expect_denied = 1,
+};
+
+/* Allowed: unsandboxed child uses PTRACE_TRACEME. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(trace_ptrace, allowed) {
+	/* clang-format on */
+	.sandbox = false,
+	.expect_denied = 0,
+};
+
+TEST_F(trace_ptrace, deny_ptrace)
+{
+	char *buf, field[64], expected_pid[16];
+	int count, status;
+	pid_t child, parent;
+
+	if (!self->tracefs_ok)
+		SKIP(return, "tracefs not available");
+
+	parent = getpid();
+
+	/*
+	 * Set a known comm so the denied variant can verify both the trace
+	 * line task name and the comm= field.
+	 */
+	prctl(PR_SET_NAME, "ll_trace_test");
+
+	child = fork();
+	ASSERT_LE(0, child);
+
+	if (child == 0) {
+		if (variant->sandbox) {
+			struct landlock_ruleset_attr ruleset_attr = {
+				.scoped = LANDLOCK_SCOPE_SIGNAL,
+			};
+			int ruleset_fd;
+
+			/*
+			 * Any scope creates a domain.  Ptrace denial
+			 * checks domain ancestry, not specific flags.
+			 */
+			ruleset_fd = landlock_create_ruleset(
+				&ruleset_attr, sizeof(ruleset_attr), 0);
+			if (ruleset_fd < 0)
+				_exit(1);
+
+			prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+			if (landlock_restrict_self(ruleset_fd, 0)) {
+				close(ruleset_fd);
+				_exit(1);
+			}
+			close(ruleset_fd);
+
+			/* PTRACE_ATTACH on unsandboxed parent: denied. */
+			if (ptrace(PTRACE_ATTACH, parent, NULL, NULL) == 0) {
+				ptrace(PTRACE_DETACH, parent, NULL, NULL);
+				_exit(2);
+			}
+			if (errno != EPERM)
+				_exit(3);
+		} else {
+			/* No sandbox: ptrace should succeed. */
+			if (ptrace(PTRACE_TRACEME) != 0)
+				_exit(1);
+		}
+
+		_exit(0);
+	}
+
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	count = tracefs_count_matches(buf, REGEX_DENY_PTRACE("ll_trace_test"));
+	if (variant->expect_denied) {
+		EXPECT_LE(variant->expect_denied, count)
+		{
+			TH_LOG("Expected deny_ptrace event, got %d\n%s", count,
+			       buf);
+		}
+
+		/* Verify tracee_pid is the parent's TGID. */
+		snprintf(expected_pid, sizeof(expected_pid), "%d", parent);
+		ASSERT_EQ(0, tracefs_extract_field(
+				     buf, REGEX_DENY_PTRACE("ll_trace_test"),
+				     "tracee_pid", field, sizeof(field)));
+		EXPECT_STREQ(expected_pid, field);
+
+		/* Verify comm matches prctl(PR_SET_NAME). */
+		ASSERT_EQ(0, tracefs_extract_field(
+				     buf, REGEX_DENY_PTRACE("ll_trace_test"),
+				     "comm", field, sizeof(field)));
+		EXPECT_STREQ("ll_trace_test", field);
+	} else {
+		EXPECT_EQ(0, count)
+		{
+			TH_LOG("Expected 0 deny_ptrace events, got %d\n%s",
+			       count, buf);
+		}
+	}
+
+	free(buf);
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
index c47491d2d1c1..444df8ead1bf 100644
--- a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
+++ b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
@@ -12,6 +12,7 @@
 #include <sched.h>
 #include <signal.h>
 #include <stddef.h>
+#include <sys/mount.h>
 #include <sys/prctl.h>
 #include <sys/socket.h>
 #include <sys/stat.h>
@@ -23,6 +24,9 @@
 #include "audit.h"
 #include "common.h"
 #include "scoped_common.h"
+#include "trace.h"
+
+#define TRACE_TASK "scoped_abstract"
 
 /* Number of pending connections queue to be hold. */
 const short backlog = 10;
@@ -1145,4 +1149,195 @@ TEST(self_connect)
 		_metadata->exit_code = KSFT_FAIL;
 }
 
+/* Trace tests */
+
+/* clang-format off */
+FIXTURE(trace_unix) {
+	/* clang-format on */
+	int tracefs_ok;
+};
+
+FIXTURE_SETUP(trace_unix)
+{
+	int ret;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, unshare(CLONE_NEWNS));
+	ASSERT_EQ(0, mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL));
+
+	ret = tracefs_fixture_setup();
+	if (ret) {
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+		self->tracefs_ok = 0;
+		SKIP(return, "tracefs not available");
+	}
+	self->tracefs_ok = 1;
+
+	ASSERT_EQ(0, tracefs_enable_event(
+			     TRACEFS_DENY_SCOPE_ABSTRACT_UNIX_SOCKET_ENABLE,
+			     true));
+	ASSERT_EQ(0, tracefs_clear());
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+FIXTURE_TEARDOWN(trace_unix)
+{
+	if (!self->tracefs_ok)
+		return;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	tracefs_enable_event(TRACEFS_DENY_SCOPE_ABSTRACT_UNIX_SOCKET_ENABLE,
+			     false);
+	tracefs_fixture_teardown();
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/* clang-format off */
+FIXTURE_VARIANT(trace_unix)
+{
+	/* clang-format on */
+	bool sandbox;
+	int expect_denied;
+};
+
+/* Denied: sandboxed child connects to unsandboxed parent's socket. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(trace_unix, denied) {
+	/* clang-format on */
+	.sandbox = true,
+	.expect_denied = 1,
+};
+
+/* Allowed: unsandboxed child connects. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(trace_unix, allowed) {
+	/* clang-format on */
+	.sandbox = false,
+	.expect_denied = 0,
+};
+
+TEST_F(trace_unix, deny_scope_unix)
+{
+	struct sockaddr_un addr = {
+		.sun_family = AF_UNIX,
+	};
+	char *buf, field[128], expected_path[64], expected_pid[16];
+	int server_fd, client_fd, count, status;
+	pid_t child;
+
+	if (!self->tracefs_ok)
+		SKIP(return, "tracefs not available");
+
+	/* Create an abstract unix socket server in the parent. */
+	server_fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
+	ASSERT_LE(0, server_fd);
+
+	addr.sun_path[0] = '\0';
+	snprintf(addr.sun_path + 1, sizeof(addr.sun_path) - 1,
+		 "landlock_trace_test_%d", getpid());
+
+	ASSERT_EQ(0, bind(server_fd, (struct sockaddr *)&addr,
+			  offsetof(struct sockaddr_un, sun_path) + 1 +
+				  strlen(addr.sun_path + 1)));
+	ASSERT_EQ(0, listen(server_fd, 1));
+
+	child = fork();
+	ASSERT_LE(0, child);
+
+	if (child == 0) {
+		if (variant->sandbox) {
+			struct landlock_ruleset_attr ruleset_attr = {
+				.scoped = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET,
+			};
+			int ruleset_fd;
+
+			ruleset_fd = landlock_create_ruleset(
+				&ruleset_attr, sizeof(ruleset_attr), 0);
+			if (ruleset_fd < 0)
+				_exit(1);
+
+			prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+			if (landlock_restrict_self(ruleset_fd, 0)) {
+				close(ruleset_fd);
+				_exit(1);
+			}
+			close(ruleset_fd);
+		}
+
+		client_fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
+		if (client_fd < 0)
+			_exit(1);
+
+		if (variant->sandbox) {
+			/* Connect should be denied. */
+			if (connect(client_fd, (struct sockaddr *)&addr,
+				    offsetof(struct sockaddr_un, sun_path) + 1 +
+					    strlen(addr.sun_path + 1)) == 0) {
+				close(client_fd);
+				_exit(2);
+			}
+			if (errno != EPERM) {
+				close(client_fd);
+				_exit(3);
+			}
+		} else {
+			/* No sandbox: connect should succeed. */
+			if (connect(client_fd, (struct sockaddr *)&addr,
+				    offsetof(struct sockaddr_un, sun_path) + 1 +
+					    strlen(addr.sun_path + 1)) != 0) {
+				close(client_fd);
+				_exit(2);
+			}
+		}
+		close(client_fd);
+		_exit(0);
+	}
+
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+	close(server_fd);
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	count = tracefs_count_matches(
+		buf, REGEX_DENY_SCOPE_ABSTRACT_UNIX_SOCKET(TRACE_TASK));
+	if (variant->expect_denied) {
+		EXPECT_LE(variant->expect_denied, count)
+		{
+			TH_LOG("Expected deny_scope_abstract_unix_socket "
+			       "event, got %d\n%s",
+			       count, buf);
+		}
+
+		/* Verify sun_path (trace skips the leading NUL). */
+		snprintf(expected_path, sizeof(expected_path),
+			 "landlock_trace_test_%d", getpid());
+		ASSERT_EQ(0, tracefs_extract_field(
+				     buf,
+				     REGEX_DENY_SCOPE_ABSTRACT_UNIX_SOCKET(
+					     TRACE_TASK),
+				     "sun_path", field, sizeof(field)));
+		EXPECT_STREQ(expected_path, field);
+
+		/* Verify peer_pid is the parent's PID. */
+		snprintf(expected_pid, sizeof(expected_pid), "%d", getpid());
+		ASSERT_EQ(0, tracefs_extract_field(
+				     buf,
+				     REGEX_DENY_SCOPE_ABSTRACT_UNIX_SOCKET(
+					     TRACE_TASK),
+				     "peer_pid", field, sizeof(field)));
+		EXPECT_STREQ(expected_pid, field);
+	} else {
+		EXPECT_EQ(0, count)
+		{
+			TH_LOG("Expected 0 deny_scope events, got %d\n%s",
+			       count, buf);
+		}
+	}
+
+	free(buf);
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/scoped_signal_test.c b/tools/testing/selftests/landlock/scoped_signal_test.c
index d8bf33417619..811dc4b9358d 100644
--- a/tools/testing/selftests/landlock/scoped_signal_test.c
+++ b/tools/testing/selftests/landlock/scoped_signal_test.c
@@ -10,7 +10,9 @@
 #include <fcntl.h>
 #include <linux/landlock.h>
 #include <pthread.h>
+#include <sched.h>
 #include <signal.h>
+#include <sys/mount.h>
 #include <sys/prctl.h>
 #include <sys/types.h>
 #include <sys/wait.h>
@@ -18,6 +20,9 @@
 
 #include "common.h"
 #include "scoped_common.h"
+#include "trace.h"
+
+#define TRACE_TASK "scoped_signal_t"
 
 /* This variable is used for handling several signals. */
 static volatile sig_atomic_t is_signaled;
@@ -559,4 +564,149 @@ TEST_F(fown, sigurg_socket)
 		_metadata->exit_code = KSFT_FAIL;
 }
 
+/* Trace tests */
+
+/* clang-format off */
+FIXTURE(trace_signal) {
+	/* clang-format on */
+	int tracefs_ok;
+};
+
+FIXTURE_SETUP(trace_signal)
+{
+	int ret;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, unshare(CLONE_NEWNS));
+	ASSERT_EQ(0, mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL));
+
+	ret = tracefs_fixture_setup();
+	if (ret) {
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+		self->tracefs_ok = 0;
+		SKIP(return, "tracefs not available");
+	}
+	self->tracefs_ok = 1;
+
+	ASSERT_EQ(0,
+		  tracefs_enable_event(TRACEFS_DENY_SCOPE_SIGNAL_ENABLE, true));
+	ASSERT_EQ(0, tracefs_clear());
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+FIXTURE_TEARDOWN(trace_signal)
+{
+	if (!self->tracefs_ok)
+		return;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	tracefs_enable_event(TRACEFS_DENY_SCOPE_SIGNAL_ENABLE, false);
+	tracefs_fixture_teardown();
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/* clang-format off */
+FIXTURE_VARIANT(trace_signal)
+{
+	/* clang-format on */
+	bool sandbox;
+	int expect_denied;
+};
+
+/* Denied: sandboxed child signals unsandboxed parent. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(trace_signal, denied) {
+	/* clang-format on */
+	.sandbox = true,
+	.expect_denied = 1,
+};
+
+/* Allowed: unsandboxed child signals unsandboxed parent. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(trace_signal, allowed) {
+	/* clang-format on */
+	.sandbox = false,
+	.expect_denied = 0,
+};
+
+TEST_F(trace_signal, deny_scope_signal)
+{
+	char *buf, field[64], expected_pid[16];
+	int count, status;
+	pid_t child;
+
+	if (!self->tracefs_ok)
+		SKIP(return, "tracefs not available");
+
+	child = fork();
+	ASSERT_LE(0, child);
+
+	if (child == 0) {
+		if (variant->sandbox) {
+			struct landlock_ruleset_attr ruleset_attr = {
+				.scoped = LANDLOCK_SCOPE_SIGNAL,
+			};
+			int ruleset_fd;
+
+			ruleset_fd = landlock_create_ruleset(
+				&ruleset_attr, sizeof(ruleset_attr), 0);
+			if (ruleset_fd < 0)
+				_exit(1);
+
+			prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+			if (landlock_restrict_self(ruleset_fd, 0)) {
+				close(ruleset_fd);
+				_exit(1);
+			}
+			close(ruleset_fd);
+		}
+
+		if (variant->sandbox) {
+			/* Signal to unsandboxed parent should be denied. */
+			if (kill(getppid(), 0) == 0)
+				_exit(2);
+			if (errno != EPERM)
+				_exit(3);
+		} else {
+			/* No sandbox: kill should succeed. */
+			if (kill(getppid(), 0) != 0)
+				_exit(1);
+		}
+
+		_exit(0);
+	}
+
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	count = tracefs_count_matches(buf, REGEX_DENY_SCOPE_SIGNAL(TRACE_TASK));
+	if (variant->expect_denied) {
+		EXPECT_LE(variant->expect_denied, count)
+		{
+			TH_LOG("Expected deny_scope_signal event, got %d\n%s",
+			       count, buf);
+		}
+
+		/* Verify target_pid is the parent's PID. */
+		snprintf(expected_pid, sizeof(expected_pid), "%d", getpid());
+		ASSERT_EQ(0, tracefs_extract_field(
+				     buf, REGEX_DENY_SCOPE_SIGNAL(TRACE_TASK),
+				     "target_pid", field, sizeof(field)));
+		EXPECT_STREQ(expected_pid, field);
+	} else {
+		EXPECT_EQ(0, count)
+		{
+			TH_LOG("Expected 0 deny_scope_signal events, "
+			       "got %d\n%s",
+			       count, buf);
+		}
+	}
+
+	free(buf);
+}
+
 TEST_HARNESS_MAIN
-- 
2.53.0



* [PATCH v2 14/17] selftests/landlock: Add filesystem tracepoint tests
From: Mickaël Salaün @ 2026-04-06 14:37 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Steven Rostedt
  Cc: Mickaël Salaün, Jann Horn, Jeff Xu, Justin Suess,
	Kees Cook, Masami Hiramatsu, Mathieu Desnoyers, Matthieu Buffet,
	Mikhail Ivanov, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-security-module, linux-trace-kernel
In-Reply-To: <20260406143717.1815792-1-mic@digikod.net>

Add filesystem-specific trace tests in a dedicated test file, following
the same pattern as the audit tests, which live alongside the functional
tests for each subsystem.

Tests in trace_fs_test.c verify that:
- landlock_add_rule_fs events fire with correct path and fields,
- landlock_check_rule_fs events fire when rules match during pathwalk
  and do not fire for unhandled access types,
- landlock_deny_access_fs events fire on denied accesses,
- nested domains produce both check_rule and deny_access events,
- no trace events fire without a Landlock sandbox (unsandboxed
  baseline).
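
For readers without trace.h at hand, the counting step these tests rely
on can be sketched roughly as below.  This is only an illustration under
stated assumptions: count_matching_lines() is a hypothetical stand-in,
not the tracefs_count_matches() helper from the series, and the sample
trace lines in the usage note are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a "count matching trace lines" helper: returns
 * the number of lines in @buf that match the POSIX extended regex @pattern,
 * or -1 if the pattern does not compile or memory allocation fails.
 */
int count_matching_lines(const char *buf, const char *pattern)
{
	regex_t re;
	char *copy, *line, *saveptr;
	int count = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;

	/* strtok_r() modifies its input, so work on a private copy. */
	copy = strdup(buf);
	if (!copy) {
		regfree(&re);
		return -1;
	}

	for (line = strtok_r(copy, "\n", &saveptr); line;
	     line = strtok_r(NULL, "\n", &saveptr)) {
		if (regexec(&re, line, 0, NULL, 0) == 0)
			count++;
	}

	free(copy);
	regfree(&re);
	return count;
}
```

A test would then compare the returned count against the expected number
of events, e.g. zero for the unsandboxed baseline and at least one for a
denied access.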

Add trace_layout1 fixture tests in fs_test.c for field verification
(check_rule_fs_fields) and multi-rule pathwalk
(check_rule_fs_multiple_rules) that reuse the layout1 filesystem
hierarchy.

Cc: Günther Noack <gnoack@google.com>
Cc: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
- New patch.
---
 tools/testing/selftests/landlock/fs_test.c    | 218 ++++++++++
 .../selftests/landlock/trace_fs_test.c        | 390 ++++++++++++++++++
 2 files changed, 608 insertions(+)
 create mode 100644 tools/testing/selftests/landlock/trace_fs_test.c

diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index cdb47fc1fc0a..8f1ab43a07a0 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -44,6 +44,9 @@
 
 #include "audit.h"
 #include "common.h"
+#include "trace.h"
+
+#define TRACE_TASK "fs_test"
 
 #ifndef renameat2
 int renameat2(int olddirfd, const char *oldpath, int newdirfd,
@@ -7764,4 +7767,219 @@ TEST_F(audit_layout1, mount)
 	EXPECT_EQ(1, records.domain);
 }
 
+/* clang-format off */
+FIXTURE(trace_layout1) {
+	/* clang-format on */
+	int tracefs_ok;
+};
+
+FIXTURE_SETUP(trace_layout1)
+{
+	struct stat st;
+
+	/*
+	 * Check tracefs availability before creating the layout, following the
+	 * layout3_fs pattern: skip before any layout creation to avoid leaving
+	 * stale TMP_DIR on skip.
+	 */
+	if (stat(TRACEFS_LANDLOCK_DIR, &st)) {
+		self->tracefs_ok = 0;
+		SKIP(return, "tracefs not available");
+	}
+	self->tracefs_ok = 1;
+
+	/* Isolate tracefs state (PID filter, event enables). */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, unshare(CLONE_NEWNS));
+	ASSERT_EQ(0, mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	prepare_layout(_metadata);
+	create_layout1(_metadata);
+
+	set_cap(_metadata, CAP_DAC_OVERRIDE);
+	ASSERT_EQ(0, tracefs_fixture_setup());
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_CHECK_RULE_FS_ENABLE, true));
+	ASSERT_EQ(0, tracefs_clear());
+	ASSERT_EQ(0, tracefs_set_pid_filter(getpid()));
+	clear_cap(_metadata, CAP_DAC_OVERRIDE);
+}
+
+FIXTURE_TEARDOWN_PARENT(trace_layout1)
+{
+	if (!self->tracefs_ok)
+		return;
+
+	set_cap(_metadata, CAP_DAC_OVERRIDE);
+	tracefs_enable_event(TRACEFS_CHECK_RULE_FS_ENABLE, false);
+	tracefs_clear_pid_filter();
+	tracefs_fixture_teardown();
+	clear_cap(_metadata, CAP_DAC_OVERRIDE);
+
+	remove_layout1(_metadata);
+	cleanup_layout(_metadata);
+}
+
+/*
+ * Verifies that check_rule_fs events include correct field values: domain, dev,
+ * ino, request, and allowed.  All values are verified against stat() of the
+ * rule path on a deterministic tmpfs layout.
+ */
+TEST_F(trace_layout1, check_rule_fs_fields)
+{
+	struct stat dir_stat;
+	char expected_dev[32];
+	char expected_ino[32];
+	char expected_req[32];
+	char *buf;
+	char field[64];
+
+	if (!self->tracefs_ok)
+		SKIP(return, "tracefs not available");
+
+	ASSERT_EQ(0, stat(dir_s1d1, &dir_stat));
+	snprintf(expected_dev, sizeof(expected_dev), "%u:%u",
+		 major(dir_stat.st_dev), minor(dir_stat.st_dev));
+	snprintf(expected_ino, sizeof(expected_ino), "%lu", dir_stat.st_ino);
+	snprintf(expected_req, sizeof(expected_req), "0x%x",
+		 (unsigned int)LANDLOCK_ACCESS_FS_READ_DIR);
+
+	set_cap(_metadata, CAP_DAC_OVERRIDE);
+	ASSERT_EQ(0, tracefs_clear());
+	clear_cap(_metadata, CAP_DAC_OVERRIDE);
+
+	sandbox_child_fs_access(_metadata, dir_s1d1,
+				LANDLOCK_ACCESS_FS_READ_DIR,
+				LANDLOCK_ACCESS_FS_READ_DIR, dir_s1d1);
+
+	set_cap(_metadata, CAP_DAC_OVERRIDE);
+	buf = tracefs_read_trace();
+	clear_cap(_metadata, CAP_DAC_OVERRIDE);
+	ASSERT_NE(NULL, buf);
+
+	EXPECT_EQ(1,
+		  tracefs_count_matches(buf, REGEX_CHECK_RULE_FS(TRACE_TASK)))
+	{
+		TH_LOG("Expected 1 check_rule_fs event\n%s", buf);
+	}
+
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_CHECK_RULE_FS(TRACE_TASK),
+					   "dev", field, sizeof(field)));
+	EXPECT_STREQ(expected_dev, field)
+	{
+		TH_LOG("Expected dev=%s, got %s", expected_dev, field);
+	}
+
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_CHECK_RULE_FS(TRACE_TASK),
+					   "ino", field, sizeof(field)));
+	EXPECT_STREQ(expected_ino, field)
+	{
+		TH_LOG("Expected ino=%s, got %s", expected_ino, field);
+	}
+
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_CHECK_RULE_FS(TRACE_TASK),
+					   "request", field, sizeof(field)));
+	EXPECT_STREQ(expected_req, field)
+	{
+		TH_LOG("Expected request=%s, got %s", expected_req, field);
+	}
+
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_CHECK_RULE_FS(TRACE_TASK),
+					   "allowed", field, sizeof(field)));
+	EXPECT_EQ('{', field[0])
+	{
+		TH_LOG("Expected allowed={...}, got %s", field);
+	}
+
+	free(buf);
+}
+
+/*
+ * Verifies check_rule_fs behavior with multiple rules.  With rules at s1d1 and
+ * s1d2 (a child of s1d1), accessing s1d2 produces only 1 event because the
+ * pathwalk short-circuits after the first rule fully unmasks the single layer.
+ */
+TEST_F(trace_layout1, check_rule_fs_multiple_rules)
+{
+	pid_t pid;
+	int status;
+	char *buf;
+	int count;
+
+	if (!self->tracefs_ok)
+		SKIP(return, "tracefs not available");
+
+	set_cap(_metadata, CAP_DAC_OVERRIDE);
+	ASSERT_EQ(0, tracefs_clear());
+	clear_cap(_metadata, CAP_DAC_OVERRIDE);
+
+	pid = fork();
+	ASSERT_LE(0, pid);
+
+	if (pid == 0) {
+		struct landlock_ruleset_attr attr = {
+			.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		struct landlock_path_beneath_attr path_beneath = {
+			.allowed_access = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		int ruleset_fd, fd;
+
+		ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		path_beneath.parent_fd =
+			open(dir_s1d1, O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd < 0)
+			_exit(1);
+		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				      &path_beneath, 0))
+			_exit(1);
+		close(path_beneath.parent_fd);
+
+		path_beneath.parent_fd =
+			open(dir_s1d2, O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd < 0)
+			_exit(1);
+		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				      &path_beneath, 0))
+			_exit(1);
+		close(path_beneath.parent_fd);
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(ruleset_fd, 0))
+			_exit(1);
+		close(ruleset_fd);
+
+		fd = open(dir_s1d2, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+		if (fd >= 0)
+			close(fd);
+		_exit(0);
+	}
+
+	ASSERT_EQ(pid, waitpid(pid, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	set_cap(_metadata, CAP_DAC_OVERRIDE);
+	buf = tracefs_read_trace();
+	clear_cap(_metadata, CAP_DAC_OVERRIDE);
+	ASSERT_NE(NULL, buf);
+
+	/*
+	 * Only 1 check_rule_fs event: the rule on dir_s1d2 fully unmasked the
+	 * single layer, so the pathwalk short-circuits before reaching the
+	 * dir_s1d1 rule.
+	 */
+	count = tracefs_count_matches(buf, REGEX_CHECK_RULE_FS(TRACE_TASK));
+	EXPECT_EQ(1, count)
+	{
+		TH_LOG("Expected 1 check_rule_fs event, got %d\n%s", count,
+		       buf);
+	}
+
+	free(buf);
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/trace_fs_test.c b/tools/testing/selftests/landlock/trace_fs_test.c
new file mode 100644
index 000000000000..60ed63aea049
--- /dev/null
+++ b/tools/testing/selftests/landlock/trace_fs_test.c
@@ -0,0 +1,390 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Filesystem tracepoints
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <sched.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "common.h"
+#include "trace.h"
+
+#define TRACE_TASK "trace_fs_test"
+
+/* clang-format off */
+FIXTURE(trace_fs) {
+	/* clang-format on */
+	int tracefs_ok;
+};
+
+FIXTURE_SETUP(trace_fs)
+{
+	int ret;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, unshare(CLONE_NEWNS));
+	ASSERT_EQ(0, mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL));
+
+	ret = tracefs_fixture_setup();
+	if (ret) {
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+		self->tracefs_ok = 0;
+		SKIP(return, "tracefs not available");
+	}
+	self->tracefs_ok = 1;
+
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_ADD_RULE_FS_ENABLE, true));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_CHECK_RULE_FS_ENABLE, true));
+	ASSERT_EQ(0, tracefs_enable_event(TRACEFS_DENY_ACCESS_FS_ENABLE, true));
+	ASSERT_EQ(0, tracefs_clear());
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+FIXTURE_TEARDOWN(trace_fs)
+{
+	if (!self->tracefs_ok)
+		return;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	tracefs_enable_event(TRACEFS_ADD_RULE_FS_ENABLE, false);
+	tracefs_enable_event(TRACEFS_CHECK_RULE_FS_ENABLE, false);
+	tracefs_enable_event(TRACEFS_DENY_ACCESS_FS_ENABLE, false);
+	tracefs_fixture_teardown();
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * Baseline: verifies that without Landlock, the operation succeeds and no
+ * check_rule or deny_access trace events fire.
+ */
+TEST_F(trace_fs, unsandboxed)
+{
+	char *buf;
+	int count, status, fd;
+	pid_t pid;
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	pid = fork();
+	ASSERT_LE(0, pid);
+
+	if (pid == 0) {
+		/*
+		 * No sandbox: verify that a normal FS access does not produce
+		 * Landlock trace events.
+		 */
+		fd = open("/usr", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+		if (fd >= 0)
+			close(fd);
+		_exit(0);
+	}
+
+	ASSERT_EQ(pid, waitpid(pid, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	count = tracefs_count_matches(buf, REGEX_CHECK_RULE_FS(TRACE_TASK));
+	EXPECT_EQ(0, count);
+	count = tracefs_count_matches(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK));
+	EXPECT_EQ(0, count);
+
+	free(buf);
+}
+
+/*
+ * Verifies that adding a filesystem rule emits a landlock_add_rule_fs trace
+ * event with the expected path and field values: ruleset ID is non-zero,
+ * access_rights is non-zero, and path matches.
+ */
+TEST_F(trace_fs, add_rule_fs)
+{
+	struct landlock_ruleset_attr ruleset_attr = {
+		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE |
+				     LANDLOCK_ACCESS_FS_WRITE_FILE |
+				     LANDLOCK_ACCESS_FS_READ_DIR,
+	};
+	struct landlock_path_beneath_attr path_beneath = {
+		.allowed_access = LANDLOCK_ACCESS_FS_READ_FILE,
+	};
+	char *buf, field_buf[64];
+	int ruleset_fd, count;
+
+	ruleset_fd =
+		landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+
+	path_beneath.parent_fd = open("/usr", O_PATH | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, path_beneath.parent_fd);
+
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				       &path_beneath, 0));
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+	ASSERT_EQ(0, close(ruleset_fd));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	count = tracefs_count_matches(buf, REGEX_ADD_RULE_FS(TRACE_TASK));
+	EXPECT_EQ(1, count)
+	{
+		TH_LOG("Expected 1 add_rule_fs event, got %d\n%s", count, buf);
+	}
+
+	/* Ruleset ID should be non-zero. */
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_ADD_RULE_FS(TRACE_TASK),
+					   "ruleset", field_buf,
+					   sizeof(field_buf)));
+	EXPECT_STRNE("0", field_buf);
+
+	/* Access rights should be non-zero. */
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_ADD_RULE_FS(TRACE_TASK),
+					   "access_rights", field_buf,
+					   sizeof(field_buf)));
+	EXPECT_STRNE("0x0", field_buf);
+
+	/* Path should be /usr. */
+	ASSERT_EQ(0,
+		  tracefs_extract_field(buf, REGEX_ADD_RULE_FS(TRACE_TASK),
+					"path", field_buf, sizeof(field_buf)));
+	EXPECT_STREQ("/usr", field_buf);
+
+	free(buf);
+}
+
+/*
+ * Verifies that an allowed access emits check_rule events (rule matched during
+ * pathwalk) but does NOT emit deny_access events (no denial).
+ */
+TEST_F(trace_fs, allowed_access)
+{
+	char *buf, field_buf[64];
+	int count;
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	/* Rule allows READ_DIR for /usr, access /usr which is allowed. */
+	sandbox_child_fs_access(_metadata, "/usr", LANDLOCK_ACCESS_FS_READ_DIR,
+				LANDLOCK_ACCESS_FS_READ_DIR, "/usr");
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	count = tracefs_count_matches(buf, REGEX_CHECK_RULE_FS(TRACE_TASK));
+	EXPECT_LE(1, count);
+
+	/* Single-layer allowed array: {0x<mask>}. */
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_CHECK_RULE_FS(TRACE_TASK),
+					   "allowed", field_buf,
+					   sizeof(field_buf)));
+	EXPECT_EQ('{', field_buf[0]);
+
+	count = tracefs_count_matches(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK));
+	EXPECT_EQ(0, count);
+
+	free(buf);
+}
+
+/*
+ * Verifies that accessing a path whose access type is not in the handled set
+ * does not emit landlock_check_rule events.  The ruleset handles READ_FILE,
+ * but the directory open checks READ_DIR which is unhandled; Landlock has no
+ * opinion and no rule evaluation occurs.
+ */
+TEST_F(trace_fs, check_rule_unhandled)
+{
+	char *buf;
+	int count;
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	/* Handles READ_FILE only; READ_DIR is unhandled. */
+	sandbox_child_fs_access(_metadata, "/usr", LANDLOCK_ACCESS_FS_READ_FILE,
+				LANDLOCK_ACCESS_FS_READ_FILE, "/tmp");
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	/* No check_rule events because READ_DIR is not in the handled set. */
+	count = tracefs_count_matches(buf, REGEX_CHECK_RULE_FS(TRACE_TASK));
+	EXPECT_EQ(0, count);
+
+	free(buf);
+}
+
+/*
+ * Verifies that nested domains (child sandboxed under a parent domain) emit
+ * check_rule events covering both layers and produce a deny_access event when
+ * no layer has a rule covering the access (here, /tmp).
+ */
+TEST_F(trace_fs, check_rule_nested)
+{
+	char *buf, field_buf[64], *comma;
+	size_t first_len, second_len;
+	int count_rule, count_access, status;
+	pid_t pid;
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	pid = fork();
+	ASSERT_LE(0, pid);
+
+	if (pid == 0) {
+		struct landlock_ruleset_attr ruleset_attr = {
+			.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		struct landlock_path_beneath_attr path_beneath = {
+			.allowed_access = LANDLOCK_ACCESS_FS_READ_DIR,
+		};
+		int ruleset_fd, fd;
+
+		/* First layer: allow /usr. */
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		path_beneath.parent_fd =
+			open("/usr", O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd < 0) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+
+		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				      &path_beneath, 0)) {
+			close(path_beneath.parent_fd);
+			close(ruleset_fd);
+			_exit(1);
+		}
+		close(path_beneath.parent_fd);
+
+		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+		if (landlock_restrict_self(ruleset_fd, 0)) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+		close(ruleset_fd);
+
+		/* Second layer: also allow /usr. */
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		if (ruleset_fd < 0)
+			_exit(1);
+
+		path_beneath.parent_fd =
+			open("/usr", O_PATH | O_DIRECTORY | O_CLOEXEC);
+		if (path_beneath.parent_fd < 0) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+
+		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				      &path_beneath, 0)) {
+			close(path_beneath.parent_fd);
+			close(ruleset_fd);
+			_exit(1);
+		}
+		close(path_beneath.parent_fd);
+
+		if (landlock_restrict_self(ruleset_fd, 0)) {
+			close(ruleset_fd);
+			_exit(1);
+		}
+		close(ruleset_fd);
+
+		/* Access /usr which is allowed by both layers. */
+		fd = open("/usr", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+		if (fd >= 0)
+			close(fd);
+
+		/* Access /tmp which has no rule in either layer. */
+		fd = open("/tmp", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+		if (fd >= 0)
+			close(fd);
+
+		_exit(0);
+	}
+
+	ASSERT_EQ(pid, waitpid(pid, &status, 0));
+	ASSERT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	count_rule =
+		tracefs_count_matches(buf, REGEX_CHECK_RULE_FS(TRACE_TASK));
+	EXPECT_LE(1, count_rule);
+
+	/*
+	 * Both layers have the same rule, so the allowed array must
+	 * have two identical entries: {0x<mask>,0x<mask>}.
+	 */
+	ASSERT_EQ(0, tracefs_extract_field(buf, REGEX_CHECK_RULE_FS(TRACE_TASK),
+					   "allowed", field_buf,
+					   sizeof(field_buf)));
+	comma = strchr(field_buf, ',');
+	EXPECT_NE(0, !!comma);
+	if (comma) {
+		/*
+		 * Verify both entries are identical: compare the
+		 * substring before the comma with the substring after
+		 * it (stripping the braces).
+		 */
+		first_len = comma - field_buf - 1;
+		second_len = strlen(comma + 1) - 1;
+		EXPECT_EQ(first_len, second_len);
+		EXPECT_EQ(0, strncmp(field_buf + 1, comma + 1, first_len));
+	}
+
+	count_access =
+		tracefs_count_matches(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK));
+	EXPECT_LE(1, count_access);
+
+	free(buf);
+}
+
+/*
+ * Verifies that a denied FS access emits at least one landlock_deny_access_fs
+ * trace event; field values are not checked here.
+ */
+TEST_F(trace_fs, deny_access_fs_denied)
+{
+	char *buf;
+	int count;
+
+	ASSERT_EQ(0, tracefs_clear_buf());
+
+	/*
+	 * Rule allows READ_DIR for /usr, but access /tmp which has no rule.
+	 * READ_DIR access to /tmp is denied by absence and should emit a
+	 * deny_access_fs event.
+	 */
+	sandbox_child_fs_access(_metadata, "/usr", LANDLOCK_ACCESS_FS_READ_DIR,
+				LANDLOCK_ACCESS_FS_READ_DIR, "/tmp");
+
+	buf = tracefs_read_buf();
+	ASSERT_NE(NULL, buf);
+
+	count = tracefs_count_matches(buf, REGEX_DENY_ACCESS_FS(TRACE_TASK));
+	EXPECT_LE(1, count);
+
+	free(buf);
+}
+
+TEST_HARNESS_MAIN
-- 
2.53.0



* [PATCH v2 08/17] landlock: Add restrict_self and free_domain tracepoints
From: Mickaël Salaün @ 2026-04-06 14:37 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Steven Rostedt
  Cc: Mickaël Salaün, Jann Horn, Jeff Xu, Justin Suess,
	Kees Cook, Masami Hiramatsu, Mathieu Desnoyers, Matthieu Buffet,
	Mikhail Ivanov, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-security-module, linux-trace-kernel
In-Reply-To: <20260406143717.1815792-1-mic@digikod.net>

Add a tracepoint for sandbox enforcement, emitted from the
landlock_restrict_self() syscall handler after the new domain is
created.  This logs both the source ruleset ID (with its version at the
time of the merge) and the new domain ID, enabling trace consumers to
correlate add_rule events (which use the ruleset ID) with check_rule
events (which use the domain ID).
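
As an illustration of that correlation, a trace consumer could pull the
IDs out of the event text along the following lines.  This is a minimal
userspace sketch: the field layout follows the TP_printk format added by
this patch ("ruleset=%llx.%u domain=%llx parent=%llx"), but the struct
and function names are hypothetical and the sample line in the usage
note is made up.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields printed by landlock_restrict_self. */
struct restrict_self_event {
	unsigned long long ruleset_id;
	unsigned int ruleset_version;
	unsigned long long domain_id;
	unsigned long long parent_id;
};

/*
 * Parses the "ruleset=... domain=... parent=..." part of a trace line.
 * Returns 0 on success, or -1 if the line does not carry all four fields.
 */
int parse_restrict_self(const char *line, struct restrict_self_event *ev)
{
	const char *fields = strstr(line, "ruleset=");

	if (!fields)
		return -1;

	/* IDs are printed in hex without a prefix, the version in decimal. */
	if (sscanf(fields, "ruleset=%llx.%u domain=%llx parent=%llx",
		   &ev->ruleset_id, &ev->ruleset_version, &ev->domain_id,
		   &ev->parent_id) != 4)
		return -1;

	return 0;
}
```

With a line such as "landlock_restrict_self: ruleset=1a2b.3 domain=1a2c
parent=0", the parsed ruleset_id can be matched against earlier add_rule
events and the domain_id against later check_rule events.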

The TP_PROTO takes only the ruleset and domain pointers.  The ruleset
version and parent domain ID are computed in TP_fast_assign from these
pointers rather than passed as scalar arguments.  This lets eBPF
programs access the full ruleset and domain state via BTF on just two
pointers.  TP_fast_assign includes lockdep_assert_held(&ruleset->lock)
to enforce that the caller holds the ruleset lock during emission,
ensuring eBPF programs see a consistent ruleset->version via BTF.

Move the ruleset lock acquisition from landlock_merge_ruleset() to the
caller so the lock is held across the merge, TSYNC, and tracepoint
emission.  The tracepoint fires only after all fallible operations
(including TSYNC) have succeeded, so every event corresponds to a domain
that is actually installed.

The flags-only restrict_self path (ruleset_fd == -1) does not create a
domain and does not emit this event.  restrict_self flags that affect
logging (log_same_exec, log_new_exec) are accessible via BTF on
domain->hierarchy.

Add a landlock_free_domain tracepoint that fires when a domain's
hierarchy node is freed.  The hierarchy node is the lifecycle boundary
because it represents the domain's identity and outlives the domain's
access masks, which may still be active in descendant domains.

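Domain freeing can be asynchronous: when the last reference is dropped
from the credential free path, the teardown runs in a workqueue because
that path runs in RCU callback context, where the teardown chain's
sleeping operations (iput, audit_log_start, put_pid) are forbidden.  The
event can also fire synchronously from the calling task (via
landlock_put_domain).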

Cc: Günther Noack <gnoack@google.com>
Cc: Justin Suess <utilityemal77@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
- New patch.
---
 include/trace/events/landlock.h | 69 +++++++++++++++++++++++++++++++++
 security/landlock/domain.c      | 20 +++++-----
 security/landlock/log.c         |  5 +++
 security/landlock/syscalls.c    | 23 ++++++++++-
 4 files changed, 105 insertions(+), 12 deletions(-)

diff --git a/include/trace/events/landlock.h b/include/trace/events/landlock.h
index f1e96c447b97..533aea6152e1 100644
--- a/include/trace/events/landlock.h
+++ b/include/trace/events/landlock.h
@@ -12,6 +12,8 @@
 
 #include <linux/tracepoint.h>
 
+struct landlock_domain;
+struct landlock_hierarchy;
 struct landlock_ruleset;
 struct path;
 
@@ -165,6 +167,73 @@ TRACE_EVENT(landlock_add_rule_net,
 	    TP_printk("ruleset=%llx.%u access_rights=0x%x port=%llu",
 		      __entry->ruleset_id, __entry->ruleset_version,
 		      __entry->access_rights, __entry->port));
+
+/**
+ * landlock_restrict_self - new domain created from landlock_restrict_self()
+ * @ruleset: Source ruleset frozen into the domain (never NULL); caller
+ *           holds ruleset->lock for BTF consistency.  eBPF programs can
+ *           read the full ruleset state via BTF (rules, version, access
+ *           masks).
+ * @domain: Newly created domain (never NULL, immutable after creation).
+ *          eBPF programs can navigate domain->hierarchy->parent for the
+ *          parent domain chain.
+ *
+ * Emitted after the domain is successfully installed (including TSYNC
+ * if requested).  The flags-only restrict_self path (ruleset_fd == -1)
+ * does not create a domain and does not emit this event.  Restrict_self
+ * flags that affect logging (log_same_exec, log_new_exec) are accessible
+ * via BTF on domain->hierarchy.
+ */
+TRACE_EVENT(landlock_restrict_self,
+
+	    TP_PROTO(const struct landlock_ruleset *ruleset,
+		     const struct landlock_domain *domain),
+
+	    TP_ARGS(ruleset, domain),
+
+	    TP_STRUCT__entry(__field(__u64, ruleset_id)
+				     __field(__u32, ruleset_version)
+					     __field(__u64, domain_id)
+						     __field(__u64, parent_id)),
+
+	    TP_fast_assign(
+		    lockdep_assert_held(&ruleset->lock);
+		    __entry->ruleset_id = ruleset->id;
+		    __entry->ruleset_version = ruleset->version;
+		    __entry->domain_id = domain->hierarchy->id;
+		    __entry->parent_id = domain->hierarchy->parent ?
+						 domain->hierarchy->parent->id :
+						 0;),
+
+	    TP_printk("ruleset=%llx.%u domain=%llx parent=%llx",
+		      __entry->ruleset_id, __entry->ruleset_version,
+		      __entry->domain_id, __entry->parent_id));
+
+/**
+ * landlock_free_domain - domain freed
+ * @hierarchy: Hierarchy node being freed (never NULL); eBPF can read
+ *             hierarchy->details (creator identity), hierarchy->parent
+ *             (domain chain), and hierarchy->log_status via BTF
+ *
+ * Emitted when the domain's last reference is dropped, either
+ * asynchronously from a kworker (via landlock_put_domain_deferred) or
+ * synchronously from the calling task (via landlock_put_domain).
+ */
+TRACE_EVENT(landlock_free_domain,
+
+	    TP_PROTO(const struct landlock_hierarchy *hierarchy),
+
+	    TP_ARGS(hierarchy),
+
+	    TP_STRUCT__entry(__field(__u64, domain_id) __field(__u64, denials)),
+
+	    TP_fast_assign(
+		    __entry->domain_id = hierarchy->id;
+		    __entry->denials = atomic64_read(&hierarchy->num_denials);),
+
+	    TP_printk("domain=%llx denials=%llu", __entry->domain_id,
+		      __entry->denials));
+
 #endif /* _TRACE_LANDLOCK_H */
 
 /* This part must be outside protection */
diff --git a/security/landlock/domain.c b/security/landlock/domain.c
index 0dfd53ae9dd7..45ee7ec87957 100644
--- a/security/landlock/domain.c
+++ b/security/landlock/domain.c
@@ -294,31 +294,28 @@ static int merge_ruleset(struct landlock_domain *const dst,
 	if (WARN_ON_ONCE(!dst || !dst->hierarchy))
 		return -EINVAL;
 
-	mutex_lock(&src->lock);
+	lockdep_assert_held(&src->lock);
 
 	/* Stacks the new layer. */
-	if (WARN_ON_ONCE(dst->num_layers < 1)) {
-		err = -EINVAL;
-		goto out_unlock;
-	}
+	if (WARN_ON_ONCE(dst->num_layers < 1))
+		return -EINVAL;
+
 	dst->layers[dst->num_layers - 1] =
 		landlock_upgrade_handled_access_masks(src->layer);
 
 	/* Merges the @src inode tree. */
 	err = merge_tree(dst, src, LANDLOCK_KEY_INODE);
 	if (err)
-		goto out_unlock;
+		return err;
 
 #if IS_ENABLED(CONFIG_INET)
 	/* Merges the @src network port tree. */
 	err = merge_tree(dst, src, LANDLOCK_KEY_NET_PORT);
 	if (err)
-		goto out_unlock;
+		return err;
 #endif /* IS_ENABLED(CONFIG_INET) */
 
-out_unlock:
-	mutex_unlock(&src->lock);
-	return err;
+	return 0;
 }
 
 static int inherit_tree(struct landlock_domain *const parent,
@@ -399,6 +396,8 @@ static int inherit_ruleset(struct landlock_domain *const parent,
  * The current task is requesting to be restricted.  The subjective credentials
  * must not be in an overridden state. cf. landlock_init_hierarchy_log().
  *
+ * The caller must hold @ruleset->lock.
+ *
  * Return: A new domain merging @parent and @ruleset on success, or ERR_PTR() on
  * failure.  If @parent is NULL, the new domain duplicates @ruleset.
  */
@@ -411,6 +410,7 @@ landlock_merge_ruleset(struct landlock_domain *const parent,
 	int err;
 
 	might_sleep();
+	lockdep_assert_held(&ruleset->lock);
 	if (WARN_ON_ONCE(!ruleset))
 		return ERR_PTR(-EINVAL);
 
diff --git a/security/landlock/log.c b/security/landlock/log.c
index ef79e4ed0037..ab4f982f8184 100644
--- a/security/landlock/log.c
+++ b/security/landlock/log.c
@@ -174,9 +174,12 @@ static void audit_denial(const struct landlock_cred_security *const subject,
 
 #endif /* CONFIG_AUDIT */
 
+#include <trace/events/landlock.h>
+
 #ifdef CONFIG_TRACEPOINTS
 #define CREATE_TRACE_POINTS
 #include <trace/events/landlock.h>
+#undef CREATE_TRACE_POINTS
 #endif /* CONFIG_TRACEPOINTS */
 
 static struct landlock_hierarchy *
@@ -473,6 +476,8 @@ void landlock_log_free_domain(const struct landlock_hierarchy *const hierarchy)
 	if (WARN_ON_ONCE(!hierarchy))
 		return;
 
+	trace_landlock_free_domain(hierarchy);
+
 	if (!audit_enabled)
 		return;
 
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index b18e83e457c2..93999749d80e 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -491,6 +491,7 @@ SYSCALL_DEFINE2(landlock_restrict_self, const int, ruleset_fd, const __u32,
 		flags)
 {
 	struct landlock_ruleset *ruleset __free(landlock_put_ruleset) = NULL;
+	struct landlock_domain *new_dom = NULL;
 	struct cred *new_cred;
 	struct landlock_cred_security *new_llcred;
 	bool __maybe_unused log_same_exec, log_new_exec, log_subdomains,
@@ -558,10 +559,15 @@ SYSCALL_DEFINE2(landlock_restrict_self, const int, ruleset_fd, const __u32,
 		 * There is no possible race condition while copying and
 		 * manipulating the current credentials because they are
 		 * dedicated per thread.
+		 *
+		 * Holds @ruleset->lock across the merge and tracepoint
+		 * emission so that the tracepoint reads the exact
+		 * ruleset version frozen into the new domain.
 		 */
-		struct landlock_domain *const new_dom =
-			landlock_merge_ruleset(new_llcred->domain, ruleset);
+		mutex_lock(&ruleset->lock);
+		new_dom = landlock_merge_ruleset(new_llcred->domain, ruleset);
 		if (IS_ERR(new_dom)) {
+			mutex_unlock(&ruleset->lock);
 			abort_creds(new_cred);
 			return PTR_ERR(new_dom);
 		}
@@ -586,10 +592,23 @@ SYSCALL_DEFINE2(landlock_restrict_self, const int, ruleset_fd, const __u32,
 		const int err = landlock_restrict_sibling_threads(
 			current_cred(), new_cred);
 		if (err) {
+			if (ruleset)
+				mutex_unlock(&ruleset->lock);
 			abort_creds(new_cred);
 			return err;
 		}
 	}
 
+	/*
+	 * Emit after all fallible operations (including TSYNC) have
+	 * succeeded, so every event corresponds to an installed domain.
+	 * The ruleset lock is still held for BTF consistency (enforced
+	 * by lockdep_assert_held in TP_fast_assign).
+	 */
+	if (new_dom)
+		trace_landlock_restrict_self(ruleset, new_dom);
+
+	if (ruleset)
+		mutex_unlock(&ruleset->lock);
 	return commit_creds(new_cred);
 }
-- 
2.53.0



* [PATCH v2 12/17] landlock: Add tracepoints for ptrace and scope denials
From: Mickaël Salaün @ 2026-04-06 14:37 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Steven Rostedt
  Cc: Mickaël Salaün, Jann Horn, Jeff Xu, Justin Suess,
	Kees Cook, Masami Hiramatsu, Mathieu Desnoyers, Matthieu Buffet,
	Mikhail Ivanov, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-security-module, linux-trace-kernel
In-Reply-To: <20260406143717.1815792-1-mic@digikod.net>

Scope and ptrace denials follow a different code path (domain hierarchy
check) than access-right denials, requiring dedicated tracepoints with
type-specific TP_PROTO arguments.

Complete the tracepoint coverage for all Landlock denial types by adding
tracepoints for ptrace and scope-based denials:
- landlock_deny_ptrace: emitted when ptrace access is denied due to
  domain hierarchy mismatch.
- landlock_deny_scope_signal: emitted when signal delivery is denied by
  LANDLOCK_SCOPE_SIGNAL.
- landlock_deny_scope_abstract_unix_socket: emitted when abstract unix
  socket access is denied by LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET.

TP_PROTO passes the raw kernel object (struct task_struct or struct
sock) for eBPF BTF access.  String fields (comm, sun_path) use
__print_untrusted_str() because they contain untrusted input.
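
As an illustration of the sun_path handling, here is a minimal user-space
sketch of how an abstract socket name and its length can be derived from
an address and its length.  The helper names (abstract_name_len,
abstract_name) are hypothetical and not part of this patch, and the
underflow guard is an addition of the sketch; the tracepoint itself reads
unix_sk(peer)->addr in the kernel.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: length of an abstract socket name, excluding the
 * sa_family header and the leading NUL byte that marks the name as
 * abstract.  Returns 0 when the address is too short to carry a name
 * (this guard is an assumption of the sketch).
 */
static size_t abstract_name_len(socklen_t addr_len)
{
	const size_t header = offsetof(struct sockaddr_un, sun_path) + 1;

	return addr_len > header ? addr_len - header : 0;
}

/*
 * Hypothetical helper: start of the abstract name, just past the
 * leading NUL.  The result is not NUL-terminated, so it must always be
 * used together with abstract_name_len().
 */
static const char *abstract_name(const struct sockaddr_un *addr)
{
	return addr->sun_path + 1;
}
```

Because the name is length-delimited binary data, a consumer should
treat it as untrusted bytes rather than as a C string.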

Unlike deny_access_fs and deny_access_net which include a blockers field
showing which specific access rights were denied, these events omit
blockers because each event corresponds to exactly one denial type
identified by the event name itself (e.g., landlock_deny_ptrace can only
mean a ptrace denial).  A blockers field would always be zero since
scope and ptrace denials do not use access-right bitmasks.

Audit records use generic field names (opid, ocomm) for the target
process, while tracepoints use role-specific names (tracee_pid,
target_pid, peer_pid).  The tracepoint naming is more descriptive
because trace events are strongly typed and tied to the semantics of each
event, while the audit log format is generic.

Cc: Günther Noack <gnoack@google.com>
Cc: Justin Suess <utilityemal77@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
- New patch.
---
 include/trace/events/landlock.h | 135 ++++++++++++++++++++++++++++++++
 security/landlock/log.c         |  20 +++++
 2 files changed, 155 insertions(+)

diff --git a/include/trace/events/landlock.h b/include/trace/events/landlock.h
index 1afab091efba..9f96c9897f44 100644
--- a/include/trace/events/landlock.h
+++ b/include/trace/events/landlock.h
@@ -11,6 +11,7 @@
 #define _TRACE_LANDLOCK_H
 
 #include <linux/tracepoint.h>
+#include <net/af_unix.h>
 
 struct dentry;
 struct landlock_domain;
@@ -19,6 +20,7 @@ struct landlock_rule;
 struct landlock_ruleset;
 struct path;
 struct sock;
+struct task_struct;
 
 /**
  * DOC: Landlock trace events
@@ -433,6 +435,139 @@ TRACE_EVENT(
 		__entry->log_new_exec, __entry->blockers, __entry->sport,
 		__entry->dport));
 
+/**
+ * landlock_deny_ptrace - ptrace access denied
+ * @hierarchy: Hierarchy node that blocked the access (never NULL)
+ * @same_exec: Whether the current task is the same executable that called
+ *             landlock_restrict_self() for the denying hierarchy node
+ * @tracee: Target task (never NULL); eBPF can read pid, comm, cred,
+ *          namespaces, and cgroup via BTF
+ */
+TRACE_EVENT(
+	landlock_deny_ptrace,
+
+	TP_PROTO(const struct landlock_hierarchy *hierarchy, bool same_exec,
+		 const struct task_struct *tracee),
+
+	TP_ARGS(hierarchy, same_exec, tracee),
+
+	TP_STRUCT__entry(
+		__field(__u64, domain_id) __field(bool, same_exec)
+			__field(u32, log_same_exec) __field(u32, log_new_exec)
+				__field(pid_t, tracee_pid)
+					__string(tracee_comm, tracee->comm)),
+
+	TP_fast_assign(__entry->domain_id = hierarchy->id;
+		       __entry->same_exec = same_exec;
+		       __entry->log_same_exec = hierarchy->log_same_exec;
+		       __entry->log_new_exec = hierarchy->log_new_exec;
+		       __entry->tracee_pid =
+			       task_tgid_nr((struct task_struct *)tracee);
+		       __assign_str(tracee_comm);),
+
+	TP_printk(
+		"domain=%llx same_exec=%d log_same_exec=%u log_new_exec=%u tracee_pid=%d comm=%s",
+		__entry->domain_id, __entry->same_exec, __entry->log_same_exec,
+		__entry->log_new_exec, __entry->tracee_pid,
+		__print_untrusted_str(tracee_comm)));
+
+/**
+ * landlock_deny_scope_signal - signal delivery denied by
+ *                               LANDLOCK_SCOPE_SIGNAL
+ * @hierarchy: Hierarchy node that blocked the access (never NULL)
+ * @same_exec: Whether the current task is the same executable that called
+ *             landlock_restrict_self() for the denying hierarchy node
+ * @target: Signal target task (never NULL); eBPF can read pid, comm, cred,
+ *          namespaces, and cgroup via BTF
+ */
+TRACE_EVENT(
+	landlock_deny_scope_signal,
+
+	TP_PROTO(const struct landlock_hierarchy *hierarchy, bool same_exec,
+		 const struct task_struct *target),
+
+	TP_ARGS(hierarchy, same_exec, target),
+
+	TP_STRUCT__entry(
+		__field(__u64, domain_id) __field(bool, same_exec)
+			__field(u32, log_same_exec) __field(u32, log_new_exec)
+				__field(pid_t, target_pid)
+					__string(target_comm, target->comm)),
+
+	TP_fast_assign(__entry->domain_id = hierarchy->id;
+		       __entry->same_exec = same_exec;
+		       __entry->log_same_exec = hierarchy->log_same_exec;
+		       __entry->log_new_exec = hierarchy->log_new_exec;
+		       __entry->target_pid =
+			       task_tgid_nr((struct task_struct *)target);
+		       __assign_str(target_comm);),
+
+	TP_printk(
+		"domain=%llx same_exec=%d log_same_exec=%u log_new_exec=%u target_pid=%d comm=%s",
+		__entry->domain_id, __entry->same_exec, __entry->log_same_exec,
+		__entry->log_new_exec, __entry->target_pid,
+		__print_untrusted_str(target_comm)));
+
+/**
+ * landlock_deny_scope_abstract_unix_socket - abstract unix socket access
+ *     denied by LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET
+ * @hierarchy: Hierarchy node that blocked the access (never NULL)
+ * @same_exec: Whether the current task is the same executable that called
+ *             landlock_restrict_self() for the denying hierarchy node
+ * @peer: Peer socket (never NULL); eBPF can read sk_peer_pid,
+ *        sk_peer_cred, socket type, and protocol via BTF
+ */
+TRACE_EVENT(
+	landlock_deny_scope_abstract_unix_socket,
+
+	TP_PROTO(const struct landlock_hierarchy *hierarchy, bool same_exec,
+		 const struct sock *peer),
+
+	TP_ARGS(hierarchy, same_exec, peer),
+
+	TP_STRUCT__entry(
+		__field(__u64, domain_id) __field(bool, same_exec)
+			__field(u32, log_same_exec) __field(u32, log_new_exec)
+				__field(pid_t, peer_pid)
+		/*
+		 * Abstract socket names are untrusted binary data from
+		 * user space.  Use __string_len because abstract names
+		 * are not NUL-terminated; their length is determined by
+		 * addr->len.
+		 */
+		__string_len(sun_path,
+			     unix_sk(peer)->addr ?
+				     unix_sk(peer)->addr->name->sun_path + 1 :
+				     "",
+			     unix_sk(peer)->addr ?
+				     unix_sk(peer)->addr->len -
+					     offsetof(struct sockaddr_un,
+						      sun_path) -
+					     1 :
+				     0)),
+
+	TP_fast_assign(struct pid *peer_pid;
+
+		       __entry->domain_id = hierarchy->id;
+		       __entry->same_exec = same_exec;
+		       __entry->log_same_exec = hierarchy->log_same_exec;
+		       __entry->log_new_exec = hierarchy->log_new_exec;
+		       /*
+			* READ_ONCE prevents compiler double-read.  The value
+			* is stable because unix_state_lock(peer) is held by
+			* the caller (hook_unix_stream_connect or
+			* hook_unix_may_send).
+			*/
+		       peer_pid = READ_ONCE(peer->sk_peer_pid);
+		       __entry->peer_pid = peer_pid ? pid_nr(peer_pid) : 0;
+		       __assign_str(sun_path);),
+
+	TP_printk(
+		"domain=%llx same_exec=%d log_same_exec=%u log_new_exec=%u peer_pid=%d sun_path=%s",
+		__entry->domain_id, __entry->same_exec, __entry->log_same_exec,
+		__entry->log_new_exec, __entry->peer_pid,
+		__print_untrusted_str(sun_path)));
+
 #endif /* _TRACE_LANDLOCK_H */
 
 /* This part must be outside protection */
diff --git a/security/landlock/log.c b/security/landlock/log.c
index c81cb7c1c448..a2f61aed81ff 100644
--- a/security/landlock/log.c
+++ b/security/landlock/log.c
@@ -11,6 +11,9 @@
 #include <linux/bitops.h>
 #include <linux/lsm_audit.h>
 #include <linux/pid.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <net/sock.h>
 #include <uapi/linux/landlock.h>
 
 #include "access.h"
@@ -259,6 +262,23 @@ static void trace_denial(const struct landlock_cred_security *const subject,
 				ntohs(request->audit.u.net->sport),
 				ntohs(request->audit.u.net->dport));
 		break;
+	case LANDLOCK_REQUEST_PTRACE:
+		if (trace_landlock_deny_ptrace_enabled())
+			trace_landlock_deny_ptrace(youngest_denied, same_exec,
+						   request->audit.u.tsk);
+		break;
+	case LANDLOCK_REQUEST_SCOPE_SIGNAL:
+		if (trace_landlock_deny_scope_signal_enabled())
+			trace_landlock_deny_scope_signal(youngest_denied,
+							 same_exec,
+							 request->audit.u.tsk);
+		break;
+	case LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET:
+		if (trace_landlock_deny_scope_abstract_unix_socket_enabled())
+			trace_landlock_deny_scope_abstract_unix_socket(
+				youngest_denied, same_exec,
+				request->audit.u.net->sk);
+		break;
 	default:
 		break;
 	}
-- 
2.53.0



* [PATCH v2 11/17] landlock: Add landlock_deny_access_fs and landlock_deny_access_net
From: Mickaël Salaün @ 2026-04-06 14:37 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Steven Rostedt
  Cc: Mickaël Salaün, Jann Horn, Jeff Xu, Justin Suess,
	Kees Cook, Masami Hiramatsu, Mathieu Desnoyers, Matthieu Buffet,
	Mikhail Ivanov, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-security-module, linux-trace-kernel
In-Reply-To: <20260406143717.1815792-1-mic@digikod.net>

Add per-type tracepoints emitted from landlock_log_denial() when an
access is denied: landlock_deny_access_fs for filesystem denials and
landlock_deny_access_net for network denials.

The events use the "deny_" prefix (rather than "check_") to make clear
that they fire only on denial, not on every access check.

These complement the check_rule tracepoints by showing the final denial
verdict, including the denial-by-absence case (when no rule matches
along the pathwalk, no check_rule events fire, but the deny_access event
makes the denial explicit).

Trace events fire unconditionally, independent of audit configuration
and user-specified log flags (LANDLOCK_LOG_DISABLED).  The user's
"disable logging" intent applies to audit records, not to kernel
tracing.  The LANDLOCK_LOG_DISABLED check is moved into the
audit-specific path; num_denials and trace emission execute regardless.

The deny_access events pass the denying hierarchy node (const struct
landlock_hierarchy *hierarchy) in TP_PROTO, not the task's current
domain.  The domain_id entry field shows the ID of the specific
hierarchy node that blocked the access, matching audit record semantics.
This differs from check_rule events which pass the task's current domain
(needed for the dynamic per-layer array sizing).

The same_exec field is passed in TP_PROTO because it is computed from
the credential bitmask, not derivable from the hierarchy pointer alone.
The events include same_exec, log_same_exec, and log_new_exec fields for
stateless ftrace filtering that replicates audit's suppression logic.
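
The following user-space sketch models how a consumer might combine the
three fields to decide whether audit would also have logged a given
denial.  The struct and function names are hypothetical stand-ins for the
event fields, not kernel definitions, and the sketch covers only the
same_exec / log_* selection (it does not model LANDLOCK_LOG_DISABLED or
audit_enabled).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the relevant event fields. */
struct deny_event {
	bool same_exec;
	uint32_t log_same_exec;
	uint32_t log_new_exec;
};

/*
 * Model of how same_exec is derived: one bit per layer in a bitmask,
 * tested for the layer that denied the access.
 */
static bool layer_same_exec(uint16_t domain_exec, unsigned int layer)
{
	return !!(domain_exec & (1U << layer));
}

/*
 * A denial from the same executable is logged only when log_same_exec
 * is set; a denial after a new execution only when log_new_exec is set.
 */
static bool audit_would_log(const struct deny_event *ev)
{
	return ev->same_exec ? ev->log_same_exec != 0 :
			       ev->log_new_exec != 0;
}
```

With ftrace, the same selection could presumably be expressed as an event
filter along the lines of
"(same_exec == 1 && log_same_exec == 1) || (same_exec == 0 && log_new_exec == 1)".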

The denial field is named "blockers" (matching the audit record field)
rather than "blocked", to enable consistent field-name correlation
between audit and trace output.

Network denial sport and dport fields use __u64 host-endian, matching
the landlock_net_port_attr.port UAPI convention.  The caller converts
from the lsm_network_audit __be16 fields via ntohs() before emitting
the event.
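
A minimal user-space sketch of that conversion, assuming a 16-bit port in
network byte order as input; the function name is hypothetical:

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Convert a port stored in network byte order to the host-endian
 * 64-bit representation used by the event fields.
 */
static uint64_t port_to_event_field(uint16_t port_be)
{
	return (uint64_t)ntohs(port_be);
}
```

Widening after the byte swap keeps the value within 0..65535, so a
consumer can compare it directly against a landlock_net_port_attr.port
value.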

The filesystem path is resolved via d_absolute_path() (the same helper
used by landlock_add_rule_fs), producing namespace-independent absolute
paths.  Audit uses d_path() which resolves relative to the process's
chroot; the difference is documented but acceptable for tracepoints
which are designed for deterministic output regardless of the tracer's
namespace state.  Device numbers use numeric major:minor format (unlike
audit's string s_id) for machine parseability.
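
For reference, a user-space sketch of how the in-kernel dev_t splits into
the major:minor pair printed by the event.  It assumes the kernel's
internal encoding (20 minor bits, as in include/linux/kdev_t.h), which
differs from the user-space dev_t encoding; the names below are
hypothetical.

```c
#include <stdint.h>

/* Assumption: in-kernel dev_t layout with 20 bits for the minor. */
#define KDEV_MINORBITS 20
#define KDEV_MINORMASK ((1U << KDEV_MINORBITS) - 1)

/* Major number: the bits above the minor field. */
static unsigned int kdev_major(uint32_t dev)
{
	return dev >> KDEV_MINORBITS;
}

/* Minor number: the low KDEV_MINORBITS bits. */
static unsigned int kdev_minor(uint32_t dev)
{
	return dev & KDEV_MINORMASK;
}
```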

For FS_CHANGE_TOPOLOGY hooks that provide only a dentry, the path is
resolved via dentry_path_raw() instead of d_absolute_path().

The denial tracepoint allocates PATH_MAX bytes from the heap via
__getname() for path resolution.  This cost is only paid when a tracer
is attached.

Cc: Günther Noack <gnoack@google.com>
Cc: Justin Suess <utilityemal77@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
- New patch.
---
 include/trace/events/landlock.h | 100 +++++++++++++++++++++
 security/landlock/log.c         | 149 ++++++++++++++++++++++++++------
 security/landlock/log.h         |   9 +-
 3 files changed, 227 insertions(+), 31 deletions(-)

diff --git a/include/trace/events/landlock.h b/include/trace/events/landlock.h
index e7bb8fa802bf..1afab091efba 100644
--- a/include/trace/events/landlock.h
+++ b/include/trace/events/landlock.h
@@ -18,6 +18,7 @@ struct landlock_hierarchy;
 struct landlock_rule;
 struct landlock_ruleset;
 struct path;
+struct sock;
 
 /**
  * DOC: Landlock trace events
@@ -50,6 +51,15 @@ struct path;
  * Network port fields use __u64 in host endianness, matching the
  * landlock_net_port_attr.port UAPI convention.  Callers convert from
  * network byte order before emitting the event.
+ *
+ * Field ordering convention for denial events: domain ID, same_exec,
+ * log_same_exec, log_new_exec, then blockers (deny_access events only),
+ * then type-specific object identification fields, then variable-length
+ * fields.
+ *
+ * The deny_access denial events include same_exec and log_same_exec /
+ * log_new_exec fields so that both stateless (ftrace filter) and stateful
+ * (eBPF) consumers can replicate the audit subsystem's filtering logic.
  */
 
 /**
@@ -333,6 +343,96 @@ TRACE_EVENT(landlock_check_rule_net,
 		      __entry->port,
 		      __print_dynamic_array(layers, sizeof(access_mask_t))));
 
+/**
+ * landlock_deny_access_fs - filesystem access denied
+ * @hierarchy: Hierarchy node that blocked the access (never NULL).
+ *             Identifies the specific domain in the hierarchy whose
+ *             rules caused the denial.  eBPF can read hierarchy->id,
+ *             hierarchy->log_same_exec, hierarchy->log_new_exec, and
+ *             walk hierarchy->parent for the domain chain.
+ * @same_exec: Whether the current task is the same executable that
+ *             called landlock_restrict_self() for the denying hierarchy
+ *             node.  Computed from the credential bitmask, not derivable
+ *             from the hierarchy alone.
+ * @blockers: Access mask that was blocked
+ * @path: Filesystem path that was denied (never NULL)
+ * @pathname: Resolved absolute path string (never NULL)
+ */
+TRACE_EVENT(
+	landlock_deny_access_fs,
+
+	TP_PROTO(const struct landlock_hierarchy *hierarchy, bool same_exec,
+		 access_mask_t blockers, const struct path *path,
+		 const char *pathname),
+
+	TP_ARGS(hierarchy, same_exec, blockers, path, pathname),
+
+	TP_STRUCT__entry(
+		__field(__u64, domain_id) __field(bool, same_exec)
+			__field(u32, log_same_exec) __field(u32, log_new_exec)
+				__field(access_mask_t, blockers)
+					__field(dev_t, dev) __field(ino_t, ino)
+						__string(pathname, pathname)),
+
+	TP_fast_assign(__entry->domain_id = hierarchy->id;
+		       __entry->same_exec = same_exec;
+		       __entry->log_same_exec = hierarchy->log_same_exec;
+		       __entry->log_new_exec = hierarchy->log_new_exec;
+		       __entry->blockers = blockers;
+		       __entry->dev = path->dentry->d_sb->s_dev;
+		       __entry->ino = d_backing_inode(path->dentry)->i_ino;
+		       __assign_str(pathname);),
+
+	TP_printk(
+		"domain=%llx same_exec=%d log_same_exec=%u log_new_exec=%u blockers=0x%x dev=%u:%u ino=%lu path=%s",
+		__entry->domain_id, __entry->same_exec, __entry->log_same_exec,
+		__entry->log_new_exec, __entry->blockers, MAJOR(__entry->dev),
+		MINOR(__entry->dev), __entry->ino,
+		__print_untrusted_str(pathname)));
+
+/**
+ * landlock_deny_access_net - network access denied
+ * @hierarchy: Hierarchy node that blocked the access (never NULL)
+ * @same_exec: Whether the current task is the same executable that
+ *             called landlock_restrict_self() for the denying hierarchy
+ *             node
+ * @blockers: Access mask that was blocked
+ * @sk: Socket object (never NULL); eBPF can read socket family, state,
+ *      local/remote addresses, and options via BTF
+ * @sport: Source port in host endianness (non-zero for bind denials,
+ *         zero for connect denials)
+ * @dport: Destination port in host endianness (non-zero for connect
+ *         denials, zero for bind denials)
+ */
+TRACE_EVENT(
+	landlock_deny_access_net,
+
+	TP_PROTO(const struct landlock_hierarchy *hierarchy, bool same_exec,
+		 access_mask_t blockers, const struct sock *sk, __u64 sport,
+		 __u64 dport),
+
+	TP_ARGS(hierarchy, same_exec, blockers, sk, sport, dport),
+
+	TP_STRUCT__entry(
+		__field(__u64, domain_id) __field(bool, same_exec)
+			__field(u32, log_same_exec) __field(u32, log_new_exec)
+				__field(access_mask_t, blockers)
+					__field(__u64, sport)
+						__field(__u64, dport)),
+
+	TP_fast_assign(__entry->domain_id = hierarchy->id;
+		       __entry->same_exec = same_exec;
+		       __entry->log_same_exec = hierarchy->log_same_exec;
+		       __entry->log_new_exec = hierarchy->log_new_exec;
+		       __entry->blockers = blockers; __entry->sport = sport;
+		       __entry->dport = dport;),
+
+	TP_printk(
+		"domain=%llx same_exec=%d log_same_exec=%u log_new_exec=%u blockers=0x%x sport=%llu dport=%llu",
+		__entry->domain_id, __entry->same_exec, __entry->log_same_exec,
+		__entry->log_new_exec, __entry->blockers, __entry->sport,
+		__entry->dport));
+
 #endif /* _TRACE_LANDLOCK_H */
 
 /* This part must be outside protection */
diff --git a/security/landlock/log.c b/security/landlock/log.c
index ab4f982f8184..c81cb7c1c448 100644
--- a/security/landlock/log.c
+++ b/security/landlock/log.c
@@ -3,6 +3,7 @@
  * Landlock - Log helpers
  *
  * Copyright © 2023-2025 Microsoft Corporation
+ * Copyright © 2026 Cloudflare
  */
 
 #include <kunit/test.h>
@@ -143,6 +144,9 @@ static void audit_denial(const struct landlock_cred_security *const subject,
 {
 	struct audit_buffer *ab;
 
+	if (READ_ONCE(youngest_denied->log_status) == LANDLOCK_LOG_DISABLED)
+		return;
+
 	if (!audit_enabled)
 		return;
 
@@ -172,6 +176,16 @@ static void audit_denial(const struct landlock_cred_security *const subject,
 	log_domain(youngest_denied);
 }
 
+#else /* CONFIG_AUDIT */
+
+static inline void
+audit_denial(const struct landlock_cred_security *const subject,
+	     const struct landlock_request *const request,
+	     struct landlock_hierarchy *const youngest_denied,
+	     const size_t youngest_layer, const access_mask_t missing)
+{
+}
+
 #endif /* CONFIG_AUDIT */
 
 #include <trace/events/landlock.h>
@@ -180,6 +194,86 @@ static void audit_denial(const struct landlock_cred_security *const subject,
 #define CREATE_TRACE_POINTS
 #include <trace/events/landlock.h>
 #undef CREATE_TRACE_POINTS
+
+#include "fs.h"
+
+static void trace_denial(const struct landlock_cred_security *const subject,
+			 const struct landlock_request *const request,
+			 const struct landlock_hierarchy *const youngest_denied,
+			 const size_t youngest_layer,
+			 const access_mask_t missing)
+{
+	const bool same_exec = !!(subject->domain_exec & BIT(youngest_layer));
+
+	switch (request->type) {
+	case LANDLOCK_REQUEST_FS_ACCESS:
+	case LANDLOCK_REQUEST_FS_CHANGE_TOPOLOGY:
+		if (trace_landlock_deny_access_fs_enabled()) {
+			char *buf __free(__putname) = __getname();
+			const char *pathname;
+			const struct path *path;
+
+			/*
+			 * FS_CHANGE_TOPOLOGY uses either LSM_AUDIT_DATA_PATH or
+			 * LSM_AUDIT_DATA_DENTRY depending on the hook.  For the
+			 * dentry case, build a path on the stack with the real
+			 * dentry so TP_fast_assign can extract dev and ino.
+			 * The mnt field is unused by TP_fast_assign.
+			 */
+			if (request->audit.type == LSM_AUDIT_DATA_DENTRY) {
+				struct path dentry_path = {
+					.dentry = request->audit.u.dentry,
+				};
+
+				path = &dentry_path;
+				pathname =
+					buf ? dentry_path_raw(
+						      request->audit.u.dentry,
+						      buf, PATH_MAX) :
+					      "<no_mem>";
+				if (IS_ERR(pathname))
+					pathname = "<unreachable>";
+
+				trace_landlock_deny_access_fs(youngest_denied,
+							      same_exec,
+							      missing, path,
+							      pathname);
+			} else {
+				path = &request->audit.u.path;
+				pathname = buf ? resolve_path_for_trace(path,
+									buf) :
+						 "<no_mem>";
+
+				trace_landlock_deny_access_fs(youngest_denied,
+							      same_exec,
+							      missing, path,
+							      pathname);
+			}
+		}
+		break;
+	case LANDLOCK_REQUEST_NET_ACCESS:
+		if (trace_landlock_deny_access_net_enabled())
+			trace_landlock_deny_access_net(
+				youngest_denied, same_exec, missing,
+				request->audit.u.net->sk,
+				ntohs(request->audit.u.net->sport),
+				ntohs(request->audit.u.net->dport));
+		break;
+	default:
+		break;
+	}
+}
+
+#else /* CONFIG_TRACEPOINTS */
+
+static inline void
+trace_denial(const struct landlock_cred_security *const subject,
+	     const struct landlock_request *const request,
+	     const struct landlock_hierarchy *const youngest_denied,
+	     const size_t youngest_layer, const access_mask_t missing)
+{
+}
+
 #endif /* CONFIG_TRACEPOINTS */
 
 static struct landlock_hierarchy *
@@ -439,9 +533,6 @@ void landlock_log_denial(const struct landlock_cred_security *const subject,
 			get_hierarchy(subject->domain, youngest_layer);
 	}
 
-	if (READ_ONCE(youngest_denied->log_status) == LANDLOCK_LOG_DISABLED)
-		return;
-
 	/*
 	 * Consistently keeps track of the number of denied access requests
 	 * even if audit is currently disabled, or if audit rules currently
@@ -450,45 +541,25 @@ void landlock_log_denial(const struct landlock_cred_security *const subject,
 	 */
 	atomic64_inc(&youngest_denied->num_denials);
 
-#ifdef CONFIG_AUDIT
+	trace_denial(subject, request, youngest_denied, youngest_layer,
+		     missing);
 	audit_denial(subject, request, youngest_denied, youngest_layer,
 		     missing);
-#endif /* CONFIG_AUDIT */
 }
 
 #ifdef CONFIG_AUDIT
 
-/**
- * landlock_log_free_domain - Create an audit record on domain deallocation
- *
- * @hierarchy: The domain's hierarchy being deallocated.
- *
- * Only domains which previously appeared in the audit logs are logged again.
- * This is useful to know when a domain will never show again in the audit log.
- *
- * Called in a work queue scheduled by landlock_put_domain_deferred() called by
- * hook_cred_free().
- */
-void landlock_log_free_domain(const struct landlock_hierarchy *const hierarchy)
+static void audit_drop_domain(const struct landlock_hierarchy *const hierarchy)
 {
 	struct audit_buffer *ab;
 
-	if (WARN_ON_ONCE(!hierarchy))
-		return;
-
-	trace_landlock_free_domain(hierarchy);
-
 	if (!audit_enabled)
 		return;
 
-	/* Ignores domains that were not logged.  */
+	/* Ignores domains that were not logged. */
 	if (READ_ONCE(hierarchy->log_status) != LANDLOCK_LOG_RECORDED)
 		return;
 
-	/*
-	 * If logging of domain allocation succeeded, warns about failure to log
-	 * domain deallocation to highlight unbalanced domain lifetime logs.
-	 */
 	ab = audit_log_start(audit_context(), GFP_KERNEL,
 			     AUDIT_LANDLOCK_DOMAIN);
 	if (!ab)
@@ -499,8 +570,32 @@ void landlock_log_free_domain(const struct landlock_hierarchy *const hierarchy)
 	audit_log_end(ab);
 }
 
+#else /* CONFIG_AUDIT */
+
+static inline void
+audit_drop_domain(const struct landlock_hierarchy *const hierarchy)
+{
+}
+
 #endif /* CONFIG_AUDIT */
 
+/**
+ * landlock_log_free_domain - Log domain deallocation
+ *
+ * @hierarchy: The domain's hierarchy being deallocated.
+ *
+ * Called from landlock_put_domain_deferred() (via a work queue scheduled by
+ * hook_cred_free()) or directly from landlock_put_domain().
+ */
+void landlock_log_free_domain(const struct landlock_hierarchy *const hierarchy)
+{
+	if (WARN_ON_ONCE(!hierarchy))
+		return;
+
+	trace_landlock_free_domain(hierarchy);
+	audit_drop_domain(hierarchy);
+}
+
 #ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
 
 static struct kunit_case test_cases[] = {
diff --git a/security/landlock/log.h b/security/landlock/log.h
index 4370fff86e45..5615a776c29a 100644
--- a/security/landlock/log.h
+++ b/security/landlock/log.h
@@ -3,6 +3,7 @@
  * Landlock - Log helpers
  *
  * Copyright © 2023-2025 Microsoft Corporation
+ * Copyright © 2026 Cloudflare
  */
 
 #ifndef _SECURITY_LANDLOCK_LOG_H
@@ -28,7 +29,7 @@ enum landlock_request_type {
 /*
  * We should be careful to only use a variable of this type for
  * landlock_log_denial().  This way, the compiler can remove it entirely if
- * CONFIG_AUDIT is not set.
+ * CONFIG_SECURITY_LANDLOCK_LOG is not set.
  */
 struct landlock_request {
 	/* Mandatory fields. */
@@ -52,14 +53,14 @@ struct landlock_request {
 	deny_masks_t deny_masks;
 };
 
-#ifdef CONFIG_AUDIT
+#ifdef CONFIG_SECURITY_LANDLOCK_LOG
 
 void landlock_log_free_domain(const struct landlock_hierarchy *const hierarchy);
 
 void landlock_log_denial(const struct landlock_cred_security *const subject,
 			 const struct landlock_request *const request);
 
-#else /* CONFIG_AUDIT */
+#else /* CONFIG_SECURITY_LANDLOCK_LOG */
 
 static inline void
 landlock_log_free_domain(const struct landlock_hierarchy *const hierarchy)
@@ -72,6 +73,6 @@ landlock_log_denial(const struct landlock_cred_security *const subject,
 {
 }
 
-#endif /* CONFIG_AUDIT */
+#endif /* CONFIG_SECURITY_LANDLOCK_LOG */
 
 #endif /* _SECURITY_LANDLOCK_LOG_H */
-- 
2.53.0

