[RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests

* [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
@ 2025-03-04  1:12 Tingmao Wang
  2025-03-04  1:12 ` [RFC PATCH 1/9] Define the supervisor and event structure Tingmao Wang
                   ` (10 more replies)
  0 siblings, 11 replies; 47+ messages in thread
From: Tingmao Wang @ 2025-03-04  1:12 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack, Jan Kara
  Cc: Tingmao Wang, linux-security-module, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen

Landlock supervise: a mechanism for interactive permission requests

Hi,

I would like to propose an extension to Landlock to support a "supervisor"
mode, which would enable a user program to sandbox applications (or
itself) in a dynamic, fine-grained, and potentially temporary way.
Practically, this makes it easy to give maximal control to the user,
perhaps in the form of a "just in time" permission prompt.  Read on, or
check the sandboxer program in the last patch for a "demo".

To Jan Kara and other fanotify reviewers, I've included you in this patch
as Mickaël suggested that we could potentially extend and re-use the
fanotify uapi and code instead of creating an entirely new representation
for permission requests and mechanism for passing it (as this patch
currently does).  I've not really thought out how that would work (there
will probably have to be some extension of the fanotify-fd uapi since
landlock handles more than FS access), but I think it is a promising idea,
hence I would like to hear your thoughts if you could spare a moment to
look at this.  A good outcome could also be that we add the necessary
hooks so that both this and fanotify (but really fsnotify?) can have _perm
events for create/delete/rename etc.

FS mailing list - I've CC'd this patchset to you too - even though the
patch doesn't currently touch any FS code, this is very FS related, and
also, in order to address an inode lock related problem which I will
mention in patch 6 of this series, future versions of this patch will
likely need to add a few more LSM hooks.  Especially for that part, but
also other bits of this project, a pair of eyes from the FS community
would be very helpful.

To Tycho Andersen -- I'm CC'ing you as you've worked on the seccomp-unotify
feature which is also quite related, so if you could spare some time for a
quick review, or provide some suggestions, that would be very appreciated
:)

I'm submitting this series as a non-production-ready, proof-of-concept
RFC, and I would appreciate feedback on any aspects of the design or
implementation.  Note that due to the PoC nature of this, I have not
handled checkpatch.pl errors etc.  I also welcome suggestions for
alternative names for this feature (e.g. landlock-unotify?
landlock-perm?).  At this point I'm very keen to hear some initial
feedback from the community before investing further into polishing this
patch.

(I've briefly pitched the overall idea to Mickaël, but he has not reviewed
the patch yet)

Why extend landlock?
--------------------

While this feature could be implemented as its own LSM, I feel like it is
a natural extension to landlock -- landlock has already defined a set of
fine-grained access requests with the intention to add more (and not just
for FS alone), is designed to be an unprivileged, stackable,
process-scoped, ad-hoc mechanism with no persistent state, which works
well as a generic API to support a dynamic sandbox, and landlock is
already doing the path traversal work to evaluate hierarchical filesystem
rules, which would also be useful for a performant dynamic sandbox
implementation.

Use cases
---------

I have several potential use cases in mind that will benefit from
landlock-supervise, for example:

1. A patch to firejail (I have not discussed this with the firejail
maintainers yet - I wanted to see the reception of this kernel patch first)
which can leverage landlock in a highly flexible way, prompting the user
for permission to access "extra" files after the sandbox has started
(without e.g. having to restart a very stateful GUI program).

This way of using landlock can potentially replace firejail's current
approach of using bind mounts (as it will allow implementing "blacklists"),
allowing unprivileged sandbox creation (although I need to check with the
firejail maintainers whether other factors prevent this).  This also allows
editing profiles "live" in a highly interactive way (i.e. the user can
choose "allow and remember" on a permission request, which will also add
the newly allowed path to a local firejail profile, all automatically).

2. A "protected" mode for common development environments (e.g. VSCode or
a terminal can be launched "protected") that doesn't compromise on
ease-of-use.  File access to $PWD at launch can be allowed, and access to
other places can be allowed ad-hoc by the developer with hopefully one UI
click.  Since landlock can also be used to restrict network access, such a
protected mode can also restrict outgoing connections by default (but ask
the user if they allow it for all or certain processes, on the first
attempt to connect).

Recently there have been incidents of secret-stealing malware targeting
developers (on Linux) by social engineering them to open and build/run a
project. [1]  The hope is that landlock-supervise can drive adoption of
sandboxes for developers and others by making them more user-friendly.

In addition to the above, I also hope that this would help with landlock
adoption even in non-interaction-heavy scenarios, by allowing application
developers the choice to gracefully recover from over-restrictive rulesets
and collect failure metrics, until they are confident that actually
blocking non-allowed accesses would not break their application or degrade
the user experience.

I have more exploration to do regarding applying this to applications, but
I do have a working proof of concept already (implemented as an
enhancement to the sandboxer example). Here is a shortened output:

    bash # env LL_FS_RO=/usr:/lib:/bin:/etc:/dev:/proc LL_FS_RW= LL_SUPERVISE=1 ./sandboxer bash -i
    bash # echo "Hi, $(whoami)!"
    Hi, root!
    bash # ls /
    ------------- Sandboxer access request -------------
    Process ls[166] (/usr/bin/ls) wants to read
      /
    (y)es/(a)lways/(n)o > y
    ----------------------------------------------------
    bin
    boot
    dev
    ...
    usr
    var
    bash # echo 'evil' >> /etc/profile
    (a spurious create request due to current issue with dcache miss is omitted)
    ------------- Sandboxer access request -------------
    Process bash[163] (/usr/bin/bash) wants to read/write
      /etc/profile
    (y)es/(a)lways/(n)o > n
    ----------------------------------------------------
    bash: /etc/profile: Permission denied
    bash #

Alternatives
------------

I have looked for existing ways to implement the proposed use cases (at
least for FS access), and three main approaches stand out to me:

1. Fanotify: there is already FAN_OPEN_PERM which waits for an allow/deny
response from a fanotify listener.  However, it does not currently have
the equivalent _PERM for file creation, deletion, rename and linking, and
it is also not designed for unprivileged, process-scoped use (unlike
landlock).

2. Seccomp-unotify: this can be used to trap all syscalls and give the
sandbox a chance to allow or deny any one of them. However, a correct,
TOCTOU-proof implementation will likely require handling a large number of
fs-related syscalls in user-space, with the sandboxer opening the file or
carrying out the operation on behalf of the sandboxee.  This is probably
going to be extremely complex and makes everything less performant.

3. Using a FUSE filesystem which gates access.  This is actually an
approach taken by an existing sandbox solution - flatpak [2], however it
requires either tight integration with the application (and thus doesn't
work well for the mentioned use cases), or if one wants to sandbox a
program "transparently", SYS_ADMIN to chroot.

I've tested that what I have here works with the enhanced sandboxer, but
have yet to write any self tests or do extensive testing or perf
measurements.  I have also yet to implement support for supervising tcp
rules as well as FS refer operations.

Base commit: 78332fdb956f18accfbca5993b10c5ed69f00a2c (tag:
landlock-6.14-rc5, mic/next)

[1]: https://cybersecuritynews.com/beware-of-lazarus-linkedin-recruiting-scam/
[2]: https://flatpak.github.io/xdg-desktop-portal/docs/documents-and-fuse.html

Tingmao Wang (9):
  Define the supervisor and event structure
  Refactor per-layer information in rulesets and rules
  Adds a supervisor reference in the per-layer information
  User-space API for creating a supervisor-fd
  Define user structure for events and responses.
  Creating supervisor events for filesystem operations
  Implement fdinfo for ruleset and supervisor fd
  Implement fops for supervisor-fd
  Enhance the sandboxer example to support landlock-supervise

 include/uapi/linux/landlock.h | 119 ++++++
 samples/landlock/sandboxer.c  | 759 +++++++++++++++++++++++++++++++++-
 security/landlock/Makefile    |   2 +-
 security/landlock/fs.c        | 134 +++++-
 security/landlock/ruleset.c   |  49 ++-
 security/landlock/ruleset.h   |  66 +--
 security/landlock/supervise.c | 194 +++++++++
 security/landlock/supervise.h | 171 ++++++++
 security/landlock/syscalls.c  | 621 +++++++++++++++++++++++++++-
 9 files changed, 2036 insertions(+), 79 deletions(-)
 create mode 100644 security/landlock/supervise.c
 create mode 100644 security/landlock/supervise.h

--
2.39.5

* [RFC PATCH 1/9] Define the supervisor and event structure
  2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
@ 2025-03-04  1:12 ` Tingmao Wang
  2025-03-04  1:12 ` [RFC PATCH 2/9] Refactor per-layer information in rulesets and rules Tingmao Wang
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 47+ messages in thread
From: Tingmao Wang @ 2025-03-04  1:12 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack, Jan Kara
  Cc: Tingmao Wang, linux-security-module, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen

In the current design (mostly not implemented yet), a "supervisor" is a
program that creates (but probably does not enforce on itself) a Landlock
ruleset which it specifically marks as operating in "supervise" mode. For
such a layer (but not other layers below or above it), access not granted
by the ruleset, which would normally result in a denial, instead triggers
a supervise event, and the thread which caused the event is paused until
the supervisor responds to the event, the event is cancelled due to
supervisor termination, or the requesting thread is killed.
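
As a rough user-space illustration of the event life cycle described above
(not code from this series), the sketch below models the four states with
C11 atomics.  The state names follow the patch; the sketch_* helpers, and
the rule that a response is only accepted once the event has been read, are
assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* State names follow the patch; everything else here is hypothetical. */
enum sketch_event_state {
	SKETCH_EVENT_NEW,
	SKETCH_EVENT_NOTIFIED,
	SKETCH_EVENT_ALLOWED,
	SKETCH_EVENT_DENIED,
};

struct sketch_event {
	_Atomic int state;
};

/* Atomically moves @ev from @from to @to; fails if the state is not @from. */
static bool sketch_transition(struct sketch_event *ev, int from, int to)
{
	int expected = from;

	return atomic_compare_exchange_strong(&ev->state, &expected, to);
}

/* The supervisor has read the event: NEW -> NOTIFIED. */
static bool sketch_mark_notified(struct sketch_event *ev)
{
	return sketch_transition(ev, SKETCH_EVENT_NEW, SKETCH_EVENT_NOTIFIED);
}

/* The supervisor answers; assumed to be valid only from NOTIFIED. */
static bool sketch_respond(struct sketch_event *ev, bool allow)
{
	return sketch_transition(ev, SKETCH_EVENT_NOTIFIED,
				 allow ? SKETCH_EVENT_ALLOWED :
					 SKETCH_EVENT_DENIED);
}

/* Cancellation: deny an undecided event, never override a decision. */
static void sketch_cancel(struct sketch_event *ev)
{
	if (!sketch_transition(ev, SKETCH_EVENT_NEW, SKETCH_EVENT_DENIED))
		sketch_transition(ev, SKETCH_EVENT_NOTIFIED,
				  SKETCH_EVENT_DENIED);
}
```

Using compare-and-exchange for every transition means a late cancellation
cannot turn an event that was already allowed into a denial.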

We define a refcounted structure that represents a supervisor, and will
later be exposed to user space via a file descriptor.  Each supervisor
has an event queue and a separate list of events which have been read by
the supervisor and are now awaiting a response.  This allows the future
read codepath to not have to iterate over already notified events, but
still allows the response codepath to find the event.
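
The two-list arrangement described above can be sketched in user space as
follows.  This is only an illustration under assumptions: the sketch_* names
are hypothetical, locking is omitted, and assigning the id at enqueue time
is a guess, since the event fields are only filled in by later patches.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_event {
	uint32_t id;
	struct sketch_event *next;
};

struct sketch_supervisor {
	struct sketch_event *queue_head; /* events not yet read */
	struct sketch_event *queue_tail;
	struct sketch_event *notified;   /* read, awaiting a response */
	uint32_t next_event_id;
};

static void sketch_supervisor_init(struct sketch_supervisor *s)
{
	s->queue_head = NULL;
	s->queue_tail = NULL;
	s->notified = NULL;
	s->next_event_id = 1;
}

/* Appends @ev to the pending queue and gives it a fresh id. */
static void sketch_enqueue(struct sketch_supervisor *s,
			   struct sketch_event *ev)
{
	ev->id = s->next_event_id++;
	ev->next = NULL;
	if (s->queue_tail)
		s->queue_tail->next = ev;
	else
		s->queue_head = ev;
	s->queue_tail = ev;
}

/* Read path: pops the oldest pending event and parks it as notified. */
static struct sketch_event *sketch_read(struct sketch_supervisor *s)
{
	struct sketch_event *ev = s->queue_head;

	if (!ev)
		return NULL;
	s->queue_head = ev->next;
	if (!s->queue_head)
		s->queue_tail = NULL;
	ev->next = s->notified;
	s->notified = ev;
	return ev;
}

/* Response path: unlinks and returns the notified event with @id. */
static struct sketch_event *sketch_take_notified(struct sketch_supervisor *s,
						 uint32_t id)
{
	struct sketch_event **link = &s->notified;

	while (*link) {
		if ((*link)->id == id) {
			struct sketch_event *ev = *link;

			*link = ev->next;
			ev->next = NULL;
			return ev;
		}
		link = &(*link)->next;
	}
	return NULL;
}
```

The read path only ever touches the head of the pending queue, while a
response searches only the events that were already handed out.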

The event struct is also refcounted, so that it is not tied to the
lifetime of the supervisor (e.g. if the supervisor dies, the task doing
the access, which is still blocked in its syscall, holds its own reference
to the event and can read its status safely).
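
To make the lifetime argument concrete, here is a minimal user-space sketch
(hypothetical names, C11 atomics instead of the kernel's refcount_t) in
which the supervisor's list and the blocked task each hold one reference to
the event.  The freed_events counter exists only so the example can be
checked.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_event {
	atomic_uint usage;
	bool denied;
};

/* Counts freed events; only here to make the sketch observable. */
static unsigned int freed_events;

/* Allocates an event holding one reference (the requesting task's). */
static struct sketch_event *sketch_event_alloc(void)
{
	struct sketch_event *ev = calloc(1, sizeof(*ev));

	if (ev)
		atomic_init(&ev->usage, 1);
	return ev;
}

static void sketch_event_get(struct sketch_event *ev)
{
	atomic_fetch_add(&ev->usage, 1);
}

/* Frees the event once the last reference is dropped. */
static void sketch_event_put(struct sketch_event *ev)
{
	if (atomic_fetch_sub(&ev->usage, 1) == 1) {
		free(ev);
		freed_events++;
	}
}
```

When the supervisor goes away it marks the event denied and drops its
reference; the event memory stays valid until the blocked task has read the
result and dropped its own reference.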

The details of the event structure will be populated in a future patch.

The struct is called landlock_supervise_event_kernel so that the uapi
header can use the shorter name.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 security/landlock/Makefile    |  2 +-
 security/landlock/supervise.c | 72 +++++++++++++++++++++++++++++++++++
 security/landlock/supervise.h | 63 ++++++++++++++++++++++++++++++
 3 files changed, 136 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/supervise.c
 create mode 100644 security/landlock/supervise.h

diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index b4538b7cf7d2..c9bab22ab0f5 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,6 +1,6 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
 landlock-y := setup.o syscalls.o object.o ruleset.o \
-	cred.o task.o fs.o
+	cred.o task.o fs.o supervise.o
 
 landlock-$(CONFIG_INET) += net.o
diff --git a/security/landlock/supervise.c b/security/landlock/supervise.c
new file mode 100644
index 000000000000..a3bb6928f453
--- /dev/null
+++ b/security/landlock/supervise.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Implementation specific to landlock-supervise
+ *
+ * Copyright © 2025 Tingmao Wang <m@maowtm.org>
+ */
+
+#include <linux/path.h>
+#include <linux/pid.h>
+#include <linux/slab.h>
+#include <linux/wait_bit.h>
+
+#include "supervise.h"
+
+struct landlock_supervisor *landlock_create_supervisor(void)
+{
+	struct landlock_supervisor *supervisor;
+
+	supervisor = kzalloc(sizeof(*supervisor), GFP_KERNEL_ACCOUNT);
+	if (!supervisor)
+		return ERR_PTR(-ENOMEM);
+	refcount_set(&supervisor->usage, 1);
+	supervisor->next_event_id = 1;
+	spin_lock_init(&supervisor->lock);
+	INIT_LIST_HEAD(&supervisor->event_queue);
+	INIT_LIST_HEAD(&supervisor->notified_events);
+	init_waitqueue_head(&supervisor->poll_event_wq);
+	return supervisor;
+}
+
+void landlock_get_supervisor(struct landlock_supervisor *const supervisor)
+{
+	refcount_inc(&supervisor->usage);
+}
+
+static void
+deny_and_put_event(struct landlock_supervise_event_kernel *const event)
+{
+	cmpxchg(&event->state, LANDLOCK_SUPERVISE_EVENT_NEW,
+		LANDLOCK_SUPERVISE_EVENT_DENIED);
+	cmpxchg(&event->state, LANDLOCK_SUPERVISE_EVENT_NOTIFIED,
+		LANDLOCK_SUPERVISE_EVENT_DENIED);
+	wake_up_var(event);
+	landlock_put_supervise_event(event);
+}
+
+void landlock_put_supervisor(struct landlock_supervisor *const supervisor)
+{
+	if (refcount_dec_and_test(&supervisor->usage)) {
+		struct landlock_supervise_event_kernel *freeme, *next;
+
+		might_sleep();
+		/* we are the only reference, hence no locking */
+
+		/* deny all pending events */
+		list_for_each_entry_safe(freeme, next, &supervisor->event_queue,
+					 node) {
+			list_del(&freeme->node);
+			deny_and_put_event(freeme);
+		}
+		/*
+		 * user reply no longer possible without any reference to
+		 * supervisor, deny all notified events
+		 */
+		list_for_each_entry_safe(freeme, next,
+					 &supervisor->notified_events, node) {
+			list_del(&freeme->node);
+			deny_and_put_event(freeme);
+		}
+		kfree(supervisor);
+	}
+}
diff --git a/security/landlock/supervise.h b/security/landlock/supervise.h
new file mode 100644
index 000000000000..1fc3460335af
--- /dev/null
+++ b/security/landlock/supervise.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Implementation specific to landlock-supervise
+ *
+ * Copyright © 2025 Tingmao Wang <m@maowtm.org>
+ */
+
+#ifndef _SECURITY_LANDLOCK_SUPERVISE_H
+#define _SECURITY_LANDLOCK_SUPERVISE_H
+
+#include <linux/refcount.h>
+#include <linux/wait.h>
+#include <linux/path.h>
+#include <linux/pid.h>
+
+#include "access.h"
+#include "ruleset.h"
+
+struct landlock_supervisor {
+	refcount_t usage;
+	spinlock_t lock;
+	/* protected by @lock, contains landlock_supervise_event_kernel */
+	struct list_head event_queue;
+	/* protected by @lock, contains landlock_supervise_event_kernel */
+	struct list_head notified_events;
+	struct wait_queue_head poll_event_wq;
+	/* protected by @lock */
+	u32 next_event_id;
+};
+
+enum landlock_supervise_event_state {
+	LANDLOCK_SUPERVISE_EVENT_NEW,
+	LANDLOCK_SUPERVISE_EVENT_NOTIFIED,
+	LANDLOCK_SUPERVISE_EVENT_ALLOWED,
+	LANDLOCK_SUPERVISE_EVENT_DENIED,
+};
+
+struct landlock_supervise_event_kernel {
+	struct list_head node;
+	refcount_t usage;
+	enum landlock_supervise_event_state state;
+
+	/* more fields to come */
+};
+
+struct landlock_supervisor *landlock_create_supervisor(void);
+void landlock_get_supervisor(struct landlock_supervisor *const supervisor);
+void landlock_put_supervisor(struct landlock_supervisor *const supervisor);
+
+static inline void landlock_get_supervise_event(
+	struct landlock_supervise_event_kernel *const event)
+{
+	refcount_inc(&event->usage);
+}
+
+static inline void landlock_put_supervise_event(
+	struct landlock_supervise_event_kernel *const event)
+{
+	if (refcount_dec_and_test(&event->usage))
+		kfree(event);
+}
+
+#endif /* _SECURITY_LANDLOCK_SUPERVISE_H */
-- 
2.39.5


* [RFC PATCH 2/9] Refactor per-layer information in rulesets and rules
  2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
  2025-03-04  1:12 ` [RFC PATCH 1/9] Define the supervisor and event structure Tingmao Wang
@ 2025-03-04  1:12 ` Tingmao Wang
  2025-03-04 19:49   ` Mickaël Salaün
  2025-03-04  1:12 ` [RFC PATCH 3/9] Adds a supervisor reference in the per-layer information Tingmao Wang
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-04  1:12 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack, Jan Kara
  Cc: Tingmao Wang, linux-security-module, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen

We need a place to store the supervisor pointer for each layer in
a domain.  Currently, the domain has a trailing flexible array
for the handled access masks of each layer.  This patch extends it
by creating a separate landlock_ruleset_layer structure that will
hold this access mask, and makes the ruleset's flexible array use
this structure instead.
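
The shape of this refactoring can be sketched in user space as below.  It
is a simplified illustration rather than the kernel code: the sketch_*
types are hypothetical stand-ins, and calloc() with an explicit size
replaces kzalloc() with struct_size().

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct access_masks. */
struct sketch_access_masks {
	uint16_t fs;
	uint16_t net;
	uint16_t scope;
};

/* Per-layer record: further per-layer fields can be added here later. */
struct sketch_ruleset_layer {
	struct sketch_access_masks access_masks;
};

struct sketch_ruleset {
	uint32_t num_layers;
	struct sketch_ruleset_layer layer_stack[]; /* flexible array member */
};

/* Allocates a zeroed ruleset with room for @num_layers layers. */
static struct sketch_ruleset *sketch_create_ruleset(uint32_t num_layers)
{
	struct sketch_ruleset *rs;

	rs = calloc(1, sizeof(*rs) + num_layers * sizeof(rs->layer_stack[0]));
	if (rs)
		rs->num_layers = num_layers;
	return rs;
}

/* ORs together the filesystem masks handled by every layer. */
static uint16_t sketch_union_fs_mask(const struct sketch_ruleset *rs)
{
	uint16_t all = 0;
	uint32_t i;

	for (i = 0; i < rs->num_layers; i++)
		all |= rs->layer_stack[i].access_masks.fs;
	return all;
}
```

Wrapping the mask in a per-layer struct costs one extra member access at
each use, but lets later patches attach more state to a layer without
touching the allocation logic again.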

An alternative is to use landlock_hierarchy, but I have chosen to
extend the FAM as this makes it clearer that the supervisor
pointer is tied to layers, just like access masks.

This patch doesn't make any functional changes nor add any
supervise specific stuff.  It is purely to pave the way for
future patches.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 security/landlock/ruleset.c  | 29 +++++++++---------
 security/landlock/ruleset.h  | 59 ++++++++++++++++++++++--------------
 security/landlock/syscalls.c |  2 +-
 3 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index 69742467a0cf..2cc6f7c5eb1b 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -31,9 +31,8 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
 {
 	struct landlock_ruleset *new_ruleset;
 
-	new_ruleset =
-		kzalloc(struct_size(new_ruleset, access_masks, num_layers),
-			GFP_KERNEL_ACCOUNT);
+	new_ruleset = kzalloc(struct_size(new_ruleset, layer_stack, num_layers),
+			      GFP_KERNEL_ACCOUNT);
 	if (!new_ruleset)
 		return ERR_PTR(-ENOMEM);
 	refcount_set(&new_ruleset->usage, 1);
@@ -104,8 +103,9 @@ static bool is_object_pointer(const enum landlock_key_type key_type)
 
 static struct landlock_rule *
 create_rule(const struct landlock_id id,
-	    const struct landlock_layer (*const layers)[], const u32 num_layers,
-	    const struct landlock_layer *const new_layer)
+	    const struct landlock_rule_layer (*const layers)[],
+	    const u32 num_layers,
+	    const struct landlock_rule_layer *const new_layer)
 {
 	struct landlock_rule *new_rule;
 	u32 new_num_layers;
@@ -201,7 +201,7 @@ static void build_check_ruleset(void)
  */
 static int insert_rule(struct landlock_ruleset *const ruleset,
 		       const struct landlock_id id,
-		       const struct landlock_layer (*const layers)[],
+		       const struct landlock_rule_layer (*const layers)[],
 		       const size_t num_layers)
 {
 	struct rb_node **walker_node;
@@ -284,7 +284,7 @@ static int insert_rule(struct landlock_ruleset *const ruleset,
 
 static void build_check_layer(void)
 {
-	const struct landlock_layer layer = {
+	const struct landlock_rule_layer layer = {
 		.level = ~0,
 		.access = ~0,
 	};
@@ -299,7 +299,7 @@ int landlock_insert_rule(struct landlock_ruleset *const ruleset,
 			 const struct landlock_id id,
 			 const access_mask_t access)
 {
-	struct landlock_layer layers[] = { {
+	struct landlock_rule_layer layers[] = { {
 		.access = access,
 		/* When @level is zero, insert_rule() extends @ruleset. */
 		.level = 0,
@@ -344,7 +344,7 @@ static int merge_tree(struct landlock_ruleset *const dst,
 	/* Merges the @src tree. */
 	rbtree_postorder_for_each_entry_safe(walker_rule, next_rule, src_root,
 					     node) {
-		struct landlock_layer layers[] = { {
+		struct landlock_rule_layer layers[] = { {
 			.level = dst->num_layers,
 		} };
 		const struct landlock_id id = {
@@ -389,8 +389,9 @@ static int merge_ruleset(struct landlock_ruleset *const dst,
 		err = -EINVAL;
 		goto out_unlock;
 	}
-	dst->access_masks[dst->num_layers - 1] =
-		landlock_upgrade_handled_access_masks(src->access_masks[0]);
+	dst->layer_stack[dst->num_layers - 1].access_masks =
+		landlock_upgrade_handled_access_masks(
+			src->layer_stack[0].access_masks);
 
 	/* Merges the @src inode tree. */
 	err = merge_tree(dst, src, LANDLOCK_KEY_INODE);
@@ -472,8 +473,8 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
 		goto out_unlock;
 	}
 	/* Copies the parent layer stack and leaves a space for the new layer. */
-	memcpy(child->access_masks, parent->access_masks,
-	       flex_array_size(parent, access_masks, parent->num_layers));
+	memcpy(child->layer_stack, parent->layer_stack,
+	       flex_array_size(parent, layer_stack, parent->num_layers));
 
 	if (WARN_ON_ONCE(!parent->hierarchy)) {
 		err = -EINVAL;
@@ -644,7 +645,7 @@ bool landlock_unmask_layers(const struct landlock_rule *const rule,
 	 * E.g. /a/b <execute> + /a <read> => /a/b <execute + read>
 	 */
 	for (layer_level = 0; layer_level < rule->num_layers; layer_level++) {
-		const struct landlock_layer *const layer =
+		const struct landlock_rule_layer *const layer =
 			&rule->layers[layer_level];
 		const layer_mask_t layer_bit = BIT_ULL(layer->level - 1);
 		const unsigned long access_req = access_request;
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index 52f4f0af6ab0..a2605959f733 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -21,9 +21,10 @@
 #include "object.h"
 
 /**
- * struct landlock_layer - Access rights for a given layer
+ * struct landlock_rule_layer - Stores the access rights for a
+ * given layer in a rule.
  */
-struct landlock_layer {
+struct landlock_rule_layer {
 	/**
 	 * @level: Position of this layer in the layer stack.
 	 */
@@ -102,10 +103,11 @@ struct landlock_rule {
 	 */
 	u32 num_layers;
 	/**
-	 * @layers: Stack of layers, from the latest to the newest, implemented
-	 * as a flexible array member (FAM).
+	 * @layers: Stack of layers, from the latest to the newest,
+	 * implemented as a flexible array member (FAM). Only
+	 * contains layers that have a rule for this object.
 	 */
-	struct landlock_layer layers[] __counted_by(num_layers);
+	struct landlock_rule_layer layers[] __counted_by(num_layers);
 };
 
 /**
@@ -124,6 +126,18 @@ struct landlock_hierarchy {
 	refcount_t usage;
 };
 
+/**
+ * struct landlock_ruleset_layer - Store per-layer information
+ * within a domain (or a non-merged ruleset)
+ */
+struct landlock_ruleset_layer {
+	/**
+	 * @access_masks: Contains the subset of filesystem and
+	 * network actions that are restricted by a layer.
+	 */
+	struct access_masks access_masks;
+};
+
 /**
  * struct landlock_ruleset - Landlock ruleset
  *
@@ -187,18 +201,17 @@ struct landlock_ruleset {
 			 */
 			u32 num_layers;
 			/**
-			 * @access_masks: Contains the subset of filesystem and
-			 * network actions that are restricted by a ruleset.
-			 * A domain saves all layers of merged rulesets in a
-			 * stack (FAM), starting from the first layer to the
-			 * last one.  These layers are used when merging
-			 * rulesets, for user space backward compatibility
-			 * (i.e. future-proof), and to properly handle merged
-			 * rulesets without overlapping access rights.  These
-			 * layers are set once and never changed for the
-			 * lifetime of the ruleset.
+			 * @layer_stack: A domain saves all layers of merged
+			 * rulesets in a stack (FAM), starting from the first
+			 * layer to the last one.  These layers are used when
+			 * merging rulesets, for user space backward
+			 * compatibility (i.e. future-proof), and to properly
+			 * handle merged rulesets without overlapping access
+			 * rights.  These layers are set once and never
+			 * changed for the lifetime of the ruleset.
 			 */
-			struct access_masks access_masks[];
+			struct landlock_ruleset_layer
+				layer_stack[] __counted_by(num_layers);
 		};
 	};
 };
@@ -248,7 +261,7 @@ landlock_union_access_masks(const struct landlock_ruleset *const domain)
 
 	for (layer_level = 0; layer_level < domain->num_layers; layer_level++) {
 		union access_masks_all layer = {
-			.masks = domain->access_masks[layer_level],
+			.masks = domain->layer_stack[layer_level].access_masks,
 		};
 
 		matches.all |= layer.all;
@@ -296,7 +309,7 @@ landlock_add_fs_access_mask(struct landlock_ruleset *const ruleset,
 
 	/* Should already be checked in sys_landlock_create_ruleset(). */
 	WARN_ON_ONCE(fs_access_mask != fs_mask);
-	ruleset->access_masks[layer_level].fs |= fs_mask;
+	ruleset->layer_stack[layer_level].access_masks.fs |= fs_mask;
 }
 
 static inline void
@@ -308,7 +321,7 @@ landlock_add_net_access_mask(struct landlock_ruleset *const ruleset,
 
 	/* Should already be checked in sys_landlock_create_ruleset(). */
 	WARN_ON_ONCE(net_access_mask != net_mask);
-	ruleset->access_masks[layer_level].net |= net_mask;
+	ruleset->layer_stack[layer_level].access_masks.net |= net_mask;
 }
 
 static inline void
@@ -319,7 +332,7 @@ landlock_add_scope_mask(struct landlock_ruleset *const ruleset,
 
 	/* Should already be checked in sys_landlock_create_ruleset(). */
 	WARN_ON_ONCE(scope_mask != mask);
-	ruleset->access_masks[layer_level].scope |= mask;
+	ruleset->layer_stack[layer_level].access_masks.scope |= mask;
 }
 
 static inline access_mask_t
@@ -327,7 +340,7 @@ landlock_get_fs_access_mask(const struct landlock_ruleset *const ruleset,
 			    const u16 layer_level)
 {
 	/* Handles all initially denied by default access rights. */
-	return ruleset->access_masks[layer_level].fs |
+	return ruleset->layer_stack[layer_level].access_masks.fs |
 	       _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
 }
 
@@ -335,14 +348,14 @@ static inline access_mask_t
 landlock_get_net_access_mask(const struct landlock_ruleset *const ruleset,
 			     const u16 layer_level)
 {
-	return ruleset->access_masks[layer_level].net;
+	return ruleset->layer_stack[layer_level].access_masks.net;
 }
 
 static inline access_mask_t
 landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
 			const u16 layer_level)
 {
-	return ruleset->access_masks[layer_level].scope;
+	return ruleset->layer_stack[layer_level].access_masks.scope;
 }
 
 bool landlock_unmask_layers(const struct landlock_rule *const rule,
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index a9760d252fc2..ead9b68168ad 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -313,7 +313,7 @@ static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
 		return -ENOMSG;
 
 	/* Checks that allowed_access matches the @ruleset constraints. */
-	mask = ruleset->access_masks[0].fs;
+	mask = landlock_get_fs_access_mask(ruleset, 0);
 	if ((path_beneath_attr.allowed_access | mask) != mask)
 		return -EINVAL;
 
-- 
2.39.5


* [RFC PATCH 3/9] Adds a supervisor reference in the per-layer information
  2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
  2025-03-04  1:12 ` [RFC PATCH 1/9] Define the supervisor and event structure Tingmao Wang
  2025-03-04  1:12 ` [RFC PATCH 2/9] Refactor per-layer information in rulesets and rules Tingmao Wang
@ 2025-03-04  1:12 ` Tingmao Wang
  2025-03-04  1:13 ` [RFC PATCH 4/9] User-space API for creating a supervisor-fd Tingmao Wang
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 47+ messages in thread
From: Tingmao Wang @ 2025-03-04  1:12 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack, Jan Kara
  Cc: Tingmao Wang, linux-security-module, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen

Following from the previous patch, we now use the new per-layer struct to
store a reference to any supervisor attached to a layer (merged in a
domain or unmerged).

The supervisor is refcounted, and so we need to correctly get/put it when
inheriting a domain or when merging a layer.  This means looping through
all the layers and getting each supervisor that exists, as the domain
effectively stores a copy of all the inherited layers.
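
The get/put pairing described above can be illustrated with the following
user-space sketch.  The names are hypothetical and a plain counter stands
in for refcount_t; the point is only that every copied non-NULL pointer
takes one reference, and every freed layer stack gives one back.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sketch_supervisor {
	unsigned int usage; /* plain counter; the kernel uses refcount_t */
};

struct sketch_layer {
	struct sketch_supervisor *supervisor; /* NULL if not supervised */
};

static void sketch_get_supervisor(struct sketch_supervisor *s)
{
	s->usage++;
}

/* Returns true when the last reference was dropped. */
static bool sketch_put_supervisor(struct sketch_supervisor *s)
{
	return --s->usage == 0;
}

/* Copies @n parent layers and takes one reference per copied pointer. */
static void sketch_inherit_layers(struct sketch_layer *child,
				  const struct sketch_layer *parent, size_t n)
{
	size_t i;

	memcpy(child, parent, n * sizeof(*parent));
	for (i = 0; i < n; i++) {
		if (child[i].supervisor)
			sketch_get_supervisor(child[i].supervisor);
	}
}

/*
 * Drops the references held by @n layers and returns how many
 * supervisors lost their last reference.
 */
static size_t sketch_release_layers(struct sketch_layer *layers, size_t n)
{
	size_t i, released = 0;

	for (i = 0; i < n; i++) {
		if (layers[i].supervisor &&
		    sketch_put_supervisor(layers[i].supervisor))
			released++;
		layers[i].supervisor = NULL;
	}
	return released;
}
```

Because the child holds its own references, freeing the parent's layer
stack first (or last) leaves the supervisor alive for as long as any domain
still points at it.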

TODO: because we are now referencing the supervisor in the layer, the
event deny and cleanup code in landlock_put_supervisor won't work as
intended.  I didn't realize this until after finishing this set of
patches, so this will be addressed in a future series.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 security/landlock/ruleset.c   | 26 +++++++++++++++++++++++---
 security/landlock/ruleset.h   |  7 +++++++
 security/landlock/supervise.h |  6 ++++++
 3 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index 2cc6f7c5eb1b..2e93b8105cc9 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -26,6 +26,7 @@
 #include "limits.h"
 #include "object.h"
 #include "ruleset.h"
+#include "supervise.h"
 
 static struct landlock_ruleset *create_ruleset(const u32 num_layers)
 {
@@ -389,9 +390,14 @@ static int merge_ruleset(struct landlock_ruleset *const dst,
 		err = -EINVAL;
 		goto out_unlock;
 	}
-	dst->layer_stack[dst->num_layers - 1].access_masks =
-		landlock_upgrade_handled_access_masks(
-			src->layer_stack[0].access_masks);
+	dst->layer_stack[dst->num_layers - 1] = (struct landlock_ruleset_layer){
+		.access_masks = landlock_upgrade_handled_access_masks(
+			src->layer_stack[0].access_masks),
+		.supervisor = src->layer_stack[0].supervisor,
+	};
+	if (dst->layer_stack[dst->num_layers - 1].supervisor)
+		landlock_get_supervisor(
+			dst->layer_stack[dst->num_layers - 1].supervisor);
 
 	/* Merges the @src inode tree. */
 	err = merge_tree(dst, src, LANDLOCK_KEY_INODE);
@@ -447,6 +453,7 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
 			   struct landlock_ruleset *const child)
 {
 	int err = 0;
+	int layer;
 
 	might_sleep();
 	if (!parent)
@@ -475,6 +482,12 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
 	/* Copies the parent layer stack and leaves a space for the new layer. */
 	memcpy(child->layer_stack, parent->layer_stack,
 	       flex_array_size(parent, layer_stack, parent->num_layers));
+	/* Get the refcount of any supervisor copied over */
+	for (layer = 0; layer < child->num_layers; layer++) {
+		if (child->layer_stack[layer].supervisor)
+			landlock_get_supervisor(
+				child->layer_stack[layer].supervisor);
+	}
 
 	if (WARN_ON_ONCE(!parent->hierarchy)) {
 		err = -EINVAL;
@@ -492,6 +505,7 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
 static void free_ruleset(struct landlock_ruleset *const ruleset)
 {
 	struct landlock_rule *freeme, *next;
+	int layer;
 
 	might_sleep();
 	rbtree_postorder_for_each_entry_safe(freeme, next, &ruleset->root_inode,
@@ -505,6 +519,12 @@ static void free_ruleset(struct landlock_ruleset *const ruleset)
 #endif /* IS_ENABLED(CONFIG_INET) */
 
 	put_hierarchy(ruleset->hierarchy);
+	for (layer = 0; layer < ruleset->num_layers; layer++) {
+		struct landlock_supervisor *const supervisor =
+			ruleset->layer_stack[layer].supervisor;
+		if (supervisor)
+			landlock_put_supervisor(supervisor);
+	}
 	kfree(ruleset);
 }
 
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index a2605959f733..ed530643ea68 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -136,6 +136,13 @@ struct landlock_ruleset_layer {
 	 * network actions that are restricted by a layer.
 	 */
 	struct access_masks access_masks;
+	/**
+	 * @supervisor: If not null, this layer is operating in
+	 * supervisor mode.  Accesses denied only by supervised layers
+	 * are forwarded to the supervisor(s), which can then decide
+	 * whether to actually deny the access or allow it.
+	 */
+	struct landlock_supervisor *supervisor;
 };
 
 /**
diff --git a/security/landlock/supervise.h b/security/landlock/supervise.h
index 1fc3460335af..febe26a11578 100644
--- a/security/landlock/supervise.h
+++ b/security/landlock/supervise.h
@@ -16,6 +16,12 @@
 #include "access.h"
 #include "ruleset.h"
 
+/**
+ * Each supervisor is associated with one active layer in a
+ * domain (or associated with a not-yet-active layer in a struct
+ * landlock_ruleset).  User space interacts with the event queue
+ * through a landlock_supervise_fd.
+ */
 struct landlock_supervisor {
 	refcount_t usage;
 	spinlock_t lock;
-- 
2.39.5



* [RFC PATCH 4/9] User-space API for creating a supervisor-fd
  2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
                   ` (2 preceding siblings ...)
  2025-03-04  1:12 ` [RFC PATCH 3/9] Adds a supervisor reference in the per-layer information Tingmao Wang
@ 2025-03-04  1:13 ` Tingmao Wang
  2025-03-05 16:09   ` Mickaël Salaün
  2025-03-04  1:13 ` [RFC PATCH 5/9] Define user structure for events and responses Tingmao Wang
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-04  1:13 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack, Jan Kara
  Cc: Tingmao Wang, linux-security-module, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen

We allow the user to pass in an additional flag to landlock_create_ruleset
which will make the ruleset operate in "supervise" mode, with a supervisor
attached. We create additional space in the landlock_ruleset_attr
structure to pass the newly created supervisor fd back to user-space.

The intention, while not implemented yet, is that user space will read
events from this fd and write responses back to it.
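
As a rough illustration of the calling convention described above, here is
a minimal user-space sketch.  The struct and flag are local mirrors of what
this patch adds (the sketch_ names are hypothetical and do not exist in any
released uapi header), and the -1 sentinel is a choice made for this sketch
only; the patch as posted validates only @pad.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical local mirror of the extended landlock_ruleset_attr from
 * this patch; field order and sizes follow the diff above.
 */
struct sketch_ruleset_attr {
	uint64_t handled_access_fs;
	uint64_t handled_access_net;
	uint64_t scoped;
	int32_t supervisor_fd;	/* written back by the kernel on success */
	uint32_t pad;		/* must be zero, else -EINVAL */
};

/* Mirrors LANDLOCK_CREATE_RULESET_SUPERVISE as defined in this patch. */
#define SKETCH_CREATE_RULESET_SUPERVISE (1U << 1)

/* Zero the whole attr (including pad) and mark the fd as not yet set. */
static void sketch_init_attr(struct sketch_ruleset_attr *attr,
			     uint64_t handled_fs)
{
	memset(attr, 0, sizeof(*attr));
	attr->handled_access_fs = handled_fs;
	attr->supervisor_fd = -1;
}
```

On a kernel carrying this series, the attr would then presumably be passed
as syscall(SYS_landlock_create_ruleset, &attr, sizeof(attr),
SKETCH_CREATE_RULESET_SUPERVISE); the return value is the ruleset fd and
attr.supervisor_fd holds the supervisor fd.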

Note: need to investigate if fd clone on fork() is handled correctly, but
should be fine if it shares the struct file. We might also want to let the
user customize the flags on this fd, so that they can request no
O_CLOEXEC.

NOTE: despite this patch having a new uapi, I'm still very open to e.g.
re-using fanotify stuff instead (if that makes sense in the end). This is
just a PoC.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 include/uapi/linux/landlock.h |  10 ++++
 security/landlock/syscalls.c  | 102 +++++++++++++++++++++++++++++-----
 2 files changed, 98 insertions(+), 14 deletions(-)

diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index e1d2c27533b4..7bc1eb4859fb 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -50,6 +50,15 @@ struct landlock_ruleset_attr {
 	 * resources (e.g. IPCs).
 	 */
 	__u64 scoped;
+	/**
+	 * @supervisor_fd: Placeholder to store the supervisor file
+	 * descriptor when %LANDLOCK_CREATE_RULESET_SUPERVISE is set.
+	 */
+	__s32 supervisor_fd;
+	/**
+	 * @pad: Unused, must be zero.
+	 */
+	__u32 pad;
 };
 
 /*
@@ -60,6 +69,7 @@ struct landlock_ruleset_attr {
  */
 /* clang-format off */
 #define LANDLOCK_CREATE_RULESET_VERSION			(1U << 0)
+#define LANDLOCK_CREATE_RULESET_SUPERVISE		(1U << 1)
 /* clang-format on */
 
 /**
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index ead9b68168ad..adf7e77023b5 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -32,6 +32,7 @@
 #include "limits.h"
 #include "net.h"
 #include "ruleset.h"
+#include "supervise.h"
 #include "setup.h"
 
 static bool is_initialized(void)
@@ -99,8 +100,10 @@ static void build_check_abi(void)
 	ruleset_size = sizeof(ruleset_attr.handled_access_fs);
 	ruleset_size += sizeof(ruleset_attr.handled_access_net);
 	ruleset_size += sizeof(ruleset_attr.scoped);
+	ruleset_size += sizeof(ruleset_attr.supervisor_fd);
+	ruleset_size += sizeof(ruleset_attr.pad);
 	BUILD_BUG_ON(sizeof(ruleset_attr) != ruleset_size);
-	BUILD_BUG_ON(sizeof(ruleset_attr) != 24);
+	BUILD_BUG_ON(sizeof(ruleset_attr) != 32);
 
 	path_beneath_size = sizeof(path_beneath_attr.allowed_access);
 	path_beneath_size += sizeof(path_beneath_attr.parent_fd);
@@ -151,16 +154,42 @@ static const struct file_operations ruleset_fops = {
 	.write = fop_dummy_write,
 };
 
-#define LANDLOCK_ABI_VERSION 6
+static int fop_supervisor_release(struct inode *const inode,
+				  struct file *const filp)
+{
+	struct landlock_supervisor *supervisor = filp->private_data;
+
+	landlock_put_supervisor(supervisor);
+	return 0;
+}
+
+static const struct file_operations supervisor_fops = {
+	.release = fop_supervisor_release,
+	/* TODO: read, write, poll, dup */
+	.read = fop_dummy_read,
+	.write = fop_dummy_write,
+};
+
+static int
+landlock_supervisor_open_fd(struct landlock_supervisor *const supervisor,
+			    const fmode_t mode)
+{
+	landlock_get_supervisor(supervisor);
+	return anon_inode_getfd("[landlock-supervisor]", &supervisor_fops,
+				supervisor, O_RDWR | O_CLOEXEC);
+}
+
+#define LANDLOCK_ABI_VERSION 7
 
 /**
  * sys_landlock_create_ruleset - Create a new ruleset
  *
- * @attr: Pointer to a &struct landlock_ruleset_attr identifying the scope of
- *        the new ruleset.
- * @size: Size of the pointed &struct landlock_ruleset_attr (needed for
- *        backward and forward compatibility).
- * @flags: Supported value: %LANDLOCK_CREATE_RULESET_VERSION.
+ * @attr:  Pointer to a &struct landlock_ruleset_attr identifying the scope of
+ *         the new ruleset.
+ * @size:  Size of the pointed &struct landlock_ruleset_attr (needed for
+ *         backward and forward compatibility).
+ * @flags: Supported value: %LANDLOCK_CREATE_RULESET_VERSION,
+ * 	       %LANDLOCK_CREATE_RULESET_SUPERVISE.
  *
  * This system call enables to create a new Landlock ruleset, and returns the
  * related file descriptor on success.
@@ -172,18 +201,21 @@ static const struct file_operations ruleset_fops = {
  * Possible returned errors are:
  *
  * - %EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
- * - %EINVAL: unknown @flags, or unknown access, or unknown scope, or too small @size;
+ * - %EINVAL: unknown @flags, or unknown access, or unknown
+ * 	          scope, or too small @size, or non-zero @pad;
  * - %E2BIG: @attr or @size inconsistencies;
  * - %EFAULT: @attr or @size inconsistencies;
  * - %ENOMSG: empty &landlock_ruleset_attr.handled_access_fs.
  */
 SYSCALL_DEFINE3(landlock_create_ruleset,
-		const struct landlock_ruleset_attr __user *const, attr,
-		const size_t, size, const __u32, flags)
+		struct landlock_ruleset_attr __user *const, attr, const size_t,
+		size, const __u32, flags)
 {
 	struct landlock_ruleset_attr ruleset_attr;
 	struct landlock_ruleset *ruleset;
+	struct landlock_supervisor *supervisor;
 	int err, ruleset_fd;
+	bool supervise = false;
 
 	/* Build-time checks. */
 	build_check_abi();
@@ -192,10 +224,16 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
 		return -EOPNOTSUPP;
 
 	if (flags) {
-		if ((flags == LANDLOCK_CREATE_RULESET_VERSION) && !attr &&
-		    !size)
+		if (flags == LANDLOCK_CREATE_RULESET_VERSION) {
+			if (attr || size)
+				return -EINVAL;
 			return LANDLOCK_ABI_VERSION;
-		return -EINVAL;
+		}
+		if (flags == LANDLOCK_CREATE_RULESET_SUPERVISE) {
+			supervise = true;
+		} else {
+			return -EINVAL;
+		}
 	}
 
 	/* Copies raw user space buffer. */
@@ -206,6 +244,13 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
 	if (err)
 		return err;
 
+	if (supervise && size < offsetofend(typeof(ruleset_attr), pad))
+		return -EINVAL;
+
+	if (size >= offsetofend(typeof(ruleset_attr), pad) &&
+	    ruleset_attr.pad != 0)
+		return -EINVAL;
+
 	/* Checks content (and 32-bits cast). */
 	if ((ruleset_attr.handled_access_fs | LANDLOCK_MASK_ACCESS_FS) !=
 	    LANDLOCK_MASK_ACCESS_FS)
@@ -227,11 +272,40 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
 	if (IS_ERR(ruleset))
 		return PTR_ERR(ruleset);
 
+	if (supervise) {
+		supervisor = landlock_create_supervisor();
+		if (IS_ERR(supervisor)) {
+			landlock_put_ruleset(ruleset);
+			return -ENOMEM;
+		}
+		/* Pass ownership of supervisor to ruleset struct */
+		ruleset->layer_stack[0].supervisor = supervisor;
+	}
+
 	/* Creates anonymous FD referring to the ruleset. */
 	ruleset_fd = anon_inode_getfd("[landlock-ruleset]", &ruleset_fops,
 				      ruleset, O_RDWR | O_CLOEXEC);
-	if (ruleset_fd < 0)
+	if (ruleset_fd < 0) {
 		landlock_put_ruleset(ruleset);
+		return ruleset_fd;
+	}
+
+	if (supervise) {
+		int supervisor_fd;
+
+		supervisor_fd = landlock_supervisor_open_fd(
+			ruleset->layer_stack[0].supervisor, O_RDWR | O_CLOEXEC);
+		if (supervisor_fd < 0) {
+			landlock_put_ruleset(ruleset);
+			return supervisor_fd;
+		}
+		if (copy_to_user(&attr->supervisor_fd, &supervisor_fd,
+				 sizeof(supervisor_fd))) {
+			landlock_put_ruleset(ruleset);
+			return -EFAULT;
+		}
+	}
+
 	return ruleset_fd;
 }
 
-- 
2.39.5



* [RFC PATCH 5/9] Define user structure for events and responses.
  2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
                   ` (3 preceding siblings ...)
  2025-03-04  1:13 ` [RFC PATCH 4/9] User-space API for creating a supervisor-fd Tingmao Wang
@ 2025-03-04  1:13 ` Tingmao Wang
  2025-03-04 19:49   ` Mickaël Salaün
  2025-03-04  1:13 ` [RFC PATCH 6/9] Creating supervisor events for filesystem operations Tingmao Wang
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-04  1:13 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack, Jan Kara
  Cc: Tingmao Wang, linux-security-module, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen

The two structures are designed to be passed via read and write
to the supervisor-fd.  Compile-time checks that they contain no
holes are added to build_check_abi.

The event structure will be a dynamically sized structure, possibly
with a NUL-terminated filename at the end.  This is so that
we can pass a raw filename to the supervisor for file creation
requests, without the problem of being unable to open an
fd to a file that has not been created yet.
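
To make the variable-length layout concrete, here is a minimal user-space
sketch of how a supervisor might validate one event and locate its
filename.  struct sketch_fs_event is a hypothetical flat mirror of the FS
variant of the event (no union), and sketch_event_destname() is an
illustrative helper, not part of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat mirror of the FS variant of the event structure. */
struct sketch_fs_event {
	uint16_t type;
	uint16_t length;	/* whole event, header included */
	uint32_t cookie;
	uint64_t access_request;
	int32_t accessor;
	int32_t fd1;
	int32_t fd2;
	char destname[];	/* NUL-terminated, NUL-padded to 8 bytes */
};

/*
 * Return a pointer to destname if the event in buf is well-formed and
 * carries a non-empty name, or NULL otherwise.  The fixed part is copied
 * out with memcpy() so that buf needs no particular alignment.
 */
static const char *sketch_event_destname(const void *buf, size_t avail)
{
	const size_t name_off = offsetof(struct sketch_fs_event, destname);
	struct sketch_fs_event hdr;
	const char *name;
	size_t name_len;

	if (avail < name_off)
		return NULL;
	memcpy(&hdr, buf, name_off);
	if (hdr.length < name_off || hdr.length > avail || hdr.length % 8)
		return NULL;

	/* Name length, terminator and padding included. */
	name = (const char *)buf + name_off;
	name_len = hdr.length - name_off;
	if (!memchr(name, '\0', name_len))
		return NULL;
	return name[0] ? name : NULL;
}
```

A NULL return for a well-formed event simply means fd1 refers to the
target file itself rather than to a parent directory.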

NOTE: despite this patch having a new uapi, I'm still very open to e.g.
re-using fanotify stuff instead (if that makes sense in the end). This is
just a PoC.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 include/uapi/linux/landlock.h | 107 ++++++++++++++++++++++++++++++++++
 security/landlock/syscalls.c  |  28 +++++++++
 2 files changed, 135 insertions(+)

diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index 7bc1eb4859fb..b5645fdd998d 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -318,4 +318,111 @@ struct landlock_net_port_attr {
 #define LANDLOCK_SCOPE_SIGNAL		                (1ULL << 1)
 /* clang-format on*/
 
+/**
+ * DOC: supervisor
+ *
+ * Supervise mode
+ * ~~~~~~~~~~~~~~
+ *
+ * TODO
+ */
+
+typedef __u16 landlock_supervise_event_type_t;
+/* clang-format off */
+#define LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS         1
+#define LANDLOCK_SUPERVISE_EVENT_TYPE_NET_ACCESS        2
+/* clang-format on */
+
+struct landlock_supervise_event_hdr {
+	/**
+	 * @type: Type of the event.
+	 */
+	landlock_supervise_event_type_t type;
+	/**
+	 * @length: Length of the entire struct
+	 * landlock_supervise_event including this header.
+	 */
+	__u16 length;
+	/**
+	 * @cookie: Opaque identifier to be included in the response.
+	 */
+	__u32 cookie;
+};
+
+struct landlock_supervise_event {
+	struct landlock_supervise_event_hdr hdr;
+	__u64 access_request;
+	__kernel_pid_t accessor;
+	union {
+		struct {
+			/**
+			 * @fd1: An open file descriptor for the file (open,
+			 * delete, execute, link, readdir, rename, truncate),
+			 * or the parent directory (for create operations
+			 * targeting its child) being accessed.  Must be
+			 * closed by the reader.
+			 *
+			 * If this points to a parent directory, @destname
+			 * will contain the target filename. If @destname is
+			 * empty, this points to the target file.
+			 */
+			int fd1;
+			/**
+			 * @fd2: For link or rename requests, a second file
+			 * descriptor for the target parent directory.  Must
+			 * be closed by the reader.  @destname contains the
+			 * destination filename.  This field is -1 if not
+			 * used.
+			 */
+			int fd2;
+			/**
+			 * @destname: A filename for a file creation target.
+			 *
+			 * If either of fd1 or fd2 points to a parent
+			 * directory rather than the target file, this is the
+			 * NUL-terminated name of the file that will be
+			 * newly created.
+			 *
+			 * Counting the NUL terminator, this field will
+			 * contain one or more NUL bytes at the end so
+			 * that the length of the whole struct
+			 * landlock_supervise_event is a multiple of 8 bytes.
+			 *
+			 * This is a variable-length member, and the length
+			 * including the terminating NUL(s) can be derived
+			 * from hdr.length - offsetof(struct
+			 * landlock_supervise_event, destname).
+			 */
+			char destname[];
+		};
+		struct {
+			__u16 port;
+		};
+	};
+};
+
+/* clang-format off */
+#define LANDLOCK_SUPERVISE_DECISION_DENY              0
+#define LANDLOCK_SUPERVISE_DECISION_ALLOW             1
+/* clang-format on */
+
+struct landlock_supervise_response {
+	/**
+	 * @length: Size of this structure.
+	 */
+	__u16 length;
+	/**
+	 * @decision: Whether to allow the request.
+	 */
+	__u8 decision;
+	/**
+	 * @_reserved: Reserved, must be zero.
+	 */
+	__u8 _reserved;
+	/**
+	 * @cookie: Cookie previously received in the request.
+	 */
+	__u32 cookie;
+};
+
 #endif /* _UAPI_LINUX_LANDLOCK_H */
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index adf7e77023b5..f1080e7de0c7 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -91,6 +91,9 @@ static void build_check_abi(void)
 	struct landlock_path_beneath_attr path_beneath_attr;
 	struct landlock_net_port_attr net_port_attr;
 	size_t ruleset_size, path_beneath_size, net_port_size;
+	struct landlock_supervise_event *event;
+	struct landlock_supervise_response response;
+	size_t supervise_evt_size, supervise_response_size;
 
 	/*
 	 * For each user space ABI structures, first checks that there is no
@@ -114,6 +117,31 @@ static void build_check_abi(void)
 	net_port_size += sizeof(net_port_attr.port);
 	BUILD_BUG_ON(sizeof(net_port_attr) != net_port_size);
 	BUILD_BUG_ON(sizeof(net_port_attr) != 16);
+
+	/* Check that anything before the destname does not have holes */
+	supervise_evt_size = sizeof(event->hdr.type);
+	supervise_evt_size += sizeof(event->hdr.length);
+	supervise_evt_size += sizeof(event->hdr.cookie);
+	BUILD_BUG_ON(offsetofend(typeof(*event), hdr) != 8);
+	supervise_evt_size += sizeof(event->access_request);
+	supervise_evt_size += sizeof(event->accessor);
+	supervise_evt_size += sizeof(event->fd1);
+	supervise_evt_size += sizeof(event->fd2);
+	BUILD_BUG_ON(offsetof(typeof(*event), destname) != supervise_evt_size);
+	BUILD_BUG_ON(offsetof(typeof(*event), destname) != 28);
+
+	/*
+	 * Make sure this struct does not end up with stricter
+	 * alignment than 8
+	 */
+	BUILD_BUG_ON(__alignof__(typeof(*event)) != 8);
+
+	supervise_response_size = sizeof(response.length);
+	supervise_response_size += sizeof(response.decision);
+	supervise_response_size += sizeof(response._reserved);
+	supervise_response_size += sizeof(response.cookie);
+	BUILD_BUG_ON(sizeof(response) != supervise_response_size);
+	BUILD_BUG_ON(sizeof(response) != 8);
 }
 
 /* Ruleset handling */
-- 
2.39.5



* [RFC PATCH 6/9] Creating supervisor events for filesystem operations
  2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
                   ` (4 preceding siblings ...)
  2025-03-04  1:13 ` [RFC PATCH 5/9] Define user structure for events and responses Tingmao Wang
@ 2025-03-04  1:13 ` Tingmao Wang
  2025-03-04 19:50   ` Mickaël Salaün
  2025-03-04  1:13 ` [RFC PATCH 7/9] Implement fdinfo for ruleset and supervisor fd Tingmao Wang
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-04  1:13 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack, Jan Kara
  Cc: Tingmao Wang, linux-security-module, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen

NOTE from future me: This implementation which waits for user response
while blocking inside the current security_path_* hooks is problematic due
to taking exclusive inode lock on the parent directory, and while I have a
proposal for a solution, outlined below, I haven't managed to include the
code for that in this version of the patch. Thus for this commit in
particular I'm probably more looking for suggestions on the approach
rather than code review.  Please see the TODO section at the end of this
message before reviewing this patch.

----

This patch implements a proof-of-concept for modifying the current
landlock LSM hooks to send supervisor events and wait for responses, when
a supervised layer is involved.

In this design, access requests which would end up being denied by other
non-supervised landlock layers (or which would fail the normal inode
permission check anyways - but this is currently TODO, I only thought of
this afterwards) are denied straight away to avoid pointless supervisor
notifications.
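
The rule in the paragraph above boils down to a bitmask check.  A minimal
stand-alone sketch, where sketch_should_ask() and the plain uint32_t masks
are illustrative stand-ins rather than the kernel's layer_mask_t handling:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Each bit is one layer.  A supervisor is consulted only when every layer
 * that denied the access is supervised; a single unsupervised denial
 * makes the result final without generating any event.
 */
static bool sketch_should_ask(uint32_t denied_layers,
			      uint32_t supervised_layers)
{
	if (!denied_layers)
		return false;	/* nothing was denied, nothing to ask */
	return (denied_layers & ~supervised_layers) == 0;
}
```

For example, with layers 0 and 2 denying and layers 0 to 2 supervised the
supervisors are asked, whereas with only layer 2 supervised the access is
denied straight away.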

Currently current_check_access_path only gets the path of the parent
directory for create/remove operations, which is not enough for what we
want to pass to the supervisor.  Therefore we extend it by passing in any
relevant child dentry (but see TODO below - this may not be possible with
the proper implementation).

This initial implementation doesn't handle links and renames, and for now
these operations behave as if no supervisor is present (and thus will be
denied, unless they are allowed by the layer rules).  Also note that we can
get spurious create requests if the program tries to open with O_CREAT a
file that exists but is not in the dcache (from my understanding).

Event IDs (referred to as an opaque cookie in the uapi) are currently
generated with a simple `next_event_id++`.  I considered using e.g. xarray
but decided not to for this PoC. Suggestions welcome. (Note that we have
to design our own event id even if we use an extension of fanotify, as
fanotify uses a file descriptor to identify events, which is not generic
enough for us.)
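
From the supervisor's point of view the cookie is just a value to echo
back.  A minimal sketch of building a response, where struct
sketch_response is a hypothetical local mirror of
landlock_supervise_response from patch 5 and sketch_make_response() is an
illustrative helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror the decision values defined in patch 5. */
#define SKETCH_DECISION_DENY	0
#define SKETCH_DECISION_ALLOW	1

/* Hypothetical local mirror of struct landlock_supervise_response. */
struct sketch_response {
	uint16_t length;	/* size of this structure */
	uint8_t decision;
	uint8_t reserved;	/* must be zero */
	uint32_t cookie;	/* copied verbatim from the event */
};

/* Build the reply for the event identified by cookie. */
static struct sketch_response sketch_make_response(uint32_t cookie,
						   bool allow)
{
	struct sketch_response resp = {
		.length = sizeof(resp),
		.decision = allow ? SKETCH_DECISION_ALLOW :
				    SKETCH_DECISION_DENY,
		.reserved = 0,
		.cookie = cookie,
	};

	return resp;
}
```

The result would then be written to the supervisor fd once the read/write
file operations exist; in this patch they are still dummies.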

----

TODO:

When testing this I realized that doing it this way means that for the
create/delete case, we end up holding an exclusive inode lock on the
parent directory while waiting for supervisor to respond (see namei.c -
security_path_mknod is called in may_o_create <- lookup_open which has an
exclusive lock if O_CREAT is passed), which will prevent all other tasks
from accessing that directory (regardless of whether or not they are under
landlock).

This is clearly unacceptable, but since landlock (and also this extension)
doesn't actually need a dentry for the child (which is allocated after the
inode lock), I think this is not unsolvable.  I'm experimenting with
creating a new LSM hook, something like security_pathname_mknod
(suggestions welcome), which will be called after we looked up the dentry
for the parent (to prevent racing symlinks TOCTOU), but before we take the
lock for it.  Such a hook can still take as argument the parent dentry,
plus name of the child (instead of a struct path for it).

Suggestions for alternative approaches are definitely welcome!

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 security/landlock/fs.c        | 134 ++++++++++++++++++++++++++++++++--
 security/landlock/supervise.c | 122 +++++++++++++++++++++++++++++++
 security/landlock/supervise.h | 106 ++++++++++++++++++++++++++-
 3 files changed, 354 insertions(+), 8 deletions(-)

diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 71b9dc331aae..5c147edb6fff 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -44,6 +44,7 @@
 #include "object.h"
 #include "ruleset.h"
 #include "setup.h"
+#include "supervise.h"
 
 /* Underlying object management */
 
@@ -924,10 +925,13 @@ static bool is_access_to_paths_allowed(
 }
 
 static int current_check_access_path(const struct path *const path,
+				     struct dentry *const child,
 				     access_mask_t access_request)
 {
 	const struct landlock_ruleset *const dom = get_current_fs_domain();
 	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
+	bool is_remove = !!(access_request & (LANDLOCK_ACCESS_FS_REMOVE_FILE |
+					      LANDLOCK_ACCESS_FS_REMOVE_DIR));
 
 	if (!dom)
 		return 0;
@@ -938,6 +942,29 @@ static int current_check_access_path(const struct path *const path,
 				       NULL, 0, NULL, NULL))
 		return 0;
 
+	if (landlock_has_supervisors(dom)) {
+		layer_mask_t pending_ask_supervise_layers =
+			landlock_layer_masks_to_denied_layers(
+				access_request, layer_masks,
+				sizeof(layer_masks), dom->num_layers);
+
+		WARN_ON_ONCE(!pending_ask_supervise_layers);
+
+		struct path child_path = *path;
+		if (child) {
+			child_path.dentry = child;
+		}
+
+		bool supervisor_allowed = landlock_ask_supervised_layers(
+			dom, pending_ask_supervise_layers,
+			LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS, access_request,
+			&child_path, NULL, child && !is_remove, false, 0);
+
+		if (supervisor_allowed) {
+			return 0;
+		}
+	}
+
 	return -EACCES;
 }
 
@@ -1092,6 +1119,8 @@ static bool collect_domain_accesses(
  * - 0 if access is allowed;
  * - -EXDEV if @old_dentry would inherit new access rights from @new_dir;
  * - -EACCES if file removal or creation is denied.
+ *
+ * TODO: implement interaction with supervisors.
  */
 static int current_check_refer_path(struct dentry *const old_dentry,
 				    const struct path *const new_dir,
@@ -1415,38 +1444,43 @@ static int hook_path_rename(const struct path *const old_dir,
 static int hook_path_mkdir(const struct path *const dir,
 			   struct dentry *const dentry, const umode_t mode)
 {
-	return current_check_access_path(dir, LANDLOCK_ACCESS_FS_MAKE_DIR);
+	return current_check_access_path(dir, dentry,
+					 LANDLOCK_ACCESS_FS_MAKE_DIR);
 }
 
 static int hook_path_mknod(const struct path *const dir,
 			   struct dentry *const dentry, const umode_t mode,
 			   const unsigned int dev)
 {
-	return current_check_access_path(dir, get_mode_access(mode));
+	return current_check_access_path(dir, dentry, get_mode_access(mode));
 }
 
 static int hook_path_symlink(const struct path *const dir,
 			     struct dentry *const dentry,
 			     const char *const old_name)
 {
-	return current_check_access_path(dir, LANDLOCK_ACCESS_FS_MAKE_SYM);
+	return current_check_access_path(dir, dentry,
+					 LANDLOCK_ACCESS_FS_MAKE_SYM);
 }
 
 static int hook_path_unlink(const struct path *const dir,
 			    struct dentry *const dentry)
 {
-	return current_check_access_path(dir, LANDLOCK_ACCESS_FS_REMOVE_FILE);
+	return current_check_access_path(dir, dentry,
+					 LANDLOCK_ACCESS_FS_REMOVE_FILE);
 }
 
 static int hook_path_rmdir(const struct path *const dir,
 			   struct dentry *const dentry)
 {
-	return current_check_access_path(dir, LANDLOCK_ACCESS_FS_REMOVE_DIR);
+	return current_check_access_path(dir, dentry,
+					 LANDLOCK_ACCESS_FS_REMOVE_DIR);
 }
 
 static int hook_path_truncate(const struct path *const path)
 {
-	return current_check_access_path(path, LANDLOCK_ACCESS_FS_TRUNCATE);
+	return current_check_access_path(path, NULL,
+					 LANDLOCK_ACCESS_FS_TRUNCATE);
 }
 
 /* File hooks */
@@ -1562,9 +1596,81 @@ static int hook_file_open(struct file *const file)
 	if ((open_access_request & allowed_access) == open_access_request)
 		return 0;
 
+	if (landlock_has_supervisors(dom)) {
+		layer_mask_t pending_ask_supervise_layers =
+			landlock_layer_masks_to_denied_layers(
+				open_access_request, layer_masks,
+				sizeof(layer_masks), dom->num_layers);
+
+		WARN_ON_ONCE(!pending_ask_supervise_layers);
+
+		/*
+		 * We don't need to ask the supervisor for optional
+		 * access right now - we can ask later.
+		 */
+
+		bool supervisor_allowed = landlock_ask_supervised_layers(
+			dom, pending_ask_supervise_layers,
+			LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS,
+			open_access_request, &file->f_path, NULL, false, false,
+			0);
+
+		if (supervisor_allowed) {
+			landlock_file(file)->allowed_access =
+				open_access_request;
+			return 0;
+		}
+	}
+
 	return -EACCES;
 }
 
+/*
+ * For any "optional" permissions (truncate and ioctl) which were
+ * not allowed at the time a file was opened, we want to check
+ * with any supervised layers whether they actually allow it when
+ * the user tries to do such an operation on the opened fd.  We
+ * can check for access on the path (using the opener's domain)
+ * as the opener can never regain permissions under landlock.
+ */
+static bool check_opened_file_access_supervisor(struct file *const file,
+						access_mask_t access_request)
+{
+	const struct landlock_ruleset *dom = landlock_get_applicable_domain(
+		landlock_cred(file->f_cred)->domain, any_fs);
+
+	if (landlock_has_supervisors(dom)) {
+		layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
+		bool allowed = is_access_to_paths_allowed(
+			dom, &file->f_path,
+			landlock_init_layer_masks(dom, access_request,
+						  &layer_masks,
+						  LANDLOCK_KEY_INODE),
+			&layer_masks, NULL, 0, NULL, NULL);
+		if (allowed) {
+			WARN_ONCE(
+				1,
+				"Access was previously not allowed, now it's allowed in the same domain. Landlock bug?");
+			return false;
+		}
+
+		layer_mask_t pending_ask_supervise_layers =
+			landlock_layer_masks_to_denied_layers(
+				access_request, layer_masks,
+				sizeof(layer_masks), dom->num_layers);
+		WARN_ON_ONCE(!pending_ask_supervise_layers);
+
+		bool supervisor_allowed = landlock_ask_supervised_layers(
+			dom, pending_ask_supervise_layers,
+			LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS, access_request,
+			&file->f_path, NULL, false, false, 0);
+
+		return supervisor_allowed;
+	}
+
+	return false;
+}
+
 static int hook_file_truncate(struct file *const file)
 {
 	/*
@@ -1579,6 +1685,12 @@ static int hook_file_truncate(struct file *const file)
 	 */
 	if (landlock_file(file)->allowed_access & LANDLOCK_ACCESS_FS_TRUNCATE)
 		return 0;
+
+	if (check_opened_file_access_supervisor(file,
+						LANDLOCK_ACCESS_FS_TRUNCATE)) {
+		return 0;
+	}
+
 	return -EACCES;
 }
 
@@ -1602,6 +1714,11 @@ static int hook_file_ioctl(struct file *file, unsigned int cmd,
 	if (is_masked_device_ioctl(cmd))
 		return 0;
 
+	if (check_opened_file_access_supervisor(file,
+						LANDLOCK_ACCESS_FS_IOCTL_DEV)) {
+		return 0;
+	}
+
 	return -EACCES;
 }
 
@@ -1625,6 +1742,11 @@ static int hook_file_ioctl_compat(struct file *file, unsigned int cmd,
 	if (is_masked_device_ioctl_compat(cmd))
 		return 0;
 
+	if (check_opened_file_access_supervisor(file,
+						LANDLOCK_ACCESS_FS_IOCTL_DEV)) {
+		return 0;
+	}
+
 	return -EACCES;
 }
 
diff --git a/security/landlock/supervise.c b/security/landlock/supervise.c
index a3bb6928f453..3f31a89c4c96 100644
--- a/security/landlock/supervise.c
+++ b/security/landlock/supervise.c
@@ -12,6 +12,12 @@
 
 #include "supervise.h"
 
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) "landlock-supervise: " fmt
+
 struct landlock_supervisor *landlock_create_supervisor(void)
 {
 	struct landlock_supervisor *supervisor;
@@ -70,3 +76,119 @@ void landlock_put_supervisor(struct landlock_supervisor *const supervisor)
 		kfree(supervisor);
 	}
 }
+
+/**
+ * landlock_ask_supervised_layers - check if all denied layers
+ * are supervised, and if yes, ask all of them for permission.
+ *
+ * Return whether access should be allowed.  If denied_layers
+ * contains any non-supervised layer, will return false without
+ * making any supervisor event.
+ *
+ * Caller owns any paths passed in, we might get refs.
+ */
+bool landlock_ask_supervised_layers(
+	const struct landlock_ruleset *const domain,
+	const layer_mask_t denied_layers,
+	const landlock_supervise_event_type_t request_type,
+	const access_mask_t access_request, const struct path *const path1,
+	const struct path *const path2, const bool path1_new,
+	const bool path2_new, const __u16 port)
+{
+	size_t layer_level;
+	unsigned long denied_layers_ = denied_layers;
+
+	if (WARN_ON_ONCE(!denied_layers)) {
+		return true;
+	}
+
+	for_each_set_bit(layer_level, &denied_layers_, domain->num_layers) {
+		if (!domain->layer_stack[layer_level].supervisor) {
+			return false;
+		}
+	}
+
+	/*
+	 * All denied layers are supervisor layers, so we just ask
+	 * them in turn. There's good argument for either order (top
+	 * -> bottom, or the other way), so we just do the easiest
+	 * thing here.
+	 */
+
+	for_each_set_bit(layer_level, &denied_layers_, domain->num_layers) {
+		struct landlock_supervisor *const supervisor =
+			domain->layer_stack[layer_level].supervisor;
+
+		/*
+		 * supervisor will stay valid here because we're blocking
+		 * this thread which references the layer, which in turn
+		 * references the supervisor.
+		 */
+
+		/* TODO: memchg supervisor owner then allocate with account */
+		struct landlock_supervise_event_kernel *event __free(
+			landlock_put_supervise_event) =
+			kzalloc(sizeof(*event), GFP_KERNEL_ACCOUNT);
+
+		int rc;
+
+		if (!event) {
+			pr_alert(
+				"failed to allocate memory for supervisor event\n");
+			return false;
+		}
+
+		refcount_set(&event->usage, 1);
+		event->state = LANDLOCK_SUPERVISE_EVENT_NEW;
+
+		event->type = request_type;
+		event->access_request = access_request;
+		event->accessor = get_pid(task_pid(current));
+		switch (request_type) {
+		case LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS:
+			if (path1) {
+				path_get(path1);
+				event->target_1 = *path1;
+				event->target_1_is_new = path1_new;
+			}
+			if (path2) {
+				path_get(path2);
+				event->target_2 = *path2;
+				event->target_2_is_new = path2_new;
+			}
+			break;
+		case LANDLOCK_SUPERVISE_EVENT_TYPE_NET_ACCESS:
+			event->port = port;
+			break;
+		}
+
+		if (WARN_ON(!supervisor)) {
+			/*
+			 * We checked all denied layers are supervised
+			 * earlier...
+			 */
+			return false;
+		}
+
+		spin_lock(&supervisor->lock);
+		event->event_id = supervisor->next_event_id++;
+		landlock_get_supervise_event(event);
+		list_add_tail(&event->node, &supervisor->event_queue);
+		spin_unlock(&supervisor->lock);
+		wake_up(&supervisor->poll_event_wq);
+
+		rc = wait_var_event_killable(
+			event, LANDLOCK_SUPERVISE_EVENT_HANDLED(event));
+		if (rc) {
+			/* Task died, doesn't matter what we say */
+			return false;
+		}
+		if (event->state != LANDLOCK_SUPERVISE_EVENT_ALLOWED) {
+			return false;
+		}
+
+		/* event has __free */
+	}
+
+	return true;
+}
diff --git a/security/landlock/supervise.h b/security/landlock/supervise.h
index febe26a11578..10fc274fabb7 100644
--- a/security/landlock/supervise.h
+++ b/security/landlock/supervise.h
@@ -12,6 +12,7 @@
 #include <linux/wait.h>
 #include <linux/path.h>
 #include <linux/pid.h>
+#include <uapi/linux/landlock.h>
 
 #include "access.h"
 #include "ruleset.h"
@@ -46,9 +47,56 @@ struct landlock_supervise_event_kernel {
 	refcount_t usage;
 	enum landlock_supervise_event_state state;
 
-	/* more fields to come */
+	/* Cookie as presented to user-space */
+	u32 event_id;
+
+	landlock_supervise_event_type_t type;
+	access_mask_t access_request;
+	struct pid *accessor;
+	union {
+		struct {
+			/**
+			 * @target_1: The first (and may be the only, for
+			 * most requests) target path. To expose as much
+			 * useful information to the supervisor as possible,
+			 * for file creation and deletion, this points to the
+			 * actual path being created (or deleted), rather
+			 * than the parent directory. Note that for the
+			 * create case, this means that the dentry will be
+			 * negative (unless we end up in some horrible race).
+			 * In the create case, target_1_is_new is set, so
+			 * that we know to pass the parent as the fd to the
+			 * user-space supervisor, and fill destname with the
+			 * name of the file.
+			 *
+			 * For refer (link and rename), this points to the
+			 * source (or simply the first argument in case of
+			 * exchange) being linked. It will necessarily have
+			 * to be an existing file (even though the dentry may
+			 * turn negative).
+			 */
+			struct path target_1;
+			/**
+			 * @target_2: The destination path for link and
+			 * rename (or simply the second argument in case of
+			 * exchange). target_2_is_new will be set unless this
+			 * is an exchange.
+			 */
+			struct path target_2;
+
+			u8 target_1_is_new : 1;
+			u8 target_2_is_new : 1;
+		};
+		struct {
+			__u16 port;
+		};
+	};
 };
 
+#define LANDLOCK_SUPERVISE_EVENT_HANDLED(event)                \
+	((event)->state == LANDLOCK_SUPERVISE_EVENT_ALLOWED || \
+	 (event)->state == LANDLOCK_SUPERVISE_EVENT_DENIED)
+
 struct landlock_supervisor *landlock_create_supervisor(void);
 void landlock_get_supervisor(struct landlock_supervisor *const supervisor);
 void landlock_put_supervisor(struct landlock_supervisor *const supervisor);
@@ -62,8 +110,62 @@ static inline void landlock_get_supervise_event(
 static inline void landlock_put_supervise_event(
 	struct landlock_supervise_event_kernel *const event)
 {
-	if (refcount_dec_and_test(&event->usage))
+	if (refcount_dec_and_test(&event->usage)) {
+		switch (event->type) {
+		case LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS:
+			if (event->target_1.dentry)
+				path_put(&event->target_1);
+			if (event->target_2.dentry)
+				path_put(&event->target_2);
+			break;
+		case LANDLOCK_SUPERVISE_EVENT_TYPE_NET_ACCESS:
+			break;
+		}
+		put_pid(event->accessor);
 		kfree(event);
+	}
+}
+
+DEFINE_FREE(landlock_put_supervise_event,
+	    struct landlock_supervise_event_kernel *,
+	    if (_T) landlock_put_supervise_event(_T))
+
+static inline bool
+landlock_has_supervisors(const struct landlock_ruleset *const domain)
+{
+	size_t layer_level;
+	for (layer_level = 0; layer_level < domain->num_layers; layer_level++) {
+		if (domain->layer_stack[layer_level].supervisor)
+			return true;
+	}
+	return false;
 }
 
+static inline layer_mask_t landlock_layer_masks_to_denied_layers(
+	const access_mask_t access_request, const layer_mask_t layer_masks[],
+	const size_t masks_array_size, const int num_layers)
+{
+	unsigned long access_req = access_request;
+	layer_mask_t denied_layers = 0;
+	size_t layer_level;
+	unsigned long access_bit;
+
+	for (layer_level = 0; layer_level < num_layers; layer_level++) {
+		for_each_set_bit(access_bit, &access_req, masks_array_size) {
+			if (layer_masks[access_bit] & BIT_ULL(layer_level))
+				denied_layers |= BIT_ULL(layer_level);
+		}
+	}
+
+	return denied_layers;
+}
+
+bool landlock_ask_supervised_layers(
+	const struct landlock_ruleset *const domain,
+	const layer_mask_t denied_layers,
+	const landlock_supervise_event_type_t request_type,
+	const access_mask_t access_request, const struct path *const path1,
+	const struct path *const path2, const bool path1_new,
+	const bool path2_new, const __u16 port);
+
 #endif /* _SECURITY_LANDLOCK_SUPERVISE_H */
-- 
2.39.5



* [RFC PATCH 7/9] Implement fdinfo for ruleset and supervisor fd
  2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
                   ` (5 preceding siblings ...)
  2025-03-04  1:13 ` [RFC PATCH 6/9] Creating supervisor events for filesystem operations Tingmao Wang
@ 2025-03-04  1:13 ` Tingmao Wang
  2025-03-04  1:13 ` [RFC PATCH 8/9] Implement fops for supervisor-fd Tingmao Wang
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 47+ messages in thread
From: Tingmao Wang @ 2025-03-04  1:13 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack, Jan Kara
  Cc: Tingmao Wang, linux-security-module, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen

This is purely for ease of debugging.  The ruleset fd's fdinfo shows
whether the ruleset is in supervisor mode, and the supervisor fd's fdinfo
lists any queued events.
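
As a rough illustration (not part of the patch), a debugging tool could
read /proc/self/fdinfo/<fd> and look for the "supervisor: yes" or
"supervisor: no" line that fop_ruleset_fdinfo prints.  The helper names
below are hypothetical, and the sketch assumes the output format stays
exactly as in this RFC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan fdinfo text line by line.
 * Returns 1 for "supervisor: yes", 0 for "supervisor: no",
 * and -1 if no such line is found.
 */
static int fdinfo_supervisor_state(const char *text)
{
	const char *const key = "supervisor: ";
	const size_t key_len = strlen(key);
	const char *line = text;

	assert(text != NULL);
	while (line && *line) {
		if (strncmp(line, key, key_len) == 0) {
			const char *val = line + key_len;

			if (strncmp(val, "yes\n", 4) == 0)
				return 1;
			if (strncmp(val, "no\n", 3) == 0)
				return 0;
			return -1;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Hypothetical helper: read the fdinfo of one of our own fds and parse it. */
static int ruleset_fd_is_supervised(int fd)
{
	char path[64], buf[4096];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return fdinfo_supervisor_state(buf);
}
```

Calling ruleset_fd_is_supervised(ruleset_fd) right after
landlock_create_ruleset() would be one way to confirm that the supervise
flag took effect.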

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 include/uapi/linux/landlock.h |   2 +
 security/landlock/syscalls.c  | 146 ++++++++++++++++++++++++++++++++++
 2 files changed, 148 insertions(+)

diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index b5645fdd998d..2b2a21c1b6cf 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -270,6 +270,7 @@ struct landlock_net_port_attr {
 #define LANDLOCK_ACCESS_FS_TRUNCATE			(1ULL << 14)
 #define LANDLOCK_ACCESS_FS_IOCTL_DEV			(1ULL << 15)
 /* clang-format on */
+/* Add extra entries to access_request_to_string too */
 
 /**
  * DOC: net_access
@@ -292,6 +293,7 @@ struct landlock_net_port_attr {
 #define LANDLOCK_ACCESS_NET_BIND_TCP			(1ULL << 0)
 #define LANDLOCK_ACCESS_NET_CONNECT_TCP			(1ULL << 1)
 /* clang-format on */
+/* Add extra entries to access_request_to_string too */
 
 /**
  * DOC: scope
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index f1080e7de0c7..3018e3663173 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -170,6 +170,17 @@ static ssize_t fop_dummy_write(struct file *const filp,
 	return -EINVAL;
 }
 
+static void fop_ruleset_fdinfo(struct seq_file *const m, struct file *const f)
+{
+	struct landlock_ruleset *const ruleset = f->private_data;
+
+	seq_printf(m, "num_rules: %d\n", ruleset->num_rules);
+	if (ruleset->layer_stack[0].supervisor)
+		seq_puts(m, "supervisor: yes\n");
+	else
+		seq_puts(m, "supervisor: no\n");
+}
+
 /*
  * A ruleset file descriptor enables to build a ruleset by adding (i.e.
  * writing) rule after rule, without relying on the task's context.  This
@@ -180,6 +191,7 @@ static const struct file_operations ruleset_fops = {
 	.release = fop_ruleset_release,
 	.read = fop_dummy_read,
 	.write = fop_dummy_write,
+	.show_fdinfo = fop_ruleset_fdinfo,
 };
 
 static int fop_supervisor_release(struct inode *const inode,
@@ -191,11 +203,145 @@ static int fop_supervisor_release(struct inode *const inode,
 	return 0;
 }
 
+static const char *
+event_state_to_string(enum landlock_supervise_event_state state)
+{
+	switch (state) {
+	case LANDLOCK_SUPERVISE_EVENT_NEW:
+		return "new";
+	case LANDLOCK_SUPERVISE_EVENT_NOTIFIED:
+		return "notified";
+	case LANDLOCK_SUPERVISE_EVENT_ALLOWED:
+		return "allowed";
+	case LANDLOCK_SUPERVISE_EVENT_DENIED:
+		return "denied";
+	default:
+		WARN_ONCE(1, "unknown event state\n");
+		return "unknown";
+	}
+}
+
+static void
+access_request_to_string(const landlock_supervise_event_type_t access_type,
+			 const access_mask_t access_request, struct seq_file *m)
+{
+	switch (access_type) {
+	case LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS:
+		if (access_request & LANDLOCK_ACCESS_FS_EXECUTE)
+			seq_puts(m, "FS_EXECUTE ");
+		if (access_request & LANDLOCK_ACCESS_FS_WRITE_FILE)
+			seq_puts(m, "FS_WRITE_FILE ");
+		if (access_request & LANDLOCK_ACCESS_FS_READ_FILE)
+			seq_puts(m, "FS_READ_FILE ");
+		if (access_request & LANDLOCK_ACCESS_FS_READ_DIR)
+			seq_puts(m, "FS_READ_DIR ");
+		if (access_request & LANDLOCK_ACCESS_FS_REMOVE_DIR)
+			seq_puts(m, "FS_REMOVE_DIR ");
+		if (access_request & LANDLOCK_ACCESS_FS_REMOVE_FILE)
+			seq_puts(m, "FS_REMOVE_FILE ");
+		if (access_request & LANDLOCK_ACCESS_FS_MAKE_CHAR)
+			seq_puts(m, "FS_MAKE_CHAR ");
+		if (access_request & LANDLOCK_ACCESS_FS_MAKE_DIR)
+			seq_puts(m, "FS_MAKE_DIR ");
+		if (access_request & LANDLOCK_ACCESS_FS_MAKE_REG)
+			seq_puts(m, "FS_MAKE_REG ");
+		if (access_request & LANDLOCK_ACCESS_FS_MAKE_SOCK)
+			seq_puts(m, "FS_MAKE_SOCK ");
+		if (access_request & LANDLOCK_ACCESS_FS_MAKE_FIFO)
+			seq_puts(m, "FS_MAKE_FIFO ");
+		if (access_request & LANDLOCK_ACCESS_FS_MAKE_BLOCK)
+			seq_puts(m, "FS_MAKE_BLOCK ");
+		if (access_request & LANDLOCK_ACCESS_FS_MAKE_SYM)
+			seq_puts(m, "FS_MAKE_SYM ");
+		if (access_request & LANDLOCK_ACCESS_FS_REFER)
+			seq_puts(m, "FS_REFER ");
+		if (access_request & LANDLOCK_ACCESS_FS_TRUNCATE)
+			seq_puts(m, "FS_TRUNCATE ");
+		if (access_request & LANDLOCK_ACCESS_FS_IOCTL_DEV)
+			seq_puts(m, "FS_IOCTL_DEV ");
+		break;
+	case LANDLOCK_SUPERVISE_EVENT_TYPE_NET_ACCESS:
+		if (access_request & LANDLOCK_ACCESS_NET_BIND_TCP)
+			seq_puts(m, "NET_BIND_TCP ");
+		if (access_request & LANDLOCK_ACCESS_NET_CONNECT_TCP)
+			seq_puts(m, "NET_CONNECT_TCP ");
+		break;
+	}
+}
+
+static void fop_supervisor_fdinfo(struct seq_file *m, struct file *f)
+{
+	struct landlock_supervisor *const supervisor = f->private_data;
+	struct landlock_supervise_event_kernel *event;
+
+	spin_lock(&supervisor->lock);
+
+	size_t cnt = list_count_nodes(&supervisor->event_queue);
+	seq_printf(m, "num_events: %zu\n", cnt);
+	list_for_each_entry(event, &supervisor->event_queue, node) {
+		struct task_struct *task =
+			get_pid_task(event->accessor, PIDTYPE_PID);
+
+		seq_puts(m, "event:\n");
+		if (task) {
+			seq_printf(m, "\taccessor: %s[%d]\n", task->comm,
+				   task->pid);
+			put_task_struct(task);
+		} else {
+			seq_puts(m, "\taccessor: defunct\n");
+		}
+
+		if (event->type == LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS) {
+			seq_puts(m, "\taccess: filesystem\n");
+			seq_printf(m, "\taccess_request: %llu ",
+				   (unsigned long long)event->access_request);
+			access_request_to_string(event->type,
+						 event->access_request, m);
+			seq_puts(m, "\n");
+			if (event->target_1.dentry) {
+				/*
+				 * ok to access since event owns a ref to the
+				 * path, and we have event list spin lock.
+				 */
+				if (event->target_1_is_new) {
+					seq_puts(m, "\ttarget_1 (new): ");
+				} else {
+					seq_puts(m, "\ttarget_1: ");
+				}
+				seq_path(m, &event->target_1, "");
+				seq_puts(m, "\n");
+			}
+			if (event->target_2.dentry) {
+				if (event->target_2_is_new) {
+					seq_puts(m, "\ttarget_2 (new): ");
+				} else {
+					seq_puts(m, "\ttarget_2: ");
+				}
+				seq_path(m, &event->target_2, "");
+				seq_puts(m, "\n");
+			}
+		} else if (event->type ==
+			   LANDLOCK_SUPERVISE_EVENT_TYPE_NET_ACCESS) {
+			seq_puts(m, "\taccess: network\n");
+			seq_printf(m, "\tport: %u\n",
+				   (unsigned int)event->port);
+		} else {
+			WARN(1, "unknown event key type\n");
+		}
+
+		seq_printf(m, "\tstate: %s\n",
+			   event_state_to_string(event->state));
+	}
+
+	spin_unlock(&supervisor->lock);
+}
+
 static const struct file_operations supervisor_fops = {
 	.release = fop_supervisor_release,
 	/* TODO: read, write, poll, dup */
 	.read = fop_dummy_read,
 	.write = fop_dummy_write,
+	.show_fdinfo = fop_supervisor_fdinfo,
 };
 
 static int
-- 
2.39.5



* [RFC PATCH 8/9] Implement fops for supervisor-fd
  2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
                   ` (6 preceding siblings ...)
  2025-03-04  1:13 ` [RFC PATCH 7/9] Implement fdinfo for ruleset and supervisor fd Tingmao Wang
@ 2025-03-04  1:13 ` Tingmao Wang
  2025-03-04  1:13 ` [RFC PATCH 9/9] Enhance the sandboxer example to support landlock-supervise Tingmao Wang
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 47+ messages in thread
From: Tingmao Wang @ 2025-03-04  1:13 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack, Jan Kara
  Cc: Tingmao Wang, linux-security-module, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen

This patch exposes the events to user-space via reads from the supervisor
fd, and receives responses back via writes to the same fd.

For now, we will set aside the problem of how to handle situations where
the supervisor doesn't actually have the permission to open a fd for the
path (and just deny the event on any error), but note that landlock
does not restrict opening of O_PATH fds, and so at least a supervisor
supervising itself is not completely out of the question (but the
usefulness of this is perhaps questionable).

NOTE: despite this patch having a new uapi, I'm still very open to e.g.
re-using fanotify stuff instead (if that makes sense in the end). This is
just a PoC.
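
To make the framing concrete, here is a small user-space sketch (not part
of the patch).  It mirrors two things the code below does: the size of an
event record is the fixed part plus the NUL-terminated destname, rounded
up to the struct's alignment, and a single write may carry several
responses back to back, each with length set to the response size.  The
struct layout, the decision values and all demo_* names are assumptions
made for illustration; the real definitions are in the uapi header of
this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout, for illustration only; see the uapi header for the real one. */
struct demo_response {
	uint32_t length;   /* must equal sizeof(struct demo_response) */
	uint32_t cookie;   /* copied from the event's hdr.cookie */
	uint32_t decision; /* DEMO_ALLOW or DEMO_DENY */
};

/* Placeholder values, not the real LANDLOCK_SUPERVISE_DECISION_* constants. */
enum { DEMO_ALLOW = 1, DEMO_DENY = 2 };

/* Round n up to a multiple of align (a power of two), like the kernel's ALIGN(). */
static size_t demo_align_up(size_t n, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (n + align - 1) & ~(align - 1);
}

/*
 * Size of one event record as read() would report it: the fixed part,
 * plus the name and its trailing NUL when a name is present, padded.
 */
static size_t demo_event_size(size_t fixed_len, size_t name_len, int has_name,
			      size_t align)
{
	const size_t destname_size = has_name ? name_len + 1 : 0;

	return demo_align_up(fixed_len + destname_size, align);
}

/* Pack n responses back to back; returns the byte count to pass to write(). */
static size_t demo_pack_responses(struct demo_response *out,
				  const uint32_t *cookies, const int *allow,
				  size_t n)
{
	for (size_t i = 0; i < n; i++) {
		memset(&out[i], 0, sizeof(out[i]));
		out[i].length = sizeof(out[i]);
		out[i].cookie = cookies[i];
		out[i].decision = allow[i] ? DEMO_ALLOW : DEMO_DENY;
	}
	return n * sizeof(*out);
}
```

A supervisor would size its read buffer with something like
demo_event_size(fixed_len, NAME_MAX, 1, align), since a read with a
buffer smaller than the event fails with -EINVAL in this patch.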

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 security/landlock/syscalls.c | 349 ++++++++++++++++++++++++++++++++++-
 1 file changed, 346 insertions(+), 3 deletions(-)

diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 3018e3663173..7d191c946ecc 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -203,6 +203,348 @@ static int fop_supervisor_release(struct inode *const inode,
 	return 0;
 }
 
+/**
+ * Lifetime of return value is tied to p.
+ */
+static struct path p_parent(struct path p)
+{
+	struct path parent_path = { .mnt = p.mnt,
+				    .dentry = p.dentry->d_parent };
+	return parent_path;
+}
+
+/**
+ * Open an O_PATH fd of a target file for passing to the
+ * supervisor.
+ */
+static int supervise_fs_fd_open_install(struct path *path)
+{
+	int fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		pr_warn("get_unused_fd_flags: %pe\n", ERR_PTR(fd));
+		return fd;
+	}
+	struct file *f = dentry_open(path, O_PATH | O_CLOEXEC, current_cred());
+	if (IS_ERR(f)) {
+		pr_warn("Failed to open fd in supervisor: %ld\n", PTR_ERR(f));
+		put_unused_fd(fd);
+		return PTR_ERR(f);
+	}
+	fd_install(fd, f);
+	return fd;
+}
+
+static ssize_t fop_supervisor_read(struct file *const filp,
+				   char __user *const buf, const size_t size,
+				   loff_t *const ppos)
+{
+	struct landlock_supervisor *supervisor = filp->private_data;
+	struct landlock_supervise_event_kernel *event = NULL;
+	bool found = false;
+	struct landlock_supervise_event *user_event = NULL;
+	size_t destname_size = 0, event_size = 0;
+	const size_t dest_offset =
+		offsetof(struct landlock_supervise_event, destname);
+	const char *destname = NULL; /* Lifetime tied to event */
+	int fd1 = -1, fd2 = -1, ret = 0;
+	bool nonblock = filp->f_flags & O_NONBLOCK;
+	struct path parent_path;
+
+	if (WARN_ON(!supervisor))
+		return -ENODEV;
+
+	if (size < sizeof(struct landlock_supervise_event))
+		return -EINVAL;
+
+retry:
+	spin_lock(&supervisor->lock);
+
+	/*
+	 * Find the first new event (but really, all events in this
+	 * list should be new)
+	 */
+	list_for_each_entry(event, &supervisor->event_queue, node) {
+		if (event->state == LANDLOCK_SUPERVISE_EVENT_NEW) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		spin_unlock(&supervisor->lock);
+		if (nonblock) {
+			return -EAGAIN;
+		}
+
+		/*
+		 * Wait for events to be added to the queue.
+		 * Not sure if we can call list_empty() without the lock
+		 * here, hence true.
+		 */
+		ret = wait_event_interruptible(supervisor->poll_event_wq, true);
+		if (ret)
+			return ret;
+
+		goto retry;
+	}
+
+	/*
+	 * We take the event out of the list and let other readers
+	 * carry on.  We take over the event's ownership from the
+	 * list (hence no get/put).
+	 */
+	list_del(&event->node);
+	spin_unlock(&supervisor->lock);
+
+	if (event->type == LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS) {
+		struct dentry *dest_dentry;
+
+		if (WARN_ON(event->target_1_is_new && event->target_2_is_new)) {
+			ret = -EAGAIN;
+			goto fail_deny;
+		}
+
+		/*
+		 * Get destname out here so that we know the event's size.
+		 * We separate the lifetime of destname away from the
+		 * kernel event so we can move the copy outside of lock.
+		 */
+		if (event->target_1.dentry && event->target_1_is_new) {
+			dest_dentry = event->target_1.dentry;
+			destname = (char *)dest_dentry->d_name.name;
+			destname_size = dest_dentry->d_name.len + 1;
+		} else if (event->target_2.dentry && event->target_2_is_new) {
+			dest_dentry = event->target_2.dentry;
+			destname = (char *)dest_dentry->d_name.name;
+			destname_size = dest_dentry->d_name.len + 1;
+		}
+	}
+
+	event_size = ALIGN(dest_offset + destname_size,
+			   __alignof__(typeof(*user_event)));
+
+	if (event_size > size) {
+		ret = -EINVAL;
+		goto fail_readd_event;
+	}
+
+	/* We will copy the destname directly to user buffer */
+	user_event =
+		kzalloc(sizeof(struct landlock_supervise_event), GFP_KERNEL);
+	if (!user_event)
+		return -ENOMEM;
+
+	user_event->hdr.type = event->type;
+	user_event->hdr.length = event_size;
+	user_event->hdr.cookie = event->event_id;
+	user_event->access_request = event->access_request;
+	user_event->accessor = pid_vnr(event->accessor);
+
+	/* Set up the appropriate file descriptors based on the type */
+	if (event->type == LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS) {
+		if (event->target_1.dentry) {
+			if (event->target_1_is_new) {
+				parent_path = p_parent(event->target_1);
+				fd1 = supervise_fs_fd_open_install(
+					&parent_path);
+				if (fd1 < 0) {
+					ret = fd1;
+					goto fail_deny_or_readd;
+				}
+			} else {
+				fd1 = supervise_fs_fd_open_install(
+					&event->target_1);
+				if (fd1 < 0) {
+					ret = fd1;
+					goto fail_deny_or_readd;
+				}
+			}
+		}
+
+		if (event->target_2.dentry) {
+			if (event->target_2_is_new) {
+				parent_path = p_parent(event->target_2);
+				fd2 = supervise_fs_fd_open_install(
+					&parent_path);
+				if (fd2 < 0) {
+					ret = fd2;
+					goto fail_deny_or_readd;
+				}
+			} else {
+				fd2 = supervise_fs_fd_open_install(
+					&event->target_2);
+				if (fd2 < 0) {
+					ret = fd2;
+					goto fail_deny_or_readd;
+				}
+			}
+		}
+	} else if (event->type == LANDLOCK_SUPERVISE_EVENT_TYPE_NET_ACCESS) {
+		user_event->port = event->port;
+	}
+
+	user_event->fd1 = fd1;
+	user_event->fd2 = fd2;
+
+	/* Non-variable-sized part */
+	if (copy_to_user(buf, user_event, dest_offset)) {
+		ret = -EFAULT;
+		goto fail_readd_event;
+	}
+
+	/* destname */
+	if (destname && destname_size > 0) {
+		if (copy_to_user(buf + dest_offset, destname, destname_size)) {
+			ret = -EFAULT;
+			goto fail_readd_event;
+		}
+	}
+
+	/* Zero out any padding bytes */
+	if (event_size > dest_offset + destname_size) {
+		size_t padding_len = event_size - dest_offset - destname_size;
+		if (clear_user(buf + dest_offset + destname_size,
+			       padding_len)) {
+			ret = -EFAULT;
+			goto fail_readd_event;
+		}
+	}
+
+	ret = event_size;
+	event->state = LANDLOCK_SUPERVISE_EVENT_NOTIFIED;
+	/* No decision yet, don't wake up! */
+	spin_lock(&supervisor->lock);
+	list_add(&event->node, &supervisor->notified_events);
+	event = NULL;
+	spin_unlock(&supervisor->lock);
+	goto free;
+
+fail_deny_or_readd:
+	if (ret == -EINTR)
+		goto fail_readd_event;
+	else
+		goto fail_deny;
+
+fail_readd_event:
+	WARN_ON(event->state != LANDLOCK_SUPERVISE_EVENT_NEW);
+	spin_lock(&supervisor->lock);
+	list_add(&event->node, &supervisor->event_queue);
+	event = NULL;
+	spin_unlock(&supervisor->lock);
+	goto free;
+
+fail_deny:
+	event->state = LANDLOCK_SUPERVISE_EVENT_DENIED;
+	wake_up_var(event);
+	landlock_put_supervise_event(event);
+	event = NULL;
+	goto free;
+
+free:
+	WARN_ON(event);
+	if (fd1 >= 0)
+		put_unused_fd(fd1);
+	if (fd2 >= 0)
+		put_unused_fd(fd2);
+	kfree(user_event);
+	return ret;
+}
+
+static __poll_t fop_supervisor_poll(struct file *file, poll_table *wait)
+{
+	struct landlock_supervisor *supervisor = file->private_data;
+	__poll_t mask = 0;
+
+	poll_wait(file, &supervisor->poll_event_wq, wait);
+
+	spin_lock(&supervisor->lock);
+	if (!list_empty(&supervisor->event_queue))
+		mask |= POLLIN | POLLRDNORM;
+	spin_unlock(&supervisor->lock);
+
+	return mask;
+}
+
+static ssize_t fop_supervisor_write(struct file *const filp,
+				    const char __user *const buf,
+				    const size_t size, loff_t *const ppos)
+{
+	struct landlock_supervisor *supervisor = filp->private_data;
+	struct landlock_supervise_response response;
+	struct landlock_supervise_event_kernel *event;
+	size_t bytes_processed = 0;
+	bool found;
+
+	/* We need at least one complete response */
+	if (size < sizeof(response))
+		return -EINVAL;
+
+	while (bytes_processed + sizeof(response) <= size) {
+		if (copy_from_user(&response, buf + bytes_processed,
+				   sizeof(response)))
+			return -EFAULT;
+
+		if (response.length != sizeof(response))
+			return -EINVAL;
+
+		spin_lock(&supervisor->lock);
+
+		/* Find the event with matching cookie */
+		found = false;
+		list_for_each_entry(event, &supervisor->notified_events, node) {
+			if (event->event_id == response.cookie) {
+				found = true;
+				break;
+			}
+		}
+
+		if (!found) {
+			spin_unlock(&supervisor->lock);
+			pr_warn("Unknown supervise event cookie: %u\n",
+				response.cookie);
+			event = NULL;
+			goto ret;
+		}
+
+		list_del(&event->node);
+		spin_unlock(&supervisor->lock);
+
+		if (WARN_ON(LANDLOCK_SUPERVISE_EVENT_HANDLED(event))) {
+			bytes_processed += sizeof(response);
+			landlock_put_supervise_event(event);
+			event = NULL;
+			continue;
+		}
+
+		if (response.decision == LANDLOCK_SUPERVISE_DECISION_ALLOW)
+			event->state = LANDLOCK_SUPERVISE_EVENT_ALLOWED;
+		else if (response.decision == LANDLOCK_SUPERVISE_DECISION_DENY)
+			event->state = LANDLOCK_SUPERVISE_EVENT_DENIED;
+		else {
+			pr_warn("Invalid supervise event decision: %u\n",
+				response.decision);
+			goto fail_re_add;
+		}
+
+		wake_up_var(event);
+		landlock_put_supervise_event(event);
+		event = NULL;
+
+		bytes_processed += sizeof(response);
+	}
+	goto ret;
+
+fail_re_add:
+	spin_lock(&supervisor->lock);
+	list_add(&event->node, &supervisor->notified_events);
+	event = NULL;
+	spin_unlock(&supervisor->lock);
+
+ret:
+	WARN_ON(event);
+	return bytes_processed > 0 ? bytes_processed : -EINVAL;
+}
+
 static const char *
 event_state_to_string(enum landlock_supervise_event_state state)
 {
@@ -338,9 +680,10 @@ static void fop_supervisor_fdinfo(struct seq_file *m, struct file *f)
 
 static const struct file_operations supervisor_fops = {
 	.release = fop_supervisor_release,
-	/* TODO: read, write, poll, dup */
-	.read = fop_dummy_read,
-	.write = fop_dummy_write,
+	.read = fop_supervisor_read,
+	.write = fop_supervisor_write,
+	.poll = fop_supervisor_poll,
+	.llseek = noop_llseek,
 	.show_fdinfo = fop_supervisor_fdinfo,
 };
 
-- 
2.39.5



* [RFC PATCH 9/9] Enhance the sandboxer example to support landlock-supervise
  2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
                   ` (7 preceding siblings ...)
  2025-03-04  1:13 ` [RFC PATCH 8/9] Implement fops for supervisor-fd Tingmao Wang
@ 2025-03-04  1:13 ` Tingmao Wang
  2025-03-04 19:48 ` [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Mickaël Salaün
  2025-03-06 21:04 ` Jan Kara
  10 siblings, 0 replies; 47+ messages in thread
From: Tingmao Wang @ 2025-03-04  1:13 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack, Jan Kara
  Cc: Tingmao Wang, linux-security-module, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen

This is perhaps a bit overengineered with the ppoll on 5 different fds,
but it makes sure the sandboxed child can't print anything to the
terminal from a different thread while an access request is pending
(otherwise it could trick the user by printing over the request text).
This also makes sure input is directed to the right place (the child
when no prompt is shown, or the sandboxer itself when an access request
is shown).

But even with that, I'm not claiming this "sandbox" with supervise mode is
in any way production quality.  It's intended as a PoC.
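
The gating idea can be sketched in isolation as follows.  This is a
reduced illustration with four descriptors rather than the sandboxer's
actual loop, and every demo_*/SLOT_* name is hypothetical.  It relies on
the documented poll() behaviour that entries with a negative fd are
ignored and get revents == 0, so while a prompt is pending the child's
output simply stays buffered in its pipes.

```c
#include <poll.h>
#include <stdbool.h>

/* Hypothetical slot layout for the poll set. */
enum demo_slot {
	SLOT_STDIN,
	SLOT_SUPERVISOR,
	SLOT_CHILD_OUT,
	SLOT_CHILD_ERR,
	SLOT_COUNT,
};

/*
 * Fill the poll set.  While a prompt is pending, the child's stdout and
 * stderr pipes and the supervisor fd are disabled by storing a negative
 * fd, which poll() skips.  stdin stays armed; the caller routes its data
 * to the prompt or to the child depending on prompt_pending.
 */
static void demo_arm_pollfds(struct pollfd fds[SLOT_COUNT], int stdin_fd,
			     int supervisor_fd, int child_out, int child_err,
			     bool prompt_pending)
{
	fds[SLOT_STDIN] = (struct pollfd){
		.fd = stdin_fd,
		.events = POLLIN,
	};
	fds[SLOT_SUPERVISOR] = (struct pollfd){
		.fd = prompt_pending ? -1 : supervisor_fd,
		.events = POLLIN,
	};
	fds[SLOT_CHILD_OUT] = (struct pollfd){
		.fd = prompt_pending ? -1 : child_out,
		.events = POLLIN,
	};
	fds[SLOT_CHILD_ERR] = (struct pollfd){
		.fd = prompt_pending ? -1 : child_err,
		.events = POLLIN,
	};
}
```

Re-arming the set on every loop iteration keeps the state in one place:
once the user answers the prompt, the next call with prompt_pending set
to false lets the buffered child output through again.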

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 samples/landlock/sandboxer.c | 759 ++++++++++++++++++++++++++++++++++-
 1 file changed, 739 insertions(+), 20 deletions(-)

diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
index 07fab2ef534e..4a6a0d74c614 100644
--- a/samples/landlock/sandboxer.c
+++ b/samples/landlock/sandboxer.c
@@ -24,10 +24,16 @@
 #include <sys/syscall.h>
 #include <unistd.h>
 #include <stdbool.h>
+#include <poll.h>
+#include <pthread.h>
+#include <sys/wait.h>
+#include <termios.h>
+#include <linux/limits.h>
+#include <stdint.h>
 
 #ifndef landlock_create_ruleset
 static inline int
-landlock_create_ruleset(const struct landlock_ruleset_attr *const attr,
+landlock_create_ruleset(struct landlock_ruleset_attr *const attr,
 			const size_t size, const __u32 flags)
 {
 	return syscall(__NR_landlock_create_ruleset, attr, size, flags);
@@ -58,6 +64,7 @@ static inline int landlock_restrict_self(const int ruleset_fd,
 #define ENV_TCP_BIND_NAME "LL_TCP_BIND"
 #define ENV_TCP_CONNECT_NAME "LL_TCP_CONNECT"
 #define ENV_SCOPED_NAME "LL_SCOPED"
+#define ENV_SUPERVISE "LL_SUPERVISE"
 #define ENV_DELIMITER ":"
 
 static int str2num(const char *numstr, __u64 *num_dst)
@@ -278,24 +285,30 @@ static bool check_ruleset_scope(const char *const env_var,
 	LANDLOCK_ACCESS_FS_READ_FILE | \
 	LANDLOCK_ACCESS_FS_READ_DIR)
 
-#define ACCESS_FS_ROUGHLY_WRITE ( \
-	LANDLOCK_ACCESS_FS_WRITE_FILE | \
-	LANDLOCK_ACCESS_FS_REMOVE_DIR | \
-	LANDLOCK_ACCESS_FS_REMOVE_FILE | \
+#define ACCESS_FS_ROUGHLY_CREATE ( \
 	LANDLOCK_ACCESS_FS_MAKE_CHAR | \
 	LANDLOCK_ACCESS_FS_MAKE_DIR | \
 	LANDLOCK_ACCESS_FS_MAKE_REG | \
 	LANDLOCK_ACCESS_FS_MAKE_SOCK | \
 	LANDLOCK_ACCESS_FS_MAKE_FIFO | \
 	LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
-	LANDLOCK_ACCESS_FS_MAKE_SYM | \
+	LANDLOCK_ACCESS_FS_MAKE_SYM)
+
+#define ACCESS_FS_ROUGHLY_REMOVE ( \
+	LANDLOCK_ACCESS_FS_REMOVE_DIR | \
+	LANDLOCK_ACCESS_FS_REMOVE_FILE)
+
+#define ACCESS_FS_ROUGHLY_WRITE ( \
+	LANDLOCK_ACCESS_FS_WRITE_FILE | \
+	ACCESS_FS_ROUGHLY_CREATE | \
+	ACCESS_FS_ROUGHLY_REMOVE | \
 	LANDLOCK_ACCESS_FS_REFER | \
 	LANDLOCK_ACCESS_FS_TRUNCATE | \
 	LANDLOCK_ACCESS_FS_IOCTL_DEV)
 
 /* clang-format on */
 
-#define LANDLOCK_ABI_LAST 6
+#define LANDLOCK_ABI_LAST 7
 
 #define XSTR(s) #s
 #define STR(s) XSTR(s)
@@ -321,6 +334,7 @@ static const char help[] =
 	"* " ENV_SCOPED_NAME ": actions denied on the outside of the landlock domain\n"
 	"  - \"a\" to restrict opening abstract unix sockets\n"
 	"  - \"s\" to restrict sending signals\n"
+	"* " ENV_SUPERVISE ": set to 1 to enable supervisor mode\n"
 	"\n"
 	"Example:\n"
 	ENV_FS_RO_NAME "=\"${PATH}:/lib:/usr:/proc:/etc:/dev/urandom\" "
@@ -335,14 +349,22 @@ static const char help[] =
 
 /* clang-format on */
 
+int verbose_exec(const char *cmd_path, char *const *cmd_argv,
+		 char *const *envp);
+int interactive_sandboxer(int supervisor_fd, int child_stdin, int child_stdout,
+			  int child_stderr, pid_t child_pid);
+
 int main(const int argc, char *const argv[], char *const *const envp)
 {
 	const char *cmd_path;
 	char *const *cmd_argv;
-	int ruleset_fd, abi;
+	int ruleset_fd = -1, supervisor_fd = -1, abi;
 	char *env_port_name;
 	__u64 access_fs_ro = ACCESS_FS_ROUGHLY_READ,
 	      access_fs_rw = ACCESS_FS_ROUGHLY_READ | ACCESS_FS_ROUGHLY_WRITE;
+	bool supervise = false;
+	__u32 flags;
+	char *env_supervise;
 
 	struct landlock_ruleset_attr ruleset_attr = {
 		.handled_access_fs = access_fs_rw,
@@ -350,6 +372,8 @@ int main(const int argc, char *const argv[], char *const *const envp)
 				      LANDLOCK_ACCESS_NET_CONNECT_TCP,
 		.scoped = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
 			  LANDLOCK_SCOPE_SIGNAL,
+		.supervisor_fd = 0,
+		.pad = 0,
 	};
 
 	if (argc < 2) {
@@ -357,6 +381,11 @@ int main(const int argc, char *const argv[], char *const *const envp)
 		return 1;
 	}
 
+	env_supervise = getenv(ENV_SUPERVISE);
+	if (env_supervise && strcmp(env_supervise, "1") == 0) {
+		supervise = true;
+	}
+
 	abi = landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION);
 	if (abi < 0) {
 		const int err = errno;
@@ -422,6 +451,10 @@ int main(const int argc, char *const argv[], char *const *const envp)
 		/* Removes LANDLOCK_SCOPE_* for ABI < 6 */
 		ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
 					 LANDLOCK_SCOPE_SIGNAL);
+		__attribute__((fallthrough));
+	case 6:
+		/* Removes supervisor mode for ABI < 7 */
+		supervise = false;
 		fprintf(stderr,
 			"Hint: You should update the running kernel "
 			"to leverage Landlock features "
@@ -456,12 +489,31 @@ int main(const int argc, char *const argv[], char *const *const envp)
 	if (check_ruleset_scope(ENV_SCOPED_NAME, &ruleset_attr))
 		return 1;
 
-	ruleset_fd =
-		landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+	flags = 0;
+	if (supervise)
+		flags |= LANDLOCK_CREATE_RULESET_SUPERVISE;
+
+	ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+					     sizeof(ruleset_attr), flags);
 	if (ruleset_fd < 0) {
 		perror("Failed to create a ruleset");
 		return 1;
 	}
+	if (supervise) {
+		supervisor_fd = ruleset_attr.supervisor_fd;
+		if (supervisor_fd < 0) {
+			fprintf(stderr, "supervisor_fd is invalid\n");
+			return 1;
+		}
+		if (supervisor_fd == 0) {
+			fprintf(stderr, "supervisor_fd not set by kernel\n");
+			return 1;
+		}
+	} else if (ruleset_attr.supervisor_fd != 0) {
+		fprintf(stderr,
+			"supervisor_fd should not be set by kernel, but it is not 0\n");
+		return 1;
+	}
 
 	if (populate_ruleset_fs(ENV_FS_RO_NAME, ruleset_fd, access_fs_ro)) {
 		goto err_close_ruleset;
@@ -483,23 +535,690 @@ int main(const int argc, char *const argv[], char *const *const envp)
 		perror("Failed to restrict privileges");
 		goto err_close_ruleset;
 	}
-	if (landlock_restrict_self(ruleset_fd, 0)) {
-		perror("Failed to enforce ruleset");
-		goto err_close_ruleset;
-	}
-	close(ruleset_fd);
 
 	cmd_path = argv[1];
 	cmd_argv = argv + 1;
+
+	if (!supervise) {
+		if (landlock_restrict_self(ruleset_fd, 0)) {
+			perror("Failed to enforce ruleset");
+			goto err_close_ruleset;
+		}
+		close(ruleset_fd);
+		verbose_exec(cmd_path, cmd_argv, envp);
+	} else {
+		pid_t child;
+		int child_stdin_pipe[2], child_stdout_pipe[2],
+			child_stderr_pipe[2];
+		// read from [0], write to [1]
+		if (pipe(child_stdin_pipe) || pipe(child_stdout_pipe) ||
+		    pipe(child_stderr_pipe)) {
+			perror("Failed to create pipes");
+			goto err_close_ruleset;
+		}
+		child = fork();
+		if (child < 0) {
+			perror("Failed to fork");
+			goto err_close_ruleset;
+		}
+		if (child == 0) {
+			close(supervisor_fd);
+
+			if (landlock_restrict_self(ruleset_fd, 0)) {
+				perror("Failed to enforce ruleset");
+				goto err_close_ruleset;
+			}
+
+			close(child_stdin_pipe[1]);
+			close(child_stdout_pipe[0]);
+			close(child_stderr_pipe[0]);
+			if (dup2(child_stdin_pipe[0], STDIN_FILENO) < 0 ||
+			    dup2(child_stdout_pipe[1], STDOUT_FILENO) < 0 ||
+			    dup2(child_stderr_pipe[1], STDERR_FILENO) < 0) {
+				perror("Failed to redirect child I/O");
+				exit(1);
+			}
+			close(child_stdin_pipe[0]);
+			close(child_stdout_pipe[1]);
+			close(child_stderr_pipe[1]);
+
+			close(ruleset_fd);
+			verbose_exec(cmd_path, cmd_argv, envp);
+		} else {
+			close(ruleset_fd);
+			close(child_stdin_pipe[0]);
+			close(child_stdout_pipe[1]);
+			close(child_stderr_pipe[1]);
+			return interactive_sandboxer(supervisor_fd,
+						     child_stdin_pipe[1],
+						     child_stdout_pipe[0],
+						     child_stderr_pipe[0],
+						     child);
+		}
+	}
+
+err_close_ruleset:
+	close(ruleset_fd);
+	return 1;
+}
+
+int verbose_exec(const char *cmd_path, char *const *cmd_argv, char *const *envp)
+{
 	fprintf(stderr, "Executing the sandboxed command...\n");
 	execvpe(cmd_path, cmd_argv, envp);
+	int err = errno;
 	fprintf(stderr, "Failed to execute \"%s\": %s\n", cmd_path,
-		strerror(errno));
+		strerror(err));
 	fprintf(stderr, "Hint: access to the binary, the interpreter or "
 			"shared libraries may be denied.\n");
-	return 1;
+	return err;
+}
 
-err_close_ruleset:
-	close(ruleset_fd);
-	return 1;
+enum SandboxAccessType {
+	ACCESS_READ,
+	ACCESS_READWRITE,
+	ACCESS_CREATE,
+	ACCESS_REMOVE,
+};
+
+struct context {
+	int supervisor_fd;
+	char **allowed_paths;
+	size_t num_allowed_paths;
+};
+
+static int f_set_noblock(int fd)
+{
+	int flags = fcntl(fd, F_GETFL, 0);
+	if (flags < 0) {
+		perror("Failed to get flags");
+		return -1;
+	}
+	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
+		perror("Failed to set flags");
+		return -1;
+	}
+	return 0;
+}
+
+static int write_all(int fd, const char *buf, size_t count)
+{
+	while (count > 0) {
+		ssize_t written = write(fd, buf, count);
+		if (written < 0) {
+			return written;
+		}
+		count -= written;
+		buf += written;
+	}
+	return 0;
+}
+
+static int readlink_fd_s(int fd, char *buf, size_t buf_len)
+{
+	if (buf_len == 0) {
+		errno = EINVAL;
+		return -1;
+	}
+	char procfd[100];
+	snprintf(procfd, sizeof(procfd), "/proc/self/fd/%d", fd);
+	ssize_t len = readlink(procfd, buf, buf_len - 1);
+	if (len < 0) {
+		return -1;
+	}
+	buf[len] = '\0';
+	return len;
+}
+
+static bool show_sandbox_prompt_fs(enum SandboxAccessType access,
+				   const char *file1, const char *file2,
+				   int pid, const char *comm, const char *exe,
+				   struct context *context)
+{
+	const char *access_kv;
+	switch (access) {
+	case ACCESS_READ:
+		access_kv = "read";
+		break;
+	case ACCESS_READWRITE:
+		access_kv = "read/write";
+		break;
+	case ACCESS_CREATE:
+		access_kv = "create";
+		break;
+	case ACCESS_REMOVE:
+		access_kv = "remove";
+		break;
+	default:
+		abort();
+		return false;
+	}
+	if (isatty(STDIN_FILENO)) {
+		tcflush(STDIN_FILENO, TCIOFLUSH);
+	}
+	fprintf(stderr,
+		"------------- Sandboxer access request -------------\n");
+	fprintf(stderr, "Process %s[%d] (%s) wants to %s\n  %s\n", comm, pid,
+		exe, access_kv, file1);
+	if (file2) {
+		fprintf(stderr, "  %s\n", file2);
+	}
+	bool allow = false;
+	while (true) {
+		char answer[10];
+		fprintf(stderr, "(y)es/(a)lways/(n)o > ");
+		fflush(stderr);
+		int rc = read(STDIN_FILENO, answer, sizeof(answer) - 1);
+		if (rc < 0) {
+			perror("Failed to read answer");
+			break;
+		}
+		if (rc == 0) {
+			break;
+		}
+		answer[rc] = '\0';
+		if (strcmp(answer, "y\n") == 0) {
+			allow = true;
+			break;
+		} else if (strcmp(answer, "a\n") == 0) {
+			allow = true;
+			/* +2 in case file2 is also set */
+			context->allowed_paths =
+				realloc(context->allowed_paths,
+					(context->num_allowed_paths + 2) *
+						sizeof(char *));
+			if (!context->allowed_paths) {
+				abort();
+			}
+			char *dup_str = strdup(file1);
+			if (!dup_str) {
+				abort();
+			}
+			context->allowed_paths[context->num_allowed_paths] =
+				dup_str;
+			context->num_allowed_paths++;
+
+			if (file2) {
+				dup_str = strdup(file2);
+				if (!dup_str) {
+					abort();
+				}
+				context->allowed_paths
+					[context->num_allowed_paths] = dup_str;
+				context->num_allowed_paths++;
+			}
+			break;
+		} else if (strcmp(answer, "n\n") == 0) {
+			allow = false;
+			break;
+		} else {
+			fprintf(stderr,
+				"Please answer \"y\", \"a\", or \"n\"\n");
+		}
+	}
+	fprintf(stderr,
+		"----------------------------------------------------\n");
+	return allow;
+}
+
+static bool show_sandbox_prompt_network(__u16 port, struct context *context)
+{
+	/* TODO: unimplemented in kernel */
+	return true;
+}
+
+#ifndef min
+#define min(a, b) ((a) < (b) ? (a) : (b))
+#endif
+
+static bool path_join(char *dest_buf, size_t dest_buf_len, const char *last)
+{
+	if (dest_buf_len <= 1) {
+		return false;
+	}
+	size_t last_len = strlen(last);
+	size_t dest_len = strnlen(dest_buf, dest_buf_len);
+	if (dest_len == 1 && dest_buf[0] == '/') {
+		dest_buf[0] = '\0';
+		dest_len = 0;
+	}
+	size_t dest_space = dest_buf_len - dest_len;
+	if (dest_space <= 1) {
+		return false;
+	}
+	if (dest_space == 2) {
+		dest_buf[dest_len] = '/';
+		dest_buf[dest_len + 1] = '\0';
+		return false;
+	}
+	size_t copy_count = min(dest_space - 2, last_len);
+	dest_buf[dest_len] = '/';
+	memcpy(dest_buf + dest_len + 1, last, copy_count);
+	dest_buf[dest_len + 1 + copy_count] = '\0';
+	return copy_count == last_len;
+}
+
+static int process_event(struct landlock_supervise_event *evt,
+			 struct context *context)
+{
+	char *target_path_1 = NULL;
+	char *target_path_2 = NULL;
+	char *comm = NULL;
+	char *exe = NULL;
+	int pid;
+	int fd = -1;
+	ssize_t len;
+	enum SandboxAccessType access = -1;
+	char proc_exe[100], proc_comm[100];
+	struct landlock_supervise_response response;
+	bool allow = false;
+	int ret = 0;
+	int supervisor_fd = context->supervisor_fd;
+
+	memset(&response, 0, sizeof(response));
+
+	if (((uintptr_t)evt) % __alignof__(struct landlock_supervise_event) !=
+	    0) {
+		/*
+		 * Check that the kernel hasn't messed up given we're
+		 * reading an array of variable-length structs
+		 */
+		fprintf(stderr, "evt = %p is badly aligned\n", evt);
+		abort();
+	}
+
+	switch (evt->hdr.type) {
+	case LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS:
+		if (evt->fd1 != -1) {
+			target_path_1 = malloc(PATH_MAX);
+			if (!target_path_1) {
+				abort();
+			}
+			if (readlink_fd_s(evt->fd1, target_path_1, PATH_MAX) <
+			    0) {
+				close(evt->fd1);
+				perror("Failed to readlink");
+				ret = -1;
+				goto ret;
+			}
+			close(evt->fd1);
+		} else {
+			fprintf(stderr, "fd1 is -1 which should not happen.\n");
+			abort();
+		}
+		if (evt->fd2 != -1) {
+			target_path_2 = malloc(PATH_MAX);
+			if (!target_path_2) {
+				abort();
+			}
+			if (readlink_fd_s(evt->fd2, target_path_2, PATH_MAX) <
+			    0) {
+				perror("Failed to readlink");
+				close(evt->fd2);
+				ret = -1;
+				goto ret;
+			}
+			close(evt->fd2);
+		}
+		if (evt->destname[0] != 0) {
+			if (evt->fd2 != -1) {
+				path_join(target_path_2, PATH_MAX,
+					  evt->destname);
+			} else {
+				path_join(target_path_1, PATH_MAX,
+					  evt->destname);
+			}
+		}
+		if (evt->access_request & ACCESS_FS_ROUGHLY_CREATE) {
+			access = ACCESS_CREATE;
+		} else if (evt->access_request & ACCESS_FS_ROUGHLY_REMOVE) {
+			access = ACCESS_REMOVE;
+		} else if (evt->access_request & ACCESS_FS_ROUGHLY_WRITE) {
+			access = ACCESS_READWRITE;
+		} else {
+			access = ACCESS_READ;
+		}
+
+		if (strcmp(target_path_1, "/dev/tty") == 0) {
+			/*
+			 * Deny TTY access to bash, as it messes with the
+			 * supervisor input, causing the supervisor to
+			 * receive SIGTTIN
+			 */
+			goto response;
+		}
+
+		for (size_t i = 0; i < context->num_allowed_paths; i++) {
+			if (strcmp(target_path_1, context->allowed_paths[i]) ==
+			    0) {
+				allow = true;
+				break;
+			}
+		}
+		break;
+	case LANDLOCK_SUPERVISE_EVENT_TYPE_NET_ACCESS:
+		/* No pre-processing needed */
+		break;
+	default:
+		fprintf(stderr, "Unknown event type: %d\n", evt->hdr.type);
+		ret = -1;
+		break;
+	}
+
+	pid = evt->accessor;
+	snprintf(proc_exe, sizeof(proc_exe), "/proc/%d/exe", pid);
+	exe = malloc(PATH_MAX);
+	if (!exe) {
+		abort();
+	}
+	len = readlink(proc_exe, exe, PATH_MAX - 1);
+	if (len < 0) {
+		perror("Failed to readlink proc exe");
+		return -1;
+	}
+	exe[len] = '\0';
+	snprintf(proc_comm, sizeof(proc_comm), "/proc/%d/comm", pid);
+	comm = malloc(PATH_MAX);
+	if (!comm) {
+		abort();
+	}
+	fd = open(proc_comm, O_RDONLY);
+	if (fd < 0) {
+		snprintf(comm, PATH_MAX, "???");
+	} else {
+		len = read(fd, comm, PATH_MAX - 1);
+		if (len < 0) {
+			snprintf(comm, PATH_MAX, "???");
+		} else {
+			comm[len] = '\0';
+			if (len > 0 && comm[len - 1] == '\n') {
+				comm[len - 1] = '\0';
+			}
+		}
+		close(fd);
+	}
+
+	switch (evt->hdr.type) {
+	case LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS:
+		if (!allow) {
+			allow = show_sandbox_prompt_fs(access, target_path_1,
+						       target_path_2, pid, comm,
+						       exe, context);
+		}
+		break;
+	case LANDLOCK_SUPERVISE_EVENT_TYPE_NET_ACCESS:
+		allow = show_sandbox_prompt_network(evt->port, context);
+		break;
+	}
+
+response:
+	/* Prepare and send response to the kernel */
+	response.length = sizeof(response);
+	response.decision = allow ? LANDLOCK_SUPERVISE_DECISION_ALLOW :
+				    LANDLOCK_SUPERVISE_DECISION_DENY;
+	response.cookie = evt->hdr.cookie;
+
+	if (write(supervisor_fd, &response, sizeof(response)) !=
+	    sizeof(response)) {
+		perror("Failed to write supervisor response");
+		ret = -1;
+	}
+
+ret:
+	free(target_path_1);
+	free(target_path_2);
+	free(comm);
+	free(exe);
+	return ret;
+}
+
+static int process_events(void *data, size_t data_len, struct context *context)
+{
+	while (data_len > 0) {
+		struct landlock_supervise_event *evt;
+		int rc;
+		if (data_len < sizeof(evt->hdr)) {
+			fprintf(stderr,
+				"Too few bytes for an event header - got %zu left, need %zu.\n",
+				data_len, sizeof(evt->hdr));
+			return -EINVAL;
+		}
+		evt = data;
+		if (evt->hdr.length > data_len) {
+			fprintf(stderr,
+				"Length from event header is greater than remaining data.\n");
+			return -EINVAL;
+		}
+		rc = process_event(evt, context);
+		if (rc < 0) {
+			return rc;
+		}
+		data_len -= evt->hdr.length;
+		data += evt->hdr.length;
+	}
+	return 0;
+}
+
+int interactive_sandboxer(int supervisor_fd, int child_stdin, int child_stdout,
+			  int child_stderr, pid_t child_pid)
+{
+	char *write_buf = NULL;
+	size_t write_buf_len = 0;
+
+	size_t io_buf_len = 4096;
+	char *io_buf = malloc(io_buf_len);
+	if (!io_buf) {
+		fprintf(stderr, "Failed to allocate I/O buffer\n");
+		return -1;
+	}
+
+	int status = 0;
+
+	struct pollfd pfds[5] = {
+		{ .fd = STDIN_FILENO, .events = POLLIN },
+		{ .fd = child_stdout, .events = POLLIN },
+		{ .fd = child_stderr, .events = POLLIN },
+		{ .fd = supervisor_fd, .events = POLLIN },
+		{ .fd = child_stdin, .events = POLLOUT },
+	};
+	const int pfd_idx_stdin = 0;
+	const int pfd_idx_child_stdout = 1;
+	const int pfd_idx_child_stderr = 2;
+	const int pfd_idx_supervisor = 3;
+	const int pfd_idx_child_stdin = 4;
+	const int poll_len = 5;
+
+	struct context context = {
+		.supervisor_fd = supervisor_fd,
+		.allowed_paths = NULL,
+		.num_allowed_paths = 0,
+	};
+
+	bool child_stdin_closed = false;
+
+	/*
+	 * Don't deadlock by us trying to write to child, and child
+	 * waiting to write to us.
+	 */
+	f_set_noblock(child_stdin);
+
+	/* Don't get killed by SIGPIPE when child closes stdout/err */
+	signal(SIGPIPE, SIG_IGN);
+
+	while (1) {
+		if (write_buf_len > 0 && !child_stdin_closed) {
+			pfds[pfd_idx_child_stdin].fd = child_stdin;
+		} else {
+			pfds[pfd_idx_child_stdin].fd = -1;
+		}
+
+		for (int i = 0; i < poll_len; i++) {
+			pfds[i].revents = 0;
+		}
+
+		if (ppoll(pfds, poll_len, NULL, NULL) < 0) {
+			if (errno != EINTR) {
+				perror("ppoll");
+				goto err_kill_child;
+			}
+		}
+
+		if (pfds[pfd_idx_stdin].revents & POLLIN) {
+			/*
+			 * Our stdin -> temp buffer for child's stdin.
+			 * Need to do this before handling any supervisor
+			 * events so that inputs intended for the child are
+			 * not interpreted as user decisions.
+			 */
+			const int read_len = 4096;
+			write_buf =
+				realloc(write_buf, write_buf_len + read_len);
+			if (!write_buf) {
+				fprintf(stderr,
+					"Failed to realloc write buffer\n");
+				goto err_kill_child;
+			}
+			ssize_t count = read(STDIN_FILENO,
+					     write_buf + write_buf_len,
+					     read_len);
+			if (count > 0) {
+				write_buf_len += count;
+			} else if (count == 0) {
+				/* Our stdin is closed. Don't read from it anymore. */
+				pfds[pfd_idx_stdin].fd = -1;
+			} else {
+				perror("Failed to read from stdin");
+				goto err_kill_child;
+			}
+		}
+
+		if (write_buf_len > 0) {
+			/* Attempt to write any outstanding stdin to child */
+			ssize_t written =
+				write(child_stdin, write_buf, write_buf_len);
+			if (written > 0) {
+				if (written > write_buf_len) {
+					abort();
+				} else if (written == write_buf_len) {
+					write_buf_len = 0;
+				} else {
+					memmove(write_buf, write_buf + written,
+						write_buf_len - written);
+					write_buf_len -= written;
+				}
+			} else {
+				if (errno == EPIPE) {
+					close(child_stdin);
+					child_stdin_closed = true;
+					pfds[pfd_idx_child_stdin].fd = -1;
+					write_buf_len = 0;
+				} else if (errno != EAGAIN) {
+					perror("Failed to write to child stdin");
+					goto err_kill_child;
+				}
+			}
+		}
+
+		if (pfds[pfd_idx_stdin].fd == -1 && write_buf_len == 0) {
+			/* We can safely close child's stdin now */
+			close(child_stdin);
+			child_stdin_closed = true;
+			pfds[pfd_idx_child_stdin].fd = -1;
+		}
+
+		if (pfds[pfd_idx_child_stdout].revents & POLLIN) {
+			/* Child stdout -> our stdout */
+			ssize_t count = read(child_stdout, io_buf, io_buf_len);
+			if (count > 0) {
+				if (write_all(STDOUT_FILENO, io_buf, count) <
+				    0) {
+					perror("Failed to write to stdout");
+					goto err_kill_child;
+				}
+			} else if (count == 0 ||
+				   (count < 0 && errno == EPIPE)) {
+				close(child_stdout);
+				pfds[pfd_idx_child_stdout].fd = -1;
+			} else if (count < 0 && errno != EAGAIN) {
+				perror("Failed to read from child stdout");
+				goto err_kill_child;
+			}
+		}
+
+		if (pfds[pfd_idx_child_stderr].revents & POLLIN) {
+			/* Child stderr -> our stderr */
+			ssize_t count = read(child_stderr, io_buf, io_buf_len);
+			if (count > 0) {
+				if (write_all(STDERR_FILENO, io_buf, count) <
+				    0) {
+					perror("Failed to write to stderr");
+					goto err_kill_child;
+				}
+			} else if (count == 0 ||
+				   (count < 0 && errno == EPIPE)) {
+				close(child_stderr);
+				pfds[pfd_idx_child_stderr].fd = -1;
+			} else if (count < 0 && errno != EAGAIN) {
+				perror("Failed to read from child stderr");
+				goto err_kill_child;
+			}
+		}
+
+		if (waitpid(child_pid, &status, WNOHANG) == child_pid) {
+			/*
+			 * Write out any remaining child stdout/stderr.
+			 * If child died, read would just return EOF.
+			 */
+			while (1) {
+				ssize_t count =
+					read(child_stdout, io_buf, io_buf_len);
+				if (count > 0)
+					write_all(STDOUT_FILENO, io_buf, count);
+				else
+					break;
+			}
+			while (1) {
+				ssize_t count =
+					read(child_stderr, io_buf, io_buf_len);
+				if (count > 0)
+					write_all(STDERR_FILENO, io_buf, count);
+				else
+					break;
+			}
+			return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
+		}
+
+		if (pfds[pfd_idx_supervisor].revents) {
+retry:
+			ssize_t count = read(supervisor_fd, io_buf, io_buf_len);
+			if (count > 0) {
+				process_events(io_buf, count, &context);
+			} else if (count == 0) {
+				fprintf(stderr,
+					"Unexpected EOF on supervisor fd\n");
+				goto err_kill_child;
+			} else if (count < 0 && errno != EAGAIN) {
+				if (errno == EINVAL) {
+					io_buf_len *= 2;
+					io_buf = realloc(io_buf, io_buf_len);
+					if (!io_buf) {
+						fprintf(stderr,
+							"Failed to realloc I/O buffer\n");
+						goto err_kill_child;
+					}
+					fprintf(stderr,
+						"Got EINVAL - possibly event too big. Realloced I/O buffer to %zu\n",
+						io_buf_len);
+					goto retry;
+				}
+				perror("Failed to read from supervisor");
+				goto err_kill_child;
+			}
+		}
+	}
+
+err_kill_child:
+	close(supervisor_fd);
+	kill(child_pid, SIGTERM);
+	return -1;
 }
-- 
2.39.5



* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
                   ` (8 preceding siblings ...)
  2025-03-04  1:13 ` [RFC PATCH 9/9] Enhance the sandboxer example to support landlock-supervise Tingmao Wang
@ 2025-03-04 19:48 ` Mickaël Salaün
  2025-03-06  2:57   ` Tingmao Wang
  2025-03-06 21:04 ` Jan Kara
  10 siblings, 1 reply; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-04 19:48 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Christian Brauner, Kees Cook, Jeff Xu, Mikhail Ivanov,
	Francis Laniel, Matthieu Buffet

On Tue, Mar 04, 2025 at 01:12:56AM +0000, Tingmao Wang wrote:
> Landlock supervise: a mechanism for interactive permission requests
> 
> Hi,
> 
> I would like to propose an extension to Landlock to support a "supervisor"
> mode, which would enable a user program to sandbox applications (or
> itself) in a dynamic, fine-grained, and potentially temporary way.
> Practically, this makes it easy to give maximal control to the user,
> perhaps in the form of a "just in time" permission prompt.  Read on, or
> check the sandboxer program in the last patch for a "demo".

Thanks for this RFC, this is very promising!

> 
> To Jan Kara and other fanotify reviewers, I've included you in this patch
> as Mickaël suggested that we could potentially extend and re-use the
> fanotify uapi and code instead of creating an entirely new representation
> for permission requests and mechanism for passing it (as this patch
> currently does).  I've not really thought out how that would work (there
> will probably have to be some extension of the fanotify-fd uapi since
> landlock handles more than FS access), but I think it is a promising idea,
> hence I would like to hear your thoughts if you could spare a moment to
> look at this.  A good outcome could also be that we add the necessary
> hooks so that both this and fanotify (but really fsnotify?) can have _perm
> events for create/delete/rename etc.
> 
> FS mailing list - I've CC'd this patchset to you too - even though the
> patch doesn't currently touch any FS code, this is very FS related, and
> also, in order to address an inode lock related problem which I will
> mention in patch 6 of this series, future versions of this patch will
> likely need to add a few more LSM hooks.  Especially for that part, but
> also other bits of this project, a pair of eyes from the FS community
> would be very helpful.
> 
> To Tycho Andersen -- I'm CC'ing you as you've worked on the seccomp-unotify
> feature which is also quite related, so if you could spare some time for a
> quick review, or provide some suggestions, that would be very appreciated
> :)
> 
> I'm submitting this series as a non-production-ready, proof-of-concept
> RFC, and I would appreciate feedback on any aspects of the design or
> implementation.  Note that due to the PoC nature of this, I have not
> handled checkpatch.pl errors etc.  I also welcome suggestions for
> alternative names for this feature (e.g. landlock-unotify?
> landlock-perm?).  At this point I'm very keen to hear some initial
> feedback from the community before investing further into polishing this
> patch.
> 
> (I've briefly pitched the overall idea to Mickaël, but he has not reviewed
> the patch yet)
> 
> 
> Why extend landlock?
> --------------------
> 
> While this feature could be implemented as its own LSM, I feel like it is
> a natural extension to landlock -- landlock has already defined a set of
> fine-grained access requests with the intention to add more (and not just
> for FS alone), is designed to be an unprivileged, stackable,
> process-scoped, ad-hoc mechanism with no persistent state, which works
> well as a generic API to support a dynamic sandbox, and landlock is
> already doing the path traversal work to evaluate hierarchical filesystem
> rules, which would also be useful for a performant dynamic sandbox
> implementation.

I agree, that would be a great Landlock feature.

> 
> 
> Use cases
> ---------
> 
> I have several potential use cases in mind that will benefit from
> landlock-supervise, for example:
> 
> 1. A patch to firejail (I have not discussed with the firejail maintainers
> on this yet - wanted to see the reception of this kernel patch first)
> which can leverage landlock in a highly flexible way, prompting the user
> for permission to access "extra" files after the sandbox has started
> (without e.g. having to restart a very stateful GUI program).
> 
> This way of using landlock can potentially replace its current approach of
> using bind mounts (as it will allow implementing "blacklists"), allowing
> unprivileged sandbox creation (although need to check with firejail if
> there are other factors preventing this).  This also allows editing
> profiles "live" in a highly interactive way (i.e. the user can choose
> "allow and remember" on a permission request which will also add the newly
> allowed path to a local firejail profile, all automatically)
> 
> 2. A "protected" mode for common development environments (e.g. VSCode or
> a terminal can be launched "protected") that doesn't compromise on
> ease-of-use.  File access to $PWD at launch can be allowed, and access to
> other places can be allowed ad-hoc by the developer with hopefully one UI
> click.  Since landlock can also be used to restrict network access, such a
> protected mode can also restrict outgoing connections by default (but ask
> the user if they allow it for all or certain processes, on the first
> attempt to connect).
> 
> Recently there have been incidents of secret-stealing malware targeting
> developers (on Linux) by social engineering them to open and build/run a
> project. [1]  The hope is that landlock-supervise can drive adoption of
> sandboxes for developers and others by making them more user-friendly.
> 
> In addition to the above, I also hope that this would help with landlock
> adoption even in non-interaction-heavy scenarios, by allowing application
> developers the choice to gracefully recover from over-restrictive rulesets
> and collect failure metrics, until they are confident that actually
> blocking non-allowed accesses would not break their application or degrade
> the user experience.

Another interesting use case is to trace programs and get an
unprivileged "permissive" mode to quickly create sandbox policies.

> 
> I have more exploration to do regarding applying this to applications, but
> I do have a working proof of concept already (implemented as an
> enhancement to the sandboxer example). Here is a shortened output:
> 
>     bash # env LL_FS_RO=/usr:/lib:/bin:/etc:/dev:/proc LL_FS_RW= LL_SUPERVISE=1 ./sandboxer bash -i
>     bash # echo "Hi, $(whoami)!"
>     Hi, root!
>     bash # ls /
>     ------------- Sandboxer access request -------------
>     Process ls[166] (/usr/bin/ls) wants to read
>       /
>     (y)es/(a)lways/(n)o > y
>     ----------------------------------------------------
>     bin
>     boot
>     dev
>     ...
>     usr
>     var
>     bash # echo 'evil' >> /etc/profile
>     (a spurious create request due to current issue with dcache miss is omitted)
>     ------------- Sandboxer access request -------------
>     Process bash[163] (/usr/bin/bash) wants to read/write
>       /etc/profile
>     (y)es/(a)lways/(n)o > n
>     ----------------------------------------------------
>     bash: /etc/profile: Permission denied
>     bash #
> 
> 
> Alternatives
> ------------
> 
> I have looked for existing ways to implement the proposed use cases (at
> least for FS access), and three main approaches stand out to me:
> 
> 1. Fanotify: there is already FAN_OPEN_PERM which waits for an allow/deny
> response from a fanotify listener.  However, it does not currently have
> the equivalent _PERM for file creation, deletion, rename and linking, and
> it is also not designed for unprivileged, process-scoped use (unlike
> landlock).

As discussed, I was thinking about whether or not it would be possible
to use the fanotify interface (e.g. fanotify_init(), fanotify FD...),
but looking at your code, I think it would mostly increase complexity.
There is also the issue of the Landlock semantics (e.g. access rights),
which do not map 1:1 to the fanotify ones.  A last thing is that
fanotify is deeply tied to the VFS.  So, unless someone has a better
idea, let's continue with your approach.
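
For comparison, here is a minimal sketch of how a listener answers a
fanotify permission event today.  The helper name answer_perm_event()
is made up for illustration; struct fanotify_response, FAN_ALLOW and
FAN_DENY come from <sys/fanotify.h>.  The answer is keyed by the event's
file descriptor, whereas this series keys it by a cookie:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/fanotify.h>
#include <unistd.h>

/*
 * Hypothetical helper: answer one fanotify permission event by writing
 * a struct fanotify_response to @fanotify_fd.
 * Returns 0 on success, -1 otherwise.
 * The caller still owns ev->fd and must close() it afterwards.
 */
int answer_perm_event(int fanotify_fd,
		      const struct fanotify_event_metadata *ev, bool allow)
{
	struct fanotify_response resp;

	assert(ev != NULL);

	/* Only *_PERM events expect an answer. */
	if (!(ev->mask & (FAN_OPEN_PERM | FAN_ACCESS_PERM)))
		return -1;

	memset(&resp, 0, sizeof(resp));
	resp.fd = ev->fd; /* identifies the pending request */
	resp.response = allow ? FAN_ALLOW : FAN_DENY;

	if (write(fanotify_fd, &resp, sizeof(resp)) != (ssize_t)sizeof(resp))
		return -1;
	return 0;
}
```

A real listener would get the events from read() on a fanotify_init()
descriptor (permission events need CAP_SYS_ADMIN) and would close
ev->fd once the answer is written.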

> 
> 2. Seccomp-unotify: this can be used to trap all syscalls and give the
> sandbox a chance to allow or deny any one of them. However, a correct,
> TOCTOU-proof implementation will likely require handling a large number of
> fs-related syscalls in user-space, with the sandboxer opening the file or
> carrying out the operation on behalf of the sandboxee.  This is probably
> going to be extremely complex and makes everything less performant.

We should get inspiration from the fanotify and seccomp-notify features
(while implementing the minimum for now) but also identify their design
issues and caveats.

Tycho, Christian, Kees, any suggestions?

> 
> 3. Using a FUSE filesystem which gates access.  This is actually an
> approach taken by an existing sandbox solution - flatpak [2], however it
> requires either tight integration with the application (and thus doesn't
> work well for the mentioned use cases), or if one wants to sandbox a
> program "transparently", SYS_ADMIN to chroot.

Android's SDCardFS is another example of such use.

> 
> 
> I've tested that what I have here works with the enhanced sandboxer, but
> have yet to write any self tests or do extensive testing or perf
> measurements.  I have also yet to implement support for supervising tcp
> rules as well as FS refer operations.

One of my main suggestions would be to align with the semantics of the
audit patch series and its defined "blockers":
https://lore.kernel.org/all/20250131163059.1139617-1-mic@digikod.net/
I'll send another series soon.

> 
> Base commit: 78332fdb956f18accfbca5993b10c5ed69f00a2c (tag:
> landlock-6.14-rc5, mic/next)
> 
> 
> [1]: https://cybersecuritynews.com/beware-of-lazarus-linkedin-recruiting-scam/
> [2]: https://flatpak.github.io/xdg-desktop-portal/docs/documents-and-fuse.html
> 
> 
> Tingmao Wang (9):
>   Define the supervisor and event structure
>   Refactor per-layer information in rulesets and rules
>   Adds a supervisor reference in the per-layer information
>   User-space API for creating a supervisor-fd
>   Define user structure for events and responses.
>   Creating supervisor events for filesystem operations
>   Implement fdinfo for ruleset and supervisor fd
>   Implement fops for supervisor-fd
>   Enhance the sandboxer example to support landlock-supervise
> 
>  include/uapi/linux/landlock.h | 119 ++++++
>  samples/landlock/sandboxer.c  | 759 +++++++++++++++++++++++++++++++++-
>  security/landlock/Makefile    |   2 +-
>  security/landlock/fs.c        | 134 +++++-
>  security/landlock/ruleset.c   |  49 ++-
>  security/landlock/ruleset.h   |  66 +--
>  security/landlock/supervise.c | 194 +++++++++
>  security/landlock/supervise.h | 171 ++++++++
>  security/landlock/syscalls.c  | 621 +++++++++++++++++++++++++++-
>  9 files changed, 2036 insertions(+), 79 deletions(-)
>  create mode 100644 security/landlock/supervise.c
>  create mode 100644 security/landlock/supervise.h
> 
> --
> 2.39.5
> 


* Re: [RFC PATCH 2/9] Refactor per-layer information in rulesets and rules
  2025-03-04  1:12 ` [RFC PATCH 2/9] Refactor per-layer information in rulesets and rules Tingmao Wang
@ 2025-03-04 19:49   ` Mickaël Salaün
  2025-03-06  2:58     ` Tingmao Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-04 19:49 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen

On Tue, Mar 04, 2025 at 01:12:58AM +0000, Tingmao Wang wrote:
> We need a place to store the supervisor pointer for each layer in
> a domain.  Currently, the domain has a trailing flexible array
> for handled access masks of each layer.  This patch extends it by
> creating a separate landlock_ruleset_layer structure that will
> hold this access mask, and make the ruleset's flexible array use
> this structure instead.
> 
> An alternative is to use landlock_hierarchy, but I have chosen to
> extend the FAM as this makes it clearer that the supervisor
> pointer is tied to layers, just like access masks.

We could indeed have a pointer in the landlock_hierarchy and have a
dedicated bit in each layer's access_masks to indicate that this layer
is supervised.  This should simplify the whole patch series.
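
A rough user-space sketch of that idea, assuming hypothetical stand-in
types (struct supervisor, a "supervised" bit in struct access_masks, a
"supervisor" pointer in struct hierarchy) rather than the real kernel
definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LAYERS 16

/* Hypothetical stand-in for the supervisor object. */
struct supervisor {
	int id;
};

/* Per-layer handled access masks, plus the assumed new bit. */
struct access_masks {
	uint32_t fs : 16;
	uint32_t net : 2;
	uint32_t scope : 2;
	uint32_t supervised : 1; /* set if this layer has a supervisor */
};

/* One node per layer; parent points to the next older layer. */
struct hierarchy {
	struct hierarchy *parent;
	struct supervisor *supervisor; /* NULL if not supervised */
};

struct domain {
	struct hierarchy *hierarchy; /* node of the newest layer */
	size_t num_layers;
	struct access_masks access_masks[MAX_LAYERS]; /* index 0 = oldest */
};

/*
 * Return the supervisor of @layer (0 = oldest), or NULL.  The bit in
 * access_masks is the fast check; the hierarchy is only walked when
 * the bit is set.
 */
struct supervisor *layer_supervisor(const struct domain *dom, size_t layer)
{
	const struct hierarchy *node = dom->hierarchy;
	size_t hops;

	assert(dom->num_layers <= MAX_LAYERS);
	if (layer >= dom->num_layers || !dom->access_masks[layer].supervised)
		return NULL;

	/* The newest layer is the head, older layers are further up. */
	for (hops = dom->num_layers - 1 - layer; hops > 0 && node; hops--)
		node = node->parent;
	return node ? node->supervisor : NULL;
}
```

With this layout the ruleset's flexible array keeps holding plain
access masks, and the common unsupervised case costs a single bit test.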

> 
> This patch doesn't make any functional changes nor add any
> supervise specific stuff.  It is purely to pave the way for
> future patches.
> 
> Signed-off-by: Tingmao Wang <m@maowtm.org>
> ---
>  security/landlock/ruleset.c  | 29 +++++++++---------
>  security/landlock/ruleset.h  | 59 ++++++++++++++++++++++--------------
>  security/landlock/syscalls.c |  2 +-
>  3 files changed, 52 insertions(+), 38 deletions(-)
> 
> diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
> index 69742467a0cf..2cc6f7c5eb1b 100644
> --- a/security/landlock/ruleset.c
> +++ b/security/landlock/ruleset.c
> @@ -31,9 +31,8 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
>  {
>  	struct landlock_ruleset *new_ruleset;
>  
> -	new_ruleset =
> -		kzalloc(struct_size(new_ruleset, access_masks, num_layers),
> -			GFP_KERNEL_ACCOUNT);
> +	new_ruleset = kzalloc(struct_size(new_ruleset, layer_stack, num_layers),
> +			      GFP_KERNEL_ACCOUNT);
>  	if (!new_ruleset)
>  		return ERR_PTR(-ENOMEM);
>  	refcount_set(&new_ruleset->usage, 1);
> @@ -104,8 +103,9 @@ static bool is_object_pointer(const enum landlock_key_type key_type)
>  
>  static struct landlock_rule *
>  create_rule(const struct landlock_id id,
> -	    const struct landlock_layer (*const layers)[], const u32 num_layers,
> -	    const struct landlock_layer *const new_layer)
> +	    const struct landlock_rule_layer (*const layers)[],
> +	    const u32 num_layers,
> +	    const struct landlock_rule_layer *const new_layer)
>  {
>  	struct landlock_rule *new_rule;
>  	u32 new_num_layers;
> @@ -201,7 +201,7 @@ static void build_check_ruleset(void)
>   */
>  static int insert_rule(struct landlock_ruleset *const ruleset,
>  		       const struct landlock_id id,
> -		       const struct landlock_layer (*const layers)[],
> +		       const struct landlock_rule_layer (*const layers)[],
>  		       const size_t num_layers)
>  {
>  	struct rb_node **walker_node;
> @@ -284,7 +284,7 @@ static int insert_rule(struct landlock_ruleset *const ruleset,
>  
>  static void build_check_layer(void)
>  {
> -	const struct landlock_layer layer = {
> +	const struct landlock_rule_layer layer = {

It's not useful to rename this struct.

>  		.level = ~0,
>  		.access = ~0,
>  	};
> @@ -299,7 +299,7 @@ int landlock_insert_rule(struct landlock_ruleset *const ruleset,
>  			 const struct landlock_id id,
>  			 const access_mask_t access)
>  {
> -	struct landlock_layer layers[] = { {
> +	struct landlock_rule_layer layers[] = { {
>  		.access = access,
>  		/* When @level is zero, insert_rule() extends @ruleset. */
>  		.level = 0,
> @@ -344,7 +344,7 @@ static int merge_tree(struct landlock_ruleset *const dst,
>  	/* Merges the @src tree. */
>  	rbtree_postorder_for_each_entry_safe(walker_rule, next_rule, src_root,
>  					     node) {
> -		struct landlock_layer layers[] = { {
> +		struct landlock_rule_layer layers[] = { {
>  			.level = dst->num_layers,
>  		} };
>  		const struct landlock_id id = {
> @@ -389,8 +389,9 @@ static int merge_ruleset(struct landlock_ruleset *const dst,
>  		err = -EINVAL;
>  		goto out_unlock;
>  	}
> -	dst->access_masks[dst->num_layers - 1] =
> -		landlock_upgrade_handled_access_masks(src->access_masks[0]);
> +	dst->layer_stack[dst->num_layers - 1].access_masks =
> +		landlock_upgrade_handled_access_masks(
> +			src->layer_stack[0].access_masks);
>  
>  	/* Merges the @src inode tree. */
>  	err = merge_tree(dst, src, LANDLOCK_KEY_INODE);
> @@ -472,8 +473,8 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
>  		goto out_unlock;
>  	}
>  	/* Copies the parent layer stack and leaves a space for the new layer. */
> -	memcpy(child->access_masks, parent->access_masks,
> -	       flex_array_size(parent, access_masks, parent->num_layers));
> +	memcpy(child->layer_stack, parent->layer_stack,
> +	       flex_array_size(parent, layer_stack, parent->num_layers));
>  
>  	if (WARN_ON_ONCE(!parent->hierarchy)) {
>  		err = -EINVAL;
> @@ -644,7 +645,7 @@ bool landlock_unmask_layers(const struct landlock_rule *const rule,
>  	 * E.g. /a/b <execute> + /a <read> => /a/b <execute + read>
>  	 */
>  	for (layer_level = 0; layer_level < rule->num_layers; layer_level++) {
> -		const struct landlock_layer *const layer =
> +		const struct landlock_rule_layer *const layer =
>  			&rule->layers[layer_level];
>  		const layer_mask_t layer_bit = BIT_ULL(layer->level - 1);
>  		const unsigned long access_req = access_request;
> diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
> index 52f4f0af6ab0..a2605959f733 100644
> --- a/security/landlock/ruleset.h
> +++ b/security/landlock/ruleset.h
> @@ -21,9 +21,10 @@
>  #include "object.h"
>  
>  /**
> - * struct landlock_layer - Access rights for a given layer
> + * struct landlock_rule_layer - Stores the access rights for a
> + * given layer in a rule.
>   */
> -struct landlock_layer {
> +struct landlock_rule_layer {
>  	/**
>  	 * @level: Position of this layer in the layer stack.
>  	 */
> @@ -102,10 +103,11 @@ struct landlock_rule {
>  	 */
>  	u32 num_layers;
>  	/**
> -	 * @layers: Stack of layers, from the latest to the newest, implemented
> -	 * as a flexible array member (FAM).
> +	 * @layers: Stack of layers, from the latest to the newest,
> +	 * implemented as a flexible array member (FAM). Only
> +	 * contains layers that has a rule for this object.
>  	 */
> -	struct landlock_layer layers[] __counted_by(num_layers);
> +	struct landlock_rule_layer layers[] __counted_by(num_layers);
>  };
>  
>  /**
> @@ -124,6 +126,18 @@ struct landlock_hierarchy {
>  	refcount_t usage;
>  };
>  
> +/**
> + * struct landlock_ruleset_layer - Store per-layer information
> + * within a domain (or a non-merged ruleset)
> + */
> +struct landlock_ruleset_layer {
> +	/**
> +	 * @access_masks: Contains the subset of filesystem and
> +	 * network actions that are restricted by a layer.
> +	 */
> +	struct access_masks access_masks;
> +};
> +
>  /**
>   * struct landlock_ruleset - Landlock ruleset
>   *
> @@ -187,18 +201,17 @@ struct landlock_ruleset {
>  			 */
>  			u32 num_layers;
>  			/**
> -			 * @access_masks: Contains the subset of filesystem and
> -			 * network actions that are restricted by a ruleset.
> -			 * A domain saves all layers of merged rulesets in a
> -			 * stack (FAM), starting from the first layer to the
> -			 * last one.  These layers are used when merging
> -			 * rulesets, for user space backward compatibility
> -			 * (i.e. future-proof), and to properly handle merged
> -			 * rulesets without overlapping access rights.  These
> -			 * layers are set once and never changed for the
> -			 * lifetime of the ruleset.
> +			 * @layer_stack: A domain saves all layers of merged
> +			 * rulesets in a stack (FAM), starting from the first
> +			 * layer to the last one.  These layers are used when
> +			 * merging rulesets, for user space backward
> +			 * compatibility (i.e. future-proof), and to properly
> +			 * handle merged rulesets without overlapping access
> +			 * rights.  These layers are set once and never
> +			 * changed for the lifetime of the ruleset.
>  			 */
> -			struct access_masks access_masks[];
> +			struct landlock_ruleset_layer
> +				layer_stack[] __counted_by(num_layers);
>  		};
>  	};
>  };
> @@ -248,7 +261,7 @@ landlock_union_access_masks(const struct landlock_ruleset *const domain)
>  
>  	for (layer_level = 0; layer_level < domain->num_layers; layer_level++) {
>  		union access_masks_all layer = {
> -			.masks = domain->access_masks[layer_level],
> +			.masks = domain->layer_stack[layer_level].access_masks,
>  		};
>  
>  		matches.all |= layer.all;
> @@ -296,7 +309,7 @@ landlock_add_fs_access_mask(struct landlock_ruleset *const ruleset,
>  
>  	/* Should already be checked in sys_landlock_create_ruleset(). */
>  	WARN_ON_ONCE(fs_access_mask != fs_mask);
> -	ruleset->access_masks[layer_level].fs |= fs_mask;
> +	ruleset->layer_stack[layer_level].access_masks.fs |= fs_mask;
>  }
>  
>  static inline void
> @@ -308,7 +321,7 @@ landlock_add_net_access_mask(struct landlock_ruleset *const ruleset,
>  
>  	/* Should already be checked in sys_landlock_create_ruleset(). */
>  	WARN_ON_ONCE(net_access_mask != net_mask);
> -	ruleset->access_masks[layer_level].net |= net_mask;
> +	ruleset->layer_stack[layer_level].access_masks.net |= net_mask;
>  }
>  
>  static inline void
> @@ -319,7 +332,7 @@ landlock_add_scope_mask(struct landlock_ruleset *const ruleset,
>  
>  	/* Should already be checked in sys_landlock_create_ruleset(). */
>  	WARN_ON_ONCE(scope_mask != mask);
> -	ruleset->access_masks[layer_level].scope |= mask;
> +	ruleset->layer_stack[layer_level].access_masks.scope |= mask;
>  }
>  
>  static inline access_mask_t
> @@ -327,7 +340,7 @@ landlock_get_fs_access_mask(const struct landlock_ruleset *const ruleset,
>  			    const u16 layer_level)
>  {
>  	/* Handles all initially denied by default access rights. */
> -	return ruleset->access_masks[layer_level].fs |
> +	return ruleset->layer_stack[layer_level].access_masks.fs |
>  	       _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
>  }
>  
> @@ -335,14 +348,14 @@ static inline access_mask_t
>  landlock_get_net_access_mask(const struct landlock_ruleset *const ruleset,
>  			     const u16 layer_level)
>  {
> -	return ruleset->access_masks[layer_level].net;
> +	return ruleset->layer_stack[layer_level].access_masks.net;
>  }
>  
>  static inline access_mask_t
>  landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
>  			const u16 layer_level)
>  {
> -	return ruleset->access_masks[layer_level].scope;
> +	return ruleset->layer_stack[layer_level].access_masks.scope;
>  }
>  
>  bool landlock_unmask_layers(const struct landlock_rule *const rule,
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index a9760d252fc2..ead9b68168ad 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> @@ -313,7 +313,7 @@ static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
>  		return -ENOMSG;
>  
>  	/* Checks that allowed_access matches the @ruleset constraints. */
> -	mask = ruleset->access_masks[0].fs;
> +	mask = landlock_get_fs_access_mask(ruleset, 0);
>  	if ((path_beneath_attr.allowed_access | mask) != mask)
>  		return -EINVAL;
>  
> -- 
> 2.39.5
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 5/9] Define user structure for events and responses.
  2025-03-04  1:13 ` [RFC PATCH 5/9] Define user structure for events and responses Tingmao Wang
@ 2025-03-04 19:49   ` Mickaël Salaün
  2025-03-06  3:05     ` Tingmao Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-04 19:49 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen

On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
> The two structures are designed to be passed via read and write
> to the supervisor-fd.  Compile time check for no holes are added
> to build_check_abi.
> 
> The event structure will be a dynamically sized structure with
> possibly a NULL-terminating filename at the end.  This is so that
> we can pass a raw filename to the supervisor for file creation
> requests, without having the trouble of not being able to open a
> fd to a file that has not been created.
> 
> NOTE: despite this patch having a new uapi, I'm still very open to e.g.
> re-using fanotify stuff instead (if that makes sense in the end). This is
> just a PoC.
> 
> Signed-off-by: Tingmao Wang <m@maowtm.org>
> ---
>  include/uapi/linux/landlock.h | 107 ++++++++++++++++++++++++++++++++++
>  security/landlock/syscalls.c  |  28 +++++++++
>  2 files changed, 135 insertions(+)
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index 7bc1eb4859fb..b5645fdd998d 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -318,4 +318,111 @@ struct landlock_net_port_attr {
>  #define LANDLOCK_SCOPE_SIGNAL		                (1ULL << 1)
>  /* clang-format on*/
>  
> +/**
> + * DOC: supervisor
> + *
> + * Supervise mode
> + * ~~~~~~~~~~~~~~
> + *
> + * TODO
> + */
> +
> +typedef __u16 landlock_supervise_event_type_t;
> +/* clang-format off */
> +#define LANDLOCK_SUPERVISE_EVENT_TYPE_FS_ACCESS         1
> +#define LANDLOCK_SUPERVISE_EVENT_TYPE_NET_ACCESS        2
> +/* clang-format on */
> +
> +struct landlock_supervise_event_hdr {
> +	/**
> +	 * @type: Type of the event.
> +	 */
> +	landlock_supervise_event_type_t type;
> +	/**
> +	 * @length: Length of the entire struct
> +	 * landlock_supervise_event including this header.
> +	 */
> +	__u16 length;
> +	/**
> +	 * @cookie: Opaque identifier to be included in the response.
> +	 */
> +	__u32 cookie;

I guess we could use a __u64 index counter per layer instead.  That
would also help to order requests if they are treated by different
supervisor threads.

> +};
> +
> +struct landlock_supervise_event {
> +	struct landlock_supervise_event_hdr hdr;
> +	__u64 access_request;
> +	__kernel_pid_t accessor;
> +	union {
> +		struct {
> +			/**
> +			 * @fd1: An open file descriptor for the file (open,
> +			 * delete, execute, link, readdir, rename, truncate),
> +			 * or the parent directory (for create operations
> +			 * targeting its child) being accessed.  Must be
> +			 * closed by the reader.
> +			 *
> +			 * If this points to a parent directory, @destname
> +			 * will contain the target filename. If @destname is
> +			 * empty, this points to the target file.
> +			 */
> +			int fd1;
> +			/**
> +			 * @fd2: For link or rename requests, a second file
> +			 * descriptor for the target parent directory.  Must
> +			 * be closed by the reader.  @destname contains the
> +			 * destination filename.  This field is -1 if not
> +			 * used.
> +			 */
> +			int fd2;

Can we just use one FD but identify the requested access instead and
send one event for each, like for the audit patch series?

> +			/**
> +			 * @destname: A filename for a file creation target.
> +			 *
> +			 * If either of fd1 or fd2 points to a parent
> +			 * directory rather than the target file, this is the
> +			 * NULL-terminated name of the file that will be
> +			 * newly created.
> +			 *
> +			 * Counting the NULL terminator, this field will
> +			 * contain one or more NULL padding at the end so
> +			 * that the length of the whole struct
> +			 * landlock_supervise_event is a multiple of 8 bytes.
> +			 *
> +			 * This is a variable length member, and the length
> +			 * including the terminating NULL(s) can be derived
> +			 * from hdr.length - offsetof(struct
> +			 * landlock_supervise_event, destname).
> +			 */
> +			char destname[];

I'd prefer to avoid sending file names for now.  I don't think it's
necessary, and that could encourage supervisors to filter access
according to names.

> +		};
> +		struct {
> +			__u16 port;
> +		};
> +	};
> +};
> +

[...]
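
For reference, the length rule in the @destname comment quoted above can
be sketched from the reader's side as follows.  This is only an
illustration of the layout proposed in this RFC, not an existing UAPI:
the sketch_* names are hypothetical, the mirror struct covers only the
filesystem variant of the union, and file names may well be dropped from
the event as suggested in the review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space mirror of the proposed event header. */
struct sketch_event_hdr {
	uint16_t type;
	uint16_t length; /* Length of the whole event, header included. */
	uint32_t cookie;
};

/* Hypothetical mirror of the filesystem variant of the event. */
struct sketch_fs_event {
	struct sketch_event_hdr hdr;
	uint64_t access_request;
	int32_t accessor;
	int32_t fd1;
	int32_t fd2;
	char destname[]; /* NUL-terminated, padded to a multiple of 8. */
};

/*
 * Returns a pointer to the NUL-terminated destname inside @buf and
 * stores its length in @name_len, or returns NULL if the event does not
 * follow the documented layout.
 */
static const char *sketch_event_destname(const void *buf, size_t buf_len,
					 size_t *name_len)
{
	const size_t name_off = offsetof(struct sketch_fs_event, destname);
	struct sketch_event_hdr hdr;
	const char *name;
	size_t room;

	assert(name_len != NULL);
	if (buf_len < sizeof(hdr))
		return NULL;
	/* Copies the header to avoid unaligned accesses on @buf. */
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.length > buf_len || hdr.length % 8 != 0 ||
	    hdr.length <= name_off)
		return NULL;
	/* Bytes available for the name, terminating NUL(s) included. */
	room = hdr.length - name_off;
	name = (const char *)buf + name_off;
	if (!memchr(name, '\0', room))
		return NULL;
	*name_len = strlen(name);
	return name;
}
```

An empty name (length 0) would mean that fd1 points to the target file
itself, following the @fd1 comment in the patch.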

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 6/9] Creating supervisor events for filesystem operations
  2025-03-04  1:13 ` [RFC PATCH 6/9] Creating supervisor events for filesystem operations Tingmao Wang
@ 2025-03-04 19:50   ` Mickaël Salaün
  2025-03-10  0:39     ` Tingmao Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-04 19:50 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen

On Tue, Mar 04, 2025 at 01:13:02AM +0000, Tingmao Wang wrote:
> NOTE from future me: This implementation which waits for user response
> while blocking inside the current security_path_* hooks is problematic due
> to taking exclusive inode lock on the parent directory, and while I have a
> proposal for a solution, outlined below, I haven't managed to include the
> code for that in this version of the patch. Thus for this commit in
> particular I'm probably more looking for suggestions on the approach
> rather than code review.  Please see the TODO section at the end of this
> message before reviewing this patch.

This is good for an RFC.

> 
> ----
> 
> This patch implements a proof-of-concept for modifying the current
> landlock LSM hooks to send supervisor events and wait for responses, when
> a supervised layer is involved.
> 
> In this design, access requests which would end up being denied by other
> non-supervised landlock layers (or which would fail the normal inode
> permission check anyways - but this is currently TODO, I only thought of
> this afterwards) are denied straight away to avoid pointless supervisor
> notifications.

Yes, only denied access should be forwarded to the supervisor.  In
another patch series we could enable the supervisor to update its layer
with new rules as well.

The audit patch series should help to properly identify which layer
denied a request, and to only use the related supervisor.

> 
> Currently current_check_access_path only gets the path of the parent
> directory for create/remove operations, which is not enough for what we
> want to pass to the supervisor.  Therefore we extend it by passing in any
> relevant child dentry (but see TODO below - this may not be possible with
> the proper implementation).

Hmm, I'm not sure this kind of information is required (this is not
implemented for the audit support).  The supervisor should be fine
getting only which access is missing, right?

> 
> This initial implementation doesn't handle links and renames, and for now
> these operations behave as if no supervisor is present (and thus will be
> denied, unless it is allowed by the layer rules).  Also note that we can
> get spurious create requests if the program tries to O_CREAT open an
> existing file that exists but not in the dcache (from my understanding).
> 
> Event IDs (referred to as an opaque cookie in the uapi) are currently
> generated with a simple `next_event_id++`.  I considered using e.g. xarray
> but decided to not for this PoC. Suggestions welcome. (Note that we have
> to design our own event id even if we use an extension of fanotify, as
> fanotify uses a file descriptor to identify events, which is not generic
> enough for us)

That's another noticeable difference with fanotify.  You can add it to
the next cover letter.

> 
> ----
> 
> TODO:
> 
> When testing this I realized that doing it this way means that for the
> create/delete case, we end up holding an exclusive inode lock on the
> parent directory while waiting for supervisor to respond (see namei.c -
> security_path_mknod is called in may_o_create <- lookup_open which has an
> exclusive lock if O_CREAT is passed), which will prevent all other tasks
> from accessing that directory (regardless of whether or not they are under
> landlock).

Could we use a landlock_object to identify this inode instead?

> 
> This is clearly unacceptable, but since landlock (and also this extension)
> doesn't actually need a dentry for the child (which is allocated after the
> inode lock), I think this is not unsolvable.  I'm experimenting with
> creating a new LSM hook, something like security_pathname_mknod
> (suggestions welcome), which will be called after we looked up the dentry
> for the parent (to prevent racing symlinks TOCTOU), but before we take the
> lock for it.  Such a hook can still take as argument the parent dentry,
> plus name of the child (instead of a struct path for it).
> 
> Suggestions for alternative approaches are definitely welcome!
> 
> Signed-off-by: Tingmao Wang <m@maowtm.org>
> ---
>  security/landlock/fs.c        | 134 ++++++++++++++++++++++++++++++++--
>  security/landlock/supervise.c | 122 +++++++++++++++++++++++++++++++
>  security/landlock/supervise.h | 106 ++++++++++++++++++++++++++-
>  3 files changed, 354 insertions(+), 8 deletions(-)
> 

[...]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 4/9] User-space API for creating a supervisor-fd
  2025-03-04  1:13 ` [RFC PATCH 4/9] User-space API for creating a supervisor-fd Tingmao Wang
@ 2025-03-05 16:09   ` Mickaël Salaün
  2025-03-10  0:41     ` Tingmao Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-05 16:09 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Jann Horn, Andy Lutomirski

On Tue, Mar 04, 2025 at 01:13:00AM +0000, Tingmao Wang wrote:
> We allow the user to pass in an additional flag to landlock_create_ruleset
> which will make the ruleset operate in "supervise" mode, with a supervisor
> attached. We create additional space in the landlock_ruleset_attr
> structure to pass the newly created supervisor fd back to user-space.
> 
> The intention, while not implemented yet, is that the user-space will read
> events from this fd and write responses back to it.
> 
> Note: need to investigate if fd clone on fork() is handled correctly, but
> should be fine if it shares the struct file. We might also want to let the
> user customize the flags on this fd, so that they can request no
> O_CLOEXEC.
> 
> NOTE: despite this patch having a new uapi, I'm still very open to e.g.
> re-using fanotify stuff instead (if that makes sense in the end). This is
> just a PoC.

The main security risk of this feature is for this FD to leak and be
used by a sandboxed process to bypass all its restrictions.  This should
be highlighted in the UAPI documentation.

> 
> Signed-off-by: Tingmao Wang <m@maowtm.org>
> ---
>  include/uapi/linux/landlock.h |  10 ++++
>  security/landlock/syscalls.c  | 102 +++++++++++++++++++++++++++++-----
>  2 files changed, 98 insertions(+), 14 deletions(-)
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index e1d2c27533b4..7bc1eb4859fb 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -50,6 +50,15 @@ struct landlock_ruleset_attr {
>  	 * resources (e.g. IPCs).
>  	 */
>  	__u64 scoped;
> +	/**
> +	 * @supervisor_fd: Placeholder to store the supervisor file
> +	 * descriptor when %LANDLOCK_CREATE_RULESET_SUPERVISE is set.
> +	 */
> +	__s32 supervisor_fd;

This interface would require ruleset_attr to become updatable by the
kernel, which might be OK in theory but requires updating the current
syscall wrapper signature (see the sandboxer.c change).  It also creates
an FD which might not be useful (e.g. if an error occurs before the
actual enforcement).

I see a few alternatives.  We could just use/extend the ruleset FD
instead of creating a new one, but because leaking current rulesets is
not currently a security risk, we should be careful to not change that.

Another approach, similar to seccomp unotify, is to get a
"[landlock-domain]" FD returned by landlock_restrict_self(2) when a
new LANDLOCK_RESTRICT_SELF_DOMAIN_FD flag is set.  This FD would be a
reference to the newly created domain, which is more specific than the
ruleset used to create this domain (and that can be used to create
other domains).  This domain FD could be used for introspection (i.e.
to get read-only properties such as domain ID), but being able to
directly supervise the referenced domain only with this FD would be a
risk that we should limit.

What we can do is to implement an IOCTL command for such domain FD that
would return a supervisor FD (if the LANDLOCK_RESTRICT_SELF_SUPERVISED
flag was also set).  The key point is to check (one time) that the
process calling this IOCTL is not restricted by the related domain (see
the scope helpers).

Relying on IOCTL commands (for all these FD types) instead of read/write
operations should also limit the risk of these FDs being misused through
a confused deputy attack (because such IOCTL command would convey an
explicit intent):
https://docs.kernel.org/security/credentials.html#open-file-credentials
https://lore.kernel.org/all/CAG48ez0HW-nScxn4G5p8UHtYy=T435ZkF3Tb1ARTyyijt_cNEg@mail.gmail.com/
We should get inspiration from seccomp unotify for this too:
https://lore.kernel.org/all/20181209182414.30862-1-tycho@tycho.ws/

> +	/**
> +	 * @pad: Unused, must be zero.
> +	 */
> +	__u32 pad;

In this case we should pack the struct instead.

>  };
>  
>  /*
> @@ -60,6 +69,7 @@ struct landlock_ruleset_attr {
>   */
>  /* clang-format off */
>  #define LANDLOCK_CREATE_RULESET_VERSION			(1U << 0)
> +#define LANDLOCK_CREATE_RULESET_SUPERVISE		(1U << 1)
>  /* clang-format on */
>  
>  /**

[...]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-04 19:48 ` [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Mickaël Salaün
@ 2025-03-06  2:57   ` Tingmao Wang
  2025-03-06 17:07     ` Amir Goldstein
  2025-03-08 18:57     ` Mickaël Salaün
  0 siblings, 2 replies; 47+ messages in thread
From: Tingmao Wang @ 2025-03-06  2:57 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Christian Brauner, Kees Cook, Jeff Xu, Mikhail Ivanov,
	Francis Laniel, Matthieu Buffet

On 3/4/25 19:48, Mickaël Salaün wrote:

> Thanks for this RFC, this is very promising!

Hi Mickaël - thanks for the prompt review and for your support! I have 
read your replies and have some thoughts already, but I kept getting 
distracted by other stuff and so haven't had much chance to express 
them.  I will address some first today and some more over the weekend.

> Another interesting use case is to trace programs and get an
> unprivileged "permissive" mode to quickly create sandbox policies.

Yes that would also be a good use. I thought of this initially but was 
thinking "I guess you can always do that with audit" but if we have 
landlock supervise maybe that would be an easier thing for tools to 
build upon...?

> As discussed, I was thinking about whether or not it would be possible
> to use the fanotify interface (e.g. fanotify_init(), fanotify FD...),
> but looking at your code, I think it would mostly increase complexity.
> There are also the issue with the Landlock semantic (e.g. access rights)
> which does not map 1:1 to the fanotify one.  A last thing is that
> fanotify is deeply tied to the VFS.  So, unless someone has a better
> idea, let's continue with your approach.

That sounds sensible - I will keep going with the current direction of a 
landlock-specific uapi. (happy to revisit should other people have 
suggestions)

> Android's SDCardFS is another example of such use.

Interesting - seems like it was deprecated for reasons unrelated to 
security though.

> One of the main suggestion would be to align with the audit patch series
> semantic and the defined "blockers":
> https://lore.kernel.org/all/20250131163059.1139617-1-mic@digikod.net/
> I'll send another series soon.

I will have a read of the existing audit series - are you planning 
significant changes to it in the next one?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 2/9] Refactor per-layer information in rulesets and rules
  2025-03-04 19:49   ` Mickaël Salaün
@ 2025-03-06  2:58     ` Tingmao Wang
  2025-03-08 18:57       ` Mickaël Salaün
  0 siblings, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-06  2:58 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen

On 3/4/25 19:49, Mickaël Salaün wrote:

> We could indeed have a pointer in the  landlock_hierarchy and have a
> dedicated bit in each layer's access_masks to indicate that this layer
> is supervised.  This should simplify the whole patch series.

That seems sensible.  I did consider using the landlock_hierarchy, but 
chose the current way as it initially seemed more sensible.  On second 
thought, the current way means that we have to carefully increment all 
the refcounts on domain merge etc.  If we instead store the supervisor 
pointer in the hierarchy and have an extra bit in struct access_masks, 
then we can quickly determine if supervisors are involved without 
effectively walking a linked list, which is nice.
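
A minimal sketch of that check, assuming one extra "supervised" bit per
layer.  The sketch_* names and the bit widths are made up for
illustration and do not match the real struct access_masks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct access_masks with one extra bit. */
struct sketch_access_masks {
	uint32_t fs : 16;
	uint32_t net : 2;
	uint32_t scope : 2;
	/* Set when a supervisor is attached to this layer. */
	uint32_t supervised : 1;
};

/*
 * Scans the per-layer masks stored next to each other in the domain,
 * so no pointer chasing through the hierarchy is needed to know
 * whether any supervisor is involved.
 */
static bool sketch_any_supervised(const struct sketch_access_masks *masks,
				  size_t num_layers)
{
	assert(masks != NULL || num_layers == 0);
	for (size_t i = 0; i < num_layers; i++) {
		if (masks[i].supervised)
			return true;
	}
	return false;
}
```

The supervisor pointer itself would then only be looked up in the
hierarchy once this cheap check says a supervised layer exists.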

Actually, just to check, is the reason why we have the access_masks FAM 
in the ruleset purely for performance? Initially I wasn't sure if each 
layer corresponds 1-to-1 with a landlock_hierarchy, since otherwise it 
seemed to me you could just put the access mask in the hierarchy too.
In other words, is it right to assume that, if a domain has 3 layers, 
for example, then domain->hierarchy corresponds to the third layer, 
domain->hierarchy->parent corresponds to the second, and
d->h->parent->parent would be the first layer's hierarchy?
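
To make the question concrete, here is what the lookup would look like
if that 1-to-1 assumption holds.  This is only a sketch of the
assumption being asked about, not confirmed behaviour, and the sketch_*
names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror keeping only the parent link of the hierarchy. */
struct sketch_hierarchy {
	struct sketch_hierarchy *parent;
};

/*
 * Returns the hierarchy node for the 1-based layer @level, assuming
 * that @newest (i.e. domain->hierarchy) belongs to layer @num_layers
 * and that each ->parent step goes down exactly one layer.
 */
static const struct sketch_hierarchy *
sketch_hierarchy_for_layer(const struct sketch_hierarchy *newest,
			   size_t num_layers, size_t level)
{
	assert(newest != NULL || num_layers == 0);
	if (level == 0 || level > num_layers)
		return NULL;
	for (size_t steps = num_layers - level; steps > 0 && newest; steps--)
		newest = newest->parent;
	return newest;
}
```

With 3 layers, level 3 takes zero parent steps, level 2 takes one, and
level 1 takes two, which is the mapping described above.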

> 
>>
>> This patch doesn't make any functional changes nor add any
>> supervise specific stuff.  It is purely to pave the way for
>> future patches.
>>
>> Signed-off-by: Tingmao Wang <m@maowtm.org>
>> ---
>>   security/landlock/ruleset.c  | 29 +++++++++---------
>>   security/landlock/ruleset.h  | 59 ++++++++++++++++++++++--------------
>>   security/landlock/syscalls.c |  2 +-
>>   3 files changed, 52 insertions(+), 38 deletions(-)
>>
>> diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
>> index 69742467a0cf..2cc6f7c5eb1b 100644
>> --- a/security/landlock/ruleset.c
>> +++ b/security/landlock/ruleset.c
>> @@ -31,9 +31,8 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
>>   {
>>   	struct landlock_ruleset *new_ruleset;
>>   
>> -	new_ruleset =
>> -		kzalloc(struct_size(new_ruleset, access_masks, num_layers),
>> -			GFP_KERNEL_ACCOUNT);
>> +	new_ruleset = kzalloc(struct_size(new_ruleset, layer_stack, num_layers),
>> +			      GFP_KERNEL_ACCOUNT);
>>   	if (!new_ruleset)
>>   		return ERR_PTR(-ENOMEM);
>>   	refcount_set(&new_ruleset->usage, 1);
>> @@ -104,8 +103,9 @@ static bool is_object_pointer(const enum landlock_key_type key_type)
>>   
>>   static struct landlock_rule *
>>   create_rule(const struct landlock_id id,
>> -	    const struct landlock_layer (*const layers)[], const u32 num_layers,
>> -	    const struct landlock_layer *const new_layer)
>> +	    const struct landlock_rule_layer (*const layers)[],
>> +	    const u32 num_layers,
>> +	    const struct landlock_rule_layer *const new_layer)
>>   {
>>   	struct landlock_rule *new_rule;
>>   	u32 new_num_layers;
>> @@ -201,7 +201,7 @@ static void build_check_ruleset(void)
>>    */
>>   static int insert_rule(struct landlock_ruleset *const ruleset,
>>   		       const struct landlock_id id,
>> -		       const struct landlock_layer (*const layers)[],
>> +		       const struct landlock_rule_layer (*const layers)[],
>>   		       const size_t num_layers)
>>   {
>>   	struct rb_node **walker_node;
>> @@ -284,7 +284,7 @@ static int insert_rule(struct landlock_ruleset *const ruleset,
>>   
>>   static void build_check_layer(void)
>>   {
>> -	const struct landlock_layer layer = {
>> +	const struct landlock_rule_layer layer = {
> 
> It's not useful to rename this struct.
> 
>>   		.level = ~0,
>>   		.access = ~0,
>>   	};
>> @@ -299,7 +299,7 @@ int landlock_insert_rule(struct landlock_ruleset *const ruleset,
>>   			 const struct landlock_id id,
>>   			 const access_mask_t access)
>>   {
>> -	struct landlock_layer layers[] = { {
>> +	struct landlock_rule_layer layers[] = { {
>>   		.access = access,
>>   		/* When @level is zero, insert_rule() extends @ruleset. */
>>   		.level = 0,
>> @@ -344,7 +344,7 @@ static int merge_tree(struct landlock_ruleset *const dst,
>>   	/* Merges the @src tree. */
>>   	rbtree_postorder_for_each_entry_safe(walker_rule, next_rule, src_root,
>>   					     node) {
>> -		struct landlock_layer layers[] = { {
>> +		struct landlock_rule_layer layers[] = { {
>>   			.level = dst->num_layers,
>>   		} };
>>   		const struct landlock_id id = {
>> @@ -389,8 +389,9 @@ static int merge_ruleset(struct landlock_ruleset *const dst,
>>   		err = -EINVAL;
>>   		goto out_unlock;
>>   	}
>> -	dst->access_masks[dst->num_layers - 1] =
>> -		landlock_upgrade_handled_access_masks(src->access_masks[0]);
>> +	dst->layer_stack[dst->num_layers - 1].access_masks =
>> +		landlock_upgrade_handled_access_masks(
>> +			src->layer_stack[0].access_masks);
>>   
>>   	/* Merges the @src inode tree. */
>>   	err = merge_tree(dst, src, LANDLOCK_KEY_INODE);
>> @@ -472,8 +473,8 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
>>   		goto out_unlock;
>>   	}
>>   	/* Copies the parent layer stack and leaves a space for the new layer. */
>> -	memcpy(child->access_masks, parent->access_masks,
>> -	       flex_array_size(parent, access_masks, parent->num_layers));
>> +	memcpy(child->layer_stack, parent->layer_stack,
>> +	       flex_array_size(parent, layer_stack, parent->num_layers));
>>   
>>   	if (WARN_ON_ONCE(!parent->hierarchy)) {
>>   		err = -EINVAL;
>> @@ -644,7 +645,7 @@ bool landlock_unmask_layers(const struct landlock_rule *const rule,
>>   	 * E.g. /a/b <execute> + /a <read> => /a/b <execute + read>
>>   	 */
>>   	for (layer_level = 0; layer_level < rule->num_layers; layer_level++) {
>> -		const struct landlock_layer *const layer =
>> +		const struct landlock_rule_layer *const layer =
>>   			&rule->layers[layer_level];
>>   		const layer_mask_t layer_bit = BIT_ULL(layer->level - 1);
>>   		const unsigned long access_req = access_request;
>> diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
>> index 52f4f0af6ab0..a2605959f733 100644
>> --- a/security/landlock/ruleset.h
>> +++ b/security/landlock/ruleset.h
>> @@ -21,9 +21,10 @@
>>   #include "object.h"
>>   
>>   /**
>> - * struct landlock_layer - Access rights for a given layer
>> + * struct landlock_rule_layer - Stores the access rights for a
>> + * given layer in a rule.
>>    */
>> -struct landlock_layer {
>> +struct landlock_rule_layer {
>>   	/**
>>   	 * @level: Position of this layer in the layer stack.
>>   	 */
>> @@ -102,10 +103,11 @@ struct landlock_rule {
>>   	 */
>>   	u32 num_layers;
>>   	/**
>> -	 * @layers: Stack of layers, from the latest to the newest, implemented
>> -	 * as a flexible array member (FAM).
>> +	 * @layers: Stack of layers, from the latest to the newest,
>> +	 * implemented as a flexible array member (FAM). Only
>> +	 * contains layers that has a rule for this object.
>>   	 */
>> -	struct landlock_layer layers[] __counted_by(num_layers);
>> +	struct landlock_rule_layer layers[] __counted_by(num_layers);
>>   };
>>   
>>   /**
>> @@ -124,6 +126,18 @@ struct landlock_hierarchy {
>>   	refcount_t usage;
>>   };
>>   
>> +/**
>> + * struct landlock_ruleset_layer - Store per-layer information
>> + * within a domain (or a non-merged ruleset)
>> + */
>> +struct landlock_ruleset_layer {
>> +	/**
>> +	 * @access_masks: Contains the subset of filesystem and
>> +	 * network actions that are restricted by a layer.
>> +	 */
>> +	struct access_masks access_masks;
>> +};
>> +
>>   /**
>>    * struct landlock_ruleset - Landlock ruleset
>>    *
>> @@ -187,18 +201,17 @@ struct landlock_ruleset {
>>   			 */
>>   			u32 num_layers;
>>   			/**
>> -			 * @access_masks: Contains the subset of filesystem and
>> -			 * network actions that are restricted by a ruleset.
>> -			 * A domain saves all layers of merged rulesets in a
>> -			 * stack (FAM), starting from the first layer to the
>> -			 * last one.  These layers are used when merging
>> -			 * rulesets, for user space backward compatibility
>> -			 * (i.e. future-proof), and to properly handle merged
>> -			 * rulesets without overlapping access rights.  These
>> -			 * layers are set once and never changed for the
>> -			 * lifetime of the ruleset.
>> +			 * @layer_stack: A domain saves all layers of merged
>> +			 * rulesets in a stack (FAM), starting from the first
>> +			 * layer to the last one.  These layers are used when
>> +			 * merging rulesets, for user space backward
>> +			 * compatibility (i.e. future-proof), and to properly
>> +			 * handle merged rulesets without overlapping access
>> +			 * rights.  These layers are set once and never
>> +			 * changed for the lifetime of the ruleset.
>>   			 */
>> -			struct access_masks access_masks[];
>> +			struct landlock_ruleset_layer
>> +				layer_stack[] __counted_by(num_layers);
>>   		};
>>   	};
>>   };
>> @@ -248,7 +261,7 @@ landlock_union_access_masks(const struct landlock_ruleset *const domain)
>>   
>>   	for (layer_level = 0; layer_level < domain->num_layers; layer_level++) {
>>   		union access_masks_all layer = {
>> -			.masks = domain->access_masks[layer_level],
>> +			.masks = domain->layer_stack[layer_level].access_masks,
>>   		};
>>   
>>   		matches.all |= layer.all;
>> @@ -296,7 +309,7 @@ landlock_add_fs_access_mask(struct landlock_ruleset *const ruleset,
>>   
>>   	/* Should already be checked in sys_landlock_create_ruleset(). */
>>   	WARN_ON_ONCE(fs_access_mask != fs_mask);
>> -	ruleset->access_masks[layer_level].fs |= fs_mask;
>> +	ruleset->layer_stack[layer_level].access_masks.fs |= fs_mask;
>>   }
>>   
>>   static inline void
>> @@ -308,7 +321,7 @@ landlock_add_net_access_mask(struct landlock_ruleset *const ruleset,
>>   
>>   	/* Should already be checked in sys_landlock_create_ruleset(). */
>>   	WARN_ON_ONCE(net_access_mask != net_mask);
>> -	ruleset->access_masks[layer_level].net |= net_mask;
>> +	ruleset->layer_stack[layer_level].access_masks.net |= net_mask;
>>   }
>>   
>>   static inline void
>> @@ -319,7 +332,7 @@ landlock_add_scope_mask(struct landlock_ruleset *const ruleset,
>>   
>>   	/* Should already be checked in sys_landlock_create_ruleset(). */
>>   	WARN_ON_ONCE(scope_mask != mask);
>> -	ruleset->access_masks[layer_level].scope |= mask;
>> +	ruleset->layer_stack[layer_level].access_masks.scope |= mask;
>>   }
>>   
>>   static inline access_mask_t
>> @@ -327,7 +340,7 @@ landlock_get_fs_access_mask(const struct landlock_ruleset *const ruleset,
>>   			    const u16 layer_level)
>>   {
>>   	/* Handles all initially denied by default access rights. */
>> -	return ruleset->access_masks[layer_level].fs |
>> +	return ruleset->layer_stack[layer_level].access_masks.fs |
>>   	       _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
>>   }
>>   
>> @@ -335,14 +348,14 @@ static inline access_mask_t
>>   landlock_get_net_access_mask(const struct landlock_ruleset *const ruleset,
>>   			     const u16 layer_level)
>>   {
>> -	return ruleset->access_masks[layer_level].net;
>> +	return ruleset->layer_stack[layer_level].access_masks.net;
>>   }
>>   
>>   static inline access_mask_t
>>   landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
>>   			const u16 layer_level)
>>   {
>> -	return ruleset->access_masks[layer_level].scope;
>> +	return ruleset->layer_stack[layer_level].access_masks.scope;
>>   }
>>   
>>   bool landlock_unmask_layers(const struct landlock_rule *const rule,
>> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
>> index a9760d252fc2..ead9b68168ad 100644
>> --- a/security/landlock/syscalls.c
>> +++ b/security/landlock/syscalls.c
>> @@ -313,7 +313,7 @@ static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
>>   		return -ENOMSG;
>>   
>>   	/* Checks that allowed_access matches the @ruleset constraints. */
>> -	mask = ruleset->access_masks[0].fs;
>> +	mask = landlock_get_fs_access_mask(ruleset, 0);
>>   	if ((path_beneath_attr.allowed_access | mask) != mask)
>>   		return -EINVAL;
>>   
>> -- 
>> 2.39.5
>>
>>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 5/9] Define user structure for events and responses.
  2025-03-04 19:49   ` Mickaël Salaün
@ 2025-03-06  3:05     ` Tingmao Wang
  2025-03-08 19:07       ` Mickaël Salaün
  2025-03-10  0:39       ` Tingmao Wang
  0 siblings, 2 replies; 47+ messages in thread
From: Tingmao Wang @ 2025-03-06  3:05 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen

On 3/4/25 19:49, Mickaël Salaün wrote:
> On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
[...]
>> +	/**
>> +	 * @cookie: Opaque identifier to be included in the response.
>> +	 */
>> +	__u32 cookie;
> 
> I guess we could use a __u64 index counter per layer instead.  That
> would also help to order requests if they are treated by different
> supervisor threads.

I don't immediately see a use for ordering requests (if we get more than 
one event at once, they are coming from different threads anyway so 
there can't be any dependencies between them, and the supervisor threads 
can use timestamps), but I think making it a __u64 is probably a good 
idea regardless, as it means we don't have to do some sort of ID 
allocation, and can just increment an atomic.
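
As a rough userspace illustration of the "just increment an atomic" idea (the struct and function names below are hypothetical and not taken from the patch; in-kernel code would presumably use atomic64_t rather than C11 atomics), a cookie generator could look something like this:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-supervisor state; the name and layout are illustrative. */
struct supervisor_state {
	_Atomic uint64_t next_cookie;
};

/*
 * Hands out the next cookie.  atomic_fetch_add returns the value held
 * before the addition, so concurrent callers never receive the same
 * cookie until the 64-bit counter wraps.
 */
static uint64_t supervisor_new_cookie(struct supervisor_state *sup)
{
	assert(sup != NULL);
	return atomic_fetch_add_explicit(&sup->next_cookie, 1,
					 memory_order_relaxed);
}
```

No ID allocator or free list is needed with this approach, and a 64-bit counter is unlikely to wrap in practice.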

>> +};
>> +
>> +struct landlock_supervise_event {
>> +	struct landlock_supervise_event_hdr hdr;
>> +	__u64 access_request;
>> +	__kernel_pid_t accessor;
>> +	union {
>> +		struct {
>> +			/**
>> +			 * @fd1: An open file descriptor for the file (open,
>> +			 * delete, execute, link, readdir, rename, truncate),
>> +			 * or the parent directory (for create operations
>> +			 * targeting its child) being accessed.  Must be
>> +			 * closed by the reader.
>> +			 *
>> +			 * If this points to a parent directory, @destname
>> +			 * will contain the target filename. If @destname is
>> +			 * empty, this points to the target file.
>> +			 */
>> +			int fd1;
>> +			/**
>> +			 * @fd2: For link or rename requests, a second file
>> +			 * descriptor for the target parent directory.  Must
>> +			 * be closed by the reader.  @destname contains the
>> +			 * destination filename.  This field is -1 if not
>> +			 * used.
>> +			 */
>> +			int fd2;
> 
> Can we just use one FD but identify the requested access instead and
> send one event for each, like for the audit patch series?

I haven't managed to read or test out the audit patch yet (I will do), 
but I think having the ability to specifically tell whether the child is 
trying to move / rename / create a hard link of an existing file, and 
what it's trying to use as destination, might be useful (either for 
security, or purely for UX)?

For example, imagine something trying to link or move ~/.ssh/id_ecdsa to 
/tmp/innocent-tmp-file then read the latter. The supervisor can warn the 
user on the initial link attempt, and the shenanigan will probably be 
stopped there (although still, being able to say "[program] wants to 
link ~/.ssh/id_ecdsa to /tmp/innocent-tmp-file" seems better than just 
"[program] wants to create a link for ~/.ssh/id_ecdsa"), but even if 
somehow this ends up allowed, later on for the read request it could say 
something like

	[program] wants to read /tmp/innocent-tmp-file
	    (previously moved from ~/.ssh/id_ecdsa)

Maybe this is a bit silly, but there might be other use cases for 
knowing the exact details of a rename/link request, either for 
at-the-time decision making, or tracking stuff for future requests?

I will try out the audit patch to see how things like these appear in 
the log before commenting further on this. Maybe there is a way to 
achieve this while still simplifying the event structure?

> 
>> +			/**
>> +			 * @destname: A filename for a file creation target.
>> +			 *
>> +			 * If either of fd1 or fd2 points to a parent
>> +			 * directory rather than the target file, this is the
>> +			 * NULL-terminated name of the file that will be
>> +			 * newly created.
>> +			 *
>> +			 * Counting the NULL terminator, this field will
>> +			 * contain one or more NULL padding at the end so
>> +			 * that the length of the whole struct
>> +			 * landlock_supervise_event is a multiple of 8 bytes.
>> +			 *
>> +			 * This is a variable length member, and the length
>> +			 * including the terminating NULL(s) can be derived
>> +			 * from hdr.length - offsetof(struct
>> +			 * landlock_supervise_event, destname).
>> +			 */
>> +			char destname[];
> 
> I'd prefer to avoid sending file names for now.  I don't think it's
> necessary, and that could encourage supervisors to filter access
> according to names.
>

This is also motivated by the potential UX I'm thinking of. For example, 
if a newly installed application tries to create ~/.app-name, it will be 
much more reassuring and convenient to the user if we can show something 
like

	[program] wants to mkdir ~/.app-name. Allow this and future
	access to the new directory?

rather than just "[program] wants to mkdir under ~". (The "Allow this
and future access to the new directory" bit is made possible by the
supervisor knowing the name of the file/directory being created, which it
can then remember or write out to a persistent profile, etc.)

Note that this is just the filename under the dir represented by fd - 
this isn't a path or anything that can be subject to symlink-related 
attacks, etc.  If a program calls e.g.
mkdirat or openat (dfd -> "/some/", pathname="dir/stuff", O_CREAT),
my understanding is that fd1 will point to /some/dir, and destname would
be "stuff".

Actually, in case your question is "why not send a fd to represent the 
newly created file, instead of sending the name" -- I'm not sure whether 
you can open even an O_PATH fd to a non-existent file.
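
For what it's worth, this can be checked from user space with a small helper (a sketch only; try_open_path is a made-up name, and the fallback O_PATH value is the asm-generic one, which is an assumption that does not hold on every architecture):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_PATH
/* Fallback from asm-generic/fcntl.h; some architectures use another value. */
#define O_PATH 010000000
#endif

/*
 * Tries to open an O_PATH fd for name relative to dirfd.
 * Returns 0 on success (the fd is closed again), or the errno on failure.
 */
static int try_open_path(int dirfd, const char *name)
{
	int fd = openat(dirfd, name, O_PATH | O_CLOEXEC);

	if (fd < 0)
		return errno;
	close(fd);
	return 0;
}
```

Calling this on a name that does not exist yet should return ENOENT, which is why the event has to carry the parent directory fd plus a name for create operations.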

>> +		};
>> +		struct {
>> +			__u16 port;
>> +		};
>> +	};
>> +};
>> +
> 
> [...]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-06  2:57   ` Tingmao Wang
@ 2025-03-06 17:07     ` Amir Goldstein
  2025-03-08 19:14       ` Mickaël Salaün
  2025-03-11  0:42       ` Tingmao Wang
  2025-03-08 18:57     ` Mickaël Salaün
  1 sibling, 2 replies; 47+ messages in thread
From: Amir Goldstein @ 2025-03-06 17:07 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Mickaël Salaün, Günther Noack, Jan Kara,
	linux-security-module, Matthew Bobrowski, linux-fsdevel,
	Tycho Andersen, Christian Brauner, Kees Cook, Jeff Xu,
	Mikhail Ivanov, Francis Laniel, Matthieu Buffet, Song Liu

On Thu, Mar 6, 2025 at 3:57 AM Tingmao Wang <m@maowtm.org> wrote:
>
> On 3/4/25 19:48, Mickaël Salaün wrote:
>
> > Thanks for this RFC, this is very promising!
>
> Hi Mickaël - thanks for the prompt review and for your support! I have
> read your replies and have some thoughts already, but I kept getting
> distracted by other stuff and so haven't had much chance to express
> them.  I will address some first today and some more over the weekend.
>
> > Another interesting use case is to trace programs and get an
> > unprivileged "permissive" mode to quickly create sandbox policies.
>
> Yes that would also be a good use. I thought of this initially but was
> thinking "I guess you can always do that with audit" but if we have
> landlock supervise maybe that would be an easier thing for tools to
> build upon...?
>
> > As discussed, I was thinking about whether or not it would be possible
> > to use the fanotify interface (e.g. fanotify_init(), fanotify FD...),
> > but looking at your code, I think it would mostly increase complexity.
> > There are also the issue with the Landlock semantic (e.g. access rights)
> > which does not map 1:1 to the fanotify one.  A last thing is that
> > fanotify is deeply tied to the VFS.  So, unless someone has a better
> > idea, let's continue with your approach.
>
> That sounds sensible - I will keep going with the current direction of a
> landlock-specific uapi. (happy to revisit should other people have
> suggestions)
>

W.r.t. sharing infrastructure with fanotify, I only looked briefly at your
patches and I have only a vague familiarity with landlock, so I cannot yet
form an opinion on whether this is a good idea, but I wanted to give you a
few more data points about fanotify that seem relevant.

1. There is already some intersection of fanotify and audit lsm via the
fanotify_response_info_audit_rule extension for permission
events, so it's kind of a precedent of using fanotify to aid an lsm

2. See this fan_pre_modify-wip branch [1] and specifically commit
  "fanotify: introduce directory entry pre-modify permission events"
I do have an intention to add create/delete/rename permission events.
Note that the new fsnotify hooks are added in the do_ vfs helpers, not very
far from the security_path_ lsm hooks, but not exactly in the same place,
because we want the fsnotify hooks to be called before taking vfs locks, to
allow the listener to write to the filesystem from event context.
These are different semantics from the plain ALLOW/DENY that you need;
therefore, our use cases could use the same hooks only if we move the
security_path_ hooks outside the vfs locks.

3. There is a recent attempt to add a BPF filter to fanotify [2],
which is driven, among other things, by the long-standing requirement
to add subtree filtering to fanotify watches.
The challenge with all the attempts to implement a subtree filter so far
is that adding vfs performance overhead for all the users in the system
is unacceptable.

IIUC, landlock rule set can already express a subtree filter (?),
so it is intriguing to know if there is room for some integration on this
aspect, but my guess is that landlock mostly uses subtree filter
after filtering by specific pids (?), so it can avoid the performance
overhead of a subtree filter on most of the users in the system.

Hope this information is useful.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fan_pre_modify-wip/
[2] https://lore.kernel.org/linux-fsdevel/20241122225958.1775625-1-song@kernel.org/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
                   ` (9 preceding siblings ...)
  2025-03-04 19:48 ` [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Mickaël Salaün
@ 2025-03-06 21:04 ` Jan Kara
  2025-03-08 19:15   ` Mickaël Salaün
  10 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2025-03-06 21:04 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Mickaël Salaün, Günther Noack, Jan Kara,
	linux-security-module, Amir Goldstein, Matthew Bobrowski,
	linux-fsdevel, Tycho Andersen

On Tue 04-03-25 01:12:56, Tingmao Wang wrote:
> Alternatives
> ------------
> 
> I have looked for existing ways to implement the proposed use cases (at
> least for FS access), and three main approaches stand out to me:
> 
> 1. Fanotify: there is already FAN_OPEN_PERM which waits for an allow/deny
> response from a fanotify listener.  However, it does not currently have
> the equivalent _PERM for file creation, deletion, rename and linking, and
> it is also not designed for unprivileged, process-scoped use (unlike
> landlock).

As Amir wrote, arbitration of creation / deletion / ... is not a problem in
principle for fanotify and we plan to go in that direction anyway for the HSM
use case. However, adjusting fanotify permission events for a per-process
scope and for unprivileged users is a fundamental difference from how
fanotify is designed to work (it watches filesystem objects, not processes
and the actions they perform), so I don't think that would be a great fit.
Also, I don't see fanotify expanding into the networking area as the concepts
are rather different there :).

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-06  2:57   ` Tingmao Wang
  2025-03-06 17:07     ` Amir Goldstein
@ 2025-03-08 18:57     ` Mickaël Salaün
  1 sibling, 0 replies; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-08 18:57 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Christian Brauner, Kees Cook, Jeff Xu, Mikhail Ivanov,
	Francis Laniel, Matthieu Buffet

On Thu, Mar 06, 2025 at 02:57:13AM +0000, Tingmao Wang wrote:
> On 3/4/25 19:48, Mickaël Salaün wrote:
> 
> > Thanks for this RFC, this is very promising!
> 
> Hi Mickaël - thanks for the prompt review and for your support! I have read
> your replies and have some thoughts already, but I kept getting distracted
> by other stuff and so haven't had much chance to express them.  I will
> address some first today and some more over the weekend.
> 
> > Another interesting use case is to trace programs and get an
> > unprivileged "permissive" mode to quickly create sandbox policies.
> 
> Yes that would also be a good use. I thought of this initially but was
> thinking "I guess you can always do that with audit" but if we have landlock
> supervise maybe that would be an easier thing for tools to build upon...?

Both approaches are valuable.  The supervisor one would be unprivileged,
could get access to more information including O_PATH FD's, but it is
much slower and relies on user space monitoring code.

> 
> > As discussed, I was thinking about whether or not it would be possible
> > to use the fanotify interface (e.g. fanotify_init(), fanotify FD...),
> > but looking at your code, I think it would mostly increase complexity.
> > There are also the issue with the Landlock semantic (e.g. access rights)
> > which does not map 1:1 to the fanotify one.  A last thing is that
> > fanotify is deeply tied to the VFS.  So, unless someone has a better
> > idea, let's continue with your approach.
> 
> That sounds sensible - I will keep going with the current direction of a
> landlock-specific uapi. (happy to revisit should other people have
> suggestions)
> 
> > Android's SDCardFS is another example of such use.
> 
> Interesting - seems like it was deprecated for reasons unrelated to security
> though.

Yes, Android first used FUSE, then SDCardFS, then FUSE again, but the
goal has been the same:
https://source.android.com/docs/core/storage/scoped

> 
> > One of the main suggestion would be to align with the audit patch series
> > semantic and the defined "blockers":
> > https://lore.kernel.org/all/20250131163059.1139617-1-mic@digikod.net/
> > I'll send another series soon.
> 
> I will have a read of the existing audit series - are you planning
> significant changes to it in the next one?

Not significant changes, but still some hook changes that might
require a rebase.  I just sent v6, you'll find it applied here:
https://web.git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git/log/?h=next

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 2/9] Refactor per-layer information in rulesets and rules
  2025-03-06  2:58     ` Tingmao Wang
@ 2025-03-08 18:57       ` Mickaël Salaün
  2025-03-10  0:38         ` Tingmao Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-08 18:57 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Christian Brauner, Kees Cook, Jann Horn, Andy Lutomirski

On Thu, Mar 06, 2025 at 02:58:01AM +0000, Tingmao Wang wrote:
> On 3/4/25 19:49, Mickaël Salaün wrote:
> 
> > We could indeed have a pointer in the  landlock_hierarchy and have a
> > dedicated bit in each layer's access_masks to indicate that this layer
> > is supervised.  This should simplify the whole patch series.
> 
> That seems sensible.  I did consider using the landlock_hierarchy, but chose
> the current way as it initially seemed more sensible, but on second thought
> this means that we have to carefully increment all the refcounts on domain
> merge etc.  On the other hand storing the supervisor pointer in the
> hierarchy, if we have an extra bit in struct access_masks then we can
> quickly determine if supervisors are involved without effectively walking a
> linked list, which is nice.

Right

> 
> Actually, just to check, is the reason why we have the access_masks FAM in
> the ruleset purely for performance? Initially I wasn't sure if each layer
> correspond 1-to-1 with landlock_hierarchy, since otherwise it seemed to me
> you could just put the access mask in the hierarchy too.

Yes, we could put the access rights in the hierarchy, but that would
involve walking through the hierarchy to know if Landlock should
actually handle (i.e. allow or potentially deny) an access request.
Landlock is designed in a way that makes legitimate/allowed access as
fast as possible (there is still room for improvement though).  In the
case of the supervisor feature, it should mainly be used to dynamically
allow accesses that are statically denied for one layer.  And because it
will require a round trip to user space anyway, the performance impact
of putting the supervisor pointer in landlock_hierarchy is negligible.

Initially the purpose of landlock_hierarchy was to be able to compare
domains (for ptrace and later scope restrictions), whereas the
landlock_ruleset is to store immutable data (without references) when
used as a domain.  With the audit feature, the landlock_hierarchy will
also contain domain's shared/mutable states and pointers that should
only be rarely accessed (i.e. only for denials).  So, in a nutshell
landlock_ruleset as a domain should stay minimal and improve data
locality to speed up allowed access requests.

We could decorrelate the current content of landlock_hierarchy from the
shared data, but I think it would only be meaningful if this data is
significant (see the landlock_details pointer in the audit patch series).

> In other words, is it right to assume that, if a domain has 3 layers, for
> example, then domain->hierarchy correspond to the third layer,
> domain->hierarchy->parent correspond to the second, and
> d->h->parent->parent would be the first layer's hierarchy?

Yes, that is always the case for a domain.
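
To make the trade-off concrete, here is a simplified sketch of that layer-to-hierarchy mapping (struct hierarchy is a stand-in that models only the parent link, and hierarchy_for_layer is a hypothetical helper, not something from the Landlock sources):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct landlock_hierarchy: only the parent link is modelled. */
struct hierarchy {
	struct hierarchy *parent;
};

/*
 * Returns the hierarchy node of the 1-based layer in a domain that has
 * num_layers layers and whose newest node is newest, or NULL if the layer
 * is out of range or the chain is shorter than expected.
 */
static struct hierarchy *hierarchy_for_layer(struct hierarchy *newest,
					     size_t num_layers, size_t layer)
{
	struct hierarchy *node = newest;
	size_t steps;

	if (layer == 0 || layer > num_layers)
		return NULL;
	/* The newest layer is the head; each parent hop is one layer older. */
	for (steps = num_layers - layer; steps > 0 && node; steps--)
		node = node->parent;
	return node;
}
```

Reaching an old layer costs one pointer hop per newer layer, whereas the per-layer FAM in the ruleset is a direct index.  This is the reason to keep the access masks in the ruleset and only put rarely used data, such as a supervisor pointer, in the hierarchy.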

If we create the supervisor FD with landlock_restrict_self(2), then
we'll not have to add a new pointer to landlock_ruleset, but only
directly to landlock_hierarchy.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 5/9] Define user structure for events and responses.
  2025-03-06  3:05     ` Tingmao Wang
@ 2025-03-08 19:07       ` Mickaël Salaün
  2025-03-10  0:39         ` Tingmao Wang
  2025-03-10  0:39       ` Tingmao Wang
  1 sibling, 1 reply; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-08 19:07 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Christian Brauner, Kees Cook, Jann Horn, Andy Lutomirski,
	Paul Moore, linux-api

On Thu, Mar 06, 2025 at 03:05:10AM +0000, Tingmao Wang wrote:
> On 3/4/25 19:49, Mickaël Salaün wrote:
> > On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
> [...]
> > > +	/**
> > > +	 * @cookie: Opaque identifier to be included in the response.
> > > +	 */
> > > +	__u32 cookie;
> > 
> > I guess we could use a __u64 index counter per layer instead.  That
> > would also help to order requests if they are treated by different
> > supervisor threads.
> 
> I don't immediately see a use for ordering requests (if we get more than one
> event at once, they are coming from different threads anyway so there can't
> be any dependencies between them, and the supervisor threads can use
> timestamps), but I think making it a __u64 is probably a good idea
> regardless, as it means we don't have to do some sort of ID allocation, and
> can just increment an atomic.

Indeed, we should follow the seccomp unotify approach with a random u64
incremented per request.

> 
> > > +};
> > > +
> > > +struct landlock_supervise_event {
> > > +	struct landlock_supervise_event_hdr hdr;
> > > +	__u64 access_request;
> > > +	__kernel_pid_t accessor;
> > > +	union {
> > > +		struct {
> > > +			/**
> > > +			 * @fd1: An open file descriptor for the file (open,
> > > +			 * delete, execute, link, readdir, rename, truncate),
> > > +			 * or the parent directory (for create operations
> > > +			 * targeting its child) being accessed.  Must be
> > > +			 * closed by the reader.
> > > +			 *
> > > +			 * If this points to a parent directory, @destname
> > > +			 * will contain the target filename. If @destname is
> > > +			 * empty, this points to the target file.
> > > +			 */
> > > +			int fd1;
> > > +			/**
> > > +			 * @fd2: For link or rename requests, a second file
> > > +			 * descriptor for the target parent directory.  Must
> > > +			 * be closed by the reader.  @destname contains the
> > > +			 * destination filename.  This field is -1 if not
> > > +			 * used.
> > > +			 */
> > > +			int fd2;
> > 
> > Can we just use one FD but identify the requested access instead and
> > send one event for each, like for the audit patch series?
> 
> I haven't managed to read or test out the audit patch yet (I will do), but I
> think having the ability to specifically tell whether the child is trying to
> move / rename / create a hard link of an existing file, and what it's trying
> to use as destination, might be useful (either for security, or purely for
> UX)?
> 
> For example, imagine something trying to link or move ~/.ssh/id_ecdsa to
> /tmp/innocent-tmp-file then read the latter. The supervisor can warn the
> user on the initial link attempt, and the shenanigan will probably be
> stopped there (although still, being able to say "[program] wants to link
> ~/.ssh/id_ecdsa to /tmp/innocent-tmp-file" seems better than just "[program]
> wants to create a link for ~/.ssh/id_ecdsa"), but even if somehow this ends
> up allowed, later on for the read request it could say something like
> 
> 	[program] wants to read /tmp/innocent-tmp-file
> 	    (previously moved from ~/.ssh/id_ecdsa)
> 
> Maybe this is a bit silly, but there might be other use cases for knowing
> the exact details of a rename/link request, either for at-the-time decision
> making, or tracking stuff for future requests?

This pattern looks like datagram packets.  I think we should use the
netlink attributes.  There were concerns about using a netlink socket for
the seccomp unotification though:
https://lore.kernel.org/all/CALCETrXeZZfVzXh7SwKhyB=+ySDk5fhrrdrXrcABsQ=JpQT7Tg@mail.gmail.com/

There are two main differences with seccomp unotify:
- the supervisor should be able to receive arbitrary-sized data (e.g.
  file name, not path);
- the supervisor should be able to receive file descriptors (instead of
  path).

Sockets are created with socket(2) whereas in our case we should only
get a supervisor FD (indirectly) through landlock_restrict_self(2),
which clearly identifies a kernel object.  Another issue would be to
deal with network namespaces, probably by creating a private one.
Sockets are powerful but we don't need all the routing complexity.
Moreover, we should only need a blocking communication channel to avoid
issues managing in-flight object references (transformed to FDs when
received).  That makes me think that a socket might not be the right
construct, but we can still rely on the NLA macros to define a proper
protocol with dynamically-sized events, received and sent with dedicated
IOCTL commands.

Netlink already provides a way to send a cookie, and
netlink_attribute_type defines the types we'll need, including string.

For instance, a link request/event could include 3 packets, one for each
of these properties:
1. the source file FD;
2. the destination directory FD;
3. the destination filename string.

This way we would avoid the union defined in this patch.
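
A rough userspace sketch of what such an attribute-encoded link event might look like follows.  The attribute type numbers and function names are made up for illustration, the header simply mirrors the layout of struct nlattr (16-bit length, 16-bit type, 4-byte padding), and the FD values are plain placeholders since how FDs would actually be delivered is still open:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr: len covers the header plus the payload. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

/* Attributes are padded to 4 bytes, like NLA_ALIGNTO. */
#define ATTR_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)

/* Hypothetical attribute types for a link request. */
enum {
	SUPERVISE_ATTR_SRC_FD = 1,
	SUPERVISE_ATTR_DST_DIR_FD = 2,
	SUPERVISE_ATTR_DST_NAME = 3,
};

/*
 * Appends one attribute at offset off.  Returns the offset just past the
 * padded attribute, or 0 if it does not fit.
 */
static size_t attr_put(uint8_t *buf, size_t cap, size_t off, uint16_t type,
		       const void *data, size_t data_len)
{
	struct attr_hdr hdr;
	size_t unpadded = sizeof(hdr) + data_len;
	size_t padded = ATTR_ALIGN(unpadded);

	if (unpadded > UINT16_MAX || off > cap || padded > cap - off)
		return 0;
	hdr.len = (uint16_t)unpadded;
	hdr.type = type;
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), data, data_len);
	memset(buf + off + unpadded, 0, padded - unpadded);
	return off + padded;
}

/*
 * Encodes a link request as three attributes: source file FD, destination
 * directory FD, and NUL-terminated destination file name.  Returns the
 * total encoded length, or 0 if buf is too small.
 */
static size_t encode_link_event(uint8_t *buf, size_t cap, int32_t src_fd,
				int32_t dst_dir_fd, const char *dst_name)
{
	size_t off;

	assert(buf != NULL && dst_name != NULL);
	off = attr_put(buf, cap, 0, SUPERVISE_ATTR_SRC_FD, &src_fd,
		       sizeof(src_fd));
	if (off)
		off = attr_put(buf, cap, off, SUPERVISE_ATTR_DST_DIR_FD,
			       &dst_dir_fd, sizeof(dst_dir_fd));
	if (off)
		off = attr_put(buf, cap, off, SUPERVISE_ATTR_DST_NAME,
			       dst_name, strlen(dst_name) + 1);
	return off;
}
```

Since each property is self-describing, an event that has no destination (e.g. a plain open) would simply omit the second and third attributes instead of leaving union members unused.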

There is still the question about receiving FDs though. It would be nice
to have a (set of?) dedicated IOCTL(s) to receive an FD, but I'm not
sure how this could be properly handled wrt NLA.

> 
> I will try out the audit patch to see how things like these appear in the
> log before commenting further on this. Maybe there is a way to achieve this
> while still simplifying the event structure?
> 
> > 
> > > +			/**
> > > +			 * @destname: A filename for a file creation target.
> > > +			 *
> > > +			 * If either of fd1 or fd2 points to a parent
> > > +			 * directory rather than the target file, this is the
> > > +			 * NULL-terminated name of the file that will be
> > > +			 * newly created.
> > > +			 *
> > > +			 * Counting the NULL terminator, this field will
> > > +			 * contain one or more NULL padding at the end so
> > > +			 * that the length of the whole struct
> > > +			 * landlock_supervise_event is a multiple of 8 bytes.
> > > +			 *
> > > +			 * This is a variable length member, and the length
> > > +			 * including the terminating NULL(s) can be derived
> > > +			 * from hdr.length - offsetof(struct
> > > +			 * landlock_supervise_event, destname).
> > > +			 */
> > > +			char destname[];
> > 
> > I'd prefer to avoid sending file names for now.  I don't think it's
> > necessary, and that could encourage supervisors to filter access
> > according to names.
> > 
> 
> This is also motivated by the potential UX I'm thinking of. For example, if
> a newly installed application tries to create ~/.app-name, it will be much
> more reassuring and convenient to the user if we can show something like
> 
> 	[program] wants to mkdir ~/.app-name. Allow this and future
> 	access to the new directory?
> 
> rather than just "[program] wants to mkdir under ~". (The "Allow this and
> future access to the new directory" bit is made possible by the supervisor
> knowing the name of the file/directory being created, and can remember them
> / write them out to a persistent profile etc)
> 
> Note that this is just the filename under the dir represented by fd - this
> isn't a path or anything that can be subject to symlink-related attacks,
> etc.  If a program calls e.g.
> mkdirat or openat (dfd -> "/some/", pathname="dir/stuff", O_CREAT)
> my understanding is that fd1 will point to /some/dir, and destname would be
> "stuff"

Right, this file name information would be useful.  In the case of
audit, the goal is to efficiently and asynchronously log security events
(and align with other LSM logs and related limitations), not primarily
to debug sandboxed apps nor to enrich this information for decision
making, but the supervisor feature would help here.  The patch message
should include this rationale.


> 
> Actually, in case your question is "why not send a fd to represent the newly
> created file, instead of sending the name" -- I'm not sure whether you can
> open even an O_PATH fd to a non-existent file.

That would not be possible because it would not exist yet, a file name
(not file path) is OK for this case.

> 
> > > +		};
> > > +		struct {
> > > +			__u16 port;
> > > +		};
> > > +	};
> > > +};
> > > +
> > 
> > [...]
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-06 17:07     ` Amir Goldstein
@ 2025-03-08 19:14       ` Mickaël Salaün
  2025-03-11  0:42       ` Tingmao Wang
  1 sibling, 0 replies; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-08 19:14 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Tingmao Wang, Günther Noack, Jan Kara, linux-security-module,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Christian Brauner, Kees Cook, Jeff Xu, Mikhail Ivanov,
	Francis Laniel, Matthieu Buffet, Song Liu

On Thu, Mar 06, 2025 at 06:07:35PM +0100, Amir Goldstein wrote:
> On Thu, Mar 6, 2025 at 3:57 AM Tingmao Wang <m@maowtm.org> wrote:
> >
> > On 3/4/25 19:48, Mickaël Salaün wrote:
> >
> > > Thanks for this RFC, this is very promising!
> >
> > Hi Mickaël - thanks for the prompt review and for your support! I have
> > read your replies and have some thoughts already, but I kept getting
> > distracted by other stuff and so haven't had much chance to express
> > them.  I will address some first today and some more over the weekend.
> >
> > > Another interesting use case is to trace programs and get an
> > > unprivileged "permissive" mode to quickly create sandbox policies.
> >
> > Yes that would also be a good use. I thought of this initially but was
> > thinking "I guess you can always do that with audit" but if we have
> > landlock supervise maybe that would be an easier thing for tools to
> > build upon...?
> >
> > > As discussed, I was thinking about whether or not it would be possible
> > > to use the fanotify interface (e.g. fanotify_init(), fanotify FD...),
> > > but looking at your code, I think it would mostly increase complexity.
> > > There are also the issue with the Landlock semantic (e.g. access rights)
> > > which does not map 1:1 to the fanotify one.  A last thing is that
> > > fanotify is deeply tied to the VFS.  So, unless someone has a better
> > > idea, let's continue with your approach.
> >
> > That sounds sensible - I will keep going with the current direction of a
> > landlock-specific uapi. (happy to revisit should other people have
> > suggestions)
> >
> 
> w.r.t sharing infrastructure with fanotify, I only looked briefly at
> your patches and I have only a vague familiarity with landlock, so I
> cannot yet form an opinion whether this is a good idea, but I wanted to
> give you a few more data points about fanotify that seem relevant.
> 
> 1. There is already some intersection of fanotify and audit lsm via the
> fanotify_response_info_audit_rule extension for permission
> events, so it's kind of a precedent of using fanotify to aid an lsm
> 
> 2. See this fan_pre_modify-wip branch [1] and specifically commit
>   "fanotify: introduce directory entry pre-modify permission events"
> I do have an intention to add create/delete/rename permission events.
> Note that the new fsnotify hooks are added to the do_ vfs helpers, not very
> far from the security_path_ lsm hooks, but not exactly in the same place
> because we want the fsnotify hooks to be called before taking vfs locks, to
> allow the listener to write to the filesystem from event context.
> These are different semantics than the plain ALLOW/DENY that you need;
> therefore, our use cases could use the same hooks only if we move the
> security_path_ hooks outside the vfs locks.
> 
> 3. There is a recent attempt to add BPF filter to fanotify [2]
> which is driven among other things from the long standing requirement
> to add subtree filtering to fanotify watches.
> The challenge with all the attempts to implement a subtree filter so far
> is that adding vfs performance overhead for all the users in the system
> is unacceptable.
> 
> IIUC, landlock rule set can already express a subtree filter (?),

Yes, Landlock uses a set of inode tags and a path walk to identify
hierarchies.

> so it is intriguing to know if there is room for some integration on this
> aspect, but my guess is that landlock mostly uses subtree filter
> after filtering by specific pids (?), so it can avoid the performance
> overhead of a subtree filter on most of the users in the system.

Landlock domains are indeed enforced for a set of specific tasks.

> 
> Hope this information is useful.

Yes, thanks for the explanations.  We should definitely take inspiration
from fanotify but I don't think it would be a good fit for Landlock: the
semantics of access rights are (and will be) different, and more
importantly the supervisor is not only meant for filesystem accesses.

> 
> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/linux/commits/fan_pre_modify-wip/
> [2] https://lore.kernel.org/linux-fsdevel/20241122225958.1775625-1-song@kernel.org/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-06 21:04 ` Jan Kara
@ 2025-03-08 19:15   ` Mickaël Salaün
  0 siblings, 0 replies; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-08 19:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tingmao Wang, Günther Noack, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen

On Thu, Mar 06, 2025 at 10:04:54PM +0100, Jan Kara wrote:
> On Tue 04-03-25 01:12:56, Tingmao Wang wrote:
> > Alternatives
> > ------------
> > 
> > I have looked for existing ways to implement the proposed use cases (at
> > least for FS access), and three main approaches stand out to me:
> > 
> > 1. Fanotify: there is already FAN_OPEN_PERM which waits for an allow/deny
> > response from a fanotify listener.  However, it does not currently have
> > the equivalent _PERM for file creation, deletion, rename and linking, and
> > it is also not designed for unprivileged, process-scoped use (unlike
> > landlock).
> 
> As Amir wrote, arbitration of creation / deletion / ... is not a problem in
> principle for fanotify and we plan to go in that direction anyway for the HSM
> use case. However, adjusting fanotify permission events for a per-process
> scope and for unprivileged users is a fundamental difference from how
> fanotify is designed to work (it watches filesystem objects, not processes
> and actions they do) and so I don't think that would be a great fit. Also I
> don't see fanotify expanding in the networking area as the concepts are
> rather different there :).

Yes, I agree.  We should take inspiration from the fanotify interface
though.

> 
> 								Honza
> 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 2/9] Refactor per-layer information in rulesets and rules
  2025-03-08 18:57       ` Mickaël Salaün
@ 2025-03-10  0:38         ` Tingmao Wang
  0 siblings, 0 replies; 47+ messages in thread
From: Tingmao Wang @ 2025-03-10  0:38 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Christian Brauner, Kees Cook, Jann Horn, Andy Lutomirski

On 3/8/25 18:57, Mickaël Salaün wrote:
[...]
> Yes, we could put the access rights in the hierarchy, but that would
> involve walking through the hierarchy to know if Landlock should
> actually handle (i.e. allow or potentially deny) an access request.
> Landlock is designed in a way that makes legitimate/allowed access as
> fast as possible (there is still room for improvement though).  In the
> case of the supervisor feature, it should mainly be used to dynamically
> allow access which are statically denied for one layer.  And because it
> will require a round trip to user space anyway, the performance impact
> of putting the supervisor pointer in landlock_hierarchy is negligible.
> 
> Initially the purpose of landlock_hierarchy was to be able to compare
> domains (for ptrace and later scope restrictions), whereas the
> landlock_ruleset is to store immutable data (without references) when
> used as a domain.  With the audit feature, the landlock_hierarchy will
> also contain domain's shared/mutable states and pointers that should
> only be rarely accessed (i.e. only for denials).  So, in a nutshell
> landlock_ruleset as a domain should stay minimal and improve data
> locality to speed up allowed access requests.

That makes total sense - I will move the supervisor pointer to 
landlock_hierarchy and drop this change in the next version.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 5/9] Define user structure for events and responses.
  2025-03-06  3:05     ` Tingmao Wang
  2025-03-08 19:07       ` Mickaël Salaün
@ 2025-03-10  0:39       ` Tingmao Wang
  2025-03-11 19:28         ` Mickaël Salaün
  1 sibling, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-10  0:39 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen

On 3/6/25 03:05, Tingmao Wang wrote:
[...]
> This is also motivated by the potential UX I'm thinking of. For example, 
> if a newly installed application tries to create ~/.app-name, it will be 
> much more reassuring and convenient to the user if we can show something 
> like
> 
>      [program] wants to mkdir ~/.app-name. Allow this and future
>      access to the new directory?
> 
> rather than just "[program] wants to mkdir under ~". (The "Allow this 
> and future access to the new directory" bit is made possible by the 
> supervisor knowing the name of the file/directory being created, and can 
> remember them / write them out to a persistent profile etc)

Another significant motivation, which I forgot to mention, is to 
auto-grant access to newly created files/sockets etc under things like 
/tmp, $XDG_RUNTIME_DIR, or ~/Downloads.

> [...]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 5/9] Define user structure for events and responses.
  2025-03-08 19:07       ` Mickaël Salaün
@ 2025-03-10  0:39         ` Tingmao Wang
  2025-03-11 19:29           ` Mickaël Salaün
  0 siblings, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-10  0:39 UTC (permalink / raw)
  To: Mickaël Salaün, Tycho Andersen
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel,
	Christian Brauner, Kees Cook, Jann Horn, Andy Lutomirski,
	Paul Moore, linux-api

On 3/8/25 19:07, Mickaël Salaün wrote:
> On Thu, Mar 06, 2025 at 03:05:10AM +0000, Tingmao Wang wrote:
>> On 3/4/25 19:49, Mickaël Salaün wrote:
>>> On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
>> [...]
>>>> +	/**
>>>> +	 * @cookie: Opaque identifier to be included in the response.
>>>> +	 */
>>>> +	__u32 cookie;
>>>
>>> I guess we could use a __u64 index counter per layer instead.  That
>>> would also help to order requests if they are treated by different
>>> supervisor threads.
>>
>> I don't immediately see a use for ordering requests (if we get more than one
>> event at once, they are coming from different threads anyway so there can't
>> be any dependencies between them, and the supervisor threads can use
>> timestamps), but I think making it a __u64 is probably a good idea
>> regardless, as it means we don't have to do some sort of ID allocation, and
>> can just increment an atomic.
> 
> Indeed, we should follow the seccomp unotify approach with a random u64
> incremented per request.

Do you mean a random starting value, incremented by one per request, or 
something like the landlock_id in the audit patch (random increments too)?

> 
>>
>>>> +};
>>>> +
>>>> +struct landlock_supervise_event {
>>>> +	struct landlock_supervise_event_hdr hdr;
>>>> +	__u64 access_request;
>>>> +	__kernel_pid_t accessor;
>>>> +	union {
>>>> +		struct {
>>>> +			/**
>>>> +			 * @fd1: An open file descriptor for the file (open,
>>>> +			 * delete, execute, link, readdir, rename, truncate),
>>>> +			 * or the parent directory (for create operations
>>>> +			 * targeting its child) being accessed.  Must be
>>>> +			 * closed by the reader.
>>>> +			 *
>>>> +			 * If this points to a parent directory, @destname
>>>> +			 * will contain the target filename. If @destname is
>>>> +			 * empty, this points to the target file.
>>>> +			 */
>>>> +			int fd1;
>>>> +			/**
>>>> +			 * @fd2: For link or rename requests, a second file
>>>> +			 * descriptor for the target parent directory.  Must
>>>> +			 * be closed by the reader.  @destname contains the
>>>> +			 * destination filename.  This field is -1 if not
>>>> +			 * used.
>>>> +			 */
>>>> +			int fd2;
>>>
>>> Can we just use one FD but identify the requested access instead and
>>> send one event for each, like for the audit patch series?
>>
>> I haven't managed to read or test out the audit patch yet (I will do), but I
>> think having the ability to specifically tell whether the child is trying to
>> move / rename / create a hard link of an existing file, and what it's trying
>> to use as destination, might be useful (either for security, or purely for
>> UX)?
>>
>> For example, imagine something trying to link or move ~/.ssh/id_ecdsa to
>> /tmp/innocent-tmp-file then read the latter. The supervisor can warn the
>> user on the initial link attempt, and the shenanigan will probably be
>> stopped there (although still, being able to say "[program] wants to link
>> ~/.ssh/id_ecdsa to /tmp/innocent-tmp-file" seems better than just "[program]
>> wants to create a link for ~/.ssh/id_ecdsa"), but even if somehow this ends
>> up allowed, later on for the read request it could say something like
>>
>> 	[program] wants to read /tmp/innocent-tmp-file
>> 	    (previously moved from ~/.ssh/id_ecdsa)
>>
>> Maybe this is a bit silly, but there might be other use cases for knowing
>> the exact details of a rename/link request, either for at-the-time decision
>> making, or tracking stuff for future requests?
> 
> This pattern looks like datagram packets.  I think we should use the
> netlink attributes.  There were concerns about using a netlink socket for
> the seccomp unotification though:
> https://lore.kernel.org/all/CALCETrXeZZfVzXh7SwKhyB=+ySDk5fhrrdrXrcABsQ=JpQT7Tg@mail.gmail.com/
> 
> There are two main differences with seccomp unotify:
> - the supervisor should be able to receive arbitrary-sized data (e.g.
>    file name, not path);
> - the supervisor should be able to receive file descriptors (instead of
>    path).
> 
> Sockets are created with socket(2) whereas in our case we should only
> get a supervisor FD (indirectly) through landlock_restrict_self(2),
> which clearly identifies a kernel object.  Another issue would be to
> deal with network namespaces, probably by creating a private one.
> Sockets are powerful but we don't need all the routing complexity.
> Moreover, we should only need a blocking communication channel to avoid
> issues managing in-flight object references (transformed to FDs when
> received).  That makes me think that a socket might not be the right
> construct, but we can still rely on the NLA macros to define a proper
> protocol with dynamically-sized events, received and sent with dedicated
> IOCTL commands.
> 
> Netlink already provides a way to send a cookie, and
> netlink_attribute_type defines the types we'll need, including string.
> 
> For instance, a link request/event could include 3 packets, one for each
> of these properties:
> 1. the source file FD;
> 2. the destination directory FD;
> 3. the destination filename string.
> 
> This way we would avoid the union defined in this patch.

I had no idea about netlink - I will take a look.  Do you know if there 
is any existing code which uses it in a similar way (i.e. not creating 
an actual socket, but using netlink messages)?
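
For reference, the three-attribute link event suggested above could be
laid out roughly as follows.  This is only a userspace sketch under
stated assumptions: struct tlv_hdr mirrors the layout of struct nlattr
(16-bit length covering header plus payload, 16-bit type, attributes
padded to 4 bytes), while the SUP_ATTR_* types and both function names
are made up for illustration.  FDs are carried as plain integers here,
which sidesteps the open question of how they would really be installed
in the supervisor:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr from <linux/netlink.h>. */
struct tlv_hdr {
	uint16_t len;  /* header + payload, excluding padding */
	uint16_t type;
};

#define TLV_ALIGN(n) (((n) + (size_t)3) & ~(size_t)3)

/* Hypothetical attribute types, for illustration only. */
enum {
	SUP_ATTR_SRC_FD = 1,
	SUP_ATTR_DST_DIR_FD = 2,
	SUP_ATTR_DST_NAME = 3,
};

/* Append one attribute at *off; returns 0, or -1 if it does not fit. */
static int tlv_put(uint8_t *buf, size_t cap, size_t *off, uint16_t type,
		   const void *data, size_t data_len)
{
	struct tlv_hdr hdr;
	size_t total = sizeof(hdr) + data_len;

	assert(*off <= cap);
	if (total > UINT16_MAX || TLV_ALIGN(total) > cap - *off)
		return -1;
	hdr.len = (uint16_t)total;
	hdr.type = type;
	memcpy(buf + *off, &hdr, sizeof(hdr));
	memcpy(buf + *off + sizeof(hdr), data, data_len);
	/* Zero the padding so no stale bytes leak to the reader. */
	memset(buf + *off + total, 0, TLV_ALIGN(total) - total);
	*off += TLV_ALIGN(total);
	return 0;
}

/* Encode source FD, destination directory FD and destination name. */
static int encode_link_event(uint8_t *buf, size_t cap, size_t *off,
			     int32_t src_fd, int32_t dst_dir_fd,
			     const char *dst_name)
{
	*off = 0;
	if (tlv_put(buf, cap, off, SUP_ATTR_SRC_FD, &src_fd, sizeof(src_fd)))
		return -1;
	if (tlv_put(buf, cap, off, SUP_ATTR_DST_DIR_FD, &dst_dir_fd,
		    sizeof(dst_dir_fd)))
		return -1;
	return tlv_put(buf, cap, off, SUP_ATTR_DST_NAME, dst_name,
		       strlen(dst_name) + 1);
}
```

The point of the layout is that the variable-length name becomes just
another attribute, so the event no longer needs a union or a trailing
flexible array member.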

I think in the end seccomp-unotify went with an ioctl with a custom 
struct seccomp_notif due to friction with the NL API [1] - do you think 
we will face the same problem here? (I will take a deeper look at 
netlink after sending this.)

(Tycho - could you weigh in?)

[1]: 
https://lore.kernel.org/all/CAGXu5jKsLDSBjB74SrvCvmGy_RTEjBsMtR5dk1CcRFrHEQfM_g@mail.gmail.com/

> 
> There is still the question about receiving FDs though. It would be nice
> to have a (set of?) dedicated IOCTL(s) to receive an FD, but I'm not
> sure how this could be properly handled wrt NLA.

Also, if we go with netlink messages, why do we need additional IOCTLs? 
Can we open the fd when we write out the message? (Maybe I will end up 
realizing the reason for this after reading netlink code, but I would )

> 
>>
>> I will try out the audit patch to see how things like these appears in the
>> log before commenting further on this. Maybe there is a way to achieve this
>> while still simplifying the event structure?
>>
>>>
>>>> +			/**
>>>> +			 * @destname: A filename for a file creation target.
>>>> +			 *
>>>> +			 * If either of fd1 or fd2 points to a parent
>>>> +			 * directory rather than the target file, this is the
>>>> +			 * NULL-terminated name of the file that will be
>>>> +			 * newly created.
>>>> +			 *
>>>> +			 * Counting the NULL terminator, this field will
>>>> +			 * contain one or more NULL padding at the end so
>>>> +			 * that the length of the whole struct
>>>> +			 * landlock_supervise_event is a multiple of 8 bytes.
>>>> +			 *
>>>> +			 * This is a variable length member, and the length
>>>> +			 * including the terminating NULL(s) can be derived
>>>> +			 * from hdr.length - offsetof(struct
>>>> +			 * landlock_supervise_event, destname).
>>>> +			 */
>>>> +			char destname[];
>>>
>>> I'd prefer to avoid sending file names for now.  I don't think it's
>>> necessary, and that could encourage supervisors to filter access
>>> according to names.
>>>
>>
>> This is also motivated by the potential UX I'm thinking of. For example, if
>> a newly installed application tries to create ~/.app-name, it will be much
>> more reassuring and convenient to the user if we can show something like
>>
>> 	[program] wants to mkdir ~/.app-name. Allow this and future
>> 	access to the new directory?
>>
>> rather than just "[program] wants to mkdir under ~". (The "Allow this and
>> future access to the new directory" bit is made possible by the supervisor
>> knowing the name of the file/directory being created, and can remember them
>> / write them out to a persistent profile etc)
>>
>> Note that this is just the filename under the dir represented by fd - this
>> isn't a path or anything that can be subject to symlink-related attacks,
>> etc.  If a program calls e.g.
>> mkdirat or openat (dfd -> "/some/", pathname="dir/stuff", O_CREAT)
>> my understanding is that fd1 will point to /some/dir, and destname would be
>> "stuff"
> 
> Right, this file name information would be useful.  In the case of
> audit, the goal is to efficiently and asynchronously log security events
> (and align with other LSM logs and related limitations), not primarily
> to debug sandboxed apps nor to enrich this information for decision
> making, but the supervisor feature would help here.  The patch message
> should include this rationale.

Will do
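
For context, this is roughly how a supervisor could recover the name from
the layout quoted above.  It is a simplified sketch: struct demo_event
flattens the union into a single struct, the header fields other than
"length" and "cookie" are assumptions, and all demo_* names are
hypothetical.  Only the length rule itself (hdr.length minus
offsetof(..., destname), with the total padded to a multiple of 8 bytes)
comes from the patch comment:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for landlock_supervise_event_hdr (assumed layout). */
struct demo_event_hdr {
	uint32_t type;
	uint32_t length; /* total size of the event, in bytes */
	uint64_t cookie;
};

/* Simplified stand-in for the filesystem variant of the event. */
struct demo_event {
	struct demo_event_hdr hdr;
	uint64_t access_request;
	int32_t accessor;
	int32_t fd1;
	int32_t fd2;
	char destname[]; /* NUL-terminated and NUL-padded */
};

/* Total event size for a name: at least one NUL, rounded up to 8 bytes. */
static size_t demo_event_size(const char *name)
{
	size_t raw = offsetof(struct demo_event, destname) + strlen(name) + 1;

	return (raw + 7) & ~(size_t)7;
}

/* Size of the destname field, padding included; 0 if there is none. */
static size_t demo_destname_len(const struct demo_event *ev)
{
	size_t base = offsetof(struct demo_event, destname);

	if (ev->hdr.length <= base)
		return 0;
	return ev->hdr.length - base;
}

/* Build an event as the kernel side might; caller frees the result. */
static struct demo_event *demo_event_new(const char *name)
{
	size_t len = demo_event_size(name);
	struct demo_event *ev = calloc(1, len);

	if (!ev)
		return NULL;
	ev->hdr.length = (uint32_t)len;
	ev->fd1 = -1;
	ev->fd2 = -1;
	memcpy(ev->destname, name, strlen(name));
	return ev;
}
```

A reader should treat demo_destname_len() as an upper bound and still
stop at the first NUL, since the field may end in several padding bytes.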

> 
>>
>> Actually, in case your question is "why not send a fd to represent the newly
>> created file, instead of sending the name" -- I'm not sure whether you can
>> open even an O_PATH fd to a non-existent file.
> 
> That would not be possible because the file would not exist yet; a file name
> (not a file path) is OK for this case.
> 
>>
>>>> +		};
>>>> +		struct {
>>>> +			__u16 port;
>>>> +		};
>>>> +	};
>>>> +};
>>>> +
>>>
>>> [...]
>>
>>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 6/9] Creating supervisor events for filesystem operations
  2025-03-04 19:50   ` Mickaël Salaün
@ 2025-03-10  0:39     ` Tingmao Wang
  2025-03-11 19:29       ` Mickaël Salaün
  0 siblings, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-10  0:39 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen

On 3/4/25 19:50, Mickaël Salaün wrote:
> On Tue, Mar 04, 2025 at 01:13:02AM +0000, Tingmao Wang wrote:
>> NOTE from future me: This implementation which waits for user response
>> while blocking inside the current security_path_* hooks is problematic due
>> to taking exclusive inode lock on the parent directory, and while I have a
>> proposal for a solution, outlined below, I haven't managed to include the
>> code for that in this version of the patch. Thus for this commit in
>> particular I'm probably more looking for suggestions on the approach
>> rather than code review.  Please see the TODO section at the end of this
>> message before reviewing this patch.
> 
> This is good for an RFC.
> 
>>
>> ----
>>
>> This patch implements a proof-of-concept for modifying the current
>> landlock LSM hooks to send supervisor events and wait for responses, when
>> a supervised layer is involved.
>>
>> In this design, access requests which would end up being denied by other
>> non-supervised landlock layers (or which would fail the normal inode
>> permission check anyways - but this is currently TODO, I only thought of
>> this afterwards) are denied straight away to avoid pointless supervisor
>> notifications.
> 
> Yes, only denied access should be forwarded to the supervisor.

I assume you meant only denied access *by the supervised layers* should 
be forwarded to the supervisor.

> In another patch series we could enable the supervisor to update its layer
> with new rules as well.

I did consider the possibility of this - if the supervisor has decided 
to allow all future access to e.g. a directory, ideally this can be 
"offloaded" to the kernel, but I was a bit worried about the fact that 
landlock currently quite heavily assumes the domain is immutable. While
in the supervised case breaking that rule here should be alright (no
worse security), I'm not sure whether there are potential logic or
data-race bugs if we now make domains mutable.

> 
> The audit patch series should help to properly identify which layer
> denied a request, and to only use the related supervisor.

The current patch does correctly identify which layer(s) (and sends 
events to the right supervisor(s)), but aligning with and re-using code 
in the audit patch is sensible.  Will have a look.

> 
>>
>> Currently current_check_access_path only gets the path of the parent
>> directory for create/remove operations, which is not enough for what we
>> want to pass to the supervisor.  Therefore we extend it by passing in any
>> relevant child dentry (but see TODO below - this may not be possible with
>> the proper implementation).
> 
> Hmm, I'm not sure this kind of information is required (this is not
> implemented for the audit support).  The supervisor should be fine
> getting only which access is missing, right?
> 
>>
>> This initial implementation doesn't handle links and renames, and for now
>> these operations behave as if no supervisor is present (and thus will be
>> denied, unless it is allowed by the layer rules).  Also note that we can
>> get spurious create requests if the program tries to O_CREAT open a
>> file that exists but is not in the dcache (from my understanding).
>>
>> Event IDs (referred to as an opaque cookie in the uapi) are currently
>> generated with a simple `next_event_id++`.  I considered using e.g. xarray
>> but decided to not for this PoC. Suggestions welcome. (Note that we have
>> to design our own event id even if we use an extension of fanotify, as
>> fanotify uses a file descriptor to identify events, which is not generic
>> enough for us)
> 
> That's another noticeable difference with fanotify.  You can add it to
> the next cover letter.
> 
>>
>> ----
>>
>> TODO:
>>
>> When testing this I realized that doing it this way means that for the
>> create/delete case, we end up holding an exclusive inode lock on the
>> parent directory while waiting for supervisor to respond (see namei.c -
>> security_path_mknod is called in may_o_create <- lookup_open which has an
>> exclusive lock if O_CREAT is passed), which will prevent all other tasks
>> from accessing that directory (regardless of whether or not they are under
>> landlock).
> 
> Could we use a landlock_object to identify this inode instead?

Sorry - earlier when reading this I didn't quite understand this
suggestion and forgot to say so.  However, the problem here is the
location of the security_path_... hooks (by the time they are called the
lock is already held), so I'm not sure how the way we identify the inode
would make a difference?

> 
>>
>> This is clearly unacceptable, but since landlock (and also this extension)
>> doesn't actually need a dentry for the child (which is allocated after the
>> inode lock), I think this is not unsolvable.  I'm experimenting with
>> creating a new LSM hook, something like security_pathname_mknod
>> (suggestions welcome), which will be called after we looked up the dentry
>> for the parent (to prevent racing symlinks TOCTOU), but before we take the
>> lock for it.  Such a hook can still take as argument the parent dentry,
>> plus name of the child (instead of a struct path for it).
>>
>> Suggestions for alternative approaches are definitely welcome!
>>
>> Signed-off-by: Tingmao Wang <m@maowtm.org>
>> ---
>>   security/landlock/fs.c        | 134 ++++++++++++++++++++++++++++++++--
>>   security/landlock/supervise.c | 122 +++++++++++++++++++++++++++++++
>>   security/landlock/supervise.h | 106 ++++++++++++++++++++++++++-
>>   3 files changed, 354 insertions(+), 8 deletions(-)
>>
> 
> [...]


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 4/9] User-space API for creating a supervisor-fd
  2025-03-05 16:09   ` Mickaël Salaün
@ 2025-03-10  0:41     ` Tingmao Wang
  2025-03-11 19:28       ` Mickaël Salaün
  0 siblings, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-10  0:41 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Jann Horn, Andy Lutomirski

On 3/5/25 16:09, Mickaël Salaün wrote:
> On Tue, Mar 04, 2025 at 01:13:00AM +0000, Tingmao Wang wrote:
>> We allow the user to pass in an additional flag to landlock_create_ruleset
>> which will make the ruleset operate in "supervise" mode, with a supervisor
>> attached. We create additional space in the landlock_ruleset_attr
>> structure to pass the newly created supervisor fd back to user-space.
>>
>> The intention, while not implemented yet, is that the user-space will read
>> events from this fd and write responses back to it.
>>
>> Note: need to investigate if fd clone on fork() is handled correctly, but
>> should be fine if it shares the struct file. We might also want to let the
>> user customize the flags on this fd, so that they can request no
>> O_CLOEXEC.
>>
>> NOTE: despite this patch having a new uapi, I'm still very open to e.g.
>> re-using fanotify stuff instead (if that makes sense in the end). This is
>> just a PoC.
> 
> The main security risk of this feature is for this FD to leak and be
> used by a sandboxed process to bypass all its restrictions.  This should
> be highlighted in the UAPI documentation.
> 
>>
>> Signed-off-by: Tingmao Wang <m@maowtm.org>
>> ---
>>   include/uapi/linux/landlock.h |  10 ++++
>>   security/landlock/syscalls.c  | 102 +++++++++++++++++++++++++++++-----
>>   2 files changed, 98 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
>> index e1d2c27533b4..7bc1eb4859fb 100644
>> --- a/include/uapi/linux/landlock.h
>> +++ b/include/uapi/linux/landlock.h
>> @@ -50,6 +50,15 @@ struct landlock_ruleset_attr {
>>   	 * resources (e.g. IPCs).
>>   	 */
>>   	__u64 scoped;
>> +	/**
>> +	 * @supervisor_fd: Placeholder to store the supervisor file
>> +	 * descriptor when %LANDLOCK_CREATE_RULESET_SUPERVISE is set.
>> +	 */
>> +	__s32 supervisor_fd;
> 
> This interface would require the ruleset_attr becoming updatable by the
> kernel, which might be OK in theory but requires current syscall wrapper
> signature update, see sandboxer.c change.  It also creates a FD which
> might not be useful (e.g. if an error occurs before the actual
> enforcement).
> 
> I see a few alternatives.  We could just use/extend the ruleset FD
> instead of creating a new one, but because leaking current rulesets is
> not currently a security risk, we should be careful to not change that.
> 
> Another approach, similar to seccomp unotify, is to get a
> "[landlock-domain]" FD returned by the landlock_restrict_self(2) when a
> new LANDLOCK_RESTRICT_SELF_DOMAIN_FD flag is set.  This FD would be a
> reference to the newly created domain, which is more specific than the
> ruleset used to create this domain (and that can be used to create
> other domains).  This domain FD could be used for introspection (i.e.
> to get read-only properties such as domain ID), but being able to
> directly supervise the referenced domain only with this FD would be a
> risk that we should limit.
> 
> What we can do is to implement an IOCTL command for such domain FD that
> would return a supervisor FD (if the LANDLOCK_RESTRICT_SELF_SUPERVISED
> flag was also set).  The key point is to check (one time) that the
> process calling this IOCTL is not restricted by the related domain (see
> the scope helpers).

Is LANDLOCK_RESTRICT_SELF_DOMAIN_FD part of your (upcoming?)
introspection patch? (I'm wondering when someone would pass only that
flag and not LANDLOCK_RESTRICT_SELF_SUPERVISED, or vice versa.)

By the way, is it alright to conceptually relate the supervisor to a 
domain? It really would be a layer inside a domain - the domain could 
have earlier or later layers which can deny access without supervision, 
or the supervisor for earlier layers can deny access first. Therefore 
having supervisor fd coming out of the ruleset felt sensible to me at first.

Also, isn't "check that process calling this IOCTL is not restricted by 
the related domain" and the fact that the IOCTL is on the domain fd, 
which is a return value of landlock_restrict_self, kind of 
contradictory?  I mean it is a sensible check, but that kind of 
highlights that this interface is slightly awkward - basically all 
callers are forced to have a setup where the child sends the domain fd 
back to the parent.
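
To illustrate the plumbing this would force on callers, here is a sketch
of the usual SCM_RIGHTS hand-off over an AF_UNIX socket (see unix(7)).
The domain FD itself is hypothetical at this point, so the helpers below
work with any open FD; send_fd and recv_fd are illustrative names, not
part of any proposed API:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one FD over a connected AF_UNIX socket; returns 0 or -1. */
static int send_fd(int sock, int fd)
{
	char byte = 0; /* at least one byte of real data must be sent */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces correct alignment */
	} ctrl;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.buf,
		.msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *cmsg;

	assert(sock >= 0 && fd >= 0);
	memset(&ctrl, 0, sizeof(ctrl));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one FD; returns the newly installed FD, or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.buf,
		.msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctrl, 0, sizeof(ctrl));
	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

In the setup described above, the child would call send_fd() right after
restricting itself, and the unrestricted parent would call recv_fd() on
the other end of a socketpair(2) created before the fork.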

> 
> Relying on IOCTL commands (for all these FD types) instead of read/write
> operations should also limit the risk of these FDs being misused through
> a confused deputy attack (because such IOCTL command would convey an
> explicit intent):
> https://docs.kernel.org/security/credentials.html#open-file-credentials
> https://lore.kernel.org/all/CAG48ez0HW-nScxn4G5p8UHtYy=T435ZkF3Tb1ARTyyijt_cNEg@mail.gmail.com/
> We should get inspiration from seccomp unotify for this too:
> https://lore.kernel.org/all/20181209182414.30862-1-tycho@tycho.ws/

I think in the seccomp unotify case the problem arises from what the 
setuid binary thinks is just normal data getting interpreted by the 
kernel as a fd, and thus having different effect if the attacker writes 
it vs. if the suid app writes it.  In our case I *think* we should be 
alright, but maybe we should go with ioctl anyway... However, how does 
using netlink messages (a suggestion from a different thread) affect 
this (if we do end up using it)?  Would we have to do netlink msgs via 
IOCTL?


>> +	/**
>> +	 * @pad: Unused, must be zero.
>> +	 */
>> +	__u32 pad;
> 
> In this case we should pack the struct instead.
> 
>>   };
>>   
>>   /*
>> @@ -60,6 +69,7 @@ struct landlock_ruleset_attr {
>>    */
>>   /* clang-format off */
>>   #define LANDLOCK_CREATE_RULESET_VERSION			(1U << 0)
>> +#define LANDLOCK_CREATE_RULESET_SUPERVISE		(1U << 1)
>>   /* clang-format on */
>>   
>>   /**
> 
> [...]


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-06 17:07     ` Amir Goldstein
  2025-03-08 19:14       ` Mickaël Salaün
@ 2025-03-11  0:42       ` Tingmao Wang
  2025-03-11 19:28         ` Mickaël Salaün
                           ` (2 more replies)
  1 sibling, 3 replies; 47+ messages in thread
From: Tingmao Wang @ 2025-03-11  0:42 UTC (permalink / raw)
  To: Amir Goldstein, Mickaël Salaün
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Christian Brauner, Kees Cook, Jeff Xu, Mikhail Ivanov,
	Francis Laniel, Matthieu Buffet, Song Liu

On 3/6/25 17:07, Amir Goldstein wrote:
[...]
> 
> w.r.t sharing infrastructure with fanotify, I only looked briefly at
> your patches and I have only a vague familiarity with landlock, so I
> cannot yet form an opinion whether this is a good idea, but I wanted to
> give you a few more data points about fanotify that seem relevant.
> 
> 1. There is already some intersection of fanotify and audit lsm via the
> fanotify_response_info_audit_rule extension for permission
> events, so it's kind of a precedent of using fanotify to aid an lsm
> 
> 2. See this fan_pre_modify-wip branch [1] and specifically commit
>    "fanotify: introduce directory entry pre-modify permission events"
> I do have an intention to add create/delete/rename permission events.
> Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
> far from the security_path_ lsm hooks, but not exactly in the same place
> because we want the fsnotify hooks to be before taking vfs locks, to allow
> listener to write to filesystem from event context.
> There are different semantics than just ALLOW/DENY that you need,
> therefore, only if we move the security_path_ hooks outside the
> vfs locks, our use cases could use the same hooks

Hi Amir,

(this is a slightly long message - feel free to respond at your 
convenience, thank you in advance!)

Thanks a lot for mentioning this branch, and for the explanation! I've 
had a look and realized that the changes you have there will be very 
useful for this patch, and in fact, I've already tried a worse attempt 
of this (not included in this patch series yet) to create some 
security_pathname_ hooks that take the parent struct path + last name 
as char*, that will be called before locking the parent.  (We can't have 
an unprivileged supervisor cause a directory to be locked indefinitely, 
which will also block users outside of the landlock domain)

I'm not sure if we can move security_path tho, because it takes the 
dentry of the child as an argument, and (I think at least for create / 
mknod / link) that dentry is only created after locking.  Hence the 
proposal for separate security_pathname_ hooks.  A search shows that 
currently AppArmor and TOMOYO (plus Landlock) use the security_path_ 
hooks that would need changing, if we move it (and we will have to 
understand if the move is ok to do for the other two LSMs...)

However, I think it would still make a lot of sense to align with 
fsnotify here, as you have already made the changes that I would need to 
do anyway should I implement the proposed new hooks.  I think a sensible 
thing might be to have the extra LSM hooks be called alongside 
fsnotify_(re)name_perm - following the pattern of what currently happens 
with fsnotify_open_perm (i.e. security_file_open called first, then 
fsnotify_open_perm right after).

What's your thought on this? Do you think it would be a good idea to 
have LSM hook equivalents of the fsnotify (re)name perm hooks / fanotify 
pre-modify events?

Also, do you have a rough estimate of when you would upstream the 
fa/fsnotify changes? (asking just to get an idea of things, not trying 
to rush or anything :) I suspect this supervise patch would take a while 
anyway)

If you think the general idea is right, here are some further questions 
I have:

I think going by this approach any error return from 
security_pathname_mknod (or in fact, fsnotify_name_perm) when called in 
the open O_CREAT code path would end up becoming a -EROFS.  Can we turn 
the bool got_write in open_last_lookups into an int to store any error 
from mnt_want_write_parent, and return it if lookup_open returns -EROFS? 
  This is so that the user space still gets an -EACCES on create 
denials by landlock (and in fact, if fanotify denies a create maybe we 
want it to return the correct errno also?). Maybe there is a better way, 
this is just my first thought...
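
To make the idea a bit more concrete, here is a rough user-space model of
the errno plumbing (not actual fs/namei.c code).  model_lookup_open() and
model_open_last_lookups() are hypothetical stand-ins, and the model
assumes that the lookup step only sees whether write access was obtained:

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for lookup_open(): it only knows whether write
 * access was obtained, so any earlier failure surfaces as -EROFS.
 */
static int model_lookup_open(int got_write)
{
	return got_write ? 0 : -EROFS;
}

/*
 * want_write_err: 0 on success, or the negative errno from the
 * (modelled) mnt_want_write_parent() step, e.g. -EACCES from an LSM.
 */
static int model_open_last_lookups(int want_write_err)
{
	int err;

	assert(want_write_err <= 0);
	err = model_lookup_open(want_write_err == 0);
	/* Prefer the original cause over the generic -EROFS. */
	if (err == -EROFS && want_write_err)
		return want_write_err;
	return err;
}
```

With this, a denial recorded as -EACCES is what the caller sees, while a
genuinely read-only mount still reports -EROFS.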

I also noticed that you don't currently have fsnotify hook calls for 
link (although it does end up invoking the name_perm hook on the dest 
with MAY_CREATE).  I want to propose also changing do_linkat to (pass 
the right flags to filename_create_srcu -> mnt_want_write_parent to) 
call the security_pathname_link hook (instead of the LSM hook it would 
normally call for a creation event in this proposal) that is basically 
like security_path_link, except passing the destination as a dir/name 
pair, and without holding vfs lock (still passing in the dentry of the 
source itself), to enable landlock to handle link requests separately. 
Do you think this is alright?  (Maybe the code would be a bit convoluted 
if written verbatim from this logic, maybe there is a better way, but 
the general idea is hopefully right)

btw, side question, I see that you added srcu read sections around the 
events - I'm not familiar with rcu/locking usage in vfs but is this for 
preventing e.g. changing the mount in some way (but still allowing 
access / changes to the directory)?

I realize I'm asking you a lot of things - big thanks in advance!  (also 
let me know if I should be pulling in other VFS maintainers)

--

For Mickaël,

Would you be on board with changing Landlock to use the new hooks as 
mentioned above?  My thinking is that it shouldn't make any difference 
in terms of security - Landlock permissions for e.g. creating/deleting 
files are based on the parent, and in fact except for link and rename, 
the hook_path_ functions in Landlock don't even use the dentry argument. 
  If you're happy with the general direction of this, I can investigate 
further and test it out etc.  This change might also reduce the impact 
of Landlock on non-landlocked processes, if we avoid holding exclusive 
inode lock while evaluating rules / traversing paths...? (Just a 
thought, not measured)

In terms of other aspects, ignoring supervisors for now, moving to these 
hooks:

- Should make no difference in the "happy" (access allowed) case

- Only when an access is disallowed, in order to know what error to
   return, we can check (within Landlock hook handler) if the target
   already exists - if yes, return -EEXIST, otherwise -EACCES
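
As a tiny sketch of that second point (a user-space model only;
model_hook_path_create() is a made-up name, and the two booleans stand in
for the real rule evaluation and existence check):

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Model of the proposed denial path: the errno only depends on whether
 * the target already exists, and only when access is denied.
 */
static int model_hook_path_create(bool access_allowed, bool target_exists)
{
	if (access_allowed)
		return 0;
	return target_exists ? -EEXIST : -EACCES;
}
```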

If this is too large of a change at this point and you see / would 
prefer another way we can progress this series (at least the initial 
version), let me know.

Kind regards,
Tingmao

> 
> 3. There is a recent attempt to add BPF filter to fanotify [2]
> which is driven among other things from the long standing requirement
> to add subtree filtering to fanotify watches.
> The challenge with all the attempts to implement a subtree filter so far,
> is that adding vfs performance overhead for all the users in the system
> is unacceptable.
> 
> IIUC, landlock rule set can already express a subtree filter (?),
> so it is intriguing to know if there is room for some integration on this
> aspect, but my guess is that landlock mostly uses subtree filter
> after filtering by specific pids (?), so it can avoid the performance
> overhead of a subtree filter on most of the users in the system.
> 
> Hope this information is useful.
> 
> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/linux/commits/fan_pre_modify-wip/
> [2] https://lore.kernel.org/linux-fsdevel/20241122225958.1775625-1-song@kernel.org/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-11  0:42       ` Tingmao Wang
@ 2025-03-11 19:28         ` Mickaël Salaün
  2025-03-11 20:58           ` Song Liu
  2025-03-12 10:58         ` Jan Kara
  2025-03-12 12:26         ` Amir Goldstein
  2 siblings, 1 reply; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-11 19:28 UTC (permalink / raw)
  To: Tingmao Wang, Christian Brauner
  Cc: Amir Goldstein, Günther Noack, Jan Kara,
	linux-security-module, Matthew Bobrowski, linux-fsdevel,
	Tycho Andersen, Kees Cook, Jeff Xu, Mikhail Ivanov,
	Francis Laniel, Matthieu Buffet, Song Liu, Paul Moore,
	Kentaro Takeda, Tetsuo Handa, John Johansen

On Tue, Mar 11, 2025 at 12:42:05AM +0000, Tingmao Wang wrote:
> On 3/6/25 17:07, Amir Goldstein wrote:
> [...]
> > 
> > w.r.t sharing infrastructure with fanotify, I only looked briefly at
> > your patches
> > and I have only a vague familiarity with landlock, so I cannot yet form an
> > opinion whether this is a good idea, but I wanted to give you a few more
> > data points about fanotify that seem relevant.
> > 
> > 1. There is already some intersection of fanotify and audit lsm via the
> > fanotify_response_info_audit_rule extension for permission
> > events, so it's kind of a precedent of using fanotify to aid an lsm
> > 
> > 2. See this fan_pre_modify-wip branch [1] and specifically commit
> >    "fanotify: introduce directory entry pre-modify permission events"
> > I do have an intention to add create/delete/rename permission events.
> > Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
> > far from the security_path_ lsm hooks, but not exactly in the same place
> > because we want the fsnotify hooks to be before taking vfs locks, to allow
> > listener to write to filesystem from event context.
> > There are different semantics than just ALLOW/DENY that you need,
> > therefore, only if we move the security_path_ hooks outside the
> > vfs locks, our use cases could use the same hooks
> 
> Hi Amir,
> 
> (this is a slightly long message - feel free to respond at your convenience,
> thank you in advance!)
> 
> Thanks a lot for mentioning this branch, and for the explanation! I've had a
> look and realized that the changes you have there will be very useful for
> this patch, and in fact, I've already tried a worse attempt of this (not
> included in this patch series yet) to create some security_pathname_ hooks
> that take the parent struct path + last name as char*, that will be called
> before locking the parent.  (We can't have an unprivileged supervisor cause
> a directory to be locked indefinitely, which will also block users outside
> of the landlock domain)
> 
> I'm not sure if we can move security_path tho, because it takes the dentry
> of the child as an argument, and (I think at least for create / mknod /
> link) that dentry is only created after locking.  Hence the proposal for
> separate security_pathname_ hooks.  A search shows that currently AppArmor
> and TOMOYO (plus Landlock) use the security_path_ hooks that would need
> changing, if we move it (and we will have to understand if the move is ok to
> do for the other two LSMs...)
> 
> However, I think it would still make a lot of sense to align with fsnotify
> here, as you have already made the changes that I would need to do anyway
> should I implement the proposed new hooks.  I think a sensible thing might
> be to have the extra LSM hooks be called alongside fsnotify_(re)name_perm -
> following the pattern of what currently happens with fsnotify_open_perm
> (i.e. security_file_open called first, then fsnotify_open_perm right after).

Yes, I think it would make sense to use the same hooks for fanotify and
other security subsystems, or at least to share them.  It would improve
consistency across different Linux subsystems and simplify changes and
maintenance where these hooks are called.

> 
> What's your thought on this? Do you think it would be a good idea to have
> LSM hook equivalents of the fsnotify (re)name perm hooks / fanotify
> pre-modify events?
> 
> Also, do you have a rough estimate of when you would upstream the
> fa/fsnotify changes? (asking just to get an idea of things, not trying to
> rush or anything :) I suspect this supervise patch would take a while
> anyway)
> 
> If you think the general idea is right, here are some further questions I
> have:
> 
> I think going by this approach any error return from security_pathname_mknod
> (or in fact, fsnotify_name_perm) when called in the open O_CREAT code path
> would end up becoming a -EROFS.  Can we turn the bool got_write in
> open_last_lookups into an int to store any error from mnt_want_write_parent,
> and return it if lookup_open returns -EROFS?  This is so that the user space
> still gets an -EACCES on create denials by landlock (and in fact, if
> fanotify denies a create maybe we want it to return the correct errno
> also?). Maybe there is a better way, this is just my first thought...
> 
> I also noticed that you don't currently have fsnotify hook calls for link
> (although it does end up invoking the name_perm hook on the dest with
> MAY_CREATE).  I want to propose also changing do_linkat to (pass the right
> flags to filename_create_srcu -> mnt_want_write_parent to) call the
> security_pathname_link hook (instead of the LSM hook it would normally call
> for a creation event in this proposal) that is basically like
> security_path_link, except passing the destination as a dir/name pair, and
> without holding vfs lock (still passing in the dentry of the source itself),
> to enable landlock to handle link requests separately. Do you think this is
> alright?  (Maybe the code would be a bit convoluted if written verbatim from
> this logic, maybe there is a better way, but the general idea is hopefully
> right)
> 
> btw, side question, I see that you added srcu read sections around the
> events - I'm not familiar with rcu/locking usage in vfs but is this for
> preventing e.g. changing the mount in some way (but still allowing access /
> changes to the directory)?
> 
> I realize I'm asking you a lot of things - big thanks in advance!  (also let
> me know if I should be pulling in other VFS maintainers)
> 
> --
> 
> For Mickaël,
> 
> Would you be on board with changing Landlock to use the new hooks as
> mentioned above?  My thinking is that it shouldn't make any difference in
> terms of security - Landlock permissions for e.g. creating/deleting files
> are based on the parent, and in fact except for link and rename, the
> hook_path_ functions in Landlock don't even use the dentry argument.  If
> you're happy with the general direction of this, I can investigate further
> and test it out etc.  This change might also reduce the impact of Landlock
> on non-landlocked processes, if we avoid holding exclusive inode lock while
> evaluating rules / traversing paths...? (Just a thought, not measured)

This looks reasonable.  As long as the semantics do not change, it
should be good and Landlock tests should pass.  That would also require
other users of this hook to make sure it works for them too.  If that is
not the case, I guess we could add alternative hooks with different
properties.  However, see the issue and the alternative approach below.

> 
> In terms of other aspects, ignoring supervisors for now, moving to these
> hooks:
> 
> - Should make no difference in the "happy" (access allowed) case
> 
> - Only when an access is disallowed, in order to know what error to
>   return, we can check (within Landlock hook handler) if the target
>   already exists - if yes, return -EEXIST, otherwise -EACCES

As much as possible, we should avoid reimplementing the error types in
fanotify/LSM hooks.  This is partially done for the VFS, and completely
duplicated for the network, which can lead to inconsistent errors.  It
would be good to only have one source of truth, but that might not be
possible in all cases.

> 
> If this is too large of a change at this point and you see / would prefer
> another way we can progress this series (at least the initial version), let
> me know.

For this patch series to work, we need all (used) LSM hooks to be
blockable (and interruptible).  We should then investigate if this is
possible, especially with the new fanotify hooks, but I don't think it
would work for all hooks (those already used by Landlock, or that
potentially will be).

An alternative approach would be to add a task_work (executed before
returning to user space) that will wait for the supervisor to make a
decision, and in the meantime the LSM hook would return -ERESTARTNOINTR
for the syscall to start again after the wait.  However, because the
request to the supervisor would be made outside of the hook, it should
not be possible to directly allow the request (because of race
conditions); instead, the domain would be updated accordingly.  The
restarted syscall must not trigger a supervisor request though.
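
A rough user-space model of that flow, only to illustrate the ordering
(this is not kernel code).  All names are hypothetical, the restart code
is redefined locally because it is kernel-internal, and remembering
already-decided access bits is just one assumed way to keep the restarted
syscall from triggering a second request:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Kernel-internal restart code, defined here for the model only. */
#define MODEL_ERESTARTNOINTR 513

struct model_task {
	uint64_t domain_allowed;  /* access bits the domain allows */
	uint64_t pending_request; /* bits queued for the supervisor */
	uint64_t already_asked;   /* bits the supervisor already decided */
};

/* Modelled LSM hook: never blocks, only queues a request. */
static int model_hook(struct model_task *t, uint64_t access)
{
	uint64_t missing = access & ~t->domain_allowed;

	assert(access != 0);
	if (!missing)
		return 0;
	/* Already decided and still missing: denied, do not ask again. */
	if (missing & t->already_asked)
		return -EACCES;
	t->pending_request = missing;
	return -MODEL_ERESTARTNOINTR;
}

/*
 * Modelled task_work: runs before returning to user space.  The
 * supervisor's decision updates the domain instead of allowing the
 * original request directly.
 */
static void model_task_work(struct model_task *t, bool supervisor_allows)
{
	uint64_t access = t->pending_request;

	if (!access)
		return;
	if (supervisor_allows)
		t->domain_allowed |= access;
	t->already_asked |= access;
	t->pending_request = 0;
}
```

The restarted syscall goes through model_hook() again and is then either
allowed by the updated domain or denied without a new request.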

> 
> Kind regards,
> Tingmao
> 
> > 
> > 3. There is a recent attempt to add BPF filter to fanotify [2]
> > which is driven among other things from the long standing requirement
> > to add subtree filtering to fanotify watches.
> > The challenge with all the attempts to implement a subtree filter so far,
> > is that adding vfs performance overhead for all the users in the system
> > is unacceptable.
> > 
> > IIUC, landlock rule set can already express a subtree filter (?),
> > so it is intriguing to know if there is room for some integration on this
> > aspect, but my guess is that landlock mostly uses subtree filter
> > after filtering by specific pids (?), so it can avoid the performance
> > overhead of a subtree filter on most of the users in the system.
> > 
> > Hope this information is useful.
> > 
> > Thanks,
> > Amir.
> > 
> > [1] https://github.com/amir73il/linux/commits/fan_pre_modify-wip/
> > [2] https://lore.kernel.org/linux-fsdevel/20241122225958.1775625-1-song@kernel.org/
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 4/9] User-space API for creating a supervisor-fd
  2025-03-10  0:41     ` Tingmao Wang
@ 2025-03-11 19:28       ` Mickaël Salaün
  2025-03-26  0:06         ` Tingmao Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-11 19:28 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Jann Horn, Andy Lutomirski

On Mon, Mar 10, 2025 at 12:41:28AM +0000, Tingmao Wang wrote:
> On 3/5/25 16:09, Mickaël Salaün wrote:
> > On Tue, Mar 04, 2025 at 01:13:00AM +0000, Tingmao Wang wrote:
> > > We allow the user to pass in an additional flag to landlock_create_ruleset
> > > which will make the ruleset operate in "supervise" mode, with a supervisor
> > > attached. We create additional space in the landlock_ruleset_attr
> > > structure to pass the newly created supervisor fd back to user-space.
> > > 
> > > The intention, while not implemented yet, is that the user-space will read
> > > events from this fd and write responses back to it.
> > > 
> > > Note: need to investigate if fd clone on fork() is handled correctly, but
> > > should be fine if it shares the struct file. We might also want to let the
> > > user customize the flags on this fd, so that they can request no
> > > O_CLOEXEC.
> > > 
> > > NOTE: despite this patch having a new uapi, I'm still very open to e.g.
> > > re-using fanotify stuff instead (if that makes sense in the end). This is
> > > just a PoC.
> > 
> > The main security risk of this feature is for this FD to leak and be
> > used by a sandboxed process to bypass all its restrictions.  This should
> > be highlighted in the UAPI documentation.
> > 
> > > 
> > > Signed-off-by: Tingmao Wang <m@maowtm.org>
> > > ---
> > >   include/uapi/linux/landlock.h |  10 ++++
> > >   security/landlock/syscalls.c  | 102 +++++++++++++++++++++++++++++-----
> > >   2 files changed, 98 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> > > index e1d2c27533b4..7bc1eb4859fb 100644
> > > --- a/include/uapi/linux/landlock.h
> > > +++ b/include/uapi/linux/landlock.h
> > > @@ -50,6 +50,15 @@ struct landlock_ruleset_attr {
> > >   	 * resources (e.g. IPCs).
> > >   	 */
> > >   	__u64 scoped;
> > > +	/**
> > > +	 * @supervisor_fd: Placeholder to store the supervisor file
> > > +	 * descriptor when %LANDLOCK_CREATE_RULESET_SUPERVISE is set.
> > > +	 */
> > > +	__s32 supervisor_fd;
> > 
> > This interface would require the ruleset_attr becoming updatable by the
> > kernel, which might be OK in theory but requires current syscall wrapper
> > signature update, see sandboxer.c change.  It also creates a FD which
> > might not be useful (e.g. if an error occurs before the actual
> > enforcement).
> > 
> > I see a few alternatives.  We could just use/extend the ruleset FD
> > instead of creating a new one, but because leaking current rulesets is
> > not currently a security risk, we should be careful to not change that.
> > 
> > Another approach, similar to seccomp unotify, is to get a
> > "[landlock-domain]" FD returned by the landlock_restrict_self(2) when a
> > new LANDLOCK_RESTRICT_SELF_DOMAIN_FD flag is set.  This FD would be a
> > reference to the newly created domain, which is more specific than the
> > ruleset used to create this domain (and that can be used to create
> > other domains).  This domain FD could be used for introspection (i.e.
> > to get read-only properties such as domain ID), but being able to
> > directly supervise the referenced domain only with this FD would be a
> > risk that we should limit.
> > 
> > What we can do is to implement an IOCTL command for such domain FD that
> > would return a supervisor FD (if the LANDLOCK_RESTRICT_SELF_SUPERVISED
> > flag was also set).  The key point is to check (one time) that the
> > process calling this IOCTL is not restricted by the related domain (see
> > the scope helpers).
> 
> Is LANDLOCK_RESTRICT_SELF_DOMAIN_FD part of your (upcoming?) introspection
> patch? (thinking about when will someone pass that only and not
> LANDLOCK_RESTRICT_SELF_SUPERVISED, or vice versa)

I don't plan to work on such LANDLOCK_RESTRICT_SELF_DOMAIN_FD flag for
now, but the introspection feature(s) would help for this supervisor
feature.

> 
> By the way, is it alright to conceptually relate the supervisor to a domain?
> It really would be a layer inside a domain - the domain could have earlier
> or later layers which can deny access without supervision, or the supervisor
> for earlier layers can deny access first. Therefore having supervisor fd
> coming out of the ruleset felt sensible to me at first.

Good question.  I've been using the name "domain" to refer to the set of
restrictions enforced on a set of processes, but these restrictions are
composed of inherited ones plus the latest layer.  In this case, a
domain FD should refer to all the restrictions, but the supervisor FD
should indeed only refer to the latest layer of a domain (created by
landlock_restrict_self).

> 
> Also, isn't "check that process calling this IOCTL is not restricted by the
> related domain" and the fact that the IOCTL is on the domain fd, which is a
> return value of landlock_restrict_self, kind of contradictory?  I mean it is
> a sensible check, but that kind of highlights that this interface is
> slightly awkward - basically all callers are forced to have a setup where
> the child sends the domain fd back to the parent.

I agree that it's confusing.  I'd like to avoid the ruleset gaining any
control over domains after they are created.

Another approach would be to create a supervisor FD with the
landlock_create_ruleset() syscall, and pass this FD to the ruleset,
potentially with landlock_add_rule() calls to only request this
supervisor when matching specific rules (that could potentially be
catch-all rules)?

Overall, my main concern about this patch series is that the supervisor
could get a lot of requests, which would make the sandbox unusable
because it would always be blocked by some thread/process.  This latest
approach and the ability to update the domain somehow could make it
workable.

> 
> > 
> > Relying on IOCTL commands (for all these FD types) instead of read/write
> > operations should also limit the risk of these FDs being misused through
> > a confused deputy attack (because such IOCTL command would convey an
> > explicit intent):
> > https://docs.kernel.org/security/credentials.html#open-file-credentials
> > https://lore.kernel.org/all/CAG48ez0HW-nScxn4G5p8UHtYy=T435ZkF3Tb1ARTyyijt_cNEg@mail.gmail.com/
> > We should get inspiration from seccomp unotify for this too:
> > https://lore.kernel.org/all/20181209182414.30862-1-tycho@tycho.ws/
> 
> I think in the seccomp unotify case the problem arises from what the setuid
> binary thinks is just normal data getting interpreted by the kernel as a fd,
> and thus having different effect if the attacker writes it vs. if the suid
> app writes it.  In our case I *think* we should be alright, but maybe we
> should go with ioctl anyway...

I don't see why Jann's attack scenario couldn't work for this Landlock
supervisor too.  The main point is that the read/write interfaces are
used by a lot of different FDs, and we may not need them.

> However, how does using netlink messages (a
> suggestion from a different thread) affect this (if we do end up using it)?
> Would we have to do netlink msgs via IOCTL?

Because all requests should be synchronous, one IOCTL could be used to
both acknowledge a previous event (or just start) and read the next one.

I was thinking about an IOCTL with these arguments:
1. supervisor FD
2. (extensible) IOCTL command (see PIDFD_GET_INFO for instance)
3. pointer to a fixed-size control structure

The fixed-size control structure could contain:
- handled access rights, used to only get events related to specific
  access rights.
- flags, to specify which kind of FD we would like to get (e.g. only
  directory FD, pidfd...)
- fd[6]: an array of received file descriptors.
- pointer to a variable-size data buffer that would contain all the
  records (e.g. source dir FD, source file name, destination dir FD,
  destination file name) for one event, potentially formatted with NLA.
- the size of this buffer

I'm not sure about the content of this buffer and the NLA format, and
the related API might not be usable without netlink sockets though.
Taking inspiration from the fanotify message format is another option.
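
To give an idea of the shape, here is a sketch of such a fixed-size
control structure as plain user-space C.  The struct, field, and helper
names are all hypothetical (nothing like this exists in the UAPI), and
the field order is only chosen so that there is no implicit padding:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SUPERVISE_MAX_FDS 6

/* Hypothetical fixed-size control structure passed by pointer. */
struct model_supervise_ctl {
	uint64_t handled_access; /* only get events for these accesses */
	uint32_t flags;          /* which kinds of FD are wanted */
	int32_t fd[MODEL_SUPERVISE_MAX_FDS]; /* received FDs, -1 if unused */
	uint32_t data_size;      /* size of the variable-size buffer */
	uint64_t data_ptr;       /* user pointer to the records buffer */
};

/* 8 + 4 + 24 + 4 + 8 bytes, with no implicit padding. */
_Static_assert(sizeof(struct model_supervise_ctl) == 48,
	       "unexpected padding");

/* Prepare a request: no FD received yet, buffer provided by the caller. */
static void model_ctl_init(struct model_supervise_ctl *ctl,
			   uint64_t handled_access, void *buf,
			   uint32_t buf_size)
{
	int i;

	memset(ctl, 0, sizeof(*ctl));
	ctl->handled_access = handled_access;
	for (i = 0; i < MODEL_SUPERVISE_MAX_FDS; i++)
		ctl->fd[i] = -1;
	ctl->data_ptr = (uint64_t)(uintptr_t)buf;
	ctl->data_size = buf_size;
}
```

Storing the buffer pointer as a 64-bit integer keeps the layout identical
for 32-bit and 64-bit callers.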

> 
> 
> > > +	/**
> > > +	 * @pad: Unused, must be zero.
> > > +	 */
> > > +	__u32 pad;
> > 
> > In this case we should pack the struct instead.
> > 
> > >   };
> > >   /*
> > > @@ -60,6 +69,7 @@ struct landlock_ruleset_attr {
> > >    */
> > >   /* clang-format off */
> > >   #define LANDLOCK_CREATE_RULESET_VERSION			(1U << 0)
> > > +#define LANDLOCK_CREATE_RULESET_SUPERVISE		(1U << 1)
> > >   /* clang-format on */
> > >   /**
> > 
> > [...]
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 5/9] Define user structure for events and responses.
  2025-03-10  0:39       ` Tingmao Wang
@ 2025-03-11 19:28         ` Mickaël Salaün
  2025-03-11 23:18           ` Tingmao Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-11 19:28 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen

On Mon, Mar 10, 2025 at 12:39:04AM +0000, Tingmao Wang wrote:
> On 3/6/25 03:05, Tingmao Wang wrote:
> [...]
> > This is also motivated by the potential UX I'm thinking of. For example,
> > if a newly installed application tries to create ~/.app-name, it will be
> > much more reassuring and convenient to the user if we can show something
> > like
> > 
> >      [program] wants to mkdir ~/.app-name. Allow this and future
> >      access to the new directory?
> > 
> > rather than just "[program] wants to mkdir under ~". (The "Allow this
> > and future access to the new directory" bit is made possible by the
> > supervisor knowing the name of the file/directory being created, and can
> > remember them / write them out to a persistent profile etc)
> 
> Another significant motivation, which I forgot to mention, is to auto-grant
> access to newly created files/sockets etc under things like /tmp,
> $XDG_RUNTIME_DIR, or ~/Downloads.

What do you mean?  What is not currently possible?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 5/9] Define user structure for events and responses.
  2025-03-10  0:39         ` Tingmao Wang
@ 2025-03-11 19:29           ` Mickaël Salaün
  0 siblings, 0 replies; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-11 19:29 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Tycho Andersen, Günther Noack, Jan Kara,
	linux-security-module, Amir Goldstein, Matthew Bobrowski,
	linux-fsdevel, Christian Brauner, Kees Cook, Jann Horn,
	Andy Lutomirski, Paul Moore, linux-api

On Mon, Mar 10, 2025 at 12:39:08AM +0000, Tingmao Wang wrote:
> On 3/8/25 19:07, Mickaël Salaün wrote:
> > On Thu, Mar 06, 2025 at 03:05:10AM +0000, Tingmao Wang wrote:
> > > On 3/4/25 19:49, Mickaël Salaün wrote:
> > > > On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
> > > [...]
> > > > > +	/**
> > > > > +	 * @cookie: Opaque identifier to be included in the response.
> > > > > +	 */
> > > > > +	__u32 cookie;
> > > > 
> > > > I guess we could use a __u64 index counter per layer instead.  That
> > > > would also help to order requests if they are treated by different
> > > > supervisor threads.
> > > 
> > > I don't immediately see a use for ordering requests (if we get more than one
> > > event at once, they are coming from different threads anyway so there can't
> > > be any dependencies between them, and the supervisor threads can use
> > > timestamps), but I think making it a __u64 is probably a good idea
> > > regardless, as it means we don't have to do some sort of ID allocation, and
> > > can just increment an atomic.
> > 
> > Indeed, we should follow the seccomp unotify approach with a random u64
> > incremented per request.
> 
> Do you mean a random starting value, incremented by one per request, or

Yes

> something like the landlock_id in the audit patch (random increments too)?

There is no need for that because the supervisor is more privileged than
the sandbox.
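
A minimal user-space sketch of such a counter, assuming C11 atomics; the
names are hypothetical, and the seed is passed in by the caller (in the
kernel it would come from a random source such as get_random_u64()):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-supervisor state holding the next cookie. */
struct model_supervisor {
	_Atomic uint64_t next_cookie;
};

/* Start from a caller-provided (ideally random) value. */
static void model_supervisor_init(struct model_supervisor *s, uint64_t seed)
{
	assert(s);
	atomic_init(&s->next_cookie, seed);
}

/* Return the current cookie and advance by one; wraps around at 2^64. */
static uint64_t model_next_cookie(struct model_supervisor *s)
{
	return atomic_fetch_add(&s->next_cookie, 1);
}
```

No ID allocation is needed: concurrent requests each get a distinct value
from the atomic increment.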

> 
> > 
> > > 
> > > > > +};
> > > > > +
> > > > > +struct landlock_supervise_event {
> > > > > +	struct landlock_supervise_event_hdr hdr;
> > > > > +	__u64 access_request;
> > > > > +	__kernel_pid_t accessor;
> > > > > +	union {
> > > > > +		struct {
> > > > > +			/**
> > > > > +			 * @fd1: An open file descriptor for the file (open,
> > > > > +			 * delete, execute, link, readdir, rename, truncate),
> > > > > +			 * or the parent directory (for create operations
> > > > > +			 * targeting its child) being accessed.  Must be
> > > > > +			 * closed by the reader.
> > > > > +			 *
> > > > > +			 * If this points to a parent directory, @destname
> > > > > +			 * will contain the target filename. If @destname is
> > > > > +			 * empty, this points to the target file.
> > > > > +			 */
> > > > > +			int fd1;
> > > > > +			/**
> > > > > +			 * @fd2: For link or rename requests, a second file
> > > > > +			 * descriptor for the target parent directory.  Must
> > > > > +			 * be closed by the reader.  @destname contains the
> > > > > +			 * destination filename.  This field is -1 if not
> > > > > +			 * used.
> > > > > +			 */
> > > > > +			int fd2;
> > > > 
> > > > Can we just use one FD but identify the requested access instead and
> > > > send one event for each, like for the audit patch series?
> > > 
> > > I haven't managed to read or test out the audit patch yet (I will do), but I
> > > think having the ability to specifically tell whether the child is trying to
> > > move / rename / create a hard link of an existing file, and what it's trying
> > > to use as destination, might be useful (either for security, or purely for
> > > UX)?
> > > 
> > > For example, imagine something trying to link or move ~/.ssh/id_ecdsa to
> > > /tmp/innocent-tmp-file then read the latter. The supervisor can warn the
> > > user on the initial link attempt, and the shenanigan will probably be
> > > stopped there (although still, being able to say "[program] wants to link
> > > ~/.ssh/id_ecdsa to /tmp/innocent-tmp-file" seems better than just "[program]
> > > wants to create a link for ~/.ssh/id_ecdsa"), but even if somehow this ends
> > > up allowed, later on for the read request it could say something like
> > > 
> > > 	[program] wants to read /tmp/innocent-tmp-file
> > > 	    (previously moved from ~/.ssh/id_ecdsa)
> > > 
> > > Maybe this is a bit silly, but there might be other use cases for knowing
> > > the exact details of a rename/link request, either for at-the-time decision
> > > making, or tracking stuff for future requests?
> > 
> > This pattern looks like datagram packets.  I think we should use the
> > netlink attributes.  There were concerns about using a netlink socket for
> > the seccomp unotification though:
> > https://lore.kernel.org/all/CALCETrXeZZfVzXh7SwKhyB=+ySDk5fhrrdrXrcABsQ=JpQT7Tg@mail.gmail.com/
> > 
> > There are two main differences with seccomp unotify:
> > - the supervisor should be able to receive arbitrary-sized data (e.g.
> >    file name, not path);
> > - the supervisor should be able to receive file descriptors (instead of
> >    path).
> > 
> > Sockets are created with socket(2) whereas in our case we should only
> > get a supervisor FD (indirectly) through landlock_restrict_self(2),
> > which clearly identifies a kernel object.  Another issue would be to
> > deal with network namespaces, probably by creating a private one.
> > Sockets are powerful but we don't need all the routing complexity.
> > Moreover, we should only need a blocking communication channel to avoid
> > issues managing in-flight object references (transformed to FDs when
> > received).  That makes me think that a socket might not be the right
> > construct, but we can still rely on the NLA macros to define a proper
> > protocol with dynamically-sized events, received and sent with dedicated
> > IOCTL commands.
> > 
> > Netlink already provides a way to send a cookie, and
> > netlink_attribute_type defines the types we'll need, including string.
> > 
> > For instance, a link request/event could include 3 packets, one for each
> > of these properties:
> > 1. the source file FD;
> > 2. the destination directory FD;
> > 3. the destination filename string.
> > 
> > This way we would avoid the union defined in this patch.
> 
> I had no idea about netlink - I will take a look.  Do you know if there is
> any existing code which uses it in a similar way (i.e. not creating an
> actual socket, but using netlink messages)?

I don't know.

> 
> I think in the end seccomp-unotify went with an ioctl with a custom struct
> seccomp_notif due to friction with the NL API [1] - do you think we will
> face the same problem here? (I will take a deeper look at netlink after
> sending this.)
> 
> (Tycho - could you weigh in?)
> 
> [1]: https://lore.kernel.org/all/CAGXu5jKsLDSBjB74SrvCvmGy_RTEjBsMtR5dk1CcRFrHEQfM_g@mail.gmail.com/

We need to check whether the NLA API could work.  Kees's answer did not
explain the reasoning.  If it cannot work, we should take inspiration from
fanotify messages.
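
As a rough illustration of the three-attribute link event discussed
earlier in this thread (source FD, destination directory FD, destination
filename), here is a minimal userspace sketch of an NLA-style layout:
a length/type header followed by the payload, padded to 4 bytes.  The
attribute type names and the sv_* helpers are hypothetical and are not
part of the patch or of the netlink uapi; only the header shape and the
4-byte alignment mirror struct nlattr.  The FD values are plain integers
here, since how real FDs would be installed is still an open question.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute types for a link request (not a real uapi). */
enum sv_attr_type {
	SV_ATTR_SRC_FD = 1,      /* source file FD */
	SV_ATTR_DEST_DIR_FD = 2, /* destination directory FD */
	SV_ATTR_DEST_NAME = 3,   /* destination filename, NUL-terminated */
};

/* Same shape as struct nlattr: length of header + payload, then type. */
struct sv_attr {
	uint16_t len;
	uint16_t type;
};

#define SV_ALIGNTO ((size_t)4)
#define SV_ALIGN(n) (((size_t)(n) + SV_ALIGNTO - 1) & ~(SV_ALIGNTO - 1))
#define SV_HDRLEN SV_ALIGN(sizeof(struct sv_attr))

/*
 * Append one attribute at offset @off of @buf.  Returns the new offset, or
 * 0 if the attribute does not fit.
 */
static size_t sv_put(uint8_t *buf, size_t cap, size_t off, uint16_t type,
		     const void *data, size_t data_len)
{
	struct sv_attr hdr;
	size_t padded;

	if (data_len > UINT16_MAX - SV_HDRLEN || off > cap)
		return 0;
	padded = SV_ALIGN(SV_HDRLEN + data_len);
	if (padded > cap - off)
		return 0;
	hdr.len = (uint16_t)(SV_HDRLEN + data_len);
	hdr.type = type;
	memset(buf + off, 0, padded); /* zero the padding bytes too */
	memcpy(buf + off, &hdr, sizeof(hdr));
	if (data_len)
		memcpy(buf + off + SV_HDRLEN, data, data_len);
	return off + padded;
}

/* Encode a link request: source FD, destination dir FD, filename. */
static size_t sv_encode_link(uint8_t *buf, size_t cap, int32_t src_fd,
			     int32_t dest_dir_fd, const char *name)
{
	size_t off;

	off = sv_put(buf, cap, 0, SV_ATTR_SRC_FD, &src_fd, sizeof(src_fd));
	if (!off)
		return 0;
	off = sv_put(buf, cap, off, SV_ATTR_DEST_DIR_FD, &dest_dir_fd,
		     sizeof(dest_dir_fd));
	if (!off)
		return 0;
	return sv_put(buf, cap, off, SV_ATTR_DEST_NAME, name,
		      strlen(name) + 1);
}

/*
 * Find the first attribute of @type in @buf[0..len).  Returns a pointer to
 * its payload and stores the payload length, or NULL if absent or malformed.
 */
static const void *sv_find(const uint8_t *buf, size_t len, uint16_t type,
			   size_t *payload_len)
{
	size_t off = 0;

	while (len - off >= SV_HDRLEN) {
		struct sv_attr hdr;
		size_t step;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < SV_HDRLEN || hdr.len > len - off)
			return NULL;
		if (hdr.type == type) {
			*payload_len = hdr.len - SV_HDRLEN;
			return buf + off + SV_HDRLEN;
		}
		step = SV_ALIGN(hdr.len);
		if (step > len - off)
			return NULL;
		off += step;
	}
	return NULL;
}
```

With this layout a reader can skip attribute types it does not know,
which is the main thing the union in the patch cannot offer.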

> 
> > 
> > There is still the question about receiving FDs though. It would be nice
> > to have a (set of?) dedicated IOCTL(s) to receive an FD, but I'm not
> > sure how this could be properly handled wrt NLA.
> 
> Also, if we go with netlink messages, why do we need additional IOCTLs? Can
> we open the fd when we write out the message? (Maybe I will end up realizing
> the reason for this after reading netlink code, but I would )

It's much easier to have a static-sized struct, both for developers and
for introspection tools (e.g. strace).  However, in this case we
would also have variable-length data.  See my other reply discussing the
IOCTL idea.
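
To make the variable-length part concrete, here is a minimal sketch of
how a reader could recover the name length under the rule given in the
@destname comment of the patch (hdr.length minus the offset of destname,
with NUL padding up to a multiple of 8 bytes).  The struct below is a
simplified stand-in, not the actual struct landlock_supervise_event; its
field set and the sv_* helper names are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the event layout; NOT the struct from the patch. */
struct sv_event {
	struct {
		uint16_t type;
		uint16_t length; /* size of the whole event, in bytes */
	} hdr;
	int32_t fd1;
	int32_t fd2;
	char destname[]; /* NUL-terminated, then NUL-padded */
};

/* Event size for @name: at least one NUL, total rounded up to 8 bytes. */
static size_t sv_event_size(const char *name)
{
	size_t raw = offsetof(struct sv_event, destname) + strlen(name) + 1;

	return (raw + 7) & ~(size_t)7;
}

/*
 * Return the length of destname excluding NULs, or -1 if hdr.length is
 * inconsistent (too short, not a multiple of 8, or no terminator found).
 */
static long sv_destname_len(const struct sv_event *ev)
{
	size_t base = offsetof(struct sv_event, destname);
	size_t field_len;
	const char *nul;

	if (ev->hdr.length <= base || ev->hdr.length % 8 != 0)
		return -1;
	field_len = ev->hdr.length - base;
	nul = memchr(ev->destname, '\0', field_len);
	if (!nul)
		return -1;
	return (long)(nul - ev->destname);
}
```

A static-sized struct would need none of these checks, which is the
trade-off being weighed here.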

> 
> > 
> > > 
> > > > I will try out the audit patch to see how things like these appear in the
> > > log before commenting further on this. Maybe there is a way to achieve this
> > > while still simplifying the event structure?
> > > 
> > > > 
> > > > > +			/**
> > > > > +			 * @destname: A filename for a file creation target.
> > > > > +			 *
> > > > > +			 * If either of fd1 or fd2 points to a parent
> > > > > +			 * directory rather than the target file, this is the
> > > > > +			 * NULL-terminated name of the file that will be
> > > > > +			 * newly created.
> > > > > +			 *
> > > > > +			 * Counting the NULL terminator, this field will
> > > > > +			 * contain one or more NULL padding at the end so
> > > > > +			 * that the length of the whole struct
> > > > > +			 * landlock_supervise_event is a multiple of 8 bytes.
> > > > > +			 *
> > > > > +			 * This is a variable length member, and the length
> > > > > +			 * including the terminating NULL(s) can be derived
> > > > > +			 * from hdr.length - offsetof(struct
> > > > > +			 * landlock_supervise_event, destname).
> > > > > +			 */
> > > > > +			char destname[];
> > > > 
> > > > I'd prefer to avoid sending file names for now.  I don't think it's
> > > > necessary, and that could encourage supervisors to filter access
> > > > according to names.
> > > > 
> > > 
> > > This is also motivated by the potential UX I'm thinking of. For example, if
> > > a newly installed application tries to create ~/.app-name, it will be much
> > > more reassuring and convenient to the user if we can show something like
> > > 
> > > 	[program] wants to mkdir ~/.app-name. Allow this and future
> > > 	access to the new directory?
> > > 
> > > rather than just "[program] wants to mkdir under ~". (The "Allow this and
> > > future access to the new directory" bit is made possible by the supervisor
> > > knowing the name of the file/directory being created, and can remember them
> > > / write them out to a persistent profile etc)
> > > 
> > > Note that this is just the filename under the dir represented by fd - this
> > > isn't a path or anything that can be subject to symlink-related attacks,
> > > etc.  If a program calls e.g.
> > > mkdirat or openat (dfd -> "/some/", pathname="dir/stuff", O_CREAT)
> > > my understanding is that fd1 will point to /some/dir, and destname would be
> > > "stuff"
> > 
> > Right, this file name information would be useful.  In the case of
> > audit, the goal is to efficiently and asynchronously log security events
> > (and align with other LSM logs and related limitations), not primarily
> > to debug sandboxed apps nor to enrich this information for decision
> > making, but the supervisor feature would help here.  The patch message
> > should include this rationale.
> 
> Will do
> 
> > 
> > > 
> > > Actually, in case your question is "why not send a fd to represent the newly
> > > created file, instead of sending the name" -- I'm not sure whether you can
> > > open even an O_PATH fd to a non-existent file.
> > 
> > That would not be possible because it would not exist yet, a file name
> > (not file path) is OK for this case.
> > 
> > > 
> > > > > +		};
> > > > > +		struct {
> > > > > +			__u16 port;
> > > > > +		};
> > > > > +	};
> > > > > +};
> > > > > +
> > > > 
> > > > [...]
> > > 
> > > 
> 

* Re: [RFC PATCH 6/9] Creating supervisor events for filesystem operations
  2025-03-10  0:39     ` Tingmao Wang
@ 2025-03-11 19:29       ` Mickaël Salaün
  0 siblings, 0 replies; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-11 19:29 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen

On Mon, Mar 10, 2025 at 12:39:17AM +0000, Tingmao Wang wrote:
> On 3/4/25 19:50, Mickaël Salaün wrote:
> > On Tue, Mar 04, 2025 at 01:13:02AM +0000, Tingmao Wang wrote:
> > > NOTE from future me: This implementation which waits for user response
> > > while blocking inside the current security_path_* hooks is problematic due
> > > to taking exclusive inode lock on the parent directory, and while I have a
> > > proposal for a solution, outlined below, I haven't managed to include the
> > > code for that in this version of the patch. Thus for this commit in
> > > particular I'm probably more looking for suggestions on the approach
> > > rather than code review.  Please see the TODO section at the end of this
> > > message before reviewing this patch.
> > 
> > This is good for an RFC.
> > 
> > > 
> > > ----
> > > 
> > > This patch implements a proof-of-concept for modifying the current
> > > landlock LSM hooks to send supervisor events and wait for responses, when
> > > a supervised layer is involved.
> > > 
> > > In this design, access requests which would end up being denied by other
> > > non-supervised landlock layers (or which would fail the normal inode
> > > permission check anyways - but this is currently TODO, I only thought of
> > > this afterwards) are denied straight away to avoid pointless supervisor
> > > notifications.
> > 
> > Yes, only denied access should be forwarded to the supervisor.
> 
> I assume you meant only denied access *by the supervised layers* should be
> forwarded to the supervisor.

Yes

> 
> > In another patch series we could enable the supervisor to update its layer
> > with new rules as well.
> 
> I did consider the possibility of this - if the supervisor has decided to
> allow all future access to e.g. a directory, ideally this can be "offloaded"
> to the kernel, but I was a bit worried about the fact that landlock
> currently quite heavily assumes the domain is immutable. While in the
> supervised case breaking that rule here should be alright (no worse
> security), not sure if there is some potential logic / data race bugs if we
> now make domains mutable.

Domains are currently immutable, it would be good to keep this property
as much as possible, but at the same time I don't see how this
supervisor feature would work in practice without the ability to update
the domain.

> 
> > 
> > The audit patch series should help to properly identify which layer
> > denied a request, and to only use the related supervisor.
> 
> The current patch does correctly identify which layer(s) (and sends events
> to the right supervisor(s)), but aligning with and re-using code in the
> audit patch is sensible.  Will have a look.

Yes please, some helpers look very similar.  It would be useful if you
reviewed this part in the audit patch series.

> 
> > 
> > > 
> > > Currently current_check_access_path only gets the path of the parent
> > > directory for create/remove operations, which is not enough for what we
> > > want to pass to the supervisor.  Therefore we extend it by passing in any
> > > relevant child dentry (but see TODO below - this may not be possible with
> > > the proper implementation).
> > 
> > Hmm, I'm not sure this kind of information is required (this is not
> > implemented for the audit support).  The supervisor should be fine
> > getting only which access is missing, right?
> > 
> > > 
> > > This initial implementation doesn't handle links and renames, and for now
> > > these operations behave as if no supervisor is present (and thus will be
> > > denied, unless it is allowed by the layer rules).  Also note that we can
> > > get spurious create requests if the program tries to O_CREAT open a
> > > file that exists but is not in the dcache (from my understanding).
> > > 
> > > Event IDs (referred to as an opaque cookie in the uapi) are currently
> > > generated with a simple `next_event_id++`.  I considered using e.g. xarray
> > > but decided not to for this PoC. Suggestions welcome. (Note that we have
> > > to design our own event id even if we use an extension of fanotify, as
> > > fanotify uses a file descriptor to identify events, which is not generic
> > > enough for us)
> > 
> > That's another noticeable difference with fanotify.  You can add it to
> > the next cover letter.
> > 
> > > 
> > > ----
> > > 
> > > TODO:
> > > 
> > > When testing this I realized that doing it this way means that for the
> > > create/delete case, we end up holding an exclusive inode lock on the
> > > parent directory while waiting for supervisor to respond (see namei.c -
> > > security_path_mknod is called in may_o_create <- lookup_open which has an
> > > exclusive lock if O_CREAT is passed), which will prevent all other tasks
> > > from accessing that directory (regardless of whether or not they are under
> > > landlock).
> > 
> > Could we use a landlock_object to identify this inode instead?
> 
> Sorry - earlier when reading this I didn't quite understand this suggestion
> and forgot to say so, however the problem here is the location of the
> security_path_... hooks (by the time they are called the lock is already
> held). I'm not sure how we identify the inode makes a difference?

Yes, we should just be able to create an O_PATH FD from the hooks, but in
the task_work (see my other reply).

> 
> > 
> > > 
> > > This is clearly unacceptable, but since landlock (and also this extension)
> > > doesn't actually need a dentry for the child (which is allocated after the
> > > inode lock), I think this is not unsolvable.  I'm experimenting with
> > > creating a new LSM hook, something like security_pathname_mknod
> > > (suggestions welcome), which will be called after we looked up the dentry
> > > for the parent (to prevent racing symlinks TOCTOU), but before we take the
> > > lock for it.  Such a hook can still take as argument the parent dentry,
> > > plus name of the child (instead of a struct path for it).
> > > 
> > > Suggestions for alternative approaches are definitely welcome!
> > > 
> > > Signed-off-by: Tingmao Wang <m@maowtm.org>
> > > ---
> > >   security/landlock/fs.c        | 134 ++++++++++++++++++++++++++++++++--
> > >   security/landlock/supervise.c | 122 +++++++++++++++++++++++++++++++
> > >   security/landlock/supervise.h | 106 ++++++++++++++++++++++++++-
> > >   3 files changed, 354 insertions(+), 8 deletions(-)
> > > 
> > 
> > [...]
> 
> 

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-11 19:28         ` Mickaël Salaün
@ 2025-03-11 20:58           ` Song Liu
  2025-03-11 22:03             ` Tingmao Wang
  2025-03-12 11:50             ` Mickaël Salaün
  0 siblings, 2 replies; 47+ messages in thread
From: Song Liu @ 2025-03-11 20:58 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Tingmao Wang, Christian Brauner, Amir Goldstein,
	Günther Noack, Jan Kara, linux-security-module,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen, Kees Cook,
	Jeff Xu, Mikhail Ivanov, Francis Laniel, Matthieu Buffet,
	Paul Moore, Kentaro Takeda, Tetsuo Handa, John Johansen

On Tue, Mar 11, 2025 at 12:28 PM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Tue, Mar 11, 2025 at 12:42:05AM +0000, Tingmao Wang wrote:
> > On 3/6/25 17:07, Amir Goldstein wrote:
> > [...]
> > >
> > > w.r.t sharing infrastructure with fanotify, I only looked briefly at
> > > your patches
> > > and I have only a vague familiarity with landlock, so I cannot yet form an
> > > opinion whether this is a good idea, but I wanted to give you a few more
> > > data points about fanotify that seem relevant.
> > >
> > > 1. There is already some intersection of fanotify and audit lsm via the
> > > fanotify_response_info_audit_rule extension for permission
> > > events, so it's kind of a precedent of using fanotify to aid an lsm
> > >
> > > 2. See this fan_pre_modify-wip branch [1] and specifically commit
> > >    "fanotify: introduce directory entry pre-modify permission events"
> > > I do have an intention to add create/delete/rename permission events.
> > > Note that the new fsnotify hooks are added to the do_ vfs helpers, not very
> > > far from the security_path_ lsm hooks, but not exactly in the same place
> > > because we want the fsnotify hooks to be before taking vfs locks, to allow
> > > a listener to write to the filesystem from event context.
> > > There are different semantics than just ALLOW/DENY that you need,
> > > therefore, only if we move the security_path_ hooks outside the
> > > vfs locks, our use cases could use the same hooks
> >
> > Hi Amir,
> >
> > (this is a slightly long message - feel free to respond at your convenience,
> > thank you in advance!)
> >
> > Thanks a lot for mentioning this branch, and for the explanation! I've had a
> > look and realized that the changes you have there will be very useful for
> > this patch, and in fact, I've already tried a worse attempt of this (not
> > included in this patch series yet) to create some security_pathname_ hooks
> > that take the parent struct path + last name as char*, that will be called
> > before locking the parent.  (We can't have an unprivileged supervisor cause
> > a directory to be locked indefinitely, which will also block users outside
> > of the landlock domain)
> >
> > I'm not sure if we can move security_path tho, because it takes the dentry
> > of the child as an argument, and (I think at least for create / mknod /
> > link) that dentry is only created after locking.  Hence the proposal for
> > separate security_pathname_ hooks.  A search shows that currently AppArmor
> > and TOMOYO (plus Landlock) use the security_path_ hooks that would need
> > changing, if we move it (and we will have to understand if the move is ok to
> > do for the other two LSMs...)
> >
> > However, I think it would still make a lot of sense to align with fsnotify
> > here, as you have already made the changes that I would need to do anyway
> > should I implement the proposed new hooks.  I think a sensible thing might
> > be to have the extra LSM hooks be called alongside fsnotify_(re)name_perm -
> > following the pattern of what currently happens with fsnotify_open_perm
> > (i.e. security_file_open called first, then fsnotify_open_perm right after).

I think there is a fundamental difference between LSM hooks and fsnotify,
so putting fsnotify behind some LSM hooks might be weird. Specifically,
LSM hooks are always global. If an LSM attaches to a hook, say
security_file_open, it will see all the file open calls in the system. On the
other hand, each fsnotify rule only applies to a group, so that one fanotify
handler doesn't touch files watched by another fanotify handler. Given this
difference, I am not sure what fsnotify LSM hooks should look like.

Does this make sense?

> Yes, I think it would make sense to use the same hooks for fanotify and
> other security subsystems, or at least to share them.  It would improve
> consistency across different Linux subsystems and simplify changes and
> maintenance where these hooks are called.

[...]

> > --
> >
> > For Mickaël,
> >
> > Would you be on board with changing Landlock to use the new hooks as
> > mentioned above?  My thinking is that it shouldn't make any difference in
> > terms of security - Landlock permissions for e.g. creating/deleting files
> > are based on the parent, and in fact except for link and rename, the
> > hook_path_ functions in Landlock don't even use the dentry argument.  If
> > you're happy with the general direction of this, I can investigate further
> > and test it out etc.  This change might also reduce the impact of Landlock
> > on non-landlocked processes, if we avoid holding exclusive inode lock while
> > evaluating rules / traversing paths...? (Just a thought, not measured)

I think the filter for process/thread is usually faster than the filter for
file/path/subtree? Therefore, it is better for landlock to check the filter for
process/thread first. Did I miss/misunderstand something?

Thanks,
Song




> This looks reasonable.  As long as the semantics do not change it
> should be good and Landlock tests should pass.  That would also require
> other users of this hook to make sure it works for them too.  If it is
> not the case, I guess we could add alternative hooks with different
> properties.  However, see the issue and the alternative approach below.
>

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-11 20:58           ` Song Liu
@ 2025-03-11 22:03             ` Tingmao Wang
  2025-03-11 23:23               ` Song Liu
  2025-03-12 11:50             ` Mickaël Salaün
  1 sibling, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-11 22:03 UTC (permalink / raw)
  To: Song Liu, Mickaël Salaün, Christian Brauner
  Cc: Amir Goldstein, Günther Noack, Jan Kara,
	linux-security-module, Matthew Bobrowski, linux-fsdevel,
	Tycho Andersen, Kees Cook, Jeff Xu, Mikhail Ivanov,
	Francis Laniel, Matthieu Buffet, Paul Moore, Kentaro Takeda,
	Tetsuo Handa, John Johansen

On 3/11/25 20:58, Song Liu wrote:
> On Tue, Mar 11, 2025 at 12:28 PM Mickaël Salaün <mic@digikod.net> wrote:
>>
>> On Tue, Mar 11, 2025 at 12:42:05AM +0000, Tingmao Wang wrote:
>>> On 3/6/25 17:07, Amir Goldstein wrote:
>>> [...]
>>>>
>>>> w.r.t sharing infrastructure with fanotify, I only looked briefly at
>>>> your patches
>>>> and I have only a vague familiarity with landlock, so I cannot yet form an
>>>> opinion whether this is a good idea, but I wanted to give you a few more
>>>> data points about fanotify that seem relevant.
>>>>
>>>> 1. There is already some intersection of fanotify and audit lsm via the
>>>> fanotify_response_info_audit_rule extension for permission
>>>> events, so it's kind of a precedent of using fanotify to aid an lsm
>>>>
>>>> 2. See this fan_pre_modify-wip branch [1] and specifically commit
>>>>     "fanotify: introduce directory entry pre-modify permission events"
>>>> I do have an intention to add create/delete/rename permission events.
>>>> Note that the new fsnotify hooks are added to the do_ vfs helpers, not very
>>>> far from the security_path_ lsm hooks, but not exactly in the same place
>>>> because we want the fsnotify hooks to be before taking vfs locks, to allow
>>>> a listener to write to the filesystem from event context.
>>>> There are different semantics than just ALLOW/DENY that you need,
>>>> therefore, only if we move the security_path_ hooks outside the
>>>> vfs locks, our use cases could use the same hooks
>>>
>>> Hi Amir,
>>>
>>> (this is a slightly long message - feel free to respond at your convenience,
>>> thank you in advance!)
>>>
>>> Thanks a lot for mentioning this branch, and for the explanation! I've had a
>>> look and realized that the changes you have there will be very useful for
>>> this patch, and in fact, I've already tried a worse attempt of this (not
>>> included in this patch series yet) to create some security_pathname_ hooks
>>> that take the parent struct path + last name as char*, that will be called
>>> before locking the parent.  (We can't have an unprivileged supervisor cause
>>> a directory to be locked indefinitely, which will also block users outside
>>> of the landlock domain)
>>>
>>> I'm not sure if we can move security_path tho, because it takes the dentry
>>> of the child as an argument, and (I think at least for create / mknod /
>>> link) that dentry is only created after locking.  Hence the proposal for
>>> separate security_pathname_ hooks.  A search shows that currently AppArmor
>>> and TOMOYO (plus Landlock) use the security_path_ hooks that would need
>>> changing, if we move it (and we will have to understand if the move is ok to
>>> do for the other two LSMs...)
>>>
>>> However, I think it would still make a lot of sense to align with fsnotify
>>> here, as you have already made the changes that I would need to do anyway
>>> should I implement the proposed new hooks.  I think a sensible thing might
>>> be to have the extra LSM hooks be called alongside fsnotify_(re)name_perm -
>>> following the pattern of what currently happens with fsnotify_open_perm
>>> (i.e. security_file_open called first, then fsnotify_open_perm right after).
> 
> I think there is a fundamental difference between LSM hooks and fsnotify,
> so putting fsnotify behind some LSM hooks might be weird. Specifically,
> LSM hooks are always global. If an LSM attaches to a hook, say
> security_file_open, it will see all the file open calls in the system. On the
> other hand, each fsnotify rule only applies to a group, so that one fanotify
> handler doesn't touch files watched by another fanotify handler. Given this
> difference, I am not sure what fsnotify LSM hooks should look like.
> 
> Does this make sense?

To clarify, I wasn't suggesting that we put one hook _behind_ another 
("behind" in the sense of one calling the other), just that the place 
that calls the new fsnotify_name_perm/fsnotify_rename_perm hook (in 
Amir's WIP branch) could also be made to call some new LSM hooks in 
addition to fsnotify (i.e. security_pathname_create/delete/rename).

My understanding of the current code is that VFS calls security_... and 
fsnotify_... unconditionally, and the fsnotify_... functions figure out 
who needs to be notified.

> 
>> Yes, I think it would make sense to use the same hooks for fanotify and
>> other security subsystems, or at least to share them.  It would improve
>> consistency across different Linux subsystems and simplify changes and
>> maintenance where these hooks are called.

Mickaël - I'm not sure what you mean by "the same hook" - do you mean 
the relevant VFS functions could call both fsnotify and LSM hooks?

> 
> [...]
> 
>>> --
>>>
>>> For Mickaël,
>>>
>>> Would you be on board with changing Landlock to use the new hooks as
>>> mentioned above?  My thinking is that it shouldn't make any difference in
>>> terms of security - Landlock permissions for e.g. creating/deleting files
>>> are based on the parent, and in fact except for link and rename, the
>>> hook_path_ functions in Landlock don't even use the dentry argument.  If
>>> you're happy with the general direction of this, I can investigate further
>>> and test it out etc.  This change might also reduce the impact of Landlock
>>> on non-landlocked processes, if we avoid holding exclusive inode lock while
>>> evaluating rules / traversing paths...? (Just a thought, not measured)
> 
> I think the filter for process/thread is usually faster than the filter for
> file/path/subtree? Therefore, it is better for landlock to check the filter for
> process/thread first. Did I miss/misunderstand something?
>

Sorry, I should have clarified that the "impact" I'm talking about here
isn't referring directly to the time it takes for landlock to decide if
an access is allowed or not - in a non-landlocked process, the landlock
hooks already return really early and fast.  However, I'm thinking of a
situation where a landlocked process makes lots of create/delete etc
requests on a directory, and landlock does need to do some work (e.g.
path traversal) to decide those accesses.  Because the
security_path_mknod/unlink/... hooks are called in the VFS from a place
where it is holding an exclusive lock on the directory (for O_CREAT'ing
a child or other directory modification cases), when landlock is working
out an access by the landlocked process, no other tasks will be able to
read/write the directory (they will be blocked on the lock), even if
their accesses have nothing to do with landlock.

I should add that this is probably just a very minor impact: user
space can't cause the dir to be blocked for an arbitrary amount of time, at
worst slowing everyone else down by a bit if it deliberately creates
lots of layers (max 16) each with lots of rules (the ruleset evaluation
is O(log(#rules) * dir_depth)). I didn't measure it, it's just something
that occurred to me that could be improved by using new hooks that
aren't called with inode locks held.

Kind regards,
Tingmao

> Thanks,
> Song
> 
> 
> 
> 
>> This looks reasonable.  As long as the semantics do not change it
>> should be good and Landlock tests should pass.  That would also require
>> other users of this hook to make sure it works for them too.  If it is
>> not the case, I guess we could add alternative hooks with different
>> properties.  However, see the issue and the alternative approach below.
>>


* Re: [RFC PATCH 5/9] Define user structure for events and responses.
  2025-03-11 19:28         ` Mickaël Salaün
@ 2025-03-11 23:18           ` Tingmao Wang
  2025-03-12 11:49             ` Mickaël Salaün
  0 siblings, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-11 23:18 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen

On 3/11/25 19:28, Mickaël Salaün wrote:
> On Mon, Mar 10, 2025 at 12:39:04AM +0000, Tingmao Wang wrote:
>> On 3/6/25 03:05, Tingmao Wang wrote:
>> [...]
>>> This is also motivated by the potential UX I'm thinking of. For example,
>>> if a newly installed application tries to create ~/.app-name, it will be
>>> much more reassuring and convenient to the user if we can show something
>>> like
>>>
>>>       [program] wants to mkdir ~/.app-name. Allow this and future
>>>       access to the new directory?
>>>
>>> rather than just "[program] wants to mkdir under ~". (The "Allow this
>>> and future access to the new directory" bit is made possible by the
>>> supervisor knowing the name of the file/directory being created, and can
>>> remember them / write them out to a persistent profile etc)
>>
>> Another significant motivation, which I forgot to mention, is to auto-grant
>> access to newly created files/sockets etc under things like /tmp,
>> $XDG_RUNTIME_DIR, or ~/Downloads.
> 
> What do you mean?  What is not currently possible?

It is not currently possible with landlock to say "I will allow this
application access to create and open new files/folders under this
directory, change or delete the files it creates, but not touch any
existing files". Landlock supervisor can make this possible (keeping
track via its own state to allow future requests on the new file, or
modifying the domain if we support that), but for that the supervisor
has to know what file the application tried to create, hence the
motivation for sending the filename.

(I can see this kind of policy being applied to dirs like /tmp or my 
Downloads folder. $XDG_RUNTIME_DIR is also a sensible place for this 
behaviour due to the common pattern of creating a lock/pid file/socket 
there, although on second thought a GUI sandbox probably will want to 
create a private copy of that dir anyway for each app, to do dbus 
filtering etc)
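
A minimal sketch of the supervisor-side bookkeeping this would need,
assuming the supervisor identifies the parent directory by a (device,
inode) pair (e.g. from fstat(2) on the event FD) and receives the child
name alongside it.  All type and function names are hypothetical, the
fixed-size table stands in for whatever state a real supervisor would
keep, and the already_exists flag is something the caller is assumed to
have checked itself, since create requests can be spurious.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SV_MAX_TRACKED 64
#define SV_NAME_MAX 255

/* One file created by the sandboxed program, keyed by parent dir + name. */
struct sv_created {
	uint64_t dir_dev;
	uint64_t dir_ino;
	char name[SV_NAME_MAX + 1];
};

struct sv_state {
	struct sv_created entries[SV_MAX_TRACKED];
	size_t count;
};

static bool sv_match(const struct sv_created *e, uint64_t dev, uint64_t ino,
		     const char *name)
{
	return e->dir_dev == dev && e->dir_ino == ino &&
	       strcmp(e->name, name) == 0;
}

/* Later requests on a name pass only if the program created it itself. */
static bool sv_is_tracked(const struct sv_state *st, uint64_t dev,
			  uint64_t ino, const char *name)
{
	for (size_t i = 0; i < st->count; i++)
		if (sv_match(&st->entries[i], dev, ino, name))
			return true;
	return false;
}

/* Create request: allow new names and remember them, deny existing files. */
static bool sv_on_create(struct sv_state *st, uint64_t dev, uint64_t ino,
			 const char *name, bool already_exists)
{
	struct sv_created *e;

	if (sv_is_tracked(st, dev, ino, name))
		return true;
	if (already_exists || st->count == SV_MAX_TRACKED ||
	    strlen(name) > SV_NAME_MAX)
		return false;
	e = &st->entries[st->count++];
	e->dir_dev = dev;
	e->dir_ino = ino;
	strcpy(e->name, name);
	return true;
}

/* Delete request: allow only for tracked files, then forget the entry. */
static bool sv_on_delete(struct sv_state *st, uint64_t dev, uint64_t ino,
			 const char *name)
{
	for (size_t i = 0; i < st->count; i++) {
		if (sv_match(&st->entries[i], dev, ino, name)) {
			/* Move the last entry into the freed slot. */
			st->entries[i] = st->entries[--st->count];
			return true;
		}
	}
	return false;
}
```

Without the filename in the event, none of these lookups can be keyed
on anything finer than the parent directory.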

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-11 22:03             ` Tingmao Wang
@ 2025-03-11 23:23               ` Song Liu
  0 siblings, 0 replies; 47+ messages in thread
From: Song Liu @ 2025-03-11 23:23 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Mickaël Salaün, Christian Brauner, Amir Goldstein,
	Günther Noack, Jan Kara, linux-security-module,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen, Kees Cook,
	Jeff Xu, Mikhail Ivanov, Francis Laniel, Matthieu Buffet,
	Paul Moore, Kentaro Takeda, Tetsuo Handa, John Johansen

On Tue, Mar 11, 2025 at 3:03 PM Tingmao Wang <m@maowtm.org> wrote:
[...]
> >
> > I think there is a fundamental difference between LSM hooks and fsnotify,
> > so putting fsnotify behind some LSM hooks might be weird. Specifically,
> > LSM hooks are always global. If an LSM attaches to a hook, say
> > security_file_open, it will see all the file open calls in the system. On the
> > other hand, each fsnotify rule only applies to a group, so that one fanotify
> > handler doesn't touch files watched by another fanotify handler. Given this
> > difference, I am not sure what fsnotify LSM hooks should look like.
> >
> > Does this make sense?
>
> To clarify, I wasn't suggesting that we put one hook _behind_ another
> ("behind" in the sense of one calling the other), just that the place
> that calls the new fsnotify_name_perm/fsnotify_rename_perm hook (in
> Amir's WIP branch) could also be made to call some new LSM hooks in
> addition to fsnotify (i.e. security_pathname_create/delete/rename).
>
> My understanding of the current code is that VFS calls security_... and
> fsnotify_... unconditionally, and the fsnotify_... functions figure out
> who needs to be notified.

Yes, VFS calls security_* and fsnotify_* unconditionally. In this sense,
fsnotify can be implemented as an LSM. But fsnotify also supports some
non-security use cases. So it will be weird to implement it as an LSM.

Thanks,
Song

[...]

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-11  0:42       ` Tingmao Wang
  2025-03-11 19:28         ` Mickaël Salaün
@ 2025-03-12 10:58         ` Jan Kara
  2025-03-12 12:26         ` Amir Goldstein
  2 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2025-03-12 10:58 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Amir Goldstein, Mickaël Salaün, Günther Noack,
	Jan Kara, linux-security-module, Matthew Bobrowski, linux-fsdevel,
	Tycho Andersen, Christian Brauner, Kees Cook, Jeff Xu,
	Mikhail Ivanov, Francis Laniel, Matthieu Buffet, Song Liu

On Tue 11-03-25 00:42:05, Tingmao Wang wrote:
> On 3/6/25 17:07, Amir Goldstein wrote:
> [...]
> > 
> > w.r.t sharing infrastructure with fanotify, I only looked briefly at
> > your patches
> > and I have only a vague familiarity with landlock, so I cannot yet form an
> > opinion whether this is a good idea, but I wanted to give you a few more
> > data points about fanotify that seem relevant.
> > 
> > 1. There is already some intersection of fanotify and audit lsm via the
> > fanotify_response_info_audit_rule extension for permission
> > events, so it's kind of a precedent of using fanotify to aid an lsm
> > 
> > 2. See this fan_pre_modify-wip branch [1] and specifically commit
> >    "fanotify: introduce directory entry pre-modify permission events"
> > I do have an intention to add create/delete/rename permission events.
> > Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
> > far from the security_path_ lsm hooks, but not exactly in the same place
> > because we want to fsnotify hooks to be before taking vfs locks, to allow
> > listener to write to filesystem from event context.
> > There are different semantics than just ALLOW/DENY that you need,
> > therefore, only if we move the security_path_ hooks outside the
> > vfs locks, our use cases could use the same hooks
> 
> Hi Amir,
> 
> (this is a slightly long message - feel free to respond at your convenience,
> thank you in advance!)
> 
> Thanks a lot for mentioning this branch, and for the explanation! I've had a
> look and realized that the changes you have there will be very useful for
> this patch, and in fact, I've already tried a worse attempt of this (not
> included in this patch series yet) to create some security_pathname_ hooks
> that takes the parent struct path + last name as char*, that will be called
> before locking the parent.  (We can't have an unprivileged supervisor cause
> a directory to be locked indefinitely, which will also block users outside
> of the landlock domain)

Well, but if you call the hook before locking the parent, isn't your hook
prone to TOCTOU races? I mean you call the hook, do your stuff in it, and
then before the parent is locked, the whole directory hierarchy can get
reorganized (from another process) without you knowing... So I'm not sure
which guarantees you can provide for such hooks.
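
A user-space analogy of that race, as a minimal sketch (the sketch_* helpers
are made up for illustration and are not part of any patch in this thread):
a separate check followed by a create leaves a window in which another
process can change the directory, whereas O_CREAT | O_EXCL lets the kernel
do the check and the creation as one step.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Racy: the decision is made on the state seen by lstat(), but the
 * open() acts on whatever state exists by the time it runs.
 * Returns a new fd, or a negative errno value. */
int sketch_create_check_then_act(const char *path)
{
	struct stat st;
	int fd;

	if (lstat(path, &st) == 0)
		return -EEXIST;
	if (errno != ENOENT)
		return -errno;
	/* Window: another process may create or replace "path" here. */
	fd = open(path, O_WRONLY | O_CREAT, 0600);
	return fd < 0 ? -errno : fd;
}

/* Atomic: existence check and creation happen as one step in the
 * kernel, so there is no window between them.
 * Returns a new fd, or a negative errno value. */
int sketch_create_atomic(const char *path)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

	return fd < 0 ? -errno : fd;
}
```

A hook that runs before the parent is locked is in the position of the
first helper: whatever it decided may no longer describe the directory by
the time the operation itself runs.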

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 5/9] Define user structure for events and responses.
  2025-03-11 23:18           ` Tingmao Wang
@ 2025-03-12 11:49             ` Mickaël Salaün
  2025-03-26  0:02               ` Tingmao Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-12 11:49 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Nicolas Bouchinet

On Tue, Mar 11, 2025 at 11:18:49PM +0000, Tingmao Wang wrote:
> On 3/11/25 19:28, Mickaël Salaün wrote:
> > On Mon, Mar 10, 2025 at 12:39:04AM +0000, Tingmao Wang wrote:
> > > On 3/6/25 03:05, Tingmao Wang wrote:
> > > [...]
> > > > This is also motivated by the potential UX I'm thinking of. For example,
> > > > if a newly installed application tries to create ~/.app-name, it will be
> > > > much more reassuring and convenient to the user if we can show something
> > > > like
> > > > 
> > > >       [program] wants to mkdir ~/.app-name. Allow this and future
> > > >       access to the new directory?
> > > > 
> > > > rather than just "[program] wants to mkdir under ~". (The "Allow this
> > > > and future access to the new directory" bit is made possible by the
> > > > supervisor knowing the name of the file/directory being created, and can
> > > > remember them / write them out to a persistent profile etc)
> > > 
> > > Another significant motivation, which I forgot to mention, is to auto-grant
> > > access to newly created files/sockets etc under things like /tmp,
> > > $XDG_RUNTIME_DIR, or ~/Downloads.
> > 
> > What do you mean?  What is not currently possible?
> 
> It is not currently possible with landlock to say "I will allow this
> application access to create and open new file/folders under this directory,
> change or delete the files it creates, but not touch any existing files".
> Landlock supervisor can make this possible (keeping track via its own state
> to allow future requests on the new file, or modifying the domain if we
> support that), but for that the supervisor has to know what file the
> application tried to create, hence motivating sending filename.

This capability would be at least inconsistent, and dangerous at worst,
because of policy inconsistencies over time.  A sandbox policy should be
seen over several invocations of the same sandbox.  See related deny
listing issues: https://github.com/landlock-lsm/linux/issues/28

Let's say a first instance of the sandbox can create files and access
them, but not other existing files in the same directory.  A second
instance of this sandbox would not be able to access the files the same
application created, so it will not be able to clean them if required.
That could be OK in the case of the ~/Downloads directory but I think it
would be weird for users to not be able to open their previous
downloaded files from the browser, whereas it was allowed before.

For such a use case, if we want to prevent new browser instances from
accessing old downloaded files, I'd recommend creating a new download
directory per browser/sandbox launch.

> 
> (I can see this kind of policy being applied to dirs like /tmp or my
> Downloads folder. $XDG_RUNTIME_DIR is also a sensible place for this
> behaviour due to the common pattern of creating a lock/pid file/socket
> there, although on second thought a GUI sandbox probably will want to create
> a private copy of that dir anyway for each app, to do dbus filtering etc)

An $XDG_RUNTIME_DIR per sandbox looks reasonable, but in practice we
also need secure proxies/portals to still share some user's resources.
This part should be implemented in user space because the kernel doesn't
know about this semantic (e.g. DBus requests).

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-11 20:58           ` Song Liu
  2025-03-11 22:03             ` Tingmao Wang
@ 2025-03-12 11:50             ` Mickaël Salaün
  1 sibling, 0 replies; 47+ messages in thread
From: Mickaël Salaün @ 2025-03-12 11:50 UTC (permalink / raw)
  To: Song Liu
  Cc: Tingmao Wang, Christian Brauner, Amir Goldstein,
	Günther Noack, Jan Kara, linux-security-module,
	Matthew Bobrowski, linux-fsdevel, Tycho Andersen, Kees Cook,
	Jeff Xu, Mikhail Ivanov, Francis Laniel, Matthieu Buffet,
	Paul Moore, Kentaro Takeda, Tetsuo Handa, John Johansen

On Tue, Mar 11, 2025 at 01:58:57PM -0700, Song Liu wrote:
> On Tue, Mar 11, 2025 at 12:28 PM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Tue, Mar 11, 2025 at 12:42:05AM +0000, Tingmao Wang wrote:
> > > On 3/6/25 17:07, Amir Goldstein wrote:
> > > [...]

> > > --
> > >
> > > For Mickaël,
> > >
> > > Would you be on board with changing Landlock to use the new hooks as
> > > mentioned above?  My thinking is that it shouldn't make any difference in
> > > terms of security - Landlock permissions for e.g. creating/deleting files
> > > are based on the parent, and in fact except for link and rename, the
> > > hook_path_ functions in Landlock don't even use the dentry argument.  If
> > > you're happy with the general direction of this, I can investigate further
> > > and test it out etc.  This change might also reduce the impact of Landlock
> > > on non-landlocked processes, if we avoid holding exclusive inode lock while
> > > evaluating rules / traversing paths...? (Just a thought, not measured)
> 
> I think the filter for process/thread is usually faster than the filter for
> file/path/subtree? Therefore, it is better for landlock to check the filter for
> process/thread first. Did I miss/misunderstand something?

The main reason is that only sandboxed processes should be impacted
by Landlock.  Similarly, only the security policies restricting a
process impact this process.  Using 16 layers would only impact the
process that sandboxed itself (and BTW the impact of the number of
layers would be negligible).  There are not really process filters, only
pointers set or not in tasks' credentials.
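
To illustrate the "pointer set or not" point, here is a stand-alone model
in plain C.  It is only a sketch: the sketch_* types and functions are
simplified stand-ins invented for this example, not the kernel's actual
structures or code.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_LAYERS 16

/* Hypothetical, simplified stand-in for a domain: one allowed-access
 * bitmask per layer. */
struct sketch_domain {
	unsigned int num_layers;
	unsigned long layer_allowed[SKETCH_MAX_LAYERS];
};

/* Hypothetical stand-in for a task's credentials. */
struct sketch_cred {
	const struct sketch_domain *domain; /* NULL when not sandboxed */
};

/* Returns 0 if "request" is allowed, -EACCES otherwise. */
int sketch_check_access(const struct sketch_cred *cred, unsigned long request)
{
	unsigned int i;

	/* Non-sandboxed task: a single pointer test, nothing else. */
	if (!cred->domain)
		return 0;

	/* Sandboxed task: every layer must allow the whole request. */
	for (i = 0; i < cred->domain->num_layers; i++) {
		if ((cred->domain->layer_allowed[i] & request) != request)
			return -EACCES;
	}
	return 0;
}
```

In this model the cost for a task outside any sandbox is one NULL check,
and the layer loop is only paid by the task that sandboxed itself.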

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests
  2025-03-11  0:42       ` Tingmao Wang
  2025-03-11 19:28         ` Mickaël Salaün
  2025-03-12 10:58         ` Jan Kara
@ 2025-03-12 12:26         ` Amir Goldstein
  2 siblings, 0 replies; 47+ messages in thread
From: Amir Goldstein @ 2025-03-12 12:26 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Mickaël Salaün, Günther Noack, Jan Kara,
	linux-security-module, Matthew Bobrowski, linux-fsdevel,
	Tycho Andersen, Christian Brauner, Kees Cook, Jeff Xu,
	Mikhail Ivanov, Francis Laniel, Matthieu Buffet, Song Liu

On Tue, Mar 11, 2025 at 1:42 AM Tingmao Wang <m@maowtm.org> wrote:
>
> On 3/6/25 17:07, Amir Goldstein wrote:
> [...]
> >
> > w.r.t sharing infrastructure with fanotify, I only looked briefly at
> > your patches
> > and I have only a vague familiarity with landlock, so I cannot yet form an
> > opinion whether this is a good idea, but I wanted to give you a few more
> > data points about fanotify that seem relevant.
> >
> > 1. There is already some intersection of fanotify and audit lsm via the
> > fanotify_response_info_audit_rule extension for permission
> > events, so it's kind of a precedent of using fanotify to aid an lsm
> >
> > 2. See this fan_pre_modify-wip branch [1] and specifically commit
> >    "fanotify: introduce directory entry pre-modify permission events"
> > I do have an intention to add create/delete/rename permission events.
> > Note that the new fsnotify hooks are added in to do_ vfs helpers, not very
> > far from the security_path_ lsm hooks, but not exactly in the same place
> > because we want to fsnotify hooks to be before taking vfs locks, to allow
> > listener to write to filesystem from event context.
> > There are different semantics than just ALLOW/DENY that you need,
> > therefore, only if we move the security_path_ hooks outside the
> > vfs locks, our use cases could use the same hooks
>
> Hi Amir,
>
> (this is a slightly long message - feel free to respond at your
> convenience, thank you in advance!)
>
> Thanks a lot for mentioning this branch, and for the explanation! I've
> had a look and realized that the changes you have there will be very
> useful for this patch, and in fact, I've already tried a worse attempt
> of this (not included in this patch series yet) to create some
> security_pathname_ hooks that takes the parent struct path + last name
> as char*, that will be called before locking the parent.  (We can't have
> an unprivileged supervisor cause a directory to be locked indefinitely,
> which will also block users outside of the landlock domain)
>
> I'm not sure if we can move security_path tho, because it takes the
> dentry of the child as an argument, and (I think at least for create /
> mknod / link) that dentry is only created after locking.  Hence the
> proposal for separate security_pathname_ hooks.  A search shows that
> currently AppArmor and TOMOYO (plus Landlock) uses the security_path_
> hooks that would need changing, if we move it (and we will have to
> understand if the move is ok to do for the other two LSMs...)
>
> However, I think it would still make a lot of sense to align with
> fsnotify here, as you have already made the changes that I would need to
> do anyway should I implement the proposed new hooks.  I think a sensible
> thing might be to have the extra LSM hooks be called alongside
> fsnotify_(re)name_perm - following the pattern of what currently happens
> with fsnotify_open_perm (i.e. security_file_open called first, then
> fsnotify_open_perm right after).
>
> What's your thought on this? Do you think it would be a good idea to
> have LSM hook equivalents of the fsnotify (re)name perm hooks / fanotify
> pre-modify events?
>

No clear answer but some data points:

The fanotify permission hooks (formerly fsnotify_perm) used to be inside
security_file_{open,permission} so when I started looking at dir modification
permission events I started to try using the security_path_ hooks, but as the
work progressed I found that fsnotify hooks have different
requirements (no locks).

Later, we found out that the existing fsnotify permission hooks have different
needs than the existing security hooks (for pre-content events), so after:

1cda52f1b4611 fsnotify, lsm: Decouple fsnotify from lsm

fanotify is not using any LSM hooks and not dependent on CONFIG_SECURITY.

Mentally, I do find it easy for fsnotify and security hook to be next to
each other, unless there is a reason to do it otherwise, because from vfs
POV they are mostly the same, but note that my branch implements the new
fsnotify hooks actually as scopes (for sb_write_srcu) and in some cases as
other people have mentioned on this thread, the security hooks need to be
inside the vfs locks, while the fsnotify hooks need to be outside of the
locks.

> Also, do you have a rough estimate of when you would upstream the
> fa/fsnotify changes? (asking just to get an idea of things, not trying
> to rush or anything :) I suspect this supervise patch would take a while
> anyway)
>

Besides my time to work on this, these patches are waiting for some other
things.

One is that I held off promoting those patches until the pre-content
patches got merged; that took longer than expected, and even now there may
need to be follow-ups in the next cycle.

Another thing is that these patches rely on the sb_write_srcu design concept
which is pretty intrusive to vfs, so I still need to sell this to vfs people.
I am going to take another shot at an elevator pitch at LSFMM in two weeks.
If we get past this design hurdle, the rest of the work will depend on
how much time I can spend on it.

I do *want* to make the patches in time for the 2025 LTS kernel, but it may
not be a realistic goal.

One thing that helped a lot with pushing pre-content events is that Meta
already had a production use case for it.

I do not know of anyone else that requested the pre-directory-modify
hooks (besides myself), so that may make the sale a bit harder.
If there is someone out there who does need the pre-directory-modify
hooks, now would be a good time to speak up.

> If you think the general idea is right, here are some further questions
> I have:
>
> I think going by this approach any error return from
> security_pathname_mknod (or in fact, fsnotify_name_perm) when called in
> the open O_CREAT code path would end up becoming a -EROFS.  Can we turn
> the bool got_write in open_last_lookups into an int to store any error
> from mnt_want_write_parent, and return it if lookup_open returns -EROFS?

IIUC you mean like this:

               err = mnt_want_write_parent(&nd->path, MAY_CREATE,
                                                  &res, &idx);
               if (err && err != -EROFS)
                       return err;
               got_write = !err;
               /*
                * do _not_ fail yet - ....

Yes, I think that is better, because the logic in the comment only
applies to EROFS.
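
The control flow of that snippet can be modelled stand-alone, as a sketch
only: sketch_open_create_step is a made-up name, and want_write_err stands
for whatever mnt_want_write_parent() (as named in the snippet above)
returned.

```c
#include <errno.h>
#include <stdbool.h>

/* Model of the proposed error handling: only -EROFS is deferred, any
 * other error (for example -EACCES from a permission hook) is returned
 * to the caller right away. */
int sketch_open_create_step(int want_write_err, bool *got_write)
{
	*got_write = false;
	if (want_write_err && want_write_err != -EROFS)
		return want_write_err;
	*got_write = !want_write_err;
	/* Do not fail yet on -EROFS: the later lookup decides. */
	return 0;
}
```

With this shape a denial keeps its own errno instead of being folded into
the -EROFS path.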

>   This is so that the user space still gets an -EACCES on create
> denials by landlock (and in fact, if fanotify denies a create maybe we
> want it to return the correct errno also?). Maybe there is a better way,
> this is just my first thought...
>
> I also noticed that you don't currently have fsnotify hook calls for
> link (although it does end up invoking the name_perm hook on the dest
> with MAY_CREATE).  I want to propose also changing do_linkat to (pass
> the right flags to filename_create_srcu -> mnt_want_write_parent to)
> call the security_pathname_link hook (instead of the LSM hook it would
> normally call for a creation event in this proposal) that is basically
> like security_path_link, except passing the destination as a dir/name
> pair, and without holding vfs lock (still passing in the dentry of the
> source itself), to enable landlock to handle link requests separately.
> Do you think this is alright?  (Maybe the code would be a bit convoluted
> if written verbatim from this logic, maybe there is a better way, but
> the general idea is hopefully right)

I am not sure I understand your question.
fsnotify does not need to know this is a LINK and not CREATE.
I do not know what the requirements of other LSMs are for those hooks,
so it is hard to say if it is ok to move those hooks, but my guess is not ok.

>
> btw, side question, I see that you added srcu read sections around the
> events - I'm not familiar with rcu/locking usage in vfs but is this for
> preventing e.g. changing the mount in some way (but still allowing
> access / changes to the directory)?
>

No, this is meant to accommodate fsnotify_wait_handle_events()
(see last patch): waiting for in-flight modifications to complete without
blocking new modifications. That's the concept that I need to sell to vfs
people.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 5/9] Define user structure for events and responses.
  2025-03-12 11:49             ` Mickaël Salaün
@ 2025-03-26  0:02               ` Tingmao Wang
  0 siblings, 0 replies; 47+ messages in thread
From: Tingmao Wang @ 2025-03-26  0:02 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel, Tycho Andersen,
	Nicolas Bouchinet

On 3/12/25 11:49, Mickaël Salaün wrote:
> On Tue, Mar 11, 2025 at 11:18:49PM +0000, Tingmao Wang wrote:
>> On 3/11/25 19:28, Mickaël Salaün wrote:
>>> On Mon, Mar 10, 2025 at 12:39:04AM +0000, Tingmao Wang wrote:
>>>> On 3/6/25 03:05, Tingmao Wang wrote:
>>>> [...]
>>>>> This is also motivated by the potential UX I'm thinking of. For example,
>>>>> if a newly installed application tries to create ~/.app-name, it will be
>>>>> much more reassuring and convenient to the user if we can show something
>>>>> like
>>>>>
>>>>>        [program] wants to mkdir ~/.app-name. Allow this and future
>>>>>        access to the new directory?
>>>>>
>>>>> rather than just "[program] wants to mkdir under ~". (The "Allow this
>>>>> and future access to the new directory" bit is made possible by the
>>>>> supervisor knowing the name of the file/directory being created, and can
>>>>> remember them / write them out to a persistent profile etc)
>>>>
>>>> Another significant motivation, which I forgot to mention, is to auto-grant
>>>> access to newly created files/sockets etc under things like /tmp,
>>>> $XDG_RUNTIME_DIR, or ~/Downloads.
>>>
>>> What do you mean?  What is not currently possible?
>>
>> It is not currently possible with landlock to say "I will allow this
>> application access to create and open new file/folders under this directory,
>> change or delete the files it creates, but not touch any existing files".
>> Landlock supervisor can make this possible (keeping track via its own state
>> to allow future requests on the new file, or modifying the domain if we
>> support that), but for that the supervisor has to know what file the
>> application tried to create, hence motivating sending filename.
> 
> This capability would be at least inconsistent, and dangerous at worst,
> because of policy inconsistencies over time.  A sandbox policy should be
> seen over several invocations of the same sandbox.  See related deny
> listing issues: https://github.com/landlock-lsm/linux/issues/28
> 
> Let's say a first instance of the sandbox can create files and access
> them, but not other existing files in the same directory.  A second
> instance of this sandbox would not be able to access the files the same
> application created, so it will not be able to clean them if required.
> That could be OK in the case of the ~/Downloads directory but I think it
> would be weird for users to not be able to open their previous
> downloaded files from the browser, whereas it was allowed before.
> 
> For such a use case, if we want to prevent new browser instances from
> accessing old downloaded files, I'd recommend creating a new download
> directory per browser/sandbox launch.
> 

I had some more thoughts on this - In terms of inconsistency / security 
implications of such a supervisor behaviour, I think I can identify two 
aspects:

First is policy inconsistency over different instances / restarts (like 
the example you mentioned about not being able to open previously 
downloaded files).  I think in this case, this is fine and would not be 
dangerous, because it will only result in extra permission requests 
(potentially the user having to allow the access again, or maybe the 
sandboxer can remember it from last time and auto-allow it internally). 
(whereas an inconsistent deny rule is more problematic because it opens 
up the access on the next restart / for other instances, if done wrong)

The second problem is that if the supervisor wants to automatically 
permit further access to the newly created files, it can only do so by 
remembering and comparing file names, since the new inode doesn't exist 
yet*, and so even with mutable domains there is nothing to attach new 
rules to.  This means that there is a potential for files/dirs to be 
moved/created/linked behind its back onto the destination by someone 
outside the sandbox, and this may result in the supervisor 
unintentionally allowing access to files it doesn't want to? (like, if 
it approves the request based solely on the belief that the file is new)

*: Assuming we don't want to lock the parent dir forever until the 
supervisor replies.

While this does seem like a problem, I'm not sure how practical it would 
be to exploit, since any further action by the sandboxed app itself on 
the destination can/would also be blocked by landlock, and in some sense 
we're already dead if the sandboxed app can somehow convince something 
outside of the sandbox to create arbitrary links or move arbitrary files 
to a destination path that would appear to belong legitimately to the 
malicious app.  But this does raise more questions than I initially
thought, and shows how an overly creative supervisor may shoot itself in
the foot -- when filenames are involved in permission decisions the
semantics start becoming a bit fuzzy, and are different from current
landlock, which is entirely inode-based.
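
To make the name-versus-inode distinction concrete, here is a user-space
sketch.  It assumes the supervisor holds an open fd for the file it
approved; the sketch_* names are invented for this example and are not
part of the patch series.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Identity of a file independent of its name. */
struct sketch_file_id {
	dev_t dev;
	ino_t ino;
};

/* Record the identity of the file behind an open fd.
 * Returns 0 on success or a negative errno value. */
int sketch_id_from_fd(int fd, struct sketch_file_id *id)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -errno;
	id->dev = st.st_dev;
	id->ino = st.st_ino;
	return 0;
}

/* Returns 1 if "path" currently names the recorded file, 0 if it names
 * something else, or a negative errno value on failure. */
int sketch_path_matches(const char *path, const struct sketch_file_id *id)
{
	struct stat st;

	if (lstat(path, &st) < 0)
		return -errno;
	return st.st_dev == id->dev && st.st_ino == id->ino;
}
```

The comparison is only meaningful while the fd stays open, since that pins
the inode and stops its number from being reused.  It is also still a
check made at one point in time: it detects that a name now points
elsewhere, but it does not by itself close the race.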

With that said, I would still really like to make the mentioned UX 
possible tho - allowing an app to create a file/dir and any further 
access to it as well _feels very intuitive_, and is especially 
convenient for cases where the first launch of an app is sandboxed.  But 
I do recognize that this capability is less important for self-sandbox 
scenarios (since the supervisor can pre-create all the scaffolding 
directories it knows the app would need).

I have some thoughts, none of which are perfect, and not doing any of 
them is also an option (i.e. the supervisor just has to decide whether
to give permission to create files of arbitrary names or not, and can't 
find out about any new files/dirs created (unless with some other Linux 
mechanism)):

1. Maybe there can be a mechanism for the supervisor to be invoked 
post-creation (passing in a fd for the new file directly), then it can 
prompt the user and either allow and optionally add the new inode to the 
mutable domain, or it can "undo" the operation by deleting the new 
file/dir then reject the "request".  I recognize that this is a bit 
weird and is also only applicable to supervise mode, but it might be 
acceptable since merely creating an empty file/dir is relatively 
harmless (ignoring symlinks and device nodes for the moment).

2. The supervisor can create the file/dir/device-node/symlink on behalf 
of the sandboxed app, if we can pass all the relevant arguments to it in 
the request.  Then there needs to be a mechanism for it to tell the 
kernel to return a custom error code to the invoking program.
(seccomp-unotify deja vu)

3. We find a way to implement "allow once" which will only allow this 
particular create request, with this name.  At least this way the 
supervisor can implement the above mentioned feature, with the caveat 
mentioned above.

(For others' reference, I had a discussion with Mickaël and it looks
like we will want to have mutable domains and base the implementation of 
landlock supervise off that, returning a -ERESTARTNOINTR from the hook 
when access is allowed.  I will write up the discussion tomorrow / later)

>>
>> (I can see this kind of policy being applied to dirs like /tmp or my
>> Downloads folder. $XDG_RUNTIME_DIR is also a sensible place for this
>> behaviour due to the common pattern of creating a lock/pid file/socket
>> there, although on second thought a GUI sandbox probably will want to create
>> a private copy of that dir anyway for each app, to do dbus filtering etc)
> 
> An $XDG_RUNTIME_DIR per sandbox looks reasonable, but in practice we
> also need secure proxies/portals to still share some user's resources.
> This part should be implemented in user space because the kernel doesn't
> know about this semantic (e.g. DBus requests).

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH 4/9] User-space API for creating a supervisor-fd
  2025-03-11 19:28       ` Mickaël Salaün
@ 2025-03-26  0:06         ` Tingmao Wang
  2025-04-11 10:55           ` Mickaël Salaün
  0 siblings, 1 reply; 47+ messages in thread
From: Tingmao Wang @ 2025-03-26  0:06 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, linux-fsdevel, Jann Horn, Andy Lutomirski

On 3/11/25 19:28, Mickaël Salaün wrote:
> On Mon, Mar 10, 2025 at 12:41:28AM +0000, Tingmao Wang wrote:
>> On 3/5/25 16:09, Mickaël Salaün wrote:
>>> On Tue, Mar 04, 2025 at 01:13:00AM +0000, Tingmao Wang wrote:
>>>> We allow the user to pass in an additional flag to landlock_create_ruleset
>>>> which will make the ruleset operate in "supervise" mode, with a supervisor
>>>> attached. We create additional space in the landlock_ruleset_attr
>>>> structure to pass the newly created supervisor fd back to user-space.
>>>>
>>>> The intention, while not implemented yet, is that the user-space will read
>>>> events from this fd and write responses back to it.
>>>>
>>>> Note: need to investigate if fd clone on fork() is handled correctly, but
>>>> should be fine if it shares the struct file. We might also want to let the
>>>> user customize the flags on this fd, so that they can request no
>>>> O_CLOEXEC.
>>>>
>>>> NOTE: despite this patch having a new uapi, I'm still very open to e.g.
>>>> re-using fanotify stuff instead (if that makes sense in the end). This is
>>>> just a PoC.
>>>
>>> The main security risk of this feature is for this FD to leak and be
>>> used by a sandboxed process to bypass all its restrictions.  This should
>>> be highlighted in the UAPI documentation.

In particular, if for some reason the supervisor does a fork without 
exec, it must close this fd in the "about-to-be-untrusted" child.

(I wonder if it would be worth enforcing that the child calling 
landlock_restrict_self must not have any open supervisor fd that can 
supervise its own domain (returning an error if it does), but that can 
be difficult to implement, so never mind)
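
A sketch of what this means for a supervisor in user space (the sketch_*
helpers are made up for illustration; supervisor_fd is just any fd in this
model).  Close-on-exec only covers the exec case, so a child that forks
without exec has to close the fd itself:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Mark an fd close-on-exec.  This protects across execve() only. */
int sketch_set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Fork a child that drops the supervisor fd before doing anything else.
 * Returns the child's exit status: 0 if the fd was really gone in the
 * child, non-zero otherwise, -1 on fork/wait failure. */
int sketch_fork_drops_fd(int supervisor_fd)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		close(supervisor_fd);
		/* A real child would restrict itself and run untrusted
		 * code here.  This sketch only verifies the fd is gone. */
		_exit(fcntl(supervisor_fd, F_GETFD) == -1 &&
		      errno == EBADF ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent keeps its own copy of the fd, since close() in the child only
affects the child's descriptor table.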

>>>
>>>>
>>>> Signed-off-by: Tingmao Wang <m@maowtm.org>
>>>> ---
>>>>    include/uapi/linux/landlock.h |  10 ++++
>>>>    security/landlock/syscalls.c  | 102 +++++++++++++++++++++++++++++-----
>>>>    2 files changed, 98 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
>>>> index e1d2c27533b4..7bc1eb4859fb 100644
>>>> --- a/include/uapi/linux/landlock.h
>>>> +++ b/include/uapi/linux/landlock.h
>>>> @@ -50,6 +50,15 @@ struct landlock_ruleset_attr {
>>>>    	 * resources (e.g. IPCs).
>>>>    	 */
>>>>    	__u64 scoped;
>>>> +	/**
>>>> +	 * @supervisor_fd: Placeholder to store the supervisor file
>>>> +	 * descriptor when %LANDLOCK_CREATE_RULESET_SUPERVISE is set.
>>>> +	 */
>>>> +	__s32 supervisor_fd;
>>>
>>> This interface would require the ruleset_attr becoming updatable by the
>>> kernel, which might be OK in theory but requires current syscall wrapper
>>> signature update, see sandboxer.c change.  It also creates a FD which
>>> might not be useful (e.g. if an error occurs before the actual
>>> enforcement).
>>>
>>> I see a few alternatives.  We could just use/extend the ruleset FD
>>> instead of creating a new one, but because leaking current rulesets is
>>> not currently a security risk, we should be careful to not change that.
>>>
>>> Another approach, similar to seccomp unotify, is to get a
>>> "[landlock-domain]" FD returned by the landlock_restrict_self(2) when a
>>> new LANDLOCK_RESTRICT_SELF_DOMAIN_FD flag is set.  This FD would be a
>>> reference to the newly created domain, which is more specific than the
>>> ruleset used to created this domain (and that can be used to create
>>> other domains).  This domain FD could be used for introspection (i.e.
>>> to get read-only properties such as domain ID), but being able to
>>> directly supervise the referenced domain only with this FD would be a
>>> risk that we should limit.
>>>
>>> What we can do is to implement an IOCTL command for such domain FD that
>>> would return a supervisor FD (if the LANDLOCK_RESTRICT_SELF_SUPERVISED
>>> flag was also set).  The key point is to check (one time) that the
>>> process calling this IOCTL is not restricted by the related domain (see
>>> the scope helpers).
>>
>> Is LANDLOCK_RESTRICT_SELF_DOMAIN_FD part of your (upcoming?) introspection
>> patch? (thinking about when will someone pass that only and not
>> LANDLOCK_RESTRICT_SELF_SUPERVISED, or vice versa)
> 
> I don't plan to work on such LANDLOCK_RESTRICT_SELF_DOMAIN_FD flag for
> now, but the introspection feature(s) would help for this supervisor
> feature.
> 
>>
>> By the way, is it alright to conceptually relate the supervisor to a domain?
>> It really would be a layer inside a domain - the domain could have earlier
>> or later layers which can deny access without supervision, or the supervisor
>> for earlier layers can deny access first. Therefore having supervisor fd
>> coming out of the ruleset felt sensible to me at first.
> 
> Good question.  I've been using the name "domain" to refer to the set of
> restrictions enforced on a set of processes, but these restrictions are
> composed of inherited ones plus the latest layer.  In this case, a
> domain FD should refer to all the restrictions, but the supervisor FD
> should indeed only refer to the latest layer of a domain (created by
> landlock_restrict_self).
> 
>>
>> Also, isn't "check that process calling this IOCTL is not restricted by the
>> related domain" and the fact that the IOCTL is on the domain fd, which is a
>> return value of landlock_restrict_self, kind of contradictory?  I mean it is
>> a sensible check, but that kind of highlights that this interface is
>> slightly awkward - basically all callers are forced to have a setup where
>> the child sends the domain fd back to the parent.
> 
> I agree that it's confusing.  I'd like to avoid the ruleset gaining any
> control over domains after they are created.
> 
> Another approach would be to create a supervisor FD with the
> landlock_create_ruleset() syscall, and pass this FD to the ruleset,
> potentially with landlock_add_rule() calls to only request this
> supervisor when matching specific rules (that could potentially be
> catch-all rules)?

Maybe passing in a fd per landlock_add_rule call, and thus potentially
allowing different supervisor fds tied to different rules in the same
ruleset, is a bit overkill (as now each rule needs to store a supervisor
pointer?) and I don't really see the use of it.  I think it would be
better to just pass it once in the landlock_ruleset_attr, which gets
around the problem of the syscall signature taking a const ruleset_attr.
(I'm also open to the ioctl on domain fd idea, but I'm slightly wary of
making this more complicated than necessary for user space, as it
now has to set up a socket (?) and pass a fd with SCM_RIGHTS (?))
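
For reference, the user-space cost mentioned above (a socket plus
SCM_RIGHTS) looks roughly like the sketch below.  It only uses the
standard sendmsg(2)/recvmsg(2) ancillary-data interface; the helper names
send_fd() and recv_fd() are made up for this illustration, and nothing
here is Landlock-specific.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one file descriptor over a Unix socket as SCM_RIGHTS data. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces correct alignment of buf */
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	/* One byte of regular data must accompany the ancillary data. */
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new FD or -1 on error. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}
```

With the domain-fd design, the sandboxed child would call something like
send_fd() on one end of a socketpair(2) and the parent would call
recv_fd() on the other end before issuing the ioctl.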

The other aspect of this is whether we want to have the supervisor mark 
specific rules as supervised, rather than having all denied access (from 
this layer) result in a supervisor invocation.  I also don't think this 
is necessary, as denials are supposed to be "abnormal" in some sense, 
and I would imagine most supervisors would want to find out about these 
(at least to print/show a warning of some sort, if it knows that the 
requested access is bad).  If a supervisor really wants to have the 
kernel just "silently" (from its perspective, but maybe there would be 
audit logs) deny any access outside of some known rules, it can also 
create a nested, unsupervised landlock domain that has the right effect. 
Avoiding having some sort of tri-state rules would simplify 
implementation, I imagine.

> 
> Overall, my main concern about this patch series is that the supervisor
> could get a lot of requests, which will make the sandbox unusable
> because it is always blocked by some thread/process.  This latest approach
> and the ability to update the domain somehow could make it workable.
> 
>>
>>>
>>> Relying on IOCTL commands (for all these FD types) instead of read/write
>>> operations should also limit the risk of these FDs being misused through
>>> a confused deputy attack (because such IOCTL command would convey an
>>> explicit intent):
>>> https://docs.kernel.org/security/credentials.html#open-file-credentials
>>> https://lore.kernel.org/all/CAG48ez0HW-nScxn4G5p8UHtYy=T435ZkF3Tb1ARTyyijt_cNEg@mail.gmail.com/
>>> We should get inspiration from seccomp unotify for this too:
>>> https://lore.kernel.org/all/20181209182414.30862-1-tycho@tycho.ws/
>>
>> I think in the seccomp unotify case the problem arises from what the setuid
>> binary thinks is just normal data getting interpreted by the kernel as a fd,
>> and thus having different effect if the attacker writes it vs. if the suid
>> app writes it.  In our case I *think* we should be alright, but maybe we
>> should go with ioctl anyway...
> 
> I don't see why Jann's attack scenario could not work for this Landlock
> supervisor too.  The main point is that the read/write interfaces are
> used by a lot of different FDs, and we may not need them.
> 
>> However, how does using netlink messages (a
>> suggestion from a different thread) affect this (if we do end up using it)?
>> Would we have to do netlink msgs via IOCTL?
> 
> Because all requests should be synchronous, one IOCTL could be used to
> both acknowledge a previous event (or just start) and read the next one.
> 
> I was thinking about an IOCTL with these arguments:
> 1. supervisor FD
> 2. (extensible) IOCTL command (see PIDFD_GET_INFO for instance)
> 3. pointer to a fixed-size control structure
> 
> The fixed-size control structure could contain:
> - handled access rights, used to only get events related to specific
>    access rights.
> - flags, to specify which kind of FD we would like to get (e.g. only
>    directory FD, pidfd...)
> - fd[6]: an array of received file descriptors.
> - pointer to a variable-size data buffer that would contain all the
>    records (e.g. source dir FD, source file name, destination dir FD,
>    destination file name) for one event, potentially formatted with NLA.
> - the size of this buffer
> 
> I'm not sure about the content of this buffer and the NLA format, and
> the related API might not be usable without netlink sockets though.
> Taking inspiration from the fanotify message format is another option.
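
To make the layout discussed above concrete, here is a rough sketch of
what such a fixed-size control structure could look like.  Everything in
it is hypothetical: struct supervise_ctl, SUPERVISE_IOC_NEXT_EVENT, the
'L' ioctl magic and the field names are invented for illustration and are
not part of the patch or of any kernel UAPI.  Fields are ordered so that
no implicit padding appears, which avoids both a pad field and packing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

#define SUPERVISE_MAX_FDS 6

/* Hypothetical fixed-size control structure; not an existing UAPI. */
struct supervise_ctl {
	uint64_t handled_access;       /* in: only report these access rights */
	uint32_t flags;                /* in: which kinds of FDs to receive */
	uint32_t data_size;            /* in: size of the buffer at data_ptr */
	uint64_t data_ptr;             /* in: user pointer to the record buffer */
	int32_t fd[SUPERVISE_MAX_FDS]; /* out: received FDs, -1 when unused */
};

/* The field order leaves no implicit padding in the structure. */
_Static_assert(sizeof(struct supervise_ctl) == 48, "unexpected padding");
_Static_assert(offsetof(struct supervise_ctl, fd) == 24, "unexpected layout");

/* Hypothetical command number; _IOWR() encodes the struct size in it. */
#define SUPERVISE_IOC_NEXT_EVENT _IOWR('L', 0x01, struct supervise_ctl)

/* Prepare a request for the next event matching handled_access. */
static void supervise_ctl_init(struct supervise_ctl *ctl,
			       uint64_t handled_access, void *buf,
			       uint32_t buf_size)
{
	size_t i;

	memset(ctl, 0, sizeof(*ctl));
	ctl->handled_access = handled_access;
	ctl->data_ptr = (uint64_t)(uintptr_t)buf;
	ctl->data_size = buf_size;
	for (i = 0; i < SUPERVISE_MAX_FDS; i++)
		ctl->fd[i] = -1;
}
```

A supervisor loop would then presumably call ioctl(2) on the supervisor FD
with this command and a pointer to the structure, both to acknowledge the
previous event and to fetch the next one.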
> 
>>
>>
>>>> +	/**
>>>> +	 * @pad: Unused, must be zero.
>>>> +	 */
>>>> +	__u32 pad;
>>>
>>> In this case we should pack the struct instead.
>>>
>>>>    };
>>>>    /*
>>>> @@ -60,6 +69,7 @@ struct landlock_ruleset_attr {
>>>>     */
>>>>    /* clang-format off */
>>>>    #define LANDLOCK_CREATE_RULESET_VERSION			(1U << 0)
>>>> +#define LANDLOCK_CREATE_RULESET_SUPERVISE		(1U << 1)
>>>>    /* clang-format on */
>>>>    /**
>>>
>>> [...]
>>
>>



* Re: [RFC PATCH 4/9] User-space API for creating a supervisor-fd
  2025-03-26  0:06         ` Tingmao Wang
@ 2025-04-11 10:55           ` Mickaël Salaün
  0 siblings, 0 replies; 47+ messages in thread
From: Mickaël Salaün @ 2025-04-11 10:55 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Günther Noack, Jan Kara, linux-security-module,
	Amir Goldstein, linux-fsdevel, Jann Horn, Andy Lutomirski,
	Nicolas Bouchinet

On Wed, Mar 26, 2025 at 12:06:11AM +0000, Tingmao Wang wrote:
> On 3/11/25 19:28, Mickaël Salaün wrote:
> > On Mon, Mar 10, 2025 at 12:41:28AM +0000, Tingmao Wang wrote:
> > > On 3/5/25 16:09, Mickaël Salaün wrote:
> > > > On Tue, Mar 04, 2025 at 01:13:00AM +0000, Tingmao Wang wrote:
> > > > > We allow the user to pass in an additional flag to landlock_create_ruleset
> > > > > which will make the ruleset operate in "supervise" mode, with a supervisor
> > > > > attached. We create additional space in the landlock_ruleset_attr
> > > > > structure to pass the newly created supervisor fd back to user-space.
> > > > > 
> > > > > The intention, while not implemented yet, is that the user-space will read
> > > > > events from this fd and write responses back to it.
> > > > > 
> > > > > Note: need to investigate if fd clone on fork() is handled correctly, but
> > > > > should be fine if it shares the struct file. We might also want to let the
> > > > > user customize the flags on this fd, so that they can request no
> > > > > O_CLOEXEC.
> > > > > 
> > > > > NOTE: despite this patch having a new uapi, I'm still very open to e.g.
> > > > > re-using fanotify stuff instead (if that makes sense in the end). This is
> > > > > just a PoC.
> > > > 
> > > > The main security risk of this feature is for this FD to leak and be
> > > > used by a sandboxed process to bypass all its restrictions.  This should
> > > > be highlighted in the UAPI documentation.
> 
> In particular, if for some reason the supervisor does a fork without exec,
> it must close this fd in the "about-to-be-untrusted" child.

Yes...
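
A minimal sketch of that fork-without-exec pattern, assuming the
supervisor holds the FD in an ordinary int: the helper name
fork_without_supervisor_fd() is made up, and any open descriptor can stand
in for the supervisor FD.  Note that O_CLOEXEC only covers the exec case,
so the explicit close(2) in the child is what matters here.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that drops supervisor_fd before doing anything untrusted.
 * Returns 1 if the child saw the FD as closed, 0 if it was still open,
 * and -1 on error.  The parent keeps its own copy of the FD.
 */
static int fork_without_supervisor_fd(int supervisor_fd)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: close the FD first, before any sandboxing call. */
		close(supervisor_fd);

		/* F_GETFD fails with EBADF once the descriptor is gone. */
		if (fcntl(supervisor_fd, F_GETFD) == -1 && errno == EBADF)
			_exit(0);
		_exit(1);
	}

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status) == 0;
}
```

In a real supervisor the child would go on to call
landlock_restrict_self(2) and run the untrusted code instead of exiting.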

> 
> (I wonder if it would be worth enforcing that the child calling
> landlock_restrict_self must not have any open supervisor fd that can
> supervise its own domain (returning an error if it does), but that can be
> difficult to implement so nevermind)

That would mean that a call can fail according to the caller's context
(e.g. FDs), which is not good for reproducibility (i.e. not idempotent).

Being able to tie a supervisor FD to a set of rulesets and then to a set
of domains is interesting too.  We might want to also add a "cookie"
value when creating a ruleset for the supervisor to identify which
ruleset it received a request from.

I was also thinking about pidfd, but they do not refer to a domain but
to a process (which may be sandboxed several times).  I found a better
idea, see below.

> 
> > > > 
> > > > > 
> > > > > Signed-off-by: Tingmao Wang <m@maowtm.org>
> > > > > ---
> > > > >    include/uapi/linux/landlock.h |  10 ++++
> > > > >    security/landlock/syscalls.c  | 102 +++++++++++++++++++++++++++++-----
> > > > >    2 files changed, 98 insertions(+), 14 deletions(-)
> > > > > 
> > > > > diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> > > > > index e1d2c27533b4..7bc1eb4859fb 100644
> > > > > --- a/include/uapi/linux/landlock.h
> > > > > +++ b/include/uapi/linux/landlock.h
> > > > > @@ -50,6 +50,15 @@ struct landlock_ruleset_attr {
> > > > >    	 * resources (e.g. IPCs).
> > > > >    	 */
> > > > >    	__u64 scoped;
> > > > > +	/**
> > > > > +	 * @supervisor_fd: Placeholder to store the supervisor file
> > > > > +	 * descriptor when %LANDLOCK_CREATE_RULESET_SUPERVISE is set.
> > > > > +	 */
> > > > > +	__s32 supervisor_fd;
> > > > 
> > > > This interface would require the ruleset_attr becoming updatable by the
> > > > kernel, which might be OK in theory but requires current syscall wrapper
> > > > signature update, see sandboxer.c change.  It also creates a FD which
> > > > might not be useful (e.g. if an error occurs before the actual
> > > > enforcement).
> > > > 
> > > > I see a few alternatives.  We could just use/extend the ruleset FD
> > > > instead of creating a new one, but because leaking current rulesets is
> > > > not currently a security risk, we should be careful to not change that.
> > > > 
> > > > Another approach, similar to seccomp unotify, is to get a
> > > > "[landlock-domain]" FD returned by the landlock_restrict_self(2) when a
> > > > new LANDLOCK_RESTRICT_SELF_DOMAIN_FD flag is set.  This FD would be a
> > > > reference to the newly created domain, which is more specific than the
> > > > > ruleset used to create this domain (and that can be used to create
> > > > other domains).  This domain FD could be used for introspection (i.e.
> > > > to get read-only properties such as domain ID), but being able to
> > > > directly supervise the referenced domain only with this FD would be a
> > > > risk that we should limit.
> > > > 
> > > > What we can do is to implement an IOCTL command for such domain FD that
> > > > would return a supervisor FD (if the LANDLOCK_RESTRICT_SELF_SUPERVISED
> > > > flag was also set).  The key point is to check (one time) that the
> > > > process calling this IOCTL is not restricted by the related domain (see
> > > > the scope helpers).
> > > 
> > > Is LANDLOCK_RESTRICT_SELF_DOMAIN_FD part of your (upcoming?) introspection
> > > patch? (thinking about when will someone pass that only and not
> > > LANDLOCK_RESTRICT_SELF_SUPERVISED, or vice versa)
> > 
> > I don't plan to work on such LANDLOCK_RESTRICT_SELF_DOMAIN_FD flag for
> > now, but the introspection feature(s) would help for this supervisor
> > feature.
> > 
> > > 
> > > By the way, is it alright to conceptually relate the supervisor to a domain?
> > > It really would be a layer inside a domain - the domain could have earlier
> > > or later layers which can deny access without supervision, or the supervisor
> > > for earlier layers can deny access first. Therefore having supervisor fd
> > > coming out of the ruleset felt sensible to me at first.
> > 
> > Good question.  I've been using the name "domain" to refer to the set of
> > restrictions enforced on a set of processes, but these restrictions are
> > composed of inherited ones plus the latest layer.  In this case, a
> > domain FD should refer to all the restrictions, but the supervisor FD
> > should indeed only refer to the latest layer of a domain (created by
> > landlock_restrict_self).
> > 
> > > 
> > > Also, isn't "check that process calling this IOCTL is not restricted by the
> > > related domain" and the fact that the IOCTL is on the domain fd, which is a
> > > return value of landlock_restrict_self, kind of contradictory?  I mean it is
> > > a sensible check, but that kind of highlights that this interface is
> > > slightly awkward - basically all callers are forced to have a setup where
> > > the child sends the domain fd back to the parent.
> > 
> > I agree that it's confusing.  I'd like to avoid the ruleset gaining any
> > control over domains after they are created.
> > 
> > Another approach would be to create a supervisor FD with the
> > landlock_create_ruleset() syscall, and pass this FD to the ruleset,
> > potentially with landlock_add_rule() calls to only request this
> > supervisor when matching specific rules (that could potentially be
> > catch-all rules)?
> 
> Maybe passing in a fd per landlock_add_rule call, and thus potentially
> allowing different supervisor fds tied to different rules in the same
> ruleset, is a bit overkill (as now each rule needs to store a supervisor
> pointer?) and I don't really see the use of it.

I thought about this approach too, but being able to update the domain
with new rules would be more useful and powerful.

> I think it would be better
> to just pass it once in the landlock_ruleset_attr, which gets around the
> problem of the syscall signature taking a const ruleset_attr. (I'm also
> open to the ioctl on domain fd idea, but I'm slightly wary of making this
> more complicated than necessary for user space, as it now has to set up a
> socket (?) and pass a fd with SCM_RIGHTS (?))

OK, here is another proposal: supervisor rulesets and supervisee FDs.
The idea is to add a new flag to landlock_create_ruleset(2) to create a
ruleset marked as "supervisor".  This ruleset could not be passed to
landlock_restrict_self(2), but a dedicated IOCTL would create a
supervisee file descriptor.  This supervisee FD could be passed in a
landlock_ruleset_attr to create a supervised ruleset.

This approach is interesting because it makes explicit the access
rights that are handled by the supervisor, which enables us to only
supervise a set of actions and to update the supervisor ruleset with
landlock_add_rule(2).

Another interesting property is that because we have at least two file
descriptors for a supervisor, it's easy to create a supervisor ruleset
in process A and then only pass a supervisee FD to process B.  A leaked
supervisee FD could not give more privileges, and it is unlikely that a
supervisor FD gets passed to process B by mistake because it would not be
usable as a supervisee, so the error should be detected early in the
development cycle.

> 
> The other aspect of this is whether we want to have the supervisor mark
> specific rules as supervised, rather than having all denied access (from
> this layer) result in a supervisor invocation.  I also don't think this is
> necessary, as denials are supposed to be "abnormal" in some sense, and I
> would imagine most supervisors would want to find out about these (at least
> to print/show a warning of some sort, if it knows that the requested access
> is bad).  If a supervisor really wants to have the kernel just "silently"
> (from its perspective, but maybe there would be audit logs) deny any access
> outside of some known rules, it can also create a nested, unsupervised
> landlock domain that has the right effect. Avoiding having some sort of
> tri-state rules would simplify implementation, I imagine.

Because this supervisor use case is mainly about sandboxing programs
which may not be aware of such restrictions, they could legitimately
request the same denied actions many times.  To avoid overloading the
supervisor, we need a way to filter such requests.  But being able to
initially get these requests would be useful too, which is why being able
to dynamically update the supervisor ruleset is interesting.
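
As a purely user-space illustration of that filtering idea (not kernel
code, and not the proposed API), the model below forwards a request to the
supervisor only the first time it is seen and then records the answer as a
rule.  All names (struct supervisor_model, ask_supervisor(),
check_access()) and the "/tmp/" policy are assumptions made up for the
example.  Real Landlock rules only grant access; the model also caches
denials to keep it short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RULES 16

struct rule {
	char path[64];
	uint64_t access;
	bool allow;
};

struct supervisor_model {
	struct rule rules[MAX_RULES];
	size_t nr_rules;
	unsigned int prompts; /* requests that actually reached the supervisor */
};

/* Stand-in for the interactive decision: allow only paths under /tmp/. */
static bool ask_supervisor(struct supervisor_model *s, const char *path,
			   uint64_t access)
{
	(void)access;
	s->prompts++;
	return strncmp(path, "/tmp/", 5) == 0;
}

/* Check the recorded rules first; ask the supervisor only on a miss. */
static bool check_access(struct supervisor_model *s, const char *path,
			 uint64_t access)
{
	size_t i;
	bool allow;

	for (i = 0; i < s->nr_rules; i++) {
		const struct rule *r = &s->rules[i];

		if (strcmp(r->path, path) == 0 &&
		    (r->access & access) == access)
			return r->allow;
	}

	allow = ask_supervisor(s, path, access);

	/* Record the answer so identical requests are filtered next time. */
	if (s->nr_rules < MAX_RULES &&
	    strlen(path) < sizeof(s->rules[0].path)) {
		struct rule *r = &s->rules[s->nr_rules++];

		strcpy(r->path, path);
		r->access = access;
		r->allow = allow;
	}
	return allow;
}
```

The point of the model is only that the prompt counter stops growing once
a decision has been recorded, which is the behaviour a dynamically
updatable supervisor ruleset would give.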

> 
> > 
> > Overall, my main concern about this patch series is that the supervisor
> > could get a lot of requests, which will make the sandbox unusable
> > because it is always blocked by some thread/process.  This latest
> > approach and the ability to update the domain somehow could make it
> > workable.
> > 
> > > 
> > > > 
> > > > Relying on IOCTL commands (for all these FD types) instead of read/write
> > > > operations should also limit the risk of these FDs being misused through
> > > > a confused deputy attack (because such IOCTL command would convey an
> > > > explicit intent):
> > > > https://docs.kernel.org/security/credentials.html#open-file-credentials
> > > > https://lore.kernel.org/all/CAG48ez0HW-nScxn4G5p8UHtYy=T435ZkF3Tb1ARTyyijt_cNEg@mail.gmail.com/
> > > > We should get inspiration from seccomp unotify for this too:
> > > > https://lore.kernel.org/all/20181209182414.30862-1-tycho@tycho.ws/
> > > 
> > > I think in the seccomp unotify case the problem arises from what the setuid
> > > binary thinks is just normal data getting interpreted by the kernel as a fd,
> > > and thus having different effect if the attacker writes it vs. if the suid
> > > app writes it.  In our case I *think* we should be alright, but maybe we
> > > should go with ioctl anyway...
> > 
> > I don't see why Jann's attack scenario could not work for this Landlock
> > supervisor too.  The main point is that the read/write interfaces are
> > used by a lot of different FDs, and we may not need them.
> > 
> > > However, how does using netlink messages (a
> > > suggestion from a different thread) affect this (if we do end up using it)?
> > > Would we have to do netlink msgs via IOCTL?
> > 
> > Because all requests should be synchronous, one IOCTL could be used to
> > both acknowledge a previous event (or just start) and read the next one.
> > 
> > I was thinking about an IOCTL with these arguments:
> > 1. supervisor FD
> > 2. (extensible) IOCTL command (see PIDFD_GET_INFO for instance)
> > 3. pointer to a fixed-size control structure
> > 
> > The fixed-size control structure could contain:
> > - handled access rights, used to only get events related to specific
> >    access rights.
> > - flags, to specify which kind of FD we would like to get (e.g. only
> >    directory FD, pidfd...)
> > - fd[6]: an array of received file descriptors.
> > - pointer to a variable-size data buffer that would contain all the
> >    records (e.g. source dir FD, source file name, destination dir FD,
> >    destination file name) for one event, potentially formatted with NLA.
> > - the size of this buffer
> > 
> > I'm not sure about the content of this buffer and the NLA format, and
> > the related API might not be usable without netlink sockets though.
> > Taking inspiration from the fanotify message format is another option.
> > 
> > > 
> > > 
> > > > > +	/**
> > > > > +	 * @pad: Unused, must be zero.
> > > > > +	 */
> > > > > +	__u32 pad;
> > > > 
> > > > In this case we should pack the struct instead.
> > > > 
> > > > >    };
> > > > >    /*
> > > > > @@ -60,6 +69,7 @@ struct landlock_ruleset_attr {
> > > > >     */
> > > > >    /* clang-format off */
> > > > >    #define LANDLOCK_CREATE_RULESET_VERSION			(1U << 0)
> > > > > +#define LANDLOCK_CREATE_RULESET_SUPERVISE		(1U << 1)
> > > > >    /* clang-format on */
> > > > >    /**
> > > > 
> > > > [...]
> > > 
> > > 
> 
> 


end of thread, other threads:[~2025-04-11 11:15 UTC | newest]

Thread overview: 47+ messages
2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
2025-03-04  1:12 ` [RFC PATCH 1/9] Define the supervisor and event structure Tingmao Wang
2025-03-04  1:12 ` [RFC PATCH 2/9] Refactor per-layer information in rulesets and rules Tingmao Wang
2025-03-04 19:49   ` Mickaël Salaün
2025-03-06  2:58     ` Tingmao Wang
2025-03-08 18:57       ` Mickaël Salaün
2025-03-10  0:38         ` Tingmao Wang
2025-03-04  1:12 ` [RFC PATCH 3/9] Adds a supervisor reference in the per-layer information Tingmao Wang
2025-03-04  1:13 ` [RFC PATCH 4/9] User-space API for creating a supervisor-fd Tingmao Wang
2025-03-05 16:09   ` Mickaël Salaün
2025-03-10  0:41     ` Tingmao Wang
2025-03-11 19:28       ` Mickaël Salaün
2025-03-26  0:06         ` Tingmao Wang
2025-04-11 10:55           ` Mickaël Salaün
2025-03-04  1:13 ` [RFC PATCH 5/9] Define user structure for events and responses Tingmao Wang
2025-03-04 19:49   ` Mickaël Salaün
2025-03-06  3:05     ` Tingmao Wang
2025-03-08 19:07       ` Mickaël Salaün
2025-03-10  0:39         ` Tingmao Wang
2025-03-11 19:29           ` Mickaël Salaün
2025-03-10  0:39       ` Tingmao Wang
2025-03-11 19:28         ` Mickaël Salaün
2025-03-11 23:18           ` Tingmao Wang
2025-03-12 11:49             ` Mickaël Salaün
2025-03-26  0:02               ` Tingmao Wang
2025-03-04  1:13 ` [RFC PATCH 6/9] Creating supervisor events for filesystem operations Tingmao Wang
2025-03-04 19:50   ` Mickaël Salaün
2025-03-10  0:39     ` Tingmao Wang
2025-03-11 19:29       ` Mickaël Salaün
2025-03-04  1:13 ` [RFC PATCH 7/9] Implement fdinfo for ruleset and supervisor fd Tingmao Wang
2025-03-04  1:13 ` [RFC PATCH 8/9] Implement fops for supervisor-fd Tingmao Wang
2025-03-04  1:13 ` [RFC PATCH 9/9] Enhance the sandboxer example to support landlock-supervise Tingmao Wang
2025-03-04 19:48 ` [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Mickaël Salaün
2025-03-06  2:57   ` Tingmao Wang
2025-03-06 17:07     ` Amir Goldstein
2025-03-08 19:14       ` Mickaël Salaün
2025-03-11  0:42       ` Tingmao Wang
2025-03-11 19:28         ` Mickaël Salaün
2025-03-11 20:58           ` Song Liu
2025-03-11 22:03             ` Tingmao Wang
2025-03-11 23:23               ` Song Liu
2025-03-12 11:50             ` Mickaël Salaün
2025-03-12 10:58         ` Jan Kara
2025-03-12 12:26         ` Amir Goldstein
2025-03-08 18:57     ` Mickaël Salaün
2025-03-06 21:04 ` Jan Kara
2025-03-08 19:15   ` Mickaël Salaün
