linux-security-module.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/4] Landlock: Disconnected directory handling
@ 2025-07-19 10:41 Mickaël Salaün
  2025-07-19 10:42 ` [PATCH v3 1/4] landlock: Fix cosmetic change Mickaël Salaün
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Mickaël Salaün @ 2025-07-19 10:41 UTC (permalink / raw)
  To: Günther Noack, Tingmao Wang
  Cc: Mickaël Salaün, Al Viro, Ben Scarlato,
	Christian Brauner, Daniel Burgener, Jann Horn, Jeff Xu, NeilBrown,
	Paul Moore, Ryan Sullivan, Song Liu, linux-fsdevel,
	linux-security-module

Hi,

This patch series fixes and test Landlock's handling of disconnected
directories.

This third version mostly fixes a race condition spotted by Tingmao, and
adds more comments.

Previous versions:
v2: https://lore.kernel.org/r/20250711191938.2007175-1-mic@digikod.net
v1: https://lore.kernel.org/r/20250701183812.3201231-1-mic@digikod.net

Regards,

Mickaël Salaün (3):
  landlock: Fix cosmetic change
  landlock: Fix handling of disconnected directories
  selftests/landlock: Add disconnected leafs and branch test suites

Tingmao Wang (1):
  selftests/landlock: Add tests for access through disconnected paths

 fs/namei.c                                 |    2 +-
 include/linux/fs.h                         |    1 +
 security/landlock/errata/abi-1.h           |   16 +
 security/landlock/fs.c                     |  192 ++-
 tools/testing/selftests/landlock/fs_test.c | 1317 +++++++++++++++++++-
 5 files changed, 1491 insertions(+), 37 deletions(-)
 create mode 100644 security/landlock/errata/abi-1.h

-- 
2.50.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/4] landlock: Fix cosmetic change
  2025-07-19 10:41 [PATCH v3 0/4] Landlock: Disconnected directory handling Mickaël Salaün
@ 2025-07-19 10:42 ` Mickaël Salaün
  2025-07-19 10:42 ` [PATCH v3 2/4] landlock: Fix handling of disconnected directories Mickaël Salaün
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Mickaël Salaün @ 2025-07-19 10:42 UTC (permalink / raw)
  To: Günther Noack, Tingmao Wang
  Cc: Mickaël Salaün, Al Viro, Ben Scarlato,
	Christian Brauner, Daniel Burgener, Jann Horn, Jeff Xu, NeilBrown,
	Paul Moore, Ryan Sullivan, Song Liu, linux-fsdevel,
	linux-security-module, Konstantin Meskhidze

This line removal should not be there and it makes it more difficult to
backport the following patch.

Cc: Günther Noack <gnoack@google.com>
Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Fixes: 7a11275c3787 ("landlock: Refactor layer helpers")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v2:
- New patch.
---
 security/landlock/fs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 6fee7c20f64d..c04f8879ad03 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -895,6 +895,7 @@ static bool is_access_to_paths_allowed(
 		/* Stops when a rule from each layer grants access. */
 		if (allowed_parent1 && allowed_parent2)
 			break;
+
 jump_up:
 		if (walker_path.dentry == walker_path.mnt->mnt_root) {
 			if (follow_up(&walker_path)) {
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 2/4] landlock: Fix handling of disconnected directories
  2025-07-19 10:41 [PATCH v3 0/4] Landlock: Disconnected directory handling Mickaël Salaün
  2025-07-19 10:42 ` [PATCH v3 1/4] landlock: Fix cosmetic change Mickaël Salaün
@ 2025-07-19 10:42 ` Mickaël Salaün
  2025-07-22 18:04   ` Tingmao Wang
  2025-07-19 10:42 ` [PATCH v3 3/4] selftests/landlock: Add tests for access through disconnected paths Mickaël Salaün
  2025-07-19 10:42 ` [PATCH v3 4/4] selftests/landlock: Add disconnected leafs and branch test suites Mickaël Salaün
  3 siblings, 1 reply; 12+ messages in thread
From: Mickaël Salaün @ 2025-07-19 10:42 UTC (permalink / raw)
  To: Günther Noack, Tingmao Wang
  Cc: Mickaël Salaün, Al Viro, Ben Scarlato,
	Christian Brauner, Daniel Burgener, Jann Horn, Jeff Xu, NeilBrown,
	Paul Moore, Ryan Sullivan, Song Liu, linux-fsdevel,
	linux-security-module

We can get disconnected files or directories when they are visible and
opened from a bind mount, before being renamed/moved from the source of
the bind mount in a way that makes them inaccessible from the mount
point (i.e. out of scope).

Until now, access rights tied to files or directories opened through a
disconnected directory were collected by walking the related hierarchy
down to the root of this filesystem because the mount point couldn't be
found.  This could lead to inconsistent access results, and
hard-to-debug renames, especially because such paths cannot be printed.

For a sandboxed task to create a disconnected directory, it needs to
have write access (i.e. FS_MAKE_REG, FS_REMOVE_FILE, and FS_REFER) to
the underlying source of the bind mount, and read access to the related
mount point.  Because a sandboxed task cannot get more access than those
defined by its Landlock domain, this could only lead to inconsistent
access rights because of missing those that should be inherited from the
mount point hierarchy and inheriting from the hierarchy of the mounted
filesystem instead.

Landlock now handles files/directories opened from disconnected
directories like the mount point these disconnected directories were
opened from.  This gives the guarantee that access rights on a
file/directory cannot be more than those at open time.  The rationale is
that disconnected hierarchies might not be visible nor accessible to a
sandboxed task, and relying on the collected access rights from them
could introduce unexpected results, especially for rename actions
because of the access right comparison between the source and the
destination (see LANDLOCK_ACCESS_FS_REFER).  This new behavior is much
less surprising to users and safer from an access point of view.

Unlike follow_dotdot(), we don't need to check for each directory if it
is part of the mount's root, but instead this is only checked when we
reached a root dentry (not a mount point), or when the access
request is about to be allowed.  This limits the number of calls to
is_subdir() which walks down the hierarchy (again).  This also avoids
checking path connection at the beginning of the walk for each mount
point, which would be racy.

Remove a wrong WARN_ON_ONCE() canary in collect_domain_accesses() and
fix comment.

This change increases the stack size with two Landlock layer masks
backups and a boolean, that are needed to reset the collected and
requested access rights from the latest mount point.

Because opened files have their access rights stored in the related file
security properties, their is no impact for disconnected or unlinked
files.

Make path_connected() public to stay consistent with the VFS.  This
helper is used when we are about to allowed an access.

Cc: Günther Noack <gnoack@google.com>
Cc: Song Liu <song@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Reported-by: Tingmao Wang <m@maowtm.org>
Closes: https://lore.kernel.org/r/027d5190-b37a-40a8-84e9-4ccbc352bcdf@maowtm.org
Closes: https://lore.kernel.org/r/09b24128f86973a6022e6aa8338945fcfb9a33e4.1749925391.git.m@maowtm.org
Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER")
Fixes: cb2c7d1a1776 ("landlock: Support filesystem access-control")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v2:
- Fix domain check mode reset, spotted by Tingmao.  Replace a
  conditional branch (when all accesses are granted) by only looking for
  a rule when needed.  This simplifies code and remove a call to
  path_connected().  Add a new is_dom_check_bkp to safely handle race
  conditions and move initial backup for layer_masks_parent1.
- Fix layer masks parent backup initialization, spotted by Tingmao.
- Add more comments, suggested by Tingmao.
- Reformat erratum.

Changes since v1:
- Remove useless else branch in is_access_to_paths_allowed().
- Update commit message and squash "landlock: Remove warning in
  collect_domain_accesses()":
  https://lore.kernel.org/r/20250618134734.1673254-1-mic@digikod.net
- Remove "extern" for path_connected() in fs.h, requested by Christian.
- Add Acked-by Christian.
- Fix docstring and improve doc for collect_domain_accesses().
- Remove path_connected() check for the real root.
- Fix allowed_parent* resets to be consistent with their initial values
  infered from the evaluated domain.
- Add Landlock erratum.
---
 fs/namei.c                       |   2 +-
 include/linux/fs.h               |   1 +
 security/landlock/errata/abi-1.h |  16 +++
 security/landlock/fs.c           | 191 +++++++++++++++++++++++++------
 4 files changed, 176 insertions(+), 34 deletions(-)
 create mode 100644 security/landlock/errata/abi-1.h

diff --git a/fs/namei.c b/fs/namei.c
index 4bb889fc980b..7853a876fc1c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -716,7 +716,7 @@ static bool nd_alloc_stack(struct nameidata *nd)
  * Rename can sometimes move a file or directory outside of a bind
  * mount, path_connected allows those cases to be detected.
  */
-static bool path_connected(struct vfsmount *mnt, struct dentry *dentry)
+bool path_connected(struct vfsmount *mnt, struct dentry *dentry)
 {
 	struct super_block *sb = mnt->mnt_sb;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4ec77da65f14..fce95b30c4aa 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3252,6 +3252,7 @@ extern struct file * open_exec(const char *);
 /* fs/dcache.c -- generic fs support functions */
 extern bool is_subdir(struct dentry *, struct dentry *);
 extern bool path_is_under(const struct path *, const struct path *);
+bool path_connected(struct vfsmount *mnt, struct dentry *dentry);
 
 extern char *file_path(struct file *, char *, int);
 
diff --git a/security/landlock/errata/abi-1.h b/security/landlock/errata/abi-1.h
new file mode 100644
index 000000000000..6edc7182800f
--- /dev/null
+++ b/security/landlock/errata/abi-1.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/**
+ * DOC: erratum_3
+ *
+ * Erratum 3: Disconnected directory handling
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This fix addresses an issue with disconnected directories that occur when a
+ * directory is moved outside the scope of a bind mount.  The change ensures
+ * that evaluated access rights exclude those inherited from disconnected file
+ * hierarchies (no longer accessible from the related mount point), and instead
+ * only consider rights tied to directories that remain visible.  This prevents
+ * access inconsistencies caused by missing access rights.
+ */
+LANDLOCK_ERRATUM(3)
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index c04f8879ad03..25bba840aff3 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -764,11 +764,14 @@ static bool is_access_to_paths_allowed(
 	struct dentry *const dentry_child2)
 {
 	bool allowed_parent1 = false, allowed_parent2 = false, is_dom_check,
-	     child1_is_directory = true, child2_is_directory = true;
+	     is_dom_check_bkp, child1_is_directory = true,
+	     child2_is_directory = true;
 	struct path walker_path;
 	access_mask_t access_masked_parent1, access_masked_parent2;
 	layer_mask_t _layer_masks_child1[LANDLOCK_NUM_ACCESS_FS],
-		_layer_masks_child2[LANDLOCK_NUM_ACCESS_FS];
+		_layer_masks_child2[LANDLOCK_NUM_ACCESS_FS],
+		_layer_masks_parent1_bkp[LANDLOCK_NUM_ACCESS_FS],
+		_layer_masks_parent2_bkp[LANDLOCK_NUM_ACCESS_FS];
 	layer_mask_t(*layer_masks_child1)[LANDLOCK_NUM_ACCESS_FS] = NULL,
 	(*layer_masks_child2)[LANDLOCK_NUM_ACCESS_FS] = NULL;
 
@@ -784,12 +787,18 @@ static bool is_access_to_paths_allowed(
 	if (WARN_ON_ONCE(!layer_masks_parent1))
 		return false;
 
-	allowed_parent1 = is_layer_masks_allowed(layer_masks_parent1);
-
 	if (unlikely(layer_masks_parent2)) {
 		if (WARN_ON_ONCE(!dentry_child1))
 			return false;
 
+		/*
+		 * Creates a backup of the initial layer masks to be able to restore
+		 * them if we find out that we were walking a disconnected directory,
+		 * which would make the collected access rights inconsistent (cf.
+		 * reset_to_mount_root).
+		 */
+		memcpy(&_layer_masks_parent2_bkp, layer_masks_parent2,
+		       sizeof(_layer_masks_parent2_bkp));
 		allowed_parent2 = is_layer_masks_allowed(layer_masks_parent2);
 
 		/*
@@ -809,6 +818,16 @@ static bool is_access_to_paths_allowed(
 		is_dom_check = false;
 	}
 
+	/*
+	 * Creates a backup of the initial layer masks to be able to restore them if
+	 * we find out that we were walking a disconnected directory, which would
+	 * make the collected access rights inconsistent (cf. reset_to_mount_root).
+	 */
+	memcpy(&_layer_masks_parent1_bkp, layer_masks_parent1,
+	       sizeof(_layer_masks_parent1_bkp));
+	allowed_parent1 = is_layer_masks_allowed(layer_masks_parent1);
+	is_dom_check_bkp = is_dom_check;
+
 	if (unlikely(dentry_child1)) {
 		landlock_unmask_layers(
 			find_rule(domain, dentry_child1),
@@ -874,13 +893,13 @@ static bool is_access_to_paths_allowed(
 				allowed_parent2 ||
 				scope_to_request(access_masked_parent2,
 						 layer_masks_parent2);
-
-			/* Stops when all accesses are granted. */
-			if (allowed_parent1 && allowed_parent2)
-				break;
 		}
 
-		rule = find_rule(domain, walker_path.dentry);
+		/* Looks for a rule when needed. */
+		rule = unlikely(allowed_parent1 && allowed_parent2) ?
+			       NULL :
+			       find_rule(domain, walker_path.dentry);
+
 		allowed_parent1 = allowed_parent1 ||
 				  landlock_unmask_layers(
 					  rule, access_masked_parent1,
@@ -893,12 +912,47 @@ static bool is_access_to_paths_allowed(
 					  ARRAY_SIZE(*layer_masks_parent2));
 
 		/* Stops when a rule from each layer grants access. */
-		if (allowed_parent1 && allowed_parent2)
+		if (allowed_parent1 && allowed_parent2) {
+			/*
+			 * Before allowing the access request, checks that the walk was not
+			 * in a disconnected directory.
+			 */
+			if (unlikely(!path_connected(walker_path.mnt,
+						     walker_path.dentry)))
+				goto reset_to_mount_root;
+
 			break;
+		}
 
 jump_up:
 		if (walker_path.dentry == walker_path.mnt->mnt_root) {
+			/*
+			 * We reached a mount point which is not a disconnected directory.
+			 * We can now safely assume that the collected access rights are
+			 * valid, and we can save them to be able to get back to this state
+			 * later on.  If we reached the real root, the walk will end, and we
+			 * will not need to restore anything, so we only need to save the
+			 * collected access rights if we are not at the real root.
+			 */
 			if (follow_up(&walker_path)) {
+				/*
+				 * The mount point before this follow_up() call was connected.
+				 * We will know during the ongoing walk if the path from this
+				 * new mount point (i.e. walker_path) is disconnected.  If it is
+				 * the case, we will restore the collected access rights from
+				 * here and jump to walker_path.mnt->mnt_root, short-circuiting
+				 * the disconnected hierarchy (cf. reset_to_mount_root).
+				 */
+				memcpy(&_layer_masks_parent1_bkp,
+				       layer_masks_parent1,
+				       sizeof(_layer_masks_parent1_bkp));
+				if (layer_masks_parent2) {
+					memcpy(&_layer_masks_parent2_bkp,
+					       layer_masks_parent2,
+					       sizeof(_layer_masks_parent2_bkp));
+					is_dom_check_bkp = is_dom_check;
+				}
+
 				/* Ignores hidden mount points. */
 				goto jump_up;
 			} else {
@@ -910,20 +964,69 @@ static bool is_access_to_paths_allowed(
 			}
 		}
 		if (unlikely(IS_ROOT(walker_path.dentry))) {
-			/*
-			 * Stops at disconnected root directories.  Only allows
-			 * access to internal filesystems (e.g. nsfs, which is
-			 * reachable through /proc/<pid>/ns/<namespace>).
-			 */
-			if (walker_path.mnt->mnt_flags & MNT_INTERNAL) {
+			if (likely(walker_path.mnt->mnt_flags & MNT_INTERNAL)) {
+				/*
+				 * Stops and allows access when reaching disconnected root
+				 * directories that are part of internal filesystems (e.g. nsfs,
+				 * which is reachable through /proc/<pid>/ns/<namespace>).
+				 */
 				allowed_parent1 = true;
 				allowed_parent2 = true;
+				break;
 			}
-			break;
+
+			/*
+			 * We reached a disconnected root directory from a bind mount, and
+			 * we need to reset the walk to the current mount root.
+			 */
+			goto reset_to_mount_root;
 		}
 		parent_dentry = dget_parent(walker_path.dentry);
 		dput(walker_path.dentry);
 		walker_path.dentry = parent_dentry;
+		continue;
+
+reset_to_mount_root:
+		/*
+		 * At this point, we know that the walk was in a disconnected file
+		 * hierarchy and we need to restore the layer masks from the last known
+		 * good values.  These were either built from the domain (at the
+		 * beginning of the walk) or from the collected access rights up to the
+		 * previous connected mount point.  This ensures we don't use
+		 * potentially invalid access rights from the disconnected path
+		 * traversal.
+		 */
+		memcpy(layer_masks_parent1, &_layer_masks_parent1_bkp,
+		       sizeof(_layer_masks_parent1_bkp));
+		allowed_parent1 =
+			is_layer_masks_allowed(&_layer_masks_parent1_bkp);
+		if (layer_masks_parent2) {
+			memcpy(layer_masks_parent2, &_layer_masks_parent2_bkp,
+			       sizeof(_layer_masks_parent2_bkp));
+			allowed_parent2 = is_layer_masks_allowed(
+				&_layer_masks_parent2_bkp);
+
+			/*
+			 * Restores domain check mode if needed: increases back the scope of
+			 * the access checks to the domain's handled accesses, which are a
+			 * superset of the requested ones.
+			 */
+			if (is_dom_check_bkp) {
+				access_masked_parent1 = access_masked_parent2 =
+					landlock_union_access_masks(domain).fs;
+				is_dom_check = true;
+			}
+		}
+
+		/*
+		 * Jumps to the root of the current mount point to short-circuit the
+		 * disconnected walk with a reachable directory.  This ensures we
+		 * continue the access check from a known good location in the
+		 * filesystem hierarchy.
+		 */
+		dput(walker_path.dentry);
+		walker_path.dentry = walker_path.mnt->mnt_root;
+		dget(walker_path.dentry);
 	}
 	path_put(&walker_path);
 
@@ -1011,15 +1114,18 @@ static access_mask_t maybe_remove(const struct dentry *const dentry)
  * collect_domain_accesses - Walk through a file path and collect accesses
  *
  * @domain: Domain to check against.
- * @mnt_root: Last directory to check.
+ * @mnt_dir: Mount point directory to stop the walk at.
  * @dir: Directory to start the walk from.
  * @layer_masks_dom: Where to store the collected accesses.
  *
- * This helper is useful to begin a path walk from the @dir directory to a
- * @mnt_root directory used as a mount point.  This mount point is the common
- * ancestor between the source and the destination of a renamed and linked
- * file.  While walking from @dir to @mnt_root, we record all the domain's
- * allowed accesses in @layer_masks_dom.
+ * This helper is useful to begin a path walk from the @dir directory to
+ * @mnt_dir.  This mount point is the common ancestor between the source and the
+ * destination of a renamed and linked file.  While walking from @dir to
+ * @mnt_dir, we record all the domain's allowed accesses in @layer_masks_dom.
+ *
+ * Because of disconnected directories, this walk may not reach @mnt_dir.  In
+ * this case, the walk is canceled and the collected accesses are reset to the
+ * domain handled ones.
  *
  * This is similar to is_access_to_paths_allowed() but much simpler because it
  * only handles walking on the same mount point and only checks one set of
@@ -1031,13 +1137,13 @@ static access_mask_t maybe_remove(const struct dentry *const dentry)
  */
 static bool collect_domain_accesses(
 	const struct landlock_ruleset *const domain,
-	const struct dentry *const mnt_root, struct dentry *dir,
+	const struct path *const mnt_dir, struct dentry *dir,
 	layer_mask_t (*const layer_masks_dom)[LANDLOCK_NUM_ACCESS_FS])
 {
-	unsigned long access_dom;
+	access_mask_t access_dom;
 	bool ret = false;
 
-	if (WARN_ON_ONCE(!domain || !mnt_root || !dir || !layer_masks_dom))
+	if (WARN_ON_ONCE(!domain || !mnt_dir || !dir || !layer_masks_dom))
 		return true;
 	if (is_nouser_or_private(dir))
 		return true;
@@ -1054,6 +1160,13 @@ static bool collect_domain_accesses(
 		if (landlock_unmask_layers(find_rule(domain, dir), access_dom,
 					   layer_masks_dom,
 					   ARRAY_SIZE(*layer_masks_dom))) {
+			/*
+			 * Before allowing this side of the access request, checks that the
+			 * walk was not in a disconnected directory.
+			 */
+			if (unlikely(!path_connected(mnt_dir->mnt, dir)))
+				goto cancel_walk;
+
 			/*
 			 * Stops when all handled accesses are allowed by at
 			 * least one rule in each layer.
@@ -1062,13 +1175,27 @@ static bool collect_domain_accesses(
 			break;
 		}
 
-		/* We should not reach a root other than @mnt_root. */
-		if (dir == mnt_root || WARN_ON_ONCE(IS_ROOT(dir)))
+		/* Stops at the mount point. */
+		if (dir == mnt_dir->dentry)
 			break;
 
+		/* Ignores the walk if we end up in a disconnected root directory. */
+		if (unlikely(IS_ROOT(dir)))
+			goto cancel_walk;
+
 		parent_dentry = dget_parent(dir);
 		dput(dir);
 		dir = parent_dentry;
+		continue;
+
+cancel_walk:
+		/*
+		 * Resets the inconsistent collected access rights to the domain's
+		 * handled access rights since we encountered a disconnected directory.
+		 */
+		landlock_init_layer_masks(domain, LANDLOCK_MASK_ACCESS_FS,
+					  layer_masks_dom, LANDLOCK_KEY_INODE);
+		break;
 	}
 	dput(dir);
 	return ret;
@@ -1199,13 +1326,11 @@ static int current_check_refer_path(struct dentry *const old_dentry,
 						      old_dentry->d_parent;
 
 	/* new_dir->dentry is equal to new_dentry->d_parent */
-	allow_parent1 = collect_domain_accesses(subject->domain, mnt_dir.dentry,
-						old_parent,
-						&layer_masks_parent1);
-	allow_parent2 = collect_domain_accesses(subject->domain, mnt_dir.dentry,
+	allow_parent1 = collect_domain_accesses(
+		subject->domain, &mnt_dir, old_parent, &layer_masks_parent1);
+	allow_parent2 = collect_domain_accesses(subject->domain, &mnt_dir,
 						new_dir->dentry,
 						&layer_masks_parent2);
-
 	if (allow_parent1 && allow_parent2)
 		return 0;
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 3/4] selftests/landlock: Add tests for access through disconnected paths
  2025-07-19 10:41 [PATCH v3 0/4] Landlock: Disconnected directory handling Mickaël Salaün
  2025-07-19 10:42 ` [PATCH v3 1/4] landlock: Fix cosmetic change Mickaël Salaün
  2025-07-19 10:42 ` [PATCH v3 2/4] landlock: Fix handling of disconnected directories Mickaël Salaün
@ 2025-07-19 10:42 ` Mickaël Salaün
  2025-07-19 10:42 ` [PATCH v3 4/4] selftests/landlock: Add disconnected leafs and branch test suites Mickaël Salaün
  3 siblings, 0 replies; 12+ messages in thread
From: Mickaël Salaün @ 2025-07-19 10:42 UTC (permalink / raw)
  To: Günther Noack, Tingmao Wang
  Cc: Al Viro, Ben Scarlato, Christian Brauner, Daniel Burgener,
	Jann Horn, Jeff Xu, NeilBrown, Paul Moore, Ryan Sullivan,
	Song Liu, linux-fsdevel, linux-security-module,
	Mickaël Salaün

From: Tingmao Wang <m@maowtm.org>

This adds tests for the edge case discussed in [1], with specific ones
for rename and link operations when the operands are through
disconnected paths, as that go through a separate code path in Landlock.

This has resulted in a warning, due to collect_domain_accesses() not
expecting to reach a different root from path->mnt:

  #  RUN           layout1_bind.path_disconnected ...
  #            OK  layout1_bind.path_disconnected
  ok 96 layout1_bind.path_disconnected
  #  RUN           layout1_bind.path_disconnected_rename ...
  [..] ------------[ cut here ]------------
  [..] WARNING: CPU: 3 PID: 385 at security/landlock/fs.c:1065 collect_domain_accesses
  [..] ...
  [..] RIP: 0010:collect_domain_accesses (security/landlock/fs.c:1065 (discriminator 2) security/landlock/fs.c:1031 (discriminator 2))
  [..] current_check_refer_path (security/landlock/fs.c:1205)
  [..] ...
  [..] hook_path_rename (security/landlock/fs.c:1526)
  [..] security_path_rename (security/security.c:2026 (discriminator 1))
  [..] do_renameat2 (fs/namei.c:5264)
  #            OK  layout1_bind.path_disconnected_rename
  ok 97 layout1_bind.path_disconnected_rename

Move the const char definitions a bit above so that we can use the path
for s4d1 in cleanup code.

Cc: Günther Noack <gnoack@google.com>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/027d5190-b37a-40a8-84e9-4ccbc352bcdf@maowtm.org [1]
Signed-off-by: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
- Integrate this patch into my patch series, and change the result for
  two tests with updated comments.  Diff here:
  https://lore.kernel.org/r/20250701183812.3201231-2-mic@digikod.net
- Replace most ASSERT with EXPECT, add extra checks, massage commit
  message and comments.
- Squash Tingmao's patches:
  https://lore.kernel.org/r/09b24128f86973a6022e6aa8338945fcfb9a33e4.1749925391.git.m@maowtm.org
  https://lore.kernel.org/r/8ed0bfcd-aefa-44bd-86b6-e12583779187@maowtm.org
  https://lore.kernel.org/r/3080e512-64b0-42cf-b379-8f52cfeff78a@maowtm.org
---
 tools/testing/selftests/landlock/fs_test.c | 405 ++++++++++++++++++++-
 1 file changed, 402 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index fa0f18ec62c4..5312698927ea 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -4561,6 +4561,18 @@ TEST_F_FORK(ioctl, handle_file_access_file)
 FIXTURE(layout1_bind) {};
 /* clang-format on */
 
+static const char bind_dir_s1d3[] = TMP_DIR "/s2d1/s2d2/s1d3";
+static const char bind_file1_s1d3[] = TMP_DIR "/s2d1/s2d2/s1d3/f1";
+
+/* Move targets for disconnected path tests. */
+static const char dir_s4d1[] = TMP_DIR "/s4d1";
+static const char file1_s4d1[] = TMP_DIR "/s4d1/f1";
+static const char file2_s4d1[] = TMP_DIR "/s4d1/f2";
+static const char dir_s4d2[] = TMP_DIR "/s4d1/s4d2";
+static const char file1_s4d2[] = TMP_DIR "/s4d1/s4d2/f1";
+static const char file1_name[] = "f1";
+static const char file2_name[] = "f2";
+
 FIXTURE_SETUP(layout1_bind)
 {
 	prepare_layout(_metadata);
@@ -4576,14 +4588,14 @@ FIXTURE_TEARDOWN_PARENT(layout1_bind)
 {
 	/* umount(dir_s2d2)) is handled by namespace lifetime. */
 
+	remove_path(file1_s4d1);
+	remove_path(file2_s4d1);
+
 	remove_layout1(_metadata);
 
 	cleanup_layout(_metadata);
 }
 
-static const char bind_dir_s1d3[] = TMP_DIR "/s2d1/s2d2/s1d3";
-static const char bind_file1_s1d3[] = TMP_DIR "/s2d1/s2d2/s1d3/f1";
-
 /*
  * layout1_bind hierarchy:
  *
@@ -4806,6 +4818,393 @@ TEST_F_FORK(layout1_bind, reparent_cross_mount)
 	ASSERT_EQ(0, rename(bind_file1_s1d3, file1_s2d2));
 }
 
+/*
+ * Make sure access to file through a disconnected path works as expected.
+ * This test moves s1d3 to s4d1.
+ */
+TEST_F_FORK(layout1_bind, path_disconnected)
+{
+	const struct rule layer1_allow_all[] = {
+		{
+			.path = TMP_DIR,
+			.access = ACCESS_ALL,
+		},
+		{},
+	};
+	const struct rule layer2_allow_just_f1[] = {
+		{
+			.path = file1_s1d3,
+			.access = LANDLOCK_ACCESS_FS_READ_FILE,
+		},
+		{},
+	};
+	const struct rule layer3_only_s1d2[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_READ_FILE,
+		},
+		{},
+	};
+
+	/* Landlock should not deny access just because it is disconnected. */
+	int ruleset_fd_l1 =
+		create_ruleset(_metadata, ACCESS_ALL, layer1_allow_all);
+
+	/* Creates the new ruleset now before we move the dir containing the file. */
+	int ruleset_fd_l2 =
+		create_ruleset(_metadata, ACCESS_RW, layer2_allow_just_f1);
+	int ruleset_fd_l3 =
+		create_ruleset(_metadata, ACCESS_RW, layer3_only_s1d2);
+	int bind_s1d3_fd;
+
+	ASSERT_LE(0, ruleset_fd_l1);
+	ASSERT_LE(0, ruleset_fd_l2);
+	ASSERT_LE(0, ruleset_fd_l3);
+
+	enforce_ruleset(_metadata, ruleset_fd_l1);
+	EXPECT_EQ(0, close(ruleset_fd_l1));
+
+	bind_s1d3_fd = open(bind_dir_s1d3, O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, bind_s1d3_fd);
+
+	/* Tests access is possible before we move. */
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file2_name, O_RDONLY));
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, "..", O_RDONLY | O_DIRECTORY));
+
+	/* Makes it disconnected. */
+	ASSERT_EQ(0, rename(dir_s1d3, dir_s4d1))
+	{
+		TH_LOG("Failed to rename %s to %s: %s", dir_s1d3, dir_s4d1,
+		       strerror(errno));
+	}
+
+	/* Tests that access is still possible. */
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file2_name, O_RDONLY));
+
+	/*
+	 * Tests that ".." is not possible (not because of Landlock, but just
+	 * because it's disconnected).
+	 */
+	EXPECT_EQ(ENOENT,
+		  test_open_rel(bind_s1d3_fd, "..", O_RDONLY | O_DIRECTORY));
+
+	/* This should still work with a narrower rule. */
+	enforce_ruleset(_metadata, ruleset_fd_l2);
+	EXPECT_EQ(0, close(ruleset_fd_l2));
+
+	EXPECT_EQ(0, test_open(file1_s4d1, O_RDONLY));
+	/*
+	 * Accessing a file through a disconnected file descriptor is not allowed
+	 * when it is no longer visible in its mount point.
+	 */
+	EXPECT_EQ(EACCES, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
+	EXPECT_EQ(EACCES, test_open_rel(bind_s1d3_fd, file2_name, O_RDONLY));
+
+	/*
+	 * But if we only allow access to under the original dir, then it
+	 * should be denied.
+	 */
+	enforce_ruleset(_metadata, ruleset_fd_l3);
+	EXPECT_EQ(0, close(ruleset_fd_l3));
+	EXPECT_EQ(EACCES, test_open(file1_s4d1, O_RDONLY));
+	EXPECT_EQ(EACCES, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
+	EXPECT_EQ(EACCES, test_open_rel(bind_s1d3_fd, file2_name, O_RDONLY));
+}
+
+/*
+ * Test that renameat with disconnected paths works under Landlock.  This test
+ * moves s1d3 to s4d2, so that we can have a rule allowing refers on the move
+ * target's immediate parent.
+ */
+TEST_F_FORK(layout1_bind, path_disconnected_rename)
+{
+	const struct rule layer1[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_REFER |
+				  LANDLOCK_ACCESS_FS_MAKE_DIR |
+				  LANDLOCK_ACCESS_FS_REMOVE_DIR |
+				  LANDLOCK_ACCESS_FS_MAKE_REG |
+				  LANDLOCK_ACCESS_FS_REMOVE_FILE |
+				  LANDLOCK_ACCESS_FS_READ_FILE,
+		},
+		{
+			.path = dir_s4d1,
+			.access = LANDLOCK_ACCESS_FS_REFER |
+				  LANDLOCK_ACCESS_FS_MAKE_DIR |
+				  LANDLOCK_ACCESS_FS_REMOVE_DIR |
+				  LANDLOCK_ACCESS_FS_MAKE_REG |
+				  LANDLOCK_ACCESS_FS_REMOVE_FILE |
+				  LANDLOCK_ACCESS_FS_READ_FILE,
+		},
+		{}
+	};
+
+	/* This layer only handles LANDLOCK_ACCESS_FS_READ_FILE. */
+	const struct rule layer2_only_s1d2[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_READ_FILE,
+		},
+		{},
+	};
+	int ruleset_fd_l1, ruleset_fd_l2;
+	pid_t child_pid;
+	int bind_s1d3_fd, status;
+
+	ASSERT_EQ(0, mkdir(dir_s4d1, 0755))
+	{
+		TH_LOG("Failed to create %s: %s", dir_s4d1, strerror(errno));
+	}
+	ruleset_fd_l1 = create_ruleset(_metadata, ACCESS_ALL, layer1);
+	ruleset_fd_l2 = create_ruleset(_metadata, LANDLOCK_ACCESS_FS_READ_FILE,
+				       layer2_only_s1d2);
+	ASSERT_LE(0, ruleset_fd_l1);
+	ASSERT_LE(0, ruleset_fd_l2);
+
+	enforce_ruleset(_metadata, ruleset_fd_l1);
+	EXPECT_EQ(0, close(ruleset_fd_l1));
+
+	bind_s1d3_fd = open(bind_dir_s1d3, O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, bind_s1d3_fd);
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
+
+	/* Tests ENOENT priority over EACCES for disconnected directory. */
+	EXPECT_EQ(EACCES, test_open_rel(bind_s1d3_fd, "..", O_DIRECTORY));
+	ASSERT_EQ(0, rename(dir_s1d3, dir_s4d2))
+	{
+		TH_LOG("Failed to rename %s to %s: %s", dir_s1d3, dir_s4d2,
+		       strerror(errno));
+	}
+	EXPECT_EQ(ENOENT, test_open_rel(bind_s1d3_fd, "..", O_DIRECTORY));
+
+	/*
+	 * The file is no longer under s1d2 but we should still be able to access it
+	 * with layer 2 because its mount point is evaluated as the first valid
+	 * directory because it was initially a parent.  Do a fork to test this so
+	 * we don't prevent ourselves from renaming it back later.
+	 */
+	child_pid = fork();
+	ASSERT_LE(0, child_pid);
+	if (child_pid == 0) {
+		enforce_ruleset(_metadata, ruleset_fd_l2);
+		EXPECT_EQ(0, close(ruleset_fd_l2));
+		EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
+		EXPECT_EQ(EACCES, test_open(file1_s4d2, O_RDONLY));
+
+		/*
+		 * Tests that access widening checks indeed prevents us from renaming it
+		 * back.
+		 */
+		EXPECT_EQ(-1, rename(dir_s4d2, dir_s1d3));
+		EXPECT_EQ(EXDEV, errno);
+
+		/*
+		 * Including through the now disconnected fd (but it should return
+		 * EXDEV).
+		 */
+		EXPECT_EQ(-1, renameat(bind_s1d3_fd, file1_name, AT_FDCWD,
+				       file1_s2d2));
+		EXPECT_EQ(EXDEV, errno);
+		_exit(_metadata->exit_code);
+		return;
+	}
+
+	EXPECT_EQ(child_pid, waitpid(child_pid, &status, 0));
+	EXPECT_EQ(1, WIFEXITED(status));
+	EXPECT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+
+	ASSERT_EQ(0, rename(dir_s4d2, dir_s1d3))
+	{
+		TH_LOG("Failed to rename %s back to %s: %s", dir_s4d1, dir_s1d3,
+		       strerror(errno));
+	}
+
+	/* Now checks that we can access it under l2. */
+	child_pid = fork();
+	ASSERT_LE(0, child_pid);
+	if (child_pid == 0) {
+		enforce_ruleset(_metadata, ruleset_fd_l2);
+		EXPECT_EQ(0, close(ruleset_fd_l2));
+		EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
+		EXPECT_EQ(0, test_open(file1_s1d3, O_RDONLY));
+		_exit(_metadata->exit_code);
+		return;
+	}
+
+	EXPECT_EQ(child_pid, waitpid(child_pid, &status, 0));
+	EXPECT_EQ(1, WIFEXITED(status));
+	EXPECT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+
+	/*
+	 * Also test that we can rename via a disconnected path.  We move the
+	 * dir back to the disconnected place first, then we rename file1 to
+	 * file2 through our dir fd.
+	 */
+	ASSERT_EQ(0, rename(dir_s1d3, dir_s4d2))
+	{
+		TH_LOG("Failed to rename %s to %s: %s", dir_s1d3, dir_s4d2,
+		       strerror(errno));
+	}
+	ASSERT_EQ(0,
+		  renameat(bind_s1d3_fd, file1_name, bind_s1d3_fd, file2_name))
+	{
+		TH_LOG("Failed to rename %s to %s within disconnected %s: %s",
+		       file1_name, file2_name, bind_dir_s1d3, strerror(errno));
+	}
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file2_name, O_RDONLY));
+	ASSERT_EQ(0, renameat(bind_s1d3_fd, file2_name, AT_FDCWD, file1_s2d2))
+	{
+		TH_LOG("Failed to rename %s to %s through disconnected %s: %s",
+		       file2_name, file1_s2d2, bind_dir_s1d3, strerror(errno));
+	}
+	EXPECT_EQ(0, test_open(file1_s2d2, O_RDONLY));
+	EXPECT_EQ(0, test_open(file1_s1d2, O_RDONLY));
+
+	/* Move it back using the disconnected path as the target. */
+	ASSERT_EQ(0, renameat(AT_FDCWD, file1_s2d2, bind_s1d3_fd, file1_name))
+	{
+		TH_LOG("Failed to rename %s to %s through disconnected %s: %s",
+		       file1_s1d2, file1_name, bind_dir_s1d3, strerror(errno));
+	}
+
+	/* Now make it connected again. */
+	ASSERT_EQ(0, rename(dir_s4d2, dir_s1d3))
+	{
+		TH_LOG("Failed to rename %s back to %s: %s", dir_s4d2, dir_s1d3,
+		       strerror(errno));
+	}
+
+	/* Checks again that we can access it under l2. */
+	enforce_ruleset(_metadata, ruleset_fd_l2);
+	EXPECT_EQ(0, close(ruleset_fd_l2));
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
+	EXPECT_EQ(0, test_open(file1_s1d3, O_RDONLY));
+}
+
+/*
+ * Test that linkat(2) with disconnected paths works under Landlock. This
+ * test moves s1d3 to s4d1.
+ */
+TEST_F_FORK(layout1_bind, path_disconnected_link)
+{
+	/* Ruleset to be applied after renaming s1d3 to s4d1. */
+	const struct rule layer1[] = {
+		{
+			.path = dir_s4d1,
+			.access = LANDLOCK_ACCESS_FS_REFER |
+				  LANDLOCK_ACCESS_FS_READ_FILE |
+				  LANDLOCK_ACCESS_FS_MAKE_REG |
+				  LANDLOCK_ACCESS_FS_REMOVE_FILE,
+		},
+		{
+			.path = dir_s2d2,
+			.access = LANDLOCK_ACCESS_FS_REFER |
+				  LANDLOCK_ACCESS_FS_READ_FILE |
+				  LANDLOCK_ACCESS_FS_MAKE_REG |
+				  LANDLOCK_ACCESS_FS_REMOVE_FILE,
+		},
+		{}
+	};
+	int ruleset_fd, bind_s1d3_fd;
+
+	/* Removes unneeded files created by layout1, otherwise it will EEXIST. */
+	ASSERT_EQ(0, unlink(file1_s1d2));
+	ASSERT_EQ(0, unlink(file2_s1d3));
+
+	bind_s1d3_fd = open(bind_dir_s1d3, O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, bind_s1d3_fd);
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
+
+	/* Disconnects bind_s1d3_fd. */
+	ASSERT_EQ(0, rename(dir_s1d3, dir_s4d1))
+	{
+		TH_LOG("Failed to rename %s to %s: %s", dir_s1d3, dir_s4d1,
+		       strerror(errno));
+	}
+
+	/* Need this later to test different parent link. */
+	ASSERT_EQ(0, mkdir(dir_s4d2, 0755))
+	{
+		TH_LOG("Failed to create %s: %s", dir_s4d2, strerror(errno));
+	}
+
+	ruleset_fd = create_ruleset(_metadata, ACCESS_ALL, layer1);
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* From disconnected to connected. */
+	ASSERT_EQ(0, linkat(bind_s1d3_fd, file1_name, AT_FDCWD, file1_s2d2, 0))
+	{
+		TH_LOG("Failed to link %s to %s via disconnected %s: %s",
+		       file1_name, file1_s2d2, bind_dir_s1d3, strerror(errno));
+	}
+
+	/* Tests that we can access via the new link... */
+	EXPECT_EQ(0, test_open(file1_s2d2, O_RDONLY))
+	{
+		TH_LOG("Failed to open newly linked %s: %s", file1_s2d2,
+		       strerror(errno));
+	}
+
+	/* ...as well as the old one. */
+	EXPECT_EQ(0, test_open(file1_s4d1, O_RDONLY))
+	{
+		TH_LOG("Failed to open original %s: %s", file1_s4d1,
+		       strerror(errno));
+	}
+
+	/* From connected to disconnected. */
+	ASSERT_EQ(0, unlink(file1_s4d1));
+	ASSERT_EQ(0, linkat(AT_FDCWD, file1_s2d2, bind_s1d3_fd, file2_name, 0))
+	{
+		TH_LOG("Failed to link %s to %s via disconnected %s: %s",
+		       file1_s2d2, file2_name, bind_dir_s1d3, strerror(errno));
+	}
+	EXPECT_EQ(0, test_open(file2_s4d1, O_RDONLY));
+	ASSERT_EQ(0, unlink(file1_s2d2));
+
+	/* From disconnected to disconnected (same parent). */
+	ASSERT_EQ(0,
+		  linkat(bind_s1d3_fd, file2_name, bind_s1d3_fd, file1_name, 0))
+	{
+		TH_LOG("Failed to link %s to %s within disconnected %s: %s",
+		       file2_name, file1_name, bind_dir_s1d3, strerror(errno));
+	}
+	EXPECT_EQ(0, test_open(file1_s4d1, O_RDONLY))
+	{
+		TH_LOG("Failed to open newly linked %s: %s", file1_s4d1,
+		       strerror(errno));
+	}
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY))
+	{
+		TH_LOG("Failed to open %s through newly created link under disconnected path: %s",
+		       file1_name, strerror(errno));
+	}
+	ASSERT_EQ(0, unlink(file2_s4d1));
+
+	/* From disconnected to disconnected (different parent). */
+	ASSERT_EQ(0,
+		  linkat(bind_s1d3_fd, file1_name, bind_s1d3_fd, "s4d2/f1", 0))
+	{
+		TH_LOG("Failed to link %s to %s within disconnected %s: %s",
+		       file1_name, "s4d2/f1", bind_dir_s1d3, strerror(errno));
+	}
+	EXPECT_EQ(0, test_open(file1_s4d2, O_RDONLY))
+	{
+		TH_LOG("Failed to open %s after link: %s", file1_s4d2,
+		       strerror(errno));
+	}
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, "s4d2/f1", O_RDONLY))
+	{
+		TH_LOG("Failed to open %s through disconnected path after link: %s",
+		       "s4d2/f1", strerror(errno));
+	}
+}
+
 #define LOWER_BASE TMP_DIR "/lower"
 #define LOWER_DATA LOWER_BASE "/data"
 static const char lower_fl1[] = LOWER_DATA "/fl1";
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 4/4] selftests/landlock: Add disconnected leafs and branch test suites
  2025-07-19 10:41 [PATCH v3 0/4] Landlock: Disconnected directory handling Mickaël Salaün
                   ` (2 preceding siblings ...)
  2025-07-19 10:42 ` [PATCH v3 3/4] selftests/landlock: Add tests for access through disconnected paths Mickaël Salaün
@ 2025-07-19 10:42 ` Mickaël Salaün
  3 siblings, 0 replies; 12+ messages in thread
From: Mickaël Salaün @ 2025-07-19 10:42 UTC (permalink / raw)
  To: Günther Noack, Tingmao Wang
  Cc: Mickaël Salaün, Al Viro, Ben Scarlato,
	Christian Brauner, Daniel Burgener, Jann Horn, Jeff Xu, NeilBrown,
	Paul Moore, Ryan Sullivan, Song Liu, linux-fsdevel,
	linux-security-module

Test disconnected directories with two test suites and 31 variants to
cover the main corner cases.

These tests are complementary to the previous commit.

Add test_renameat() and test_exchangeat() helpers.

Test coverage for security/landlock is 92.0% of 1962 lines according to
LLVM v20.

Cc: Günther Noack <gnoack@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v2:
- Update test coverage.

Changes since v1:
- Rename layout4_disconnected to layout4_disconnected_leafs.
- Fix variable names.
- Add layout5_disconnected_branch test suite with 19 variants to cover
  potential implementation issues.
---
 tools/testing/selftests/landlock/fs_test.c | 912 +++++++++++++++++++++
 1 file changed, 912 insertions(+)

diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index 5312698927ea..21dd95aaf5e4 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -2267,6 +2267,22 @@ static int test_exchange(const char *const oldpath, const char *const newpath)
 	return 0;
 }
 
+static int test_renameat(int olddirfd, const char *oldpath, int newdirfd,
+			 const char *newpath)
+{
+	if (renameat2(olddirfd, oldpath, newdirfd, newpath, 0))
+		return errno;
+	return 0;
+}
+
+static int test_exchangeat(int olddirfd, const char *oldpath, int newdirfd,
+			   const char *newpath)
+{
+	if (renameat2(olddirfd, oldpath, newdirfd, newpath, RENAME_EXCHANGE))
+		return errno;
+	return 0;
+}
+
 TEST_F_FORK(layout1, rename_file)
 {
 	const struct rule rules[] = {
@@ -5205,6 +5221,902 @@ TEST_F_FORK(layout1_bind, path_disconnected_link)
 	}
 }
 
+/*
+ * layout4_disconnected_leafs with bind mount and renames:
+ *
+ * tmp
+ * ├── s1d1
+ * │   └── s1d2 [source of the bind mount]
+ * │       ├── s1d31
+ * │       │   └── s1d41 [now renamed beneath s3d1]
+ * │       │       ├── f1
+ * │       │       └── f2
+ * │       └── s1d32
+ * │           └── s1d42 [now renamed beneath s4d1]
+ * │               ├── f3
+ * │               └── f4
+ * ├── s2d1
+ * │   └── s2d2 [bind mount of s1d2]
+ * │       ├── s1d31
+ * │       │   └── s1d41 [opened FD, now renamed beneath s3d1]
+ * │       │       ├── f1
+ * │       │       └── f2
+ * │       └── s1d32
+ * │           └── s1d42 [opened FD, now renamed beneath s4d1]
+ * │               ├── f3
+ * │               └── f4
+ * ├── s3d1
+ * │   └── s1d41 [renamed here]
+ * │       ├── f1
+ * │       └── f2
+ * └── s4d1
+ *     └── s1d42 [renamed here]
+ *         ├── f3
+ *         └── f4
+ */
+/* clang-format off */
+FIXTURE(layout4_disconnected_leafs) {
+	int s2d2_fd;
+};
+/* clang-format on */
+
+FIXTURE_SETUP(layout4_disconnected_leafs)
+{
+	prepare_layout(_metadata);
+
+	create_file(_metadata, TMP_DIR "/s1d1/s1d2/s1d31/s1d41/f1");
+	create_file(_metadata, TMP_DIR "/s1d1/s1d2/s1d31/s1d41/f2");
+	create_file(_metadata, TMP_DIR "/s1d1/s1d2/s1d32/s1d42/f3");
+	create_file(_metadata, TMP_DIR "/s1d1/s1d2/s1d32/s1d42/f4");
+	create_directory(_metadata, TMP_DIR "/s2d1/s2d2");
+	create_directory(_metadata, TMP_DIR "/s3d1");
+	create_directory(_metadata, TMP_DIR "/s4d1");
+
+	self->s2d2_fd =
+		open(TMP_DIR "/s2d1/s2d2", O_DIRECTORY | O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, self->s2d2_fd);
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, mount(TMP_DIR "/s1d1/s1d2", TMP_DIR "/s2d1/s2d2", NULL,
+			   MS_BIND, NULL));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+FIXTURE_TEARDOWN_PARENT(layout4_disconnected_leafs)
+{
+	/* umount(TMP_DIR "/s2d1") is handled by namespace lifetime. */
+
+	/* Removes files after renames. */
+	remove_path(TMP_DIR "/s3d1/s1d41/f1");
+	remove_path(TMP_DIR "/s3d1/s1d41/f2");
+	remove_path(TMP_DIR "/s4d1/s1d42/f1");
+	remove_path(TMP_DIR "/s4d1/s1d42/f3");
+	remove_path(TMP_DIR "/s4d1/s1d42/f4");
+	remove_path(TMP_DIR "/s4d1/s1d42/f5");
+
+	cleanup_layout(_metadata);
+}
+
+FIXTURE_VARIANT(layout4_disconnected_leafs)
+{
+	/*
+	 * Parent of the bind mount source.  It should always be ignored when
+	 * testing against files under the s1d41 or s1d42 disconnected directories.
+	 */
+	const __u64 allowed_s1d1;
+	/*
+	 * Source of bind mount (to s2d2).  It should always be enforced when
+	 * testing against files under the s1d41 or s1d42 disconnected directories.
+	 */
+	const __u64 allowed_s1d2;
+	/*
+	 * Original parent of s1d41.  It should always be ignored when testing
+	 * against files under the s1d41 disconnected directory.
+	 */
+	const __u64 allowed_s1d31;
+	/*
+	 * Original parent of s1d42.  It should always be ignored when testing
+	 * against files under the s1d42 disconnected directory.
+	 */
+	const __u64 allowed_s1d32;
+	/*
+	 * Opened and disconnected source directory.  It should always be enforced
+	 * when testing against files under the s1d41 disconnected directory.
+	 */
+	const __u64 allowed_s1d41;
+	/*
+	 * Opened and disconnected source directory.  It should always be enforced
+	 * when testing against files under the s1d42 disconnected directory.
+	 */
+	const __u64 allowed_s1d42;
+	/*
+	 * File in the s1d41 disconnected directory.  It should always be enforced
+	 * when testing against itself under the s1d41 disconnected directory.
+	 */
+	const __u64 allowed_f1;
+	/*
+	 * File in the s1d41 disconnected directory.  It should always be enforced
+	 * when testing against itself under the s1d41 disconnected directory.
+	 */
+	const __u64 allowed_f2;
+	/*
+	 * File in the s1d42 disconnected directory.  It should always be enforced
+	 * when testing against itself under the s1d42 disconnected directory.
+	 */
+	const __u64 allowed_f3;
+	/*
+	 * Parent of the bind mount destination.  It should always be enforced when
+	 * testing against files under the s1d41 or s1d42 disconnected directories.
+	 */
+	const __u64 allowed_s2d1;
+	/*
+	 * Directory covered by the bind mount.  It should always be ignored when
+	 * testing against files under the s1d41 or s1d42 disconnected directories.
+	 */
+	const __u64 allowed_s2d2;
+	/*
+	 * New parent of the renamed s1d41.  It should always be ignored when
+	 * testing against files under the s1d41 disconnected directory.
+	 */
+	const __u64 allowed_s3d1;
+	/*
+	 * New parent of the renamed s1d42.  It should always be ignored when
+	 * testing against files under the s1d42 disconnected directory.
+	 */
+	const __u64 allowed_s4d1;
+
+	/* Expected result of the call to open([fd:s1d41]/f1, O_RDONLY). */
+	const int expected_read_result;
+	/* Expected result of the call to renameat([fd:s1d41]/f1, [fd:s1d42]/f1). */
+	const int expected_rename_result;
+	/*
+	 * Expected result of the call to renameat([fd:s1d41]/f2, [fd:s1d42]/f3,
+	 * RENAME_EXCHANGE).
+	 */
+	const int expected_exchange_result;
+	/* Expected result of the call to renameat([fd:s1d42]/f4, [fd:s1d42]/f5). */
+	const int expected_same_dir_rename_result;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, s1d1_mount_src_parent) {
+	/* clang-format on */
+	.allowed_s1d1 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, s1d2_mount_src_refer) {
+	/* clang-format on */
+	.allowed_s1d2 = LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_READ_FILE,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, s1d2_mount_src_create) {
+	/* clang-format on */
+	.allowed_s1d2 = LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = EXDEV,
+	.expected_exchange_result = EXDEV,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, s1d2_mount_src_rename) {
+	/* clang-format on */
+	.allowed_s1d2 = LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = 0,
+	.expected_exchange_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, s1d31_s1d32_old_parent) {
+	/* clang-format on */
+	.allowed_s1d31 = LANDLOCK_ACCESS_FS_REFER |
+			 LANDLOCK_ACCESS_FS_READ_FILE |
+			 LANDLOCK_ACCESS_FS_EXECUTE |
+			 LANDLOCK_ACCESS_FS_MAKE_REG,
+	.allowed_s1d32 = LANDLOCK_ACCESS_FS_REFER |
+			 LANDLOCK_ACCESS_FS_READ_FILE |
+			 LANDLOCK_ACCESS_FS_EXECUTE |
+			 LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, s1d41_s1d42_disconnected) {
+	/* clang-format on */
+	.allowed_s1d41 = LANDLOCK_ACCESS_FS_REFER |
+			 LANDLOCK_ACCESS_FS_READ_FILE |
+			 LANDLOCK_ACCESS_FS_EXECUTE |
+			 LANDLOCK_ACCESS_FS_MAKE_REG,
+	.allowed_s1d42 = LANDLOCK_ACCESS_FS_REFER |
+			 LANDLOCK_ACCESS_FS_READ_FILE |
+			 LANDLOCK_ACCESS_FS_EXECUTE |
+			 LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, s2d1_mount_dst_parent_create) {
+	/* clang-format on */
+	.allowed_s2d1 = LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = EXDEV,
+	.expected_exchange_result = EXDEV,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, s2d1_mount_dst_parent_refer) {
+	/* clang-format on */
+	.allowed_s2d1 = LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_READ_FILE,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, s2d1_mount_dst_parent_mini) {
+	/* clang-format on */
+	.allowed_s2d1 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = 0,
+	.expected_exchange_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, s2d2_covered_by_mount) {
+	/* clang-format on */
+	.allowed_s2d2 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* Tests collect_domain_accesses(). */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, s3d1_s4d1_new_parent) {
+	/* clang-format on */
+	.allowed_s3d1 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_MAKE_REG |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.allowed_s4d1 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_MAKE_REG |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout4_disconnected_leafs, f1_f2_f3) {
+	/* clang-format on */
+	.allowed_f1 = LANDLOCK_ACCESS_FS_READ_FILE,
+	.allowed_f2 = LANDLOCK_ACCESS_FS_READ_FILE,
+	.allowed_f3 = LANDLOCK_ACCESS_FS_READ_FILE,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+TEST_F_FORK(layout4_disconnected_leafs, read_rename_exchange)
+{
+	const __u64 handled_access =
+		LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_READ_FILE |
+		LANDLOCK_ACCESS_FS_EXECUTE | LANDLOCK_ACCESS_FS_MAKE_REG;
+	const struct rule rules[] = {
+		{
+			.path = TMP_DIR "/s1d1",
+			.access = variant->allowed_s1d1,
+		},
+		{
+			.path = TMP_DIR "/s1d1/s1d2",
+			.access = variant->allowed_s1d2,
+		},
+		{
+			.path = TMP_DIR "/s1d1/s1d2/s1d31",
+			.access = variant->allowed_s1d31,
+		},
+		{
+			.path = TMP_DIR "/s1d1/s1d2/s1d32",
+			.access = variant->allowed_s1d32,
+		},
+		{
+			.path = TMP_DIR "/s1d1/s1d2/s1d31/s1d41",
+			.access = variant->allowed_s1d41,
+		},
+		{
+			.path = TMP_DIR "/s1d1/s1d2/s1d32/s1d42",
+			.access = variant->allowed_s1d42,
+		},
+		{
+			.path = TMP_DIR "/s1d1/s1d2/s1d31/s1d41/f1",
+			.access = variant->allowed_f1,
+		},
+		{
+			.path = TMP_DIR "/s1d1/s1d2/s1d31/s1d41/f2",
+			.access = variant->allowed_f2,
+		},
+		{
+			.path = TMP_DIR "/s1d1/s1d2/s1d32/s1d42/f3",
+			.access = variant->allowed_f3,
+		},
+		{
+			.path = TMP_DIR "/s2d1",
+			.access = variant->allowed_s2d1,
+		},
+		/* s2d2_fd */
+		{
+			.path = TMP_DIR "/s3d1",
+			.access = variant->allowed_s3d1,
+		},
+		{
+			.path = TMP_DIR "/s4d1",
+			.access = variant->allowed_s4d1,
+		},
+		{},
+	};
+	int ruleset_fd, s1d41_bind_fd, s1d42_bind_fd;
+
+	ruleset_fd = create_ruleset(_metadata, handled_access, rules);
+	ASSERT_LE(0, ruleset_fd);
+
+	/* Adds rule for the covered directory. */
+	if (variant->allowed_s2d2) {
+		ASSERT_EQ(0, landlock_add_rule(
+				     ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				     &(struct landlock_path_beneath_attr){
+					     .parent_fd = self->s2d2_fd,
+					     .allowed_access =
+						     variant->allowed_s2d2,
+				     },
+				     0));
+	}
+	EXPECT_EQ(0, close(self->s2d2_fd));
+
+	s1d41_bind_fd = open(TMP_DIR "/s2d1/s2d2/s1d31/s1d41",
+			     O_DIRECTORY | O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, s1d41_bind_fd);
+	s1d42_bind_fd = open(TMP_DIR "/s2d1/s2d2/s1d32/s1d42",
+			     O_DIRECTORY | O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, s1d42_bind_fd);
+
+	/* Disconnects and checks source and destination directories. */
+	EXPECT_EQ(0, test_open_rel(s1d41_bind_fd, "..", O_DIRECTORY));
+	EXPECT_EQ(0, test_open_rel(s1d42_bind_fd, "..", O_DIRECTORY));
+	/* Renames to make it accessible through s3d1/s1d41 */
+	ASSERT_EQ(0, test_renameat(AT_FDCWD, TMP_DIR "/s1d1/s1d2/s1d31/s1d41",
+				   AT_FDCWD, TMP_DIR "/s3d1/s1d41"));
+	/* Renames to make it accessible through s4d1/s1d42 */
+	ASSERT_EQ(0, test_renameat(AT_FDCWD, TMP_DIR "/s1d1/s1d2/s1d32/s1d42",
+				   AT_FDCWD, TMP_DIR "/s4d1/s1d42"));
+	EXPECT_EQ(ENOENT, test_open_rel(s1d41_bind_fd, "..", O_DIRECTORY));
+	EXPECT_EQ(ENOENT, test_open_rel(s1d42_bind_fd, "..", O_DIRECTORY));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	EXPECT_EQ(variant->expected_read_result,
+		  test_open_rel(s1d41_bind_fd, "f1", O_RDONLY));
+
+	EXPECT_EQ(variant->expected_rename_result,
+		  test_renameat(s1d41_bind_fd, "f1", s1d42_bind_fd, "f1"));
+	EXPECT_EQ(variant->expected_exchange_result,
+		  test_exchangeat(s1d41_bind_fd, "f2", s1d42_bind_fd, "f3"));
+
+	EXPECT_EQ(variant->expected_same_dir_rename_result,
+		  test_renameat(s1d42_bind_fd, "f4", s1d42_bind_fd, "f5"));
+}
+
+/*
+ * layout5_disconnected_branch before rename:
+ *
+ * tmp
+ * ├── s1d1
+ * │   └── s1d2 [source of the first bind mount]
+ * │       └── s1d3
+ * │           ├── s1d41
+ * │           │   ├── f1
+ * │           │   └── f2
+ * │           └── s1d42
+ * │               ├── f3
+ * │               └── f4
+ * ├── s2d1
+ * │   └── s2d2 [source of the second bind mount]
+ * │       └── s2d3
+ * │           └── s2d4 [first s1d2 bind mount]
+ * │               └── s1d3
+ * │                   ├── s1d41
+ * │                   │   ├── f1
+ * │                   │   └── f2
+ * │                   └── s1d42
+ * │                       ├── f3
+ * │                       └── f4
+ * ├── s3d1
+ * │   └── s3d2 [second s2d2 bind mount]
+ * │       └── s2d3
+ * │           └── s2d4 [first s1d2 bind mount]
+ * │               └── s1d3
+ * │                   ├── s1d41
+ * │                   │   ├── f1
+ * │                   │   └── f2
+ * │                   └── s1d42
+ * │                       ├── f3
+ * │                       └── f4
+ * └── s4d1
+ *
+ * After rename:
+ *
+ * tmp
+ * ├── s1d1
+ * │   └── s1d2 [source of the first bind mount]
+ * │       └── s1d3
+ * │           ├── s1d41
+ * │           │   ├── f1
+ * │           │   └── f2
+ * │           └── s1d42
+ * │               ├── f3
+ * │               └── f4
+ * ├── s2d1
+ * │   └── s2d2 [source of the second bind mount]
+ * ├── s3d1
+ * │   └── s3d2 [second s2d2 bind mount]
+ * └── s4d1
+ *     └── s2d3 [renamed here]
+ *         └── s2d4 [first s1d2 bind mount]
+ *             └── s1d3
+ *                 ├── s1d41
+ *                 │   ├── f1
+ *                 │   └── f2
+ *                 └── s1d42
+ *                     ├── f3
+ *                     └── f4
+ *
+ * Decision path: s1d3 -> s1d2 -> s2d2 -> s3d1 -> tmp
+ * s2d3 is ignored, as well as the directories under the mount points.
+ */
+
+/* clang-format off */
+FIXTURE(layout5_disconnected_branch) {
+	int s2d4_fd, s3d2_fd;
+};
+/* clang-format on */
+
+FIXTURE_SETUP(layout5_disconnected_branch)
+{
+	prepare_layout(_metadata);
+
+	create_file(_metadata, TMP_DIR "/s1d1/s1d2/s1d3/s1d41/f1");
+	create_file(_metadata, TMP_DIR "/s1d1/s1d2/s1d3/s1d41/f2");
+	create_file(_metadata, TMP_DIR "/s1d1/s1d2/s1d3/s1d42/f3");
+	create_file(_metadata, TMP_DIR "/s1d1/s1d2/s1d3/s1d42/f4");
+	create_directory(_metadata, TMP_DIR "/s2d1/s2d2/s2d3/s2d4");
+	create_directory(_metadata, TMP_DIR "/s3d1/s3d2");
+	create_directory(_metadata, TMP_DIR "/s4d1");
+
+	self->s2d4_fd = open(TMP_DIR "/s2d1/s2d2/s2d3/s2d4",
+			     O_DIRECTORY | O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, self->s2d4_fd);
+
+	self->s3d2_fd =
+		open(TMP_DIR "/s3d1/s3d2", O_DIRECTORY | O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, self->s3d2_fd);
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, mount(TMP_DIR "/s1d1/s1d2", TMP_DIR "/s2d1/s2d2/s2d3/s2d4",
+			   NULL, MS_BIND, NULL));
+	ASSERT_EQ(0, mount(TMP_DIR "/s2d1/s2d2", TMP_DIR "/s3d1/s3d2", NULL,
+			   MS_BIND | MS_REC, NULL));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+FIXTURE_TEARDOWN_PARENT(layout5_disconnected_branch)
+{
+	/* Bind mounts are handled by namespace lifetime. */
+
+	/* Removes files after renames. */
+	remove_path(TMP_DIR "/s1d1/s1d2/s1d3/s1d41/f1");
+	remove_path(TMP_DIR "/s1d1/s1d2/s1d3/s1d41/f2");
+	remove_path(TMP_DIR "/s1d1/s1d2/s1d3/s1d42/f1");
+	remove_path(TMP_DIR "/s1d1/s1d2/s1d3/s1d42/f3");
+	remove_path(TMP_DIR "/s1d1/s1d2/s1d3/s1d42/f4");
+	remove_path(TMP_DIR "/s1d1/s1d2/s1d3/s1d42/f5");
+
+	cleanup_layout(_metadata);
+}
+
+FIXTURE_VARIANT(layout5_disconnected_branch)
+{
+	/*
+	 * Parent of all files.  It should always be enforced when testing against
+	 * files under the s1d41 or s1d42 disconnected directories.
+	 */
+	const __u64 allowed_base;
+	/*
+	 * Parent of the first bind mount source.  It should always be ignored when
+	 * testing against files under the s1d41 or s1d42 disconnected directories.
+	 */
+	const __u64 allowed_s1d1;
+	const __u64 allowed_s1d2;
+	const __u64 allowed_s1d3;
+	const __u64 allowed_s2d1;
+	const __u64 allowed_s2d2;
+	const __u64 allowed_s2d3;
+	const __u64 allowed_s2d4;
+	const __u64 allowed_s3d1;
+	const __u64 allowed_s3d2;
+	const __u64 allowed_s4d1;
+
+	/* Expected result of the call to open([fd:s1d3]/s1d41/f1, O_RDONLY). */
+	const int expected_read_result;
+	/*
+	 * Expected result of the call to renameat([fd:s1d3]/s1d41/f1,
+	 * [fd:s1d3]/s1d42/f1).
+	 */
+	const int expected_rename_result;
+	/*
+	 * Expected result of the call to renameat([fd:s1d3]/s1d41/f2,
+	 * [fd:s1d3]/s1d42/f3,  RENAME_EXCHANGE).
+	 */
+	const int expected_exchange_result;
+	/*
+	 * Expected result of the call to renameat([fd:s1d3]/s1d42/f4,
+	 * [fd:s1d3]/s1d42/f5).
+	 */
+	const int expected_same_dir_rename_result;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s1d1_mount1_src_parent) {
+	/* clang-format on */
+	.allowed_s1d1 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s1d2_mount1_src_refer) {
+	/* clang-format on */
+	.allowed_s1d2 = LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_READ_FILE,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s1d2_mount1_src_create) {
+	/* clang-format on */
+	.allowed_s1d2 = LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = EXDEV,
+	.expected_exchange_result = EXDEV,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s1d2_mount1_src_rename) {
+	/* clang-format on */
+	.allowed_s1d2 = LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = 0,
+	.expected_exchange_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s1d3_fd_refer) {
+	/* clang-format on */
+	.allowed_s1d3 = LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_READ_FILE,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s1d3_fd_create) {
+	/* clang-format on */
+	.allowed_s1d3 = LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = EXDEV,
+	.expected_exchange_result = EXDEV,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s1d3_fd_rename) {
+	/* clang-format on */
+	.allowed_s1d3 = LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = 0,
+	.expected_exchange_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s1d3_fd_full) {
+	/* clang-format on */
+	.allowed_s1d3 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = 0,
+	.expected_exchange_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s2d1_mount2_src_parent) {
+	/* clang-format on */
+	.allowed_s2d1 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s2d2_mount2_src_refer) {
+	/* clang-format on */
+	.allowed_s2d2 = LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_READ_FILE,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s2d2_mount2_src_create) {
+	/* clang-format on */
+	.allowed_s2d2 = LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = EXDEV,
+	.expected_exchange_result = EXDEV,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s2d2_mount2_src_rename) {
+	/* clang-format on */
+	.allowed_s2d2 = LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = 0,
+	.expected_exchange_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s2d3_mount1_dst_parent) {
+	/* clang-format on */
+	.allowed_s2d3 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s2d4_mount1_dst) {
+	/* clang-format on */
+	.allowed_s2d4 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s3d1_mount2_dst_parent_refer) {
+	/* clang-format on */
+	.allowed_s3d1 = LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_READ_FILE,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s3d1_mount2_dst_parent_create) {
+	/* clang-format on */
+	.allowed_s3d1 = LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = 0,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = EXDEV,
+	.expected_exchange_result = EXDEV,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s3d1_mount2_dst_parent_rename) {
+	/* clang-format on */
+	.allowed_s3d1 = LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = 0,
+	.expected_rename_result = 0,
+	.expected_exchange_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s3d2_mount1_dst) {
+	/* clang-format on */
+	.allowed_s3d2 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(layout5_disconnected_branch, s4d1_rename_parent) {
+	/* clang-format on */
+	.allowed_s4d1 = LANDLOCK_ACCESS_FS_REFER |
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_EXECUTE |
+			LANDLOCK_ACCESS_FS_MAKE_REG,
+	.expected_read_result = EACCES,
+	.expected_same_dir_rename_result = EACCES,
+	.expected_rename_result = EACCES,
+	.expected_exchange_result = EACCES,
+};
+
+TEST_F_FORK(layout5_disconnected_branch, read_rename_exchange)
+{
+	const __u64 handled_access =
+		LANDLOCK_ACCESS_FS_REFER | LANDLOCK_ACCESS_FS_READ_FILE |
+		LANDLOCK_ACCESS_FS_EXECUTE | LANDLOCK_ACCESS_FS_MAKE_REG;
+	const struct rule rules[] = {
+		{
+			.path = TMP_DIR "/s1d1",
+			.access = variant->allowed_s1d1,
+		},
+		{
+			.path = TMP_DIR "/s1d1/s1d2",
+			.access = variant->allowed_s1d2,
+		},
+		{
+			.path = TMP_DIR "/s1d1/s1d2/s1d3",
+			.access = variant->allowed_s1d3,
+		},
+		{
+			.path = TMP_DIR "/s2d1",
+			.access = variant->allowed_s2d1,
+		},
+		{
+			.path = TMP_DIR "/s2d1/s2d2",
+			.access = variant->allowed_s2d2,
+		},
+		{
+			.path = TMP_DIR "/s2d1/s2d2/s2d3",
+			.access = variant->allowed_s2d3,
+		},
+		/* s2d4_fd */
+		{
+			.path = TMP_DIR "/s3d1",
+			.access = variant->allowed_s3d1,
+		},
+		/* s3d2_fd */
+		{
+			.path = TMP_DIR "/s4d1",
+			.access = variant->allowed_s4d1,
+		},
+		{},
+	};
+	int ruleset_fd, s1d3_bind_fd;
+
+	ruleset_fd = create_ruleset(_metadata, handled_access, rules);
+	ASSERT_LE(0, ruleset_fd);
+
+	/* Adds rules for the covered directories. */
+	if (variant->allowed_s2d4) {
+		ASSERT_EQ(0, landlock_add_rule(
+				     ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				     &(struct landlock_path_beneath_attr){
+					     .parent_fd = self->s2d4_fd,
+					     .allowed_access =
+						     variant->allowed_s2d4,
+				     },
+				     0));
+	}
+	EXPECT_EQ(0, close(self->s2d4_fd));
+
+	if (variant->allowed_s3d2) {
+		ASSERT_EQ(0, landlock_add_rule(
+				     ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+				     &(struct landlock_path_beneath_attr){
+					     .parent_fd = self->s3d2_fd,
+					     .allowed_access =
+						     variant->allowed_s3d2,
+				     },
+				     0));
+	}
+	EXPECT_EQ(0, close(self->s3d2_fd));
+
+	s1d3_bind_fd = open(TMP_DIR "/s3d1/s3d2/s2d3/s2d4/s1d3",
+			    O_DIRECTORY | O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, s1d3_bind_fd);
+
+	/* Disconnects and checks source and destination directories. */
+	EXPECT_EQ(0, test_open_rel(s1d3_bind_fd, "../../..", O_DIRECTORY));
+	/* Renames to make it accessible through s3d1/s1d41 */
+	ASSERT_EQ(0, test_renameat(AT_FDCWD, TMP_DIR "/s2d1/s2d2/s2d3",
+				   AT_FDCWD, TMP_DIR "/s4d1/s2d3"));
+	EXPECT_EQ(ENOENT, test_open_rel(s1d3_bind_fd, "../../..", O_DIRECTORY));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	EXPECT_EQ(variant->expected_read_result,
+		  test_open_rel(s1d3_bind_fd, "s1d41/f1", O_RDONLY));
+
+	EXPECT_EQ(variant->expected_rename_result,
+		  test_renameat(s1d3_bind_fd, "s1d41/f1", s1d3_bind_fd,
+				"s1d42/f1"));
+	EXPECT_EQ(variant->expected_exchange_result,
+		  test_exchangeat(s1d3_bind_fd, "s1d41/f2", s1d3_bind_fd,
+				  "s1d42/f3"));
+
+	EXPECT_EQ(variant->expected_same_dir_rename_result,
+		  test_renameat(s1d3_bind_fd, "s1d42/f4", s1d3_bind_fd,
+				"s1d42/f5"));
+}
+
 #define LOWER_BASE TMP_DIR "/lower"
 #define LOWER_DATA LOWER_BASE "/data"
 static const char lower_fl1[] = LOWER_DATA "/fl1";
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/4] landlock: Fix handling of disconnected directories
  2025-07-19 10:42 ` [PATCH v3 2/4] landlock: Fix handling of disconnected directories Mickaël Salaün
@ 2025-07-22 18:04   ` Tingmao Wang
  2025-07-23 21:01     ` Mickaël Salaün
  0 siblings, 1 reply; 12+ messages in thread
From: Tingmao Wang @ 2025-07-22 18:04 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack
  Cc: Al Viro, Ben Scarlato, Christian Brauner, Daniel Burgener,
	Jann Horn, Jeff Xu, NeilBrown, Paul Moore, Ryan Sullivan,
	Song Liu, linux-fsdevel, linux-security-module

On 7/19/25 11:42, Mickaël Salaün wrote:
> [...]
> @@ -784,12 +787,18 @@ static bool is_access_to_paths_allowed(
>  	if (WARN_ON_ONCE(!layer_masks_parent1))
>  		return false;
>  
> -	allowed_parent1 = is_layer_masks_allowed(layer_masks_parent1);
> -
>  	if (unlikely(layer_masks_parent2)) {
>  		if (WARN_ON_ONCE(!dentry_child1))
>  			return false;
>  
> +		/*
> +		 * Creates a backup of the initial layer masks to be able to restore
> +		 * them if we find out that we were walking a disconnected directory,
> +		 * which would make the collected access rights inconsistent (cf.
> +		 * reset_to_mount_root).
> +		 */

This comment is duplicate with the one below, is this intentional?

> [...]

On the other hand, I'm still a bit uncertain about the domain check
semantics.  While it would not cause a rename to be allowed if it is
otherwise not allowed by any rules on or above the mountpoint, this gets a
bit weird if we have a situation where renames are allowed on the
mountpoint or everywhere, but not read/writes, however read/writes are
allowed directly on a file, but the dir containing that file gets
disconnected so the sandboxed application can't read or write to it.
(Maybe someone would set up such a policy where renames are allowed,
expecting Landlock to always prevent renames where additional permissions
would be exposed?)

In the above situation, if the file is then moved to a connected
directory, it will become readable/writable again.

Here is an example test, using the layout1_bind fixture for flexibility
for now (and also because I needed to just go to bed yesterday lol) (but
this would probably be better written as an additional
layout5_disconnected_branch variant).

diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index 21dd95aaf5e4..2274f165d933 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -5100,6 +5100,118 @@ TEST_F_FORK(layout1_bind, path_disconnected_rename)
 	EXPECT_EQ(0, test_open(file1_s1d3, O_RDONLY));
 }
 
+static void
+path_disconnected_gain_back_rights_via_rename(struct __test_metadata *_metadata,
+					      bool has_read_rule_on_other_d)
+{
+	/*
+	 * This is a ruleset where rename/create/delete rights are allowed
+	 * anywhere under the mount, and so still applies after path gets
+	 * disconnected.  However the only read right is given to the file
+	 * directly, and therefore the file is no longer readable after the
+	 * path to it being disconnected.
+	 */
+	// clang-format off
+	struct rule layer1[] = {
+		{
+			.path = dir_s2d2,
+			.access = LANDLOCK_ACCESS_FS_REFER |
+					LANDLOCK_ACCESS_FS_MAKE_DIR |
+					LANDLOCK_ACCESS_FS_REMOVE_DIR |
+					LANDLOCK_ACCESS_FS_MAKE_REG |
+					LANDLOCK_ACCESS_FS_REMOVE_FILE
+		},
+		{
+			.path = file1_s1d3,
+			.access = LANDLOCK_ACCESS_FS_READ_FILE,
+		},
+		{
+			.path = TMP_DIR "/s1d1/s1d2/s1d3_2",
+			.access = LANDLOCK_ACCESS_FS_READ_FILE,
+		},
+		{}
+	};
+	// clang-format on
+
+	int ruleset_fd, bind_s1d3_fd, res;
+
+	if (!has_read_rule_on_other_d) {
+		layer1[2].path = NULL;
+		layer1[2].access = 0;
+	}
+
+	ASSERT_EQ(0, mkdir(dir_s4d1, 0755))
+	{
+		TH_LOG("Failed to create %s: %s", dir_s4d1, strerror(errno));
+	}
+
+	/* Directory used to move the file into, in order to try to regain read */
+	ASSERT_EQ(0, mkdir(TMP_DIR "/s1d1/s1d2/s1d3_2", 0755))
+	{
+		TH_LOG("Failed to create %s: %s", TMP_DIR "/s1d1/s1d2/s1d3_2",
+		       strerror(errno));
+	}
+
+	ruleset_fd = create_ruleset(_metadata, ACCESS_ALL, layer1);
+	ASSERT_LE(0, ruleset_fd);
+
+	bind_s1d3_fd = open(bind_dir_s1d3, O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, bind_s1d3_fd);
+	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
+
+	/* Make disconnected */
+	ASSERT_EQ(0, rename(dir_s1d3, dir_s4d2))
+	{
+		TH_LOG("Failed to rename %s to %s: %s", dir_s1d3, dir_s4d2,
+		       strerror(errno));
+	}
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* We shouldn't be able to read file1 under disconnected path now */
+	EXPECT_EQ(EACCES, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
+
+	/*
+	 * But can we circumvent it by moving file1 to a connected path when
+	 * either we're allowed to read that move destination, or if we have
+	 * allow rules on the original file, then the move target doesn't even
+	 * need read rules on itself.
+	 *
+	 * This is possible even though the domain check should semantically
+	 * ensure that any path (?) we can't read can't become readable
+	 * (through that path) again by a rename?
+	 */
+	res = renameat(bind_s1d3_fd, file1_name, AT_FDCWD,
+		       TMP_DIR "/s2d1/s2d2/s1d3_2/f1");
+	if (res == 0) {
+		TH_LOG("Renamed file1 to %s, which should not have been allowed.",
+		       TMP_DIR "/s2d1/s2d2/s1d3_2/f1");
+		/* At this point the test has failed, but let's try reading it */
+		res = test_open(TMP_DIR "/s2d1/s2d2/s1d3_2/f1", O_RDONLY);
+		if (res != 0) {
+			TH_LOG("Failed to read file1 after rename: %s",
+			       strerror(res));
+		} else {
+			TH_LOG("file1 is readable after rename!");
+			ASSERT_TRUE(false);
+		}
+		ASSERT_TRUE(false);
+	}
+	ASSERT_EQ(-1, res);
+	EXPECT_EQ(EXDEV, errno);
+}
+
+TEST_F_FORK(layout1_bind, path_disconnected_gain_back_rights_1)
+{
+	path_disconnected_gain_back_rights_via_rename(_metadata, false);
+}
+
+TEST_F_FORK(layout1_bind, path_disconnected_gain_back_rights_2)
+{
+	path_disconnected_gain_back_rights_via_rename(_metadata, true);
+}
+
 /*
  * Test that linkat(2) with disconnected paths works under Landlock. This
  * test moves s1d3 to s4d1.

The behavior is as hypothesized above:

	root@b8f2ef644787 /t/landlock# ./fs_test -t path_disconnected_gain_back_rights_1 -t path_disconnected_gain_back_rights_2
	TAP version 13
	1..2
	# Starting 2 tests from 1 test cases.
	#  RUN           layout1_bind.path_disconnected_gain_back_rights_1 ...
	# fs_test.c:5188:path_disconnected_gain_back_rights_1:Renamed file1 to tmp/s2d1/s2d2/s1d3_2/f1, which should not have been allowed.
	# fs_test.c:5196:path_disconnected_gain_back_rights_1:file1 is readable after rename!
	# fs_test.c:5197:path_disconnected_gain_back_rights_1:Expected 0 (0) != false (0)
	# path_disconnected_gain_back_rights_1: Test terminated by assertion
	#          FAIL  layout1_bind.path_disconnected_gain_back_rights_1
	not ok 1 layout1_bind.path_disconnected_gain_back_rights_1
	#  RUN           layout1_bind.path_disconnected_gain_back_rights_2 ...
	# fs_test.c:5188:path_disconnected_gain_back_rights_2:Renamed file1 to tmp/s2d1/s2d2/s1d3_2/f1, which should not have been allowed.
	# fs_test.c:5196:path_disconnected_gain_back_rights_2:file1 is readable after rename!
	# fs_test.c:5197:path_disconnected_gain_back_rights_2:Expected 0 (0) != false (0)
	# path_disconnected_gain_back_rights_2: Test terminated by assertion
	#          FAIL  layout1_bind.path_disconnected_gain_back_rights_2
	not ok 2 layout1_bind.path_disconnected_gain_back_rights_2
	# FAILED: 0 / 2 tests passed.
	# Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0

Would it be worth it to have the domain check take into account this edge
case?  (But on the other hand, one could argue that if rights are granted
directly to a file, then the policy author intended for access to be
allowed, but in which case shouldn't access, even if through disconnected
path, be allowed?)

Best,
Tingmao

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/4] landlock: Fix handling of disconnected directories
  2025-07-22 18:04   ` Tingmao Wang
@ 2025-07-23 21:01     ` Mickaël Salaün
  2025-07-24 14:49       ` Günther Noack
  2025-07-28  0:13       ` Tingmao Wang
  0 siblings, 2 replies; 12+ messages in thread
From: Mickaël Salaün @ 2025-07-23 21:01 UTC (permalink / raw)
  To: Tingmao Wang, Günther Noack, Jann Horn, John Johansen
  Cc: Al Viro, Ben Scarlato, Christian Brauner, Daniel Burgener,
	Jeff Xu, NeilBrown, Paul Moore, Ryan Sullivan, Song Liu,
	linux-fsdevel, linux-security-module

On Tue, Jul 22, 2025 at 07:04:02PM +0100, Tingmao Wang wrote:
> On 7/19/25 11:42, Mickaël Salaün wrote:
> > [...]
> > @@ -784,12 +787,18 @@ static bool is_access_to_paths_allowed(
> >  	if (WARN_ON_ONCE(!layer_masks_parent1))
> >  		return false;
> >  
> > -	allowed_parent1 = is_layer_masks_allowed(layer_masks_parent1);
> > -
> >  	if (unlikely(layer_masks_parent2)) {
> >  		if (WARN_ON_ONCE(!dentry_child1))
> >  			return false;
> >  
> > +		/*
> > +		 * Creates a backup of the initial layer masks to be able to restore
> > +		 * them if we find out that we were walking a disconnected directory,
> > +		 * which would make the collected access rights inconsistent (cf.
> > +		 * reset_to_mount_root).
> > +		 */
> 
> This comment is duplicate with the one below, is this intentional?
> 
> > [...]
> 
> On the other hand, I'm still a bit uncertain about the domain check
> semantics.  While it would not cause a rename to be allowed if it is
> otherwise not allowed by any rules on or above the mountpoint, this gets a
> bit weird if we have a situation where renames are allowed on the
> mountpoint or everywhere, but not read/writes, however read/writes are
> allowed directly on a file, but the dir containing that file gets
> disconnected so the sandboxed application can't read or write to it.
> (Maybe someone would set up such a policy where renames are allowed,
> expecting Landlock to always prevent renames where additional permissions
> would be exposed?)
> 
> In the above situation, if the file is then moved to a connected
> directory, it will become readable/writable again.

We can generalize this issue to not only the end file but any component
of the path: disconnected directories.  In fact, the main issue is the
potential inconsistency of access checks over time (e.g. between two
renames).  This could be exploited to bypass the security checks done
for FS_REFER.

I see two solutions:

1. *Always* walk down to the IS_ROOT directory, and then jump to the
   mount point.  This makes it possible to have consistent access checks
   for renames and open/use.  The first downside is that that would
   change the current behavior for bind mounts that could get more
   access rights (if the policy explicitly sets rights for the hidden
   directories).  The second downside is that we'll do more walk.

2. Return -EACCES (or -ENOENT) for actions involving disconnected
   directories, or renames of disconnected opened files.  This second
   solution is simpler and safer but completely disables the use of
   disconnected directories and the rename of disconnected files for
   sandboxed processes.

It would be much better to be able to handle opened directories as
(object) capabilities, but that is not currently possible because of the
way paths are handled by the VFS and LSM hooks.

Tingmao, Günther, Jann, what do you think?

It looks like AppArmor also denies access to disconnected path in some
cases, but it tries to reconstruct the path for known internal
filesystems, and it seems to specifically handle the case of chroot.  I
don't know when PATH_CONNECT_PATH is set though.

John, could you please clarify how disconnected directories and files
are handled by AppArmor?

> 
> Here is an example test, using the layout1_bind fixture for flexibility
> for now (and also because I needed to just go to bed yesterday lol) (but
> this would probably be better written as an additional
> layout5_disconnected_branch variant).
> 
> diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
> index 21dd95aaf5e4..2274f165d933 100644
> --- a/tools/testing/selftests/landlock/fs_test.c
> +++ b/tools/testing/selftests/landlock/fs_test.c
> @@ -5100,6 +5100,118 @@ TEST_F_FORK(layout1_bind, path_disconnected_rename)
>  	EXPECT_EQ(0, test_open(file1_s1d3, O_RDONLY));
>  }
>  
> +static void
> +path_disconnected_gain_back_rights_via_rename(struct __test_metadata *_metadata,
> +					      bool has_read_rule_on_other_d)
> +{
> +	/*
> +	 * This is a ruleset where rename/create/delete rights are allowed
> +	 * anywhere under the mount, and so still applies after path gets
> +	 * disconnected.  However the only read right is given to the file
> +	 * directly, and therefore the file is no longer readable after the
> +	 * path to it being disconnected.
> +	 */
> +	// clang-format off
> +	struct rule layer1[] = {
> +		{
> +			.path = dir_s2d2,
> +			.access = LANDLOCK_ACCESS_FS_REFER |
> +					LANDLOCK_ACCESS_FS_MAKE_DIR |
> +					LANDLOCK_ACCESS_FS_REMOVE_DIR |
> +					LANDLOCK_ACCESS_FS_MAKE_REG |
> +					LANDLOCK_ACCESS_FS_REMOVE_FILE
> +		},
> +		{
> +			.path = file1_s1d3,
> +			.access = LANDLOCK_ACCESS_FS_READ_FILE,
> +		},
> +		{
> +			.path = TMP_DIR "/s1d1/s1d2/s1d3_2",
> +			.access = LANDLOCK_ACCESS_FS_READ_FILE,
> +		},
> +		{}
> +	};
> +	// clang-format on
> +
> +	int ruleset_fd, bind_s1d3_fd, res;
> +
> +	if (!has_read_rule_on_other_d) {
> +		layer1[2].path = NULL;
> +		layer1[2].access = 0;
> +	}
> +
> +	ASSERT_EQ(0, mkdir(dir_s4d1, 0755))
> +	{
> +		TH_LOG("Failed to create %s: %s", dir_s4d1, strerror(errno));
> +	}
> +
> +	/* Directory used to move the file into, in order to try to regain read */
> +	ASSERT_EQ(0, mkdir(TMP_DIR "/s1d1/s1d2/s1d3_2", 0755))
> +	{
> +		TH_LOG("Failed to create %s: %s", TMP_DIR "/s1d1/s1d2/s1d3_2",
> +		       strerror(errno));
> +	}
> +
> +	ruleset_fd = create_ruleset(_metadata, ACCESS_ALL, layer1);
> +	ASSERT_LE(0, ruleset_fd);
> +
> +	bind_s1d3_fd = open(bind_dir_s1d3, O_PATH | O_CLOEXEC);
> +	ASSERT_LE(0, bind_s1d3_fd);
> +	EXPECT_EQ(0, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
> +
> +	/* Make disconnected */
> +	ASSERT_EQ(0, rename(dir_s1d3, dir_s4d2))
> +	{
> +		TH_LOG("Failed to rename %s to %s: %s", dir_s1d3, dir_s4d2,
> +		       strerror(errno));
> +	}
> +
> +	enforce_ruleset(_metadata, ruleset_fd);
> +	EXPECT_EQ(0, close(ruleset_fd));
> +
> +	/* We shouldn't be able to read file1 under disconnected path now */
> +	EXPECT_EQ(EACCES, test_open_rel(bind_s1d3_fd, file1_name, O_RDONLY));
> +
> +	/*
> +	 * But can we circumvent it by moving file1 to a connected path when
> +	 * either we're allowed to read that move destination, or if we have
> +	 * allow rules on the original file, then the move target doesn't even
> +	 * need read rules on itself.
> +	 *
> +	 * This is possible even though the domain check should semantically
> +	 * ensure that any path (?) we can't read can't become readable
> +	 * (through that path) again by a rename?
> +	 */
> +	res = renameat(bind_s1d3_fd, file1_name, AT_FDCWD,
> +		       TMP_DIR "/s2d1/s2d2/s1d3_2/f1");
> +	if (res == 0) {
> +		TH_LOG("Renamed file1 to %s, which should not have been allowed.",
> +		       TMP_DIR "/s2d1/s2d2/s1d3_2/f1");
> +		/* At this point the test has failed, but let's try reading it */
> +		res = test_open(TMP_DIR "/s2d1/s2d2/s1d3_2/f1", O_RDONLY);
> +		if (res != 0) {
> +			TH_LOG("Failed to read file1 after rename: %s",
> +			       strerror(res));
> +		} else {
> +			TH_LOG("file1 is readable after rename!");
> +			ASSERT_TRUE(false);
> +		}
> +		ASSERT_TRUE(false);
> +	}
> +	ASSERT_EQ(-1, res);
> +	EXPECT_EQ(EXDEV, errno);
> +}
> +
> +TEST_F_FORK(layout1_bind, path_disconnected_gain_back_rights_1)
> +{
> +	path_disconnected_gain_back_rights_via_rename(_metadata, false);
> +}
> +
> +TEST_F_FORK(layout1_bind, path_disconnected_gain_back_rights_2)
> +{
> +	path_disconnected_gain_back_rights_via_rename(_metadata, true);
> +}
> +
>  /*
>   * Test that linkat(2) with disconnected paths works under Landlock. This
>   * test moves s1d3 to s4d1.
> 
> The behavior is as hypothesized above:
> 
> 	root@b8f2ef644787 /t/landlock# ./fs_test -t path_disconnected_gain_back_rights_1 -t path_disconnected_gain_back_rights_2
> 	TAP version 13
> 	1..2
> 	# Starting 2 tests from 1 test cases.
> 	#  RUN           layout1_bind.path_disconnected_gain_back_rights_1 ...
> 	# fs_test.c:5188:path_disconnected_gain_back_rights_1:Renamed file1 to tmp/s2d1/s2d2/s1d3_2/f1, which should not have been allowed.
> 	# fs_test.c:5196:path_disconnected_gain_back_rights_1:file1 is readable after rename!
> 	# fs_test.c:5197:path_disconnected_gain_back_rights_1:Expected 0 (0) != false (0)
> 	# path_disconnected_gain_back_rights_1: Test terminated by assertion
> 	#          FAIL  layout1_bind.path_disconnected_gain_back_rights_1
> 	not ok 1 layout1_bind.path_disconnected_gain_back_rights_1
> 	#  RUN           layout1_bind.path_disconnected_gain_back_rights_2 ...
> 	# fs_test.c:5188:path_disconnected_gain_back_rights_2:Renamed file1 to tmp/s2d1/s2d2/s1d3_2/f1, which should not have been allowed.
> 	# fs_test.c:5196:path_disconnected_gain_back_rights_2:file1 is readable after rename!
> 	# fs_test.c:5197:path_disconnected_gain_back_rights_2:Expected 0 (0) != false (0)
> 	# path_disconnected_gain_back_rights_2: Test terminated by assertion
> 	#          FAIL  layout1_bind.path_disconnected_gain_back_rights_2
> 	not ok 2 layout1_bind.path_disconnected_gain_back_rights_2
> 	# FAILED: 0 / 2 tests passed.
> 	# Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
> 
> Would it be worth it to have the domain check take into account this edge
> case?  (But on the other hand, one could argue that if rights are granted
> directly to a file, then the policy author intended for access to be
> allowed, but in which case shouldn't access, even if through disconnected
> path, be allowed?)
> 
> Best,
> Tingmao
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/4] landlock: Fix handling of disconnected directories
  2025-07-23 21:01     ` Mickaël Salaün
@ 2025-07-24 14:49       ` Günther Noack
  2025-07-24 17:10         ` Mickaël Salaün
  2025-07-28  0:13       ` Tingmao Wang
  1 sibling, 1 reply; 12+ messages in thread
From: Günther Noack @ 2025-07-24 14:49 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Tingmao Wang, Jann Horn, John Johansen, Al Viro, Ben Scarlato,
	Christian Brauner, Daniel Burgener, Jeff Xu, NeilBrown,
	Paul Moore, Ryan Sullivan, Song Liu, linux-fsdevel,
	linux-security-module

On Wed, Jul 23, 2025 at 11:01:42PM +0200, Mickaël Salaün wrote:
> On Tue, Jul 22, 2025 at 07:04:02PM +0100, Tingmao Wang wrote:
> > On the other hand, I'm still a bit uncertain about the domain check
> > semantics.  While it would not cause a rename to be allowed if it is
> > otherwise not allowed by any rules on or above the mountpoint, this gets a
> > bit weird if we have a situation where renames are allowed on the
> > mountpoint or everywhere, but not read/writes, however read/writes are
> > allowed directly on a file, but the dir containing that file gets
> > disconnected so the sandboxed application can't read or write to it.
> > (Maybe someone would set up such a policy where renames are allowed,
> > expecting Landlock to always prevent renames where additional permissions
> > would be exposed?)
> > 
> > In the above situation, if the file is then moved to a connected
> > directory, it will become readable/writable again.
> 
> We can generalize this issue to not only the end file but any component
> of the path: disconnected directories.  In fact, the main issue is the
> potential inconsistency of access checks over time (e.g. between two
> renames).  This could be exploited to bypass the security checks done
> for FS_REFER.
> 
> I see two solutions:
> 
> 1. *Always* walk down to the IS_ROOT directory, and then jump to the
>    mount point.  This makes it possible to have consistent access checks
>    for renames and open/use.  The first downside is that that would
>    change the current behavior for bind mounts that could get more
>    access rights (if the policy explicitly sets rights for the hidden
>    directories).  The second downside is that we'll do more walk.
> 
> 2. Return -EACCES (or -ENOENT) for actions involving disconnected
>    directories, or renames of disconnected opened files.  This second
>    solution is simpler and safer but completely disables the use of
>    disconnected directories and the rename of disconnected files for
>    sandboxed processes.
> 
> It would be much better to be able to handle opened directories as
> (object) capabilities, but that is not currently possible because of the
> way paths are handled by the VFS and LSM hooks.
> 
> Tingmao, Günther, Jann, what do you think?

I have to admit that so far, I still failed to wrap my head around the
full patch set and its possible corner cases.  I hope I did not
misunderstand things all too badly below:

As far as I understand the proposed patch, we are "checkpointing" the
intermediate results of the path walk at every mount point boundary,
and in the case where we run into a disconnected directory in one of
the nested mount points, we restore from the intermediate result at
the previous mount point directory and skip to the next mount point.

Visually speaking, if the layout is this (where ":" denotes a
mountpoint boundary between the mountpoints MP1, MP2, MP3):

                          dirfd
                            |
          :                 V         :
	  :       ham <--- spam <--- eggs <--- x.txt
	  :    (disconn.)             :
          :                           :
  / <--- foo <--- bar <--- baz        :
          :                           :
    MP1                 MP2                  MP3

When a process holds a reference to the "spam" directory, which is now
disconnected, and invokes openat(dirfd, "eggs/x.txt", ...), then we
would:

  * traverse x.txt
  * traverse eggs (checkpointing the intermediate result) <-.
  * traverse spam                                           |
  * traverse ham                                            |
  * discover that ham is disconnected:                      |
     * restore the intermediate result from "eggs" ---------'
     * continue the walk at foo
  * end up at the root

So effectively, since the results from "spam" and "ham" are discarded,
we would traverse only the inodes in the outer and inner mountpoints
MP1 and MP3, but effectively return a result that looks like we did
not traverse MP2?

Maybe (likely) I misread the code. :) It's not clear to me what the
thinking behind this is.  Also, if there was another directory in
between "spam" and "eggs" in MP2, wouldn't we be missing the access
rights attached to this directory?


Regarding the capability approach:

I agree that a "capability" approach would be the better solution, but
it seems infeasible with the existing LSM hooks at the moment.  I
would be in favor of it though.

To spell it out a bit more explicitly what that would mean in my mind:

When a path is looked up relative to a dirfd, the path walk upwards
would terminate at the dirfd and use previously calculated access
rights stored in the associated struct file.  These access rights
would be determined at the time of opening the dirfd, similar to how we
are already storing the "truncate" access right today for regular
files.

(Remark: There might still be corner cases where we have to solve it
the hard way, if someone uses ".." together with a dirfd-relative
lookup.)

I also looked at what it would take to change the LSM hooks to pass
the directory that the lookup was done relative to, but it seems that
this would have to be passed through a bunch of VFS callbacks as well,
which seems like a larger change.  I would be curious whether that
would be deemed an acceptable change.

—Günther


P.S. Related to relative directory lookups, there is some movement in
the BSDs as well to use dirfds as capabilities, by adding a flag to
open directories that enforces O_BENEATH on subsequent opens:

 * https://undeadly.org/cgi?action=article;sid=20250529080623
 * https://reviews.freebsd.org/D50371

(both found via https://news.ycombinator.com/item?id=44575361)

If a dirfd had such a flag, that would get rid of the corner case
above.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/4] landlock: Fix handling of disconnected directories
  2025-07-24 14:49       ` Günther Noack
@ 2025-07-24 17:10         ` Mickaël Salaün
  2025-07-25  3:54           ` Abhinav Saxena
  0 siblings, 1 reply; 12+ messages in thread
From: Mickaël Salaün @ 2025-07-24 17:10 UTC (permalink / raw)
  To: Günther Noack
  Cc: Tingmao Wang, Jann Horn, John Johansen, Al Viro, Ben Scarlato,
	Christian Brauner, Daniel Burgener, Jeff Xu, NeilBrown,
	Paul Moore, Ryan Sullivan, Song Liu, linux-fsdevel,
	linux-security-module

On Thu, Jul 24, 2025 at 04:49:24PM +0200, Günther Noack wrote:
> On Wed, Jul 23, 2025 at 11:01:42PM +0200, Mickaël Salaün wrote:
> > On Tue, Jul 22, 2025 at 07:04:02PM +0100, Tingmao Wang wrote:
> > > On the other hand, I'm still a bit uncertain about the domain check
> > > semantics.  While it would not cause a rename to be allowed if it is
> > > otherwise not allowed by any rules on or above the mountpoint, this gets a
> > > bit weird if we have a situation where renames are allowed on the
> > > mountpoint or everywhere, but not read/writes, however read/writes are
> > > allowed directly on a file, but the dir containing that file gets
> > > disconnected so the sandboxed application can't read or write to it.
> > > (Maybe someone would set up such a policy where renames are allowed,
> > > expecting Landlock to always prevent renames where additional permissions
> > > would be exposed?)
> > > 
> > > In the above situation, if the file is then moved to a connected
> > > directory, it will become readable/writable again.
> > 
> > We can generalize this issue to not only the end file but any component
> > of the path: disconnected directories.  In fact, the main issue is the
> > potential inconsistency of access checks over time (e.g. between two
> > renames).  This could be exploited to bypass the security checks done
> > for FS_REFER.
> > 
> > I see two solutions:
> > 
> > 1. *Always* walk down to the IS_ROOT directory, and then jump to the
> >    mount point.  This makes it possible to have consistent access checks
> >    for renames and open/use.  The first downside is that that would
> >    change the current behavior for bind mounts that could get more
> >    access rights (if the policy explicitly sets rights for the hidden
> >    directories).  The second downside is that we'll do more walk.
> > 
> > 2. Return -EACCES (or -ENOENT) for actions involving disconnected
> >    directories, or renames of disconnected opened files.  This second
> >    solution is simpler and safer but completely disables the use of
> >    disconnected directories and the rename of disconnected files for
> >    sandboxed processes.
> > 
> > It would be much better to be able to handle opened directories as
> > (object) capabilities, but that is not currently possible because of the
> > way paths are handled by the VFS and LSM hooks.
> > 
> > Tingmao, Günther, Jann, what do you think?
> 
> I have to admit that so far, I still failed to wrap my head around the
> full patch set and its possible corner cases.  I hope I did not
> misunderstand things all too badly below:
> 
> As far as I understand the proposed patch, we are "checkpointing" the
> intermediate results of the path walk at every mount point boundary,
> and in the case where we run into a disconnected directory in one of
> the nested mount points, we restore from the intermediate result at
> the previous mount point directory and skip to the next mount point.

Correct

> 
> Visually speaking, if the layout is this (where ":" denotes a
> mountpoint boundary between the mountpoints MP1, MP2, MP3):
> 
>                           dirfd
>                             |
>           :                 V         :
> 	  :       ham <--- spam <--- eggs <--- x.txt
> 	  :    (disconn.)             :
>           :                           :
>   / <--- foo <--- bar <--- baz        :
>           :                           :
>     MP1                 MP2                  MP3
> 
> When a process holds a reference to the "spam" directory, which is now
> disconnected, and invokes openat(dirfd, "eggs/x.txt", ...), then we
> would:
> 
>   * traverse x.txt
>   * traverse eggs (checkpointing the intermediate result) <-.
>   * traverse spam                                           |
>   * traverse ham                                            |
>   * discover that ham is disconnected:                      |
>      * restore the intermediate result from "eggs" ---------'
>      * continue the walk at foo
>   * end up at the root
> 
> So effectively, since the results from "spam" and "ham" are discarded,
> we would traverse only the inodes in the outer and inner mountpoints
> MP1 and MP3, but effectively return a result that looks like we did
> not traverse MP2?

We'd still check MP2's inode, but otherwise yes.

> 
> Maybe (likely) I misread the code. :) It's not clear to me what the
> thinking behind this is.  Also, if there was another directory in
> between "spam" and "eggs" in MP2, wouldn't we be missing the access
> rights attached to this directory?

Yes, we would ignore this access right because we don't know that the
path was resolved from spam.

> 
> 
> Regarding the capability approach:
> 
> I agree that a "capability" approach would be the better solution, but
> it seems infeasible with the existing LSM hooks at the moment.  I
> would be in favor of it though.

Yes, it would be a new feature with potential important changes.

In the meantime, we still need a fix for disconnected directories, and
this fix needs to be backported.  That's why the capability approach is
not part of the two solutions. ;)

> 
> To spell it out a bit more explicitly what that would mean in my mind:
> 
> When a path is looked up relative to a dirfd, the path walk upwards
> would terminate at the dirfd and use previously calculated access
> rights stored in the associated struct file.  These access rights
> would be determined at the time of opening the dirfd, similar to how we
> are already storing the "truncate" access right today for regular
> files.
> 
> (Remark: There might still be corner cases where we have to solve it
> the hard way, if someone uses ".." together with a dirfd-relative
> lookup.)

Yep, real capabilities don't have ".." in their design.  On Linux (and
Landlock), we need to properly handle "..", which is challenging.

> 
> I also looked at what it would take to change the LSM hooks to pass
> the directory that the lookup was done relative to, but it seems that
> this would have to be passed through a bunch of VFS callbacks as well,
> which seems like a larger change.  I would be curious whether that
> would be deemed an acceptable change.
> 
> —Günther
> 
> 
> P.S. Related to relative directory lookups, there is some movement in
> the BSDs as well to use dirfds as capabilities, by adding a flag to
> open directories that enforces O_BENEATH on subsequent opens:
> 
>  * https://undeadly.org/cgi?action=article;sid=20250529080623
>  * https://reviews.freebsd.org/D50371
> 
> (both found via https://news.ycombinator.com/item?id=44575361)
> 
> If a dirfd had such a flag, that would get rid of the corner case
> above.

This would be nice but it would not solve the current issue because we
cannot force all processes to use this flag (which breaks some use
cases).

FYI, Capsicum is a more complete implementation:
https://man.freebsd.org/cgi/man.cgi?query=capsicum&sektion=4
See the vfs.lookup_cap_dotdot sysctl too.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/4] landlock: Fix handling of disconnected directories
  2025-07-24 17:10         ` Mickaël Salaün
@ 2025-07-25  3:54           ` Abhinav Saxena
  2025-07-25  9:05             ` Mickaël Salaün
  0 siblings, 1 reply; 12+ messages in thread
From: Abhinav Saxena @ 2025-07-25  3:54 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Tingmao Wang, Jann Horn, John Johansen,
	Al Viro, Ben Scarlato, Christian Brauner, Daniel Burgener,
	Jeff Xu, NeilBrown, Paul Moore, Ryan Sullivan, Song Liu,
	linux-fsdevel, linux-security-module

[-- Attachment #1: Type: text/plain, Size: 13396 bytes --]

Hi!

Mickaël Salaün <mic@digikod.net> writes:
> On Thu, Jul 24, 2025 at 04:49:24PM +0200, Günther Noack wrote:
>> On Wed, Jul 23, 2025 at 11:01:42PM +0200, Mickaël Salaün wrote:
>> > On Tue, Jul 22, 2025 at 07:04:02PM +0100, Tingmao Wang wrote:
>> > > On the other hand, I’m still a bit uncertain about the domain check
>> > > semantics.  While it would not cause a rename to be allowed if it is
>> > > otherwise not allowed by any rules on or above the mountpoint, this gets a
>> > > bit weird if we have a situation where renames are allowed on the
>> > > mountpoint or everywhere, but not read/writes, however read/writes are
>> > > allowed directly on a file, but the dir containing that file gets
>> > > disconnected so the sandboxed application can’t read or write to it.
>> > > (Maybe someone would set up such a policy where renames are allowed,
>> > > expecting Landlock to always prevent renames where additional permissions
>> > > would be exposed?)
>> > >
>> > > In the above situation, if the file is then moved to a connected
>> > > directory, it will become readable/writable again.
>> >
>> > We can generalize this issue to not only the end file but any component
>> > of the path: disconnected directories.  In fact, the main issue is the
>> > potential inconsistency of access checks over time (e.g. between two
>> > renames).  This could be exploited to bypass the security checks done
>> > for FS_REFER.
>> >
>> > I see two solutions:
>> >
>> > 1. *Always* walk down to the IS_ROOT directory, and then jump to the
>> >    mount point.  This makes it possible to have consistent access checks
>> >    for renames and open/use.  The first downside is that that would
>> >    change the current behavior for bind mounts that could get more
>> >    access rights (if the policy explicitly sets rights for the hidden
>> >    directories).  The second downside is that we’ll do more walk.
>> >
>> > 2. Return -EACCES (or -ENOENT) for actions involving disconnected
>> >    directories, or renames of disconnected opened files.  This second
>> >    solution is simpler and safer but completely disables the use of
>> >    disconnected directories and the rename of disconnected files for
>> >    sandboxed processes.
>> >
>> > It would be much better to be able to handle opened directories as
>> > (object) capabilities, but that is not currently possible because of the
>> > way paths are handled by the VFS and LSM hooks.
>> >
>> > Tingmao, Günther, Jann, what do you think?
>>
>> I have to admit that so far, I still failed to wrap my head around the
>> full patch set and its possible corner cases.  I hope I did not
>> misunderstand things all too badly below:
>>
>> As far as I understand the proposed patch, we are “checkpointing” the
>> intermediate results of the path walk at every mount point boundary,
>> and in the case where we run into a disconnected directory in one of
>> the nested mount points, we restore from the intermediate result at
>> the previous mount point directory and skip to the next mount point.
>
> Correct
>
>>
>> Visually speaking, if the layout is this (where “:” denotes a
>> mountpoint boundary between the mountpoints MP1, MP2, MP3):
>>
>>                           dirfd
>>                             |
>>           :                 V         :
>> 	  :       ham <— spam <— eggs <— x.txt
>> 	  :    (disconn.)             :
>>           :                           :
>>   / <— foo <— bar <— baz        :
>>           :                           :
>>     MP1                 MP2                  MP3
>>
>> When a process holds a reference to the “spam” directory, which is now
>> disconnected, and invokes openat(dirfd, “eggs/x.txt”, …), then we
>> would:
>>
>>   * traverse x.txt
>>   * traverse eggs (checkpointing the intermediate result) <-.
>>   * traverse spam                                           |
>>   * traverse ham                                            |
>>   * discover that ham is disconnected:                      |
>>      * restore the intermediate result from “eggs” ———’
>>      * continue the walk at foo
>>   * end up at the root
>>
>> So effectively, since the results from “spam” and “ham” are discarded,
>> we would traverse only the inodes in the outer and inner mountpoints
>> MP1 and MP3, but effectively return a result that looks like we did
>> not traverse MP2?
>
> We’d still check MP2’s inode, but otherwise yes.
>

I don’t know if it makes sense, but can access rights be cached as part
of the inode security blob? Although I am not sure if the LSM blob would
exist after unlinking.

But if it does, maybe during unlink, keep the cached rights for MP2,
and during openat():
1. Start at disconnected “spam” inode
2. Check spam->i_security->allowed_access  <- Cached MP2 rights
3. Continue normal path walk with preserved access context

>>
>> Maybe (likely) I misread the code. :) It’s not clear to me what the
>> thinking behind this is.  Also, if there was another directory in
>> between “spam” and “eggs” in MP2, wouldn’t we be missing the access
>> rights attached to this directory?
>
> Yes, we would ignore this access right because we don’t know that the
> path was resolved from spam.
>
>>
>>
>> Regarding the capability approach:
>>
>> I agree that a “capability” approach would be the better solution, but
>> it seems infeasible with the existing LSM hooks at the moment.  I
>> would be in favor of it though.
>
> Yes, it would be a new feature with potential important changes.
>
> In the meantime, we still need a fix for disconnected directories, and
> this fix needs to be backported.  That’s why the capability approach is
> not part of the two solutions. ;)
>
>>
>> To spell it out a bit more explicitly what that would mean in my mind:
>>
>> When a path is looked up relative to a dirfd, the path walk upwards
>> would terminate at the dirfd and use previously calculated access
>> rights stored in the associated struct file.  These access rights
>> would be determined at the time of opening the dirfd, similar to how we
>> are already storing the “truncate” access right today for regular
>> files.
>>
>> (Remark: There might still be corner cases where we have to solve it
>> the hard way, if someone uses “..” together with a dirfd-relative
>> lookup.)
>
> Yep, real capabilities don’t have “..” in their design.  On Linux (and
> Landlock), we need to properly handle “..”, which is challenging.
>
>>
>> I also looked at what it would take to change the LSM hooks to pass
>> the directory that the lookup was done relative to, but it seems that
>> this would have to be passed through a bunch of VFS callbacks as well,
>> which seems like a larger change.  I would be curious whether that
>> would be deemed an acceptable change.
>>
>> —Günther
>>
>>
>> P.S. Related to relative directory lookups, there is some movement in
>> the BSDs as well to use dirfds as capabilities, by adding a flag to
>> open directories that enforces O_BENEATH on subsequent opens:
>>
>>  * <https://undeadly.org/cgi?action=article;sid=20250529080623>
>>  * <https://reviews.freebsd.org/D50371>
>>
>> (both found via <https://news.ycombinator.com/item?id=44575361>)
>>
>> If a dirfd had such a flag, that would get rid of the corner case
>> above.
>
> This would be nice but it would not solve the current issue because we
> cannot force all processes to use this flag (which breaks some use
> cases).
>
> FYI, Capsicum is a more complete implementation:
> <https://man.freebsd.org/cgi/man.cgi?query=capsicum&sektion=4>
> See the vfs.lookup_cap_dotdot sysctl too.

Also, my apologies, as this may be tangential to the current
conversation, but since object-based capabilities were mentioned, I had
some design ideas around it while working on the memfd feature [1]. I
don’t know if the design for object-based capabilities has been
internally formalized yet, but since we’re at this juncture, it would
make me glad if any of this is helpful in any way :)

If I understand things correctly, the domain currently applies to ALL
file operations via paths and persists until the process exits.
Therefore, with disconnected directories, once a path component is
unlinked, security policies can be bypassed, as access checks on
previously visible ancestors might get skipped.

Current Landlock Architecture:
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

Process -> Landlock Domain -> Access Decision

{Filesystem Rules, Network Rules, Scope Restrictions}

Path/Port Resolution + Domain Boundary Checks


Enhanced Architecture with Object Capabilities:
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

Process -> Enhanced Landlock Domain ->   Access Decision
━━
  
━━
{Path Rules, Network Rules,  (AND)   {FD Capabilities}
 Scope Restrictions}                      |
━━━━━━━━━━━━━━━
 Per-FD Rights 
━━━━━━━━━━━━━━━
Traditional Resolution             (calculated)

Unlike SCOPE which provides coarse-grained blocking, object capabilities
provide with the facility to add domain specific fine-grained individual
FD operations. So that we have:

Child Domain = Parent Domain & New Restrictions
             = {
    path_rules: Parent.path_rules & Child.path_rules,
    net_rules: Parent.net_rules & Child.net_rules,
    scope: Parent.scope | Child.scope,  /* Additive */
    fd_caps: path_rules & net_rules & scope & Child.allowed_fd_operations
}

where the Child domain *must* be more restrictive than the parent. Here
is an example:

/* Example */
Parent Domain = {
    path_rules: [“/var/www” -> READ_FILE|READ_DIR, “/var/log” -> WRITE_FILE],
    net_rules: [“80” -> BIND_TCP, “443” -> BIND_TCP],
    scope: [SIGNAL, ABSTRACT_UNIX],

    /* Auto-derived FD capabilities */
    fd_caps: {
        3: READ_FILE,           /* /var/www/index.html */
        7: READ_DIR,            /* /var/www directory */
        12: WRITE_FILE,         /* /var/log/access.log */
        15: BIND_TCP,           /* socket bound to port 80 */
        20: READ_FILE|READ_DIR  /* /var/www/images/ */
    }
}

/* Child creates new domain with additional restrictions */
Child.new_restrictions = {
    path_rules: ["/var/www" -> READ_FILE only], /* Remove READ_DIR */
    net_rules: [],                              /* Remove all network */
    scope: [SIGNAL, ABSTRACT_UNIX, MEMFD_EXEC], /* Add MEMFD restriction */
}

/* Child FD capabilities = Parent & Child restrictions */
Child.fd_caps = {
    3: READ_FILE,     /* READ_FILE & READ_FILE = READ_FILE */
    7: 0,             /* READ_DIR & READ_FILE = none (no access) */
    12: WRITE_FILE,   /* WRITE_FILE unchanged (not restricted) */
    15: 0,            /* BIND_TCP & none = none (network blocked) */
    20: READ_FILE     /* (READ_FILE|READ_DIR) & READ_FILE = READ_FILE */
}

API Design: Reusing Existing Flags
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

/* Extended ruleset - reuse existing flags where possible */
struct landlock_ruleset_attr {
    __u64 handled_access_fs;      /* Existing: also applies to FDs */
    __u64 handled_access_net;     /* Existing: also applies to FDs */
    __u64 scoped;                 /* Existing: domain boundaries */
    __u64 handled_access_fd;      /* NEW: FD-specific operations only */
};

/* New syscalls */
long landlock_set_fd_capability(int fd, __u64 access_rights, __u32 flags);

/* Reuse existing filesystem/network flags for FD operations */
landlock_set_fd_capability(file_fd, LANDLOCK_ACCESS_FS_READ_FILE, 0);
landlock_set_fd_capability(dir_fd, LANDLOCK_ACCESS_FS_READ_DIR, 0);
landlock_set_fd_capability(sock_fd, LANDLOCK_ACCESS_NET_BIND_TCP, 0);

`============'

With object capabilities, we assign access rights to file descriptors
directly, at open/alloc time, eliminating the need for path resolution
during future use.

This solves the core issue because:
• FDs remain valid even when disconnected, and
• Rights are bound to the object rather than its pathname.

Therefore, openat with dirfd should still work.

int dirfd = open(“/tmp/work", O_RDONLY);        // Connected
unlink(”/tmp/work");                            // Now disconnected
openat(dirfd, “file.txt”, O_RDONLY);            // Still works, FD bound

Moreover, no path resolution is needed at a later stage and sandboxed
processes don’t bypass restrictions.

Would love to hear any feedback and thoughts on this.

Best,
Abhinav

[1] - <https://lore.kernel.org/all/20250719-memfd-exec-v1-0-0ef7feba5821@gmail.com/>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/4] landlock: Fix handling of disconnected directories
  2025-07-25  3:54           ` Abhinav Saxena
@ 2025-07-25  9:05             ` Mickaël Salaün
  0 siblings, 0 replies; 12+ messages in thread
From: Mickaël Salaün @ 2025-07-25  9:05 UTC (permalink / raw)
  To: Abhinav Saxena
  Cc: Günther Noack, Tingmao Wang, Jann Horn, John Johansen,
	Al Viro, Ben Scarlato, Christian Brauner, Daniel Burgener,
	Jeff Xu, NeilBrown, Paul Moore, Ryan Sullivan, Song Liu,
	linux-fsdevel, linux-security-module

On Thu, Jul 24, 2025 at 09:54:22PM -0600, Abhinav Saxena wrote:
> Hi!
> 
> Mickaël Salaün <mic@digikod.net> writes:
> > On Thu, Jul 24, 2025 at 04:49:24PM +0200, Günther Noack wrote:
> >> On Wed, Jul 23, 2025 at 11:01:42PM +0200, Mickaël Salaün wrote:
> >> > On Tue, Jul 22, 2025 at 07:04:02PM +0100, Tingmao Wang wrote:
> >> > > On the other hand, I’m still a bit uncertain about the domain check
> >> > > semantics.  While it would not cause a rename to be allowed if it is
> >> > > otherwise not allowed by any rules on or above the mountpoint, this gets a
> >> > > bit weird if we have a situation where renames are allowed on the
> >> > > mountpoint or everywhere, but not read/writes, however read/writes are
> >> > > allowed directly on a file, but the dir containing that file gets
> >> > > disconnected so the sandboxed application can’t read or write to it.
> >> > > (Maybe someone would set up such a policy where renames are allowed,
> >> > > expecting Landlock to always prevent renames where additional permissions
> >> > > would be exposed?)
> >> > >
> >> > > In the above situation, if the file is then moved to a connected
> >> > > directory, it will become readable/writable again.
> >> >
> >> > We can generalize this issue to not only the end file but any component
> >> > of the path: disconnected directories.  In fact, the main issue is the
> >> > potential inconsistency of access checks over time (e.g. between two
> >> > renames).  This could be exploited to bypass the security checks done
> >> > for FS_REFER.
> >> >
> >> > I see two solutions:
> >> >
> >> > 1. *Always* walk down to the IS_ROOT directory, and then jump to the
> >> >    mount point.  This makes it possible to have consistent access checks
> >> >    for renames and open/use.  The first downside is that that would
> >> >    change the current behavior for bind mounts that could get more
> >> >    access rights (if the policy explicitly sets rights for the hidden
> >> >    directories).  The second downside is that we’ll do more walk.
> >> >
> >> > 2. Return -EACCES (or -ENOENT) for actions involving disconnected
> >> >    directories, or renames of disconnected opened files.  This second
> >> >    solution is simpler and safer but completely disables the use of
> >> >    disconnected directories and the rename of disconnected files for
> >> >    sandboxed processes.
> >> >
> >> > It would be much better to be able to handle opened directories as
> >> > (object) capabilities, but that is not currently possible because of the
> >> > way paths are handled by the VFS and LSM hooks.
> >> >
> >> > Tingmao, Günther, Jann, what do you think?
> >>
> >> I have to admit that so far, I still failed to wrap my head around the
> >> full patch set and its possible corner cases.  I hope I did not
> >> misunderstand things all too badly below:
> >>
> >> As far as I understand the proposed patch, we are “checkpointing” the
> >> intermediate results of the path walk at every mount point boundary,
> >> and in the case where we run into a disconnected directory in one of
> >> the nested mount points, we restore from the intermediate result at
> >> the previous mount point directory and skip to the next mount point.
> >
> > Correct
> >
> >>
> >> Visually speaking, if the layout is this (where “:” denotes a
> >> mountpoint boundary between the mountpoints MP1, MP2, MP3):
> >>
> >>                           dirfd
> >>                             |
> >>           :                 V         :
> >> 	  :       ham <— spam <— eggs <— x.txt
> >> 	  :    (disconn.)             :
> >>           :                           :
> >>   / <— foo <— bar <— baz        :
> >>           :                           :
> >>     MP1                 MP2                  MP3
> >>
> >> When a process holds a reference to the “spam” directory, which is now
> >> disconnected, and invokes openat(dirfd, “eggs/x.txt”, …), then we
> >> would:
> >>
> >>   * traverse x.txt
> >>   * traverse eggs (checkpointing the intermediate result) <-.
> >>   * traverse spam                                           |
> >>   * traverse ham                                            |
> >>   * discover that ham is disconnected:                      |
> >>      * restore the intermediate result from “eggs” ———’
> >>      * continue the walk at foo
> >>   * end up at the root
> >>
> >> So effectively, since the results from “spam” and “ham” are discarded,
> >> we would traverse only the inodes in the outer and inner mountpoints
> >> MP1 and MP3, but effectively return a result that looks like we did
> >> not traverse MP2?
> >
> > We’d still check MP2’s inode, but otherwise yes.
> >
> 
> I don’t know if it makes sense, but can access rights be cached as part
> of the inode security blob? Although I am not sure if the LSM blob would
> exist after unlinking.

We're not talking about unlinked files but renamed directories that are
opened in a bind mount.  There is no issue with unlinked files, only
when opening something new from an open directory.

With Landlock, because it is unprivileged and then has standalone
security policies, we would need to have one cache per sandbox, which
has its own set of issues.  That would be independant from the current
patch series anyway.

> 
> But if it does, maybe during unlink, keep the cached rights for MP2,
> and during openat():
> 1. Start at disconnected “spam” inode
> 2. Check spam->i_security->allowed_access  <- Cached MP2 rights
> 3. Continue normal path walk with preserved access context

As explained, we cannot do that with the current LSM and VFS
APIs

> 
> >>
> >> Maybe (likely) I misread the code. :) It’s not clear to me what the
> >> thinking behind this is.  Also, if there was another directory in
> >> between “spam” and “eggs” in MP2, wouldn’t we be missing the access
> >> rights attached to this directory?
> >
> > Yes, we would ignore this access right because we don’t know that the
> > path was resolved from spam.
> >
> >>
> >>
> >> Regarding the capability approach:
> >>
> >> I agree that a “capability” approach would be the better solution, but
> >> it seems infeasible with the existing LSM hooks at the moment.  I
> >> would be in favor of it though.
> >
> > Yes, it would be a new feature with potential important changes.
> >
> > In the meantime, we still need a fix for disconnected directories, and
> > this fix needs to be backported.  That’s why the capability approach is
> > not part of the two solutions. ;)
> >
> >>
> >> To spell it out a bit more explicitly what that would mean in my mind:
> >>
> >> When a path is looked up relative to a dirfd, the path walk upwards
> >> would terminate at the dirfd and use previously calculated access
> >> rights stored in the associated struct file.  These access rights
> >> would be determined at the time of opening the dirfd, similar to how we
> >> are already storing the “truncate” access right today for regular
> >> files.
> >>
> >> (Remark: There might still be corner cases where we have to solve it
> >> the hard way, if someone uses “..” together with a dirfd-relative
> >> lookup.)
> >
> > Yep, real capabilities don’t have “..” in their design.  On Linux (and
> > Landlock), we need to properly handle “..”, which is challenging.
> >
> >>
> >> I also looked at what it would take to change the LSM hooks to pass
> >> the directory that the lookup was done relative to, but it seems that
> >> this would have to be passed through a bunch of VFS callbacks as well,
> >> which seems like a larger change.  I would be curious whether that
> >> would be deemed an acceptable change.
> >>
> >> —Günther
> >>
> >>
> >> P.S. Related to relative directory lookups, there is some movement in
> >> the BSDs as well to use dirfds as capabilities, by adding a flag to
> >> open directories that enforces O_BENEATH on subsequent opens:
> >>
> >>  * <https://undeadly.org/cgi?action=article;sid=20250529080623>
> >>  * <https://reviews.freebsd.org/D50371>
> >>
> >> (both found via <https://news.ycombinator.com/item?id=44575361>)
> >>
> >> If a dirfd had such a flag, that would get rid of the corner case
> >> above.
> >
> > This would be nice but it would not solve the current issue because we
> > cannot force all processes to use this flag (which breaks some use
> > cases).
> >
> > FYI, Capsicum is a more complete implementation:
> > <https://man.freebsd.org/cgi/man.cgi?query=capsicum&sektion=4>
> > See the vfs.lookup_cap_dotdot sysctl too.
> 
> Also, my apologies, as this may be tangential to the current
> conversation, but since object-based capabilities were mentioned, I had
> some design ideas around it while working on the memfd feature [1]. I

Thanks for your patch series!  It is on my todo list but I'll need a bit
of time to review it.

> don’t know if the design for object-based capabilities has been
> internally formalized yet, but since we’re at this juncture, it would
> make me glad if any of this is helpful in any way :)
> 
> If I understand things correctly, the domain currently applies to ALL
> file operations via paths and persists until the process exits.

Yes (but it doesn't apply on already opened file descriptors).

> Therefore, with disconnected directories, once a path component is
> unlinked, security policies can be bypassed, as access checks on
> previously visible ancestors might get skipped.

There is no security bypass, because Landlock is a deny-by-default (when
handled) access control, and also because filesystem's parent
directories are evaluated (but not necessarily all the visible/mounted
filesystems).

Also, there is no issue with unlinked files.

> 
> Current Landlock Architecture:
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> 
> Process -> Landlock Domain -> Access Decision
> 
> {Filesystem Rules, Network Rules, Scope Restrictions}
> 
> Path/Port Resolution + Domain Boundary Checks
> 
> 
> Enhanced Architecture with Object Capabilities:
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> 
> Process -> Enhanced Landlock Domain ->   Access Decision
> ━━
>   
> ━━
> {Path Rules, Network Rules,  (AND)   {FD Capabilities}
>  Scope Restrictions}                      |
> ━━━━━━━━━━━━━━━
>  Per-FD Rights 
> ━━━━━━━━━━━━━━━
> Traditional Resolution             (calculated)
> 
> Unlike SCOPE which provides coarse-grained blocking, object capabilities
> provide with the facility to add domain specific fine-grained individual
> FD operations. So that we have:
> 
> Child Domain = Parent Domain & New Restrictions
>              = {
>     path_rules: Parent.path_rules & Child.path_rules,
>     net_rules: Parent.net_rules & Child.net_rules,
>     scope: Parent.scope | Child.scope,  /* Additive */
>     fd_caps: path_rules & net_rules & scope & Child.allowed_fd_operations
> }
> 
> where the Child domain *must* be more restrictive than the parent. Here
> is an example:
> 
> /* Example */
> Parent Domain = {
>     path_rules: [“/var/www” -> READ_FILE|READ_DIR, “/var/log” -> WRITE_FILE],
>     net_rules: [“80” -> BIND_TCP, “443” -> BIND_TCP],
>     scope: [SIGNAL, ABSTRACT_UNIX],
> 
>     /* Auto-derived FD capabilities */
>     fd_caps: {
>         3: READ_FILE,           /* /var/www/index.html */
>         7: READ_DIR,            /* /var/www directory */
>         12: WRITE_FILE,         /* /var/log/access.log */
>         15: BIND_TCP,           /* socket bound to port 80 */
>         20: READ_FILE|READ_DIR  /* /var/www/images/ */
>     }
> }
> 
> /* Child creates new domain with additional restrictions */
> Child.new_restrictions = {
>     path_rules: ["/var/www" -> READ_FILE only], /* Remove READ_DIR */
>     net_rules: [],                              /* Remove all network */
>     scope: [SIGNAL, ABSTRACT_UNIX, MEMFD_EXEC], /* Add MEMFD restriction */
> }
> 
> /* Child FD capabilities = Parent & Child restrictions */
> Child.fd_caps = {
>     3: READ_FILE,     /* READ_FILE & READ_FILE = READ_FILE */
>     7: 0,             /* READ_DIR & READ_FILE = none (no access) */
>     12: WRITE_FILE,   /* WRITE_FILE unchanged (not restricted) */
>     15: 0,            /* BIND_TCP & none = none (network blocked) */
>     20: READ_FILE     /* (READ_FILE|READ_DIR) & READ_FILE = READ_FILE */
> }
> 
> API Design: Reusing Existing Flags
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> 
> /* Extended ruleset - reuse existing flags where possible */
> struct landlock_ruleset_attr {
>     __u64 handled_access_fs;      /* Existing: also applies to FDs */
>     __u64 handled_access_net;     /* Existing: also applies to FDs */
>     __u64 scoped;                 /* Existing: domain boundaries */
>     __u64 handled_access_fd;      /* NEW: FD-specific operations only */
> };
> 
> /* New syscalls */
> long landlock_set_fd_capability(int fd, __u64 access_rights, __u32 flags);

Such a syscall could be useful in some scenarios but in most cases
programs should not require to be modified.

> 
> /* Reuse existing filesystem/network flags for FD operations */
> landlock_set_fd_capability(file_fd, LANDLOCK_ACCESS_FS_READ_FILE, 0);
> landlock_set_fd_capability(dir_fd, LANDLOCK_ACCESS_FS_READ_DIR, 0);
> landlock_set_fd_capability(sock_fd, LANDLOCK_ACCESS_NET_BIND_TCP, 0);
> 
> `============'
> 
> With object capabilities, we assign access rights to file descriptors
> directly, at open/alloc time, eliminating the need for path resolution
> during future use.

Except when ".." is used to resolve a path relative to such
capability/FD, and we need to support that for compatibility reasons.

> 
> This solves the core issue because:
> • FDs remain valid even when disconnected, and
> • Rights are bound to the object rather than its pathname.
> 
> Therefore, openat with dirfd should still work.
> 
> int dirfd = open(“/tmp/work", O_RDONLY);        // Connected
> unlink(”/tmp/work");                            // Now disconnected

It's not possible to unlink a directory nor rmdir a non-empty one.

> openat(dirfd, “file.txt”, O_RDONLY);            // Still works, FD bound
> 
> Moreover, no path resolution is needed at a later stage and sandboxed
> processes don’t bypass restrictions.
> 
> Would love to hear any feedback and thoughts on this.

I think we all agree that capabilities would be nice, but the VFS and
LSM's APIs would need to be updated, and that would need to update user
space code, while breaking some use cases.

> 
> Best,
> Abhinav
> 
> [1] - <https://lore.kernel.org/all/20250719-memfd-exec-v1-0-0ef7feba5821@gmail.com/>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/4] landlock: Fix handling of disconnected directories
  2025-07-23 21:01     ` Mickaël Salaün
  2025-07-24 14:49       ` Günther Noack
@ 2025-07-28  0:13       ` Tingmao Wang
  1 sibling, 0 replies; 12+ messages in thread
From: Tingmao Wang @ 2025-07-28  0:13 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack, Jann Horn,
	John Johansen
  Cc: Al Viro, Ben Scarlato, Christian Brauner, Daniel Burgener,
	Jeff Xu, NeilBrown, Paul Moore, Ryan Sullivan, Song Liu,
	linux-fsdevel, linux-security-module, Abhinav Saxena

On 7/23/25 22:01, Mickaël Salaün wrote:
> On Tue, Jul 22, 2025 at 07:04:02PM +0100, Tingmao Wang wrote:
>> On 7/19/25 11:42, Mickaël Salaün wrote:
>>> [...]
>>> @@ -784,12 +787,18 @@ static bool is_access_to_paths_allowed(
>>>  	if (WARN_ON_ONCE(!layer_masks_parent1))
>>>  		return false;
>>>  
>>> -	allowed_parent1 = is_layer_masks_allowed(layer_masks_parent1);
>>> -
>>>  	if (unlikely(layer_masks_parent2)) {
>>>  		if (WARN_ON_ONCE(!dentry_child1))
>>>  			return false;
>>>  
>>> +		/*
>>> +		 * Creates a backup of the initial layer masks to be able to restore
>>> +		 * them if we find out that we were walking a disconnected directory,
>>> +		 * which would make the collected access rights inconsistent (cf.
>>> +		 * reset_to_mount_root).
>>> +		 */
>>
>> This comment is duplicate with the one below, is this intentional?
>>
>>> [...]
>>
>> On the other hand, I'm still a bit uncertain about the domain check
>> semantics.  While it would not cause a rename to be allowed if it is
>> otherwise not allowed by any rules on or above the mountpoint, this gets a
>> bit weird if we have a situation where renames are allowed on the
>> mountpoint or everywhere, but not read/writes, however read/writes are
>> allowed directly on a file, but the dir containing that file gets
>> disconnected so the sandboxed application can't read or write to it.
>> (Maybe someone would set up such a policy where renames are allowed,
>> expecting Landlock to always prevent renames where additional permissions
>> would be exposed?)
>>
>> In the above situation, if the file is then moved to a connected
>> directory, it will become readable/writable again.
> 
> We can generalize this issue to not only the end file but any component
> of the path: disconnected directories.  In fact, the main issue is the
> potential inconsistency of access checks over time (e.g. between two
> renames).  This could be exploited to bypass the security checks done
> for FS_REFER.

Hi Mickaël,

(replying to this one even though I've read the further replies, since I
don't have anything to address there)

I spent some time thinking about how this could be exploited and I can see
one concrete situation where disconnected directories can be taken
advantaged of, which is when an application has rename access across the
"source" of a bind mount, as well as some place outside of it, but only
has read/write access on certain subdirs of it, in which case it can use
this to move files under the bind mount that it does not have access to,
to a place where it has, by first making the target destination
disconnected, then connecting it back:

	/ # mkdir bind_src bind_dst outside
	/ # mount --bind bind_src bind_dst
	/ # mkdir -p bind_src/d1
	/ # echo secret > /bind_src/foo
	/ # LL_FS_RO=/usr:/bin:/lib:/etc LL_FS_RW=/bind_src/d1:/outside LL_FS_CREATE_DELETE_REFER=/ /sandboxer bash
		## (sandboxer patched with ability to add just create/delete/refer
		    rule on a path - maybe we can add this to the sandboxer but
		    call it something like LL_FS_DIR_RW?)
	Executing the sandboxed command...
	/ # cat /bind_src/foo /bind_dst/foo
	cat: /bind_src/foo: Permission denied
	cat: /bind_dst/foo: Permission denied
	/ # cd /bind_dst/d1
	/bind_dst/d1 # mv -v /bind_src/d1 /outside
	renamed '/bind_src/d1' -> '/outside/d1'
	/bind_dst/d1 # ls ..
	ls: cannot access '..': No such file or directory
	/bind_dst/d1 # mv -v /bind_dst/foo .
	renamed '/bind_dst/foo' -> './foo'
	/bind_dst/d1 # mv -v /outside/d1 /bind_src/d1
	renamed '/outside/d1' -> '/bind_src/d1'
	/bind_dst/d1 # cat foo
	secret

(I think this is probably not an issue with the existing Landlock code, as
the application would need to have rules outside the bind mount in order
for Landlock to not see it when the directory is disconnected, but in that
case it already has access to the target file anyway due to hierarchy
inheritance)

The other way I can think of where this inconsistency could be taken
advantaged of is for regaining access to a file within a disconnected
directory that the application originally had access to (basically the
situation I wrote about earlier), but that is arguably a less worrisome
case and your option 1 would make access in this case allowed anyway.  So
far I can't think of any other way an application could use this to gain
access it didn't have on the file before.

> 
> I see two solutions:
> 
> 1. *Always* walk down to the IS_ROOT directory, and then jump to the
>    mount point.  This makes it possible to have consistent access checks
>    for renames and open/use.  The first downside is that that would
>    change the current behavior for bind mounts that could get more
>    access rights (if the policy explicitly sets rights for the hidden
>    directories).  The second downside is that we'll do more walk.
> 
> 2. Return -EACCES (or -ENOENT) for actions involving disconnected
>    directories, or renames of disconnected opened files.  This second
>    solution is simpler and safer but completely disables the use of
>    disconnected directories and the rename of disconnected files for
>    sandboxed processes.

I think -ENOENT might be confusing if the disconnection is accidentally
triggered (e.g. due to a bug, race etc) (I guess given that ".." doesn't
work in disconnected dirs, nobody should be setting them up deliberately,
so maybe any situation where a disconnected directory exists is always a
bug, or at least a temporary race?), but on the other hand VFS already
returns -ENOENT for ".." when in disconnected directories so maybe it's
OK, as long as this is clarified in the doc, I guess.

Also by "rename" I assume we're implicitly including links as well (and
since one couldn't rename an opened file just from the fd anyway, any
rename would be for files underneath).

> 
> It would be much better to be able to handle opened directories as
> (object) capabilities, but that is not currently possible because of the
> way paths are handled by the VFS and LSM hooks.
> 
> Tingmao, Günther, Jann, what do you think?

I wonder if there is a third option where we do option 1, but only for
disconnected dirs, so basically for any disconnected directories Landlock
would allow rules either on the path from the target to its fs root, or on
the path from its mountpoint to the real root.  In addition, domain check
for rename/links would be stricter, in that it would make sure there are
no rules granting more access on the destination than the source even if
those rules are "hidden" beneath the mountpoint, so that even if the
destination later become disconnected access is not widened.

While this does leaves inconsistency since now some "hidden" rules would
be applied only in the disconnected case, maybe this could still be a
better option since it prevents accidentally widening access checks in the
common case, especially for existing code (after all, currently Landlock
does not apply those "hidden" rules, and if an application can't move the
files into a disconnected directory, for example because write access are
not granted to any bind mounted places, or because it only has access to
the mount destination but not the source (achievable by putting the rule
above the mountpoint, and in that case it can't move a dir outside the
mount so it will always be connected), it will not be able to surface
them).

One could argue that a reasonable policy is not supposed to have rules on
those "hidden" directories in the first place, but in that case whether or
not we do this extra walk to fs root for the non-disconnected case makes
no difference.

The advantage of this option is also that we don't increase the
performance overhead for the non-disconnected case which is the vast
majority of cases (and this perf benefit might be significant since we're
likely not backporting any ref-less pathwalk).  Also, while I'm not sure
whether this option or option 1 has any difference in terms of backporting
difficulty, I think option 1 is a more significant change compared to
this, and maybe in the future if we do get to have opened directory
capabilities, keeping the changes introduced by the current "workaround"
small would be better?  This would probably also make it easier to adopt
Song's path iterator too.

If we do this, then we would probably have a warning in e.g.
Documentation/userspace-api/landlock.rst about the fact that Landlock will
do this walk to the fs root if the directory is disconnected (either
caused by the sandboxed application or by other means).

Just thoughts and suggestions tho, my opinions aren't very strong, and I
probably don't have a good grasp of how Landlock is currently used in
practice.

Best,
Tingmao

> 
> It looks like AppArmor also denies access to disconnected path in some
> cases, but it tries to reconstruct the path for known internal
> filesystems, and it seems to specifically handle the case of chroot.  I
> don't know when PATH_CONNECT_PATH is set though.
> 
> John, could you please clarify how disconnected directories and files
> are handled by AppArmor?
> 
>>
>> Here is an example test, using the layout1_bind fixture for flexibility
>> [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-07-28  0:13 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-19 10:41 [PATCH v3 0/4] Landlock: Disconnected directory handling Mickaël Salaün
2025-07-19 10:42 ` [PATCH v3 1/4] landlock: Fix cosmetic change Mickaël Salaün
2025-07-19 10:42 ` [PATCH v3 2/4] landlock: Fix handling of disconnected directories Mickaël Salaün
2025-07-22 18:04   ` Tingmao Wang
2025-07-23 21:01     ` Mickaël Salaün
2025-07-24 14:49       ` Günther Noack
2025-07-24 17:10         ` Mickaël Salaün
2025-07-25  3:54           ` Abhinav Saxena
2025-07-25  9:05             ` Mickaël Salaün
2025-07-28  0:13       ` Tingmao Wang
2025-07-19 10:42 ` [PATCH v3 3/4] selftests/landlock: Add tests for access through disconnected paths Mickaël Salaün
2025-07-19 10:42 ` [PATCH v3 4/4] selftests/landlock: Add disconnected leafs and branch test suites Mickaël Salaün

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).