Linux Security Modules development


* [PATCH v9 1/9] landlock: Add a place for flags to layer rules
From: Tingmao Wang @ 2026-05-27  1:01 UTC
  To: Mickaël Salaün
  Cc: Tingmao Wang, Günther Noack, Justin Suess, Jan Kara,
	Abhinav Saxena, linux-security-module
In-Reply-To: <cover.1779843375.git.m@maowtm.org>

To avoid unnecessarily increasing the size of struct landlock_layer, we
make the layer level a u8 and use the space to store the flags struct.

struct layer_access_masks is renamed to struct layer_masks, and a new
field is added to track, for each layer, whether a rule with the quiet
flag has been seen.  Because both fields are bitfields, this does not
increase the size of the struct.

Cc: Justin Suess <utilityemal77@gmail.com>
Assisted-by: GitHub Copilot:claude-opus-4.7 copilot-review
Signed-off-by: Tingmao Wang <m@maowtm.org>
Co-developed-by: Justin Suess <utilityemal77@gmail.com>
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
---

Changes in v9:
- Move a hunk from patch 2 to here
- Fix comment and format
- Renamed struct layer_access_masks to struct layer_masks, and moved the
  content of struct collected_rule_flags into this struct, getting rid
  of the extra struct collected_rule_flags and function parameters.
  This is following a discussion in [3].  The flag is now initialized in
  landlock_init_layer_masks as false.
- Thus also removed the now-unnecessary layer_mask_t

Changes in v8:
- Rebase on top of mic/next
- Add Co-developed-by: Justin Suess for handling this rebase initially
- layer_mask_t was removed in [1] but we still need it for the
  collected_rule_flags.  Rather than using raw u16, I've chosen to
  re-define it back in ruleset.h (it was in access.h).

Changes in v7:
- Take rule_flags separately from landlock_request in
  is_access_to_paths_allowed to avoid writing to the landlock_request
  variable if CONFIG_AUDIT is disabled (to enable compiler elision).
- Due to the above change, we don't need rule_flags in landlock_request in
  this commit anymore (will be added later).

Changes in v6:
- Rebased to include the revised disconnected directory handling changes
  (without the "reverting" behaviour)

Changes in v5:
- Move rule_flags into landlock_request.  This lets us get rid of the
  extra parameters to is_access_to_paths_allowed (and later on,
  landlock_log_denial), and thus requires fewer code changes.

Changes in v3:
- Comment changes, move local variables, simplify if branch

Changes in v2:
- Comment changes
- Rebased to include disconnected directory handling changes on mic/next
  and add backing up of collected_rule_flags.

[1]: https://lore.kernel.org/all/20260125195853.109967-1-gnoack3000@gmail.com/
[2]: https://lore.kernel.org/all/20251221194301.247484-1-utilityemal77@gmail.com/
[3]: https://lore.kernel.org/all/20260524.eFiz4hahrami@digikod.net/
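
As a rough illustration of the size argument in the commit message (this
is not part of the patch), the sketch below packs a quiet bit next to the
access bits and checks that neither structure grows.  Everything prefixed
with demo_ or DEMO_ is a hypothetical stand-in: the real access_mask_t,
LANDLOCK_NUM_ACCESS_MAX and struct landlock_layer are defined in the
kernel tree, and the value 17 is only assumed here.  The patch declares
the quiet bit as bool; the sketch uses one base type for both bitfields
so the packing does not depend on how the ABI mixes bitfield types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel definitions. */
typedef unsigned int demo_access_mask_t;
#define DEMO_NUM_ACCESS_MAX 17

/* Per-layer mask: access bits and the quiet bit share one storage unit. */
struct demo_layer_mask {
	demo_access_mask_t access : DEMO_NUM_ACCESS_MAX;
	demo_access_mask_t quiet : 1;
};

/* Rule layer before the change: a 16-bit level, then the access bits. */
struct demo_layer_old {
	uint16_t level;
	demo_access_mask_t access;
};

/* Rule layer after the change: an 8-bit level, with flags in the gap. */
struct demo_layer_new {
	uint8_t level;
	struct {
		bool quiet : 1;
	} flags;
	demo_access_mask_t access;
};

/* Same idea as the static_assert added to access.h by this patch. */
static_assert(sizeof(struct demo_layer_mask) == sizeof(demo_access_mask_t),
	      "quiet bit must fit in the spare bits");
static_assert(sizeof(struct demo_layer_new) == sizeof(struct demo_layer_old),
	      "flags must not grow the rule layer");
```

The second assertion relies on the usual alignment padding before the
access field on common ABIs, which is the space the flags reuse.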

 security/landlock/access.h  |  35 +++++++--
 security/landlock/audit.c   |  20 ++---
 security/landlock/audit.h   |   2 +-
 security/landlock/domain.c  |  19 ++---
 security/landlock/domain.h  |   2 +-
 security/landlock/fs.c      | 147 +++++++++++++++++++-----------------
 security/landlock/limits.h  |   3 +
 security/landlock/net.c     |   2 +-
 security/landlock/ruleset.c |  33 +++++---
 security/landlock/ruleset.h |  17 ++++-
 10 files changed, 170 insertions(+), 110 deletions(-)

diff --git a/security/landlock/access.h b/security/landlock/access.h
index c19d5bc13944..3b8ba6c1300d 100644
--- a/security/landlock/access.h
+++ b/security/landlock/access.h
@@ -62,18 +62,37 @@ static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
 	      sizeof(typeof_member(union access_masks_all, all)));
 
 /**
- * struct layer_access_masks - A boolean matrix of layers and access rights
+ * struct layer_mask - The unfulfilled access rights and rule flags for
+ * a layer.
  *
- * This has a bit for each combination of layer numbers and access rights.
- * During access checks, it is used to represent the access rights for each
- * layer which still need to be fulfilled.  When all bits are 0, the access
- * request is considered to be fulfilled.
+ * During access checks, @access is used to represent the access rights
+ * for each layer which still need to be fulfilled.  When all bits in
+ * @access are 0, the access request is allowed by this layer.
+ *
+ * @quiet is used to store whether we have encountered a rule with the
+ * quiet flag for this layer, which will be used to control audit logging.
+ */
+struct layer_mask {
+	access_mask_t access:LANDLOCK_NUM_ACCESS_MAX;
+#ifdef CONFIG_AUDIT
+	bool quiet:1;
+#endif /* CONFIG_AUDIT */
+};
+
+/*
+ * Make sure that we don't increase the size of struct layer_mask when
+ * storing rule flags.
+ */
+static_assert(sizeof(struct layer_mask) == sizeof(access_mask_t));
+
+/**
+ * struct layer_masks - An array of struct layer_mask, one per layer.
  */
-struct layer_access_masks {
+struct layer_masks {
 	/**
-	 * @access: The unfulfilled access rights for each layer.
+	 * @layers: The unfulfilled access rights and rule flags for each layer.
 	 */
-	access_mask_t access[LANDLOCK_MAX_NUM_LAYERS];
+	struct layer_mask layers[LANDLOCK_MAX_NUM_LAYERS];
 };
 
 /*
diff --git a/security/landlock/audit.c b/security/landlock/audit.c
index 851647197a01..8c56f7f6467a 100644
--- a/security/landlock/audit.c
+++ b/security/landlock/audit.c
@@ -187,11 +187,11 @@ static void test_get_hierarchy(struct kunit *const test)
 /* Get the youngest layer that denied the access_request. */
 static size_t get_denied_layer(const struct landlock_ruleset *const domain,
 			       access_mask_t *const access_request,
-			       const struct layer_access_masks *masks)
+			       const struct layer_masks *masks)
 {
-	for (ssize_t i = ARRAY_SIZE(masks->access) - 1; i >= 0; i--) {
-		if (masks->access[i] & *access_request) {
-			*access_request &= masks->access[i];
+	for (ssize_t i = ARRAY_SIZE(masks->layers) - 1; i >= 0; i--) {
+		if (masks->layers[i].access & *access_request) {
+			*access_request &= masks->layers[i].access;
 			return i;
 		}
 	}
@@ -208,12 +208,12 @@ static void test_get_denied_layer(struct kunit *const test)
 	const struct landlock_ruleset dom = {
 		.num_layers = 5,
 	};
-	const struct layer_access_masks masks = {
-		.access[0] = LANDLOCK_ACCESS_FS_EXECUTE |
-			     LANDLOCK_ACCESS_FS_READ_DIR,
-		.access[1] = LANDLOCK_ACCESS_FS_READ_FILE |
-			     LANDLOCK_ACCESS_FS_READ_DIR,
-		.access[2] = LANDLOCK_ACCESS_FS_REMOVE_DIR,
+	const struct layer_masks masks = {
+		.layers[0].access = LANDLOCK_ACCESS_FS_EXECUTE |
+				    LANDLOCK_ACCESS_FS_READ_DIR,
+		.layers[1].access = LANDLOCK_ACCESS_FS_READ_FILE |
+				    LANDLOCK_ACCESS_FS_READ_DIR,
+		.layers[2].access = LANDLOCK_ACCESS_FS_REMOVE_DIR,
 	};
 	access_mask_t access;
 
diff --git a/security/landlock/audit.h b/security/landlock/audit.h
index 56778331b58c..b85d752273ac 100644
--- a/security/landlock/audit.h
+++ b/security/landlock/audit.h
@@ -43,7 +43,7 @@ struct landlock_request {
 	access_mask_t access;
 
 	/* Required fields for requests with layer masks. */
-	const struct layer_access_masks *layer_masks;
+	const struct layer_masks *layer_masks;
 
 	/* Required fields for requests with deny masks. */
 	const access_mask_t all_existing_optional_access;
diff --git a/security/landlock/domain.c b/security/landlock/domain.c
index 5dd06f7c2312..d1a4d8b33ee1 100644
--- a/security/landlock/domain.c
+++ b/security/landlock/domain.c
@@ -184,7 +184,7 @@ static void test_get_layer_deny_mask(struct kunit *const test)
 deny_masks_t
 landlock_get_deny_masks(const access_mask_t all_existing_optional_access,
 			const access_mask_t optional_access,
-			const struct layer_access_masks *const masks)
+			const struct layer_masks *const masks)
 {
 	const unsigned long access_opt = optional_access;
 	unsigned long access_bit;
@@ -201,8 +201,9 @@ landlock_get_deny_masks(const access_mask_t all_existing_optional_access,
 	if (WARN_ON_ONCE(!access_opt))
 		return 0;
 
-	for (ssize_t i = ARRAY_SIZE(masks->access) - 1; i >= 0; i--) {
-		const access_mask_t denied = masks->access[i] & optional_access;
+	for (ssize_t i = ARRAY_SIZE(masks->layers) - 1; i >= 0; i--) {
+		const access_mask_t denied = masks->layers[i].access &
+					     optional_access;
 		const unsigned long newly_denied = denied & ~all_denied;
 
 		if (!newly_denied)
@@ -222,12 +223,12 @@ landlock_get_deny_masks(const access_mask_t all_existing_optional_access,
 
 static void test_landlock_get_deny_masks(struct kunit *const test)
 {
-	const struct layer_access_masks layers1 = {
-		.access[0] = LANDLOCK_ACCESS_FS_EXECUTE |
-			     LANDLOCK_ACCESS_FS_IOCTL_DEV,
-		.access[1] = LANDLOCK_ACCESS_FS_TRUNCATE,
-		.access[2] = LANDLOCK_ACCESS_FS_IOCTL_DEV,
-		.access[9] = LANDLOCK_ACCESS_FS_EXECUTE,
+	const struct layer_masks layers1 = {
+		.layers[0].access = LANDLOCK_ACCESS_FS_EXECUTE |
+				    LANDLOCK_ACCESS_FS_IOCTL_DEV,
+		.layers[1].access = LANDLOCK_ACCESS_FS_TRUNCATE,
+		.layers[2].access = LANDLOCK_ACCESS_FS_IOCTL_DEV,
+		.layers[9].access = LANDLOCK_ACCESS_FS_EXECUTE,
 	};
 
 	KUNIT_EXPECT_EQ(test, 0x1,
diff --git a/security/landlock/domain.h b/security/landlock/domain.h
index 35cac8f6daee..af100a8cd939 100644
--- a/security/landlock/domain.h
+++ b/security/landlock/domain.h
@@ -119,7 +119,7 @@ struct landlock_hierarchy {
 deny_masks_t
 landlock_get_deny_masks(const access_mask_t all_existing_optional_access,
 			const access_mask_t optional_access,
-			const struct layer_access_masks *const masks);
+			const struct layer_masks *const masks);
 
 int landlock_init_hierarchy_log(struct landlock_hierarchy *const hierarchy);
 
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index c1ecfe239032..b7357643b8c7 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -406,15 +406,15 @@ static const struct access_masks any_fs = {
  * src_parent would result in having the same or fewer access rights if it were
  * moved under new_parent.
  */
-static bool may_refer(const struct layer_access_masks *const src_parent,
-		      const struct layer_access_masks *const src_child,
-		      const struct layer_access_masks *const new_parent,
+static bool may_refer(const struct layer_masks *const src_parent,
+		      const struct layer_masks *const src_child,
+		      const struct layer_masks *const new_parent,
 		      const bool child_is_dir)
 {
-	for (size_t i = 0; i < ARRAY_SIZE(new_parent->access); i++) {
-		access_mask_t child_access = src_parent->access[i] &
-					     src_child->access[i];
-		access_mask_t parent_access = new_parent->access[i];
+	for (size_t i = 0; i < ARRAY_SIZE(new_parent->layers); i++) {
+		access_mask_t child_access = src_parent->layers[i].access &
+					     src_child->layers[i].access;
+		access_mask_t parent_access = new_parent->layers[i].access;
 
 		if (!child_is_dir) {
 			child_access &= ACCESS_FILE;
@@ -436,11 +436,11 @@ static bool may_refer(const struct layer_access_masks *const src_parent,
  * that child2 may be used from parent2 to parent1 without increasing its access
  * rights), false otherwise.
  */
-static bool no_more_access(const struct layer_access_masks *const parent1,
-			   const struct layer_access_masks *const child1,
+static bool no_more_access(const struct layer_masks *const parent1,
+			   const struct layer_masks *const child1,
 			   const bool child1_is_dir,
-			   const struct layer_access_masks *const parent2,
-			   const struct layer_access_masks *const child2,
+			   const struct layer_masks *const parent2,
+			   const struct layer_masks *const child2,
 			   const bool child2_is_dir)
 {
 	if (!may_refer(parent1, child1, parent2, child1_is_dir))
@@ -459,25 +459,25 @@ static bool no_more_access(const struct layer_access_masks *const parent1,
 
 static void test_no_more_access(struct kunit *const test)
 {
-	const struct layer_access_masks rx0 = {
-		.access[0] = LANDLOCK_ACCESS_FS_EXECUTE |
-			     LANDLOCK_ACCESS_FS_READ_FILE,
+	const struct layer_masks rx0 = {
+		.layers[0].access = LANDLOCK_ACCESS_FS_EXECUTE |
+				    LANDLOCK_ACCESS_FS_READ_FILE,
 	};
-	const struct layer_access_masks mx0 = {
-		.access[0] = LANDLOCK_ACCESS_FS_EXECUTE |
-			     LANDLOCK_ACCESS_FS_MAKE_REG,
+	const struct layer_masks mx0 = {
+		.layers[0].access = LANDLOCK_ACCESS_FS_EXECUTE |
+				    LANDLOCK_ACCESS_FS_MAKE_REG,
 	};
-	const struct layer_access_masks x0 = {
-		.access[0] = LANDLOCK_ACCESS_FS_EXECUTE,
+	const struct layer_masks x0 = {
+		.layers[0].access = LANDLOCK_ACCESS_FS_EXECUTE,
 	};
-	const struct layer_access_masks x1 = {
-		.access[1] = LANDLOCK_ACCESS_FS_EXECUTE,
+	const struct layer_masks x1 = {
+		.layers[1].access = LANDLOCK_ACCESS_FS_EXECUTE,
 	};
-	const struct layer_access_masks x01 = {
-		.access[0] = LANDLOCK_ACCESS_FS_EXECUTE,
-		.access[1] = LANDLOCK_ACCESS_FS_EXECUTE,
+	const struct layer_masks x01 = {
+		.layers[0].access = LANDLOCK_ACCESS_FS_EXECUTE,
+		.layers[1].access = LANDLOCK_ACCESS_FS_EXECUTE,
 	};
-	const struct layer_access_masks allows_all = {};
+	const struct layer_masks allows_all = {};
 
 	/* Checks without restriction. */
 	NMA_TRUE(&x0, &allows_all, false, &allows_all, NULL, false);
@@ -565,9 +565,13 @@ static void test_no_more_access(struct kunit *const test)
 #undef NMA_TRUE
 #undef NMA_FALSE
 
-static bool is_layer_masks_allowed(const struct layer_access_masks *masks)
+static bool is_layer_masks_allowed(const struct layer_masks *masks)
 {
-	return mem_is_zero(&masks->access, sizeof(masks->access));
+	for (size_t i = 0; i < ARRAY_SIZE(masks->layers); i++) {
+		if (masks->layers[i].access)
+			return false;
+	}
+	return true;
 }
 
 /*
@@ -576,16 +580,16 @@ static bool is_layer_masks_allowed(const struct layer_access_masks *masks)
  * Returns true if the request is allowed, false otherwise.
  */
 static bool scope_to_request(const access_mask_t access_request,
-			     struct layer_access_masks *masks)
+			     struct layer_masks *masks)
 {
 	bool saw_unfulfilled_access = false;
 
 	if (WARN_ON_ONCE(!masks))
 		return true;
 
-	for (size_t i = 0; i < ARRAY_SIZE(masks->access); i++) {
-		masks->access[i] &= access_request;
-		if (masks->access[i])
+	for (size_t i = 0; i < ARRAY_SIZE(masks->layers); i++) {
+		masks->layers[i].access &= access_request;
+		if (masks->layers[i].access)
 			saw_unfulfilled_access = true;
 	}
 	return !saw_unfulfilled_access;
@@ -596,41 +600,46 @@ static bool scope_to_request(const access_mask_t access_request,
 static void test_scope_to_request_with_exec_none(struct kunit *const test)
 {
 	/* Allows everything. */
-	struct layer_access_masks masks = {};
+	struct layer_masks masks = {};
 
 	/* Checks and scopes with execute. */
 	KUNIT_EXPECT_TRUE(test,
 			  scope_to_request(LANDLOCK_ACCESS_FS_EXECUTE, &masks));
-	KUNIT_EXPECT_EQ(test, 0, masks.access[0]);
+	KUNIT_EXPECT_EQ(test, 0, (access_mask_t)masks.layers[0].access);
 }
 
 static void test_scope_to_request_with_exec_some(struct kunit *const test)
 {
 	/* Denies execute and write. */
-	struct layer_access_masks masks = {
-		.access[0] = LANDLOCK_ACCESS_FS_EXECUTE,
-		.access[1] = LANDLOCK_ACCESS_FS_WRITE_FILE,
+	struct layer_masks masks = {
+		.layers[0].access = LANDLOCK_ACCESS_FS_EXECUTE,
+		.layers[1].access = LANDLOCK_ACCESS_FS_WRITE_FILE,
 	};
 
 	/* Checks and scopes with execute. */
 	KUNIT_EXPECT_FALSE(test, scope_to_request(LANDLOCK_ACCESS_FS_EXECUTE,
 						  &masks));
-	KUNIT_EXPECT_EQ(test, LANDLOCK_ACCESS_FS_EXECUTE, masks.access[0]);
-	KUNIT_EXPECT_EQ(test, 0, masks.access[1]);
+	/*
+	 * These casts to access_mask_t are needed because typeof(), used in
+	 * KUNIT_EXPECT_EQ(), does not work on bitfields.
+	 */
+	KUNIT_EXPECT_EQ(test, LANDLOCK_ACCESS_FS_EXECUTE,
+			(access_mask_t)masks.layers[0].access);
+	KUNIT_EXPECT_EQ(test, 0, (access_mask_t)masks.layers[1].access);
 }
 
 static void test_scope_to_request_without_access(struct kunit *const test)
 {
 	/* Denies execute and write. */
-	struct layer_access_masks masks = {
-		.access[0] = LANDLOCK_ACCESS_FS_EXECUTE,
-		.access[1] = LANDLOCK_ACCESS_FS_WRITE_FILE,
+	struct layer_masks masks = {
+		.layers[0].access = LANDLOCK_ACCESS_FS_EXECUTE,
+		.layers[1].access = LANDLOCK_ACCESS_FS_WRITE_FILE,
 	};
 
 	/* Checks and scopes without access request. */
 	KUNIT_EXPECT_TRUE(test, scope_to_request(0, &masks));
-	KUNIT_EXPECT_EQ(test, 0, masks.access[0]);
-	KUNIT_EXPECT_EQ(test, 0, masks.access[1]);
+	KUNIT_EXPECT_EQ(test, 0, (access_mask_t)masks.layers[0].access);
+	KUNIT_EXPECT_EQ(test, 0, (access_mask_t)masks.layers[1].access);
 }
 
 #endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
@@ -639,15 +648,15 @@ static void test_scope_to_request_without_access(struct kunit *const test)
  * Returns true if there is at least one access right different than
  * LANDLOCK_ACCESS_FS_REFER.
  */
-static bool is_eacces(const struct layer_access_masks *masks,
+static bool is_eacces(const struct layer_masks *masks,
 		      const access_mask_t access_request)
 {
 	if (!masks)
 		return false;
 
-	for (size_t i = 0; i < ARRAY_SIZE(masks->access); i++) {
+	for (size_t i = 0; i < ARRAY_SIZE(masks->layers); i++) {
 		/* LANDLOCK_ACCESS_FS_REFER alone must return -EXDEV. */
-		if (masks->access[i] & access_request &
+		if (masks->layers[i].access & access_request &
 		    ~LANDLOCK_ACCESS_FS_REFER)
 			return true;
 	}
@@ -661,7 +670,7 @@ static bool is_eacces(const struct layer_access_masks *masks,
 
 static void test_is_eacces_with_none(struct kunit *const test)
 {
-	const struct layer_access_masks masks = {};
+	const struct layer_masks masks = {};
 
 	IE_FALSE(&masks, 0);
 	IE_FALSE(&masks, LANDLOCK_ACCESS_FS_REFER);
@@ -671,8 +680,8 @@ static void test_is_eacces_with_none(struct kunit *const test)
 
 static void test_is_eacces_with_refer(struct kunit *const test)
 {
-	const struct layer_access_masks masks = {
-		.access[0] = LANDLOCK_ACCESS_FS_REFER,
+	const struct layer_masks masks = {
+		.layers[0].access = LANDLOCK_ACCESS_FS_REFER,
 	};
 
 	IE_FALSE(&masks, 0);
@@ -683,8 +692,8 @@ static void test_is_eacces_with_refer(struct kunit *const test)
 
 static void test_is_eacces_with_write(struct kunit *const test)
 {
-	const struct layer_access_masks masks = {
-		.access[0] = LANDLOCK_ACCESS_FS_WRITE_FILE,
+	const struct layer_masks masks = {
+		.layers[0].access = LANDLOCK_ACCESS_FS_WRITE_FILE,
 	};
 
 	IE_FALSE(&masks, 0);
@@ -743,11 +752,11 @@ static bool
 is_access_to_paths_allowed(const struct landlock_ruleset *const domain,
 			   const struct path *const path,
 			   const access_mask_t access_request_parent1,
-			   struct layer_access_masks *layer_masks_parent1,
+			   struct layer_masks *layer_masks_parent1,
 			   struct landlock_request *const log_request_parent1,
 			   struct dentry *const dentry_child1,
 			   const access_mask_t access_request_parent2,
-			   struct layer_access_masks *layer_masks_parent2,
+			   struct layer_masks *layer_masks_parent2,
 			   struct landlock_request *const log_request_parent2,
 			   struct dentry *const dentry_child2)
 {
@@ -755,9 +764,9 @@ is_access_to_paths_allowed(const struct landlock_ruleset *const domain,
 	     child1_is_directory = true, child2_is_directory = true;
 	struct path walker_path;
 	access_mask_t access_masked_parent1, access_masked_parent2;
-	struct layer_access_masks _layer_masks_child1, _layer_masks_child2;
-	struct layer_access_masks *layer_masks_child1 = NULL,
-				  *layer_masks_child2 = NULL;
+	struct layer_masks _layer_masks_child1, _layer_masks_child2;
+	struct layer_masks *layer_masks_child1 = NULL,
+			   *layer_masks_child2 = NULL;
 
 	if (!access_request_parent1 && !access_request_parent2)
 		return true;
@@ -797,6 +806,10 @@ is_access_to_paths_allowed(const struct landlock_ruleset *const domain,
 	}
 
 	if (unlikely(dentry_child1)) {
+		/*
+		 * Get the layer masks for the child dentries for use by domain
+		 * check later.
+		 */
 		if (landlock_init_layer_masks(domain, LANDLOCK_MASK_ACCESS_FS,
 					      &_layer_masks_child1,
 					      LANDLOCK_KEY_INODE))
@@ -952,7 +965,7 @@ static int current_check_access_path(const struct path *const path,
 	};
 	const struct landlock_cred_security *const subject =
 		landlock_get_applicable_subject(current_cred(), masks, NULL);
-	struct layer_access_masks layer_masks;
+	struct layer_masks layer_masks;
 	struct landlock_request request = {};
 
 	if (!subject)
@@ -1029,7 +1042,7 @@ static access_mask_t maybe_remove(const struct dentry *const dentry)
 static bool collect_domain_accesses(const struct landlock_ruleset *const domain,
 				    const struct dentry *const mnt_root,
 				    struct dentry *dir,
-				    struct layer_access_masks *layer_masks_dom)
+				    struct layer_masks *layer_masks_dom)
 {
 	bool ret = false;
 
@@ -1135,8 +1148,7 @@ static int current_check_refer_path(struct dentry *const old_dentry,
 	access_mask_t access_request_parent1, access_request_parent2;
 	struct path mnt_dir;
 	struct dentry *old_parent;
-	struct layer_access_masks layer_masks_parent1 = {},
-				  layer_masks_parent2 = {};
+	struct layer_masks layer_masks_parent1 = {}, layer_masks_parent2 = {};
 	struct landlock_request request1 = {}, request2 = {};
 
 	if (!subject)
@@ -1202,7 +1214,6 @@ static int current_check_refer_path(struct dentry *const old_dentry,
 	allow_parent2 = collect_domain_accesses(subject->domain, mnt_dir.dentry,
 						new_dir->dentry,
 						&layer_masks_parent2);
-
 	if (allow_parent1 && allow_parent2)
 		return 0;
 
@@ -1580,7 +1591,7 @@ static int hook_path_truncate(const struct path *const path)
  */
 static void unmask_scoped_access(const struct landlock_ruleset *const client,
 				 const struct landlock_ruleset *const server,
-				 struct layer_access_masks *const masks,
+				 struct layer_masks *const masks,
 				 const access_mask_t access)
 {
 	int client_layer, server_layer;
@@ -1621,9 +1632,9 @@ static void unmask_scoped_access(const struct landlock_ruleset *const client,
 		server_walker = server_walker->parent;
 
 	for (; client_layer >= 0; client_layer--) {
-		if (masks->access[client_layer] & access &&
+		if (masks->layers[client_layer].access & access &&
 		    client_walker == server_walker)
-			masks->access[client_layer] &= ~access;
+			masks->layers[client_layer].access &= ~access;
 
 		client_walker = client_walker->parent;
 		server_walker = server_walker->parent;
@@ -1635,7 +1646,7 @@ static int hook_unix_find(const struct path *const path, struct sock *other,
 {
 	const struct landlock_ruleset *dom_other;
 	const struct landlock_cred_security *subject;
-	struct layer_access_masks layer_masks;
+	struct layer_masks layer_masks;
 	struct landlock_request request = {};
 	static const struct access_masks fs_resolve_unix = {
 		.fs = LANDLOCK_ACCESS_FS_RESOLVE_UNIX,
@@ -1739,7 +1750,7 @@ static bool is_device(const struct file *const file)
 
 static int hook_file_open(struct file *const file)
 {
-	struct layer_access_masks layer_masks = {};
+	struct layer_masks layer_masks = {};
 	access_mask_t open_access_request, full_access_request, allowed_access,
 		optional_access;
 	const struct landlock_cred_security *const subject =
@@ -1780,8 +1791,8 @@ static int hook_file_open(struct file *const file)
 		 * are still unfulfilled in any of the layers.
 		 */
 		allowed_access = full_access_request;
-		for (size_t i = 0; i < ARRAY_SIZE(layer_masks.access); i++)
-			allowed_access &= ~layer_masks.access[i];
+		for (size_t i = 0; i < ARRAY_SIZE(layer_masks.layers); i++)
+			allowed_access &= ~layer_masks.layers[i].access;
 	}
 
 	/*
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index a4d908b240a2..08d5f2f6d321 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -31,6 +31,9 @@
 #define LANDLOCK_MASK_SCOPE		((LANDLOCK_LAST_SCOPE << 1) - 1)
 #define LANDLOCK_NUM_SCOPE		__const_hweight64(LANDLOCK_MASK_SCOPE)
 
+#define LANDLOCK_NUM_ACCESS_MAX \
+	MAX(MAX(LANDLOCK_NUM_ACCESS_FS, LANDLOCK_NUM_ACCESS_NET), LANDLOCK_NUM_SCOPE)
+
 #define LANDLOCK_LAST_RESTRICT_SELF	LANDLOCK_RESTRICT_SELF_TSYNC
 #define LANDLOCK_MASK_RESTRICT_SELF	((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
 
diff --git a/security/landlock/net.c b/security/landlock/net.c
index db2046a89a9a..981a362c24db 100644
--- a/security/landlock/net.c
+++ b/security/landlock/net.c
@@ -48,7 +48,7 @@ static int current_check_access_socket(struct socket *const sock,
 				       bool connecting)
 {
 	__be16 port;
-	struct layer_access_masks layer_masks = {};
+	struct layer_masks layer_masks = {};
 	const struct landlock_rule *rule;
 	struct landlock_id id = {
 		.type = LANDLOCK_KEY_NET_PORT,
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index 181df7736bb9..91948e406e69 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -628,7 +628,7 @@ landlock_find_rule(const struct landlock_ruleset *const ruleset,
  * remaining unfulfilled access rights and masks has no leftover set bits).
  */
 bool landlock_unmask_layers(const struct landlock_rule *const rule,
-			    struct layer_access_masks *masks)
+			    struct layer_masks *masks)
 {
 	if (!masks)
 		return true;
@@ -649,11 +649,17 @@ bool landlock_unmask_layers(const struct landlock_rule *const rule,
 		const struct landlock_layer *const layer = &rule->layers[i];
 
 		/* Clear the bits where the layer in the rule grants access. */
-		masks->access[layer->level - 1] &= ~layer->access;
+		masks->layers[layer->level - 1].access &= ~layer->access;
+
+#ifdef CONFIG_AUDIT
+		/* Collect rule flags for each layer. */
+		if (layer->flags.quiet)
+			masks->layers[layer->level - 1].quiet = true;
+#endif /* CONFIG_AUDIT */
 	}
 
-	for (size_t i = 0; i < ARRAY_SIZE(masks->access); i++) {
-		if (masks->access[i])
+	for (size_t i = 0; i < ARRAY_SIZE(masks->layers); i++) {
+		if (masks->layers[i].access)
 			return false;
 	}
 	return true;
@@ -668,6 +674,7 @@ get_access_mask_t(const struct landlock_ruleset *const ruleset,
  *
  * Populates @masks such that for each access right in @access_request,
  * the bits for all the layers are set where this access right is handled.
+ * Rule flags are also zeroed.
  *
  * @domain: The domain that defines the current restrictions.
  * @access_request: The requested access rights to check.
@@ -680,7 +687,7 @@ get_access_mask_t(const struct landlock_ruleset *const ruleset,
 access_mask_t
 landlock_init_layer_masks(const struct landlock_ruleset *const domain,
 			  const access_mask_t access_request,
-			  struct layer_access_masks *const masks,
+			  struct layer_masks *const masks,
 			  const enum landlock_key_type key_type)
 {
 	access_mask_t handled_accesses = 0;
@@ -709,11 +716,19 @@ landlock_init_layer_masks(const struct landlock_ruleset *const domain,
 	for (size_t i = 0; i < domain->num_layers; i++) {
 		const access_mask_t handled = get_access_mask(domain, i);
 
-		masks->access[i] = access_request & handled;
-		handled_accesses |= masks->access[i];
+		masks->layers[i].access = access_request & handled;
+		handled_accesses |= masks->layers[i].access;
+#ifdef CONFIG_AUDIT
+		masks->layers[i].quiet = false;
+#endif /* CONFIG_AUDIT */
+	}
+	for (size_t i = domain->num_layers; i < ARRAY_SIZE(masks->layers);
+	     i++) {
+		masks->layers[i].access = 0;
+#ifdef CONFIG_AUDIT
+		masks->layers[i].quiet = false;
+#endif /* CONFIG_AUDIT */
 	}
-	for (size_t i = domain->num_layers; i < ARRAY_SIZE(masks->access); i++)
-		masks->access[i] = 0;
 
 	return handled_accesses;
 }
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index 889f4b30301a..8eec5dbf28a3 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -29,7 +29,18 @@ struct landlock_layer {
 	/**
 	 * @level: Position of this layer in the layer stack.  Starts from 1.
 	 */
-	u16 level;
+	u8 level;
+	/**
+	 * @flags: Bitfield for special flags attached to this rule.
+	 */
+	struct {
+		/**
+		 * @quiet: Suppresses denial audit logs for the object covered by
+		 * this rule in this domain.  For filesystem rules, this inherits
+		 * down the file hierarchy.
+		 */
+		bool quiet:1;
+	} flags;
 	/**
 	 * @access: Bitfield of allowed actions on the kernel object.  They are
 	 * relative to the object type (e.g. %LANDLOCK_ACTION_FS_READ).
@@ -302,12 +313,12 @@ landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
 }
 
 bool landlock_unmask_layers(const struct landlock_rule *const rule,
-			    struct layer_access_masks *masks);
+			    struct layer_masks *masks);
 
 access_mask_t
 landlock_init_layer_masks(const struct landlock_ruleset *const domain,
 			  const access_mask_t access_request,
-			  struct layer_access_masks *masks,
+			  struct layer_masks *masks,
 			  const enum landlock_key_type key_type);
 
 #endif /* _SECURITY_LANDLOCK_RULESET_H */
-- 
2.54.0


* [PATCH v9 0/9] Implement LANDLOCK_ADD_RULE_QUIET
From: Tingmao Wang @ 2026-05-27  1:01 UTC
  To: Mickaël Salaün
  Cc: Tingmao Wang, Günther Noack, Justin Suess, Jan Kara,
	Abhinav Saxena, linux-security-module

Hi,

This is v9 of the "quiet flag" series, implementing the feature as
proposed in [1].

v8: https://lore.kernel.org/all/cover.1775490344.git.m@maowtm.org/
v7: https://lore.kernel.org/all/cover.1766330134.git.m@maowtm.org/
v6: https://lore.kernel.org/all/cover.1765040503.git.m@maowtm.org/
v5: https://lore.kernel.org/all/cover.1763931318.git.m@maowtm.org/
v4: https://lore.kernel.org/all/cover.1763330228.git.m@maowtm.org/
v3: https://lore.kernel.org/all/cover.1761511023.git.m@maowtm.org/
v2: https://lore.kernel.org/all/cover.1759686613.git.m@maowtm.org/
v1: https://lore.kernel.org/all/cover.1757376311.git.m@maowtm.org/

v8..v9:
- Refactor to store the collected rule flags in layer_masks instead
  (renamed from layer_access_masks).  Got rid of layer_mask_t again.
- Rebase sandboxer and net_tests on top of UDP support, resolving
  conflicts
- Additional small changes, noted in each patch

(Kept ABI version at 10)

All text following this line is unchanged
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

v7..v8:

- Rebase to mic/next
- Re-introduced layer_mask_t due to need in first patch
- Plumb through rule flags in hook_unix_find()
- Some selftests patches were not properly clang-format'd, fixed now.
- Minor env var handling change in sandboxer
- Fix selftests use of audit_count_records() without EXPECT_EQ

v6..v7:

- Remove "landlock: Fix wrong type usage" (merged)
- Revert back to taking rule_flags separately from landlock_request until
  we call landlock_log_denial (https://lore.kernel.org/all/20251219.ahn3aiJuKahb@digikod.net/)
- Rebase to mic/next

v5..v6 rebases on top of the new simpler disconnected directory handling,
changes some bools into u32, and fixes some typos and style issues.

v4..v5 addresses review feedback, most significantly:
  - reduces code changes by pushing rule_flags into landlock_request.
  - adds test cases for two layers handling different access bits.

v3..v4 is a one-character formatting change, plus more tests.

We now have 5 patches for the selftests - I'm happy to squash them into one
depending on preference (and happy for Mickaël to do the squash if there is
no other feedback):
- selftests/landlock: Replace hard-coded 16 with a constant
- selftests/landlock: add tests for quiet flag with fs rules
- selftests/landlock: add tests for quiet flag with net rules
- selftests/landlock: Add tests for quiet flag with scope
- selftests/landlock: Add tests for invalid use of quiet flag

v2..v3:
Not much has changed in the actual functionality apart from various comment,
typing, assert and general style fixes based on feedback.  The major new
thing here is tests (a bit of KUnit squashed into the optional access
commit, a lot of selftests especially in fs_tests.c).

The added fs_tests should exercise the code paths for optional and
non-optional access, renames, and mountpoint and disconnected directory
handling.  I will add the above missing bits to v4.

Removed:
- "Implement quiet for optional accesses"
    (squashed into "landlock: Suppress logging when quiet flag is present")


Old feature summary below:

The quiet flag allows a sandboxer to suppress audit logs for uninteresting
denials.  The flag can be set on objects and inherits downward in the
filesystem hierarchy.  On a denial, the youngest denying layer's quiet
flag setting decides whether to audit.  The motivation for this feature is
to reduce audit noise, and also prepare for a future supervisor feature
which will use this bit to suppress supervisor notifications.

This patch introduces a new quiet access mask in the ruleset_attr, which
is eventually stored in the hierarchy.  This allows the user to specify
which accesses should be affected by quiet bits.  One can then, for example,
make it such that read accesses to certain files are not audited (but
still denied), but all writes are still audited, regardless of location.

The sandboxer is extended to show example usage of this feature,
supporting quieting filesystem, network and scope accesses.

Demo:

    /# LL_FS_RO=/usr LL_FS_RW= LL_FORCE_LOG=1 LL_FS_QUIET=/dev:/tmp:/etc LL_FS_QUIET_ACCESS=r ./sandboxer bash
    ...
    audit: type=1423 audit(1759680175.562:195): domain=15bb25f6b blockers=fs.write_file,fs.read_file path="/dev/tty" dev="devtmpfs" ino=11
    ^^^^^^^^
    # note: because write is not quieted, we see the above line. blockers
    # contains read as well since that's the originally requested access.
    audit: type=1424 audit(1759680175.562:195): domain=15bb25f6b status=allocated mode=enforcing pid=616 uid=0 exe="/sandboxer" comm="sandboxer"
    audit: type=1300 audit(1759680175.562:195): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=5565c86113d1 a2=802 a3=0 items=0 ppid=605 pid=616 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="bash" exe="/usr/bin/bash" key=(null)
    audit: type=1327 audit(1759680175.562:195): proctitle="bash"
    bash: cannot set terminal process group (605): Inappropriate ioctl for device
    bash: no job control in this shell
    bash: /etc/bash.bashrc: Permission denied
    audit: type=1423 audit(1759680175.570:196): domain=15bb25f6b blockers=fs.read_file path="/.bash_history" dev="virtiofs" ino=36963
    ^^^^^^^^
    # read outside /dev:/tmp:/etc - not quieted
    audit: type=1300 audit(1759680175.570:196): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=5565c868e400 a2=0 a3=0 items=0 ppid=605 pid=616 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="bash" exe="/usr/bin/bash" key=(null)
    audit: type=1327 audit(1759680175.570:196): proctitle="bash"
    audit: type=1423 audit(1759680175.570:197): domain=15bb25f6b blockers=fs.read_file path="/.bash_history" dev="virtiofs" ino=36963
    audit: type=1300 audit(1759680175.570:197): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=5565c868e400 a2=0 a3=0 items=0 ppid=605 pid=616 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="bash" exe="/usr/bin/bash" key=(null)
    audit: type=1327 audit(1759680175.570:197): proctitle="bash"

    bash-5.2# head /etc/passwd
    head: cannot open '/etc/passwd' for reading: Permission denied
    ^^^^^^^^
    # reads to /etc are quieted

    bash-5.2# echo evil >> /etc/passwd
    bash: /etc/passwd: Permission denied
    audit: type=1423 audit(1759680227.030:198): domain=15bb25f6b blockers=fs.write_file path="/etc/passwd" dev="virtiofs" ino=790
    ^^^^^^^^
    # writes are not quieted
    audit: type=1300 audit(1759680227.030:198): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=5565c86ab030 a2=441 a3=1b6 items=0 ppid=605 pid=616 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="bash" exe="/usr/bin/bash" key=(null)
    audit: type=1327 audit(1759680227.030:198): proctitle="bash"

Design:

- The user can set the quiet flag for a layer on any part of the fs
  hierarchy (whether it allows any access on it or not), and the flag
  inherits down (no support for "cancelling" the inheritance of the flag
  in specific subdirectories).

- The youngest layer that denies a request gets to decide whether the
  denial is audited or not.  This means that a compromised binary, for
  example, cannot "turn off" Landlock auditing when it tries to access
  files, unless it denies access to the files itself.  There is some
  debate to be had on whether Landlock should still audit when a parent
  layer sets the quiet flag but the request is denied by a deeper layer
  (the rule author of the child layer likely did not expect the denial,
  so the record would be a good diagnostic).  The current approach is to
  ignore the quiet flag on the parent layer and audit anyway.
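
To make the two design points above concrete, here is a toy userspace
model of the decision, inferred only from the description and the demo
output in this cover letter.  It is not the kernel implementation: struct
toy_layer, the TOY_* bits and toy_should_audit() are hypothetical names
invented for this sketch, and the real code tracks this state differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up access bits for the sketch; not the Landlock UAPI values. */
enum { TOY_READ = 1u << 0, TOY_WRITE = 1u << 1 };

/* Hypothetical per-layer view of a single request. */
struct toy_layer {
	uint32_t denied;       /* access bits this layer denies for the request */
	uint32_t quiet_access; /* access bits this layer asked to quiet */
	bool quiet_flag;       /* quiet flag seen on the object or an ancestor */
};

/*
 * layers[0] is the oldest layer, layers[n - 1] the youngest.
 * Returns true if the denial should produce an audit record.
 */
static bool toy_should_audit(const struct toy_layer *layers, size_t n)
{
	for (size_t i = n; i-- > 0;) {
		const struct toy_layer *l = &layers[i];

		if (!l->denied)
			continue;
		/* Youngest denying layer found: its settings alone decide. */
		if (!l->quiet_flag)
			return true;
		/* Any denied bit outside the quiet mask keeps the record. */
		return (l->denied & ~l->quiet_access) != 0;
	}
	return false; /* no layer denied, so there is nothing to audit */
}
```

With the demo settings (quiet access "r" on /etc and /dev), a denied read
alone is silent, while a denied read+write open such as the one on
/dev/tty is still logged because the write bit is outside the quiet mask.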

[1]: https://github.com/landlock-lsm/linux/issues/44#issuecomment-2876500918

Kind regards,
Tingmao

Tingmao Wang (9):
  landlock: Add a place for flags to layer rules
  landlock: Add API support and docs for the quiet flags
  landlock: Suppress logging when quiet flag is present
  samples/landlock: Add quiet flag support to sandboxer
  selftests/landlock: Replace hard-coded 16 with a constant
  selftests/landlock: add tests for quiet flag with fs rules
  selftests/landlock: add tests for quiet flag with net rules
  selftests/landlock: Add tests for quiet flag with scope
  selftests/landlock: Add tests for invalid use of quiet flag

 Documentation/admin-guide/LSM/landlock.rst    |    9 +-
 Documentation/userspace-api/landlock.rst      |   11 +
 include/uapi/linux/landlock.h                 |   61 +
 samples/landlock/sandboxer.c                  |  134 +-
 security/landlock/access.h                    |   40 +-
 security/landlock/audit.c                     |  281 +-
 security/landlock/audit.h                     |    3 +-
 security/landlock/domain.c                    |   52 +-
 security/landlock/domain.h                    |   12 +-
 security/landlock/fs.c                        |  180 +-
 security/landlock/fs.h                        |   19 +-
 security/landlock/limits.h                    |    3 +
 security/landlock/net.c                       |   22 +-
 security/landlock/net.h                       |    5 +-
 security/landlock/ruleset.c                   |   45 +-
 security/landlock/ruleset.h                   |   29 +-
 security/landlock/syscalls.c                  |   71 +-
 tools/testing/selftests/landlock/audit_test.c |   27 +-
 tools/testing/selftests/landlock/base_test.c  |   59 +-
 tools/testing/selftests/landlock/common.h     |    2 +
 tools/testing/selftests/landlock/fs_test.c    | 2450 ++++++++++++++++-
 tools/testing/selftests/landlock/net_test.c   |  138 +-
 .../landlock/scoped_abstract_unix_test.c      |   77 +-
 23 files changed, 3512 insertions(+), 218 deletions(-)


base-commit: fe7832557561ed6312563368854d5f8df1fa55e3
-- 
2.54.0

^ permalink raw reply

* Re: [PATCH] tomoyo: Fix NULL pointer dereference in tomoyo_init_request_info() when domain is NULL
From: Jiakai Xu @ 2026-05-27  0:57 UTC (permalink / raw)
  To: penguin-kernel
  Cc: jmorris, linux-kernel, linux-security-module, paul, serge,
	takedakn, xujiakai24
In-Reply-To: <973e7bd1-6919-46ad-aa3b-f4e02737462e@I-love.SAKURA.ne.jp>

> >> Thank you for a patch, but I don't think we need this change.
> > 
> > Thanks for your review! I understand your perspective, but I believe
> > the crash is a real NULL pointer dereference, and I'd like to explain
> > why the defensive check is warranted.
> > 
> >> TOMOYO's initial domain is &tomoyo_kernel_domain, and each thread belongs to
> >> a non-NULL domain. Therefore, tomoyo_domain() is not supposed to return NULL.
> > 
> > While tomoyo_domain() is not supposed to return NULL under normal
> > operation, there are code paths that leave s->domain_info == NULL:
> > 
> >   a) Pre-init window (security/tomoyo/tomoyo.c, lines 598-612):
> >      The task security blob is zero-allocated via kzalloc(), and
> >      security_add_hooks() at line 603 is called BEFORE
> >      s->domain_info = &tomoyo_kernel_domain at line 606. If any LSM
> >      hook fires during that window, tomoyo_domain() returns NULL.
> 
> This code is executed during early boot stage. Other LSM hooks are not
> supposed to fire.
> 
> > 
> >   b) tomoyo_task_free() (tomoyo.c, lines 533-545) explicitly sets
> >      s->domain_info = NULL after decrementing the refcount.
> 
> This code is executed when a "struct task_struct" is about to be released.
> Nobody can find this "struct task_struct". Also, this "struct task_struct"
> cannot be the current thread.
> 
> > 
> >   c) tomoyo_find_next_domain() (domain.c, lines 876-883) writes
> >      s->domain_info = NULL when the domain transition fails.
> 
> I couldn't follow this one, but old_domain is initialized as
> 
>   struct tomoyo_domain_info *old_domain = tomoyo_domain();
> 
> which cannot be NULL.
> 
> domain is guaranteed to be non-NULL because old_domain cannot be NULL.
> 
> 	if (!domain)
> 		domain = old_domain;
> 
> Therefore, s->domain_info is guaranteed to be non-NULL because domain cannot be NULL.
> 
> 	s->domain_info = domain;
> 
> If domain were NULL, the kernel should have already crashed at line 884.

Thank you for the thorough explanation! You are absolutely right,
and I really appreciate you taking the time to walk through each path.

> > 
> > I think adding a NULL check makes the code more robust. What do you 
> > think?
> 
> Then, this will be NULL pointer dereference.
> But fixing the location that is setting NULL is the correct approach.

I fully agree. The NULL check I proposed would only mask the symptom.
The real bug is that something corrupted the task_struct's security blob
and zeroed out domain_info before the ioctl hook fired.

Unfortunately, I don't have a reliable reproducer. The fuzzer triggered
this only once on riscv, so I can't easily track down the source of the 
corruption.

Either way, thank you again for the review. I learned a lot about
TOMOYO's domain lifecycle from your explanation.

Best regards,
Jiakai


^ permalink raw reply

* Re: security_task_prctl: why -ENOSYS
From: Casey Schaufler @ 2026-05-26 23:42 UTC (permalink / raw)
  To: William Roberts, LSM, SElinux list; +Cc: Casey Schaufler
In-Reply-To: <CAFftDdqp8b5n21hdEP6w0-PK+CoG7mmQxgrnhpBLU7-1GZxniw@mail.gmail.com>

On 5/26/2026 4:21 PM, William Roberts wrote:
> On Tue, May 26, 2026 at 5:39 PM William Roberts
> <bill.c.roberts@gmail.com> wrote:
>> Hello,
>>
>> I am trying to understand the motivation behind having
>> security_task_prctl only continue if the return value is -ENOSYS. This
>> seems to be very different from other LSM hooks I have investigated.
>> For example, in other hooks, the value from SELinux avc_has_perm() is
>> used directly. This essentially means that a 0 will cause the check to
>> pass, and anything < 0 is usually an error.
>>
>> In commit:
>> ----
>> commit d84f4f992cbd76e8f39c488cf0c5d123843923b1 ("CRED: Inaugurate COW
>> credentials")
>>
>> (8) security_task_prctl() and cap_task_prctl().
>>
>>          security_task_prctl() has been modified to return -ENOSYS if it doesn't
>>          want to handle a function, or otherwise return the return
>> value directly
>>          rather than through an argument.
>>
>>          Additionally, cap_task_prctl() now prepares a new set of
>> credentials, even
>>          if it doesn't end up using it.
>> ----
>>
>> The check in kernel/sys.c is currently:
>>         error = security_task_prctl(option, arg2, arg3, arg4, arg5);
>>         if (error != -ENOSYS)
>>                 return error;
>>
>> Should this be something like, "error && error != -ENOSYS"?
>>
>> I ask because I am looking to leverage this hook in SELinux, and it's
>> annoying to have to coerce all 0 returns to -ENOSYS.
> Of course, after hours of banging my head and one email sent, it's clearer
> to me now WHY. This hook isn't meant for making yes-or-no decisions on an
> operation, but rather for handling special prctl flags directed at the LSM
> in question.
>
> I guess, with that said, do we want this interface to be used both for
> "let the LSM handle this prctl flag directed at me" and for a yes/no
> security decision, or do we want to split this out into two hooks?

The task_prctl hook is used in capability and yama. It is only used to
provide a place to process LSM specific prctl options. It is not used to
make security decisions outside of processing the LSM's options. If you
want to make security decisions on general prctl options you will need
a new hook.
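
As a rough illustration of the convention described above, here is a
userspace sketch.  All toy_* names and TOY_OPT_* values are made up for
this example, and the dispatcher is deliberately simplified (first result
that is not -ENOSYS wins); the real security_task_prctl() may differ in
detail.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook type: one handler per LSM, keyed on the prctl option. */
typedef int (*toy_prctl_hook)(int option);

/* Option numbers invented for this sketch. */
enum { TOY_OPT_A = 1, TOY_OPT_B = 2, TOY_OPT_GENERIC = 3 };

/* Owns TOY_OPT_A and accepts it; declines everything else. */
static int toy_lsm_a(int option)
{
	return option == TOY_OPT_A ? 0 : -ENOSYS;
}

/* Owns TOY_OPT_B but refuses it; declines everything else. */
static int toy_lsm_b(int option)
{
	return option == TOY_OPT_B ? -EPERM : -ENOSYS;
}

/* Treats the hook as a plain allow/deny check and "allows" everything. */
static int toy_lsm_allow_all(int option)
{
	(void)option;
	return 0;
}

/* Simplified dispatcher: -ENOSYS means "not my option, keep looking". */
static int toy_security_task_prctl(const toy_prctl_hook *hooks, size_t n,
				   int option)
{
	for (size_t i = 0; i < n; i++) {
		int rc = hooks[i](option);

		if (rc != -ENOSYS)
			return rc;
	}
	return -ENOSYS;
}

/* Caller side, modeled on the kernel/sys.c check quoted in this thread. */
static int toy_prctl(const toy_prctl_hook *hooks, size_t n, int option)
{
	int error = toy_security_task_prctl(hooks, n, option);

	if (error != -ENOSYS)
		return error; /* an LSM claimed the option: its result is final */
	/* Generic handling; 42 stands in for a value the generic code computes. */
	return option == TOY_OPT_GENERIC ? 42 : -EINVAL;
}
```

With toy_lsm_a and toy_lsm_b registered, TOY_OPT_GENERIC falls through to
the generic code.  With toy_lsm_allow_all registered, the same call
returns 0 and the generic code never runs, which is why returning 0 for
"allowed" does not work with this hook.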

>
>> Thanks,
>> Bill

^ permalink raw reply

* Re: security_task_prctl: why -ENOSYS
From: William Roberts @ 2026-05-26 23:21 UTC (permalink / raw)
  To: LSM, SElinux list
In-Reply-To: <CAFftDdrzV2xC7HBcgS8EPb-YewJyCK9uos4crXOCh5ONm6SxSg@mail.gmail.com>

On Tue, May 26, 2026 at 5:39 PM William Roberts
<bill.c.roberts@gmail.com> wrote:
>
> Hello,
>
> I am trying to understand the motivation behind having
> security_task_prctl only continue if the return value is -ENOSYS. This
> seems to be very different from other LSM hooks I have investigated.
> For example, in other hooks, the value from SELinux avc_has_perm() is
> used directly. This essentially means that a 0 will cause the check to
> pass, and anything < 0 is usually an error.
>
> In commit:
> ----
> commit d84f4f992cbd76e8f39c488cf0c5d123843923b1 ("CRED: Inaugurate COW
> credentials")
>
> (8) security_task_prctl() and cap_task_prctl().
>
>          security_task_prctl() has been modified to return -ENOSYS if it doesn't
>          want to handle a function, or otherwise return the return
> value directly
>          rather than through an argument.
>
>          Additionally, cap_task_prctl() now prepares a new set of
> credentials, even
>          if it doesn't end up using it.
> ----
>
> The check in kernel/sys.c is currently:
>         error = security_task_prctl(option, arg2, arg3, arg4, arg5);
>         if (error != -ENOSYS)
>                 return error;
>
> Should this be something like, "error && error != -ENOSYS"?
>
> I ask because I am looking to leverage this hook in SELinux, and it's
> annoying to have to coerce all 0 returns to -ENOSYS.

Of course, after hours of banging my head and one email sent, it's clearer
to me now WHY. This hook isn't meant for making yes-or-no decisions on an
operation, but rather for handling special prctl flags directed at the LSM
in question.

I guess, with that said, do we want this interface to be used both for
"let the LSM handle this prctl flag directed at me" and for a yes/no
security decision, or do we want to split this out into two hooks?

>
> Thanks,
> Bill

^ permalink raw reply

* security_task_prctl: why -ENOSYS
From: William Roberts @ 2026-05-26 22:39 UTC (permalink / raw)
  To: LSM, SElinux list

Hello,

I am trying to understand the motivation behind having
security_task_prctl only continue if the return value is -ENOSYS. This
seems to be very different from other LSM hooks I have investigated.
For example, in other hooks, the value from SELinux avc_has_perm() is
used directly. This essentially means that a 0 will cause the check to
pass, and anything < 0 is usually an error.

In commit:
----
commit d84f4f992cbd76e8f39c488cf0c5d123843923b1 ("CRED: Inaugurate COW
credentials")

(8) security_task_prctl() and cap_task_prctl().

         security_task_prctl() has been modified to return -ENOSYS if it doesn't
         want to handle a function, or otherwise return the return
value directly
         rather than through an argument.

         Additionally, cap_task_prctl() now prepares a new set of
credentials, even
         if it doesn't end up using it.
----

The check in kernel/sys.c is currently:
        error = security_task_prctl(option, arg2, arg3, arg4, arg5);
        if (error != -ENOSYS)
                return error;

Should this be something like, "error && error != -ENOSYS"?

I ask because I am looking to leverage this hook in SELinux, and it's
annoying to have to coerce all 0 returns to -ENOSYS.

Thanks,
Bill

^ permalink raw reply

* Re: [PATCH v2 2/2] security: smack: fix spelling mistake
From: Casey Schaufler @ 2026-05-26 21:52 UTC (permalink / raw)
  To: fffsqian, paul, jmorris, serge
  Cc: linux-security-module, linux-kernel, Qingshuang Fu,
	Casey Schaufler
In-Reply-To: <20260526013834.399816-1-fffsqian@163.com>

On 5/25/2026 6:38 PM, fffsqian@163.com wrote:
> From: Qingshuang Fu <fuqingshuang@kylinos.cn>
>
> Fix misspelling: overriden → overridden
>
> Signed-off-by: Qingshuang Fu <fuqingshuang@kylinos.cn>

Thank you. I will be taking this in the Smack tree.

>
> Changes since v1:
> - Split original single patch into two standalone patches,
>   separate AppArmor and Smack changes for different maintainer trees.
> ---
>  security/smack/smackfs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
> index 6e62dcb36f74..2820bd3ee72e 100644
> --- a/security/smack/smackfs.c
> +++ b/security/smack/smackfs.c
> @@ -115,7 +115,7 @@ struct smack_known *smack_syslog_label;
>  /*
>   * Ptrace current rule
>   * SMACK_PTRACE_DEFAULT    regular smack ptrace rules (/proc based)
> - * SMACK_PTRACE_EXACT      labels must match, but can be overriden with
> + * SMACK_PTRACE_EXACT      labels must match, but can be overridden with
>   *			   CAP_SYS_PTRACE
>   * SMACK_PTRACE_DRACONIAN  labels must match, CAP_SYS_PTRACE has no effect
>   */

^ permalink raw reply

* Re: [PATCH v2 06/17] landlock: Add create_ruleset and free_ruleset tracepoints
From: Justin Suess @ 2026-05-26 21:34 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Christian Brauner, Günther Noack, Steven Rostedt, Jann Horn,
	Jeff Xu, Kees Cook, Masami Hiramatsu, Mathieu Desnoyers,
	Matthieu Buffet, Mikhail Ivanov, Tingmao Wang, kernel-team,
	linux-fsdevel, linux-security-module, linux-trace-kernel
In-Reply-To: <20260406143717.1815792-7-mic@digikod.net>

On Mon, Apr 06, 2026 at 04:37:04PM +0200, Mickaël Salaün wrote:
> Add tracepoints for ruleset lifecycle events: landlock_create_ruleset
> fires from the landlock_create_ruleset() syscall handler, logging the
> ruleset Landlock ID and handled access masks; landlock_free_ruleset
> fires in free_ruleset() before the ruleset is freed, so eBPF programs
> can access the full ruleset state via BTF.
> 
> The create_ruleset TP_PROTO takes only the ruleset pointer.  The handled
> access masks are read from the ruleset in TP_fast_assign rather than
> passed as scalar arguments, so eBPF programs can access the full ruleset
> state (rules, access masks) via BTF on a single pointer.  No lock is
> needed because the ruleset is not yet shared (the file descriptor has
> not been installed).
> 
> Create the trace header with a DOC comment documenting the consistency
> guarantees, locking conventions, TP_PROTO safety, and security
> considerations shared by all Landlock tracepoints.  Add
> CREATE_TRACE_POINTS in log.c to generate the tracepoint implementations.
> 
> Add an id field to struct landlock_ruleset, assigned from
> landlock_get_id_range() at creation time.  Extend the CONFIG guard on
> landlock_get_id_range() from CONFIG_AUDIT to
> CONFIG_SECURITY_LANDLOCK_LOG so that IDs are available for tracing even
> without audit support.
> 
> The deallocation events use the "free_" prefix (rather than "drop_")
> because they fire when the object is actually freed.  There is no need
> for allocated/deallocated symmetry because ruleset creation happens with
> the landlock_create_ruleset tracepoint.
> 
> Unlike audit records which share a record type and need a "status="
> field to distinguish allocation from deallocation, tracepoints provide
> one event type per lifecycle transition, each with a type-safe TP_PROTO
> matching the specific transition.  This enables type-safe eBPF BTF
> access and precise ftrace filtering by event name.
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: Justin Suess <utilityemal77@gmail.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tingmao Wang <m@maowtm.org>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v1:
> - New patch (split from the v1 add_rule_fs tracepoint patch).
> ---
>  MAINTAINERS                     |  1 +
>  include/trace/events/landlock.h | 94 +++++++++++++++++++++++++++++++++
>  security/landlock/id.h          |  6 +--
>  security/landlock/log.c         |  5 ++
>  security/landlock/ruleset.c     |  8 +++
>  security/landlock/ruleset.h     |  9 ++++
>  security/landlock/syscalls.c    |  5 ++
>  7 files changed, 125 insertions(+), 3 deletions(-)
>  create mode 100644 include/trace/events/landlock.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c3fe46d7c4bc..51104faa3951 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14389,6 +14389,7 @@ F:	Documentation/admin-guide/LSM/landlock.rst
>  F:	Documentation/security/landlock.rst
>  F:	Documentation/userspace-api/landlock.rst
>  F:	fs/ioctl.c
> +F:	include/trace/events/landlock.h
>  F:	include/uapi/linux/landlock.h
>  F:	samples/landlock/
>  F:	security/landlock/
> diff --git a/include/trace/events/landlock.h b/include/trace/events/landlock.h
> new file mode 100644
> index 000000000000..5e847844fbf7
> --- /dev/null
> +++ b/include/trace/events/landlock.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright © 2025 Microsoft Corporation
> + * Copyright © 2026 Cloudflare
> + */
> +
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM landlock
> +
> +#if !defined(_TRACE_LANDLOCK_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_LANDLOCK_H
> +
> +#include <linux/tracepoint.h>
> +
> +struct landlock_ruleset;
> +
> +/**
> + * DOC: Landlock trace events
> + *
> + * Consistency guarantee: every trace event corresponds to an operation
> + * that has irrevocably succeeded.  Lifecycle events fire only after
> + * the point of no return; denial events fire only for denials that
> + * actually happen.  This guarantees that eBPF programs observing the
> + * trace stream can build a faithful model of Landlock state without
> + * reconciliation logic.
> + *
> + * Mutable object pointers in TP_PROTO (e.g., struct landlock_ruleset
> + * for add_rule events) are passed while the caller holds the object's
> + * lock, so that TP_fast_assign and eBPF programs reading via BTF see a
> + * consistent snapshot.  For objects that are immutable at the emission
> + * site (e.g., a domain after creation), no lock is needed.
> + *
> + * All pointer arguments in TP_PROTO are guaranteed non-NULL by the
> + * caller.  eBPF programs can access these pointers via BTF for richer
> + * introspection than the TP_STRUCT__entry fields provide.
> + *
> + * TP_STRUCT__entry fields serve TP_printk display only.  eBPF programs
> + * access the raw TP_PROTO arguments directly.
> + *
> + * Security: as for audit, Landlock trace events may expose sensitive
> + * information about all sandboxed processes on the system.  See
> + * Documentation/admin-guide/LSM/landlock.rst for security considerations
> + * and privilege requirements.
> + */
> +
> +/**
> + * landlock_create_ruleset - new ruleset created
> + * @ruleset: Newly created ruleset (never NULL); not yet shared via an fd,
> + *           so no lock is needed.  eBPF programs can read the full ruleset
> + *           state via BTF.
> + */
> +TRACE_EVENT(
> +	landlock_create_ruleset,
> +
> +	TP_PROTO(const struct landlock_ruleset *ruleset),
> +
> +	TP_ARGS(ruleset),
> +
> +	TP_STRUCT__entry(__field(__u64, ruleset_id) __field(access_mask_t,
> +							    handled_fs)
> +				 __field(access_mask_t, handled_net)
> +					 __field(access_mask_t, scoped)),
> +
> +	TP_fast_assign(__entry->ruleset_id = ruleset->id;
> +		       __entry->handled_fs = ruleset->layer.fs;
> +		       __entry->handled_net = ruleset->layer.net;
> +		       __entry->scoped = ruleset->layer.scope;),
> +
> +	TP_printk("ruleset=%llx handled_fs=0x%x handled_net=0x%x scoped=0x%x",
> +		  __entry->ruleset_id, __entry->handled_fs,
> +		  __entry->handled_net, __entry->scoped));
> +
> +/**
> + * landlock_free_ruleset - Ruleset freed
> + *
> + * Emitted when a ruleset's last reference is dropped (typically when
> + * the creating process closes the ruleset file descriptor).
> + */
> +TRACE_EVENT(landlock_free_ruleset,
> +
> +	    TP_PROTO(const struct landlock_ruleset *ruleset),
> +
> +	    TP_ARGS(ruleset),
> +
> +	    TP_STRUCT__entry(__field(__u64, ruleset_id)),
> +
> +	    TP_fast_assign(__entry->ruleset_id = ruleset->id;),
> +
> +	    TP_printk("ruleset=%llx", __entry->ruleset_id));
> +
> +#endif /* _TRACE_LANDLOCK_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/security/landlock/id.h b/security/landlock/id.h
> index 45dcfb9e9a8b..2a43c2b523a8 100644
> --- a/security/landlock/id.h
> +++ b/security/landlock/id.h
> @@ -8,18 +8,18 @@
>  #ifndef _SECURITY_LANDLOCK_ID_H
>  #define _SECURITY_LANDLOCK_ID_H
>  
> -#ifdef CONFIG_AUDIT
> +#ifdef CONFIG_SECURITY_LANDLOCK_LOG
>  
>  void __init landlock_init_id(void);
>  
>  u64 landlock_get_id_range(size_t number_of_ids);
>  
> -#else /* CONFIG_AUDIT */
> +#else /* CONFIG_SECURITY_LANDLOCK_LOG */
>  
>  static inline void __init landlock_init_id(void)
>  {
>  }
>  
> -#endif /* CONFIG_AUDIT */
> +#endif /* CONFIG_SECURITY_LANDLOCK_LOG */
>  
>  #endif /* _SECURITY_LANDLOCK_ID_H */
> diff --git a/security/landlock/log.c b/security/landlock/log.c
> index c9b506707af0..ef79e4ed0037 100644
> --- a/security/landlock/log.c
> +++ b/security/landlock/log.c
> @@ -174,6 +174,11 @@ static void audit_denial(const struct landlock_cred_security *const subject,
>  
>  #endif /* CONFIG_AUDIT */
>  
> +#ifdef CONFIG_TRACEPOINTS
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/landlock.h>
> +#endif /* CONFIG_TRACEPOINTS */
> +
>  static struct landlock_hierarchy *
>  get_hierarchy(const struct landlock_domain *const domain, const size_t layer)
>  {
> diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
> index c220e0f9cf5f..0d1e3dadb318 100644
> --- a/security/landlock/ruleset.c
> +++ b/security/landlock/ruleset.c
> @@ -22,10 +22,13 @@
>  #include <linux/spinlock.h>
>  
>  #include "access.h"
> +#include "id.h"
>  #include "limits.h"
>  #include "object.h"
>  #include "ruleset.h"
>  
> +#include <trace/events/landlock.h>
> +
>  struct landlock_ruleset *
>  landlock_create_ruleset(const access_mask_t fs_access_mask,
>  			const access_mask_t net_access_mask,
> @@ -49,6 +52,10 @@ landlock_create_ruleset(const access_mask_t fs_access_mask,
>  	new_ruleset->rules.root_net_port = RB_ROOT;
>  #endif /* IS_ENABLED(CONFIG_INET) */
>  
> +#ifdef CONFIG_SECURITY_LANDLOCK_LOG
> +	new_ruleset->id = landlock_get_id_range(1);
> +#endif /* CONFIG_SECURITY_LANDLOCK_LOG */

The addition of IDs to rulesets for logging makes sense.

But its usefulness is limited without some form of introspection that
lets userspace correlate an ID with a specific ruleset.

If a program creates multiple Landlock rulesets and wishes to trace and
correlate which ruleset FD corresponds to a log entry or tracepoint, that
is difficult when no form of introspection exists.

Maybe a syscall flag or ioctl to retrieve the identifier for correlation
would be useful?

Justin

> +
>  	/* Should already be checked in sys_landlock_create_ruleset(). */
>  	if (fs_access_mask) {
>  		WARN_ON_ONCE(fs_access_mask !=
> @@ -312,6 +319,7 @@ void landlock_free_rules(struct landlock_rules *const rules)
>  static void free_ruleset(struct landlock_ruleset *const ruleset)
> [...]

^ permalink raw reply

* Re: [PATCH v3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
From: Aaron Tomlin @ 2026-05-26 19:53 UTC (permalink / raw)
  To: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
	vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
	mkoutny
  Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	kprateek.nayak, omosnace, kees, neelx, sean, chjohnst, steve,
	mproche, nick.lange, cgroups, bpf, linux-mips, linux-fsdevel,
	linux-security-module, selinux, linux-kernel
In-Reply-To: <20260526142838.774711-1-atomlin@atomlin.com>

On Tue, May 26, 2026 at 10:28:38AM -0400, Aaron Tomlin wrote:
> At present, the task_setscheduler LSM hook provides security modules
> with the opportunity to mediate changes to a task's scheduling policy.
> However, when invoked via sched_setaffinity(), the hook lacks
> visibility into the actual CPU affinity mask being requested.
> Consequently, BPF-based security modules are entirely blind to the
> target CPUs and cannot make granular access control decisions based on
> spatial isolation.
> 
> In modern multi-tenant and real-time environments, CPU isolation is a
> critical boundary. The inability to audit or restrict specific CPU
> pinning requests limits the effectiveness of eBPF-driven security
> policies, particularly when attempting to shield isolated or
> cryptographic cores from unprivileged or compromised tasks.
> 
> This patch expands the security_task_setscheduler() hook signature to
> include a pointer to the requested cpumask. Because this is a shared
> hook used for multiple scheduling attribute changes, call sites that do
> not modify CPU affinity are updated to safely pass NULL.
> To protect against unverified dereferences, the parameter is annotated
> with __nullable in the LSM hook definition, ensuring the BPF verifier
> mandates explicit NULL checks for attached eBPF programs.
> 
> This change updates all in-tree security modules (SELinux and Smack) to
> accommodate the new parameter mechanically, whilst providing BPF LSMs
> with the necessary context to enforce strict affinity policies.


Adding BPF Core to review the use of the "__nullable" annotation in the LSM
hook definition.



Kind regards,
-- 
Aaron Tomlin

^ permalink raw reply

* Re: [PATCH] firmware: arm_ffa: Treat missing FF-A feature on a platform as a probe miss
From: Nathan Chancellor @ 2026-05-26 19:35 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, Yeoreum Yun
In-Reply-To: <20260526103649.5684-1-sudeep.holla@kernel.org>

On Tue, May 26, 2026 at 11:36:49AM +0100, Sudeep Holla wrote:
> When FF-A initialisation is driven from a platform device probe, systems
> that do not implement FF-A can return -EOPNOTSUPP from the early transport
> or version discovery paths. Driver core treats that as a matched probe
> failure and prints:
> 
>   |  arm-ffa arm-ffa: probe with driver arm-ffa failed with error -95
> 
> That is noisy for a firmware interface that can be absent on otherwise
> valid systems. Driver core already treats -ENODEV and -ENXIO as quiet
> rejected matches, so translate only the early unsupported discovery cases
> to -ENODEV. Keep later setup failures unchanged so real FF-A
> initialisation problems are still reported as probe failures.
> 
> Reported-by: Nathan Chancellor <nathan@kernel.org>
> Closes: https://lore.kernel.org/all/20260523001148.GA1319283@ax162
> Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>

Appears to work for me.

Tested-by: Nathan Chancellor <nathan@kernel.org>

> ---
>  drivers/firmware/arm_ffa/driver.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
> index 54984e1b9741..0f468362c288 100644
> --- a/drivers/firmware/arm_ffa/driver.c
> +++ b/drivers/firmware/arm_ffa/driver.c
> @@ -2109,7 +2109,7 @@ static int ffa_probe(struct platform_device *pdev)
>  
>  	ret = ffa_transport_init(&invoke_ffa_fn);
>  	if (ret)
> -		return ret;
> +		return ret == -EOPNOTSUPP ? -ENODEV : ret;
>  
>  	drv_info = kzalloc_obj(*drv_info);
>  	if (!drv_info)
> @@ -2117,8 +2117,11 @@ static int ffa_probe(struct platform_device *pdev)
>  	platform_set_drvdata(pdev, drv_info);
>  
>  	ret = ffa_version_check(&drv_info->version);
> -	if (ret)
> +	if (ret) {
> +		if (ret == -EOPNOTSUPP)
> +			ret = -ENODEV;
>  		goto free_drv_info;
> +	}
>  
>  	if (ffa_id_get(&drv_info->vm_id)) {
>  		pr_err("failed to obtain VM id for self\n");
> -- 
> 2.43.0
> 
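
For readers following the thread, the errno translation in the quoted
patch can be sketched in plain userspace C. This is only an illustration
under stated assumptions: ffa_quiet_probe_errno() and
probe_is_quiet_reject() are hypothetical names, not functions from the
FF-A driver or from driver core, and the "quiet" check merely mirrors the
-ENODEV/-ENXIO behaviour described in the commit message.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (not part of the patch): map the "feature absent"
 * errno to -ENODEV and pass every other value through unchanged.
 */
static int ffa_quiet_probe_errno(int ret)
{
	return ret == -EOPNOTSUPP ? -ENODEV : ret;
}

/*
 * Userspace stand-in for the rule quoted in the commit message:
 * -ENODEV and -ENXIO count as quiet rejected matches.
 */
static int probe_is_quiet_reject(int ret)
{
	return ret == -ENODEV || ret == -ENXIO;
}

/* Usage example: only the "unsupported" case becomes a quiet rejection. */
static void ffa_errno_self_check(void)
{
	assert(probe_is_quiet_reject(ffa_quiet_probe_errno(-EOPNOTSUPP)));
	assert(!probe_is_quiet_reject(ffa_quiet_probe_errno(-ENOMEM)));
	assert(ffa_quiet_probe_errno(0) == 0);
}
```

The point of translating only -EOPNOTSUPP is that real setup failures
such as -ENOMEM keep their original value and are still reported as
probe failures.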

-- 
Cheers,
Nathan


* Re: [PATCH bpf-next 00/13] Signed BPF + IPE Policies
From: KP Singh @ 2026-05-26 16:23 UTC
  To: Blaise Boscaccy
  Cc: Linux Security Module list, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kumar Kartikeya Dwivedi, James Bottomley,
	Paul Moore
In-Reply-To: <87zf1qxeyl.fsf@microsoft.com>

On Sat, 23 May 2026, 18:34 Blaise Boscaccy,
<bboscaccy@linux.microsoft.com> wrote:
>
> KP Singh <kpsingh@kernel.org> writes:
>
> > This series continues the "Signed BPF programs" work and adds
> > the missing pieces needed for an LSM to do policy enforcement
> > and addresses the concerns raised by the developers of Hornet.
> >
> > One signing scheme, please.
> >
> > BPF does not need a second signing scheme. It needs a policy
> > framework that consumes the verdict the existing signing pipeline
> > produces. Two parallel signing stacks is harmful UX for Cilium,
> > bpftrace, systemd, distros, and everyone shipping signed lskels.
> > Hornet has been NACK'd repeatedly by the BPF maintainers [1][2]
> > on layering and TOCTOU grounds.
> >
> > What this series adds
> >
> > - prog->aux->sig (verdict + keyring) and prog->aux->is_kernel,
> >   populated by the syscall path before security_bpf_prog_load
> >   fires.
> > - bpf_loader_verify_metadata kfunc -- the metadata check is now
> >   kernel C code, not BPF bytecode. The verifier injects the
> >   calling prog->aux as an implicit argument via KF_IMPLICIT_ARGS.
> > - Loader-side prog BTF with BPF_PSEUDO_KFUNC_CALL_PROG_BTF so
> >   the kfunc CALL is reproducible across build hosts and resolved
> >   at load time.
> > - security_bpf_prog_load_post_integrity LSM hook, fired by the
> >   kfunc on a successful metadata check.
> > - IPE properties (bpf_signature, bpf_keyring, bpf_kernel) and
> >   two ops (BPF_PROG_LOAD, BPF_PROG_LOAD_POST_INTEGRITY).
> >
> > This series addresses concerns raised by the Hornet developers:
> >
> > * The metadata hash check should be in kernel C, not BPF
> >   bytecode -- Blaise Boscaccy [3]:
> >
>
> That's a gross misrepresentation of some of my previous statements on
> the subject. We can go back and forth on this until the cows come home
> with increasingly vitriolic rhetoric, but that's really just a waste of
> everyone's time. Your "trusted loader" design flat-out doesn't work for

I totally agree, I have wasted a lot of time arguing with you and it's
up to the maintainers and Linus to decide here.

> our security requirements, and those of others. You keep screaming that
> we need to "write our own trusted loader" and that isn't really solving
> anything.
>
> You just posted a trusted loader bugfix here.
> https://lore.kernel.org/linux-security-module/20260522215337.662271-1-kpsingh@kernel.org/
>
> What's your path for that now and in the future? How are you getting
> people to rebuild their out-of-tree trusted loaders if there is a bug in
> them? Are you expecting sysadmins to subscribe to the bpf mailing list
> and watch for patches to libbpf and then rebuild an entire corpus of
> eBPF lskel programs?

How do you get people to update their software? Don't you folks update
libraries? If users write their own loaders, they are responsible for
vulnerability management, just as they are for any other piece of
software. Not all trusted software lives in the kernel (e.g. systemd,
privileged daemons, software with access to sensitive credentials).

Your arguments are illogical and abrasive. Furthermore, your
requirements are a moving target and you haven't explained the threat
model supporting them, despite repeated requests.

>
> What if there is a security vulnerability or a CVE in the generated code
> that gets emitted, how are you handling that? We have processes in place
> to handle updates, bugfixes and vulnerabilities in the kernel. None
> exist for your "trusted loader" paradigm. You can publish a CVE for
> libbpf, but there is no way to publish a CVE for an infinite number of
> random unknown bpf programs in the wild or to notify users that their
> programs are affected, or for them to know which programs are actually
> affected and which ones aren't.
>
> Also as an aside, it looks like some of this patchset is copy-pasted
> from https://lore.kernel.org/linux-security-module/20260507191416.2984054-11-bboscaccy@linux.microsoft.com/
> Which is fine of course, since this is open source software and all, but
> attribution would be appreciated if you use my code in the future :)

There are only so many ways to write IPE policies, but I am happy to
add attribution; I will add it in the next rev.

- KP

>
>
> -blaise
>
>
>
> >   The bpf_loader_verify_metadata kfunc moves the hash check from
> >   inline BPF instructions into kernel C code.
> >
> > * LSMs cannot observe the verification result at hook time --
> >   Paul Moore [4]:
> >
> >   prog->aux->sig.verdict and sig.keyring are populated before any
> >   LSM hook runs. Furthermore, security_bpf_prog_load_post_integrity
> >   hook fires after the in-kernel hash check for consumers that want
> >   to observe or gate the post-integrity transition.
> >
> >
> > [1] Alexei Starovoitov, NACK on Hornet (TOCTOU + layering),
> >     https://lore.kernel.org/all/CAADnVQJ1CRvTXBU771KaYzrx-vRaWF+k164DcFOqOsCxmuL+ig@mail.gmail.com/
> > [2] Daniel Borkmann, NACK on Hornet v3,
> >     https://lore.kernel.org/all/798dba24-b5a7-4584-a1f6-793883fe9b5e@iogearbox.net/
> > [3] Blaise Boscaccy, Hornet v6 (C-side hash verification rationale),
> >     https://lore.kernel.org/all/20260429191431.2345448-1-bboscaccy@linux.microsoft.com/
> > [4] Paul Moore, push for post-verifier observability,
> >     https://lore.kernel.org/all/CACYkzJ4+=3owK+ELD9Nw7Rrm-UajxXEw8kVtOTJJ+SNAXpsOpw@mail.gmail.com/
> >
> >
> > KP Singh (13):
> >   bpf: expose signature verdict to LSMs via bpf_prog_aux
> >   bpf: include prog BTF in the signed loader signature scope
> >   bpf, libbpf: load prog BTF in the skel_internal loader
> >   bpf: add bpf_loader_verify_metadata kfunc
> >   bpf: compute prog->digest at BPF_PROG_LOAD entry
> >   bpf: resolve loader-style kfunc CALLs against prog BTF
> >   libbpf: generate prog BTF for loader programs
> >   bpftool gen: embed loader prog BTF in the lskel header
> >   lsm: add bpf_prog_load_post_integrity hook
> >   bpf: invoke security_bpf_prog_load_post_integrity from the metadata
> >     kfunc
> >   ipe: add BPF program signature properties
> >   ipe: gate post-integrity BPF program loads
> >   selftests/bpf: add IPE BPF policy integration tests
> >
> >  include/linux/bpf.h                           |  19 +++
> >  include/linux/bpf_verifier.h                  |   6 +
> >  include/linux/btf.h                           |   1 +
> >  include/linux/lsm_hook_defs.h                 |   1 +
> >  include/linux/security.h                      |   6 +
> >  include/uapi/linux/bpf.h                      |   5 +
> >  kernel/bpf/btf.c                              |   8 +
> >  kernel/bpf/check_btf.c                        |  18 +-
> >  kernel/bpf/helpers.c                          |  65 ++++++++
> >  kernel/bpf/syscall.c                          |  76 ++++++++-
> >  kernel/bpf/verifier.c                         |  58 ++++++-
> >  security/ipe/Kconfig                          |  14 ++
> >  security/ipe/audit.c                          |  13 ++
> >  security/ipe/eval.c                           |  57 +++++++
> >  security/ipe/eval.h                           |   5 +
> >  security/ipe/hooks.c                          |  42 +++++
> >  security/ipe/hooks.h                          |   9 +
> >  security/ipe/ipe.c                            |   4 +
> >  security/ipe/policy.h                         |  11 ++
> >  security/ipe/policy_parser.c                  |  20 +++
> >  security/security.c                           |  17 ++
> >  tools/bpf/bpftool/gen.c                       |  21 +++
> >  tools/bpf/bpftool/sign.c                      |  17 +-
> >  tools/include/uapi/linux/bpf.h                |   5 +
> >  tools/lib/bpf/bpf_gen_internal.h              |   2 +
> >  tools/lib/bpf/gen_loader.c                    | 127 +++++++++++---
> >  tools/lib/bpf/libbpf.h                        |   4 +-
> >  tools/lib/bpf/skel_internal.h                 |  67 +++++---
> >  .../selftests/bpf/test_signed_bpf_ipe.sh      | 156 ++++++++++++++++++
> >  tools/testing/selftests/bpf/vmtest.sh         |   4 +-
> >  30 files changed, 775 insertions(+), 83 deletions(-)
> >  create mode 100755 tools/testing/selftests/bpf/test_signed_bpf_ipe.sh
> >
> > --
> > 2.53.0


* Re: [PATCH v5 13/13] doc: security: Add documentation of the IMA staging mechanism
From: Mimi Zohar @ 2026-05-26 15:53 UTC
  To: Roberto Sassu, corbet, skhan, dmitry.kasatkin, eric.snowberg,
	paul, jmorris, serge
  Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
	gregorylumen, chenste, nramas, Roberto Sassu
In-Reply-To: <20260429160319.4162918-14-roberto.sassu@huaweicloud.com>

"staging" is a method of exporting and deleting IMA measurement list records
from kernel memory. The Subject line and document should be more generic. Please
update the Subject line to something like "ima: exporting and deleting IMA
measurement records from kernel memory".

On Wed, 2026-04-29 at 18:03 +0200, Roberto Sassu wrote:
> From: Roberto Sassu <roberto.sassu@huawei.com>
> 
> Add the documentation of the IMA staging mechanism in
> Documentation/security/IMA-staging.rst.

Please update the name.

> 
> Also add the missing Documentation/security/IMA-templates.rst file in
> MAINTAINERS.
> 
> Link: https://github.com/linux-integrity/linux/issues/1
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> ---
>  Documentation/security/IMA-staging.rst | 163 +++++++++++++++++++++++++
>  Documentation/security/index.rst       |   1 +
>  MAINTAINERS                            |   2 +
>  3 files changed, 166 insertions(+)
>  create mode 100644 Documentation/security/IMA-staging.rst
> 
> diff --git a/Documentation/security/IMA-staging.rst b/Documentation/security/IMA-staging.rst
> new file mode 100644
> index 000000000000..de6428893f0e
> --- /dev/null
> +++ b/Documentation/security/IMA-staging.rst
> @@ -0,0 +1,163 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==================================
> +IMA Measurements Staging Mechanism
> +==================================
> +

Please update the name here as well.  The introduction describes the motivation
for the feature of exporting and deleting measurement records.  The concept of
"staging" measurement records to be deleted should be described later.

> +
> +Introduction
> +============
> +
> +The IMA measurements list is currently stored in the kernel memory. Memory
> +occupation grows linearly with the number of entries, and can become a
> +problem especially in environments with reduced resources.
> +
> +While there is an advantage in keeping the IMA measurements list in kernel
> +memory, so that it is always available for reading from the securityfs
> +interfaces, storing it elsewhere would make it possible to free precious
> +memory for other kernel components.
> +
> +Storing the IMA measurements list outside the kernel does not introduce
> +security issues, since its integrity is anyway protected by the TPM.
> +
> +Hence, the new IMA staging mechanism is introduced to allow user space
> +to remove the desired portion of the measurements list from the kernel.

Please incorporate my previous comments on 00/13 here and, perhaps, add more
details (e.g. userspace application for saving and returning the complete
measurement list).

Introduce the concept of "staging" here. (There's quite a bit of code needed for
staging the measurement list.)  Please include an explanation of the staging
benefits (e.g. locking).

> +
> +
> +Usage
> +=====

Readers need to understand the implications of enabling the CONFIG_IMA_STAGING
feature before being told how to enable it.  Either move the "Usage" section
after "Management of Staged Measurements" and "Remote Attestation Agent
Workflow", or introduce the concepts here.

> +
> +The IMA staging mechanism can be enabled from the kernel configuration with
> +the CONFIG_IMA_STAGING option.

Referring to the exporting and deleting mechanism as "staging" from here on is fine.

thanks,

Mimi

> +
> +If it is enabled, IMA duplicates the current measurements interfaces (both
> +binary and ASCII), by adding the ``_staged`` file suffix. Both the original
> +and the staging interfaces gain the write permission for the root user and
> +group, but require the process to have CAP_SYS_ADMIN set.
> +
> +The staging mechanism supports two flavors.
> +
> +
> +Staging with prompt
> +~~~~~~~~~~~~~~~~~~~
> +
> +The current measurements list is moved to a temporary staging area, and
> +staged measurements are deleted upon confirmation.
> +
> +This staging process is achieved with the following steps.
> +
> + 1. ``echo A > <original interface>``: the user requests IMA to stage the
> +    entire measurements list;
> + 2. ``cat <_staged interface>``: the user reads the staged measurements;
> + 3. ``echo D > <_staged interface>``: the user requests IMA to delete
> +    staged measurements.
> +
> +
> +Staging and deleting
> +~~~~~~~~~~~~~~~~~~~~
> +
> +N measurements are staged to a temporary staging area, and immediately
> +deleted without further confirmation.
> +
> +This staging process is achieved with the following steps.
> +
> + 1. ``cat <original interface>``: the user reads the current measurements
> +    list and determines what the value N for staging should be;
> + 2. ``echo N > <original interface>``: the user requests IMA to delete N
> +    measurements from the current measurements list.
> +
> +
> +Interface Access
> +================
> +
> +In order to avoid the IMA measurements list be suddenly truncated by the
> +staging mechanism during a read, or having multiple concurrent staging, a
> +semaphore-like locking scheme has been implemented on all the measurements
> +list interfaces.
> +
> +Multiple readers can access concurrently the original and staged
> +interfaces, and they can be in mutual exclusion with one writer.
> +
> +If an illegal access occurs, the open to the measurements list interface is
> +denied.
> +
> +
> +Management of Staged Measurements
> +=================================
> +
> +Since with the staging mechanism measurement entries are removed from the
> +kernel, the user needs to save the staged ones in a storage and concatenate
> +them together, so that it can present them to remote attestation agents as
> +if staging was never done.
> +
> +Coordination is necessary in the case where there are multiple actors
> +requesting measurements to be staged.
> +
> +In the staging with prompt case, the measurement interfaces can be accessed
> +only by one actor (writer) at a time, so the others will get an error until
> +the former closes it. Since the actors don't care about N, when they gain
> +access to the interface, they will get all the staged measurements at the
> +time of their request.
> +
> +In the case of staging and deleting, coordination is more important, since
> +there is the risk that two actors unaware of each other compute the value N
> +on the current measurements list and request IMA to stage N twice.
> +
> +
> +Remote Attestation Agent Workflow
> +=================================
> +
> +Users can choose the staging method they find more appropriate for their
> +workflow.
> +
> +If, as an example, a remote attestation agent would like to present to the
> +remote attestation server only the measurements that are required to
> +verify the TPM quote, its workflow would be the following.
> +
> +With staging with prompt, the agent stages the current measurements list,
> +reads and stores the measurements in a storage and immediately requests
> +IMA to delete the staged measurements from kernel memory. Afterwards, it
> +calculates N by replaying the PCR extend on the stored measurements until
> +the calculated PCRs match the quoted PCRs. It then keeps the measurements
> +in excess for the next attestation request.
> +
> +At the next attestation request, the agent performs the same steps above,
> +and concatenates the new measurements to the ones in excess from the
> +previous request. Also in this case, the agent replays the PCR extend until
> +it matches the currently quoted PCRs, keeps the measurements in excess and
> +presents the new N measurements entries to the remote attestation server.
> +
> +With the staging and deleting method, the agent reads the current
> +measurements list, calculates N and requests IMA to delete only those. The
> +measurements in excess are kept in the IMA measurements list and can be
> +retrieved at the next remote attestation request.
> +
> +Kexec
> +=====
> +
> +In the event a kexec() system call occurs between staging and deleting, the
> +staged measurements entries are prepended to the current measurements list,
> +so that they are both available when the secondary kernel starts. In that
> +case, IMA returns an error to the user when attempting to delete staged
> +measurements to notify about their copy to the kexec buffer, so that the
> +user does not store them twice.
> +
> +
> +Hash table
> +==========
> +
> +By default, the template digest of staged measurement entries are kept in
> +kernel memory (only template data are freed), to be able to detect
> +duplicate entries independently of staging.
> +
> +The new kernel option ``ima_flush_htable`` has been introduced to
> +explicitly request a complete deletion of the staged measurements, for
> +maximum kernel memory saving. If the option has been specified, duplicate
> +entries are still avoided on entries of the current measurements list,
> +but there can be duplicates between different groups of staged
> +measurements.
> +
> +Flushing the hash table is supported only for the staging with prompt
> +flavor. For the staging and deleting flavor, it would have been necessary
> +to lock the hot path adding new measurements for the time needed to remove
> +each selected measurement individually.
> diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
> index 3e0a7114a862..cec064dc1c83 100644
> --- a/Documentation/security/index.rst
> +++ b/Documentation/security/index.rst
> @@ -8,6 +8,7 @@ Security Documentation
>     credentials
>     snp-tdx-threat-model
>     IMA-templates
> +   IMA-staging
>     keys/index
>     lsm
>     lsm-development
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2fb1c75afd16..5bc816ab4a5b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12740,6 +12740,8 @@ R:	Eric Snowberg <eric.snowberg@oracle.com>
>  L:	linux-integrity@vger.kernel.org
>  S:	Supported
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
> +F:	Documentation/security/IMA-staging.rst
> +F:	Documentation/security/IMA-templates.rst
>  F:	include/linux/secure_boot.h
>  F:	security/integrity/
>  F:	security/integrity/ima/


* [PATCH RESEND 1/1] yama: clean-up ptrace relations upon activating YAMA_SCOPE_NO_ATTACH
From: Ethan Ferguson @ 2026-05-26 15:35 UTC
  To: kees, paul, jmorris, serge
  Cc: linux-security-module, linux-kernel, Ethan Ferguson
In-Reply-To: <20260526153542.105483-1-ethan.ferguson@zetier.com>

Clean up ptracer_relations upon YAMA_SCOPE_NO_ATTACH, and prevent
further modification by processes.

Signed-off-by: Ethan Ferguson <ethan.ferguson@zetier.com>

---
 security/yama/yama_lsm.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index cef3776cf3b2..3b7c5384e6bc 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -26,6 +26,7 @@
 #define YAMA_SCOPE_NO_ATTACH	3
 
 static int ptrace_scope = YAMA_SCOPE_RELATIONAL;
+static int max_scope = YAMA_SCOPE_NO_ATTACH;
 
 /* describe a ptrace relationship for potential exception */
 struct ptrace_relation {
@@ -119,7 +120,7 @@ static void yama_relation_cleanup(struct work_struct *work)
 	spin_lock(&ptracer_relations_lock);
 	rcu_read_lock();
 	list_for_each_entry_rcu(relation, &ptracer_relations, node) {
-		if (relation->invalid) {
+		if (relation->invalid || ptrace_scope == max_scope) {
 			list_del_rcu(&relation->node);
 			kfree_rcu(relation, rcu);
 		}
@@ -204,7 +205,8 @@ static void yama_ptracer_del(struct task_struct *tracer,
  */
 static void yama_task_free(struct task_struct *task)
 {
-	yama_ptracer_del(task, task);
+	if (ptrace_scope <= max_scope)
+		yama_ptracer_del(task, task);
 }
 
 /**
@@ -224,6 +226,9 @@ static int yama_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 	int rc = -ENOSYS;
 	struct task_struct *myself;
 
+	if (ptrace_scope == max_scope)
+		return -EPERM;
+
 	switch (option) {
 	case PR_SET_PTRACER:
 		/* Since a thread can call prctl(), find the group leader
@@ -432,6 +437,7 @@ static struct security_hook_list yama_hooks[] __ro_after_init = {
 static int yama_dointvec_minmax(const struct ctl_table *table, int write,
 				void *buffer, size_t *lenp, loff_t *ppos)
 {
+	int ret;
 	struct ctl_table table_copy;
 
 	if (write && !capable(CAP_SYS_PTRACE))
@@ -442,10 +448,17 @@ static int yama_dointvec_minmax(const struct ctl_table *table, int write,
 	if (*(int *)table_copy.data == *(int *)table_copy.extra2)
 		table_copy.extra1 = table_copy.extra2;
 
-	return proc_dointvec_minmax(&table_copy, write, buffer, lenp, ppos);
-}
+	ret = proc_dointvec_minmax(&table_copy, write, buffer, lenp, ppos);
+	if (ret < 0)
+		return ret;
 
-static int max_scope = YAMA_SCOPE_NO_ATTACH;
+	/* If max_scope was just activated in this call */
+	if (*(int *)table_copy.data == *(int *)table_copy.extra2 &&
+	    table_copy.extra1 != table_copy.extra2)
+		schedule_work(&yama_relation_work);
+
+	return 0;
+}
 
 static const struct ctl_table yama_sysctl_table[] = {
 	{
-- 
2.43.0



* [PATCH RESEND 0/1] yama: clean-up ptrace relations upon activating YAMA_SCOPE_NO_ATTACH
From: Ethan Ferguson @ 2026-05-26 15:35 UTC
  To: kees, paul, jmorris, serge
  Cc: linux-security-module, linux-kernel, Ethan Ferguson

Once yama's ptrace_scope is set to its maximum value (currently
YAMA_SCOPE_NO_ATTACH), all ptrace actions are denied from then on.
However, processes may still add ptrace relations, and the memory
used to store these relations stays allocated even though it is never
used again.

This patch frees all memory related to ptracer_relations once
YAMA_SCOPE_NO_ATTACH is activated, and additionally disallows further
modification of ptracer_relations by processes.
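
The intended behaviour can be modelled in userspace as a rough sketch.
Everything here is an assumption made for illustration: struct
scope_state, scope_write() and relation_modify_allowed() are hypothetical
names, the cleanup_queued flag stands in for schedule_work(), and the
range check only imitates what a min/max sysctl handler would reject.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SCOPE_MIN 0
#define SCOPE_MAX 3	/* mirrors YAMA_SCOPE_NO_ATTACH */

struct scope_state {
	int scope;		/* current ptrace_scope value */
	bool cleanup_queued;	/* stands in for schedule_work() */
};

/*
 * Model of a sysctl write: once the value sits at SCOPE_MAX the lower
 * bound is raised to SCOPE_MAX, so it can never be lowered again.
 * Cleanup is queued only by the write that first reaches SCOPE_MAX.
 */
static int scope_write(struct scope_state *st, int new_scope)
{
	bool was_max = (st->scope == SCOPE_MAX);
	int min = was_max ? SCOPE_MAX : SCOPE_MIN;

	if (new_scope < min || new_scope > SCOPE_MAX)
		return -EINVAL;

	st->scope = new_scope;
	if (!was_max && st->scope == SCOPE_MAX)
		st->cleanup_queued = true;
	return 0;
}

/* Model of the prctl() gate: no relation changes once at SCOPE_MAX. */
static int relation_modify_allowed(const struct scope_state *st)
{
	return st->scope == SCOPE_MAX ? -EPERM : 0;
}

/* Usage example: the first write of SCOPE_MAX queues cleanup exactly once. */
static void scope_self_check(void)
{
	struct scope_state st = { .scope = 1, .cleanup_queued = false };

	assert(relation_modify_allowed(&st) == 0);
	assert(scope_write(&st, SCOPE_MAX) == 0);
	assert(st.cleanup_queued);
	assert(relation_modify_allowed(&st) == -EPERM);
}
```

Recording whether the value was already at the maximum before the write
is what distinguishes "just activated" from a repeated write of the same
value, so the cleanup work is not queued again needlessly.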

Ethan Ferguson (1):
  yama: clean-up ptrace relations upon activating YAMA_SCOPE_NO_ATTACH

 security/yama/yama_lsm.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

base-commit: cf2f06f7152d
-- 
2.43.0


* Re: [PATCH] tomoyo: Fix NULL pointer dereference in tomoyo_init_request_info() when domain is NULL
From: Tetsuo Handa @ 2026-05-26 14:37 UTC
  To: Jiakai Xu
  Cc: jmorris, linux-kernel, linux-security-module, paul, serge,
	takedakn
In-Reply-To: <20260526135859.3211799-1-xujiakai24@mails.ucas.ac.cn>

On 2026/05/26 22:58, Jiakai Xu wrote:
>> Thank you for a patch, but I don't think we need this change.
> 
> Thanks for your review! I understand your perspective, but I believe
> the crash is a real NULL pointer dereference, and I'd like to explain
> why the defensive check is warranted.
> 
>> TOMOYO's initial domain is &tomoyo_kernel_domain, and each thread belongs to
>> a non-NULL domain. Therefore, tomoyo_domain() is not supposed to return NULL.
> 
> While tomoyo_domain() is not supposed to return NULL under normal
> operation, there are code paths that leave s->domain_info == NULL:
> 
>   a) Pre-init window (security/tomoyo/tomoyo.c, lines 598-612):
>      The task security blob is zero-allocated via kzalloc(), and
>      security_add_hooks() at line 603 is called BEFORE
>      s->domain_info = &tomoyo_kernel_domain at line 606. If any LSM
>      hook fires during that window, tomoyo_domain() returns NULL.

This code is executed during the early boot stage. Other LSM hooks are not
supposed to fire.

> 
>   b) tomoyo_task_free() (tomoyo.c, lines 533-545) explicitly sets
>      s->domain_info = NULL after decrementing the refcount.

This code is executed when a "struct task_struct" is about to be released.
Nobody can find this "struct task_struct". Also, this "struct task_struct"
cannot be the current thread.

> 
>   c) tomoyo_find_next_domain() (domain.c, lines 876-883) writes
>      s->domain_info = NULL when the domain transition fails.

I couldn't follow this one, but old_domain is initialized as

  struct tomoyo_domain_info *old_domain = tomoyo_domain();

which cannot be NULL.

domain is guaranteed to be non-NULL because old_domain cannot be NULL.

	if (!domain)
		domain = old_domain;

Therefore, s->domain_info is guaranteed to be non-NULL because domain cannot be NULL.

	s->domain_info = domain;

If domain were NULL, the kernel should have already crashed at line 884.

> 
> I think adding a NULL check makes the code more robust. What do you 
> think?

Then this would be a NULL pointer dereference.
But the correct approach is to fix the location that sets NULL.



* [PATCH v3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
From: Aaron Tomlin @ 2026-05-26 14:28 UTC
  To: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
	vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
	mkoutny
  Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	kprateek.nayak, omosnace, kees, atomlin, neelx, sean, chjohnst,
	steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
	linux-security-module, selinux, linux-kernel

At present, the task_setscheduler LSM hook provides security modules
with the opportunity to mediate changes to a task's scheduling policy.
However, when invoked via sched_setaffinity(), the hook lacks
visibility into the actual CPU affinity mask being requested.
Consequently, BPF-based security modules are entirely blind to the
target CPUs and cannot make granular access control decisions based on
spatial isolation.

In modern multi-tenant and real-time environments, CPU isolation is a
critical boundary. The inability to audit or restrict specific CPU
pinning requests limits the effectiveness of eBPF-driven security
policies, particularly when attempting to shield isolated or
cryptographic cores from unprivileged or compromised tasks.

This patch expands the security_task_setscheduler() hook signature to
include a pointer to the requested cpumask. Because this is a shared
hook used for multiple scheduling attribute changes, call sites that do
not modify CPU affinity are updated to safely pass NULL.
To protect against unverified dereferences, the parameter is annotated
with __nullable in the LSM hook definition, ensuring the BPF verifier
mandates explicit NULL checks for attached eBPF programs.

This change updates all in-tree security modules (SELinux and Smack) to
accommodate the new parameter mechanically, whilst providing BPF LSMs
with the necessary context to enforce strict affinity policies.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
This patch is strictly dependent on the prior acceptance of "mips:
sched: Fix CPUMASK_OFFSTACK memory corruption in MT fpaff" (Message-ID:
20260526141651.773306-1-atomlin@atomlin.com), as expanding the LSM hook
signature requires passing the mask pointer from
mipsmt_sys_sched_setaffinity().

Changes since v2 [1]:
 - Dropped patch 1. This is to be addressed by the cgroup cpuset
   maintainer (Waiman Long)

 - Dropped patch 3. Will be submitted as a separate patch (Paul Moore)

Changes since v1 [2]:
 - Reordered the allocation and user-copy of new_mask in the MIPS
   architecture's mipsmt_sys_sched_setaffinity() to occur before the
   LSM hook is invoked. This ensures the security modules evaluate a fully
   populated mask rather than uninitialised memory, while cleanly handling
   error unwinding

 - Updated cpuset_can_fork() to pass the destination cpuset's effective CPU
   mask instead of NULL

[1]: https://lore.kernel.org/lkml/20260509213803.968464-1-atomlin@atomlin.com/
[2]: https://lore.kernel.org/lkml/20260509164847.939294-1-atomlin@atomlin.com/
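
As a rough illustration of the NULL-check requirement described in the
commit message, here is a userspace sketch of a hook consumer. It is not
a BPF program and not kernel code: struct toy_cpumask is a toy stand-in
for struct cpumask, toy_task_setscheduler() is a hypothetical name, and
the "isolated CPUs 2 and 3" policy is an assumption made up for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct cpumask: bit n set means CPU n is requested. */
struct toy_cpumask {
	uint64_t bits;
};

/* Hypothetical policy: CPUs 2 and 3 are isolated and may not be targeted. */
#define ISOLATED_CPUS ((UINT64_C(1) << 2) | (UINT64_C(1) << 3))

static int toy_task_setscheduler(const struct toy_cpumask *in_mask)
{
	/* NULL means the caller is not changing affinity; nothing to check. */
	if (!in_mask)
		return 0;

	/* Deny any request that touches an isolated CPU. */
	if (in_mask->bits & ISOLATED_CPUS)
		return -EPERM;

	return 0;
}

/* Usage example: NULL and housekeeping CPUs pass, isolated CPUs do not. */
static void toy_hook_self_check(void)
{
	const struct toy_cpumask housekeeping = { .bits = UINT64_C(0x3) };
	const struct toy_cpumask isolated = { .bits = UINT64_C(1) << 2 };

	assert(toy_task_setscheduler(NULL) == 0);
	assert(toy_task_setscheduler(&housekeeping) == 0);
	assert(toy_task_setscheduler(&isolated) == -EPERM);
}
```

The NULL test comes first because most call sites of the shared hook do
not change affinity and pass no mask at all.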
---
 arch/mips/kernel/mips-mt-fpaff.c |  2 +-
 fs/proc/base.c                   |  2 +-
 include/linux/lsm_hook_defs.h    |  3 ++-
 include/linux/security.h         | 11 +++++++----
 kernel/cgroup/cpuset.c           |  4 ++--
 kernel/sched/syscalls.c          |  4 ++--
 security/commoncap.c             |  7 +++++--
 security/security.c              | 11 ++++++-----
 security/selinux/hooks.c         |  3 ++-
 security/smack/smack_lsm.c       | 11 +++++++++--
 10 files changed, 37 insertions(+), 21 deletions(-)

diff --git a/arch/mips/kernel/mips-mt-fpaff.c b/arch/mips/kernel/mips-mt-fpaff.c
index 4fead87d2f43..c68d1676350e 100644
--- a/arch/mips/kernel/mips-mt-fpaff.c
+++ b/arch/mips/kernel/mips-mt-fpaff.c
@@ -110,7 +110,7 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
 		goto out_unlock;
 	}
 
-	retval = security_task_setscheduler(p);
+	retval = security_task_setscheduler(p, new_mask);
 	if (retval)
 		goto out_unlock;
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d9acfa89c894..ac4096958a00 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2619,7 +2619,7 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
 		}
 		rcu_read_unlock();
 
-		err = security_task_setscheduler(p);
+		err = security_task_setscheduler(p, NULL);
 		if (err) {
 			count = err;
 			goto out;
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2b8dfb35caed..6ec7bc04a1b7 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -255,7 +255,8 @@ LSM_HOOK(int, 0, task_prlimit, const struct cred *cred,
 	 const struct cred *tcred, unsigned int flags)
 LSM_HOOK(int, 0, task_setrlimit, struct task_struct *p, unsigned int resource,
 	 struct rlimit *new_rlim)
-LSM_HOOK(int, 0, task_setscheduler, struct task_struct *p)
+LSM_HOOK(int, 0, task_setscheduler, struct task_struct *p,
+	 const struct cpumask *in_mask__nullable)
 LSM_HOOK(int, 0, task_getscheduler, struct task_struct *p)
 LSM_HOOK(int, 0, task_movememory, struct task_struct *p)
 LSM_HOOK(int, 0, task_kill, struct task_struct *p, struct kernel_siginfo *info,
diff --git a/include/linux/security.h b/include/linux/security.h
index 41d7367cf403..8b74153daa43 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -196,7 +196,8 @@ extern int cap_mmap_addr(unsigned long addr);
 extern int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags);
 extern int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			  unsigned long arg4, unsigned long arg5);
-extern int cap_task_setscheduler(struct task_struct *p);
+extern int cap_task_setscheduler(struct task_struct *p,
+				 const struct cpumask *in_mask);
 extern int cap_task_setioprio(struct task_struct *p, int ioprio);
 extern int cap_task_setnice(struct task_struct *p, int nice);
 extern int cap_vm_enough_memory(struct mm_struct *mm, long pages);
@@ -531,7 +532,8 @@ int security_task_prlimit(const struct cred *cred, const struct cred *tcred,
 			  unsigned int flags);
 int security_task_setrlimit(struct task_struct *p, unsigned int resource,
 		struct rlimit *new_rlim);
-int security_task_setscheduler(struct task_struct *p);
+int security_task_setscheduler(struct task_struct *p,
+			       const struct cpumask *in_mask);
 int security_task_getscheduler(struct task_struct *p);
 int security_task_movememory(struct task_struct *p);
 int security_task_kill(struct task_struct *p, struct kernel_siginfo *info,
@@ -1392,9 +1394,10 @@ static inline int security_task_setrlimit(struct task_struct *p,
 	return 0;
 }
 
-static inline int security_task_setscheduler(struct task_struct *p)
+static inline int security_task_setscheduler(struct task_struct *p,
+					     const struct cpumask *in_mask)
 {
-	return cap_task_setscheduler(p);
+	return cap_task_setscheduler(p, in_mask);
 }
 
 static inline int security_task_getscheduler(struct task_struct *p)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 5c33ab20cc20..7b3dfccb77d8 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3033,7 +3033,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 			goto out_unlock;
 
 		if (setsched_check) {
-			ret = security_task_setscheduler(task);
+			ret = security_task_setscheduler(task, cs->effective_cpus);
 			if (ret)
 				goto out_unlock;
 		}
@@ -3591,7 +3591,7 @@ static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
 	if (ret)
 		goto out_unlock;
 
-	ret = security_task_setscheduler(task);
+	ret = security_task_setscheduler(task, cs->effective_cpus);
 	if (ret)
 		goto out_unlock;
 
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index b215b0ead9a6..68bc7e466fb1 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -540,7 +540,7 @@ int __sched_setscheduler(struct task_struct *p,
 		if (attr->sched_flags & SCHED_FLAG_SUGOV)
 			return -EINVAL;
 
-		retval = security_task_setscheduler(p);
+		retval = security_task_setscheduler(p, NULL);
 		if (retval)
 			return retval;
 	}
@@ -1213,7 +1213,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 			return -EPERM;
 	}
 
-	retval = security_task_setscheduler(p);
+	retval = security_task_setscheduler(p, in_mask);
 	if (retval)
 		return retval;
 
diff --git a/security/commoncap.c b/security/commoncap.c
index 3399535808fe..d86f1c2b9210 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1222,13 +1222,16 @@ static int cap_safe_nice(struct task_struct *p)
 /**
  * cap_task_setscheduler - Determine if scheduler policy change is permitted
  * @p: The task to affect
+ * @in_mask: Requested CPU affinity mask (ignored)
  *
  * Determine if the requested scheduler policy change is permitted for the
- * specified task.
+ * specified task. The capabilities security module does not evaluate the
+ * @in_mask parameter, relying solely on cap_safe_nice().
  *
  * Return: 0 if permission is granted, -ve if denied.
  */
-int cap_task_setscheduler(struct task_struct *p)
+int cap_task_setscheduler(struct task_struct *p,
+			  const struct cpumask *in_mask __always_unused)
 {
 	return cap_safe_nice(p);
 }
diff --git a/security/security.c b/security/security.c
index 4e999f023651..53804ee40df5 100644
--- a/security/security.c
+++ b/security/security.c
@@ -3240,17 +3240,18 @@ int security_task_setrlimit(struct task_struct *p, unsigned int resource,
 }
 
 /**
- * security_task_setscheduler() - Check if setting sched policy/param is allowed
+ * security_task_setscheduler() - Check if setting sched policy/param/affinity is allowed
  * @p: target task
+ * @in_mask: requested CPU affinity mask, or NULL if not changing affinity
  *
- * Check permission before setting scheduling policy and/or parameters of
- * process @p.
+ * Check permission before setting the scheduling policy, parameters, and/or
+ * CPU affinity of process @p.
  *
  * Return: Returns 0 if permission is granted.
  */
-int security_task_setscheduler(struct task_struct *p)
+int security_task_setscheduler(struct task_struct *p, const struct cpumask *in_mask)
 {
-	return call_int_hook(task_setscheduler, p);
+	return call_int_hook(task_setscheduler, p, in_mask);
 }
 
 /**
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0f704380a8c8..5f0914db23f6 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4557,7 +4557,8 @@ static int selinux_task_setrlimit(struct task_struct *p, unsigned int resource,
 	return 0;
 }
 
-static int selinux_task_setscheduler(struct task_struct *p)
+static int selinux_task_setscheduler(struct task_struct *p,
+				     const struct cpumask *in_mask __always_unused)
 {
 	return avc_has_perm(current_sid(), task_sid_obj(p), SECCLASS_PROCESS,
 			    PROCESS__SETSCHED, NULL);
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 3f9ae05039a2..a77143beff44 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -2343,10 +2343,17 @@ static int smack_task_getioprio(struct task_struct *p)
 /**
  * smack_task_setscheduler - Smack check on setting scheduler
  * @p: the task object
+ * @in_mask: Requested CPU affinity mask (ignored)
  *
- * Return 0 if read access is permitted
+ * Evaluate whether the current task has write access to the target task @p
+ * to change its scheduling policy. The Smack security module relies
+ * strictly on label-based access control and does not evaluate CPU
+ * affinity masks.
+ *
+ * Return: 0 if write access is permitted
  */
-static int smack_task_setscheduler(struct task_struct *p)
+static int smack_task_setscheduler(struct task_struct *p,
+				   const struct cpumask *in_mask __always_unused)
 {
 	return smk_curacc_on_task(p, MAY_WRITE, __func__);
 }

base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
prerequisite-patch-id: f9200d420002c9fd0663d0ec00c83db866889c19
-- 
2.51.0


^ permalink raw reply related

* Re: [PATCH v5 00/13] ima: Introduce staging mechanism
From: Mimi Zohar @ 2026-05-26 14:10 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian, steven chen, Roberto Sassu, corbet,
	skhan, dmitry.kasatkin, eric.snowberg, paul, jmorris, serge
  Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
	gregorylumen, Roberto Sassu
In-Reply-To: <aaed52cf-26e1-4c40-812d-3788024ce5b5@linux.microsoft.com>

On Mon, 2026-05-11 at 10:29 -0700, Lakshmi Ramasubramanian wrote:
> Roberto, Mimi:
> 
> I want to add on to the point Steven has brought up.
> 
> With "Stage and Delete N" approach, we have the following sequence of 
> tasks for trimming the IMA log:
> 
> 	1. User mode locks the IMA measurement list through the "write interface".
> 		a. While this prevents any other user mode process from updating the 
> IMA log, kernel can still add new IMA events to the measurement log
> 	2. User mode reads the TPM Quote and the IMA measurement events and 
> sends it to the remote attestation service
> 	3. Once the remote service has successfully processed the IMA events, 
> the user mode determines the number of IMA events "N" to be removed from 
> the measurement list maintained in the kernel
> 	4. User mode provides the value "N" to the kernel
> 	5. Kernel now determines the point at which to snap the IMA measurement 
> list using "N" - without holding a lock
> 	6. Then, the kernel lock is held and the list is snapped at the point 
> determined in the previous step thus keeping the kernel lock time to the 
> minimum.
> 	7. Now, user mode removes the "write" lock on the IMA measurement list
> 
> With the above, we believe "Stage and Delete N" alone is sufficient to 
> trim IMA log.
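
For readers following along, steps 5 and 6 of the quoted sequence can be
sketched in user space. This is only an illustration under stated assumptions,
not the code from the patch set: struct entry, struct mlist, find_cut() and
snap_first_n() are hypothetical names, a C11 atomic_flag spinlock stands in for
the kernel lock, and the list is assumed to be append-only so that the first N
entries stay stable while they are walked without the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one measurement record. */
struct entry {
	int id;
	struct entry *next;
};

/* Hypothetical stand-in for the measurement list and its lock. */
struct mlist {
	struct entry *head;
	atomic_flag lock;
};

static inline void mlist_lock(struct mlist *l)
{
	while (atomic_flag_test_and_set(&l->lock))
		; /* spin until the flag was previously clear */
}

static inline void mlist_unlock(struct mlist *l)
{
	atomic_flag_clear(&l->lock);
}

/*
 * Step 5: locate the n-th entry without taking the lock.
 * Returns NULL if n is 0 or the list holds fewer than n entries.
 */
static inline struct entry *find_cut(struct entry *head, size_t n)
{
	struct entry *e = head;

	if (n == 0)
		return NULL;
	while (e && --n)
		e = e->next;
	return e;
}

/*
 * Step 6: detach the first n entries while holding the lock only for
 * the pointer updates. Returns the detached chain, or NULL if nothing
 * was removed.
 */
static inline struct entry *snap_first_n(struct mlist *l, size_t n)
{
	struct entry *cut = find_cut(l->head, n);
	struct entry *staged;

	if (!cut)
		return NULL;

	mlist_lock(l);
	staged = l->head;
	l->head = cut->next;
	cut->next = NULL;
	mlist_unlock(l);

	return staged;
}

/* Demo: build a 5-entry list, trim n, return the new head id (-1 if empty). */
static inline int demo_new_head_after_trim(size_t n)
{
	struct entry e[5];
	struct mlist l = { .head = &e[0], .lock = ATOMIC_FLAG_INIT };

	for (int i = 0; i < 5; i++) {
		e[i].id = i;
		e[i].next = (i < 4) ? &e[i + 1] : NULL;
	}
	snap_first_n(&l, n);
	return l.head ? l.head->id : -1;
}
```

The point of the split is that the walk is O(N) but lock-free, while the locked
section is a constant number of pointer writes. The sketch removes nothing when
fewer than N entries exist; whether the real interface should behave that way
is a separate question.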

Prior attempts at removing measurement records (aka "trimming") were rejected
for being overly complicated, for their locking, for requiring a new record
type, and for code quality.  Patch 11 ("stage and delete N") is much better,
but the level of precision in removing only those measurement records needed
for the TPM quote seems necessary only if the records are not being saved.

The reason for the two methods might be the same — removing measurement records
from the IMA measurement list — but the motivation for the two methods does not
appear to be the same. The motivation for Patch 9 ("stage and delete") is
clearly to free kernel memory by exporting and saving the measurement records.

Remember, the only reason for upstreaming a feature to remove measurement
records from the IMA measurement list is to address the kernel memory issue —
clearly not to drop measurement records and break attestation.

Upstreaming patch 11 (stage and delete N) would be a concession for your
environment, but is definitely not the recommended solution.

Mimi

^ permalink raw reply

* Re: [PATCH v5 12/13] ima: Return error on deleting measurements already copied during kexec
From: Mimi Zohar @ 2026-05-26 14:02 UTC (permalink / raw)
  To: Roberto Sassu, corbet, skhan, dmitry.kasatkin, eric.snowberg,
	paul, jmorris, serge
  Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
	gregorylumen, chenste, nramas, Roberto Sassu
In-Reply-To: <20260429160319.4162918-13-roberto.sassu@huaweicloud.com>

On Wed, 2026-04-29 at 18:03 +0200, Roberto Sassu wrote:
> From: Roberto Sassu <roberto.sassu@huawei.com>
> 
> Refuse to delete staged or active list measurements, if a kexec racing with
> the deletion already copied those measurements in the kexec buffer. In this
> way, user space becomes aware that those measurements are going to appear
> in the secondary kernel, and thus they don't have to be saved twice.

There are two reboot notifiers: one to prevent additional measurements extending
the TPM, while the other copies the measurements for kexec.  This patch prevents
deleting the staged measurements after the latter notifier.

Instead of introducing a specific method for detecting whether the measurement
list has been copied, rely on one of the two existing reboot notifiers. The
simplest method would test "ima_measurements_suspended", which would prevent
deleting the staged measurements a bit earlier.

Mimi

^ permalink raw reply

* Re: [PATCH] tomoyo: Fix NULL pointer dereference in tomoyo_init_request_info() when domain is NULL
From: Jiakai Xu @ 2026-05-26 13:58 UTC (permalink / raw)
  To: penguin-kernel
  Cc: jmorris, linux-kernel, linux-security-module, paul, serge,
	takedakn, xujiakai24
In-Reply-To: <814a7f61-67b2-49e9-b5bf-fd049b458079@I-love.SAKURA.ne.jp>

> Thank you for a patch, but I don't think we need this change.

Thanks for your review! I understand your perspective, but I believe
the crash is a real NULL pointer dereference, and I'd like to explain
why the defensive check is warranted.

> TOMOYO's initial domain is &tomoyo_kernel_domain, and each thread belongs to
> a non-NULL domain. Therefore, tomoyo_domain() is not supposed to return NULL.

While tomoyo_domain() is not supposed to return NULL under normal
operation, there are code paths that leave s->domain_info == NULL:

  a) Pre-init window (security/tomoyo/tomoyo.c, lines 598-612):
     The task security blob is zero-allocated via kzalloc(), and
     security_add_hooks() at line 603 is called BEFORE
     s->domain_info = &tomoyo_kernel_domain at line 606. If any LSM
     hook fires during that window, tomoyo_domain() returns NULL.

  b) tomoyo_task_free() (tomoyo.c, lines 533-545) explicitly sets
     s->domain_info = NULL after decrementing the refcount.

  c) tomoyo_find_next_domain() (domain.c, lines 876-883) writes
     s->domain_info = NULL when the domain transition fails.

> > Found by fuzzing. Here is the report:
> > 
> > Unable to handle kernel paging request at virtual address dfffffff00000003
> 
> Is this a NULL pointer dereference?
> It seems to me that this is just a random memory corruption.

This address is the KASAN shadow address for a memory access at offset 0x18
(24), not a random corrupted value. On RISC-V with sv57 page tables,
KASAN_SHADOW_OFFSET is `0xdfffffff00000000`, and the shadow address is
computed as:

    shadow_addr = (access_addr >> 3) + KASAN_SHADOW_OFFSET
                = (24 >> 3) + 0xdfffffff00000000
                = 0xdfffffff00000003

In `struct tomoyo_domain_info` (security/tomoyo/common.h, lines
680-693), the layout is:

    offset 0:  struct list_head list;          // 16 bytes
    offset 16: struct list_head acl_info_list; // 16 bytes (next at 16, prev at 24)
    offset 32: domainname;                     // 8 bytes
    ...

Offset 24 from NULL is `domain->acl_info_list.prev`, which is
dereferenced by the `list_for_each_entry_rcu()` loop in
`tomoyo_check_acl()` at security/tomoyo/domain.c:171 when `domain` is
NULL. This is KASAN catching a NULL pointer dereference in action, not
random memory corruption.
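
The arithmetic above can be checked with a small user-space sketch. It assumes
the shadow offset value quoted in this mail and the usual 8-to-1 KASAN shadow
scaling; struct mock_list_head and struct mock_domain_info are hypothetical
mocks of only the first members listed above, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shadow offset value quoted in this mail for 64-bit RISC-V. */
#define SHADOW_OFFSET 0xdfffffff00000000ULL
/* Generic KASAN maps 8 bytes of memory to one shadow byte. */
#define SHADOW_SCALE_SHIFT 3

/* Mock of a kernel list head: two pointers. */
struct mock_list_head {
	struct mock_list_head *next, *prev;
};

/* Mock of the leading members of struct tomoyo_domain_info. */
struct mock_domain_info {
	struct mock_list_head list;
	struct mock_list_head acl_info_list;
	const void *domainname;
};

/* Shadow address KASAN would consult for an access at access_addr. */
static inline uint64_t shadow_addr(uint64_t access_addr)
{
	return (access_addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* Offset of acl_info_list.prev, i.e. the address touched via a NULL domain. */
static inline size_t acl_prev_offset(void)
{
	return offsetof(struct mock_domain_info, acl_info_list) +
	       offsetof(struct mock_list_head, prev);
}
```

With 8-byte pointers acl_prev_offset() is 24, and shadow_addr(24) gives
0xdfffffff00000003, the faulting address in the report.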

I think adding a NULL check makes the code more robust. What do you 
think?

Best regards,
Jiakai

^ permalink raw reply

* Re: [PATCH v5 10/14] module: Prepare for additional module authentication mechanisms
From: Petr Pavlu @ 2026-05-26 13:14 UTC (permalink / raw)
  To: Thomas Weißschuh
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Nathan Chancellor,
	Nicolas Schier, Arnd Bergmann, Luis Chamberlain, Sami Tolvanen,
	Daniel Gomez, Paul Moore, James Morris, Serge E. Hallyn,
	Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Naveen N Rao, Mimi Zohar, Roberto Sassu,
	Dmitry Kasatkin, Eric Snowberg, Nicolas Schier, Daniel Gomez,
	Aaron Tomlin, Christophe Leroy (CS GROUP), Nicolas Bouchinet,
	Xiu Jianfeng, Martin KaFai Lau, Song Liu, Yonghong Song,
	Jiri Olsa, bpf, Fabian Grünbichler, Arnout Engelen,
	Mattia Rizzolo, kpcyrd, Christian Heusel, Câju Mihai-Drosi,
	Eric Biggers, Sebastian Andrzej Siewior, linux-kbuild,
	linux-kernel, linux-arch, linux-modules, linux-security-module,
	linux-doc, linuxppc-dev, linux-integrity, debian-kernel
In-Reply-To: <20260505-module-hashes-v5-10-e174a5a49fce@weissschuh.net>

On 5/5/26 11:05 AM, Thomas Weißschuh wrote:
> Reorganize the code to make it easier to add the new hash-based module
> authentication.
> 
> Also drop the now unnecessary stub for module_sig_check().
> 
> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>

Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>

-- Petr

^ permalink raw reply

* Re: [PATCH v5 09/14] module: Move signature type check out of mod_check_sig()
From: Petr Pavlu @ 2026-05-26 13:03 UTC (permalink / raw)
  To: Thomas Weißschuh
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Nathan Chancellor,
	Nicolas Schier, Arnd Bergmann, Luis Chamberlain, Sami Tolvanen,
	Daniel Gomez, Paul Moore, James Morris, Serge E. Hallyn,
	Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Naveen N Rao, Mimi Zohar, Roberto Sassu,
	Dmitry Kasatkin, Eric Snowberg, Nicolas Schier, Daniel Gomez,
	Aaron Tomlin, Christophe Leroy (CS GROUP), Nicolas Bouchinet,
	Xiu Jianfeng, Martin KaFai Lau, Song Liu, Yonghong Song,
	Jiri Olsa, bpf, Fabian Grünbichler, Arnout Engelen,
	Mattia Rizzolo, kpcyrd, Christian Heusel, Câju Mihai-Drosi,
	Eric Biggers, Sebastian Andrzej Siewior, linux-kbuild,
	linux-kernel, linux-arch, linux-modules, linux-security-module,
	linux-doc, linuxppc-dev, linux-integrity, debian-kernel
In-Reply-To: <20260505-module-hashes-v5-9-e174a5a49fce@weissschuh.net>

On 5/5/26 11:05 AM, Thomas Weißschuh wrote:
> Additional signature types are about to be added.
> As each caller of mod_check_sig() can have different support for these,
> move the type validation into the callers.
> 
> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>

Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>

-- Petr

^ permalink raw reply

* Re: [PATCH v5 07/14] module: Make module authentication usable without MODULE_SIG
From: kpcyrd @ 2026-05-26 12:27 UTC (permalink / raw)
  To: Thomas Weißschuh, Petr Pavlu
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Nathan Chancellor,
	Nicolas Schier, Arnd Bergmann, Luis Chamberlain, Sami Tolvanen,
	Daniel Gomez, Paul Moore, James Morris, Serge E. Hallyn,
	Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Naveen N Rao, Mimi Zohar, Roberto Sassu,
	Dmitry Kasatkin, Eric Snowberg, Nicolas Schier, Daniel Gomez,
	Aaron Tomlin, Christophe Leroy (CS GROUP), Nicolas Bouchinet,
	Xiu Jianfeng, Martin KaFai Lau, Song Liu, Yonghong Song,
	Jiri Olsa, bpf, Fabian Grünbichler, Arnout Engelen,
	Mattia Rizzolo, Christian Heusel, Câju Mihai-Drosi,
	Eric Biggers, Sebastian Andrzej Siewior, linux-kbuild,
	linux-kernel, linux-arch, linux-modules, linux-security-module,
	linux-doc, linuxppc-dev, linux-integrity, debian-kernel,
	Holger Levsen
In-Reply-To: <4ee3c775-1fbf-45e1-8b77-5f9034f45125@t-8ch.de>

On 5/26/26 1:38 PM, Thomas Weißschuh wrote:
> On 2026-05-26 12:53:22+0200, Petr Pavlu wrote:
>> Should MODULE_SIG_FORCE be renamed to MODULE_AUTH_FORCE, along with
>> renaming the sig_enforce functionality in kernel/module/auth.c to
>> auth_enforce?
> 
> Given that it is a user-visible symbol we'll need to be a bit careful
> not to break existing configurations.
> I'll try to use the new "transitional" kconfig attribute.

A slightly softer-worded (yet semantically equivalent) alternative name could
be MODULE_AUTH_REQUIRE. No strong opinion though; I think MODULE_AUTH_* does
make sense.

I initially shared the concern about renaming well-established config options,
but the transitional feature does seem to be a good fit for this.

Sincerely,
kpcyrd

^ permalink raw reply

* Re: [PATCH v5 08/14] module: Move authentication logic into dedicated new file
From: Petr Pavlu @ 2026-05-26 11:58 UTC (permalink / raw)
  To: Thomas Weißschuh
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Nathan Chancellor,
	Nicolas Schier, Arnd Bergmann, Luis Chamberlain, Sami Tolvanen,
	Daniel Gomez, Paul Moore, James Morris, Serge E. Hallyn,
	Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Naveen N Rao, Mimi Zohar, Roberto Sassu,
	Dmitry Kasatkin, Eric Snowberg, Nicolas Schier, Daniel Gomez,
	Aaron Tomlin, Christophe Leroy (CS GROUP), Nicolas Bouchinet,
	Xiu Jianfeng, Martin KaFai Lau, Song Liu, Yonghong Song,
	Jiri Olsa, bpf, Fabian Grünbichler, Arnout Engelen,
	Mattia Rizzolo, kpcyrd, Christian Heusel, Câju Mihai-Drosi,
	Eric Biggers, Sebastian Andrzej Siewior, linux-kbuild,
	linux-kernel, linux-arch, linux-modules, linux-security-module,
	linux-doc, linuxppc-dev, linux-integrity, debian-kernel
In-Reply-To: <20260505-module-hashes-v5-8-e174a5a49fce@weissschuh.net>

On 5/5/26 11:05 AM, Thomas Weißschuh wrote:
> The module authentication functionality will also be used by the
> hash-based module authentication. To make it usable even if
> CONFIG_MODULE_SIG is disabled, move it to a new file.
> 
> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
> ---
> [...]
> diff --git a/kernel/module/auth.c b/kernel/module/auth.c
> index 956ac63d9d33..831a13eb0c9b 100644
> --- a/kernel/module/auth.c
> +++ b/kernel/module/auth.c
> @@ -5,10 +5,16 @@
>   * Written by David Howells (dhowells@redhat.com)
>   */
>  
> +#include <linux/errno.h>
>  #include <linux/export.h>
>  #include <linux/module.h>
> +#include <linux/module_signature.h>
>  #include <linux/moduleparam.h>
> +#include <linux/security.h>
> +#include <linux/string.h>
>  #include <linux/types.h>
> +#include <uapi/linux/module.h>
> +#include "internal.h"
>  
>  #undef MODULE_PARAM_PREFIX
>  #define MODULE_PARAM_PREFIX "module."
> @@ -30,3 +36,82 @@ void set_module_sig_enforced(void)
>  {
>  	sig_enforce = true;
>  }
> +
> +static int mod_verify_sig(const void *mod, struct load_info *info)
> +{
> +	struct module_signature ms;
> +	size_t sig_len, modlen = info->len;
> +	int ret;
> +
> +	if (modlen <= sizeof(ms))
> +		return -EBADMSG;
> +
> +	memcpy(&ms, mod + (modlen - sizeof(ms)), sizeof(ms));
> +
> +	ret = mod_check_sig(&ms, modlen, "module");
> +	if (ret)
> +		return ret;
> +
> +	sig_len = be32_to_cpu(ms.sig_len);
> +	modlen -= sig_len + sizeof(ms);
> +	info->len = modlen;
> +
> +	return module_sig_check(mod, modlen, mod + modlen, sig_len);
> +}
> +
> +int module_auth_check(struct load_info *info, int flags)
> +{
> +	int err = -ENODATA;
> +	const unsigned long markerlen = sizeof(MODULE_SIGNATURE_MARKER) - 1;
> +	const char *reason;
> +	const void *mod = info->hdr;
> +	bool mangled_module = flags & (MODULE_INIT_IGNORE_MODVERSIONS |
> +				       MODULE_INIT_IGNORE_VERMAGIC);
> +	/*
> +	 * Do not allow mangled modules as a module with version information
> +	 * removed is no longer the module that was signed.
> +	 */
> +	if (!mangled_module &&
> +	    info->len > markerlen &&
> +	    memcmp(mod + info->len - markerlen, MODULE_SIGNATURE_MARKER, markerlen) == 0) {
> +		/* We truncate the module to discard the signature */
> +		info->len -= markerlen;
> +		err = mod_verify_sig(mod, info);
> +		if (!err) {
> +			info->auth_ok = true;
> +			return 0;
> +		}
> +	}
> +
> +	/*
> +	 * We don't permit modules to be loaded into the trusted kernels
> +	 * without a valid signature on them, but if we're not enforcing,
> +	 * certain errors are non-fatal.
> +	 */
> +	switch (err) {
> +	case -ENODATA:
> +		reason = "unsigned module";
> +		break;
> +	case -ENOPKG:
> +		reason = "module with unsupported crypto";
> +		break;
> +	case -ENOKEY:
> +		reason = "module with unavailable key";
> +		break;
> +
> +	default:
> +		/*
> +		 * All other errors are fatal, including lack of memory,
> +		 * unparseable signatures, and signature check failures --
> +		 * even if signatures aren't required.
> +		 */
> +		return err;
> +	}
> +
> +	if (is_module_sig_enforced()) {
> +		pr_notice("Loading of %s is rejected\n", reason);
> +		return -EKEYREJECTED;
> +	}
> +
> +	return security_locked_down(LOCKDOWN_MODULE_SIGNATURE);
> +}

The resulting call chain of the module authentication/signature
functions is as follows:

ima_read_modsig() -----------------------------,
                                               v
module_auth_check() -> mod_verify_sig() -> mod_check_sig()
                             |
                             |-> module_sig_check()
                             '-> module_hash_check()

I think this logic is quite hard to follow because mod_verify_sig(),
mod_check_sig() and module_sig_check() have very similar names.

The naming of module_auth_check(), module_sig_check() and
module_hash_check() looks good to me, but I would prefer to rename
mod_check_sig() and mod_verify_sig(). Perhaps mod_check_sig() could be
renamed to mod_check_sig_header(), and mod_verify_sig() to
mod_dispatch_auth_check()?
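
As a side note on the first step of that chain, the trailing-marker test in the
quoted module_auth_check() can be tried out in user space. The sketch below is
only an illustration: has_sig_marker() and strip_sig_marker() are hypothetical
helper names, and SIG_MARKER is assumed to match the kernel's
MODULE_SIGNATURE_MARKER string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed to match MODULE_SIGNATURE_MARKER in the kernel. */
#define SIG_MARKER "~Module signature appended~\n"

/*
 * True if the buffer is strictly longer than the marker and ends with
 * it, mirroring the "info->len > markerlen && memcmp(...) == 0" test.
 */
static inline bool has_sig_marker(const void *buf, size_t len)
{
	const size_t markerlen = sizeof(SIG_MARKER) - 1;

	if (len <= markerlen)
		return false;
	return memcmp((const char *)buf + len - markerlen,
		      SIG_MARKER, markerlen) == 0;
}

/* Length of the image after discarding a trailing marker, if present. */
static inline size_t strip_sig_marker(const void *buf, size_t len)
{
	if (!has_sig_marker(buf, len))
		return len;
	return len - (sizeof(SIG_MARKER) - 1);
}
```

What remains after the marker is stripped is the payload followed by the
signature data and the struct module_signature trailer, which is what
mod_verify_sig() then parses.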

Otherwise, the patch looks ok to me. Feel free to add:

Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>

-- 
Thanks,
Petr

^ permalink raw reply

* Re: [PATCH v5 07/14] module: Make module authentication usable without MODULE_SIG
From: Thomas Weißschuh @ 2026-05-26 11:38 UTC (permalink / raw)
  To: Petr Pavlu
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Nathan Chancellor,
	Nicolas Schier, Arnd Bergmann, Luis Chamberlain, Sami Tolvanen,
	Daniel Gomez, Paul Moore, James Morris, Serge E. Hallyn,
	Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Naveen N Rao, Mimi Zohar, Roberto Sassu,
	Dmitry Kasatkin, Eric Snowberg, Nicolas Schier, Daniel Gomez,
	Aaron Tomlin, Christophe Leroy (CS GROUP), Nicolas Bouchinet,
	Xiu Jianfeng, Martin KaFai Lau, Song Liu, Yonghong Song,
	Jiri Olsa, bpf, Fabian Grünbichler, Arnout Engelen,
	Mattia Rizzolo, kpcyrd, Christian Heusel, Câju Mihai-Drosi,
	Eric Biggers, Sebastian Andrzej Siewior, linux-kbuild,
	linux-kernel, linux-arch, linux-modules, linux-security-module,
	linux-doc, linuxppc-dev, linux-integrity, debian-kernel
In-Reply-To: <0a0736a4-2cdd-49f2-9062-e2f18d769fc0@suse.com>

On 2026-05-26 12:53:22+0200, Petr Pavlu wrote:
> On 5/5/26 11:05 AM, Thomas Weißschuh wrote:
> > The module authentication functionality will also be used by the
> > hash-based module authentication. Split it out from CONFIG_MODULE_SIG
> > so it is usable by both.
> > 
> > Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
> > [...]
> > diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
> > index f535181e0d98..84297da666ff 100644
> > --- a/kernel/module/Kconfig
> > +++ b/kernel/module/Kconfig
> > @@ -271,9 +271,12 @@ config MODULE_SIG
> >  	  debuginfo strip done by some packagers (such as rpmbuild) and
> >  	  inclusion into an initramfs that wants the module size reduced.
> >  
> > +config MODULE_AUTH
> > +	def_bool MODULE_SIG
> > +
> >  config MODULE_SIG_FORCE
> >  	bool "Require modules to be validly signed"
> > -	depends on MODULE_SIG
> > +	depends on MODULE_AUTH
> >  	help
> >  	  Reject unsigned modules or signed modules for which we don't have a
> >  	  key.  Without this, such modules will simply taint the kernel.
> 
> Should MODULE_SIG_FORCE be renamed to MODULE_AUTH_FORCE, along with
> renaming the sig_enforce functionality in kernel/module/auth.c to
> auth_enforce?

Given that it is a user-visible symbol we'll need to be a bit careful
not to break existing configurations.
I'll try to use the new "transitional" kconfig attribute.


Thomas

^ permalink raw reply

* Re: [PATCH v5 06/14] module: Switch load_info::len to size_t
From: Thomas Weißschuh @ 2026-05-26 11:35 UTC (permalink / raw)
  To: Petr Pavlu
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, Nathan Chancellor,
	Nicolas Schier, Arnd Bergmann, Luis Chamberlain, Sami Tolvanen,
	Daniel Gomez, Paul Moore, James Morris, Serge E. Hallyn,
	Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Naveen N Rao, Mimi Zohar, Roberto Sassu,
	Dmitry Kasatkin, Eric Snowberg, Nicolas Schier, Daniel Gomez,
	Aaron Tomlin, Christophe Leroy (CS GROUP), Nicolas Bouchinet,
	Xiu Jianfeng, Martin KaFai Lau, Song Liu, Yonghong Song,
	Jiri Olsa, bpf, Fabian Grünbichler, Arnout Engelen,
	Mattia Rizzolo, kpcyrd, Christian Heusel, Câju Mihai-Drosi,
	Eric Biggers, Sebastian Andrzej Siewior, linux-kbuild,
	linux-kernel, linux-arch, linux-modules, linux-security-module,
	linux-doc, linuxppc-dev, linux-integrity, debian-kernel
In-Reply-To: <8de0e6ad-987a-4729-bbd0-8399968dbb48@suse.com>

On 2026-05-26 11:47:09+0200, Petr Pavlu wrote:
> On 5/5/26 11:05 AM, Thomas Weißschuh wrote:
> > Switching the types will make some later changes cleaner.
> 
> Since the updated version drops the patch "module: Deduplicate signature
> extraction", I believe this change is no longer necessary.

Ack.

(...)

Thomas

^ permalink raw reply
