Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* [PATCH v4 07/12] KEYS: Introduce link restriction to include builtin, secondary and mok keys
From: Eric Snowberg @ 2021-08-19  0:21 UTC (permalink / raw)
  To: keyrings, linux-integrity, zohar, dhowells, dwmw2, herbert, davem,
	jarkko, jmorris, serge
  Cc: eric.snowberg, keescook, gregkh, torvalds, scott.branden,
	weiyongjun1, nayna, ebiggers, ardb, nramas, lszubowi,
	linux-kernel, linux-crypto, linux-security-module,
	James.Bottomley, pjones, konrad.wilk
In-Reply-To: <20210819002109.534600-1-eric.snowberg@oracle.com>

Introduce a new link restriction that includes the trusted builtin,
secondary and mok keys. The restriction is based on the key to be added
being vouched for by a key in any of these three keyrings.

Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
---
v3: Initial version
v4: moved code under CONFIG_INTEGRITY_MOK_KEYRING
---
 certs/system_keyring.c        | 23 +++++++++++++++++++++++
 include/keys/system_keyring.h |  6 ++++++
 2 files changed, 29 insertions(+)

diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index 94af3fe107b4..a75c815a42c8 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -99,6 +99,29 @@ void __init set_mok_trusted_keys(struct key *keyring)
 {
 	mok_trusted_keys = keyring;
 }
+
+/**
+ * restrict_link_by_builtin_secondary_and_ca_trusted
+ *
+ * Restrict the addition of keys into a keyring based on the key-to-be-added
+ * being vouched for by a key in either the built-in, the secondary, or
+ * the mok keyrings.
+ */
+int restrict_link_by_builtin_secondary_and_ca_trusted(
+	struct key *dest_keyring,
+	const struct key_type *type,
+	const union key_payload *payload,
+	struct key *restrict_key)
+{
+	if (mok_trusted_keys && type == &key_type_keyring &&
+	    dest_keyring == secondary_trusted_keys &&
+	    payload == &mok_trusted_keys->payload)
+		/* Allow the mok keyring to be added to the secondary */
+		return 0;
+
+	return restrict_link_by_builtin_and_secondary_trusted(dest_keyring, type,
+							      payload, restrict_key);
+}
 #endif
 
 /*
diff --git a/include/keys/system_keyring.h b/include/keys/system_keyring.h
index 059e32e36b3a..8cc9606a6cab 100644
--- a/include/keys/system_keyring.h
+++ b/include/keys/system_keyring.h
@@ -39,8 +39,14 @@ extern int restrict_link_by_builtin_and_secondary_trusted(
 #endif
 
 #ifdef CONFIG_INTEGRITY_MOK_KEYRING
+extern int restrict_link_by_builtin_secondary_and_ca_trusted(
+	struct key *dest_keyring,
+	const struct key_type *type,
+	const union key_payload *payload,
+	struct key *restrict_key);
 extern void __init set_mok_trusted_keys(struct key *keyring);
 #else
+#define restrict_link_by_builtin_secondary_and_ca_trusted restrict_link_by_builtin_trusted
 static inline void __init set_mok_trusted_keys(struct key *keyring)
 {
 }
-- 
2.18.4


^ permalink raw reply related

* [PATCH v4 06/12] KEYS: add a reference to mok keyring
From: Eric Snowberg @ 2021-08-19  0:21 UTC (permalink / raw)
  To: keyrings, linux-integrity, zohar, dhowells, dwmw2, herbert, davem,
	jarkko, jmorris, serge
  Cc: eric.snowberg, keescook, gregkh, torvalds, scott.branden,
	weiyongjun1, nayna, ebiggers, ardb, nramas, lszubowi,
	linux-kernel, linux-crypto, linux-security-module,
	James.Bottomley, pjones, konrad.wilk
In-Reply-To: <20210819002109.534600-1-eric.snowberg@oracle.com>

Expose the .mok keyring created in integrity code by adding
a reference.  This makes the mok keyring accessible for keyring
restrictions in the future.

Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
---
v2: Initial version
v3: set_mok_trusted_keys only available when secondary is enabled
v4: Moved code under CONFIG_INTEGRITY_MOK_KEYRING
---
 certs/system_keyring.c        | 9 +++++++++
 include/keys/system_keyring.h | 8 ++++++++
 2 files changed, 17 insertions(+)

diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index 692365dee2bd..94af3fe107b4 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -22,6 +22,9 @@ static struct key *builtin_trusted_keys;
 #ifdef CONFIG_SECONDARY_TRUSTED_KEYRING
 static struct key *secondary_trusted_keys;
 #endif
+#ifdef CONFIG_INTEGRITY_MOK_KEYRING
+static struct key *mok_trusted_keys;
+#endif
 #ifdef CONFIG_INTEGRITY_PLATFORM_KEYRING
 static struct key *platform_trusted_keys;
 #endif
@@ -91,6 +94,12 @@ static __init struct key_restriction *get_builtin_and_secondary_restriction(void
 	return restriction;
 }
 #endif
+#ifdef CONFIG_INTEGRITY_MOK_KEYRING
+void __init set_mok_trusted_keys(struct key *keyring)
+{
+	mok_trusted_keys = keyring;
+}
+#endif
 
 /*
  * Create the trusted keyrings
diff --git a/include/keys/system_keyring.h b/include/keys/system_keyring.h
index 6acd3cf13a18..059e32e36b3a 100644
--- a/include/keys/system_keyring.h
+++ b/include/keys/system_keyring.h
@@ -38,6 +38,14 @@ extern int restrict_link_by_builtin_and_secondary_trusted(
 #define restrict_link_by_builtin_and_secondary_trusted restrict_link_by_builtin_trusted
 #endif
 
+#ifdef CONFIG_INTEGRITY_MOK_KEYRING
+extern void __init set_mok_trusted_keys(struct key *keyring);
+#else
+static inline void __init set_mok_trusted_keys(struct key *keyring)
+{
+}
+#endif
+
 extern struct pkcs7_message *pkcs7;
 #ifdef CONFIG_SYSTEM_BLACKLIST_KEYRING
 extern int mark_hash_blacklisted(const char *hash);
-- 
2.18.4


^ permalink raw reply related

* [PATCH v4 03/12] KEYS: CA link restriction
From: Eric Snowberg @ 2021-08-19  0:21 UTC (permalink / raw)
  To: keyrings, linux-integrity, zohar, dhowells, dwmw2, herbert, davem,
	jarkko, jmorris, serge
  Cc: eric.snowberg, keescook, gregkh, torvalds, scott.branden,
	weiyongjun1, nayna, ebiggers, ardb, nramas, lszubowi,
	linux-kernel, linux-crypto, linux-security-module,
	James.Bottomley, pjones, konrad.wilk
In-Reply-To: <20210819002109.534600-1-eric.snowberg@oracle.com>

Add a new link restriction.  Restrict the addition of keys in a keyring
based on the key to be added being a CA (self-signed).

Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
---
v1: Initial version
v2: Removed secondary keyring references
v3: Removed restrict_link_by_system_trusted_or_ca
    Simplify restrict_link_by_ca - only see if the key is a CA
    Did not add __init in front of restrict_link_by_ca in case
      restriction could be resued in the future
v4: Unmodified from v3
---
 crypto/asymmetric_keys/restrict.c | 40 +++++++++++++++++++++++++++++++
 include/crypto/public_key.h       |  5 ++++
 2 files changed, 45 insertions(+)

diff --git a/crypto/asymmetric_keys/restrict.c b/crypto/asymmetric_keys/restrict.c
index 84cefe3b3585..9ae43d3f862b 100644
--- a/crypto/asymmetric_keys/restrict.c
+++ b/crypto/asymmetric_keys/restrict.c
@@ -108,6 +108,46 @@ int restrict_link_by_signature(struct key *dest_keyring,
 	return ret;
 }
 
+/**
+ * restrict_link_by_ca - Restrict additions to a ring of CA keys
+ * @dest_keyring: Keyring being linked to.
+ * @type: The type of key being added.
+ * @payload: The payload of the new key.
+ * @trusted: Unused.
+ *
+ * Check if the new certificate is a CA. If it is a CA, then mark the new
+ * certificate as being ok to link.
+ *
+ * Returns 0 if the new certificate was accepted, -ENOKEY if we could not find
+ * a matching parent certificate in the trusted list.  -ENOPKG if the signature
+ * uses unsupported crypto, or some other error if there is a matching
+ * certificate  but the signature check cannot be performed.
+ */
+int restrict_link_by_ca(struct key *dest_keyring,
+			const struct key_type *type,
+			const union key_payload *payload,
+			struct key *trust_keyring)
+{
+	const struct public_key_signature *sig;
+	const struct public_key *pkey;
+
+	if (type != &key_type_asymmetric)
+		return -EOPNOTSUPP;
+
+	sig = payload->data[asym_auth];
+	if (!sig)
+		return -ENOPKG;
+
+	if (!sig->auth_ids[0] && !sig->auth_ids[1])
+		return -ENOKEY;
+
+	pkey = payload->data[asym_crypto];
+	if (!pkey)
+		return -ENOPKG;
+
+	return public_key_verify_signature(pkey, sig);
+}
+
 static bool match_either_id(const struct asymmetric_key_ids *pair,
 			    const struct asymmetric_key_id *single)
 {
diff --git a/include/crypto/public_key.h b/include/crypto/public_key.h
index 47accec68cb0..545af1ea57de 100644
--- a/include/crypto/public_key.h
+++ b/include/crypto/public_key.h
@@ -71,6 +71,11 @@ extern int restrict_link_by_key_or_keyring_chain(struct key *trust_keyring,
 						 const union key_payload *payload,
 						 struct key *trusted);
 
+extern int restrict_link_by_ca(struct key *dest_keyring,
+			       const struct key_type *type,
+			       const union key_payload *payload,
+			       struct key *trust_keyring);
+
 extern int query_asymmetric_key(const struct kernel_pkey_params *,
 				struct kernel_pkey_query *);
 
-- 
2.18.4


^ permalink raw reply related

* [PATCH v4 00/12] Enroll kernel keys thru MOK
From: Eric Snowberg @ 2021-08-19  0:20 UTC (permalink / raw)
  To: keyrings, linux-integrity, zohar, dhowells, dwmw2, herbert, davem,
	jarkko, jmorris, serge
  Cc: eric.snowberg, keescook, gregkh, torvalds, scott.branden,
	weiyongjun1, nayna, ebiggers, ardb, nramas, lszubowi,
	linux-kernel, linux-crypto, linux-security-module,
	James.Bottomley, pjones, konrad.wilk

Many UEFI Linux distributions boot using shim.  The UEFI shim provides
what is called Machine Owner Keys (MOK).  Shim uses both the UEFI Secure
Boot DB and MOK keys to validate the next step in the boot chain.  The
MOK facility can be used to import user generated keys.  These keys can
be used to sign an end-user development kernel build.  When Linux boots,
pre-boot keys (both UEFI Secure Boot DB and MOK keys) get loaded in the
Linux .platform keyring.  

Currently, pre-boot keys are not trusted within the Linux trust boundary
[1]. These platform keys can only be used for kexec. If an end-user
wants to use their own key within the Linux trust boundary, they must
either compile it into the kernel themselves or use the insert-sys-cert
script. Both options present a problem. Many end-users do not want to
compile their own kernels. With the insert-sys-cert option, there are
missing upstream changes [2].  Also, with the insert-sys-cert option,
the end-user must re-sign their kernel again with their own key, and
then insert that key into the MOK db. Another problem with
insert-sys-cert is that only a single key can be inserted into a
compressed kernel.

Having the ability to insert a key into the Linux trust boundary opens
up various possibilities.  The end-user can use a pre-built kernel and
sign their own kernel modules.  It also opens up the ability for an
end-user to more easily use digital signature based IMA-appraisal.  To
get a key into the ima keyring, it must be signed by a key within the
Linux trust boundary.

Downstream Linux distros try to have a single signed kernel for each
architecture.  Each end-user may use this kernel in entirely different
ways.  Some downstream kernels have chosen to always trust platform keys
within the Linux trust boundary for kernel module signing.  These
kernels have no way of using digital signature base IMA appraisal.

This series introduces a new Linux kernel keyring containing the Machine
Owner Keys (MOK) called .mok. It also adds a new MOK variable to shim.
This variable allows the end-user to decide if they want to trust keys
enrolled in the MOK within the Linux trust boundary.  By default,
nothing changes; MOK keys are not trusted within the Linux kernel.  They
are only trusted after the end-user makes the decision themselves.  The
end-user would set this through mokutil using a new --trust-mok option
[3]. This would work similar to how the kernel uses MOK variables to
enable/disable signature validation as well as use/ignore the db.

When shim boots, it mirrors the new MokTML Boot Services variable to a
new MokListTrustedRT Runtime Services variable and extends PCR14.
MokListTrustedRT is written without EFI_VARIABLE_NON_VOLATILE set,
preventing an end-user from setting it after booting and doing a kexec.

When the kernel boots, if MokListTrustedRT is set and
EFI_VARIABLE_NON_VOLATILE is not set, the MokListRT is loaded into the
mok keyring instead of the platform keyring. Mimi has suggested that
only CA keys be loaded into this keyring. All other certs will load 
into the platform keyring instead.

The .mok keyring contains a new keyring permission that only allows CA
keys to be loaded. If the permission fails, the key is later loaded into
the platform keyring.  After all keys are added into the .mok keyring,
they are linked to the secondary trusted keyring.  After the link is 
created, keys contained in the .mok keyring will automatically be 
searched when searching the secondary trusted keys.

Secure Boot keys will never be trusted.  They will always be loaded into
the platform keyring.  If an end-user wanted to trust one, they would
need to enroll it into the MOK.

I have included links to both the mokutil [3] and shim [4] changes I
have made to support this new functionality.

V2 changes:
- The .mok keyring persists past boot
- Removed the unrestricted move into the secondary keyring
- Removed the keyring move bypass patch
- Added restrictions to allow the .mok to be linked to either the
  builtin or secondary keyrings
- Secondary keyring dependency has been removed

V3 changes:
- Only CA keys contained in the MOKList are loaded, nothing else
- Support for kernels built without the secondary trusted keyring
  has been dropped.

V4 changes:
- Add new Kconfig INTEGRITY_MOK_KEYRING and move all mok keyring
  code behind it
- Changed patch series ordering
- Consolidated a few patches

[1] https://lore.kernel.org/lkml/1556221605.24945.3.camel@HansenPartnership.com/
[2] https://lore.kernel.org/patchwork/cover/902768/
[3] https://github.com/esnowberg/mokutil/tree/0.3.0-mokvars-v2
[4] https://github.com/esnowberg/shim/tree/mokvars-v2

Eric Snowberg (12):
  integrity: Introduce a Linux keyring for the Machine Owner Key (MOK)
  integrity: Do not allow mok keyring updates following init
  KEYS: CA link restriction
  integrity: restrict INTEGRITY_KEYRING_MOK to restrict_link_by_ca
  integrity: add new keyring handler for mok keys
  KEYS: add a reference to mok keyring
  KEYS: Introduce link restriction to include builtin, secondary and mok
    keys
  KEYS: integrity: change link restriction to trust the mok keyring
  KEYS: link secondary_trusted_keys to mok trusted keys
  integrity: store reference to mok keyring
  integrity: Trust MOK keys if MokListTrustedRT found
  integrity: Only use mok keyring when uefi_check_trust_mok_keys is true

 certs/system_keyring.c                        | 40 ++++++++-
 crypto/asymmetric_keys/restrict.c             | 40 +++++++++
 include/crypto/public_key.h                   |  5 ++
 include/keys/system_keyring.h                 | 14 +++
 security/integrity/Kconfig                    | 11 +++
 security/integrity/Makefile                   |  1 +
 security/integrity/digsig.c                   | 18 +++-
 security/integrity/integrity.h                | 17 +++-
 .../platform_certs/keyring_handler.c          | 17 +++-
 .../platform_certs/keyring_handler.h          |  5 ++
 security/integrity/platform_certs/load_uefi.c |  4 +-
 .../integrity/platform_certs/mok_keyring.c    | 85 +++++++++++++++++++
 12 files changed, 249 insertions(+), 8 deletions(-)
 create mode 100644 security/integrity/platform_certs/mok_keyring.c

base-commit: 7c60610d476766e128cc4284bb6349732cbd6606
-- 
2.18.4

^ permalink raw reply

* Re: [PATCH v28 22/25] Audit: Add record for multiple process LSM attributes
From: Casey Schaufler @ 2021-08-18 21:59 UTC (permalink / raw)
  To: Paul Moore
  Cc: casey.schaufler, James Morris, linux-security-module, selinux,
	linux-audit, keescook, john.johansen, penguin-kernel,
	Stephen Smalley, Casey Schaufler
In-Reply-To: <CAHC9VhQCN2_MsCoXfU7Z-syYHj2o8HaSECf5E62ZFcNZd9_4QA@mail.gmail.com>

On 8/16/2021 11:57 AM, Paul Moore wrote:
> On Fri, Aug 13, 2021 at 5:47 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 8/13/2021 1:43 PM, Paul Moore wrote:
...
> Yeah, the thought occurred to me, but we are clearly already in the
> maybe-the-assumptions-are-wrong stage so I'm not going to rely on that
> being 100%.  We definitely need to track this down before we start
> making to many more guesses about what is working and what is not.

I've been tracking down where the audit context isn't set where
we'd expect it to be, I've identified 5 cases:

	1000	AUDIT_GET 		- Get Status
	1001	AUDIT_SET 		- Set status enable/disable/auditd
	1010	AUDIT_SIGNAL_INFO
	1130	AUDIT_SERVICE_START
	1131	AUDIT_SEVICE_STOP

These are all events that relate to the audit system itself.
It seems plausible that these really aren't syscalls and hence
shouldn't be expected to have an audit_context. I will create a
patch that treats these as the special cases I believe them to be.



^ permalink raw reply

* [PATCH v4 4/4] landlock_restrict_self.2: Document new syscall
From: Mickaël Salaün @ 2021-08-18 15:59 UTC (permalink / raw)
  To: Alejandro Colomar, G . Branden Robinson, Michael Kerrisk
  Cc: Mickaël Salaün, Jann Horn, Jonathan Corbet, Kees Cook,
	Vincent Dagonneau, landlock, linux-man, linux-security-module,
	Mickaël Salaün
In-Reply-To: <20210818155931.484070-1-mic@digikod.net>

From: Mickaël Salaün <mic@linux.microsoft.com>

This is an adaptation of
https://www.kernel.org/doc/html/v5.13/userspace-api/landlock.html

Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210818155931.484070-5-mic@digikod.net
---

Changes since v2:
* Add an EXAMPLES section referring to landlock(7).
* Change list order in the SEE ALSO section.
* Fix .IR and .BR use as explained by Alejandro Colomar.

Changes since v1:
* Replace all ".I" with ".IR", except when used for titles.
* Append punctuation to ".IR" and ".BR" when it makes sense (requested
  by Alejandro Colomar).
* Cut lines according to the semantic newline rules (requested by
  Alejandro Colomar).
* Remove roman style from ".TP" section titles (requested by Alejandro
  Colomar).
* Add comma after "i.e." and "e.g.".
* Add a "CONFORMING TO" section.
* Replace "(2)" with "()" for the described syscall name.
---
 man2/landlock_restrict_self.2 | 133 ++++++++++++++++++++++++++++++++++
 1 file changed, 133 insertions(+)
 create mode 100644 man2/landlock_restrict_self.2

diff --git a/man2/landlock_restrict_self.2 b/man2/landlock_restrict_self.2
new file mode 100644
index 000000000000..4b10997e2fb6
--- /dev/null
+++ b/man2/landlock_restrict_self.2
@@ -0,0 +1,133 @@
+.\" Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+.\" Copyright © 2019-2020 ANSSI
+.\" Copyright © 2021 Microsoft Corporation
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH LANDLOCK_RESTRICT_SELF 2 2021-06-27 Linux "Linux Programmer's Manual"
+.SH NAME
+landlock_restrict_self \- enforce a Landlock ruleset
+.SH SYNOPSIS
+.nf
+.BR "#include <linux/landlock.h>" "  /* Definition of " LANDLOCK_* " constants */"
+.BR "#include <sys/syscall.h>" "     /* Definition of " SYS_* " constants */"
+.PP
+.BI "int syscall(SYS_landlock_restrict_self, int " ruleset_fd ,
+.BI "            __u32 " flags );
+.SH DESCRIPTION
+Once a Landlock ruleset is populated with the desired rules, the
+.BR landlock_restrict_self ()
+system call enables enforcing this ruleset on the calling thread.
+See
+.BR landlock (7)
+for a global overview.
+.PP
+A thread can be restricted with multiple rulesets that are then composed
+together to form the thread's Landlock domain.
+This can be seen as a stack of rulesets but it is implemented in a more
+efficient way.
+A domain can only be updated in such a way that the constraints of each
+past and future composed rulesets will restrict the thread and its future
+children for their entire life.
+It is then possible to gradually enforce tailored access control policies
+with multiple independant rulesets coming from different sources
+(e.g., init system configuration, user session policy,
+built-in application policy).
+However, most applications should only need one call to
+.BR landlock_restrict_self ()
+and they should avoid arbitrary numbers of such calls because of the
+composed rulesets limit.
+Instead, developers are encouraged to build a tailored ruleset thanks to
+multiple calls to
+.BR landlock_add_rule (2).
+.PP
+In order to enforce a ruleset, either the caller must have the
+.B CAP_SYS_ADMIN
+capability in its user namespace, or the thread must already have the
+.I no_new_privs
+bit set.
+As for
+.BR seccomp (2),
+this avoids scenarios where unprivileged processes can affect the behavior
+of privileged children (e.g., because of set-user-ID binaries).
+If that bit was not already set by an ancestor of this thread,
+the thread must make the following call:
+.IP
+.EX
+prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+.EE
+.PP
+.I ruleset_fd
+is a Landlock ruleset file descriptor obtained with
+.BR landlock_create_ruleset (2)
+and fully populated with a set of calls to
+.BR landlock_add_rule (2).
+.PP
+.I flags
+must be 0.
+.SH RETURN VALUE
+On success,
+.BR landlock_restrict_self ()
+returns 0.
+.SH ERRORS
+.BR landlock_restrict_self ()
+can failed for the following reasons:
+.TP
+.B EOPNOTSUPP
+Landlock is supported by the kernel but disabled at boot time.
+.TP
+.B EINVAL
+.I flags
+is not 0.
+.TP
+.B EBADF
+.I ruleset_fd
+is not a file descriptor for the current thread.
+.TP
+.B EBADFD
+.I ruleset_fd
+is not a ruleset file descriptor.
+.TP
+.B EPERM
+.I ruleset_fd
+has no read access to the underlying ruleset,
+or the calling thread is not running with
+.IR no_new_privs ,
+or it doesn't have the
+.B CAP_SYS_ADMIN
+in its user namespace.
+.TP
+.B E2BIG
+The maximum number of composed rulesets is reached for the calling thread.
+This limit is currently 64.
+.SH VERSIONS
+Landlock was added in Linux 5.13.
+.SH CONFORMING TO
+This system call is Linux-specific.
+.SH EXAMPLES
+See
+.BR landlock (7).
+.SH SEE ALSO
+.BR landlock_create_ruleset (2),
+.BR landlock_add_rule (2),
+.BR landlock (7)
-- 
2.32.0


^ permalink raw reply related

* [PATCH v4 3/4] landlock_add_rule.2: Document new syscall
From: Mickaël Salaün @ 2021-08-18 15:59 UTC (permalink / raw)
  To: Alejandro Colomar, G . Branden Robinson, Michael Kerrisk
  Cc: Mickaël Salaün, Jann Horn, Jonathan Corbet, Kees Cook,
	Vincent Dagonneau, landlock, linux-man, linux-security-module,
	Mickaël Salaün
In-Reply-To: <20210818155931.484070-1-mic@digikod.net>

From: Mickaël Salaün <mic@linux.microsoft.com>

This is an adaptation of
https://www.kernel.org/doc/html/v5.13/userspace-api/landlock.html

Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210818155931.484070-4-mic@digikod.net
---

Changes since v2:
* Fix syscall's rule_attr pointer.
* Add an EXAMPLES section referring to landlock(7).
* Change list order in the SEE ALSO section.
* Fix .IR and .BR use as explained by Alejandro Colomar.

Changes since v1:
* Replace all ".I" with ".IR", except when used for titles.
* Append punctuation to ".IR" and ".BR" when it makes sense (requested
  by Alejandro Colomar).
* Cut lines according to the semantic newline rules (requested by
  Alejandro Colomar).
* Remove roman style from ".TP" section titles (requested by Alejandro
  Colomar).
* Add comma after "i.e." and "e.g.".
* Add a "CONFORMING TO" section.
* Replace "(2)" with "()" for the described syscall name.
---
 man2/landlock_add_rule.2 | 144 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)
 create mode 100644 man2/landlock_add_rule.2

diff --git a/man2/landlock_add_rule.2 b/man2/landlock_add_rule.2
new file mode 100644
index 000000000000..eafb8f8201b7
--- /dev/null
+++ b/man2/landlock_add_rule.2
@@ -0,0 +1,144 @@
+.\" Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+.\" Copyright © 2019-2020 ANSSI
+.\" Copyright © 2021 Microsoft Corporation
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH LANDLOCK_ADD_RULE 2 2021-06-27 Linux "Linux Programmer's Manual"
+.SH NAME
+landlock_add_rule \- add a new Landlock rule to a ruleset
+.SH SYNOPSIS
+.nf
+.BR "#include <linux/landlock.h>" "  /* Definition of " LANDLOCK_* " constants */"
+.BR "#include <sys/syscall.h>" "     /* Definition of " SYS_* " constants */"
+.PP
+.BI "int syscall(SYS_landlock_add_rule, int " ruleset_fd ,
+.BI "            enum landlock_rule_type " rule_type ,
+.BI "            const void *" rule_attr ", __u32 " flags );
+.SH DESCRIPTION
+A Landlock rule describes an action on an object.
+An object is currently a file hierarchy, and the related filesystem actions
+are defined with a set of access rights.
+This
+.BR landlock_add_rule ()
+system call enables adding a new Landlock rule to an existing ruleset
+created with
+.BR landlock_create_ruleset (2).
+See
+.BR landlock (7)
+for a global overview.
+.PP
+.I ruleset_fd
+is a Landlock ruleset file descriptor obtained with
+.BR landlock_create_ruleset (2).
+.PP
+.I rule_type
+identifies the structure type pointed to by
+.IR rule_attr .
+Currently, Linux supports the following
+.I rule_type
+value:
+.TP
+.B LANDLOCK_RULE_PATH_BENEATH
+This defines the object type as a file hierarchy.
+In this case,
+.I rule_attr
+points to the following structure:
+.IP
+.in +4n
+.EX
+struct landlock_path_beneath_attr {
+    __u64 allowed_access;
+    __s32 parent_fd;
+} __attribute__((packed));
+.EE
+.in
+.IP
+.I allowed_access
+contains a bitmask of allowed filesystem actions for this file hierarchy
+(see
+.B Filesystem actions
+in
+.BR landlock (7)).
+.IP
+.I parent_fd
+is an opened file descriptor, preferably with the
+.I O_PATH
+flag,
+which identifies the parent directory of the file hierarchy or
+just a file.
+.PP
+.I flags
+must be 0.
+.SH RETURN VALUE
+On success,
+.BR landlock_add_rule ()
+returns 0.
+.SH ERRORS
+.BR landlock_add_rule ()
+can failed for the following reasons:
+.TP
+.B EOPNOTSUPP
+Landlock is supported by the kernel but disabled at boot time.
+.TP
+.B EINVAL
+.I flags
+is not 0, or the rule accesses are inconsistent (i.e.,
+.I rule_attr->allowed_access
+is not a subset of the ruleset handled accesses).
+.TP
+.B ENOMSG
+Empty accesses (i.e.,
+.I rule_attr->allowed_access
+is 0).
+.TP
+.B EBADF
+.I ruleset_fd
+is not a file descriptor for the current thread, or a member of
+.I rule_attr
+is not a file descriptor as expected.
+.TP
+.B EBADFD
+.I ruleset_fd
+is not a ruleset file descriptor, or a member of
+.I rule_attr
+is not the expected file descriptor type.
+.TP
+.B EPERM
+.I ruleset_fd
+has no write access to the underlying ruleset.
+.TP
+.B EFAULT
+.I rule_attr
+was not a valid address.
+.SH VERSIONS
+Landlock was added in Linux 5.13.
+.SH CONFORMING TO
+This system call is Linux-specific.
+.SH EXAMPLES
+See
+.BR landlock (7).
+.SH SEE ALSO
+.BR landlock_create_ruleset (2),
+.BR landlock_restrict_self (2),
+.BR landlock (7)
-- 
2.32.0


^ permalink raw reply related

* [PATCH v4 1/4] landlock.7: Add a new page to introduce Landlock
From: Mickaël Salaün @ 2021-08-18 15:59 UTC (permalink / raw)
  To: Alejandro Colomar, G . Branden Robinson, Michael Kerrisk
  Cc: Mickaël Salaün, Jann Horn, Jonathan Corbet, Kees Cook,
	Vincent Dagonneau, landlock, linux-man, linux-security-module,
	Mickaël Salaün
In-Reply-To: <20210818155931.484070-1-mic@digikod.net>

From: Mickaël Salaün <mic@linux.microsoft.com>

From the user point of view, Landlock is a set of system calls enabling
to build and enforce a set of access-control rules.  A ruleset can be
created with landlock_create_ruleset(2), populated with
landlock_add_rule(2) and enforced with landlock_restrict_self(2).  This
man page gives an overview of the whole mechanism.  Details of these
system calls are documented in their respective man pages.

This is an adaptation of
https://www.kernel.org/doc/html/v5.13/userspace-api/landlock.html

Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210818155931.484070-2-mic@digikod.net
---

Changes since v3:
* Fix each…its/their as pointed out by Branden and Alejandro.

Changes since v2:
* Fix semantic newlines according to Alejandro Colomar.
* Fix minor grammar issues spotted by Alejandro.
* Fix .IR and .BR use as explained by Alejandro.

Changes since v1:
* Replace all ".I" with ".IR", except when used for titles.
* Append punctuation to ".IR" and ".BR" when it makes sense (requested
  by Alejandro Colomar).
* Cut lines according to the semantic newline rules (requested by
  Alejandro Colomar).
* Remove roman style from ".TP" section titles (requested by Alejandro
  Colomar).
* Add comma after "i.e." and "e.g.".
* Move the example in a new EXAMPLES section.
* Improve title.
* Explain the LSM acronym at first use.
---
 man7/landlock.7 | 361 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 361 insertions(+)
 create mode 100644 man7/landlock.7

diff --git a/man7/landlock.7 b/man7/landlock.7
new file mode 100644
index 000000000000..710e7feb7924
--- /dev/null
+++ b/man7/landlock.7
@@ -0,0 +1,361 @@
+.\" Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+.\" Copyright © 2019-2020 ANSSI
+.\" Copyright © 2021 Microsoft Corporation
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH LANDLOCK 7 2021-06-27 Linux "Linux Programmer's Manual"
+.SH NAME
+Landlock \- unprivileged access-control
+.SH DESCRIPTION
+Landlock is an access-control system that enables any processes to
+securely restrict themselves and their future children.
+Because Landlock is a stackable Linux Security Module (LSM),
+it makes it possible to create safe security sandboxes as new security
+layers in addition to the existing system-wide access-controls.
+This kind of sandbox is expected to help mitigate
+the security impact of bugs,
+and unexpected or malicious behaviors in applications.
+.PP
+A Landlock security policy is a set of access rights
+(e.g., open a file in read-only, make a directory, etc.)
+tied to a file hierarchy.
+Such policy can be configured and enforced by processes for themselves
+using three system calls:
+.IP \(bu 2
+.BR landlock_create_ruleset (2)
+creates a new ruleset;
+.IP \(bu
+.BR landlock_add_rule (2)
+adds a new rule to a ruleset;
+.IP \(bu
+.BR landlock_restrict_self (2)
+enforces a ruleset on the calling thread.
+.PP
+To be able to use these system calls,
+the running kernel must support Landlock and
+it must be enabled at boot time.
+.\"
+.SS Landlock rules
+A Landlock rule describes an action on an object.
+An object is currently a file hierarchy,
+and the related filesystem actions are defined with access rights (see
+.BR landlock_add_rule (2)).
+A set of rules is aggregated in a ruleset,
+which can then restrict the thread enforcing it,
+and its future children.
+.\"
+.SS Filesystem actions
+These flags enable to restrict a sandboxed process to a
+set of actions on files and directories.
+Files or directories opened before the sandboxing
+are not subject to these restrictions.
+See
+.BR landlock_add_rule (2)
+and
+.BR landlock_create_ruleset (2)
+for more context.
+.PP
+A file can only receive these access rights:
+.TP
+.B LANDLOCK_ACCESS_FS_EXECUTE
+Execute a file.
+.TP
+.B LANDLOCK_ACCESS_FS_WRITE_FILE
+Open a file with write access.
+.TP
+.B LANDLOCK_ACCESS_FS_READ_FILE
+Open a file with read access.
+.PP
+A directory can receive access rights related to files or directories.
+The following access right is applied to the directory itself,
+and the directories beneath it:
+.TP
+.B LANDLOCK_ACCESS_FS_READ_DIR
+Open a directory or list its content.
+.PP
+However,
+the following access rights only apply to the content of a directory,
+not the directory itself:
+.TP
+.B LANDLOCK_ACCESS_FS_REMOVE_DIR
+Remove an empty directory or rename one.
+.TP
+.B LANDLOCK_ACCESS_FS_REMOVE_FILE
+Unlink (or rename) a file.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_CHAR
+Create (or rename or link) a character device.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_DIR
+Create (or rename) a directory.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_REG
+Create (or rename or link) a regular file.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_SOCK
+Create (or rename or link) a UNIX domain socket.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_FIFO
+Create (or rename or link) a named pipe.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_BLOCK
+Create (or rename or link) a block device.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_SYM
+Create (or rename or link) a symbolic link.
+.\"
+.SS Layers of file path access rights
+Each time a thread enforces a ruleset on itself,
+it updates its Landlock domain with a new layer of policy.
+Indeed, this complementary policy is composed with the potentially other
+rulesets already restricting this thread.
+A sandboxed thread can then safely add more constraints to itself with a
+new enforced ruleset.
+.PP
+One policy layer grants access to a file path
+if at least one of its rules encountered on the path grants the access.
+A sandboxed thread can only access a file path
+if all its enforced policy layers grant the access
+as well as all the other system access controls
+(e.g., filesystem DAC, other LSM policies, etc.).
+.\"
+.SS Bind mounts and OverlayFS
+Landlock enables restricting access to file hierarchies,
+which means that these access rights can be propagated with bind mounts
+(cf.
+.BR mount_namespaces (7))
+but not with OverlayFS.
+.PP
+A bind mount mirrors a source file hierarchy to a destination.
+The destination hierarchy is then composed of the exact same files,
+on which Landlock rules can be tied,
+either via the source or the destination path.
+These rules restrict access when they are encountered on a path,
+which means that they can restrict access to
+multiple file hierarchies at the same time,
+whether these hierarchies are the result of bind mounts or not.
+.PP
+An OverlayFS mount point consists of upper and lower layers.
+These layers are combined in a merge directory, result of the mount point.
+This merge hierarchy may include files from the upper and lower layers,
+but modifications performed on the merge hierarchy
+only reflect on the upper layer.
+From a Landlock policy point of view,
+each of the OverlayFS layers and merge hierarchies is standalone and
+contains its own set of files and directories,
+which is different from a bind mount.
+A policy restricting an OverlayFS layer will not restrict the resulted
+merged hierarchy, and vice versa.
+Landlock users should then only think about file hierarchies they want to
+allow access to, regardless of the underlying filesystem.
+.\"
+.SS Inheritance
+Every new thread resulting from a
+.BR clone (2)
+inherits Landlock domain restrictions from its parent.
+This is similar to the
+.BR seccomp (2)
+inheritance or any other LSM dealing with tasks'
+.BR credentials (7).
+For instance, one process's thread may apply Landlock rules to itself,
+but they will not be automatically applied to other sibling threads
+(unlike POSIX thread credential changes, cf.
+.BR nptl (7)).
+.PP
+When a thread sandboxes itself,
+we have the guarantee that the related security policy
+will stay enforced on all this thread's descendants.
+This allows creating standalone and modular security policies
+per application,
+which will automatically be composed between themselves
+according to their runtime parent policies.
+.\"
+.SS Ptrace restrictions
+A sandboxed process has less privileges than a non-sandboxed process and
+must then be subject to additional restrictions
+when manipulating another process.
+To be allowed to use
+.BR ptrace (2)
+and related syscalls on a target process,
+a sandboxed process should have a subset of the target process rules,
+which means the tracee must be in a sub-domain of the tracer.
+.SH VERSIONS
+Landlock was added in Linux 5.13.
+.SH NOTES
+Landlock is enabled by
+.BR CONFIG_SECURITY_LANDLOCK .
+The
+.I lsm=lsm1,...,lsmN
+command line parameter controls the sequence of the initialization of
+Linux Security Modules.
+It must contain the string
+.I landlock
+to enable Landlock.
+If the command line parameter is not specified,
+the initialization falls back to the value of the deprecated
+.I security=
+command line parameter and further to the value of CONFIG_LSM.
+We can check that Landlock is enabled by looking for
+.I landlock: Up and running.
+in kernel logs.
+.PP
+It is currently not possible to restrict some file-related actions
+accessible through these system call families:
+.BR chdir (2),
+.BR truncate (2),
+.BR stat (2),
+.BR flock (2),
+.BR chmod (2),
+.BR chown (2),
+.BR setxattr (2),
+.BR utime (2),
+.BR ioctl (2),
+.BR fcntl (2),
+.BR access (2).
+Future Landlock evolutions will enable to restrict them.
+.SH EXAMPLES
+We first need to create the ruleset that will contain our rules.
+For this example,
+the ruleset will contain rules that only allow read actions,
+but write actions will be denied.
+The ruleset then needs to handle both of these kind of actions.
+See below for the description of filesystem actions.
+.PP
+.in +4n
+.EX
+int ruleset_fd;
+struct landlock_ruleset_attr ruleset_attr = {
+    .handled_access_fs =
+        LANDLOCK_ACCESS_FS_EXECUTE |
+        LANDLOCK_ACCESS_FS_WRITE_FILE |
+        LANDLOCK_ACCESS_FS_READ_FILE |
+        LANDLOCK_ACCESS_FS_READ_DIR |
+        LANDLOCK_ACCESS_FS_REMOVE_DIR |
+        LANDLOCK_ACCESS_FS_REMOVE_FILE |
+        LANDLOCK_ACCESS_FS_MAKE_CHAR |
+        LANDLOCK_ACCESS_FS_MAKE_DIR |
+        LANDLOCK_ACCESS_FS_MAKE_REG |
+        LANDLOCK_ACCESS_FS_MAKE_SOCK |
+        LANDLOCK_ACCESS_FS_MAKE_FIFO |
+        LANDLOCK_ACCESS_FS_MAKE_BLOCK |
+        LANDLOCK_ACCESS_FS_MAKE_SYM,
+};
+
+ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+if (ruleset_fd < 0) {
+    perror("Failed to create a ruleset");
+    return 1;
+}
+.EE
+.in
+.PP
+We can now add a new rule to this ruleset thanks to the returned file
+descriptor referring to this ruleset.
+The rule will only allow reading the file hierarchy
+.IR /usr .
+Without another rule, write actions would then be denied by the ruleset.
+To add
+.I /usr
+to the ruleset, we open it with the
+.I O_PATH
+flag and fill the
+.I struct landlock_path_beneath_attr
+with this file descriptor.
+.PP
+.in +4n
+.EX
+int err;
+struct landlock_path_beneath_attr path_beneath = {
+    .allowed_access =
+        LANDLOCK_ACCESS_FS_EXECUTE |
+        LANDLOCK_ACCESS_FS_READ_FILE |
+        LANDLOCK_ACCESS_FS_READ_DIR,
+};
+
+path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
+if (path_beneath.parent_fd < 0) {
+    perror("Failed to open file");
+    close(ruleset_fd);
+    return 1;
+}
+err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+                        &path_beneath, 0);
+close(path_beneath.parent_fd);
+if (err) {
+    perror("Failed to update ruleset");
+    close(ruleset_fd);
+    return 1;
+}
+.EE
+.in
+.PP
+We now have a ruleset with one rule allowing read access to
+.I /usr
+while denying all other handled accesses for the filesystem.
+The next step is to restrict the current thread from gaining more
+privileges
+(e.g., thanks to a set-user-ID binary).
+.PP
+.in +4n
+.EX
+if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+    perror("Failed to restrict privileges");
+    close(ruleset_fd);
+    return 1;
+}
+.EE
+.in
+.PP
+The current thread is now ready to sandbox itself with the ruleset.
+.PP
+.in +4n
+.EX
+if (landlock_restrict_self(ruleset_fd, 0)) {
+    perror("Failed to enforce ruleset");
+    close(ruleset_fd);
+    return 1;
+}
+close(ruleset_fd);
+.EE
+.in
+.PP
+If the
+.BR landlock_restrict_self (2)
+system call succeeds, the current thread is now restricted and this policy
+will be enforced on all its subsequently created children as well.
+Once a thread is landlocked, there is no way to remove its security policy;
+only adding more restrictions is allowed.
+These threads are now in a new Landlock domain,
+merge of their parent one (if any) with the new ruleset.
+.PP
+Full working code can be found in
+.UR https://git.kernel.org\:/pub\:/scm\:/linux\:/kernel\:/git\:/stable\:/linux.git\:/tree\:/samples\:/landlock\:/sandboxer.c
+.UE
+.SH SEE ALSO
+.BR landlock_create_ruleset (2),
+.BR landlock_add_rule (2),
+.BR landlock_restrict_self (2)
+.PP
+.UR https://landlock.io\:/
+.UE
-- 
2.32.0


^ permalink raw reply related

* [PATCH v4 2/4] landlock_create_ruleset.2: Document new syscall
From: Mickaël Salaün @ 2021-08-18 15:59 UTC (permalink / raw)
  To: Alejandro Colomar, G . Branden Robinson, Michael Kerrisk
  Cc: Mickaël Salaün, Jann Horn, Jonathan Corbet, Kees Cook,
	Vincent Dagonneau, landlock, linux-man, linux-security-module,
	Mickaël Salaün
In-Reply-To: <20210818155931.484070-1-mic@digikod.net>

From: Mickaël Salaün <mic@linux.microsoft.com>

This is an adaptation of
https://www.kernel.org/doc/html/v5.13/userspace-api/landlock.html

Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210818155931.484070-3-mic@digikod.net
---

Changes since v2:
* Fix syscall signature (attr pointer).
* Add an EXAMPLES section referring to landlock(7).
* Change list order in the SEE ALSO section.
* Fix .IR and .BR use as explained by Alejandro Colomar.

Changes since v1:
* Replace all ".I" with ".IR", except when used for titles.
* Append punctuation to ".IR" and ".BR" when it makes sense (requested
  by Alejandro Colomar).
* Cut lines according to the semantic newline rules (requested by
  Alejandro Colomar).
* Remove roman style from ".TP" section titles (requested by Alejandro
  Colomar).
* Add comma after "i.e." and "e.g.".
* Add a "CONFORMING TO" section.
* Replace "(2)" with "()" for the described syscall name.
---
 man2/landlock_create_ruleset.2 | 139 +++++++++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 man2/landlock_create_ruleset.2

diff --git a/man2/landlock_create_ruleset.2 b/man2/landlock_create_ruleset.2
new file mode 100644
index 000000000000..e1ca4bcf8c86
--- /dev/null
+++ b/man2/landlock_create_ruleset.2
@@ -0,0 +1,139 @@
+.\" Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+.\" Copyright © 2019-2020 ANSSI
+.\" Copyright © 2021 Microsoft Corporation
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH LANDLOCK_CREATE_RULESET 2 2021-06-27 Linux "Linux Programmer's Manual"
+.SH NAME
+landlock_create_ruleset \- create a new Landlock ruleset
+.SH SYNOPSIS
+.nf
+.BR "#include <linux/landlock.h>" "  /* Definition of " LANDLOCK_* " constants */"
+.BR "#include <sys/syscall.h>" "     /* Definition of " SYS_* " constants */"
+.PP
+.BI "int syscall(SYS_landlock_create_ruleset,
+.BI "            const struct landlock_ruleset_attr *" attr ,
+.BI "            size_t " size " , __u32 " flags );
+.SH DESCRIPTION
+A Landlock ruleset identifies a set of rules (i.e., actions on objects).
+This
+.BR landlock_create_ruleset ()
+system call enables creating a new file descriptor identifying a ruleset.
+This file descriptor can then be used by
+.BR landlock_add_rule (2)
+and
+.BR landlock_restrict_self (2).
+See
+.BR landlock (7)
+for a global overview.
+.PP
+.I attr
+specifies the properties of the new ruleset.
+It points to the following structure:
+.IP
+.in +4n
+.EX
+struct landlock_ruleset_attr {
+	__u64 handled_access_fs;
+};
+.EE
+.in
+.IP
+.I handled_access_fs
+is a bitmask of actions that is handled by this ruleset and should then be
+forbidden if no rule explicitly allow them
+(see
+.B Filesystem actions
+in
+.BR landlock (7)).
+This enables simply restricting ambient rights
+(e.g., global filesystem access) and is needed for compatibility reasons.
+.PP
+.I size
+must be specified as
+.I sizeof(struct landlock_ruleset_attr)
+for compatibility reasons.
+.PP
+.I flags
+must be 0 if
+.I attr
+is used.
+Otherwise,
+.I flags
+can be set to:
+.TP
+.B LANDLOCK_CREATE_RULESET_VERSION
+If
+.I attr
+is NULL and
+.I size
+is 0, then the returned value is the highest supported Landlock ABI version
+(starting at 1).
+This version can be used for a best-effort security approach,
+which is encouraged when user space is not pinned to a specific kernel
+version.
+All features documented in these man pages are available with the version
+1.
+.SH RETURN VALUE
+On success,
+.BR landlock_create_ruleset ()
+returns a new Landlock ruleset file descriptor, or a Landlock ABI version
+according to
+.IR flags .
+.SH ERRORS
+.BR landlock_create_ruleset ()
+can failed for the following reasons:
+.TP
+.B EOPNOTSUPP
+Landlock is supported by the kernel but disabled at boot time.
+.TP
+.B EINVAL
+Unknown
+.IR flags ,
+or unknown access, or too small
+.IR size .
+.TP
+.B E2BIG
+.I size
+is too big.
+.TP
+.B EFAULT
+.I attr
+was not a valid address.
+.TP
+.B ENOMSG
+Empty accesses (i.e.,
+.I attr->handled_access_fs
+is 0).
+.SH VERSIONS
+Landlock was added in Linux 5.13.
+.SH CONFORMING TO
+This system call is Linux-specific.
+.SH EXAMPLES
+See
+.BR landlock (7).
+.SH SEE ALSO
+.BR landlock_add_rule (2),
+.BR landlock_restrict_self (2),
+.BR landlock (7)
-- 
2.32.0


^ permalink raw reply related

* [PATCH v4 0/4] Add Landlock man pages
From: Mickaël Salaün @ 2021-08-18 15:59 UTC (permalink / raw)
  To: Alejandro Colomar, G . Branden Robinson, Michael Kerrisk
  Cc: Mickaël Salaün, Jann Horn, Jonathan Corbet, Kees Cook,
	Vincent Dagonneau, landlock, linux-man, linux-security-module,
	Mickaël Salaün

From: Mickaël Salaün <mic@linux.microsoft.com>

Hi,

These four documents give a global overview of Landlock and explain each
system calls.  This is mainly a formatting of the current kernel
documentation with some new additional details.

This fourth patch series fixes spelling issues pointed out by Alejandro
Colomar and Branden Robinson.  I also removed some recipients.

This patch series can be found in a Git repository:
https://github.com/landlock-lsm/man-pages/commits/landlock-v4

Previous version:
https://lore.kernel.org/landlock/20210730144116.332091-1-mic@digikod.net/

Regards,

Mickaël Salaün (4):
  landlock.7: Add a new page to introduce Landlock
  landlock_create_ruleset.2: Document new syscall
  landlock_add_rule.2: Document new syscall
  landlock_restrict_self.2: Document new syscall

 man2/landlock_add_rule.2       | 144 +++++++++++++
 man2/landlock_create_ruleset.2 | 139 +++++++++++++
 man2/landlock_restrict_self.2  | 133 ++++++++++++
 man7/landlock.7                | 361 +++++++++++++++++++++++++++++++++
 4 files changed, 777 insertions(+)
 create mode 100644 man2/landlock_add_rule.2
 create mode 100644 man2/landlock_create_ruleset.2
 create mode 100644 man2/landlock_restrict_self.2
 create mode 100644 man7/landlock.7


base-commit: 7127973e0c8ca36fda1f5d2d0adae04d61fa0d01
-- 
2.32.0


^ permalink raw reply

* Re: [PATCH v2] fscrypt: support trusted keys
From: Sumit Garg @ 2021-08-18  4:53 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Ahmad Fatoum, Eric Biggers, Jarkko Sakkinen, Theodore Y. Ts'o,
	Jaegeuk Kim, kernel, James Morris, Serge E. Hallyn,
	James Bottomley, David Howells, linux-fscrypt,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE, linux-integrity,
	open list:SECURITY SUBSYSTEM, open list:ASYMMETRIC KEYS,
	Linux Kernel Mailing List
In-Reply-To: <f4264f0a83c1b080ad2a22d63ecf1fcca87dfebb.camel@linux.ibm.com>

On Tue, 17 Aug 2021 at 19:54, Mimi Zohar <zohar@linux.ibm.com> wrote:
>
> On Tue, 2021-08-17 at 16:13 +0200, Ahmad Fatoum wrote:
> > On 17.08.21 15:55, Mimi Zohar wrote:
> > > I have no opinion as to whether this is/isn't a valid usecase.
> >
> > So you'd be fine with merging trusted key support as is and leave encrypted
> > key support to someone who has a valid use case and wants to argue
> > in its favor?
>
> That decision as to whether it makes sense to support trusted keys
> directly, based on the new trust sources, is a decision left up to the
> maintainer(s) of the new usecase and the new trust sources maintainer
> Jarkko.
>

I would be in favor of supporting the use of trusted keys directly
when it comes to TEE as a trust source.

-Sumit

> thanks,
>
> Mimi
>

^ permalink raw reply

* Re: [PATCH v2] fscrypt: support trusted keys
From: Jarkko Sakkinen @ 2021-08-18  2:09 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: Ahmad Fatoum, Eric Biggers, Theodore Y. Ts'o, Jaegeuk Kim,
	kernel, James Morris, Serge E. Hallyn, James Bottomley,
	Sumit Garg, David Howells, linux-fscrypt, linux-crypto,
	linux-integrity, linux-security-module, keyrings, linux-kernel
In-Reply-To: <f4264f0a83c1b080ad2a22d63ecf1fcca87dfebb.camel@linux.ibm.com>

On Tue, Aug 17, 2021 at 10:24:44AM -0400, Mimi Zohar wrote:
> On Tue, 2021-08-17 at 16:13 +0200, Ahmad Fatoum wrote:
> > On 17.08.21 15:55, Mimi Zohar wrote:
> > > I have no opinion as to whether this is/isn't a valid usecase.
> > 
> > So you'd be fine with merging trusted key support as is and leave encrypted
> > key support to someone who has a valid use case and wants to argue
> > in its favor?
> 
> That decision as to whether it makes sense to support trusted keys
> directly, based on the new trust sources, is a decision left up to the
> maintainer(s) of the new usecase and the new trust sources maintainer
> Jarkko.

I'm fine with "direct", as long as also "indirect" is supported.

/Jarkko

^ permalink raw reply

* Re: [PATCH 1/1] ima: check control characters in policy file path
From: Mimi Zohar @ 2021-08-17 17:45 UTC (permalink / raw)
  To: Tianxing Zhang; +Cc: linux-integrity, linux-security-module, linux-kernel
In-Reply-To: <20210814082723.261-1-anakinzhang96@gmail.com>

On Sat, 2021-08-14 at 16:27 +0800, Tianxing Zhang wrote:
> When a policy file path contains control characters like '\r' or '\b',
> invalid error messages can be printed to overwrite system messages:
> 
> $ echo -e "/\rtest 12345678" > /sys/kernel/security/ima/policy
> 
> This patch rejects policy paths with control characters.
> 
> Signed-off-by: Tianxing Zhang <anakinzhang96@gmail.com>
> ---
>  security/integrity/ima/ima_fs.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
> index 3d8e9d5db5aa..e6daa138de89 100644
> --- a/security/integrity/ima/ima_fs.c
> +++ b/security/integrity/ima/ima_fs.c
> @@ -316,6 +316,7 @@ static ssize_t ima_write_policy(struct file *file, const char __user *buf,
>  {
>  	char *data;
>  	ssize_t result;
> +	int i;
>  
>  	if (datalen >= PAGE_SIZE)
>  		datalen = PAGE_SIZE - 1;
> @@ -331,6 +332,14 @@ static ssize_t ima_write_policy(struct file *file, const char __user *buf,
>  		goto out;
>  	}
>  
> +	for (i = 0; data[i] != '\n' && data[i] != '\0'; i++) {
> +		if (iscntrl(data[i])) {
> +			pr_err_once("file path with no control characters required\n");
> +			result = -EINVAL;
> +			goto out_free;
> +		}
> +	}
> +
>  	result = mutex_lock_interruptible(&ima_write_mutex);
>  	if (result < 0)
>  		goto out_free;

The IMA audit messages already display pathnames via
audit_log_untrustedstring().  Shouldn't any change be limited to the
ima_policy_read() code path?

thanks,

Mimi


^ permalink raw reply

* Re: [PATCH v2] fscrypt: support trusted keys
From: Mimi Zohar @ 2021-08-17 14:24 UTC (permalink / raw)
  To: Ahmad Fatoum, Eric Biggers, Jarkko Sakkinen
  Cc: Theodore Y. Ts'o, Jaegeuk Kim, kernel, James Morris,
	Serge E. Hallyn, James Bottomley, Sumit Garg, David Howells,
	linux-fscrypt, linux-crypto, linux-integrity,
	linux-security-module, keyrings, linux-kernel
In-Reply-To: <b77836af-42a1-5aca-9363-d050352bd8aa@pengutronix.de>

On Tue, 2021-08-17 at 16:13 +0200, Ahmad Fatoum wrote:
> On 17.08.21 15:55, Mimi Zohar wrote:
> > I have no opinion as to whether this is/isn't a valid usecase.
> 
> So you'd be fine with merging trusted key support as is and leave encrypted
> key support to someone who has a valid use case and wants to argue
> in its favor?

That decision as to whether it makes sense to support trusted keys
directly, based on the new trust sources, is a decision left up to the
maintainer(s) of the new usecase and the new trust sources maintainer
Jarkko.

thanks,

Mimi


^ permalink raw reply

* Re: [PATCH v2] fscrypt: support trusted keys
From: Ahmad Fatoum @ 2021-08-17 14:13 UTC (permalink / raw)
  To: Mimi Zohar, Eric Biggers, Jarkko Sakkinen
  Cc: Theodore Y. Ts'o, Jaegeuk Kim, kernel, James Morris,
	Serge E. Hallyn, James Bottomley, Sumit Garg, David Howells,
	linux-fscrypt, linux-crypto, linux-integrity,
	linux-security-module, keyrings, linux-kernel
In-Reply-To: <285cb263d9c1c16f3918c98dd36074ef16568e6d.camel@linux.ibm.com>

On 17.08.21 15:55, Mimi Zohar wrote:
> On Tue, 2021-08-17 at 15:04 +0200, Ahmad Fatoum wrote:
>> Hi,
>>
>> On 12.08.21 02:54, Mimi Zohar wrote:
>>> On Wed, 2021-08-11 at 10:16 -0700, Eric Biggers wrote:
>>>
>>>> Neither of you actually answered my question, which is whether the support for
>>>> trusted keys in dm-crypt is a mistake.  I think you're saying that it is?  That
>>>> would imply that fscrypt shouldn't support trusted keys, but rather encrypted
>>>> keys -- which conflicts with Ahmad's patch which is adding support for trusted
>>>> keys.  Note that your reasoning for this is not documented at all in the
>>>> trusted-encrypted keys documentation; it needs to be (email threads don't really
>>>> matter), otherwise how would anyone know when/how to use this feature?
>>>
>>> True, but all of the trusted-encrypted key examples in the
>>> documentation are "encrypted" type keys, encrypted/decrypted based on a
>>> "trusted" type key.  There are no examples of using the "trusted" key
>>> type directly.  Before claiming that adding "trusted" key support in
>>> dm-crypt was a mistake, we should ask Ahmad why he felt dm-crypt needed
>>> to directly support "trusted" type keys.
>>
>> I wanted to persist the dm-crypt key as a sealed blob. With encrypted keys,
>> I would have to persist and unseal two blobs (load trusted key blob, load
>> encrypted key blob rooted to trusted key) with no extra benefit.
>>
>> I thus added direct support for trusted keys. Jarkko even commented on the
>> thread, but didn't voice objection to the approach (or agreement for that
>> matter), so I assumed the approach is fine.
>>
>> I can see the utility of using a single trusted key for TPMs, but for CAAM,
>> I see none and having an encrypted key for every trusted key just makes
>> it more cumbersome.
>>
>> In v1 here, I added encrypted key support as well, but dropped it for v2,
>> because I am not in a position to justify its use. Now that you and Eric
>> discussed it, should I send v3 with support for both encrypted and trusted
>> keys like with dm-crypt or how should we proceed?
> 
> With some applications, the indirection is important.   It allows the
> "encrypted" key type to be updated/re-encypted based on a new "trusted"
> key, without affecting the on disk encrypted key usage.

Those applications were already able to use the encrypted key support
in dm-crypt. For those where re-encryption/PCR-sealing isn't required,
direct trusted key support offers a simpler way to integrate.

> As much as I expected, directly using "trusted" keys is a result of the
> new trusted key sources.

More users = more use cases. You make it sound like a negative
thing.

> I have no opinion as to whether this is/isn't a valid usecase.

So you'd be fine with merging trusted key support as is and leave encrypted
key support to someone who has a valid use case and wants to argue
in its favor?

Cheers,
Ahmad

> 
> thanks,
> 
> Mimi
> 
> 


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH v2] fscrypt: support trusted keys
From: Mimi Zohar @ 2021-08-17 13:55 UTC (permalink / raw)
  To: Ahmad Fatoum, Eric Biggers, Jarkko Sakkinen
  Cc: Theodore Y. Ts'o, Jaegeuk Kim, kernel, James Morris,
	Serge E. Hallyn, James Bottomley, Sumit Garg, David Howells,
	linux-fscrypt, linux-crypto, linux-integrity,
	linux-security-module, keyrings, linux-kernel
In-Reply-To: <a6eb6f38-b9f4-c59c-4181-2049f181e67d@pengutronix.de>

On Tue, 2021-08-17 at 15:04 +0200, Ahmad Fatoum wrote:
> Hi,
> 
> On 12.08.21 02:54, Mimi Zohar wrote:
> > On Wed, 2021-08-11 at 10:16 -0700, Eric Biggers wrote:
> > 
> >> Neither of you actually answered my question, which is whether the support for
> >> trusted keys in dm-crypt is a mistake.  I think you're saying that it is?  That
> >> would imply that fscrypt shouldn't support trusted keys, but rather encrypted
> >> keys -- which conflicts with Ahmad's patch which is adding support for trusted
> >> keys.  Note that your reasoning for this is not documented at all in the
> >> trusted-encrypted keys documentation; it needs to be (email threads don't really
> >> matter), otherwise how would anyone know when/how to use this feature?
> > 
> > True, but all of the trusted-encrypted key examples in the
> > documentation are "encrypted" type keys, encrypted/decrypted based on a
> > "trusted" type key.  There are no examples of using the "trusted" key
> > type directly.  Before claiming that adding "trusted" key support in
> > dm-crypt was a mistake, we should ask Ahmad why he felt dm-crypt needed
> > to directly support "trusted" type keys.
> 
> I wanted to persist the dm-crypt key as a sealed blob. With encrypted keys,
> I would have to persist and unseal two blobs (load trusted key blob, load
> encrypted key blob rooted to trusted key) with no extra benefit.
> 
> I thus added direct support for trusted keys. Jarkko even commented on the
> thread, but didn't voice objection to the approach (or agreement for that
> matter), so I assumed the approach is fine.
> 
> I can see the utility of using a single trusted key for TPMs, but for CAAM,
> I see none and having an encrypted key for every trusted key just makes
> it more cumbersome.
> 
> In v1 here, I added encrypted key support as well, but dropped it for v2,
> because I am not in a position to justify its use. Now that you and Eric
> discussed it, should I send v3 with support for both encrypted and trusted
> keys like with dm-crypt or how should we proceed?

With some applications, the indirection is important.   It allows the
"encrypted" key type to be updated/re-encypted based on a new "trusted"
key, without affecting the on disk encrypted key usage.

As much as I expected, directly using "trusted" keys is a result of the
new trusted key sources.  I have no opinion as to whether this is/isn't
a valid usecase.

thanks,

Mimi


^ permalink raw reply

* Re: [PATCH v2] fscrypt: support trusted keys
From: Ahmad Fatoum @ 2021-08-17 13:04 UTC (permalink / raw)
  To: Mimi Zohar, Eric Biggers, Jarkko Sakkinen
  Cc: Theodore Y. Ts'o, Jaegeuk Kim, kernel, James Morris,
	Serge E. Hallyn, James Bottomley, Sumit Garg, David Howells,
	linux-fscrypt, linux-crypto, linux-integrity,
	linux-security-module, keyrings, linux-kernel
In-Reply-To: <0e69a0aa394dd20347b06ae4e700aa17d52583ef.camel@linux.ibm.com>

Hi,

On 12.08.21 02:54, Mimi Zohar wrote:
> On Wed, 2021-08-11 at 10:16 -0700, Eric Biggers wrote:
> 
>> Neither of you actually answered my question, which is whether the support for
>> trusted keys in dm-crypt is a mistake.  I think you're saying that it is?  That
>> would imply that fscrypt shouldn't support trusted keys, but rather encrypted
>> keys -- which conflicts with Ahmad's patch which is adding support for trusted
>> keys.  Note that your reasoning for this is not documented at all in the
>> trusted-encrypted keys documentation; it needs to be (email threads don't really
>> matter), otherwise how would anyone know when/how to use this feature?
> 
> True, but all of the trusted-encrypted key examples in the
> documentation are "encrypted" type keys, encrypted/decrypted based on a
> "trusted" type key.  There are no examples of using the "trusted" key
> type directly.  Before claiming that adding "trusted" key support in
> dm-crypt was a mistake, we should ask Ahmad why he felt dm-crypt needed
> to directly support "trusted" type keys.

I wanted to persist the dm-crypt key as a sealed blob. With encrypted keys,
I would have to persist and unseal two blobs (load trusted key blob, load
encrypted key blob rooted to trusted key) with no extra benefit.

I thus added direct support for trusted keys. Jarkko even commented on the
thread, but didn't voice objection to the approach (or agreement for that
matter), so I assumed the approach is fine.

I can see the utility of using a single trusted key for TPMs, but for CAAM,
I see none and having an encrypted key for every trusted key just makes
it more cumbersome.

In v1 here, I added encrypted key support as well, but dropped it for v2,
because I am not in a position to justify its use. Now that you and Eric
discussed it, should I send v3 with support for both encrypted and trusted
keys like with dm-crypt or how should we proceed?

Cheers,
Ahmad

> 
> Mimi
> 
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH v2 1/1] NAX LSM: Add initial support support
From: THOBY Simon @ 2021-08-17  8:47 UTC (permalink / raw)
  To: Igor Zhbanov, linux-integrity; +Cc: linux-security-module, Mimi Zohar
In-Reply-To: <800619f4-e9eb-7659-33ec-7cf7950e7c8d@gmail.com>

Hi Igor,

First a side-note on the patch: you didn't send it to the same lists as the v1:
you only sent it to linux-integrity but not linux-security-module and you didn't
CC Mimi Zohar either. I added them in the loop (I guess it makes sense to submit a
LSM to the linux-security-module, and I assumed this was probably an
omission/manipulation error on you part).


On 8/16/21 8:45 PM, Igor Zhbanov wrote:
> NAX (No Anonymous Execution) is a Linux Security Module that extends DAC
> by making impossible to make anonymous and modified pages executable for
> privileged processes.
> 
> The module intercepts anonymous executable pages created with mmap() and
> mprotect() system calls.
> 
> This module will log violations (in non-quiet mode), and it can also
> block the action or kill the offending process, depending on the enabled
> settings.

Consider using imperative form here. Something like:
"Intercept anonymous executable pages created with the mmap() and
protect() system calls. Log any violation, and either block the action or
kill the offending process, depending on the enabled settings."

(Per Documentation/process/submitting-patches.rst:
«Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour.»)


> 
> See Documentation/admin-guide/LSM/NAX.rst.
> 
> Signed-off-by: Igor Zhbanov <izh1979@gmail.com>
> ---
>  Documentation/admin-guide/LSM/NAX.rst         |  65 +++
>  Documentation/admin-guide/LSM/index.rst       |   1 +
>  .../admin-guide/kernel-parameters.rst         |   1 +
>  .../admin-guide/kernel-parameters.txt         |  23 +
>  security/Kconfig                              |  11 +-
>  security/Makefile                             |   2 +
>  security/nax/Kconfig                          | 115 +++++
>  security/nax/Makefile                         |   4 +
>  security/nax/nax-lsm.c                        | 461 ++++++++++++++++++
>  9 files changed, 678 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/admin-guide/LSM/NAX.rst
>  create mode 100644 security/nax/Kconfig
>  create mode 100644 security/nax/Makefile
>  create mode 100644 security/nax/nax-lsm.c
> 
> diff --git a/Documentation/admin-guide/LSM/NAX.rst b/Documentation/admin-guide/LSM/NAX.rst
> new file mode 100644
> index 000000000000..35c1fc89b970
> --- /dev/null
> +++ b/Documentation/admin-guide/LSM/NAX.rst
> @@ -0,0 +1,65 @@
> +=======
> +NAX LSM
> +=======
> +
> +:Author: Igor Zhbanov
> +
> +NAX (No Anonymous Execution) is a Linux Security Module that extends DAC
> +by making impossible to make anonymous and modified pages executable for
> +processes. The module intercepts anonymous executable pages created with
> +mmap() and mprotect() system calls.
> +
> +To select it at boot time, specify ``security=nax`` (though this will
> +disable any other LSM).

Maybe just write it as "To select it at boot time, add 'nax' to the 'security' kernel parameter"?

> +
> +The following sysctl parameters are available:
> +
> +* ``kernel.nax.check_all``:
> + - 0: Check all processes.
> + - 1: Check only privileged processes. The privileged process is a process
> +      for which any of the following is true:
> +      - ``uid  == 0``
> +      - ``euid == 0``
> +      - ``suid == 0``
> +      - ``cap_effective`` has any capability except for the ones allowed
> +        in ``kernel.nax.allowed_caps``
> +      - ``cap_permitted`` has any capability except for the ones allowed
> +        in ``kernel.nax.allowed_caps``
> +
> + Checking of uid/euid/suid is important because a process may call seteuid(0)
> + to gain privileges (if SECURE_NO_SETUID_FIXUP secure bit is not set).
> +
> +* ``kernel.nax.allowed_caps``:
> +
> + Hexadecimal number representing the set of capabilities a non-root
> + process can possess without being considered "privileged" by NAX LSM.

Maybe you could give a concrete example to help users understand what
this is supposed to look like in practice and how to compute it.

> +
> +* ``kernel.nax.mode``:
> +
> + - 0: Only log errors (when enabled by ``kernel.nax.quiet``) (default mode)
> + - 1: Forbid unsafe pages mappings (and log when enabled)
> + - 2: Kill the violating process (and log when enabled)
> +
> +* ``kernel.nax.quiet``:
> +
> + - 0: Log violations (default)
> + - 1: Be quiet
> +
> +* ``kernel.nax.locked``:
> +
> + - 0: Changing of the module's sysctl parameters is allowed
> + - 1: Further changing of the module's sysctl parameters is forbidden
> +
> + Setting this parameter to ``1`` after initial setup during the system boot
> + will prevent the module disabling at the later time.
> +
> +There are matching kernel command-line parameters (with the same values):
> +
> +- ``nax_allowed_caps``
> +- ``nax_check_all``
> +- ``nax_mode``
> +- ``nax_quiet``
> +- ``nax_locked``
> +
> +The ``nax_locked`` command-line parameter must be specified last to avoid
> +premature setting locking.
> diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
> index a6ba95fbaa9f..e9df7fc9a461 100644
> --- a/Documentation/admin-guide/LSM/index.rst
> +++ b/Documentation/admin-guide/LSM/index.rst
> @@ -42,6 +42,7 @@ subdirectories.
>  
>     apparmor
>     LoadPin
> +   NAX
>     SELinux
>     Smack
>     tomoyo
> diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
> index 3996b54158bf..6d1a51612180 100644
> --- a/Documentation/admin-guide/kernel-parameters.rst
> +++ b/Documentation/admin-guide/kernel-parameters.rst
> @@ -131,6 +131,7 @@ parameter is applicable::
>  	MOUSE	Appropriate mouse support is enabled.
>  	MSI	Message Signaled Interrupts (PCI).
>  	MTD	MTD (Memory Technology Device) support is enabled.
> +	NAX	NAX support is enabled.
>  	NET	Appropriate network support is enabled.
>  	NUMA	NUMA support is enabled.
>  	NFS	Appropriate NFS support is enabled.
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 2991f6e692bd..a446d7bc05cc 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3062,6 +3062,29 @@
>  
>  	n2=		[NET] SDL Inc. RISCom/N2 synchronous serial card
>  
> +	nax_allowed_caps= [NAX] Hexadecimal number representing the set of
> +			capabilities a non-root process can possess without
> +			being considered "privileged" by NAX LSM.

Same remark as above: maybe an example would be helpful.

> +
> +	nax_check_all=	[NAX] NAX LSM processes checking mode:
> +			0 - Check only privileged processes (default).
> +			1 - Check all processes.
> +
> +	nax_locked=	[NAX] NAX LSM settings' locking mode:
> +			0 - Changing NAX sysctl parameters is allowed.
> +			1 - Changing NAX sysctl parameters is forbidded until

"forbidden"

> +			    reboot.
> +
> +	nax_mode=	[NAX] NAX LSM violation reaction mode:
> +			0 - Only log errors (when not in quiet mode; default).
> +			1 - Forbid unsafe pages mappings (and log when
> +			    enabled).
> +			2 - Kill the violating process (and log when enabled).
> +
> +	nax_quiet=	[NAX] NAX LSM log verbosity:
> +			0 - Log messages to syslog.
> +			1 - Be quiet.
> +
>  	netdev=		[NET] Network devices parameters
>  			Format: <irq>,<io>,<mem_start>,<mem_end>,<name>
>  			Note that mem_start is often overloaded to mean
> diff --git a/security/Kconfig b/security/Kconfig
> index 0ced7fd33e4d..771419647ae1 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -239,6 +239,7 @@ source "security/yama/Kconfig"
>  source "security/safesetid/Kconfig"
>  source "security/lockdown/Kconfig"
>  source "security/landlock/Kconfig"
> +source "security/nax/Kconfig"
>  
>  source "security/integrity/Kconfig"
>  
> @@ -278,11 +279,11 @@ endchoice
>  
>  config LSM
>  	string "Ordered list of enabled LSMs"
> -	default "landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
> -	default "landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
> -	default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
> -	default "landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
> -	default "landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
> +	default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
> +	default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
> +	default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
> +	default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
> +	default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
>  	help
>  	  A comma-separated list of LSMs, in initialization order.
>  	  Any LSMs left off this list will be ignored. This can be
> diff --git a/security/Makefile b/security/Makefile
> index 47e432900e24..5c261bbf4659 100644
> --- a/security/Makefile
> +++ b/security/Makefile
> @@ -14,6 +14,7 @@ subdir-$(CONFIG_SECURITY_SAFESETID)    += safesetid
>  subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM)	+= lockdown
>  subdir-$(CONFIG_BPF_LSM)		+= bpf
>  subdir-$(CONFIG_SECURITY_LANDLOCK)	+= landlock
> +subdir-$(CONFIG_SECURITY_NAX)		+= nax
>  
>  # always enable default capabilities
>  obj-y					+= commoncap.o
> @@ -34,6 +35,7 @@ obj-$(CONFIG_SECURITY_LOCKDOWN_LSM)	+= lockdown/
>  obj-$(CONFIG_CGROUPS)			+= device_cgroup.o
>  obj-$(CONFIG_BPF_LSM)			+= bpf/
>  obj-$(CONFIG_SECURITY_LANDLOCK)		+= landlock/
> +obj-$(CONFIG_SECURITY_NAX)		+= nax/
>  
>  # Object integrity file lists
>  subdir-$(CONFIG_INTEGRITY)		+= integrity
> diff --git a/security/nax/Kconfig b/security/nax/Kconfig
> new file mode 100644
> index 000000000000..a0d22f53a9c2
> --- /dev/null
> +++ b/security/nax/Kconfig
> @@ -0,0 +1,115 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config SECURITY_NAX
> +	bool "NAX support"
> +	depends on SECURITY
> +	default n
> +	help
> +	  This selects NAX (No Anonymous Execution), which extends DAC
> +	  support with additional system-wide security settings beyond
> +	  regular Linux discretionary access controls. Currently available
> +	  is restriction to make anonymous and modified pages executable.

The last sentence could probably be rephrased as "Currently, the only
available behavior is restricting the execution of anonymous and modified
pages".

> +
> +	  The module can restrict either privileged or all processes,
> +	  depending on the settings. It is possible to convigure action,
> +	  performed when the viloation is detected (log, log + block,

A couple of typos here: "configure" and "violation".

> +	  log + kill).
> +
> +	  Like capabilities, this security module stacks with other LSMs.

I'm not quite sure this information is needed, especially as I understand the
longterm goal is to have all LSMs support stacking. I think this notice
should only be added when LSM stacking is *not* supported by the considered
LSM.

> +
> +	  Further information can be found in
> +	  Documentation/admin-guide/LSM/NAX.rst.
> +
> +	  If you are unsure how to answer this question, answer N.
> +
> +choice
> +	prompt "NAX violation action mode"
> +	default SECURITY_NAX_MODE_LOG
> +	depends on SECURITY_NAX
> +	help
> +	  Select the NAX violation action mode.
> +
> +	  In the default permissive mode the violations are only logged
> +	  (if logging is not suppressed). In the enforcing mode the violations
> +	  are prohibited. And in the kill mode the process is terminated.
> +
> +	  The value can be overridden at boot time with the kernel command-line
> +	  parameter "nax_mode=" (0, 1, 2) or "kernel.nax.mode=" (0, 1, 2)
> +	  sysctl parameter (if the settings are not locked).
> +
> +	config SECURITY_NAX_MODE_LOG
> +		bool "Permissive mode"
> +		help
> +		  In this mode violations are only logged (if logging is not
> +		  suppressed by the "kernel.nax.quiet" parameter). The
> +		  violating system call will not be prohibited.
> +	config SECURITY_NAX_MODE_ENFORCING
> +		bool "Enforcing mode"
> +		help
> +		  In this mode violations are prohibited and logged (if
> +		  logging is not suppressed by the "kernel.nax.quiet"
> +		  parameter). The violating system call will return -EACCES
> +		  error.
> +	config SECURITY_NAX_MODE_KILL
> +		bool "Kill mode"
> +		help
> +		  In this mode the violating process is terminated on the
> +		  first violation system call. The violation event is logged
> +		  (if logging is not suppressed by the "kernel.nax.quiet"
> +		  parameter).
> +endchoice
> +
> +config SECURITY_NAX_MODE
> +        int
> +        depends on SECURITY_NAX
> +        default 0 if SECURITY_NAX_MODE_LOG
> +        default 1 if SECURITY_NAX_MODE_ENFORCING
> +        default 2 if SECURITY_NAX_MODE_KILL
> +
> +config SECURITY_NAX_CHECK_ALL
> +	bool "Check all processes"
> +	depends on SECURITY_NAX
> +	help
> +	  If selected, NAX will check all processes. If not selected, NAX
> +	  will check only privileged processes (which is determined either
> +	  by having zero uid, euid, suid or fsuid; or by possessing
> +	  capabilities outside of allowed set).
> +
> +	  The value can also be overridden at boot time with the kernel
> +	  command-line parameter "nax_check_all=" (0, 1) or
> +	  "kernel.nax_check_all=" (0, 1) sysctl parameter (if the settings
> +	  are not locked).
> +
> +config SECURITY_NAX_ALLOWED_CAPS
> +	hex "Process capabilities ignored by NAX"
> +	default 0x0
> +        range 0x0 0xffffffffffff
> +	depends on SECURITY_NAX
> +	help
> +	  Hexadecimal number representing the set of capabilities
> +	  a non-root process can possess without being considered
> +	  "privileged" by NAX LSM.
> +
> +	  The value can be overridden at boot time with the command-line
> +	  parameter "nax_allowed_caps=" or "kernel.nax.allowed_caps=" sysctl
> +	  parameter (if the settings are not locked).
> +
> +config SECURITY_NAX_QUIET
> +	bool "Silence NAX messages"
> +	depends on SECURITY_NAX
> +	help
> +	  If selected, NAX will not print violations.
> +
> +	  The value can be overridden at boot with the command-line
> +	  parameter "nax_quiet=" (0, 1) or "kernel.nax_quiet=" (0, 1) sysctl
> +	  parameter (if the settings are not locked).
> +
> +config SECURITY_NAX_LOCKED
> +	bool "Lock NAX settings"
> +	depends on SECURITY_NAX
> +	help
> +	  If selected, it will not be possible to change any NAX LSM
> +	  settings via sysctl or the kernel command line.

Maybe the imperative form here too would be clearer:
"Prevent any update to the settings of the NAX LSM. This applies to both sysctl writes and the
kernel command line."

> +
> +	  If not selected, it can be enabled at boot time with the kernel
> +	  command-line parameter "nax_locked=1" or "kernel.nax_locked=1"
> +	  sysctl parameter (if the settings are not locked).
> diff --git a/security/nax/Makefile b/security/nax/Makefile
> new file mode 100644
> index 000000000000..9c3372210c77
> --- /dev/null
> +++ b/security/nax/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_SECURITY_NAX) := nax.o
> +
> +nax-y := nax-lsm.o
> diff --git a/security/nax/nax-lsm.c b/security/nax/nax-lsm.c
> new file mode 100644
> index 000000000000..b8cd7b528ba2
> --- /dev/null
> +++ b/security/nax/nax-lsm.c
> @@ -0,0 +1,461 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2016-2021 Open Mobile Platform LLC.
> + *
> + * Written by: Igor Zhbanov <i.zhbanov@omp.ru, izh1979@gmail.com>
> + *
> + * NAX (No Anonymous Execution) Linux Security Module
> + * This module prevents execution of the code in anonymous or modified pages.
> + * For more details, see Documentation/admin-guide/LSM/NAX.rst and
> + * Documentation/admin-guide/kernel-parameters.rst
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2, as
> + * published by the Free Software Foundation.
> + */
> +
> +#define pr_fmt(fmt) "NAX: " fmt
> +
> +#include <linux/capability.h>
> +#include <linux/cred.h>
> +#include <linux/ctype.h>
> +#include <linux/lsm_hooks.h>
> +#include <linux/mman.h>
> +#include <linux/rcupdate.h>
> +#include <linux/sched.h>
> +#include <linux/securebits.h>
> +#include <linux/security.h>
> +#include <linux/sysctl.h>
> +#include <linux/uidgid.h>
> +
> +#define NAX_MODE_PERMISSIVE 0 /* Log only             */
> +#define NAX_MODE_ENFORCING  1 /* Enforce and log      */
> +#define NAX_MODE_KILL       2 /* Kill process and log */
> +
> +static int mode      = CONFIG_SECURITY_NAX_MODE,
> +	   quiet     = IS_ENABLED(CONFIG_SECURITY_NAX_QUIET),
> +	   locked    = IS_ENABLED(CONFIG_SECURITY_NAX_LOCKED),
> +	   check_all = IS_ENABLED(CONFIG_SECURITY_NAX_CHECK_ALL);
> +
> +#define ALLOWED_CAPS_HEX_LEN (_KERNEL_CAPABILITY_U32S * 8)
> +
> +static char allowed_caps_hex[ALLOWED_CAPS_HEX_LEN + 1];
> +static kernel_cap_t __rcu *allowed_caps;
> +
> +static bool
> +is_interesting_process(void)
> +{
> +	bool ret = false;
> +	const struct cred *cred;
> +	kuid_t root_uid;
> +	kernel_cap_t *caps;
> +
> +	if (check_all)
> +		return true;
> +
> +	cred = current_cred();
> +	root_uid = make_kuid(cred->user_ns, 0);
> +
> +	rcu_read_lock();
> +	caps = rcu_dereference(allowed_caps);
> +	/*
> +	 * We count a process as interesting if it any of its uid/euid/suid
> +	 * is zero (because it may call seteuid(0) to gain privileges) or
> +	 * it has any not allowed capability (even in a user namespace)
> +	 */
> +	if ((!issecure(SECURE_NO_SETUID_FIXUP) &&
> +	     (uid_eq(cred->uid,  root_uid) ||
> +	      uid_eq(cred->euid, root_uid) ||
> +	      uid_eq(cred->suid, root_uid))) ||
> +	    (!cap_issubset(cred->cap_effective, *caps)) ||
> +	    (!cap_issubset(cred->cap_permitted, *caps)))
> +		ret = true;
> +
> +	rcu_read_unlock();
> +	return ret;
> +}
> +
> +static void
> +log_warn(const char *reason)
> +{
> +	if (quiet)
> +		return;
> +
> +	pr_warn_ratelimited("%s: pid=%d, uid=%u, comm=\"%s\"\n",
> +			    reason, current->pid,
> +			    from_kuid(&init_user_ns, current_cred()->uid),
> +				      current->comm);
> +}
> +
> +static void
> +kill_current_task(void)
> +{
> +	pr_warn("Killing pid=%d, uid=%u, comm=\"%s\"\n",
> +		current->pid, from_kuid(&init_user_ns, current_cred()->uid),
> +		current->comm);
> +	force_sig(SIGKILL);
> +}
> +
> +static int
> +nax_mmap_file(struct file *file, unsigned long reqprot,
> +	      unsigned long prot, unsigned long flags)
> +{
> +	int ret = 0;
> +
> +	if (mode == NAX_MODE_PERMISSIVE && quiet)
> +		return 0; /* Skip further checks in this case */
> +
> +	if (!(prot & PROT_EXEC)) /* Not executable memory */
> +		return 0;
> +
> +	if (!is_interesting_process())
> +		return 0; /* Not interesting processes can do anything */
> +
> +	if (!file) { /* Anonymous executable memory */
> +		log_warn("MMAP_ANON_EXEC");
> +		ret = -EACCES;
> +	} else if (prot & PROT_WRITE) { /* Mapping file RWX */
> +		log_warn("MMAP_FILE_WRITE_EXEC");
> +		ret = -EACCES;
> +	}
> +
> +	if (ret && mode == NAX_MODE_KILL)
> +		kill_current_task();
> +
> +	return (mode != NAX_MODE_PERMISSIVE) ? ret : 0;
> +}
> +
> +static int
> +nax_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot,
> +		  unsigned long prot)
> +{
> +	int ret = 0;
> +
> +	if (mode == NAX_MODE_PERMISSIVE && quiet)
> +		return 0; /* Skip further checks in this case */
> +
> +	if (!(prot & PROT_EXEC)) /* Not executable memory */
> +		return 0;
> +
> +	if (!is_interesting_process())
> +		return 0; /* Not interesting processes can do anything */
> +
> +	if (!(vma->vm_flags & VM_EXEC)) {
> +		if (vma->vm_start >= vma->vm_mm->start_brk &&
> +		    vma->vm_end   <= vma->vm_mm->brk) {
> +			log_warn("MPROTECT_EXEC_HEAP");
> +			ret = -EACCES;
> +		} else if (!vma->vm_file &&
> +			   ((vma->vm_start <= vma->vm_mm->start_stack &&
> +			     vma->vm_end   >= vma->vm_mm->start_stack) ||
> +			    vma_is_stack_for_current(vma))) {
> +			log_warn("MPROTECT_EXEC_STACK");
> +			ret = -EACCES;
> +		} else if (vma->vm_file && vma->anon_vma) {
> +			/*
> +			 * We are making executable a file mapping that has
> +			 * had some COW done. Since pages might have been
> +			 * written, check ability to execute the possibly
> +			 * modified content. This typically should only
> +			 * occur for text relocations.
> +			 */
> +			log_warn("MPROTECT_EXEC_MODIFIED");
> +			ret = -EACCES;
> +		}
> +	}
> +
> +	if (!vma->vm_file) { /* Anonymous executable memory */
> +		log_warn("MPROTECT_ANON_EXEC");
> +		ret = -EACCES;
> +	} else if (prot & PROT_WRITE) { /* Remapping file as RWX */
> +		log_warn("MPROTECT_FILE_WRITE_EXEC");
> +		ret = -EACCES;
> +	}

Maybe this if/else clause should be wrapped in an "if (!ret) {}" block.
Because I currently fear there are cases where this would log two errors
for the same file, like MPROTECT_EXEC_HEAP and then MPROTECT_ANON_EXEC.

> +
> +	if (ret && mode == NAX_MODE_KILL)
> +		kill_current_task();
> +
> +	return (mode != NAX_MODE_PERMISSIVE) ? ret : 0;
> +}
> +
> +static struct security_hook_list nax_hooks[] __lsm_ro_after_init = {
> +	LSM_HOOK_INIT(mmap_file, nax_mmap_file),
> +	LSM_HOOK_INIT(file_mprotect, nax_file_mprotect),
> +};
> +
> +static void
> +update_allowed_caps(kernel_cap_t *caps)
> +{
> +	*caps = cap_intersect(*caps, CAP_FULL_SET); /* Drop unsupported */
> +	rcu_assign_pointer(allowed_caps, caps);
> +	synchronize_rcu();

I'm happy to see this work to fix possible concurrency issues, and it seems perfectly
fine in this regard for me.
One remaining issue is that I think this leaks memory: we keep on kmalloc()-ing
memory in parse_and_set_caps and we never kfree it.
I'm terrible at RCU, but I think something like that could work:

+	kernel_cap_t *old = rcu_access_pointer(allowed_caps);
+	*caps = cap_intersect(*caps, CAP_FULL_SET); /* Drop unsupported */
+	rcu_assign_pointer(allowed_caps, caps);
+	synchronize_rcu();
+	kfree(old);


> +}
> +
> +static int
> +set_default_allowed_caps(void)
> +{
> +	size_t i;
> +	kernel_cap_t *caps;
> +
> +	caps = kmalloc(sizeof(*caps), GFP_KERNEL);
> +	if (!caps)
> +		return -ENOMEM;
> +
> +	CAP_FOR_EACH_U32(i)
> +		caps->cap[i] = (CONFIG_CONFIG_SECURITY_NAX_ALLOWED_CAPS >> (i * 8)) &
> +			       0xff;
> +
> +	update_allowed_caps(caps);
> +	return 0;
> +}
> +
> +static int
> +parse_and_set_caps(char *str)
> +{
> +	size_t len, i;
> +	kernel_cap_t *caps;
> +
> +	/* len is guaranteed not to exceed ALLOWED_CAPS_HEX_LEN */
> +	len = strlen(str);
> +	for (i = 0; i < len; i++)
> +		if (!isxdigit(str[i]))
> +			return -EINVAL;
> +
> +	caps = kmalloc(sizeof(*caps), GFP_KERNEL);
> +	if (!caps)
> +		return -ENOMEM;
> +
> +	CAP_FOR_EACH_U32(i) {
> +		unsigned long l;
> +
> +		if (kstrtoul(str + (len >= 8 ? len - 8 : 0), 16, &l))
> +			return -EINVAL;
> +
> +		caps->cap[i] = l;
> +		if (len < 8)
> +			break;
> +
> +		len -= 8;
> +		str[len] = '\0';
> +	}
> +
> +	update_allowed_caps(caps);
> +	return 0;
> +}
> +
> +#ifdef CONFIG_SYSCTL
> +
> +static int
> +nax_dointvec_minmax(struct ctl_table *table, int write,
> +		    void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	if (write && (!capable(CAP_SYS_ADMIN) || locked))
> +		return -EPERM;
> +
> +	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +}
> +
> +static int
> +nax_dostring(struct ctl_table *table, int write, void *buffer,
> +	     size_t *lenp, loff_t *ppos)
> +{
> +	int ret;
> +
> +	if (write) { /* A user is setting the allowed capabilities */
> +		int error;
> +		char *buf = (char *)buffer;
> +		size_t len = *lenp;
> +
> +		if (!capable(CAP_SYS_ADMIN) || locked)
> +			return -EPERM;
> +
> +		/* Do not allow trailing garbage or excessive length */
> +		if (len == ALLOWED_CAPS_HEX_LEN + 1) {
> +			if (buf[--len] != '\n')
> +				return -EINVAL;
> +		} else if (len > ALLOWED_CAPS_HEX_LEN || len <= 0) {
> +			return -EINVAL;
> +		}
> +
> +		error = proc_dostring(table, write, buffer, lenp, ppos);
> +		if (error)
> +			return error;
> +
> +		ret = parse_and_set_caps(allowed_caps_hex);
> +	} else { /* A user is getting the allowed capabilities */
> +		unsigned int i;
> +		kernel_cap_t *caps;
> +
> +		rcu_read_lock();
> +		caps = rcu_dereference(allowed_caps);
> +		CAP_FOR_EACH_U32(i)
> +			snprintf(allowed_caps_hex + i * 8, 9, "%08x",
> +				 caps->cap[CAP_LAST_U32 - i]);
> +
> +		rcu_read_unlock();
> +		ret = proc_dostring(table, write, buffer, lenp, ppos);
> +	}
> +
> +	return ret;
> +}
> +
> +struct ctl_path nax_sysctl_path[] = {
> +	{ .procname = "kernel" },
> +	{ .procname = "nax"    },
> +	{ }
> +};
> +
> +static int max_mode = NAX_MODE_KILL;
> +
> +static struct ctl_table nax_sysctl_table[] = {
> +	{
> +		.procname     = "allowed_caps",
> +		.data         = allowed_caps_hex,
> +		.maxlen       = ALLOWED_CAPS_HEX_LEN + 1,
> +		.mode         = 0644,
> +		.proc_handler = nax_dostring,
> +	}, {
> +		.procname     = "check_all",
> +		.data         = &check_all,
> +		.maxlen       = sizeof(int),
> +		.mode         = 0644,
> +		.proc_handler = nax_dointvec_minmax,
> +		.extra1       = SYSCTL_ZERO,
> +		.extra2       = SYSCTL_ONE,
> +	}, {
> +		.procname     = "locked",
> +		.data         = &locked,
> +		.maxlen       = sizeof(int),
> +		.mode         = 0644,
> +		.proc_handler = nax_dointvec_minmax,
> +		.extra1       = SYSCTL_ZERO,
> +		.extra2       = SYSCTL_ONE,
> +	}, {
> +		.procname     = "mode",
> +		.data         = &mode,
> +		.maxlen       = sizeof(int),
> +		.mode         = 0644,
> +		.proc_handler = nax_dointvec_minmax,
> +		.extra1       = SYSCTL_ZERO,
> +		.extra2       = &max_mode,
> +	}, {
> +		.procname     = "quiet",
> +		.data         = &quiet,
> +		.maxlen       = sizeof(int),
> +		.mode         = 0644,
> +		.proc_handler = nax_dointvec_minmax,
> +		.extra1       = SYSCTL_ZERO,
> +		.extra2       = SYSCTL_ONE,
> +	},
> +	{ }
> +};
> +
> +static void __init
> +nax_init_sysctl(void)
> +{
> +	if (!register_sysctl_paths(nax_sysctl_path, nax_sysctl_table))
> +		panic("NAX: sysctl registration failed.\n");
> +}
> +
> +#else /* !CONFIG_SYSCTL */
> +
> +static inline void
> +nax_init_sysctl(void)
> +{
> +
> +}
> +
> +#endif /* !CONFIG_SYSCTL */
> +
> +static int __init setup_allowed_caps(char *str)
> +{
> +	if (locked)
> +		return 1;
> +
> +	/* Do not allow trailing garbage or excessive length */
> +	if (strlen(str) > ALLOWED_CAPS_HEX_LEN) {
> +		pr_err("Invalid 'nax_allowed_caps' parameter value (%s)\n",
> +		       str);
> +		return 1;
> +	}
> +
> +	strscpy(allowed_caps_hex, str, sizeof(allowed_caps_hex));
> +	if (parse_and_set_caps(allowed_caps_hex))
> +		pr_err("Invalid 'nax_allowed_caps' parameter value (%s)\n",
> +		       str);
> +
> +	return 1;
> +}
> +__setup("nax_allowed_caps=", setup_allowed_caps);
> +
> +static int __init setup_check_all(char *str)
> +{
> +	unsigned long val;
> +
> +	if (!locked && !kstrtoul(str, 0, &val))
> +		check_all = val ? 1 : 0;
> +
> +	return 1;
> +}
> +__setup("nax_quiet=", setup_check_all);
> +
> +static int __init setup_locked(char *str)
> +{
> +	unsigned long val;
> +
> +	if (!locked && !kstrtoul(str, 0, &val))
> +		locked = val ? 1 : 0;
> +
> +	return 1;
> +}
> +__setup("nax_locked=", setup_locked);
> +
> +static int __init setup_mode(char *str)
> +{
> +	unsigned long val;
> +
> +	if (!locked && !kstrtoul(str, 0, &val)) {
> +		if (val > max_mode) {
> +			pr_err("Invalid 'nax_mode' parameter value (%s)\n",
> +			       str);
> +			val = max_mode;
> +		}
> +
> +		mode = val;
> +	}
> +
> +	return 1;
> +}
> +__setup("nax_mode=", setup_mode);
> +
> +static int __init setup_quiet(char *str)
> +{
> +	unsigned long val;
> +
> +	if (!locked && !kstrtoul(str, 0, &val))
> +		quiet = val ? 1 : 0;
> +
> +	return 1;
> +}
> +__setup("nax_quiet=", setup_quiet);
> +
> +static __init int
> +nax_init(void)
> +{
> +	int rc;
> +
> +	pr_info("Starting.\n");
> +	rc = set_default_allowed_caps();
> +	if (rc < 0)
> +		return rc;
> +
> +	security_add_hooks(nax_hooks, ARRAY_SIZE(nax_hooks), "nax");
> +	nax_init_sysctl();
> +
> +	return 0;
> +}
> +
> +DEFINE_LSM(nax) = {
> +	.name = "nax",
> +	.init = nax_init,
> +};
> 

Thanks,
Simon

^ permalink raw reply

* Re: [PATCH v28 22/25] Audit: Add record for multiple process LSM attributes
From: Paul Moore @ 2021-08-16 18:57 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: casey.schaufler, James Morris, linux-security-module, selinux,
	linux-audit, keescook, john.johansen, penguin-kernel,
	Stephen Smalley, linux-kernel, netdev
In-Reply-To: <3ebad75f-1887-bb31-db23-353bfc9c0b4a@schaufler-ca.com>

On Fri, Aug 13, 2021 at 5:47 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 8/13/2021 1:43 PM, Paul Moore wrote:
> > On Fri, Aug 13, 2021 at 2:48 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >> On 8/13/2021 8:31 AM, Paul Moore wrote:
> >>> On Thu, Aug 12, 2021 at 6:38 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>>> On 8/12/2021 1:59 PM, Paul Moore wrote:
> >>>>> On Wed, Jul 21, 2021 at 9:12 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>>>>> Create a new audit record type to contain the subject information
> >>>>>> when there are multiple security modules that require such data.
> >>> ...
> >>>
> >>>>> The local
> >>>>> audit context is a hack that is made necessary by the fact that we
> >>>>> have to audit things which happen outside the scope of an executing
> >>>>> task, e.g. the netfilter audit hooks, it should *never* be used when
> >>>>> there is a valid task_struct.
> >>>> In the existing audit code a "current context" is only needed for
> >>>> syscall events, so that's the only case where it's allocated. Would
> >>>> you suggest that I track down the non-syscall events that include
> >>>> subj= fields and add allocate a "current context" for them? I looked
> >>>> into doing that, and it wouldn't be simple.
> >>> This is why the "local context" was created.  Prior to these stacking
> >>> additions, and the audit container ID work, we never needed to group
> >>> multiple audit records outside of a syscall context into a single
> >>> audit event so passing a NULL context into audit_log_start() was
> >>> reasonable.  The local context was designed as a way to generate a
> >>> context for use in a local function scope to group multiple records,
> >>> however, for reasons I'll get to below I'm now wondering if the local
> >>> context approach is really workable ...
> >> I haven't found a place where it didn't work. What is the concern?
> > The concern is that use of a local context can destroy any hopes of
> > linking with other related records, e.g. SYSCALL and PATH records, to
> > form a single cohesive event.  If the current task_struct is valid for
> > a given function invocation then we *really* should be using current's
> > audit_context.
> >
> > However, based on our discussion here it would seem that we may have
> > some issues where current->audit_context is not being managed
> > correctly.  I'm not surprised, but I will admit to being disappointed.
>
> I'd believe that with syscall audit being a special case for other reasons
> the multiple record situation got taken care of on a case-by-case basis
> and no one really paid much attention to generality. It's understandable.
>
> >>> What does your audit config look like?  Both the kernel command line
> >>> and the output of 'auditctl -l' would be helpful.
> >> On the fedora system:
> >>
> >> BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-rc5stack+
> >> root=/dev/mapper/fedora-root ro resume=/dev/mapper/fedora-swap
> >> rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap lsm.debug
> >>
> >> -a always,exit -F arch=b64 -S bpf -F key=testsuite-1628714321-EtlWIphW
> >>
> >> On the Ubuntu system:
> >>
> >> BOOT_IMAGE=/boot/vmlinuz-5.14.0-rc1stack+
> >> root=UUID=39c25777-d413-4c2e-948c-dfa2bf259049 ro lsm.debug
> >>
> >> No rules
> > The Fedora system looks to have some audit-testsuite leftovers, but
> > that shouldn't have an impact on what we are discussing; in both cases
> > I would expect current->audit_context to be allocated and non-NULL.
>
> As would I.
>
>
> >>> I'm beginning to suspect that you have the default
> >>> we-build-audit-into-the-kernel-because-product-management-said-we-have-to-but-we-don't-actually-enable-it-at-runtime
> >>> audit configuration that is de rigueur for many distros these days.
> >> Yes, but I've also fiddled about with it so as to get better event coverage.
> >> I've run the audit-testsuite, which has got to fiddle about with the audit
> >> configuration.
> > Yes, it looks like my hunch was wrong.
> >
> >>> If that is the case, there are many cases where you would not see a
> >>> NULL current->audit_context simply because the config never allocated
> >>> one, see kernel/auditsc.c:audit_alloc().
> >> I assume you mean that I *would* see a NULL current->audit_context
> >> in the "event not enabled" case.
> > Yep, typo.
> >
> >>> Regardless, assuming that is the case we probably need to find an
> >>> alternative to the local context approach as it currently works.  For
> >>> reasons we already talked about, we don't want to use a local
> >>> audit_context if there is the possibility for a proper
> >>> current->audit_context, but we need to do *something* so that we can
> >>> group these multiple events into a single record.
> >> I tried a couple things, but neither was satisfactory.
> >>
> >>> Since this is just occurring to me now I need a bit more time to think
> >>> on possible solutions - all good ideas are welcome - but the first
> >>> thing that pops into my head is that we need to augment
> >>> audit_log_end() to potentially generated additional, associated
> >>> records similar to what we do on syscall exit in audit_log_exit().
> >> I looked into that. You need a place to save the timestamp
> >> that doesn't disappear. That's the role the audit_context plays
> >> now.
> > Yes, I've spent a few hours staring at the poorly planned struct that
> > is audit_context ;)
> >
> > Regardless, the obvious place for such a thing is audit_buffer; we can
> > stash whatever we need in there.
>
> I had considered doing that, but was afraid that moving the timestamp
> out of the audit_context might have dire consequences.

Don't move, copy.  If there is a valid context one would stash the
timestamp there, if not, we stash it in the audit_buffer.  However,
before we start messing with that too much I would like to better
understand why we aren't seeing a valid audit_context in the netlink
case at the very least.

> >>>  Of
> >>> course the audit_log_end() changes would be much more limited than
> >>> audit_log_exit(), just the LSM subject and audit container ID info,
> >>> and even then we might want to limit that to cases where the ab->ctx
> >>> value is NULL and let audit_log_exit() handle it otherwise.  We may
> >>> need to store the event type in the audit_buffer during
> >>> audit_log_start() so that we can later use that in audit_log_end() to
> >>> determine what additional records are needed.
> >>>
> >>> Regardless, let's figure out why all your current->audit_context
> >>> values are NULL
> >> That's what's maddening, and why I implemented audit_alloc_for_lsm().
> >> They aren't all NULL. Sometimes current->audit_context is NULL,
> >> sometimes it isn't, for the same event. I thought it might be a
> >> question of the netlink interface being treated specially, but
> >> that doesn't explain all the cases.
> >
> > Your netlink changes are exactly what made me think, "this is
> > obviously wrong", but now I'm wondering if a previously held
> > assumption of "current is valid and points to the calling process" in
> > the case of the kernel servicing netlink messages sent from userspace.
>
> If that's the case the subject data in the audit record is going
> to be bogus. From what I've seen that data appears to be correct.

Yeah, the thought occurred to me, but we are clearly already in the
maybe-the-assumptions-are-wrong stage so I'm not going to rely on that
being 100%.  We definitely need to track this down before we start
making to many more guesses about what is working and what is not.

> > Or rather, perhaps that assumption is still true but something is
> > causing current->audit_context to be NULL in that case.
>
> I can imagine someone deciding not to set up audit_context in
> situations like netlink because they knew that nothing following
> that would be a syscall event.

*If* the user/kernel transition happens as part of the netlink socket
send/write/etc. syscall then it *should* have all of the audit syscall
setup already done ... however, see my earlier comments about
assumptions :/

> I've been looking into the audit
> userspace and there are assumptions like that all over the place.

I've made my feelings about audit's design known quite a bit already
so I'm not going to drag all of that back up, all I'll say is that I
believe the audit design was tragically and inherently flawed in many
ways.  We've been working to resolve some of those issues in the
kernel for a little while now, but the audit userspace remains rooted
in some of those original design decisions.  Of course, these are just
my opinions, others clearly feel differently.

Regardless, and somewhat independent of our discussion here, now that
I am back to being able to dedicate a good chunk of my time to
upstream efforts, one of my priorities is to start repairing audit ...
however, I need to get past the io_uring mess first.

> > Friday the 13th indeed.
>
> I've been banging my head against this for a couple months.
> My biggest fear is that I may have learned enough about the
> audit system to make useful contributions.

As the usual refrain goes, "patches are always welcome" ... and I say
that with equal parts honesty and warning :)

Even if you aren't comfortable putting together a patch, simply
root-causing the missing audit_context setup in the netlink case would
be helpful.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH 3/3] virt: Add sev_secret module to expose confidential computing secrets
From: Ard Biesheuvel @ 2021-08-16  9:56 UTC (permalink / raw)
  To: Andrew Scull
  Cc: Dov Murik, linux-efi, Borislav Petkov, Ashish Kalra,
	Brijesh Singh, Tom Lendacky, James Morris, Serge E. Hallyn,
	Andi Kleen, Dr. David Alan Gilbert, James Bottomley,
	Tobin Feldman-Fitzthum, Jim Cadden, linux-coco,
	linux-security-module, Linux Kernel Mailing List
In-Reply-To: <YRZuIIVIzMfgjtEl@google.com>

On Fri, 13 Aug 2021 at 15:05, Andrew Scull <ascull@google.com> wrote:
>
> On Mon, Aug 09, 2021 at 07:01:57PM +0000, Dov Murik wrote:
> > The new sev_secret module exposes the confidential computing (coco)
> > secret area via securityfs interface.
> >
> > When the module is loaded (and securityfs is mounted, typically under
> > /sys/kernel/security), a "coco/sev_secret" directory is created in
> > securityfs.  In it, a file is created for each secret entry.  The name
> > of each such file is the GUID of the secret entry, and its content is
> > the secret data.
> >
> > This allows applications running in a confidential computing setting to
> > read secrets provided by the guest owner via a secure secret injection
> > mechanism (such as AMD SEV's LAUNCH_SECRET command).
> >
> > Removing (unlinking) files in the "coco/sev_secret" directory will zero
> > out the secret in memory, and remove the filesystem entry.  If the
> > module is removed and loaded again, that secret will not appear in the
> > filesystem.
>
> We've also been looking into a similar secret mechanism recently in the
> context of Android and protected KVM [1]. Our secrets would come from a
> different source, likely described as a reserved-memory node in the DT,
> but would need to be exposed to userspace in the same way as the SEV
> secrets. Originally I tried using a character device, but this approach
> with securityfs feels neater to me.
>

Agreed. I particularly like how deleting the file wipes the secret from memory.

> We're also looking to pass secrets from the bootloader to Linux, outside
> of any virtualization or confidential compute context (at least a far as
> I have understood the meaning of the term). Again, this feels like it
> would be exposed to userspace in the same way.
>

Indeed.

> It would be good to be able to share the parts that would be common. I
> expect that would mean the operations for a secret file and for a
> directory of secrets at a minimum. But it might also influence the paths
> in securityfs; I see, looking back, that the "coco" directory was added
> since the RFC but would a generalized "secret" subsystem make sense? Or
> would it be preferable for each case to define their own path?
>

I think we should avoid 'secret', to be honest. Even if protected KVM
is not riding the SEV/TDX wave, I think confidential computing is
still an accurate description of its semantics.

> [1] -- https://lwn.net/Articles/836693/
>
> > +static int sev_secret_unlink(struct inode *dir, struct dentry *dentry)
> > +{
> > +     struct sev_secret *s = sev_secret_get();
> > +     struct inode *inode = d_inode(dentry);
> > +     struct secret_entry *e = (struct secret_entry *)inode->i_private;
> > +     int i;
> > +
> > +     if (e) {
> > +             /* Zero out the secret data */
> > +             memzero_explicit(e->data, secret_entry_data_len(e));
>
> Would there be a benefit in flushing these zeros?
>

Do you mean cache clean+invalidate? Better to be precise here.


> > +             e->guid = NULL_GUID;
> > +     }
> > +
> > +     inode->i_private = NULL;
> > +
> > +     for (i = 0; i < SEV_SECRET_NUM_FILES; i++)
> > +             if (s->fs_files[i] == dentry)
> > +                     s->fs_files[i] = NULL;
> > +
> > +     /*
> > +      * securityfs_remove tries to lock the directory's inode, but we reach
> > +      * the unlink callback when it's already locked
> > +      */
> > +     inode_unlock(dir);
> > +     securityfs_remove(dentry);
> > +     inode_lock(dir);
> > +
> > +     return 0;
> > +}

^ permalink raw reply

* Re: [PATCH 1/1] NAX LSM: Add initial support support
From: THOBY Simon @ 2021-08-16  7:39 UTC (permalink / raw)
  To: Igor Zhbanov
  Cc: Igor Zhbanov, linux-integrity, linux-security-module, Mimi Zohar
In-Reply-To: <CAEUiM9N-ELgr2bPvn=5P4NR8wQAKwkZ6tGFZAE8wjH16sWPZVg@mail.gmail.com>

Hi Igor,

On 8/14/21 3:39 PM, Igor Zhbanov wrote:
> Hi Simon,
> 
> пт, 13 авг. 2021 г. в 11:08, THOBY Simon <Simon.THOBY@viveris.fr>:
>> For the matter of have a kernel commandline being the result of concatenations from multiple
>> sources, I think that if any attacker is able to alter part of the command line, they can
>> already write 'lsm=' to it and completely disable NAX, thus I'm not sure 'nax_locked' should
>> impact other setup_* functions.
>>
>> I believe keeping the nax_locked parameter, but not checking for the 'locked' status in the other setup_*
>> functions should be enought, as sysctls writes will still be protected by the 'locked' variable.
> 
> I thought again about it. Currently it is possible to set parameters
> value in Kconfig, including "locked".
> So, if one needed some static configuration, that cannot be altered by
> any means, they can set
> the desired values at compilation time in Kconfig and it will be
> impossible to change it, nor by sysctl,
> nor by command-line.
> 
> But if I remove that (!locked) check, then the command-line options
> would alway be able to override
> the compile-time configuration, including unlocking the locked state.

That's a fair point, one way would probably be to replace "!locked" by
"!IS_ENABLED(CONFIG_SECURITY_NAX_LOCKED)" in the setup_*() functions, keeping
the compile-time override while preventing the commandline parameter ordering
issue we discussed.

However at this point I understand that you may find the current 'locked' usage easier,
and this whole discussion is probably nitpicking on my part.

> 
> Thank you.
> 

Thanks,
Simon

^ permalink raw reply

* Re: [PATCH 1/1] NAX LSM: Add initial support support
From: THOBY Simon @ 2021-08-16  7:31 UTC (permalink / raw)
  To: Igor Zhbanov
  Cc: Igor Zhbanov, linux-integrity, linux-security-module, Mimi Zohar
In-Reply-To: <CAEUiM9NnimMa2V2xN80kMox9gyOtUqhtPV1OPK3Ea83+JkxPpQ@mail.gmail.com>

Hi Igor,

On 8/13/21 10:10 PM, Igor Zhbanov wrote:
> Hi Simon,
> 
> ср, 28 июл. 2021 г. в 13:19, THOBY Simon <Simon.THOBY@viveris.fr>:
>> The kernel style guide mandates that all variables are only defined at the beggining of the
>> function, and not at the beggining of any block like C89.
> 
> I've checked the kernel coding style and couldn't find anything about
> variables definition
> place. Could you, please, point me to the requirement?

After trying (without results) to find it in the kernel coding style, it seems I have
accepted a maintainer preference as the official style guide without checking:
https://lore.kernel.org/linux-integrity/bd785a9d0c029cec34514d306e110bdef945c13e.camel@linux.ibm.com/
(see "Move the variable definition to the beginning of the function.")

I'm sorry about that, and for the time you may have lost trying to locate it in the style guide, which
doesn't say (as far as I could see) anything about variable declarations.

> 
> Scoping variables declaration makes the code better and more secure.

Not arguing, I agree with you on the benefits for maintainability and readability of
keeping variable declaration and use localized (which is also why I think
'allowed_caps_hex' would be better as a local variable).

> Besides, I've checked the kernel sources and see many examples of that.
> 
> Thank you.
> 

Sorry again,
Simon

^ permalink raw reply

* Re: apparmor: global buffers spin lock may get contended
From: John Johansen @ 2021-08-15  9:47 UTC (permalink / raw)
  To: Sergey Senozhatsky, Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, Tomasz Figa, linux-kernel, linux-security-module
In-Reply-To: <YO2S+C7Cw7AS7bsg@google.com>

On 7/13/21 6:19 AM, Sergey Senozhatsky wrote:
> Hi,
> 
> We've notices that apparmor has switched from using per-CPU buffer pool
> and per-CPU spin_lock to a global spin_lock in df323337e507a0009d3db1ea.
> 
> This seems to be causing some contention on our build machines (with
> quite a bit of cores). Because that global spin lock is a part of the
> stat() sys call (and perhaps some other)
> 
> E.g.
> 
> -    9.29%     0.00%  clang++          [kernel.vmlinux]                        
>    - 9.28% entry_SYSCALL_64_after_hwframe                                      
>       - 8.98% do_syscall_64                                                    
>          - 7.43% __do_sys_newlstat                                            
>             - 7.43% vfs_statx                                                  
>                - 7.18% security_inode_getattr                                  
>                   - 7.15% apparmor_inode_getattr                              
>                      - aa_path_perm                                            
>                         - 3.53% aa_get_buffer                                  
>                            - 3.47% _raw_spin_lock                              
>                                 3.44% native_queued_spin_lock_slowpath        
>                         - 3.49% aa_put_buffer.part.0                          
>                            - 3.45% _raw_spin_lock                              
>                                 3.43% native_queued_spin_lock_slowpath   
> 
> Can we fix this contention?
> 

sorry this got filtered to a wrong mailbox. Yes this is something that can
be improved, and was a concern when the switch was made from per-CPU buffers
to the global pool.

We can look into doing a hybrid approach where we can per cpu cache a buffer
from the global pool. The trick will be coming up with when the cached buffer
can be returned so we don't run into the problems that lead to
df323337e507a0009d3db1ea

^ permalink raw reply

* Re: [PATCH 1/1] NAX LSM: Add initial support support
From: Igor Zhbanov @ 2021-08-14 13:39 UTC (permalink / raw)
  To: THOBY Simon
  Cc: Igor Zhbanov, linux-integrity, linux-security-module, Mimi Zohar
In-Reply-To: <a41a0116-8d05-3aea-d979-090cdbb1d52f@viveris.fr>

Hi Simon,

пт, 13 авг. 2021 г. в 11:08, THOBY Simon <Simon.THOBY@viveris.fr>:
> For the matter of have a kernel commandline being the result of concatenations from multiple
> sources, I think that if any attacker is able to alter part of the command line, they can
> already write 'lsm=' to it and completely disable NAX, thus I'm not sure 'nax_locked' should
> impact other setup_* functions.
>
> I believe keeping the nax_locked parameter, but not checking for the 'locked' status in the other setup_*
> functions should be enought, as sysctls writes will still be protected by the 'locked' variable.

I thought again about it. Currently it is possible to set parameters
value in Kconfig, including "locked".
So, if one needed some static configuration, that cannot be altered by
any means, they can set
the desired values at compilation time in Kconfig and it will be
impossible to change it, nor by sysctl,
nor by command-line.

But if I remove that (!locked) check, then the command-line options
would alway be able to override
the compile-time configuration, including unlocking the locked state.

Thank you.

^ permalink raw reply

* Re: [PATCH 0/1] ima: check control characters in policy path
From: James Bottomley @ 2021-08-14 12:47 UTC (permalink / raw)
  To: Tianxing Zhang, zohar
  Cc: linux-integrity, linux-security-module, linux-kernel
In-Reply-To: <20210814081356.293-1-anakinzhang96@gmail.com>

On Sat, 2021-08-14 at 16:13 +0800, Tianxing Zhang wrote:
> Hi,
> 
> IMA policy can be updated with /sys/kernel/security/ima/policy
> interface when CONFIG_IMA_WRITE_POLICY is set. However, kernel does
> not check the file path carefully. It only checks if the path has '/'
> prefix.
> 
> When a policy file path contains control characters like '\r' or
> '\b', invalid error messages can be printed to overwrite system
> messages.

This doesn't sound like a good idea: filesystems accept control
characters in names, so the IMA file policy has to be able to specify
them.  We can debate whether filesystems should do this, but while they
do IMA has to as well.

> For example:
> 
> $ echo -e "/\rtest invalid path: ddddddddddddddddddddd" >
> /sys/kernel/security/ima/policy
> $ dmesg
> test invalid path: ddddddddddddddddddddd (-2) 
> 
> After adding this patch, we'll be able to throw out error message:
> 
> $ echo -e "/\rtest invalid path: ddddddddddddddddddddd" >
> /sys/kernel/security/ima/policy
> -bash: echo: write error: Invalid argument
> $ dmesg
> [   11.684004] ima: invalid path (control characters are not allowed)
> [   11.684071] ima: policy update failed
> 
> Any suggestions would be appreciated, thank you.

I don't quite understand what you think the problem is.  Only root can
write IMA policies so no-one other than a legitimate administrator can
use bogus paths like the above.  If the problem is producing a bogus
log message, we do have several IMA messages that print out
measured/appraised file names ... they would be vulnerable to this
since a generic user could have created them with control character
containg file names, and your proposed patch wouldn't fix that.

Wouldn't a better solution be to have a file name print that expands
the unprintable characters?

James

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox