Linux Security Modules development
 help / color / mirror / Atom feed
* Re: [PATCH v7 3/7] exec: Move path_noexec() check earlier
From: Mickaël Salaün @ 2020-08-13 15:31 UTC (permalink / raw)
  To: Eric W. Biederman, Kees Cook
  Cc: linux-kernel, Aleksa Sarai, Alexei Starovoitov, Al Viro,
	Andrew Morton, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk, Mimi Zohar,
	Philippe Trébuchet, Scott Shell, Sean Christopherson,
	Shuah Khan, Steve Dower, Steve Grubb, Tetsuo Handa,
	Thibaut Sautereau, Vincent Strubel, kernel-hardening, linux-api,
	linux-integrity, linux-security-module, linux-fsdevel
In-Reply-To: <87a6z1m0u1.fsf@x220.int.ebiederm.org>

Kees Cook wrote this patch, which is in Andrew Morton's tree, but I
think you're talking about O_MAYEXEC, not this patch specifically.

On 11/08/2020 21:36, Eric W. Biederman wrote:
> Mickaël Salaün <mic@digikod.net> writes:
> 
>> From: Kees Cook <keescook@chromium.org>
>>
>> The path_noexec() check, like the regular file check, was happening too
>> late, letting LSMs see impossible execve()s. Check it earlier as well
>> in may_open() and collect the redundant fs/exec.c path_noexec() test
>> under the same robustness comment as the S_ISREG() check.
>>
>> My notes on the call path, and related arguments, checks, etc:
> 
> A big question arises, that I think someone already asked.

Al Viro and Jann Horn expressed such concerns for O_MAYEXEC:
https://lore.kernel.org/lkml/0cc94c91-afd3-27cd-b831-8ea16ca8ca93@digikod.net/

> 
> Why perform this test in may_open directly instead of moving
> it into inode_permission.  That way the code can be shared with
> faccessat, and any other code path that wants it?

This patch is just a refactoring.

About O_MAYEXEC, path-based LSM, IMA and IPE need to work on a struct
file, whereas inode_permission() only gives a struct inode. However,
faccessat2(2) (with extended flags) seems to be the perfect candidate if
we want to be able to check file descriptors.

> 
> That would look to provide a more maintainable kernel.

Why would it be more maintainable?

> 
> Eric
> 
> 
>> do_open_execat()
>>     struct open_flags open_exec_flags = {
>>         .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
>>         .acc_mode = MAY_EXEC,
>>         ...
>>     do_filp_open(dfd, filename, open_flags)
>>         path_openat(nameidata, open_flags, flags)
>>             file = alloc_empty_file(open_flags, current_cred());
>>             do_open(nameidata, file, open_flags)
>>                 may_open(path, acc_mode, open_flag)
>>                     /* new location of MAY_EXEC vs path_noexec() test */
>>                     inode_permission(inode, MAY_OPEN | acc_mode)
>>                         security_inode_permission(inode, acc_mode)
>>                 vfs_open(path, file)
>>                     do_dentry_open(file, path->dentry->d_inode, open)
>>                         security_file_open(f)
>>                         open()
>>     /* old location of path_noexec() test */
>>
>> Signed-off-by: Mickaël Salaün <mic@digikod.net>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> Link: https://lore.kernel.org/r/20200605160013.3954297-4-keescook@chromium.org
>> ---
>>  fs/exec.c  | 12 ++++--------
>>  fs/namei.c |  4 ++++
>>  2 files changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index bdc6a6eb5dce..4eea20c27b01 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -147,10 +147,8 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
>>  	 * and check again at the very end too.
>>  	 */
>>  	error = -EACCES;
>> -	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode)))
>> -		goto exit;
>> -
>> -	if (path_noexec(&file->f_path))
>> +	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) ||
>> +			 path_noexec(&file->f_path)))
>>  		goto exit;
>>  
>>  	fsnotify_open(file);
>> @@ -897,10 +895,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
>>  	 * and check again at the very end too.
>>  	 */
>>  	err = -EACCES;
>> -	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode)))
>> -		goto exit;
>> -
>> -	if (path_noexec(&file->f_path))
>> +	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) ||
>> +			 path_noexec(&file->f_path)))
>>  		goto exit;
>>  
>>  	err = deny_write_access(file);
>> diff --git a/fs/namei.c b/fs/namei.c
>> index a559ad943970..ddc9b25540fe 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -2863,6 +2863,10 @@ static int may_open(const struct path *path, int acc_mode, int flag)
>>  			return -EACCES;
>>  		flag &= ~O_TRUNC;
>>  		break;
>> +	case S_IFREG:
>> +		if ((acc_mode & MAY_EXEC) && path_noexec(path))
>> +			return -EACCES;
>> +		break;
>>  	}
>>  
>>  	error = inode_permission(inode, MAY_OPEN | acc_mode);

^ permalink raw reply

* Re: [PATCH 0/3] Smack: Use the netlbl incoming cache
From: Casey Schaufler @ 2020-08-13 16:36 UTC (permalink / raw)
  To: Paul Moore; +Cc: SMACK-discuss, Casey Schaufler, linux-security-module
In-Reply-To: <CAHC9VhTq66h1LsNBuhXG6WFiKn7hCBApciG6sVawW=jDwqDDWg@mail.gmail.com>

On 8/11/2020 7:10 PM, Paul Moore wrote:
> On Tue, Aug 11, 2020 at 8:39 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> Update the Smack security module to use the Netlabel cache
>> mechanism to speed the processing of incoming labeled packets.
>> There is some refactoring of the existing code that makes it
>> simpler, and reduces duplication. The outbound packet labeling
>> is also optimized to track the labeling state of the socket.
>> Prior to this the socket label was redundantly set on each
>> packet send.
>>
>> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
>> ---
>>  security/smack/smack.h        |  19 ++--
>>  security/smack/smack_access.c |  55 ++++++----
>>  security/smack/smack_lsm.c    | 245 ++++++++++++++++++++++++------------------
>>  security/smack/smackfs.c      |  23 ++--
>>  4 files changed, 193 insertions(+), 149 deletions(-)
> FWIW, I gave this a cursory look just now and the NetLabel usage
> seemed reasonable.  Out of curiosity, have you done any before/after
> performance tests?

It's early in the benchmark process, but TCP looks to be about 6% better.
UDP numbers should be better. I'm not expecting the level of improvement
SELinux saw because the label mapping from CIPSO isn't as sophisticated
for Smack as it is for SELinux.

>   It was quite significant when we adopted it in
> SELinux, but that was some time ago, it would be nice to know that it
> is still working well and hasn't been invalidated by some other,
> unrelated change.
>

^ permalink raw reply

* [PATCH 0/2] SELinux: Measure state and hash of policy using IMA
From: Lakshmi Ramasubramanian @ 2020-08-13 17:07 UTC (permalink / raw)
  To: zohar, stephen.smalley.work, casey
  Cc: tyhicks, tusharsu, sashal, jmorris, linux-integrity, selinux,
	linux-security-module, linux-kernel

Critical data structures of security modules are currently not measured.
Therefore an attestation service, for instance, would not be able to
attest whether the security modules are always operating with the policies
and configuration that the system administrator had setup. The policies
and configuration for the security modules could be tampered with by
malware by exploiting kernel vulnerabilities or modified through some
inadvertent actions on the system. Measuring such critical data would
enable an attestation service to better assess the state of the system.

IMA subsystem measures system files, command line arguments passed to
kexec, boot aggregate, keys, etc. It can be used to measure critical
data structures of security modules as well.

This change aims to address measuring critical data structures of
SELinux when they are initialized and when they are updated at runtime.

This series is based on commit 3db0d0c276a7 ("integrity: remove
redundant initialization of variable ret") in next-integrity

This series also has a dependency on the following patch series and
should be applied in the following order:
 https://patchwork.kernel.org/patch/11709527/
 https://patchwork.kernel.org/patch/11711249/


Lakshmi Ramasubramanian (2):
  IMA: Handle early boot data measurement
  SELinux: Measure state and hash of policy using IMA

 security/integrity/ima/Kconfig               |   5 +-
 security/integrity/ima/Makefile              |   2 +-
 security/integrity/ima/ima.h                 |  37 ++--
 security/integrity/ima/ima_asymmetric_keys.c |   6 +-
 security/integrity/ima/ima_init.c            |   2 +-
 security/integrity/ima/ima_main.c            |   9 +
 security/integrity/ima/ima_policy.c          |   2 +-
 security/integrity/ima/ima_queue_data.c      | 190 +++++++++++++++++
 security/integrity/ima/ima_queue_keys.c      | 174 ----------------
 security/selinux/Makefile                    |   2 +
 security/selinux/hooks.c                     |   1 +
 security/selinux/include/security.h          |  15 ++
 security/selinux/measure.c                   | 204 +++++++++++++++++++
 security/selinux/selinuxfs.c                 |   3 +
 security/selinux/ss/services.c               |  71 ++++++-
 15 files changed, 516 insertions(+), 207 deletions(-)
 create mode 100644 security/integrity/ima/ima_queue_data.c
 delete mode 100644 security/integrity/ima/ima_queue_keys.c
 create mode 100644 security/selinux/measure.c

-- 
2.28.0


^ permalink raw reply

* [PATCH 2/2] SELinux: Measure state and hash of policy using IMA
From: Lakshmi Ramasubramanian @ 2020-08-13 17:07 UTC (permalink / raw)
  To: zohar, stephen.smalley.work, casey
  Cc: tyhicks, tusharsu, sashal, jmorris, linux-integrity, selinux,
	linux-security-module, linux-kernel
In-Reply-To: <20200813170707.2659-1-nramas@linux.microsoft.com>

SELinux configuration and policy are some of the critical data for this
security module that needs to be measured. This measurement can be used
by an attestation service, for instance, to verify if the configuration
and policies have been setup correctly and that they haven't been tampered
with at runtime.

Measure SELinux configuration, policy capabilities settings, and the hash
of the loaded policy by calling the IMA hook ima_measure_critical_data().

Sample measurement of SELinux state and hash of the policy:

10 e32e...5ac3 ima-buf sha256:86e8...4594 selinux-state-1595389364:287899386 696e697469616c697a65643d313b656e61626c65643d313b656e666f7263696e673d303b636865636b72657170726f743d313b6e6574776f726b5f706565725f636f6e74726f6c733d313b6f70656e5f7065726d733d313b657874656e6465645f736f636b65745f636c6173733d313b616c776179735f636865636b5f6e6574776f726b3d303b6367726f75705f7365636c6162656c3d313b6e6e705f6e6f737569645f7472616e736974696f6e3d313b67656e66735f7365636c6162656c5f73796d6c696e6b733d303
10 9e81...0857 ima-buf sha256:4941...68fc selinux-policy-hash-1597335667:462051628 8d1d...1834

To verify the measurement check the following:

Execute the following command to extract the measured data
from the IMA log for SELinux configuration (selinux-state).

  grep -m 1 "selinux-state" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | cut -d' ' -f 6 | xxd -r -p

The output should be the list of key-value pairs. For example,
 initialized=1;enabled=1;enforcing=0;checkreqprot=1;network_peer_controls=1;open_perms=1;extended_socket_class=1;always_check_network=0;cgroup_seclabel=1;nnp_nosuid_transition=1;genfs_seclabel_symlinks=0;

To verify the measured data with the current SELinux state:

 => enabled should be set to 1 if /sys/fs/selinux folder exists,
    0 otherwise

For other entries, compare the integer value in the files
 => /sys/fs/selinux/enforce
 => /sys/fs/selinux/checkreqprot
And, each of the policy capabilities files under
 => /sys/fs/selinux/policy_capabilities

For selinux-policy-hash, the hash of SELinux policy is included
in the IMA log entry.

To verify the measured data with the current SELinux policy run
the following commands and verify the output hash values match.

  sha256sum /sys/fs/selinux/policy | cut -d' ' -f 1

  grep -m 1 "selinux-policy-hash" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | cut -d' ' -f 6

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reported-by: kernel test robot <lkp@intel.com> # error: implicit declaration of function 'vfree'
Reported-by: kernel test robot <lkp@intel.com> # error: implicit declaration of function 'crypto_alloc_shash'
Reported-by: kernel test robot <lkp@intel.com> # sparse: symbol 'security_read_selinux_policy' was not declared. Should it be static?
---
 security/selinux/Makefile           |   2 +
 security/selinux/hooks.c            |   1 +
 security/selinux/include/security.h |  15 ++
 security/selinux/measure.c          | 204 ++++++++++++++++++++++++++++
 security/selinux/selinuxfs.c        |   3 +
 security/selinux/ss/services.c      |  71 ++++++++--
 6 files changed, 287 insertions(+), 9 deletions(-)
 create mode 100644 security/selinux/measure.c

diff --git a/security/selinux/Makefile b/security/selinux/Makefile
index 4d8e0e8adf0b..83d512116341 100644
--- a/security/selinux/Makefile
+++ b/security/selinux/Makefile
@@ -16,6 +16,8 @@ selinux-$(CONFIG_NETLABEL) += netlabel.o
 
 selinux-$(CONFIG_SECURITY_INFINIBAND) += ibpkey.o
 
+selinux-$(CONFIG_IMA) += measure.o
+
 ccflags-y := -I$(srctree)/security/selinux -I$(srctree)/security/selinux/include
 
 $(addprefix $(obj)/,$(selinux-y)): $(obj)/flask.h
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index efa6108b1ce9..5521dfc1900b 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -7394,6 +7394,7 @@ int selinux_disable(struct selinux_state *state)
 	}
 
 	selinux_mark_disabled(state);
+	selinux_measure_state(state);
 
 	pr_info("SELinux:  Disabled at runtime.\n");
 
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index b0e02cfe3ce1..77f42d9b544b 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -222,16 +222,31 @@ static inline bool selinux_policycap_genfs_seclabel_symlinks(void)
 	return state->policycap[POLICYDB_CAPABILITY_GENFS_SECLABEL_SYMLINKS];
 }
 
+static inline bool selinux_checkreqprot(const struct selinux_state *state)
+{
+	return READ_ONCE(state->checkreqprot);
+}
+
 int security_mls_enabled(struct selinux_state *state);
 int security_load_policy(struct selinux_state *state,
 			 void *data, size_t len);
 int security_read_policy(struct selinux_state *state,
 			 void **data, size_t *len);
+int security_read_policy_kernel(struct selinux_state *state,
+				void **data, size_t *len);
 size_t security_policydb_len(struct selinux_state *state);
 
 int security_policycap_supported(struct selinux_state *state,
 				 unsigned int req_cap);
 
+#ifdef CONFIG_IMA
+extern void selinux_measure_state(struct selinux_state *selinux_state);
+#else
+static inline void selinux_measure_state(struct selinux_state *selinux_state)
+{
+}
+#endif
+
 #define SEL_VEC_MAX 32
 struct av_decision {
 	u32 allowed;
diff --git a/security/selinux/measure.c b/security/selinux/measure.c
new file mode 100644
index 000000000000..f21b7de4e2ae
--- /dev/null
+++ b/security/selinux/measure.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Measure SELinux state using IMA subsystem.
+ */
+#include <linux/vmalloc.h>
+#include <linux/ktime.h>
+#include <linux/crypto.h>
+#include <crypto/hash.h>
+#include <crypto/hash_info.h>
+#include <linux/ima.h>
+#include "security.h"
+
+/*
+ * This function creates an unique name by appending the timestamp to
+ * the given string. This string is passed as "event name" to the IMA
+ * hook to measure the given SELinux data.
+ *
+ * The data provided by SELinux to the IMA subsystem for measuring may have
+ * already been measured (for instance the same state existed earlier).
+ * But for SELinux the current data represents a state change and hence
+ * needs to be measured again. To enable this, pass an unique "Event Name"
+ * to the IMA hook so that IMA subsystem will always measure the given data.
+ *
+ * For example,
+ * At time T0 SELinux data to be measured is "foo". IMA measures it.
+ * At time T1 the data is changed to "bar". IMA measures it.
+ * At time T2 the data is changed to "foo" again. IMA will not measure it
+ * (since it was already measured) unless the event name, for instance,
+ * is different in this call.
+ */
+static char *selinux_event_name(const char *name_prefix)
+{
+	char *event_name = NULL;
+	struct timespec64 curr_time;
+
+	ktime_get_real_ts64(&curr_time);
+	event_name = kasprintf(GFP_KERNEL, "%s-%lld:%09ld", name_prefix,
+			       curr_time.tv_sec, curr_time.tv_nsec);
+	if (!event_name) {
+		pr_warn("%s: event name not allocated.\n", __func__);
+		return NULL;
+	}
+
+	return event_name;
+}
+
+static int read_selinux_state(char **state_str, int *state_str_len,
+			      struct selinux_state *state)
+{
+	char *buf, *str_fmt = "%s=%d;";
+	int i, buf_len, curr;
+
+	buf_len = snprintf(NULL, 0, str_fmt, "initialized", 0);
+	buf_len += snprintf(NULL, 0, str_fmt, "enabled", 0);
+	buf_len += snprintf(NULL, 0, str_fmt, "enforcing", 0);
+	buf_len += snprintf(NULL, 0, str_fmt, "checkreqprot", 0);
+
+	for (i = 0; i < __POLICYDB_CAPABILITY_MAX; i++) {
+		buf_len += snprintf(NULL, 0, str_fmt,
+				    selinux_policycap_names[i], 0);
+	}
+	++buf_len;
+
+	buf = kzalloc(buf_len, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	curr = snprintf(buf, buf_len, str_fmt,
+			"initialized", selinux_initialized(state));
+	curr += snprintf((buf + curr), (buf_len - curr), str_fmt,
+			 "enabled", !selinux_disabled(state));
+	curr += snprintf((buf + curr), (buf_len - curr), str_fmt,
+			 "enforcing", enforcing_enabled(state));
+	curr += snprintf((buf + curr), (buf_len - curr), str_fmt,
+			 "checkreqprot", selinux_checkreqprot(state));
+
+	for (i = 0; i < __POLICYDB_CAPABILITY_MAX; i++) {
+		curr += snprintf((buf + curr), (buf_len - curr), str_fmt,
+				 selinux_policycap_names[i],
+				 state->policycap[i]);
+	}
+
+	*state_str = buf;
+	*state_str_len = curr;
+
+	return 0;
+}
+
+static int selinux_hash_buffer(void *buf, size_t buf_len,
+			       void **buf_hash, int *buf_hash_len)
+{
+	struct crypto_shash *tfm;
+	struct shash_desc *desc = NULL;
+	void *digest = NULL;
+	int desc_size;
+	int digest_size;
+	int ret = 0;
+
+	tfm = crypto_alloc_shash("sha256", 0, 0);
+	if (IS_ERR(tfm))
+		return PTR_ERR(tfm);
+
+	desc_size = crypto_shash_descsize(tfm) + sizeof(*desc);
+	digest_size = crypto_shash_digestsize(tfm);
+
+	digest = kmalloc(digest_size, GFP_KERNEL);
+	if (!digest) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	desc = kzalloc(desc_size, GFP_KERNEL);
+	if (!desc) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	desc->tfm = tfm;
+
+	ret = crypto_shash_digest(desc, buf, buf_len, digest);
+	if (ret < 0)
+		goto error;
+
+	*buf_hash_len = digest_size;
+	*buf_hash = digest;
+	digest = NULL;
+
+error:
+	kfree(desc);
+	kfree(digest);
+
+	crypto_free_shash(tfm);
+
+	return ret;
+}
+
+void selinux_measure_state(struct selinux_state *state)
+{
+	void *policy = NULL;
+	void *policy_hash = NULL;
+	char *event_name = NULL;
+	char *state_str = NULL;
+	size_t policy_len;
+	int state_str_len, policy_hash_len, rc = 0;
+	bool initialized = selinux_initialized(state);
+
+	rc = read_selinux_state(&state_str, &state_str_len, state);
+	if (rc) {
+		pr_warn("%s: Failed to read selinux state.\n", __func__);
+		return;
+	}
+
+	/*
+	 * Get an unique string for measuring the current SELinux state.
+	 */
+	event_name = selinux_event_name("selinux-state");
+	if (!event_name) {
+		pr_warn("%s: Event name for state not allocated.\n",
+			__func__);
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	rc = ima_measure_critical_data(event_name, "selinux",
+				       state_str, state_str_len);
+
+	kfree(event_name);
+	event_name = NULL;
+
+	if (rc)
+		goto out;
+
+	/*
+	 * Measure SELinux policy only after initialization is completed.
+	 */
+	if (!initialized)
+		goto out;
+
+	rc = security_read_policy_kernel(state, &policy, &policy_len);
+	if (rc)
+		goto out;
+
+	rc = selinux_hash_buffer(policy, policy_len,
+				 &policy_hash, &policy_hash_len);
+	if (rc)
+		goto out;
+
+	event_name = selinux_event_name("selinux-policy-hash");
+	if (!event_name) {
+		pr_warn("%s: Event name for policy not allocated.\n",
+			__func__);
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	rc = ima_measure_critical_data(event_name, "selinux",
+				       policy_hash, policy_hash_len);
+
+out:
+	kfree(event_name);
+	kfree(state_str);
+	kfree(policy_hash);
+	vfree(policy);
+}
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 4781314c2510..6d46eaef5c92 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -173,6 +173,7 @@ static ssize_t sel_write_enforce(struct file *file, const char __user *buf,
 			from_kuid(&init_user_ns, audit_get_loginuid(current)),
 			audit_get_sessionid(current));
 		enforcing_set(state, new_value);
+		selinux_measure_state(state);
 		if (new_value)
 			avc_ss_reset(state->avc, 0);
 		selnl_notify_setenforce(new_value);
@@ -678,6 +679,8 @@ static ssize_t sel_write_checkreqprot(struct file *file, const char __user *buf,
 
 	fsi->state->checkreqprot = new_value ? 1 : 0;
 	length = count;
+
+	selinux_measure_state(fsi->state);
 out:
 	kfree(page);
 	return length;
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index ef0afd878bfc..3978c804c361 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2182,6 +2182,7 @@ int security_load_policy(struct selinux_state *state, void *data, size_t len)
 		selinux_status_update_policyload(state, seqno);
 		selinux_netlbl_cache_invalidate();
 		selinux_xfrm_notify_policyload();
+		selinux_measure_state(state);
 		return 0;
 	}
 
@@ -2270,6 +2271,7 @@ int security_load_policy(struct selinux_state *state, void *data, size_t len)
 	selinux_status_update_policyload(state, seqno);
 	selinux_netlbl_cache_invalidate();
 	selinux_xfrm_notify_policyload();
+	selinux_measure_state(state);
 
 	rc = 0;
 	goto out;
@@ -2941,6 +2943,7 @@ int security_set_bools(struct selinux_state *state, u32 len, int *values)
 		selnl_notify_policyload(seqno);
 		selinux_status_update_policyload(state, seqno);
 		selinux_xfrm_notify_policyload();
+		selinux_measure_state(state);
 	}
 	return rc;
 }
@@ -3720,14 +3723,23 @@ int security_netlbl_sid_to_secattr(struct selinux_state *state,
 }
 #endif /* CONFIG_NETLABEL */
 
+static int security_read_policy_len(struct selinux_state *state, size_t *len)
+{
+	if (!selinux_initialized(state))
+		return -EINVAL;
+
+	*len = security_policydb_len(state);
+	return 0;
+}
+
 /**
- * security_read_policy - read the policy.
+ * security_read_selinux_policy - read the policy.
  * @data: binary policy data
  * @len: length of data in bytes
  *
  */
-int security_read_policy(struct selinux_state *state,
-			 void **data, size_t *len)
+static int security_read_selinux_policy(struct selinux_state *state,
+					void **data, size_t *len)
 {
 	struct policydb *policydb = &state->ss->policydb;
 	int rc;
@@ -3736,12 +3748,6 @@ int security_read_policy(struct selinux_state *state,
 	if (!selinux_initialized(state))
 		return -EINVAL;
 
-	*len = security_policydb_len(state);
-
-	*data = vmalloc_user(*len);
-	if (!*data)
-		return -ENOMEM;
-
 	fp.data = *data;
 	fp.len = *len;
 
@@ -3754,5 +3760,52 @@ int security_read_policy(struct selinux_state *state,
 
 	*len = (unsigned long)fp.data - (unsigned long)*data;
 	return 0;
+}
+
+/**
+ * security_read_policy - read the policy.
+ * @data: binary policy data
+ * @len: length of data in bytes
+ *
+ */
+int security_read_policy(struct selinux_state *state,
+			 void **data, size_t *len)
+{
+	int rc;
+
+	rc = security_read_policy_len(state, len);
+	if (rc)
+		return rc;
+
+	*data = vmalloc_user(*len);
+	if (!*data)
+		return -ENOMEM;
+
+	return security_read_selinux_policy(state, data, len);
+}
+
+/**
+ * security_read_policy_kernel - read the policy.
+ * @data: binary policy data
+ * @len: length of data in bytes
+ *
+ * Allocates kernel memory for reading SELinux policy.
+ * This function is for internal use only and should not
+ * be used for returning data to user space
+ *
+ */
+int security_read_policy_kernel(struct selinux_state *state,
+				void **data, size_t *len)
+{
+	int rc;
+
+	rc = security_read_policy_len(state, len);
+	if (rc)
+		return rc;
+
+	*data = vmalloc(*len);
+	if (!*data)
+		return -ENOMEM;
 
+	return security_read_selinux_policy(state, data, len);
 }
-- 
2.28.0


^ permalink raw reply related

* [PATCH 1/2] IMA: Handle early boot data measurement
From: Lakshmi Ramasubramanian @ 2020-08-13 17:07 UTC (permalink / raw)
  To: zohar, stephen.smalley.work, casey
  Cc: tyhicks, tusharsu, sashal, jmorris, linux-integrity, selinux,
	linux-security-module, linux-kernel
In-Reply-To: <20200813170707.2659-1-nramas@linux.microsoft.com>

The current implementation of early boot measurement in
the IMA subsystem is very specific to asymmetric keys. It does not
handle measurement of other data such as Linux Security Module (LSM)
data. Since most security modules are initialized very early in
the boot cycle, data provided by those modules are not measured
by IMA. Any other subsystem that initializes early in the boot cycle
and needs IMA to measure their data would suffer from the same issue.

Update the early boot key measurement to handle any early boot data.
Change the kernel configuration CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS to
CONFIG_IMA_QUEUE_EARLY_BOOT_DATA so it can be used for enabling
early boot data measurement. Change this new configuration to support
SECURITY_SELINUX subsystem in addition to KEYS subsystem, which is
currently supported. This can be extended to include more subsystems
in the future by updating this kernel configuration.

Update the IMA hook namely ima_measure_critical_data() to utilize
early boot measurement support.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
---
 security/integrity/ima/Kconfig               |   5 +-
 security/integrity/ima/Makefile              |   2 +-
 security/integrity/ima/ima.h                 |  37 ++--
 security/integrity/ima/ima_asymmetric_keys.c |   6 +-
 security/integrity/ima/ima_init.c            |   2 +-
 security/integrity/ima/ima_main.c            |   9 +
 security/integrity/ima/ima_policy.c          |   2 +-
 security/integrity/ima/ima_queue_data.c      | 190 +++++++++++++++++++
 security/integrity/ima/ima_queue_keys.c      | 174 -----------------
 9 files changed, 229 insertions(+), 198 deletions(-)
 create mode 100644 security/integrity/ima/ima_queue_data.c
 delete mode 100644 security/integrity/ima/ima_queue_keys.c

diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index 080c53545ff0..e4fb1761d64a 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -322,10 +322,9 @@ config IMA_MEASURE_ASYMMETRIC_KEYS
 	depends on ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
 	default y
 
-config IMA_QUEUE_EARLY_BOOT_KEYS
+config IMA_QUEUE_EARLY_BOOT_DATA
 	bool
-	depends on IMA_MEASURE_ASYMMETRIC_KEYS
-	depends on SYSTEM_TRUSTED_KEYRING
+	depends on SECURITY_SELINUX || (IMA_MEASURE_ASYMMETRIC_KEYS && SYSTEM_TRUSTED_KEYRING)
 	default y
 
 config IMA_SECURE_AND_OR_TRUSTED_BOOT
diff --git a/security/integrity/ima/Makefile b/security/integrity/ima/Makefile
index 67dabca670e2..cbbbc9848d2f 100644
--- a/security/integrity/ima/Makefile
+++ b/security/integrity/ima/Makefile
@@ -13,4 +13,4 @@ ima-$(CONFIG_IMA_APPRAISE_MODSIG) += ima_modsig.o
 ima-$(CONFIG_HAVE_IMA_KEXEC) += ima_kexec.o
 ima-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
 ima-$(CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS) += ima_asymmetric_keys.o
-ima-$(CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS) += ima_queue_keys.o
+ima-$(CONFIG_IMA_QUEUE_EARLY_BOOT_DATA) += ima_queue_data.o
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index e65ab067e700..8822ad1ef4eb 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -228,29 +228,34 @@ extern const char *const func_tokens[];
 
 struct modsig;
 
-#ifdef CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS
+#ifdef CONFIG_IMA_QUEUE_EARLY_BOOT_DATA
 /*
- * To track keys that need to be measured.
+ * To track data that needs to be measured.
  */
-struct ima_key_entry {
+struct ima_data_entry {
 	struct list_head list;
 	void *payload;
 	size_t payload_len;
-	char *keyring_name;
+	const char *event_name;
+	const char *event_data;
+	enum ima_hooks func;
 };
-void ima_init_key_queue(void);
-bool ima_should_queue_key(void);
-bool ima_queue_key(struct key *keyring, const void *payload,
-		   size_t payload_len);
-void ima_process_queued_keys(void);
+void ima_init_data_queue(void);
+bool ima_should_queue_data(void);
+bool ima_queue_data(const char *event_name, const void *payload,
+		    size_t payload_len, const char *event_data,
+		    enum ima_hooks func);
+void ima_process_queued_data(void);
 #else
-static inline void ima_init_key_queue(void) {}
-static inline bool ima_should_queue_key(void) { return false; }
-static inline bool ima_queue_key(struct key *keyring,
-				 const void *payload,
-				 size_t payload_len) { return false; }
-static inline void ima_process_queued_keys(void) {}
-#endif /* CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS */
+static inline void ima_init_data_queue(void) {}
+static inline bool ima_should_queue_data(void) { return false; }
+static inline bool ima_queue_data(const char *event_name,
+				  const void *payload,
+				  size_t payload_len,
+				  const char *event_data,
+				  enum ima_hooks func) { return false; }
+static inline void ima_process_queued_data(void) {}
+#endif /* CONFIG_IMA_QUEUE_EARLY_BOOT_DATA */
 
 /* LIM API function definitions */
 int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
diff --git a/security/integrity/ima/ima_asymmetric_keys.c b/security/integrity/ima/ima_asymmetric_keys.c
index 1c68c500c26f..8f8431f8b096 100644
--- a/security/integrity/ima/ima_asymmetric_keys.c
+++ b/security/integrity/ima/ima_asymmetric_keys.c
@@ -37,8 +37,10 @@ void ima_post_key_create_or_update(struct key *keyring, struct key *key,
 	if (!payload || (payload_len == 0))
 		return;
 
-	if (ima_should_queue_key())
-		queued = ima_queue_key(keyring, payload, payload_len);
+	if (ima_should_queue_data())
+		queued = ima_queue_data(keyring->description, payload,
+					payload_len, keyring->description,
+					KEY_CHECK);
 
 	if (queued)
 		return;
diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
index 4902fe7bd570..892894bf4af3 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -145,7 +145,7 @@ int __init ima_init(void)
 	if (rc != 0)
 		return rc;
 
-	ima_init_key_queue();
+	ima_init_data_queue();
 
 	return rc;
 }
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 129bcaaf13e2..8b27d9dafd6b 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -861,9 +861,18 @@ int ima_measure_critical_data(const char *event_name,
 			      const char *event_data_source,
 			      const void *buf, int buf_len)
 {
+	bool queued = false;
+
 	if (!event_name || !event_data_source || !buf || !buf_len)
 		return -EINVAL;
 
+	if (ima_should_queue_data())
+		queued = ima_queue_data(event_name, buf, buf_len,
+					event_data_source, CRITICAL_DATA);
+
+	if (queued)
+		return 0;
+
 	return process_buffer_measurement(NULL, buf, buf_len, event_name,
 					  CRITICAL_DATA, 0, event_data_source);
 }
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index 8451ccb2a775..8f1155a99e89 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -906,7 +906,7 @@ void ima_update_policy(void)
 	ima_update_policy_flag();
 
 	/* Custom IMA policy has been loaded */
-	ima_process_queued_keys();
+	ima_process_queued_data();
 }
 
 /* Keep the enumeration in sync with the policy_tokens! */
diff --git a/security/integrity/ima/ima_queue_data.c b/security/integrity/ima/ima_queue_data.c
new file mode 100644
index 000000000000..93420e7670b9
--- /dev/null
+++ b/security/integrity/ima/ima_queue_data.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2019 Microsoft Corporation
+ *
+ * Author: Lakshmi Ramasubramanian (nramas@linux.microsoft.com)
+ *
+ * File: ima_queue_data.c
+ *       Enables deferred processing of data to be measured
+ */
+
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/workqueue.h>
+#include "ima.h"
+
+/*
+ * Flag to indicate whether data can be processed
+ * right away or should be queued for processing later.
+ */
+static bool ima_process_data;
+
+/*
+ * To synchronize access to the list of data that need to be measured
+ */
+static DEFINE_MUTEX(ima_data_lock);
+static LIST_HEAD(ima_queued_data);
+
+/*
+ * If custom IMA policy is not loaded then data queued up
+ * for measurement should be freed. This worker is used
+ * for handling this scenario.
+ */
+static long ima_data_queue_timeout = 300000; /* 5 Minutes */
+static void ima_data_handler(struct work_struct *work);
+static DECLARE_DELAYED_WORK(ima_data_delayed_work, ima_data_handler);
+static bool timer_expired;
+
+/*
+ * This worker function frees data that may still be
+ * queued up in case custom IMA policy was not loaded.
+ */
+static void ima_data_handler(struct work_struct *work)
+{
+	timer_expired = true;
+	ima_process_queued_data();
+}
+
+/*
+ * This function sets up a worker to free queued data in case
+ * custom IMA policy was never loaded.
+ */
+void ima_init_data_queue(void)
+{
+	schedule_delayed_work(&ima_data_delayed_work,
+			      msecs_to_jiffies(ima_data_queue_timeout));
+}
+
+static void ima_free_data_entry(struct ima_data_entry *entry)
+{
+	if (!entry)
+		return;
+
+	kvfree(entry->payload);
+	kfree(entry->event_name);
+	kfree(entry->event_data);
+	kfree(entry);
+}
+
+static void *ima_kvmemdup(const void *src, size_t len)
+{
+	void *p = kvmalloc(len, GFP_KERNEL);
+
+	if (p)
+		memcpy(p, src, len);
+	return p;
+}
+
+static struct ima_data_entry *ima_alloc_data_entry(const char *event_name,
+						   const void *payload,
+						   size_t payload_len,
+						   const char *event_data,
+						   enum ima_hooks func)
+{
+	struct ima_data_entry *entry;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		goto out;
+
+	/*
+	 * Payload size may exceed the limit supported by kmalloc.
+	 * So use kvmalloc instead.
+	 */
+	entry->payload = ima_kvmemdup(payload, payload_len);
+	entry->event_name = kstrdup(event_name, GFP_KERNEL);
+	if (event_data)
+		entry->event_data = kstrdup(event_data, GFP_KERNEL);
+
+	entry->payload_len = payload_len;
+	entry->func = func;
+
+	if (!entry->payload || !entry->event_name ||
+		(event_data && !entry->event_data))
+		goto out;
+
+	INIT_LIST_HEAD(&entry->list);
+	return entry;
+
+out:
+	integrity_audit_message(AUDIT_INTEGRITY_PCR, NULL,
+				event_name, func_measure_str(func),
+				"ENOMEM", -ENOMEM, 0, -ENOMEM);
+	ima_free_data_entry(entry);
+	return NULL;
+}
+
+bool ima_queue_data(const char *event_name, const void *payload,
+		    size_t payload_len, const char *event_data,
+		    enum ima_hooks func)
+{
+	bool queued = false;
+	struct ima_data_entry *entry;
+
+	entry = ima_alloc_data_entry(event_name, payload, payload_len,
+				     event_data, func);
+	if (!entry)
+		return false;
+
+	mutex_lock(&ima_data_lock);
+	if (!ima_process_data) {
+		list_add_tail(&entry->list, &ima_queued_data);
+		queued = true;
+	}
+	mutex_unlock(&ima_data_lock);
+
+	if (!queued)
+		ima_free_data_entry(entry);
+
+	return queued;
+}
+
+/*
+ * ima_process_queued_data() - process data queued for measurement
+ *
+ * This function sets ima_process_data to true and processes queued data.
+ * From here on data will be processed right away (not queued).
+ */
+void ima_process_queued_data(void)
+{
+	struct ima_data_entry *entry, *tmp;
+	bool process = false;
+
+	if (ima_process_data)
+		return;
+
+	/*
+	 * Since ima_process_data is set to true, any new data will be
+	 * processed immediately and not be queued to ima_queued_data list.
+	 * First one setting the ima_process_data flag to true will
+	 * process the queued data.
+	 */
+	mutex_lock(&ima_data_lock);
+	if (!ima_process_data) {
+		ima_process_data = true;
+		process = true;
+	}
+	mutex_unlock(&ima_data_lock);
+
+	if (!process)
+		return;
+
+	if (!timer_expired)
+		cancel_delayed_work_sync(&ima_data_delayed_work);
+
+	list_for_each_entry_safe(entry, tmp, &ima_queued_data, list) {
+		if (!timer_expired)
+			process_buffer_measurement(NULL, entry->payload,
+						   entry->payload_len,
+						   entry->event_name,
+						   entry->func, 0,
+						   entry->event_data);
+		list_del(&entry->list);
+		ima_free_data_entry(entry);
+	}
+}
+
+inline bool ima_should_queue_data(void)
+{
+	return !ima_process_data;
+}
diff --git a/security/integrity/ima/ima_queue_keys.c b/security/integrity/ima/ima_queue_keys.c
deleted file mode 100644
index 69a8626a35c0..000000000000
--- a/security/integrity/ima/ima_queue_keys.c
+++ /dev/null
@@ -1,174 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0+
-/*
- * Copyright (C) 2019 Microsoft Corporation
- *
- * Author: Lakshmi Ramasubramanian (nramas@linux.microsoft.com)
- *
- * File: ima_queue_keys.c
- *       Enables deferred processing of keys
- */
-
-#include <linux/workqueue.h>
-#include <keys/asymmetric-type.h>
-#include "ima.h"
-
-/*
- * Flag to indicate whether a key can be processed
- * right away or should be queued for processing later.
- */
-static bool ima_process_keys;
-
-/*
- * To synchronize access to the list of keys that need to be measured
- */
-static DEFINE_MUTEX(ima_keys_lock);
-static LIST_HEAD(ima_keys);
-
-/*
- * If custom IMA policy is not loaded then keys queued up
- * for measurement should be freed. This worker is used
- * for handling this scenario.
- */
-static long ima_key_queue_timeout = 300000; /* 5 Minutes */
-static void ima_keys_handler(struct work_struct *work);
-static DECLARE_DELAYED_WORK(ima_keys_delayed_work, ima_keys_handler);
-static bool timer_expired;
-
-/*
- * This worker function frees keys that may still be
- * queued up in case custom IMA policy was not loaded.
- */
-static void ima_keys_handler(struct work_struct *work)
-{
-	timer_expired = true;
-	ima_process_queued_keys();
-}
-
-/*
- * This function sets up a worker to free queued keys in case
- * custom IMA policy was never loaded.
- */
-void ima_init_key_queue(void)
-{
-	schedule_delayed_work(&ima_keys_delayed_work,
-			      msecs_to_jiffies(ima_key_queue_timeout));
-}
-
-static void ima_free_key_entry(struct ima_key_entry *entry)
-{
-	if (entry) {
-		kfree(entry->payload);
-		kfree(entry->keyring_name);
-		kfree(entry);
-	}
-}
-
-static struct ima_key_entry *ima_alloc_key_entry(struct key *keyring,
-						 const void *payload,
-						 size_t payload_len)
-{
-	int rc = 0;
-	const char *audit_cause = "ENOMEM";
-	struct ima_key_entry *entry;
-
-	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
-	if (entry) {
-		entry->payload = kmemdup(payload, payload_len, GFP_KERNEL);
-		entry->keyring_name = kstrdup(keyring->description,
-					      GFP_KERNEL);
-		entry->payload_len = payload_len;
-	}
-
-	if ((entry == NULL) || (entry->payload == NULL) ||
-	    (entry->keyring_name == NULL)) {
-		rc = -ENOMEM;
-		goto out;
-	}
-
-	INIT_LIST_HEAD(&entry->list);
-
-out:
-	if (rc) {
-		integrity_audit_message(AUDIT_INTEGRITY_PCR, NULL,
-					keyring->description,
-					func_measure_str(KEY_CHECK),
-					audit_cause, rc, 0, rc);
-		ima_free_key_entry(entry);
-		entry = NULL;
-	}
-
-	return entry;
-}
-
-bool ima_queue_key(struct key *keyring, const void *payload,
-		   size_t payload_len)
-{
-	bool queued = false;
-	struct ima_key_entry *entry;
-
-	entry = ima_alloc_key_entry(keyring, payload, payload_len);
-	if (!entry)
-		return false;
-
-	mutex_lock(&ima_keys_lock);
-	if (!ima_process_keys) {
-		list_add_tail(&entry->list, &ima_keys);
-		queued = true;
-	}
-	mutex_unlock(&ima_keys_lock);
-
-	if (!queued)
-		ima_free_key_entry(entry);
-
-	return queued;
-}
-
-/*
- * ima_process_queued_keys() - process keys queued for measurement
- *
- * This function sets ima_process_keys to true and processes queued keys.
- * From here on keys will be processed right away (not queued).
- */
-void ima_process_queued_keys(void)
-{
-	struct ima_key_entry *entry, *tmp;
-	bool process = false;
-
-	if (ima_process_keys)
-		return;
-
-	/*
-	 * Since ima_process_keys is set to true, any new key will be
-	 * processed immediately and not be queued to ima_keys list.
-	 * First one setting the ima_process_keys flag to true will
-	 * process the queued keys.
-	 */
-	mutex_lock(&ima_keys_lock);
-	if (!ima_process_keys) {
-		ima_process_keys = true;
-		process = true;
-	}
-	mutex_unlock(&ima_keys_lock);
-
-	if (!process)
-		return;
-
-	if (!timer_expired)
-		cancel_delayed_work_sync(&ima_keys_delayed_work);
-
-	list_for_each_entry_safe(entry, tmp, &ima_keys, list) {
-		if (!timer_expired)
-			process_buffer_measurement(NULL, entry->payload,
-						   entry->payload_len,
-						   entry->keyring_name,
-						   KEY_CHECK, 0,
-						   entry->keyring_name);
-		list_del(&entry->list);
-		ima_free_key_entry(entry);
-	}
-}
-
-inline bool ima_should_queue_key(void)
-{
-	return !ima_process_keys;
-}
-- 
2.28.0


^ permalink raw reply related

* Re: [PATCH 2/2] SELinux: Measure state and hash of policy using IMA
From: Stephen Smalley @ 2020-08-13 17:42 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian, zohar, casey
  Cc: tyhicks, tusharsu, sashal, jmorris, linux-integrity, selinux,
	linux-security-module, linux-kernel, paul Moore
In-Reply-To: <20200813170707.2659-3-nramas@linux.microsoft.com>

On 8/13/20 1:07 PM, Lakshmi Ramasubramanian wrote:

> SELinux configuration and policy are some of the critical data for this
> security module that needs to be measured. This measurement can be used
> by an attestation service, for instance, to verify if the configuration
> and policies have been setup correctly and that they haven't been tampered
> with at runtime.
>
> Measure SELinux configuration, policy capabilities settings, and the hash
> of the loaded policy by calling the IMA hook ima_measure_critical_data().
>
> Sample measurement of SELinux state and hash of the policy:
>
> 10 e32e...5ac3 ima-buf sha256:86e8...4594 selinux-state-1595389364:287899386 696e697469616c697a65643d313b656e61626c65643d313b656e666f7263696e673d303b636865636b72657170726f743d313b6e6574776f726b5f706565725f636f6e74726f6c733d313b6f70656e5f7065726d733d313b657874656e6465645f736f636b65745f636c6173733d313b616c776179735f636865636b5f6e6574776f726b3d303b6367726f75705f7365636c6162656c3d313b6e6e705f6e6f737569645f7472616e736974696f6e3d313b67656e66735f7365636c6162656c5f73796d6c696e6b733d303
> 10 9e81...0857 ima-buf sha256:4941...68fc selinux-policy-hash-1597335667:462051628 8d1d...1834
>
> To verify the measurement check the following:
>
> Execute the following command to extract the measured data
> from the IMA log for SELinux configuration (selinux-state).
>
>    grep -m 1 "selinux-state" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | cut -d' ' -f 6 | xxd -r -p
>
> The output should be the list of key-value pairs. For example,
>   initialized=1;enabled=1;enforcing=0;checkreqprot=1;network_peer_controls=1;open_perms=1;extended_socket_class=1;always_check_network=0;cgroup_seclabel=1;nnp_nosuid_transition=1;genfs_seclabel_symlinks=0;
>
> To verify the measured data with the current SELinux state:
>
>   => enabled should be set to 1 if /sys/fs/selinux folder exists,
>      0 otherwise
>
> For other entries, compare the integer value in the files
>   => /sys/fs/selinux/enforce
>   => /sys/fs/selinux/checkreqprot
> And, each of the policy capabilities files under
>   => /sys/fs/selinux/policy_capabilities
>
> For selinux-policy-hash, the hash of SELinux policy is included
> in the IMA log entry.
>
> To verify the measured data with the current SELinux policy run
> the following commands and verify the output hash values match.
>
>    sha256sum /sys/fs/selinux/policy | cut -d' ' -f 1
>
>    grep -m 1 "selinux-policy-hash" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | cut -d' ' -f 6
>
> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
> Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> Reported-by: kernel test robot <lkp@intel.com> # error: implicit declaration of function 'vfree'
> Reported-by: kernel test robot <lkp@intel.com> # error: implicit declaration of function 'crypto_alloc_shash'
> Reported-by: kernel test robot <lkp@intel.com> # sparse: symbol 'security_read_selinux_policy' was not declared. Should it be static?
> ---
>   
> diff --git a/security/selinux/measure.c b/security/selinux/measure.c
> new file mode 100644
> index 000000000000..f21b7de4e2ae
> --- /dev/null
> +++ b/security/selinux/measure.c
> @@ -0,0 +1,204 @@
> +static int selinux_hash_buffer(void *buf, size_t buf_len,
> +			       void **buf_hash, int *buf_hash_len)
> +{
> +	struct crypto_shash *tfm;
> +	struct shash_desc *desc = NULL;
> +	void *digest = NULL;
> +	int desc_size;
> +	int digest_size;
> +	int ret = 0;
> +
> +	tfm = crypto_alloc_shash("sha256", 0, 0);
> +	if (IS_ERR(tfm))
> +		return PTR_ERR(tfm);
Can we make the algorithm selectable via kernel parameter and/or writing 
to a new selinuxfs node?

^ permalink raw reply

* Re: [PATCH 2/2] SELinux: Measure state and hash of policy using IMA
From: Lakshmi Ramasubramanian @ 2020-08-13 17:52 UTC (permalink / raw)
  To: Stephen Smalley, zohar, casey
  Cc: tyhicks, tusharsu, sashal, jmorris, linux-integrity, selinux,
	linux-security-module, linux-kernel, paul Moore
In-Reply-To: <5f738fd8-fe28-5358-b3d8-b671b45caa7f@gmail.com>

On 8/13/20 10:42 AM, Stephen Smalley wrote:

>> diff --git a/security/selinux/measure.c b/security/selinux/measure.c
>> new file mode 100644
>> index 000000000000..f21b7de4e2ae
>> --- /dev/null
>> +++ b/security/selinux/measure.c
>> @@ -0,0 +1,204 @@
>> +static int selinux_hash_buffer(void *buf, size_t buf_len,
>> +                   void **buf_hash, int *buf_hash_len)
>> +{
>> +    struct crypto_shash *tfm;
>> +    struct shash_desc *desc = NULL;
>> +    void *digest = NULL;
>> +    int desc_size;
>> +    int digest_size;
>> +    int ret = 0;
>> +
>> +    tfm = crypto_alloc_shash("sha256", 0, 0);
>> +    if (IS_ERR(tfm))
>> +        return PTR_ERR(tfm);
> Can we make the algorithm selectable via kernel parameter and/or writing 
> to a new selinuxfs node?

I can add a kernel parameter to select this hash algorithm.

  -lakshmi


^ permalink raw reply

* Re: [PATCH 2/2] SELinux: Measure state and hash of policy using IMA
From: Stephen Smalley @ 2020-08-13 17:58 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: Mimi Zohar, Casey Schaufler, Tyler Hicks, tusharsu, sashal,
	James Morris, linux-integrity, SElinux list, LSM List,
	linux-kernel, paul Moore
In-Reply-To: <7315b7e8-2c53-2555-bc2e-aae42e16aaa2@linux.microsoft.com>

On Thu, Aug 13, 2020 at 1:52 PM Lakshmi Ramasubramanian
<nramas@linux.microsoft.com> wrote:
>
> On 8/13/20 10:42 AM, Stephen Smalley wrote:
>
> >> diff --git a/security/selinux/measure.c b/security/selinux/measure.c
> >> new file mode 100644
> >> index 000000000000..f21b7de4e2ae
> >> --- /dev/null
> >> +++ b/security/selinux/measure.c
> >> @@ -0,0 +1,204 @@
> >> +static int selinux_hash_buffer(void *buf, size_t buf_len,
> >> +                   void **buf_hash, int *buf_hash_len)
> >> +{
> >> +    struct crypto_shash *tfm;
> >> +    struct shash_desc *desc = NULL;
> >> +    void *digest = NULL;
> >> +    int desc_size;
> >> +    int digest_size;
> >> +    int ret = 0;
> >> +
> >> +    tfm = crypto_alloc_shash("sha256", 0, 0);
> >> +    if (IS_ERR(tfm))
> >> +        return PTR_ERR(tfm);
> > Can we make the algorithm selectable via kernel parameter and/or writing
> > to a new selinuxfs node?
>
> I can add a kernel parameter to select this hash algorithm.

Also can we provide a Kconfig option for the default value like IMA does?

^ permalink raw reply

* Re: [PATCH 2/2] SELinux: Measure state and hash of policy using IMA
From: Lakshmi Ramasubramanian @ 2020-08-13 18:03 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Mimi Zohar, Casey Schaufler, Tyler Hicks, tusharsu, sashal,
	James Morris, linux-integrity, SElinux list, LSM List,
	linux-kernel, paul Moore
In-Reply-To: <CAEjxPJ6sZdm2w=bbkL0uJyEkHw0gCT_y812WQBZPtLCJzO6r3A@mail.gmail.com>

On 8/13/20 10:58 AM, Stephen Smalley wrote:
> On Thu, Aug 13, 2020 at 1:52 PM Lakshmi Ramasubramanian
> <nramas@linux.microsoft.com> wrote:
>>
>> On 8/13/20 10:42 AM, Stephen Smalley wrote:
>>
>>>> diff --git a/security/selinux/measure.c b/security/selinux/measure.c
>>>> new file mode 100644
>>>> index 000000000000..f21b7de4e2ae
>>>> --- /dev/null
>>>> +++ b/security/selinux/measure.c
>>>> @@ -0,0 +1,204 @@
>>>> +static int selinux_hash_buffer(void *buf, size_t buf_len,
>>>> +                   void **buf_hash, int *buf_hash_len)
>>>> +{
>>>> +    struct crypto_shash *tfm;
>>>> +    struct shash_desc *desc = NULL;
>>>> +    void *digest = NULL;
>>>> +    int desc_size;
>>>> +    int digest_size;
>>>> +    int ret = 0;
>>>> +
>>>> +    tfm = crypto_alloc_shash("sha256", 0, 0);
>>>> +    if (IS_ERR(tfm))
>>>> +        return PTR_ERR(tfm);
>>> Can we make the algorithm selectable via kernel parameter and/or writing
>>> to a new selinuxfs node?
>>
>> I can add a kernel parameter to select this hash algorithm.
> 
> Also can we provide a Kconfig option for the default value like IMA does?
> 

Would we need both - Kconfig and kernel param?

The other option is to provide an IMA function to return the current 
hash algorithm used for measurement. That way a consistent hash 
algorithm can be employed by both IMA and the callers. Would that be better?

  -lakshmi


^ permalink raw reply

* Re: [PATCH 2/2] SELinux: Measure state and hash of policy using IMA
From: Stephen Smalley @ 2020-08-13 18:13 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: Mimi Zohar, Casey Schaufler, Tyler Hicks, tusharsu, sashal,
	James Morris, linux-integrity, SElinux list, LSM List,
	linux-kernel, paul Moore
In-Reply-To: <e935c06f-09e2-a2f7-f97f-768bc017f477@linux.microsoft.com>

On Thu, Aug 13, 2020 at 2:03 PM Lakshmi Ramasubramanian
<nramas@linux.microsoft.com> wrote:
>
> On 8/13/20 10:58 AM, Stephen Smalley wrote:
> > On Thu, Aug 13, 2020 at 1:52 PM Lakshmi Ramasubramanian
> > <nramas@linux.microsoft.com> wrote:
> >>
> >> On 8/13/20 10:42 AM, Stephen Smalley wrote:
> >>
> >>>> diff --git a/security/selinux/measure.c b/security/selinux/measure.c
> >>>> new file mode 100644
> >>>> index 000000000000..f21b7de4e2ae
> >>>> --- /dev/null
> >>>> +++ b/security/selinux/measure.c
> >>>> @@ -0,0 +1,204 @@
> >>>> +static int selinux_hash_buffer(void *buf, size_t buf_len,
> >>>> +                   void **buf_hash, int *buf_hash_len)
> >>>> +{
> >>>> +    struct crypto_shash *tfm;
> >>>> +    struct shash_desc *desc = NULL;
> >>>> +    void *digest = NULL;
> >>>> +    int desc_size;
> >>>> +    int digest_size;
> >>>> +    int ret = 0;
> >>>> +
> >>>> +    tfm = crypto_alloc_shash("sha256", 0, 0);
> >>>> +    if (IS_ERR(tfm))
> >>>> +        return PTR_ERR(tfm);
> >>> Can we make the algorithm selectable via kernel parameter and/or writing
> >>> to a new selinuxfs node?
> >>
> >> I can add a kernel parameter to select this hash algorithm.
> >
> > Also can we provide a Kconfig option for the default value like IMA does?
> >
>
> Would we need both - Kconfig and kernel param?
>
> The other option is to provide an IMA function to return the current
> hash algorithm used for measurement. That way a consistent hash
> algorithm can be employed by both IMA and the callers. Would that be better?

This is why I preferred just passing the serialized policy buffer to
IMA and letting it handle the hashing.  But apparently that approach
wouldn't fly.  IMA appears to support both a Kconfig option for
selecting a default algorithm and a kernel parameter for overriding
it.  I assume the idea is that the distros can pick a reasonable
default and then the end users can override that if they have specific
requirements.  I'd want the same for SELinux.  If IMA is willing to
export its hash algorithm to external components, then I'm willing to
reuse that but not sure if that's a layering violation.

^ permalink raw reply

* Re: [PATCH v20 05/12] LSM: Infrastructure management of the superblock
From: Stephen Smalley @ 2020-08-13 18:39 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Stephen Smalley, Casey Schaufler, Kees Cook, John Johansen,
	linux-kernel, Al Viro, Andy Lutomirski, Anton Ivanov,
	Arnd Bergmann, James Morris, Jann Horn, Jeff Dike,
	Jonathan Corbet, Michael Kerrisk, Richard Weinberger,
	Serge E . Hallyn, Shuah Khan, Vincent Dagonneau, kernel-hardening,
	linux-api, linux-arch, linux-doc, Linux FS Devel, linux-kselftest,
	LSM List, x86
In-Reply-To: <03f522c0-414c-434b-a0d1-57c3b17fa67f@digikod.net>

On Thu, Aug 13, 2020 at 10:17 AM Mickaël Salaün <mic@digikod.net> wrote:
>
>
> On 12/08/2020 21:16, Stephen Smalley wrote:
> > On 8/2/20 5:58 PM, Mickaël Salaün wrote:
> >> From: Casey Schaufler <casey@schaufler-ca.com>
> >>
> >> Move management of the superblock->sb_security blob out
> >> of the individual security modules and into the security
> >> infrastructure. Instead of allocating the blobs from within
> >> the modules the modules tell the infrastructure how much
> >> space is required, and the space is allocated there.
> >>
> >> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
> >> Reviewed-by: Kees Cook <keescook@chromium.org>
> >> Reviewed-by: John Johansen <john.johansen@canonical.com>
> >> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
> >> Reviewed-by: Mickaël Salaün <mic@digikod.net>
> >> Link:
> >> https://lore.kernel.org/r/20190829232935.7099-2-casey@schaufler-ca.com
> >> ---
> >>
> >> Changes since v17:
> >> * Rebase the original LSM stacking patch from v5.3 to v5.7: I fixed some
> >>    diff conflicts caused by code moves and function renames in
> >>    selinux/include/objsec.h and selinux/hooks.c .  I checked that it
> >>    builds but I didn't test the changes for SELinux nor SMACK.
> >
> > You shouldn't retain Signed-off-by and Reviewed-by lines from an earlier
> > patch if you made non-trivial changes to it (even more so if you didn't
> > test them).
>
> I think I made trivial changes according to the original patch. But
> without reply from other people with Signed-off-by or Reviewed-by
> (Casey, Kees, John), I'll remove them. I guess you don't want your
> Reviewed-by to be kept, so I'll remove it, except if you want to review
> this patch (or the modified part).

At the very least your Reviewed-by line is wrong - yours should be
Signed-off-by because the patch went through you and you modified it.
I'll try to take a look as time permits but FYI you should this
address (already updated in MAINTAINERS) going forward.

^ permalink raw reply

* Re: [PATCH v20 08/12] landlock: Add syscall implementations
From: Mickaël Salaün @ 2020-08-13 20:29 UTC (permalink / raw)
  To: Arnd Bergmann, Michael Kerrisk
  Cc: linux-kernel, Al Viro, Andy Lutomirski, Anton Ivanov,
	Casey Schaufler, James Morris, Jann Horn, Jeff Dike,
	Jonathan Corbet, Kees Cook, Richard Weinberger, Serge E . Hallyn,
	Shuah Khan, Vincent Dagonneau, kernel-hardening, linux-api,
	linux-arch, linux-doc, linux-fsdevel, linux-kselftest,
	linux-security-module, x86
In-Reply-To: <20200802215903.91936-9-mic@digikod.net>

Arnd and Michael, what do you think about these new syscalls?


On 02/08/2020 23:58, Mickaël Salaün wrote:
> These 4 system calls are designed to be used by unprivileged processes
> to sandbox themselves:
> * landlock_get_features(2): Gets the supported features (required for
>   backward and forward compatibility, and best-effort security).
> * landlock_create_ruleset(2): Creates a ruleset and returns its file
>   descriptor.
> * landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a
>   ruleset, identified by the dedicated file descriptor.
> * landlock_enforce_ruleset(2): Enforces a ruleset on the current thread
>   and its future children (similar to seccomp).  This syscall has the
>   same usage restrictions as seccomp(2): the caller must have the
>   no_new_privs attribute set or have CAP_SYS_ADMIN in the current user
>   namespace.
> 
> All these syscalls have an options argument (not currently used) to
> enable extensibility.
> 
> Here are the motivations for these new syscalls:
> * A sandboxed process may not have access to file systems, including
>   /dev, /sys or /proc, but it should still be able to add more
>   restrictions to itself.
> * Neither prctl(2) nor seccomp(2) (which was used in a previous version)
>   fit well with the current definition of a Landlock security policy.
> * It is quite easy to whitelist this syscall with seccomp-bpf to enable
>   all processes to use it.  It is also easy to filter specific commands
>   or options to restrict a process to a subset of Landlock features.
> 
> All variable attributes are checked at build time to ensure that they
> don't contain holes and that they are aligned the same way for each
> architecture.  The struct landlock_attr_features contains __u32
> options_* fields which is enough to store 32-bits syscall arguments, and
> __u16 size_attr_* fields which is enough for the maximal struct size
> (i.e. page size) passed through the landlock syscall.  The other fields
> can have __u64 type for flags and bitfields, and __s32 type for file
> descriptors.
> 
> See the user and kernel documentation for more details (provided by a
> following commit): Documentation/security/landlock/
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: James Morris <jmorris@namei.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> ---
> 
> Changes since v19:
> * Replace the landlock(2) syscall with 4 syscalls (one for each
>   command): landlock_get_features(2), landlock_create_ruleset(2),
>   landlock_add_rule(2) and landlock_enforce_ruleset(2) (suggested by
>   Arnd Bergmann).
>   https://lore.kernel.org/lkml/56d15841-e2c1-2d58-59b8-3a6a09b23b4a@digikod.net/
> * Return EOPNOTSUPP (instead of ENOPKG) when Landlock is disabled.
> * Add two new fields to landlock_attr_features to fit with the new
>   syscalls: last_rule_type and last_target_type.  This enable to easily
>   identify which types are supported.
> * Pack landlock_attr_path_beneath struct because of the removed
>   ruleset_fd.
> * Update documentation and fix spelling.
> 
> Changes since v18:
> * Remove useless include.
> * Remove LLATTR_SIZE() which was only used to shorten lines. Cf. commit
>   bdc48fa11e46 ("checkpatch/coding-style: deprecate 80-column warning").
> 
> Changes since v17:
> * Synchronize syscall declaration.
> * Fix comment.
> 
> Changes since v16:
> * Add a size_attr_features field to struct landlock_attr_features for
>   self-introspection, and move the access_fs field to be more
>   consistent.
> * Replace __aligned_u64 types of attribute fields with __u16, __s32,
>   __u32 and __u64, and check at build time that these structures does
>   not contain hole and that they are aligned the same way (8-bits) on
>   all architectures.  This shrinks the size of the userspace ABI, which
>   may be appreciated especially for struct landlock_attr_features which
>   could grow a lot in the future.  For instance, struct
>   landlock_attr_features shrinks from 72 bytes to 32 bytes.  This change
>   also enables to remove 64-bits to 32-bits conversion checks.
> * Switch syscall attribute pointer and size arguments to follow similar
>   syscall argument order (e.g. bpf, clone3, openat2).
> * Set LANDLOCK_OPT_* types to 32-bits.
> * Allow enforcement of empty ruleset, which enables deny-all policies.
> * Fix documentation inconsistency.
> 
> Changes since v15:
> * Do not add file descriptors referring to internal filesystems (e.g.
>   nsfs) in a ruleset.
> * Replace is_user_mountable() with in-place clean checks.
> * Replace EBADR with EBADFD in get_ruleset_from_fd() and
>   get_path_from_fd().
> * Remove ruleset's show_fdinfo() for now.
> 
> Changes since v14:
> * Remove the security_file_open() check in get_path_from_fd(): an
>   opened FD should not be restricted here, and even less with this hook.
>   As a result, it is now allowed to add a path to a ruleset even if the
>   access to this path is not allowed (without O_PATH). This doesn't
>   change the fact that enforcing a ruleset can't grant any right, only
>   remove some rights.  The new layer levels add more consistent
>   restrictions.
> * Check minimal landlock_attr_* size/content. This fix the case when
>   no data was provided and e.g., FD 0 was interpreted as ruleset_fd.
>   Now this leads to a returned -EINVAL.
> * Fix credential double-free error case.
> * Complete struct landlock_attr_size with size_attr_enforce.
> * Fix undefined reference to syscall when Landlock is not selected.
> * Remove f.file->f_path.mnt check (suggested by Al Viro).
> * Add build-time checks.
> * Move ABI checks from fs.c .
> * Constify variables.
> * Fix spelling.
> * Add comments.
> 
> Changes since v13:
> * New implementation, replacing the dependency on seccomp(2) and bpf(2).
> ---
>  include/linux/syscalls.h      |   8 +
>  include/uapi/linux/landlock.h | 166 ++++++++++
>  kernel/sys_ni.c               |   6 +
>  security/landlock/Makefile    |   2 +-
>  security/landlock/syscall.c   | 554 ++++++++++++++++++++++++++++++++++
>  5 files changed, 735 insertions(+), 1 deletion(-)
>  create mode 100644 security/landlock/syscall.c
> 
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index b951a87da987..8373fb28eb44 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -1005,6 +1005,14 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
>  				       siginfo_t __user *info,
>  				       unsigned int flags);
>  asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags);
> +asmlinkage long sys_landlock_get_features(struct landlock_attr_features __user *features_ptr,
> +		size_t features_size, __u32 options);
> +asmlinkage long sys_landlock_create_ruleset(const struct landlock_attr_ruleset __user *ruleset_ptr,
> +		size_t ruleset_size, __u32 options);
> +asmlinkage long sys_landlock_add_rule(int ruleset_fd, enum landlock_rule_type rule_type,
> +		const void __user *rule_ptr, size_t rule_size, __u32 options);
> +asmlinkage long sys_landlock_enforce_ruleset(int ruleset_fd, enum landlock_target_type target_type,
> +		int target_fd, __u32 options);
>  
>  /*
>   * Architecture-specific system calls
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index 5141185e6487..eb2a5e0d61a4 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -9,6 +9,172 @@
>  #ifndef _UAPI__LINUX_LANDLOCK_H__
>  #define _UAPI__LINUX_LANDLOCK_H__
>  
> +#include <linux/types.h>
> +
> +#if 0
> +/**
> + * DOC: options_intro
> + *
> + * These options may be used as second argument of sys_landlock().  Each
> + * command have a dedicated set of options, represented as bitmasks.  For two
> + * different commands, their options may overlap.  Each command have at least
> + * one option defining the used attribute type.  This also enables to always
> + * have a usable &struct landlock_attr_features (i.e. filled with bits).
> + */
> +#endif
> +
> +/**
> + * enum landlock_rule_type - Landlock rule type
> + *
> + * Argument of sys_landlock_add_rule().
> + */
> +enum landlock_rule_type {
> +	/**
> +	 * @LANDLOCK_RULE_PATH_BENEATH: Type of a &struct
> +	 * landlock_attr_path_beneath .
> +	 */
> +	LANDLOCK_RULE_PATH_BENEATH = 1,
> +};
> +
> +/**
> + * enum landlock_target_type - Landlock target type
> + *
> + * Argument of sys_landlock_enforce_ruleset().
> + */
> +enum landlock_target_type {
> +	/**
> +	 * @LANDLOCK_TARGET_CURRENT_THREAD: Enforce a ruleset on the thread
> +	 * asking for (i.e. seccomp-like).
> +	 */
> +	LANDLOCK_TARGET_CURRENT_THREAD = 1,
> +};
> +
> +/**
> + * struct landlock_attr_features - Receives the supported features
> + *
> + * This struct should be allocated by user space but it will be filled by the
> + * kernel to indicate the subset of Landlock features effectively handled by
> + * the running kernel.  This enables backward compatibility for applications
> + * which are developed on a newer kernel than the one running the application.
> + * This helps avoid hard errors that may entirely disable the use of Landlock
> + * features because some of them may not be supported.  Indeed, because
> + * Landlock is a security feature, even if the kernel doesn't support all the
> + * requested features, user space applications should still use the subset
> + * which is supported by the running kernel.  Indeed, a partial security policy
> + * can still improve the security of the application and better protect the
> + * user (i.e. best-effort approach).  The %LANDLOCK_CMD_GET_FEATURES command
> + * and &struct landlock_attr_features are future-proof because the future
> + * unknown fields requested by user space (i.e. a larger &struct
> + * landlock_attr_features) can still be filled with zeros.
> + *
> + * The Landlock commands will fail if an unsupported option or access is
> + * requested.  By firstly requesting the supported options and accesses, it is
> + * quite easy for the developer to binary AND these returned bitmasks with the
> + * used options and accesses from the attribute structs (e.g. &struct
> + * landlock_attr_ruleset), and even infer the supported Landlock commands.
> + * Indeed, because each command must support at least one option, the options_*
> + * fields are always filled if the related commands are supported.  The
> + * supported attributes are also discoverable thanks to the size_* fields.  All
> + * this data enable to create applications doing their best to sandbox
> + * themselves regardless of the running kernel.
> + */
> +struct landlock_attr_features {
> +	/**
> +	 * @options_get_features: Options supported by
> +	 * sys_landlock_get_features().
> +	 */
> +	__u32 options_get_features;
> +	/**
> +	 * @options_create_ruleset: Options supported by
> +	 * sys_landlock_create_ruleset().
> +	 */
> +	__u32 options_create_ruleset;
> +	/**
> +	 * @options_add_rule: Options supported by sys_landlock_add_rule().
> +	 */
> +	__u32 options_add_rule;
> +	/**
> +	 * @options_enforce_ruleset: Options supported by
> +	 * sys_landlock_enforce_ruleset().
> +	 */
> +	__u32 options_enforce_ruleset;
> +	/**
> +	 * @access_fs: Subset of file system access supported by the running
> +	 * kernel, used in &landlock_attr_ruleset.handled_access_fs and
> +	 * &landlock_attr_path_beneath.allowed_access .  Cf. `Filesystem
> +	 * flags`_.
> +	 */
> +	__u64 access_fs;
> +	/**
> +	 * @size_attr_features: Size of the &struct landlock_attr_features
> +	 * (current struct) as known by the kernel (i.e. ``sizeof(struct
> +	 * landlock_attr_features)``).
> +	 */
> +	__u16 size_attr_features;
> +	/**
> +	 * @size_attr_ruleset: Size of the &struct landlock_attr_ruleset as
> +	 * known by the kernel (i.e. ``sizeof(struct
> +	 * landlock_attr_ruleset)``).
> +	 */
> +	__u16 size_attr_ruleset;
> +	/**
> +	 * @size_attr_path_beneath: Size of the &struct
> +	 * landlock_attr_path_beneath as known by the kernel (i.e.
> +	 * ``sizeof(struct landlock_attr_path_beneath)``).
> +	 */
> +	__u16 size_attr_path_beneath;
> +	/**
> +	 * @last_rule_type: Indicate the last entry of &enum
> +	 * landlock_rule_type.
> +	 */
> +	__u8 last_rule_type;
> +	/**
> +	 * @last_target_type: Indicate the last entry of &enum
> +	 * landlock_target_type.
> +	 */
> +	__u8 last_target_type;
> +};
> +
> +/**
> + * struct landlock_attr_ruleset- Defines a new ruleset
> + *
> + * Used as first attribute for the %LANDLOCK_CMD_CREATE_RULESET command and
> + * with the %LANDLOCK_OPT_CREATE_RULESET option.
> + */
> +struct landlock_attr_ruleset {
> +	/**
> +	 * @handled_access_fs: Bitmask of actions (cf. `Filesystem flags`_)
> +	 * that is handled by this ruleset and should then be forbidden if no
> +	 * rule explicitly allow them.  This is needed for backward
> +	 * compatibility reasons.  The user space code should check the
> +	 * effectively supported actions thanks to %LANDLOCK_CMD_GET_FEATURES
> +	 * and &struct landlock_attr_features, and then adjust the arguments of
> +	 * the next calls to sys_landlock() accordingly.
> +	 */
> +	__u64 handled_access_fs;
> +};
> +
> +/**
> + * struct landlock_attr_path_beneath - Defines a path hierarchy
> + */
> +struct landlock_attr_path_beneath {
> +	/**
> +	 * @allowed_access: Bitmask of allowed actions for this file hierarchy
> +	 * (cf. `Filesystem flags`_).
> +	 */
> +	__u64 allowed_access;
> +	/**
> +	 * @parent_fd: File descriptor, open with ``O_PATH``, which identify
> +	 * the parent directory of a file hierarchy, or just a file.
> +	 */
> +	__s32 parent_fd;
> +	/*
> +	 * This struct is packed to enable to append future members without
> +	 * requiring to have dummy reserved members.
> +	 * Cf. security/landlock/syscall.c:build_check_abi()
> +	 */
> +} __attribute__((packed));
> +
>  /**
>   * DOC: fs_access
>   *
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 3b69a560a7ac..d0f3cf333d86 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -264,6 +264,12 @@ COND_SYSCALL(request_key);
>  COND_SYSCALL(keyctl);
>  COND_SYSCALL_COMPAT(keyctl);
>  
> +/* security/landlock/syscall.c */
> +COND_SYSCALL(landlock_get_features);
> +COND_SYSCALL(landlock_create_ruleset);
> +COND_SYSCALL(landlock_add_rule);
> +COND_SYSCALL(landlock_enforce_ruleset);
> +
>  /* arch/example/kernel/sys_example.c */
>  
>  /* mm/fadvise.c */
> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
> index 92e3d80ab8ed..4388494779ec 100644
> --- a/security/landlock/Makefile
> +++ b/security/landlock/Makefile
> @@ -1,4 +1,4 @@
>  obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
>  
> -landlock-y := setup.o object.o ruleset.o \
> +landlock-y := setup.o syscall.o object.o ruleset.o \
>  	cred.o ptrace.o fs.o
> diff --git a/security/landlock/syscall.c b/security/landlock/syscall.c
> new file mode 100644
> index 000000000000..7bf4dc175dee
> --- /dev/null
> +++ b/security/landlock/syscall.c
> @@ -0,0 +1,554 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Landlock LSM - System call and user space interfaces
> + *
> + * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
> + * Copyright © 2018-2020 ANSSI
> + */
> +
> +#include <asm/current.h>
> +#include <linux/anon_inodes.h>
> +#include <linux/build_bug.h>
> +#include <linux/capability.h>
> +#include <linux/compiler_types.h>
> +#include <linux/dcache.h>
> +#include <linux/err.h>
> +#include <linux/errno.h>
> +#include <linux/fs.h>
> +#include <linux/limits.h>
> +#include <linux/mount.h>
> +#include <linux/path.h>
> +#include <linux/sched.h>
> +#include <linux/security.h>
> +#include <linux/stddef.h>
> +#include <linux/syscalls.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include <uapi/linux/landlock.h>
> +
> +#include "cred.h"
> +#include "fs.h"
> +#include "ruleset.h"
> +#include "setup.h"
> +
> +/**
> + * copy_struct_if_any_from_user - Safe future-proof argument copying
> + *
> + * Extend copy_struct_from_user() to handle NULL @src, which allows for future
> + * use of @src even if it is not used right now.
> + *
> + * @dst: Kernel space pointer or NULL.
> + * @ksize: Actual size of the data pointed to by @dst.
> + * @ksize_min: Minimal required size to be copied.
> + * @src: User space pointer or NULL.
> + * @usize: (Alleged) size of the data pointed to by @src.
> + */
> +static int copy_struct_if_any_from_user(void *const dst, const size_t ksize,
> +		const size_t ksize_min, const void __user *const src,
> +		const size_t usize)
> +{
> +	int ret;
> +
> +	/* Checks kernel buffer size inconsistencies. */
> +	if (dst) {
> +		if (WARN_ON_ONCE(ksize == 0))
> +			return -EFAULT;
> +	} else {
> +		if (WARN_ON_ONCE(ksize != 0))
> +			return -EFAULT;
> +	}
> +
> +	/* Checks minimal size. */
> +	if (WARN_ON_ONCE(ksize < ksize_min))
> +		return -EFAULT;
> +	if (usize < ksize_min)
> +		return -EINVAL;
> +
> +	/* Handles empty user buffer. */
> +	if (!src) {
> +		if (usize != 0)
> +			return -EFAULT;
> +		if (dst)
> +			memset(dst, 0, ksize);
> +		return 0;
> +	}
> +
> +	/* Checks user buffer size inconsistency and limit. */
> +	if (usize == 0)
> +		return -ENODATA;
> +	if (usize > PAGE_SIZE)
> +		return -E2BIG;
> +
> +	/* Copies user buffer and fills with zeros. */
> +	if (dst)
> +		return copy_struct_from_user(dst, ksize, src, usize);
> +
> +	/* Checks unknown user data. */
> +	ret = check_zeroed_user(src, usize);
> +	if (ret <= 0)
> +		return ret ?: -E2BIG;
> +	return 0;
> +}
> +
> +/* Features */
> +
> +/*
> + * This function only contains arithmetic operations with constants, leading to
> + * BUILD_BUG_ON().  The related code is evaluated and checked at build time,
> + * but it is then ignored thanks to compiler optimizations.
> + */
> +static void build_check_abi(void)
> +{
> +	size_t size_features, size_ruleset, size_path_beneath;
> +
> +	/*
> +	 * For each user space ABI structures, first checks that there is no
> +	 * hole in them, then checks that all architectures have the same
> +	 * struct size.
> +	 */
> +	size_features = sizeof_field(struct landlock_attr_features, options_get_features);
> +	size_features += sizeof_field(struct landlock_attr_features, options_create_ruleset);
> +	size_features += sizeof_field(struct landlock_attr_features, options_add_rule);
> +	size_features += sizeof_field(struct landlock_attr_features, options_enforce_ruleset);
> +	size_features += sizeof_field(struct landlock_attr_features, access_fs);
> +	size_features += sizeof_field(struct landlock_attr_features, size_attr_features);
> +	size_features += sizeof_field(struct landlock_attr_features, size_attr_ruleset);
> +	size_features += sizeof_field(struct landlock_attr_features, size_attr_path_beneath);
> +	size_features += sizeof_field(struct landlock_attr_features, last_rule_type);
> +	size_features += sizeof_field(struct landlock_attr_features, last_target_type);
> +	BUILD_BUG_ON(sizeof(struct landlock_attr_features) != size_features);
> +	BUILD_BUG_ON(sizeof(struct landlock_attr_features) != 32);
> +
> +	size_ruleset = sizeof_field(struct landlock_attr_ruleset, handled_access_fs);
> +	BUILD_BUG_ON(sizeof(struct landlock_attr_ruleset) != size_ruleset);
> +	BUILD_BUG_ON(sizeof(struct landlock_attr_ruleset) != 8);
> +
> +	size_path_beneath = sizeof_field(struct landlock_attr_path_beneath, allowed_access);
> +	size_path_beneath += sizeof_field(struct landlock_attr_path_beneath, parent_fd);
> +	BUILD_BUG_ON(sizeof(struct landlock_attr_path_beneath) != size_path_beneath);
> +	BUILD_BUG_ON(sizeof(struct landlock_attr_path_beneath) != 12);
> +}
> +
> +/**
> + * sys_landlock_get_features - Identify the supported Landlock features
> + *
> + * @features_ptr: Pointer to a &struct landlock_attr_features to be filled by
> + *		  the supported features.
> + * @features_size: Size of the pointed &struct landlock_attr_features (needed
> + *		   for backward and forward compatibility).
> + * @options: Must be 0.
> + *
> + * This system call enables to ask the kernel for supported Landlock features.
> + * This is important to build user space code compatible with older and newer
> + * kernels.
> + *
> + * Possible returned errors are:
> + *
> + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
> + * - EINVAL: @options is not 0;
> + * - ENODATA, E2BIG or EFAULT: @features_ptr or @feature_size inconsistencies.
> + */
> +SYSCALL_DEFINE3(landlock_get_features,
> +		struct landlock_attr_features __user *const, features_ptr,
> +		const size_t, features_size, const __u32, options)
> +{
> +	size_t data_size, fill_size;
> +	const struct landlock_attr_features supported = {
> +		.options_get_features = 0,
> +		.options_create_ruleset = 0,
> +		.options_add_rule = 0,
> +		.options_enforce_ruleset = 0,
> +		.access_fs = _LANDLOCK_ACCESS_FS_MASK,
> +		.size_attr_features = sizeof(struct landlock_attr_features),
> +		.size_attr_ruleset = sizeof(struct landlock_attr_ruleset),
> +		.size_attr_path_beneath = sizeof(struct landlock_attr_path_beneath),
> +		.last_rule_type = LANDLOCK_RULE_PATH_BENEATH,
> +		.last_target_type = LANDLOCK_TARGET_CURRENT_THREAD,
> +	};
> +
> +	BUILD_BUG_ON(!__same_type(supported.access_fs,
> +		((struct landlock_attr_ruleset *)NULL)->handled_access_fs));
> +	BUILD_BUG_ON(!__same_type(supported.access_fs,
> +		((struct landlock_attr_path_beneath *)NULL)->allowed_access));
> +	build_check_abi();
> +
> +	/*
> +	 * Enables user space to identify if Landlock is disabled, thanks to a
> +	 * specific error code.
> +	 */
> +	if (!landlock_initialized)
> +		return -EOPNOTSUPP;
> +
> +	/* No option for now. */
> +	if (options)
> +		return -EINVAL;
> +
> +	/* Checks argument consistency. */
> +	if (features_size == 0)
> +		return -ENODATA;
> +	if (features_size > PAGE_SIZE)
> +		return -E2BIG;
> +
> +	/* Copy a subset of features to user space. */
> +	data_size = min(sizeof(supported), features_size);
> +	if (copy_to_user(features_ptr, &supported, data_size))
> +		return -EFAULT;
> +
> +	/* Fills with zeros. */
> +	fill_size = features_size - data_size;
> +	if (fill_size > 0 && clear_user((void __user *)features_ptr + data_size, fill_size))
> +		return -EFAULT;
> +	return 0;
> +}
> +
> +/* Ruleset handling */
> +
> +static int fop_ruleset_release(struct inode *const inode,
> +		struct file *const filp)
> +{
> +	struct landlock_ruleset *ruleset = filp->private_data;
> +
> +	landlock_put_ruleset(ruleset);
> +	return 0;
> +}
> +
> +static ssize_t fop_dummy_read(struct file *const filp, char __user *const buf,
> +		const size_t size, loff_t *const ppos)
> +{
> +	/* Dummy handler to enable FMODE_CAN_READ. */
> +	return -EINVAL;
> +}
> +
> +static ssize_t fop_dummy_write(struct file *const filp,
> +		const char __user *const buf, const size_t size,
> +		loff_t *const ppos)
> +{
> +	/* Dummy handler to enable FMODE_CAN_WRITE. */
> +	return -EINVAL;
> +}
> +
> +/*
> + * A ruleset file descriptor enables to build a ruleset by adding (i.e.
> + * writing) rule after rule, without relying on the task's context.  This
> + * reentrant design is also used in a read way to enforce the ruleset on the
> + * current task.
> + */
> +static const struct file_operations ruleset_fops = {
> +	.release = fop_ruleset_release,
> +	.read = fop_dummy_read,
> +	.write = fop_dummy_write,
> +};
> +
> +/**
> + * sys_landlock_create_ruleset - Create a new ruleset
> + *
> + * @ruleset_ptr: Pointer to a &struct landlock_attr_ruleset identifying the
> + *		 scope of the new ruleset.
> + * @ruleset_size: Size of the pointed &struct landlock_attr_ruleset (needed for
> + *		  backward and forward compatibility).
> + * @options: Must be 0.
> + *
> + * This system call enables to create a new Landlock ruleset, and returns the
> + * related file descriptor on success.
> + *
> + * Possible returned errors are:
> + *
> + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
> + * - EINVAL: @options is not 0, or unknown access, or too small @ruleset_size;
> + * - ENODATA, E2BIG or EFAULT: @ruleset_ptr or @ruleset_size inconsistencies;
> + * - ENOMSG: empty &landlock_attr_ruleset.handled_access_fs.
> + */
> +SYSCALL_DEFINE3(landlock_create_ruleset,
> +		const struct landlock_attr_ruleset __user *const, ruleset_ptr,
> +		const size_t, ruleset_size, const __u32, options)
> +{
> +	struct landlock_attr_ruleset attr_ruleset;
> +	struct landlock_ruleset *ruleset;
> +	int err, ruleset_fd;
> +
> +	if (!landlock_initialized)
> +		return -EOPNOTSUPP;
> +
> +	/* No option for now. */
> +	if (options)
> +		return -EINVAL;
> +
> +	/* Copies raw user space buffer. */
> +	err = copy_struct_if_any_from_user(&attr_ruleset, sizeof(attr_ruleset),
> +			offsetofend(typeof(attr_ruleset), handled_access_fs),
> +			ruleset_ptr, ruleset_size);
> +	if (err)
> +		return err;
> +
> +	/* Checks content (and 32-bits cast). */
> +	if ((attr_ruleset.handled_access_fs | _LANDLOCK_ACCESS_FS_MASK) !=
> +			_LANDLOCK_ACCESS_FS_MASK)
> +		return -EINVAL;
> +
> +	/* Checks arguments and transforms to kernel struct. */
> +	ruleset = landlock_create_ruleset(attr_ruleset.handled_access_fs);
> +	if (IS_ERR(ruleset))
> +		return PTR_ERR(ruleset);
> +
> +	/* Creates anonymous FD referring to the ruleset. */
> +	ruleset_fd = anon_inode_getfd("landlock-ruleset", &ruleset_fops,
> +			ruleset, O_RDWR | O_CLOEXEC);
> +	if (ruleset_fd < 0)
> +		landlock_put_ruleset(ruleset);
> +	return ruleset_fd;
> +}
> +
> +/*
> + * Returns an owned ruleset from a FD. It is thus needed to call
> + * landlock_put_ruleset() on the return value.
> + */
> +static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
> +		const fmode_t mode)
> +{
> +	struct fd ruleset_f;
> +	struct landlock_ruleset *ruleset;
> +	int err;
> +
> +	ruleset_f = fdget(fd);
> +	if (!ruleset_f.file)
> +		return ERR_PTR(-EBADF);
> +
> +	/* Checks FD type and access right. */
> +	err = 0;
> +	if (ruleset_f.file->f_op != &ruleset_fops)
> +		err = -EBADFD;
> +	else if (!(ruleset_f.file->f_mode & mode))
> +		err = -EPERM;
> +	if (!err) {
> +		ruleset = ruleset_f.file->private_data;
> +		landlock_get_ruleset(ruleset);
> +	}
> +	fdput(ruleset_f);
> +	return err ? ERR_PTR(err) : ruleset;
> +}
> +
> +/* Path handling */
> +
> +/*
> + * @path: Must call put_path(@path) after the call if it succeeded.
> + */
> +static int get_path_from_fd(const s32 fd, struct path *const path)
> +{
> +	struct fd f;
> +	int err = 0;
> +
> +	BUILD_BUG_ON(!__same_type(fd,
> +		((struct landlock_attr_path_beneath *)NULL)->parent_fd));
> +
> +	/* Handles O_PATH. */
> +	f = fdget_raw(fd);
> +	if (!f.file)
> +		return -EBADF;
> +	/*
> +	 * Only allows O_PATH file descriptor: enables to restrict ambient
> +	 * filesystem access without requiring to open and risk leaking or
> +	 * misusing a file descriptor.  Forbid internal filesystems (e.g.
> +	 * nsfs), including pseudo filesystems that will never be mountable
> +	 * (e.g. sockfs, pipefs).
> +	 */
> +	if (!(f.file->f_mode & FMODE_PATH) ||
> +			(f.file->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
> +			(f.file->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
> +			d_is_negative(f.file->f_path.dentry) ||
> +			IS_PRIVATE(d_backing_inode(f.file->f_path.dentry))) {
> +		err = -EBADFD;
> +		goto out_fdput;
> +	}
> +	path->mnt = f.file->f_path.mnt;
> +	path->dentry = f.file->f_path.dentry;
> +	path_get(path);
> +
> +out_fdput:
> +	fdput(f);
> +	return err;
> +}
> +
> +/**
> + * sys_landlock_add_rule - Add a new rule to a ruleset
> + *
> + * @ruleset_fd: File descriptor tied to the ruleset which should be extended
> + *		with the new rule.
> + * @rule_type: Identify the structure type pointed to by @rule_ptr.
> + * @rule_ptr: Pointer to a rule (the currently only supported rule is &struct
> + *	      landlock_attr_path_beneath).
> + * @rule_size: Size of the struct pointed to by @rule_ptr.
> + * @options: Must be 0.
> + *
> + * This system call enables to define a new rule and add it to an existing
> + * ruleset.
> + *
> + * Possible returned errors are:
> + *
> + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
> + * - EINVAL: @options is not 0, or inconsistent access in the rule (i.e.
> + *   &landlock_attr_path_beneath.allowed_access is not a subset of the rule's
> + *   accesses), or too small @rule_size (according to the underlying rule
> + *   type);
> + * - EBADF: @ruleset_fd is not a file descriptor for the current thread;
> + * - EBADFD: @ruleset_fd is not a ruleset file descriptor;
> + * - EPERM: @ruleset_fd has no write access to the underlying ruleset;
> + * - ENODATA, E2BIG or EFAULT: @rule_ptr or @rule_size inconsistencies;
> + */
> +SYSCALL_DEFINE5(landlock_add_rule,
> +		const int, ruleset_fd, const enum landlock_rule_type, rule_type,
> +		const void __user *const, rule_ptr, const size_t, rule_size,
> +		const __u32, options)
> +{
> +	struct landlock_attr_path_beneath attr_path_beneath;
> +	struct path path;
> +	struct landlock_ruleset *ruleset;
> +	int err;
> +
> +	if (!landlock_initialized)
> +		return -EOPNOTSUPP;
> +
> +	/* No option for now. */
> +	if (options)
> +		return -EINVAL;
> +
> +	if (rule_type != LANDLOCK_RULE_PATH_BENEATH)
> +		return -EINVAL;
> +
> +	/* Copies raw user space buffer. */
> +	err = copy_struct_if_any_from_user(&attr_path_beneath,
> +			sizeof(attr_path_beneath),
> +			offsetofend(typeof(attr_path_beneath), allowed_access),
> +			rule_ptr, rule_size);
> +	if (err)
> +		return err;
> +
> +	/* Gets and checks the ruleset. */
> +	ruleset = get_ruleset_from_fd(ruleset_fd, FMODE_CAN_WRITE);
> +	if (IS_ERR(ruleset))
> +		return PTR_ERR(ruleset);
> +
> +	/*
> +	 * Checks that allowed_access matches the @ruleset constraints
> +	 * (ruleset->fs_access_mask is automatically upgraded to 64-bits).
> +	 * Allows empty allowed_access i.e., deny @ruleset->fs_access_mask .
> +	 */
> +	if ((attr_path_beneath.allowed_access | ruleset->fs_access_mask) !=
> +			ruleset->fs_access_mask) {
> +		err = -EINVAL;
> +		goto out_put_ruleset;
> +	}
> +
> +	/* Gets and checks the new rule. */
> +	err = get_path_from_fd(attr_path_beneath.parent_fd, &path);
> +	if (err)
> +		goto out_put_ruleset;
> +
> +	/* Imports the new rule. */
> +	err = landlock_append_fs_rule(ruleset, &path,
> +			attr_path_beneath.allowed_access);
> +	path_put(&path);
> +
> +out_put_ruleset:
> +	landlock_put_ruleset(ruleset);
> +	return err;
> +}
> +
> +/* Enforcement */
> +
> +/**
> + * sys_landlock_enforce_ruleset - Enforce a ruleset
> + *
> + * @ruleset_fd: File descriptor tied to the ruleset to merge with the target.
> + * @target_type: Identify which type of target to enforce the ruleset on,
> + *		 currently only the current thread is supported (i.e.
> + *		 seccomp-like).
> + * @target_fd: Must be -1.
> + * @options: Must be 0.
> + *
> + * This system call enables to enforce a Landlock ruleset on the current
> + * thread.  Enforcing a ruleset requires that the task has CAP_SYS_ADMIN in its
> + * namespace or be running with no_new_privs.  This avoids scenarios where
> + * unprivileged tasks can affect the behavior of privileged children.
> + *
> + * Possible returned errors are:
> + *
> + * - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
> + * - EINVAL: @options is not 0, or @target_type is not
> + *   %LANDLOCK_TARGET_CURRENT_THREAD, or @target_fd is not -1;
> + * - EBADF: @ruleset_fd is not a file descriptor for the current thread;
> + * - EBADFD: @ruleset_fd is not a ruleset file descriptor;
> + * - EPERM: @ruleset_fd has no read access to the underlying ruleset, or the
> + *   current thread is not running with no_new_privs (or doesn't have
> + *   CAP_SYS_ADMIN in its namespace).
> + */
> +SYSCALL_DEFINE4(landlock_enforce_ruleset,
> +		const int, ruleset_fd, const enum landlock_target_type, target_type,
> +		const int, target_fd, const __u32, options)
> +{
> +	struct landlock_ruleset *new_dom, *ruleset;
> +	struct cred *new_cred;
> +	struct landlock_cred_security *new_llcred;
> +	int err;
> +
> +	if (!landlock_initialized)
> +		return -EOPNOTSUPP;
> +
> +	/* No option for now. */
> +	if (options)
> +		return -EINVAL;
> +
> +	/* Only target the current thread for now. */
> +	if (target_type != LANDLOCK_TARGET_CURRENT_THREAD)
> +		return -EINVAL;
> +	if (target_fd != -1)
> +		return -EINVAL;
> +
> +	/*
> +	 * Similar checks as for seccomp(2), except that an -EPERM may be
> +	 * returned.
> +	 */
> +	if (!task_no_new_privs(current)) {
> +		err = security_capable(current_cred(), current_user_ns(),
> +				CAP_SYS_ADMIN, CAP_OPT_NOAUDIT);
> +		if (err)
> +			return err;
> +	}
> +
> +	/* Gets and checks the ruleset. */
> +	ruleset = get_ruleset_from_fd(ruleset_fd, FMODE_CAN_READ);
> +	if (IS_ERR(ruleset))
> +		return PTR_ERR(ruleset);
> +
> +	/* Prepares new credentials. */
> +	new_cred = prepare_creds();
> +	if (!new_cred) {
> +		err = -ENOMEM;
> +		goto out_put_ruleset;
> +	}
> +	new_llcred = landlock_cred(new_cred);
> +
> +	/*
> +	 * There is no possible race condition while copying and manipulating
> +	 * the current credentials because they are dedicated per thread.
> +	 */
> +	new_dom = landlock_merge_ruleset(new_llcred->domain, ruleset);
> +	if (IS_ERR(new_dom)) {
> +		err = PTR_ERR(new_dom);
> +		goto out_put_creds;
> +	}
> +
> +	/* Replaces the old (prepared) domain. */
> +	landlock_put_ruleset(new_llcred->domain);
> +	new_llcred->domain = new_dom;
> +
> +	landlock_put_ruleset(ruleset);
> +	return commit_creds(new_cred);
> +
> +out_put_creds:
> +	abort_creds(new_cred);
> +	return err;
> +
> +out_put_ruleset:
> +	landlock_put_ruleset(ruleset);
> +	return err;
> +}
> 

^ permalink raw reply

* Re: [PATCH 0/3] Smack: Use the netlbl incoming cache
From: Casey Schaufler @ 2020-08-14  0:32 UTC (permalink / raw)
  To: Paul Moore; +Cc: SMACK-discuss, linux-security-module, Casey Schaufler
In-Reply-To: <fba5a257-599f-b2e5-d4bf-57269fc7734b@schaufler-ca.com>

On 8/13/2020 9:36 AM, Casey Schaufler wrote:
> On 8/11/2020 7:10 PM, Paul Moore wrote:
>> On Tue, Aug 11, 2020 at 8:39 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>> Update the Smack security module to use the Netlabel cache
>>> mechanism to speed the processing of incoming labeled packets.
>>> There is some refactoring of the existing code that makes it
>>> simpler, and reduces duplication. The outbound packet labeling
>>> is also optimized to track the labeling state of the socket.
>>> Prior to this the socket label was redundantly set on each
>>> packet send.
>>>
>>> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
>>> ---
>>>  security/smack/smack.h        |  19 ++--
>>>  security/smack/smack_access.c |  55 ++++++----
>>>  security/smack/smack_lsm.c    | 245 ++++++++++++++++++++++++------------------
>>>  security/smack/smackfs.c      |  23 ++--
>>>  4 files changed, 193 insertions(+), 149 deletions(-)
>> FWIW, I gave this a cursory look just now and the NetLabel usage
>> seemed reasonable.  Out of curiosity, have you done any before/after
>> performance tests?
> It's early in the benchmark process, but TCP looks to be about 6% better.
> UDP numbers should be better. I'm not expecting the level of improvement
> SELinux saw because the label mapping from CIPSO isn't as sophisticated
> for Smack as it is for SELinux.

UDP looks like a 12% improvement, which I had expected.
On the whole, worth the effort.


^ permalink raw reply

* Re: [PATCH 0/3] Smack: Use the netlbl incoming cache
From: Paul Moore @ 2020-08-14  2:03 UTC (permalink / raw)
  To: Casey Schaufler; +Cc: SMACK-discuss, linux-security-module
In-Reply-To: <b1fcf225-83f4-dbed-8d70-f2e0194e70db@schaufler-ca.com>

On August 13, 2020 8:32:16 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 8/13/2020 9:36 AM, Casey Schaufler wrote:
>> On 8/11/2020 7:10 PM, Paul Moore wrote:
>>> On Tue, Aug 11, 2020 at 8:39 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>> Update the Smack security module to use the Netlabel cache
>>>> mechanism to speed the processing of incoming labeled packets.
>>>> There is some refactoring of the existing code that makes it
>>>> simpler, and reduces duplication. The outbound packet labeling
>>>> is also optimized to track the labeling state of the socket.
>>>> Prior to this the socket label was redundantly set on each
>>>> packet send.
>>>>
>>>> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
>>>> ---
>>>> security/smack/smack.h        |  19 ++--
>>>> security/smack/smack_access.c |  55 ++++++----
>>>> security/smack/smack_lsm.c    | 245 ++++++++++++++++++++++++------------------
>>>> security/smack/smackfs.c      |  23 ++--
>>>> 4 files changed, 193 insertions(+), 149 deletions(-)
>>> FWIW, I gave this a cursory look just now and the NetLabel usage
>>> seemed reasonable.  Out of curiosity, have you done any before/after
>>> performance tests?
>> It's early in the benchmark process, but TCP looks to be about 6% better.
>> UDP numbers should be better. I'm not expecting the level of improvement
>> SELinux saw because the label mapping from CIPSO isn't as sophisticated
>> for Smack as it is for SELinux.
>
> UDP looks like a 12% improvement, which I had expected.
> On the whole, worth the effort.

Great, thanks for the follow-up.

--
paul moore
www.paul-moore.com




^ permalink raw reply

* Re: file metadata via fs API
From: Lennart Poettering @ 2020-08-14  7:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Whitehouse, David Howells, Miklos Szeredi, linux-fsdevel,
	Al Viro, Karel Zak, Jeff Layton, Miklos Szeredi, Nicolas Dichtel,
	Christian Brauner, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAHk-=wgz5H-xYG4bOrHaEtY7rvFA1_6+mTSpjrgK8OsNbfF+Pw@mail.gmail.com>

On Mi, 12.08.20 12:50, Linus Torvalds (torvalds@linux-foundation.org) wrote:

> On Wed, Aug 12, 2020 at 12:34 PM Steven Whitehouse <swhiteho@redhat.com> wrote:
> >
> > The point of this is to give us the ability to monitor mounts from
> > userspace.
>
> We haven't had that before, I don't see why it's suddenly such a big deal.
>
> The notification side I understand. Polling /proc files is not the answer.
>
> But the whole "let's design this crazy subsystem for it" seems way
> overkill. I don't see anybody caring that deeply.
>
> It really smells like "do it because we can, not because we must".

With my systemd maintainer hat on (and of other userspace stuff),
there's a couple of things I really want from the kernel because it
would fix real problems for us:

1. we want mount notifications that don't require to scan
   /proc/self/mountinfo entirely again every time things change, over
   and over again, simply because that doesn't scale. We have various
   bugs open about this performance bottleneck, I could point you to,
   but I figure it's easy to see why this currently doesn't scale...

2. We want an unpriv API to query (and maybe set) the fs UUID, like we
   have nowadays for the fs label FS_IOC_[GS]ETFSLABEL

3. We want an API to query time granularity of file systems
   timestamps. Otherwise it's so hard in userspace to reproducibly
   re-generate directory trees. We need to know for example that some
   fs only has 2s granularity (like fat).

4. Similar, we want to know if an fs is case-sensitive for file
   names. Or case-preserving. And which charset it accepts for filenames.

5. We want to know if a file system supports access modes, xattrs,
   file ownership, device nodes, symlinks, hardlinks, fifos, atimes,
   btimes, ACLs and so on. All these things currently can only be
   figured out by changing things and reading back if it worked. Which
   sucks hard of course.

6. We'd like to know the max file size on a file system.

7. Right now it's hard to figure out mount options used for the fs
   backing some file: you can now statx() the file, determine the
   mnt_id by that, and then search that in /proc/self/mountinfo, but
   it's slow, because again we need to scan the whole file until we
   find the entry we need. And that can be huge IRL.

8. Similar: we quite often want to know submounts of a mount. It would
   be great if for that kind of information (i.e. list of mnt_ids
   below some other mnt_id) we wouldn't have to scan the whole of
   /p/s/mi again. In many cases in our code we operate recursively,
   and want to know the mounts below some specific dir, but currently
   pay performance price for it if the number of file systems on the
   host is huge. This doesn't sound like a biggie, but actually is a
   biggie. In systemd we spend a lot of time scaninng /p/s/mi...

9. How are file locks implemented on this fs? Are they local only, and
   orthogonal to remote locks? Are POSIX and BSD locks possibly merged
   at the backend? Do they work at all?

I don't really care too much how an API for this looks like, but let
me just say that I am not a fan of APIs that require allocating an fd
for querying info about an fd. This 'feels' a bit too recursive: if
you expose information about some fd in some magic procfs subdir, or
even in some virtual pseudo-file below the file's path then this means
we have to allocate a new fd to figure out things or the first fd, and
if we'd know the same info for that, we'd theoretically recurse
down. Now of course, most likely IRL we wouldn't actually recurse down,
but it is still smelly. In particular if fd limits are tight. I mean,
I really don't care if you expose non-file-system stuff via the fs, if
that's what you want, but I think exposing *fs* metainfo in the *fs*,
it's just ugly.

I generally detest APIs that have no chance to ever returning multiple
bits of information atomically. Splitting up querying of multiple
attributes into multiple system calls means they couldn't possibly be
determined in a congruent way. I much prefer APIs where we provide a
struct to fill in and do a single syscall, and at least for some
fields we'd know afterwards that the fields were filled in together
and are congruent with each other.

I am a fan of the statx() system call I must say. If we had something
like this for the file system itself I'd be quite happy, it could tick
off many of the requests I list above.

Hope this is useful,

Lennart

--
Lennart Poettering, Berlin

^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Lennart Poettering @ 2020-08-14  8:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Howells, Miklos Szeredi, linux-fsdevel, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Linux API, Ian Kent, LSM, Linux Kernel Mailing List
In-Reply-To: <CAHk-=wiPx0UJ6Q1X=azwz32xrSeKnTJcH8enySwuuwnGKkHoPA@mail.gmail.com>

On Mi, 12.08.20 11:18, Linus Torvalds (torvalds@linux-foundation.org) wrote:

> On Tue, Aug 11, 2020 at 5:05 PM David Howells <dhowells@redhat.com> wrote:
> >
> > Well, the start of it was my proposal of an fsinfo() system call.
>
> Ugh. Ok, it's that thing.
>
> This all seems *WAY* over-designed - both your fsinfo and Miklos' version.
>
> What's wrong with fstatfs()? All the extra magic metadata seems to not
> really be anything people really care about.
>
> What people are actually asking for seems to be some unique mount ID,
> and we have 16 bytes of spare information in 'struct statfs64'.

statx() exposes a `stx_mnt_id` field nowadays. So that's easy and
quick to get nowadays. It's just so inefficient matching that up with
/proc/self/mountinfo then. And it still won't give you any of the fs
capability bits (time granularity, max file size, features, …),
because the kernel doesn't expose that at all right now.

OTOH I'd already be quite happy if struct statfs64 would expose
f_features, f_max_fsize, f_time_granularity, f_charset_case_handling
fields or so.

Lennart

--
Lennart Poettering, Berlin

^ permalink raw reply

* Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)
From: Chuck Lever @ 2020-08-14 14:21 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mimi Zohar, James Morris, Deven Bowers, Pavel Machek, Sasha Levin,
	snitzer, dm-devel, tyhicks, agk, Paul Moore, Jonathan Corbet,
	nramas, serge, pasha.tatashin, Jann Horn, linux-block, Al Viro,
	Jens Axboe, mdsakib, open list, eparis, linux-security-module,
	linux-audit, linux-fsdevel, linux-integrity, jaskarankhurana
In-Reply-To: <1597331416.3708.26.camel@HansenPartnership.com>



> On Aug 13, 2020, at 11:10 AM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> On Thu, 2020-08-13 at 10:42 -0400, Chuck Lever wrote:
>>> On Aug 12, 2020, at 11:51 AM, James Bottomley <James.Bottomley@Hans
>>> enPartnership.com> wrote:
>>> On Wed, 2020-08-12 at 10:15 -0400, Chuck Lever wrote:
>>>>> On Aug 11, 2020, at 11:53 AM, James Bottomley
>>>>> <James.Bottomley@HansenPartnership.com> wrote:
>>>>> On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
> [...]
>>>>>>>> The client would have to reconstruct that tree again if
>>>>>>>> memory pressure caused some or all of the tree to be
>>>>>>>> evicted, so perhaps an on-demand mechanism is preferable.
>>>>>>> 
>>>>>>> Right, but I think that's implementation detail.  Probably
>>>>>>> what we need is a way to get the log(N) verification hashes
>>>>>>> from the server and it's up to the client whether it caches
>>>>>>> them or not.
>>>>>> 
>>>>>> Agreed, these are implementation details. But see above about
>>>>>> the trustworthiness of the intermediate hashes. If they are
>>>>>> conveyed on an untrusted network, then they can't be trusted
>>>>>> either.
>>>>> 
>>>>> Yes, they can, provided enough of them are asked for to
>>>>> verify.  If you look at the simple example above, suppose I
>>>>> have cached H11 and H12, but I've lost the entire H2X layer.  I
>>>>> want to verify B3 so I also ask you for your copy of H24.  Then
>>>>> I generate H23 from B3 and Hash H23 and H24.  If this doesn't
>>>>> hash to H12 I know either you supplied me the wrong block or
>>>>> lied about H24.  However, if it all hashes correctly I know you
>>>>> supplied me with both the correct B3 and the correct H24.
>>>> 
>>>> My point is there is a difference between a trusted cache and an
>>>> untrusted cache. I argue there is not much value in a cache where
>>>> the hashes have to be verified again.
>>> 
>>> And my point isn't about caching, it's about where the tree comes
>>> from. I claim and you agree the client can get the tree from the
>>> server a piece at a time (because it can path verify it) and
>>> doesn't have to generate it itself.
>> 
>> OK, let's focus on where the tree comes from. It is certainly
>> possible to build protocol to exchange parts of a Merkle tree.
> 
> Which is what I think we need to extend IMA to do.
> 
>> The question is how it might be stored on the server.
> 
> I think the only thing the server has to guarantee to store is the head
> hash, possibly signed.
> 
>> There are some underlying assumptions about the metadata storage
>> mechanism that should be stated up front.
>> 
>> Current forms of IMA metadata are limited in size and stored in a
>> container that is read and written in a single operation. If we stick
>> with that container format, I don't see a way to store a Merkle tree
>> in there for all file sizes.
> 
> Well, I don't think you need to.  The only thing that needs to be
> stored is the head hash.  Everything else can be reconstructed.  If you
> asked me to implement it locally, I'd probably put the head hash in an
> xattr but use a CAM based cache for the merkel trees and construct the
> tree on first access if it weren't already in the cache.

The contents of the security.ima xattr might be modeled after
EVM_IMA_DIGSIG:

- a format enumerator (used by all IMA metadata formats)
- the tree's unit size
- a fingerprint of the signer's certificate
  - digest algorithm name and full digest
- the root hash, always signed
  - signing algorithm name and signature

The rest of the hash tree is always stored somewhere else or
constructed on-demand.

My experience of security communities both within and outside the
IETF is that they would insist on always having a signature.

If one doesn't care about signing, a self-signed certificate can be
automatically provisioned when ima-evm-utils is installed that can
be used for those cases. That would make the signature process
invisible to any administrator who doesn't care about signed
metadata.

Because storage in NFS would cross trust boundaries, it would have
to require the use of a signed root hash. I don't want to be in the
position where copying a file with an unsigned root hash into NFS
makes it unreadable because of a change in policy.


> However, the above isn't what fs-verity does: it stores the tree in a
> hidden section of the file.  That's why I don't think we'd mandate
> anything about tree storage.  Just describe the partial retrieval
> properties we'd like and leave the rest as an implementation detail.

I'm starting to consider how much compatibility with fs-verity is
required. There are several forms of hash-tree, and a specification
of the IMA metadata format would need to describe exactly how to
form the tree root. If we want compatibility with fs-verity, then
it is reasonable to assume that this IMA metadata format might be
required to use the same hash tree construction algorithm that
fs-verity uses.

The original Merkle tree concept was patented 40 years ago. I'm not
clear yet on whether the patent encumbers the use of Merkle trees
in any way, but since their usage seems pretty widespread in P2P
and BitCoin applications, I'm guessing the answer to that is
favorable. More research needed.

There is an implementation used by several GNU utilities that is
available as a piece of GPL code. It could be a potential blocker
if that was the tree algorithm that fs-verity uses -- as discussed
in the other thread.

Apparently there are some known weaknesses in older hash tree
algorithms, including at least one CVE. We could choose a recent
algorithm, but perhaps there needs to be a degree of extensibility
in case that algorithm needs to be updated due to a subsequent
security issue.

Tree construction could include a few items besides file content to
help secure the hash further. For instance the file's size and mtime,
as well as the depth of the tree, could be included in the signature.
But that depends on whether it can be done while maintaining
compatibility with fs-verity.

I would feel better if someone with more domain expertise chimed in.


>> Thus it seems to me that we cannot begin to consider the tree-on-the-
>> server model unless there is a proposed storage mechanism for that
>> whole tree. Otherwise, the client must have the primary role in
>> unpacking and verifying the tree.
> 
> Well, as I said,  I don't think you need to store the tree.

We basically agree there.


> You certainly could decide to store the entire tree (as fs-verity does) if
> it fitted your use case, but it's not required.  Perhaps even in my
> case I'd make the CAM based cache persistent, like android's dalvik
> cache.
> 
> James
> 
> 
>> Storing only the tree root in the metadata means the metadata format
>> is nicely bounded in size.

--
Chuck Lever
chucklever@gmail.com




^ permalink raw reply

* Re: file metadata via fs API (was: [GIT PULL] Filesystem Information)
From: Linus Torvalds @ 2020-08-14 17:05 UTC (permalink / raw)
  To: Jeffrey E Altman
  Cc: David Howells, Miklos Szeredi, linux-fsdevel, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <679456f1-5867-4017-b1d6-95197d2fa81b@auristor.com>

On Wed, Aug 12, 2020 at 8:53 PM Jeffrey E Altman <jaltman@auristor.com> wrote:
>
> For the AFS community, fsinfo offers a method of exposing some server
> and volume properties that are obtained via "path ioctls" in OpenAFS and
> AuriStorFS.  Some example of properties that might be exposed include
> answers to questions such as:

Note that several of the questions you ask aren't necessarily
mount-related at all.

Doing it by mount ends up being completely the wrong thing.

For example, at a minimum, these guys may well be per-directory (or
even possibly per-file):

>  * where is a mounted volume hosted? which fileservers, named by uuid
>  * what is the block size? 1K, 4K, ...
>  * are directories just-send-8, case-sensitive, case-preserving, or
>    case-insensitive?
>  * if not just-send-8, what character set is used?
>  * if Unicode, what normalization rules? etc.
>  * what volume security policy (authn, integ, priv) is assigned, if any?
>  * what is the replication policy, if any?
>  * what is the volume encryption policy, if any?

and trying to solve this with some kind of "mount info" is pure garbage.

Honestly, I really think you may want an extended [f]statfs(), not
some mount tracking.

                 Linus

^ permalink raw reply

* Re: [PATCH RESEND] device_cgroup: Fix RCU list debugging warning
From: Stephen Rothwell @ 2020-08-17  2:07 UTC (permalink / raw)
  To: Amol Grover, James Morris
  Cc: Serge E. Hallyn, Paul E. McKenney, linux-kernel,
	linux-kernel-mentees, Joel Fernandes, Madhuparna Bhowmik,
	linux-security-module
In-Reply-To: <20200608041734.GA10911@mail.hallyn.com>

[-- Attachment #1: Type: text/plain, Size: 2603 bytes --]

Hi all,

On Sun, 7 Jun 2020 23:17:34 -0500 "Serge E. Hallyn" <serge@hallyn.com> wrote:
> On Sun, Jun 07, 2020 at 12:08:40PM -0700, Paul E. McKenney wrote:
> > On Sun, Jun 07, 2020 at 06:23:40AM +1000, Stephen Rothwell wrote:  
> > > 
> > > On Mon, 6 Apr 2020 16:29:50 +0530 Amol Grover <frextrite@gmail.com> wrote:  
> > > >
> > > > exceptions may be traversed using list_for_each_entry_rcu()
> > > > outside of an RCU read side critical section BUT under the
> > > > protection of decgroup_mutex. Hence add the corresponding
> > > > lockdep expression to fix the following false-positive
> > > > warning:
> > > > 
> > > > [    2.304417] =============================
> > > > [    2.304418] WARNING: suspicious RCU usage
> > > > [    2.304420] 5.5.4-stable #17 Tainted: G            E
> > > > [    2.304422] -----------------------------
> > > > [    2.304424] security/device_cgroup.c:355 RCU-list traversed in non-reader section!!
> > > > 
> > > > Signed-off-by: Amol Grover <frextrite@gmail.com>
> > > > ---
> > > >  security/device_cgroup.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/security/device_cgroup.c b/security/device_cgroup.c
> > > > index 7d0f8f7431ff..b7da9e0970d9 100644
> > > > --- a/security/device_cgroup.c
> > > > +++ b/security/device_cgroup.c
> > > > @@ -352,7 +352,8 @@ static bool match_exception_partial(struct list_head *exceptions, short type,
> > > >  {
> > > >  	struct dev_exception_item *ex;
> > > >  
> > > > -	list_for_each_entry_rcu(ex, exceptions, list) {
> > > > +	list_for_each_entry_rcu(ex, exceptions, list,
> > > > +				lockdep_is_held(&devcgroup_mutex)) {
> > > >  		if ((type & DEVCG_DEV_BLOCK) && !(ex->type & DEVCG_DEV_BLOCK))
> > > >  			continue;
> > > >  		if ((type & DEVCG_DEV_CHAR) && !(ex->type & DEVCG_DEV_CHAR))
> > > 
> > > I have been carrying the above patch in linux-next for some time now.
> > > I have been carrying it because it fixes problems for syzbot (see the
> > > third warning in
> > > https://lore.kernel.org/linux-next/CACT4Y+YnjK+kq0pfb5fe-q1bqe2T1jq_mvKHf--Z80Z3wkyK1Q@mail.gmail.com/).
> > > Is there some reason it has not been applied to some tree?  
> > 
> > The RCU changes on which this patch depends have long since made it to
> > mainline, so it can go up any tree.  I can take it if no one else will,
> > but it might be better going in via the security tree.
> 
> James, do you mind pulling it in?

I am still carrying this patch.  Has it been superceded, or is it still
necessary?

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH] watch_queue: Limit the number of watches a user can hold
From: David Howells @ 2020-08-17 10:07 UTC (permalink / raw)
  To: torvalds
  Cc: Jarkko Sakkinen, dhowells, keyrings, linux-security-module,
	linux-kernel

Impose a limit on the number of watches that a user can hold so that they
can't use this mechanism to fill up all the available memory.

This is done by putting a counter in user_struct that's incremented when a
watch is allocated and decreased when it is released.  If the number
exceeds the RLIMIT_NOFILE limit, the watch is rejected with EAGAIN.

This can be tested by the following means:

 (1) Create a watch queue and attach it to fd 5 in the program given - in
     this case, bash:

	keyctl watch_session /tmp/nlog /tmp/gclog 5 bash

 (2) In the shell, set the maximum number of files to, say, 99:

	ulimit -n 99

 (3) Add 200 keyrings:

	for ((i=0; i<200; i++)); do keyctl newring a$i @s || break; done

 (4) Try to watch all of the keyrings:

	for ((i=0; i<200; i++)); do echo $i; keyctl watch_add 5 %:a$i || break; done

     This should fail when the number of watches belonging to the user hits
     99.

 (5) Remove all the keyrings and all of those watches should go away:

	for ((i=0; i<200; i++)); do keyctl unlink %:a$i; done

 (6) Kill off the watch queue by exiting the shell spawned by
     watch_session.

Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---

 include/linux/sched/user.h |    3 +++
 kernel/watch_queue.c       |    8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index 917d88edb7b9..a8ec3b6093fc 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -36,6 +36,9 @@ struct user_struct {
     defined(CONFIG_NET) || defined(CONFIG_IO_URING)
 	atomic_long_t locked_vm;
 #endif
+#ifdef CONFIG_WATCH_QUEUE
+	atomic_t nr_watches;	/* The number of watches this user currently has */
+#endif
 
 	/* Miscellaneous per-user rate limit */
 	struct ratelimit_state ratelimit;
diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index f74020f6bd9d..0ef8f65bd2d7 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -393,6 +393,7 @@ static void free_watch(struct rcu_head *rcu)
 	struct watch *watch = container_of(rcu, struct watch, rcu);
 
 	put_watch_queue(rcu_access_pointer(watch->queue));
+	atomic_dec(&watch->cred->user->nr_watches);
 	put_cred(watch->cred);
 }
 
@@ -452,6 +453,13 @@ int add_watch_to_object(struct watch *watch, struct watch_list *wlist)
 	watch->cred = get_current_cred();
 	rcu_assign_pointer(watch->watch_list, wlist);
 
+	if (atomic_inc_return(&watch->cred->user->nr_watches) >
+	    task_rlimit(current, RLIMIT_NOFILE)) {
+		atomic_dec(&watch->cred->user->nr_watches);
+		put_cred(watch->cred);
+		return -EAGAIN;
+	}
+
 	spin_lock_bh(&wqueue->lock);
 	kref_get(&wqueue->usage);
 	kref_get(&watch->usage);



^ permalink raw reply related

* Re: file metadata via fs API
From: Steven Whitehouse @ 2020-08-17 11:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Howells, Miklos Szeredi, linux-fsdevel, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <CAHk-=wgz5H-xYG4bOrHaEtY7rvFA1_6+mTSpjrgK8OsNbfF+Pw@mail.gmail.com>

Hi,

On 12/08/2020 20:50, Linus Torvalds wrote:
> On Wed, Aug 12, 2020 at 12:34 PM Steven Whitehouse <swhiteho@redhat.com> wrote:
>> The point of this is to give us the ability to monitor mounts from
>> userspace.
> We haven't had that before, I don't see why it's suddenly such a big deal.
>
> The notification side I understand. Polling /proc files is not the answer.
>
> But the whole "let's design this crazy subsystem for it" seems way
> overkill. I don't see anybody caring that deeply.
>
> It really smells like "do it because we can, not because we must".
>
> Who the hell cares about monitoring mounts at a kHz frequencies? If
> this is for MIS use, you want a nice GUI and not wasting CPU time
> polling.
>
> I'm starting to ignore the pull requests from David Howells, because
> by now they have had the same pattern for a couple of years now:
> esoteric new interfaces that seem overdesigned for corner-cases that
> I'm not seeing people clamoring for.
>
> I need (a) proof this is actualyl something real users care about and
> (b) way more open discussion and implementation from multiple parties.
>
> Because right now it looks like a small in-cabal of a couple of people
> who have wild ideas but I'm not seeing the wider use of it.
>
> Convince me otherwise. AGAIN. This is the exact same issue I had with
> the notification queues that I really wanted actual use-cases for, and
> feedback from actual outside users.
>
> I really think this is engineering for its own sake, rather than
> responding to actual user concerns.
>
>                 Linus
>

I've been hesitant to reply to this immediately, because I can see that 
somehow there is a significant disconnect between what you expect to 
happen, and what has actually happened in this case. Have pondered this 
for a few days, I hope that the best way forward might be to explore 
where the issues are, with the intention of avoiding a repeat in the 
future. Sometimes email is a difficult medium for these kinds of 
communication, and face to face is better, but with the lack of 
conferences/travel at the moment, that option is not open in the near 
future.

The whole plan here, leading towards the ability to get a "dump plus 
updates" view of mounts in the kernel has been evolving over time. It 
has been discussed at LSF over a number of years [1] and in fact the new 
mount API which was merged recently - I wonder if this is what you are 
referring to above as:

> I'm starting to ignore the pull requests from David Howells, because
> by now they have had the same pattern for a couple of years now

was originally proposed by Al, and also worked on by Miklos[2] in 2017 
and others. Community discussion resulted in that becoming a 
prerequisite for the later notifications/fsinfo work. This was one of 
the main reasons that David picked it up[3] to work on, but not the only 
reason. That did also appear to be logical, in that cleaning up the way 
in which arguments were handled during mount would make it much easier 
to create future generic code to handle them.

That said, the overall aim here is to solve the problem and if there are 
better solutions available then I'm sure that everyone is very open to 
those. I agree very much that monitoring at kHz frequencies is not 
useful, but at the same time, there are cases which can generate large 
amounts of mount changes in a very short time period. We want to be 
reasonably efficient, but not to over-optimise, and sometimes that is a 
fine line. We also don't want to block mounts if the notifications queue 
fills up, so some kind of resync operation would be required in the 
queue overflows. The notifications and fsinfo were designed very much as 
two sides of the same coin, but submitted separately for ease of review 
more than anything else.

You recently requested some details of real users for the notifications, 
and (I assumed) by extension fsinfo too. Ian wrote these emails [4][5] 
in direct response to your request. That is what we thought you were 
looking for, so if that isn't not quite what you meant, perhaps you 
could clarify a bit more. Again, apologies if we've misinterpreted what 
you were asking for.

You also mention "...it looks like a small in-cabal of a couple of 
people..." and I hope that it doesn't look that way, it is certainly not 
our intention. There have been a fair number of people involved, and 
we've done our best to ensure that the development is guided by the 
potential users, such as autofs, AFS and systemd. If there are others 
out there with use cases, and particularly so if the use case is a GUI 
file manager type application who'd like to get involved, then please 
do. We definitely want to see involvement from end users, since there is 
no point in spending a large effort creating something that is then 
never used. As you pointed that out above, this kind of application was 
very much part of the original motivation, but we had started with the 
other users since there were clearly defined use cases that could 
demonstrate significant performance gains in those cases.

So hopefully that helps to give a bit more background about where we are 
and how we got here. Where we go next will no doubt depend on the 
outcome of the current discussions, and any guidance you can give around 
how we should have better approached this would be very helpful at this 
stage,

Steve.


[1] https://lwn.net/Articles/718803/

[2] https://lwn.net/Articles/718638/

[3] https://lwn.net/Articles/753473/

[4] https://lkml.org/lkml/2020/6/2/1182

[5] 
https://lore.kernel.org/linux-fsdevel/8eb2e52f1cbdbb8bcf5c5205a53bdc9aaa11a071.camel@themaw.net/



^ permalink raw reply

* Re: file metadata via fs API
From: Linus Torvalds @ 2020-08-17 17:15 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: David Howells, Miklos Szeredi, linux-fsdevel, Al Viro, Karel Zak,
	Jeff Layton, Miklos Szeredi, Nicolas Dichtel, Christian Brauner,
	Lennart Poettering, Linux API, Ian Kent, LSM,
	Linux Kernel Mailing List
In-Reply-To: <94f907f0-996e-0456-db8a-7823e2ef3d3f@redhat.com>

On Mon, Aug 17, 2020 at 4:33 AM Steven Whitehouse <swhiteho@redhat.com> wrote:
>
> That said, the overall aim here is to solve the problem and if there are
> better solutions available then I'm sure that everyone is very open to
> those. I agree very much that monitoring at kHz frequencies is not
> useful, but at the same time, there are cases which can generate large
> amounts of mount changes in a very short time period.

So the thing is, I absolutely believe in the kernel _notifying_ about
changes so that people don't need to poll. It's why I did merge the
notification queues, although I wanted to make sure that those worked.

> You recently requested some details of real users for the notifications,
> and (I assumed) by extension fsinfo too.

No, fsinfo wasn't on the table there. To me, notifications are a
completely separate issue, because you *can* get the information from
existing sources (ie things like /proc/mounts etc), and notification
seemed to be the much more fundamental issue.

If you poll for changes, parsing something like /proc/mounts is
obviously very heavy indeed. I don't find that particularly
controversial. Plus the notification queues had other uses, even if it
wasn't clear how many or who would use them.

But honestly, the actual fsinfo thing seems (a) overdesigned and (b)
broken. I've now had two different people say how they want to use it
to figure out whether a filesystem supports certain things that aren't
even per-filesystem things in the first place.

And this feature is clearly controversial, with actual discussion about it.

And I find the whole thing confusing and over-engineered. If this was
a better statfs(), that would be one thing. But it is designed to be
this monstoer thing that does many different things, and I find it
distasteful.  Yes, you can query "extended statfs" kind of data with
it and get the per-file attributes. I find it really annoying how the
vfs layer calls to the filesystems, that then call back to the vfs
layer to fill things in, but I guess we have that nasty pattern from
stat() already. I'd rather have the VFS layer just fill in all the
default values and the stuff it already knows about, and then maybe
have the filesystem callback fill in the ones the vfs *doesn't* know
about, but whatever.

But then you can *also* query odd things like mounts that aren't even
visible, and the topology, and completely random error state.

So it has this very complex "random structures of random things"
implementation. It's a huge sign of over-design and "I don't know what
the hell I want to expose, so I'll make this generic thing that can
expose anything, and then I start adding random fields".

Some things are per-file, some things are per-mount, and some things
are per-namespace and cross mount boundaries.

And honestly, the "random binary interfaces" just turns me off a lot.

A simple and straightforward struct? Sure. But this random "whatever
goes" thing? No.

                 Linus

^ permalink raw reply

* Re: [PATCH 2/3] IMA: add policy to support measuring critical data from kernel components
From: Mimi Zohar @ 2020-08-17 20:43 UTC (permalink / raw)
  To: Tushar Sugandhi, stephen.smalley.work, casey, gmazyland
  Cc: tyhicks, sashal, jmorris, linux-integrity, selinux,
	linux-security-module, linux-kernel, dm-devel, nramas
In-Reply-To: <20200812193102.18636-3-tusharsu@linux.microsoft.com>

On Wed, 2020-08-12 at 12:31 -0700, Tushar Sugandhi wrote:
> There would be several candidate kernel components suitable for IMA
> measurement. Not all of them would be enlightened for IMA measurement.
> Also, system administrators may not want to measure data for all of
> them, even when they are enlightened for IMA measurements. An IMA policy
> specific to various kernel components is needed to measure their
> respective critical data.
> 
> Add a new IMA policy CRITICAL_DATA+data_sources to support measuring
> various critical kernel components. This policy would enable the
> system administrators to limit the measurement to the components,
> if the components are enlightened for IMA measurement.

"enlightened", really?  Please find a different term, maybe something
like "supported".

Before posting a patch set, please look at the patches line by line,
like anyone reviewing the code needs to do.  Please minimize code
change.   Unnecessary formatting changes are unacceptible.   For
example, like the "#define", below, or in 3/3 the
"process_buffer_measurement()" change from void to int.

scripts/Lindent isn't as prevalent as it used to be, but it's still
included in Documentation/process/coding-style.rst.  Use it as a guide.

Mimi


^ permalink raw reply

* Re: [PATCH v6 1/3] Add a new LSM-supporting anonymous inode interface
From: Lokesh Gidra @ 2020-08-17 21:10 UTC (permalink / raw)
  To: Al Viro
  Cc: James Morris, Stephen Smalley, Casey Schaufler, Eric Biggers,
	Serge E. Hallyn, Paul Moore, Eric Paris, Daniel Colascione,
	Kees Cook, Eric W. Biederman, KP Singh, David Howells,
	Thomas Cedeno, Anders Roxell, Sami Tolvanen, Matthew Garrett,
	Aaron Goidel, Randy Dunlap, Joel Fernandes (Google), YueHaibing,
	Christian Brauner, Alexei Starovoitov, Alexey Budankov,
	Adrian Reber, Aleksa Sarai, Linux FS Devel, linux-kernel,
	LSM List, SElinux list, Kalesh Singh, Calin Juravle,
	Suren Baghdasaryan, Nick Kralevich, Jeffrey Vander Stoep,
	kernel-team, Daniel Colascione
In-Reply-To: <20200807230225.GW1236603@ZenIV.linux.org.uk>

On Fri, Aug 7, 2020 at 4:02 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Aug 07, 2020 at 03:49:39PM -0700, Lokesh Gidra wrote:
>
> > The new functions accept an optional context_inode parameter that
> > callers can use to provide additional contextual information to
> > security modules, e.g., indicating that one anonymous struct file is a
> > logical child of another, allowing a security model to propagate
> > security information from one to the other.
>
> What the hell is "logical child" and what are the lifetime rules implied
> by that relationship?

context_inode provides the security context required by the security
modules for granting/denying permission to create an anon inode of the
same type.

In case of userfaultfd, the relationship between the context_inode and
the created inode is described as that of ‘logical child’ because the
context_inode (userfaultfd inode of the parent process) provides the
security context required for creation of child process’ userfaultfd
inode. But there is no relationship beyond this point. Therefore, no
reference to context_inode is held anywhere.

^ permalink raw reply

* Re: [PATCH 2/2] SELinux: Measure state and hash of policy using IMA
From: Mimi Zohar @ 2020-08-17 21:31 UTC (permalink / raw)
  To: Stephen Smalley, Lakshmi Ramasubramanian
  Cc: Casey Schaufler, Tyler Hicks, tusharsu, sashal, James Morris,
	linux-integrity, SElinux list, LSM List, linux-kernel, paul Moore
In-Reply-To: <CAEjxPJ7uWee5jjALtQ3azMvKRMk8pxFiYByWmYVhjgJiMNZ8ww@mail.gmail.com>

On Thu, 2020-08-13 at 14:13 -0400, Stephen Smalley wrote:
> On Thu, Aug 13, 2020 at 2:03 PM Lakshmi Ramasubramanian
> <nramas@linux.microsoft.com> wrote:
> > On 8/13/20 10:58 AM, Stephen Smalley wrote:
> > > On Thu, Aug 13, 2020 at 1:52 PM Lakshmi Ramasubramanian
> > > <nramas@linux.microsoft.com> wrote:
> > > > On 8/13/20 10:42 AM, Stephen Smalley wrote:
> > > > 
> > > > > > diff --git a/security/selinux/measure.c b/security/selinux/measure.c
> > > > > > new file mode 100644
> > > > > > index 000000000000..f21b7de4e2ae
> > > > > > --- /dev/null
> > > > > > +++ b/security/selinux/measure.c
> > > > > > @@ -0,0 +1,204 @@
> > > > > > +static int selinux_hash_buffer(void *buf, size_t buf_len,
> > > > > > +                   void **buf_hash, int *buf_hash_len)
> > > > > > +{
> > > > > > +    struct crypto_shash *tfm;
> > > > > > +    struct shash_desc *desc = NULL;
> > > > > > +    void *digest = NULL;
> > > > > > +    int desc_size;
> > > > > > +    int digest_size;
> > > > > > +    int ret = 0;
> > > > > > +
> > > > > > +    tfm = crypto_alloc_shash("sha256", 0, 0);
> > > > > > +    if (IS_ERR(tfm))
> > > > > > +        return PTR_ERR(tfm);
> > > > > Can we make the algorithm selectable via kernel parameter and/or writing
> > > > > to a new selinuxfs node?
> > > > 
> > > > I can add a kernel parameter to select this hash algorithm.
> > > 
> > > Also can we provide a Kconfig option for the default value like IMA does?
> > > 
> > 
> > Would we need both - Kconfig and kernel param?
> > 
> > The other option is to provide an IMA function to return the current
> > hash algorithm used for measurement. That way a consistent hash
> > algorithm can be employed by both IMA and the callers. Would that be better?
> 
> This is why I preferred just passing the serialized policy buffer to
> IMA and letting it handle the hashing.  But apparently that approach
> wouldn't fly.  IMA appears to support both a Kconfig option for
> selecting a default algorithm and a kernel parameter for overriding
> it.  I assume the idea is that the distros can pick a reasonable
> default and then the end users can override that if they have specific
> requirements.  I'd want the same for SELinux.  If IMA is willing to
> export its hash algorithm to external components, then I'm willing to
> reuse that but not sure if that's a layering violation.

With the new ima_measure_critical_data() hook, I agree with you and
Casey it doesn't make sense for each caller to have to write their own
function.  Casey suggested exporting IMA's hash function or defining a
new common hash function.   There's nothing specific to IMA.  Should
the common hash function be prefixed with "security_"?

Like when we add a new security hook call, the new LSM call is separate
from any other change.   Please break up this patch with the SELinux
specific pieces separated from the ima_measure_critical_data() call as
much as possible.

thanks,

Mimi


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox